I never realized how much detail the Google SRE book provides on Google's production environment, including such things as D, Colossus, Jupiter, BNS, and GSLB.

Also, there’s now The Site Reliability Workbook, a hands-on implementation guide.