Cover image by MichaelKirsh
For those of you not keeping track, a single AWS region, US-East-1, suffered a major outage last week. Many of the affected teams were on holiday for Thanksgiving in the US, leaving plenty of red-rimmed eyes by the time the issues were resolved.
While some critiques have absolutely hit the mark, such as this one from Forrest:
It's fair to say that a number of critics are 'punching above their weight' and don't really understand the complexities involved. While I certainly don't understand the ins and outs of this failure, I think the following things are pretty clear:
- a single failure, no matter how severe, does not mean you shouldn't be using a particular cloud
- there is no simple way to completely uncouple services, so there will always be cross-dependency issues
- you probably shouldn't host your status dashboard on your own services (does this mean the AWS status page should be hosted on Azure? Yes.)