
Roadmap to Chaos Engineering

Multi-Cloud is NOT the solution to the next AWS outage.

Alexandre Couëdelo

Published in

FAUN — Developer Community 🐾

5 min read · Jan 3, 2022

There are many things you can do first in terms of disaster recovery.

The latest AWS outage, sadly enough, encouraged me to keep digging into the subject of disaster recovery. In my Roadmap to Chaos Engineering, I was planning to dedicate only one article to the matter, then move on to less critical scenarios and explore some great tools like Gremlin or Chaos Monkey.

An Amazon server outage is causing problems for Alexa, Ring, Disney Plus, and others

Problems for some of Amazon's AWS cloud servers are causing slow loading or failures for significant chunks of the…

www.theverge.com

So thanks, Amazon. Let's discuss strategies for not being affected by the next outage. In my previous article, I went through the steps required to prepare for and handle a crisis. But I did not really dig into which strategies you can use to achieve that goal: I focused on what to do, and not so much on how to do it.

You may be thinking about going Multi-Cloud, or at least have read on the internet that it is the “ultimate solution”. I’d like to argue that it is not the way forward, especially if you are only getting started with recovery strategies.

This time, let's talk solutions: architecture and recovery plans. Here is the previous article for reference:

Road Map to Chaos Engineering: Preparing for Major Disruption

Major disruptions that impact everyday business operations do happen. To convince yourself you can have a look at the…

faun.pub

The reliability problem caused by an outage can be solved by following one of three strategies. While these strategies can be mastered independently, I think it is best to see them as a logical progression.

Active-Recovery (Backup and Restore)

In this mode, you recreate all the resources in the new regions after the disaster occurs. The prerequisite…