Roadmap to Chaos Engineering
Multi-Cloud is NOT the solution to the next AWS outage.
There are many things you can do first in terms of disaster recovery.
The latest AWS outage, sadly enough, encouraged me to dig deeper into disaster recovery. In my Roadmap to Chaos Engineering, I had planned to dedicate only one article to the subject before moving on to less critical scenarios and exploring great tools like Gremlin or Chaos Monkey.
So, thanks, Amazon: let's go over strategies for not being affected by the next outage. In my previous article, I went through the steps required to prepare for and handle a crisis. But I did not really dig into which strategy you can actually use to achieve that goal. I focused on what to do and not so much on how to do it.
You may be thinking about going Multi-Cloud, or you may at least have read online that it is the "ultimate solution". I'd like to argue that it is not the way forward, especially if you are only getting started with recovery strategies.
This time, let's talk solutions, architecture, and recovery plans. Here is the previous article for reference:
The reliability problem caused by an outage can be solved with one of three strategies. While each can be mastered independently, I think it is best to see them as a logical progression.

Active-Recovery (Backup and Restore)
In this mode, you recreate all the resources in a new region after the disaster occurs. The prerequisite…
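To make the Backup and Restore idea concrete, here is a minimal in-memory simulation. It is only a sketch under stated assumptions: the `Region` and `Backup` classes, the `take_backup` and `restore` helpers, the region names, and the "one backup every 3 minutes" schedule are all hypothetical and do not correspond to any AWS API. It shows the two properties of this mode: nothing runs in the recovery region until the disaster, and anything written after the last backup is lost.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    taken_at: int   # minute at which the snapshot was taken (hypothetical clock)
    records: tuple  # immutable copy of the data at that moment


@dataclass
class Region:
    name: str
    records: list = field(default_factory=list)
    healthy: bool = True


def take_backup(region: Region, now: int) -> Backup:
    # Copy the data so later writes in the primary do not leak into the backup.
    return Backup(taken_at=now, records=tuple(region.records))


def restore(backups: list, target_name: str) -> Region:
    # Backup and Restore: the target region is created from scratch only after
    # the disaster, then loaded with the most recent backup available.
    latest = max(backups, key=lambda b: b.taken_at)
    return Region(name=target_name, records=list(latest.records))


if __name__ == "__main__":
    primary = Region("region-a")  # hypothetical region names
    backups = []
    for minute in range(1, 8):
        primary.records.append(f"order-{minute}")
        if minute % 3 == 0:  # hypothetical schedule: one backup every 3 minutes
            backups.append(take_backup(primary, minute))

    primary.healthy = False  # the outage hits after minute 7
    recovered = restore(backups, "region-b")
    lost = [r for r in primary.records if r not in recovered.records]
    print(recovered.name, len(recovered.records), lost)  # region-b 6 ['order-7']
```

The gap between the last backup and the outage (here, `order-7`) is the data you accept to lose, and the time it takes to recreate the region is your downtime. Both shrink only if you back up more often and automate the restore.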