Imagine you are responsible for a system: the system breaks, the user notices and informs you, you do not have a clue what is going on, and the clock is ticking … Welcome to operations hell!
Two years ago we were exactly there, and not just once. Yes, it is as horrible as it sounds, and nowhere we ever want to be again. So we started our journey to escape. Our system still breaks, of course, but we have learned and improved a lot. Sometimes we can prevent the user from seeing any impact at all, or at least achieve a shorter mean time to recover.
Join me on this exciting journey from being called about outages to preventing them, and see what helped us escape and defeat all the demons we met along the way.