4 Instructive Postmortems on Data Downtime and Loss
Mar 01, 2024
Data Security / Disaster Recovery
More than a decade ago, the concept of the 'blameless' postmortem changed how tech companies recognize failures at scale. John Allspaw, who coined the term during his tenure at Etsy, argued postmortems were all about controlling our natural reaction to an incident, which is to point fingers: "One option is to assume the single cause is incompetence and scream at engineers to make them 'pay attention!' or 'be more careful!' Another option is to take a hard look at how the accident actually happened, treat the engineers involved with respect, and learn from the event." What can we, in turn, learn from some of the most honest and blameless—and public—postmortems of the last few years? GitLab: 300GB of user data gone in seconds What happened : Back in 2017, GitLab experienced a painful 18-hour outage. That story, and GitLab's subsequent honesty and transparency, has significantly impacted how organizations handle data security today. The incident began when GitLab's secondary datab