According to Merriam-Webster, resilience is defined as “an ability to recover from or adjust easily to misfortune or change.”
Toggle thought this topic was a good follow-up to last week’s conversation about productivity. We are currently having to adjust our expectations about what being productive means. We’ve had to adapt quickly to new remote working situations. Systems are being pushed to their limits as large numbers of people quickly move to cloud-based solutions for meetings, social gatherings, and educating students. Adjusting well to these changes requires resiliency.
Questions we posed on resiliency:
- How do you define resilience?
- How do you build resiliency in your systems?
- How do you increase your own tolerance for disruption and failure?
- What value can we derive from critical events?
Highlight reel
Whole vs. parts
If you are looking for resilience, you have to look at the big picture. From a technology perspective, if you are striving for five-nines availability, you have to look not just at the technology but also at the people, the processes, and the organization as a whole.
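To put "five nines" in concrete terms, here is a minimal sketch of the arithmetic behind an availability target. It assumes a 365-day year and treats availability as a simple fraction of uptime; the function name `downtime_budget_minutes` is hypothetical and used only for illustration. Under those assumptions, 99.999% availability leaves roughly 5.26 minutes of downtime per year.

```python
# Assumption: a 365-day year, with availability expressed as a fraction of uptime.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability: float) -> float:
    """Return the downtime allowed per year, in minutes, for an availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


# Compare a few common availability targets.
for label, availability in [
    ("three nines", 0.999),
    ("four nines", 0.9999),
    ("five nines", 0.99999),
]:
    minutes = downtime_budget_minutes(availability)
    print(f"{label} ({availability:.3%}): {minutes:.2f} minutes of downtime per year")
```

A budget that small is easily consumed by a single slow handoff or an unclear runbook, which is one reason the people and processes matter as much as the technology.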
Sociotechnical models do just this, which helps when it comes to resiliency. Sociotechnical theory looks at the interrelationships between the social and technical aspects of a system. Consider how people will use the software, who will be using it, and who will be supporting it. This can help you build resilience as you adapt to the changing social aspects.
VisArch (@ruthmalan), replying to @richburroughs and @dparzych on 15 Apr 2020: “If we view our systems as sociotechnical systems, the boundary shifts — to include people, hence adaptive capacity.”
When looking at the social aspects, remember that resources are finite. And people are not resources. People do not have an infinite ability to respond to and recover from failures. We use metrics to track the health of individual elements of our systems. We can also use metrics to track our own health and ability to respond to failures.
Humans are resilient, systems are robust
One aspect of resilience is sustained adaptability. This is where humans come in. People make decisions about what to build, how to build it, and how and when to change it. Systems will not adapt without humans. It isn’t possible to separate the human from the tech.
Surprise!
I love the framing of incidents as surprises. It takes away some of the negative stigma of incidents being bad. If we frame incidents as surprise learning opportunities, it helps us figure out what the best response is.
Mental models
The conversation about resilience and surprises seemed to naturally lead to a discussion of mental models. A mental model is an explanation of someone’s thought process of how something works. Mental models help us understand and interpret the relationships between things. When we encounter an obstacle, we may have to update our mental models. The solution that worked previously may not work the second time around. Our ability to continually update our mental models is part of our resiliency.
Summary
During #ToggleTalk, we touched on all four concepts for resilience as outlined by David Woods (see article below):
- Ability to rebound
- Robustness
- Graceful extensibility
- Sustained adaptability
We need to look at technology from a sociotechnical perspective for true resiliency.
Thanks to everybody who joined in this week’s discussion on resilience. See you next week on #ToggleTalk!
Want more?
There is an upcoming conference (next week on April 21st!) if you want to learn more about resilience engineering and the process for building systems that can withstand unexpected failures. You can register here: FailoverConf (free of charge!). We will be there, and so will one of our Developer Advocates, Heidi Waterhouse.
Or you can check out these recommended reads and talks:
Recommended Reads
Resilience is a Verb
Four concepts for resilience and the implications for the future of resilience engineering
Report from the SNAFUcatchers Workshop on Coping with Complexity
Above the line, below the line
Recommended Talks
OOPS! Learning from Surprise at Netflix
How did things go right? Learning from incidents
A Few Observations on the Marvelous Resilience of Bone & Resilience Engineering