Risk identification, the acquisition process, and system resiliency

“We want to expand people’s view about what technology is; it includes people and process and is part of a sociotechnical system,” Mineweaser said. “The canonical example of a program in the military has contractors handing systems off to maintainers and moving on to build the next system. But we need to have a resiliency mindset, which calls for better strategies for transferring technology, knowledge, and skills to the operators…

Part of this mindset includes rethinking long development timelines, though ingrained they may be into how government programs are funded. “Apple, for example, doesn’t announce the 2040 iPhone in 2020, whereas the Department of Defense might get funding from Congress for things planned over the next one to two decades,” Mineweaser added.

That was from a write-up of the MIT Lincoln Laboratory workshop on building resilient systems from March 2019. The excellent Jeremy Mineweaser recognizes how social constructs like budgeting and the acquisition system profoundly affect resiliency. I was pleased to have been invited to speak as a panelist. Here’s another good part:

Another cultural shift the team sees as vital to ensuring resiliency is leveraging “chaos engineering.” The idea is to break a system in its operational environment to understand how the system responds to disruptions, to learn from these disruptions, and to work out how to fix it quickly.

Netflix, for example, developed a tool called the Chaos Monkey that the developers liken to “unleashing a wild monkey with a weapon in your data center (or cloud region) to randomly shoot down instances and chew through cables” and is run while Netflix operates its services. This way, engineers can identify, manage, and learn from weaknesses in their systems and then build automated recovery mechanisms to deal with them if they happen again when no one is watching.

Eric Lofgren, a fellow with the Mercatus Center at George Mason University and a panelist at the resiliency workshop, agreed that exposing systems and networks to threats “is the best way to identify, contain, and overcome downside risk while still allowing the system to move towards new structures using the principles of combinatorial innovation.”

You can get the gist of what I was saying from two of my blog posts: “Resilience, the other side of risk” and “Improving resilience through falsification.”
