Podcast: Mission resilience with Trey Herr and Simon Handler

Trey Herr and Simon Handler from the Atlantic Council’s Cyber Statecraft Initiative joined me on the Acquisition Talk podcast to discuss how the Department of Defense can improve the mission resilience of its systems. The three pillars of resilience are robustness, responsiveness, and adaptability. In that description, resilience is about more than responding to adversity; it is also about capitalizing on opportunity. Oversight agencies should take note that adherence to plan is nowhere in that definition. During the episode, we discuss:

How mission resilience metrics differ from CMMC
The costs of excessive classification to security
How Netflix uses the chaos monkey to find failure modes
Comparing CIA’s Corona satellite development to that of F-35 ALIS
How the BattleLab idea can increase recombinatorial innovation

During the episode, we dive into a recent paper Trey and Simon wrote in conjunction with folks from MIT Lincoln Labs and Boston Cybernetics called “How do you fix a flying computer? Seeking resilience in software-intensive mission systems.” They recommend a new Center of Excellence for Mission Resilience in the DoD. The purpose would not be to duplicate cybersecurity initiatives, but rather to create metrics which can be put on contract to better verify that firms are using modern development processes like DevSecOps.

In order to have adequate status, such a Center of Excellence would require a Senate-confirmed position, a dedicated budget account, and quick access to the DepSecDef. But ultimately, it shouldn’t be a Top Secret project creating DoD-unique rules and processes. Instead, the Center should adopt the thought leadership from the commercial and academic sectors as to what makes organizations resilient.

Metrics for Resilience

As a continuous process for deploying software to production, DevSecOps allows organizations to adapt far more quickly than traditional deployments measured on the order of months. While successful implementation of DevSecOps improves system resilience, it has often been used as a branding term in proposals where the contractors have not undertaken the significant change in organizational structure required. Here’s Trey:

And we’ve absolutely heard complaints from some of our partners that they see a lot of discussion and a lot of phrasing and a lot of framing. And then things are delivered to them in a quarterly waterfall and nobody ever talks to the user.

While there are a number of good metrics out there, many of them depend on the organization or project. In order to be useful for contract requirements, the Center of Excellence for Mission Resilience would have to do some refining:

One measure of CICD [continuous integration, continuous delivery] adherence is the number of commits you do on a code base in a day. That’s interesting, but it’s pretty raw. And when we talk about different kinds of programs, different levels of sophistication, that may not be a great measure.

And so part of what we’re looking for that center of excellence to do is to actually define ways to measure some of these constructs, and that’s leveraging work being done in academia, in industry, and in the FFRDC and national lab community.

My general view is that in order for a metric to be useful, the analyst has to have a decent understanding of the context in which it was generated, including the team, tools, and project. Perhaps there are rules of thumb for making the translations, but I’m doubtful the metrics can be reported out and rank-ordered by the business manager from afar. I’ll be interested to see what comes of it and whether we start seeing anything on translation.
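To make the raw measure Trey mentions concrete, here is a minimal sketch of counting commits per day on a code base. The timestamps and the function name `commits_per_day` are invented for illustration; a real program would pull them from its own version control history, and the number still says nothing about team, tools, or project context.

```python
from collections import Counter
from datetime import datetime


def commits_per_day(timestamps):
    """Count commits per calendar day from ISO 8601 timestamp strings."""
    days = Counter(datetime.fromisoformat(ts).date().isoformat() for ts in timestamps)
    return dict(sorted(days.items()))


# Hypothetical commit timestamps, made up for this example.
sample = [
    "2021-03-01T09:15:00",
    "2021-03-01T14:40:00",
    "2021-03-02T11:05:00",
]

counts = commits_per_day(sample)
average = sum(counts.values()) / len(counts)  # mean over days that had commits
print(counts, average)
```

The same average could describe a small team shipping steadily or a large one batching work, which is why a raw count like this probably needs refining before it goes on contract.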

Chaos Engineering

The way DoD usually tests weapon systems is by first defining a Test and Evaluation Master Plan that outlines the criteria by which the system will be tested over the lifecycle. The role of test and evaluation is then to see how well the developmental test articles met the program’s pre-planned list of objectives before fielding.

Netflix, however, takes a somewhat different approach where it subjects its production system to a variety of stresses at the extremes of what can be expected to occur. As Trey explained:

Give them the opportunity to see that system operating under unique failure modes and unusual conditions as a way of learning about not only your own organization, but actually the system that you’re trying to maintain.

And here’s Simon with some evidence of the results:

There was a big AWS outage at one of their data centers a few years ago. I think it was in 2015. And Netflix didn’t experience much, if any, service-related interruptions because they had gone through this chaos engineering process and they took those lessons to overcome that outage.
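The idea can be sketched in a few lines. This is a toy simulation, not Netflix’s actual Chaos Monkey tooling: the `Service` class, the replica names, and `chaos_experiment` are all hypothetical. Each round disables one live replica at random and then probes whether the service still answers, which shows how far into a degraded state it keeps functioning.

```python
import random


class Service:
    """Toy service that answers requests while at least one replica is alive."""

    def __init__(self, replicas):
        self.alive = {name: True for name in replicas}

    def handle_request(self):
        for name, up in self.alive.items():
            if up:
                return f"served by {name}"
        raise RuntimeError("no healthy replicas")


def chaos_experiment(service, rng):
    """Disable one randomly chosen live replica, then probe the service."""
    live = [name for name, up in service.alive.items() if up]
    victim = rng.choice(live)
    service.alive[victim] = False
    try:
        service.handle_request()
        survived = True
    except RuntimeError:
        survived = False
    return victim, survived


rng = random.Random(0)  # seeded so the run is repeatable
svc = Service(["a", "b", "c"])
results = [chaos_experiment(svc, rng) for _ in range(3)]
for victim, survived in results:
    print(f"killed {victim}: service {'survived' if survived else 'failed'}")
```

With three replicas the service survives the first two failures and goes down on the third. The point of the real practice is to learn that kind of limit on the production system before an outage teaches it to you.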

Listen to the whole episode for more.

Development Styles

Here’s a brief way of describing what distinguishes the failed development of the F-35’s logistics information system, ALIS, from what characterizes successful projects:

The intensity with which that development effort has pursued a specific defined outcome, as opposed to the kind of rapid experimentation and messy attempts to satisfy a user community that we profile in the report.

Trey continues:

When we’re talking about, it’s not that there’s no outcome, right? We’re not asking DOD to start spending hundreds of billions of dollars on a journey without a destination. It would be impractical at best. I think what we’re really doing is to echo a couple of voices… saying the way that we’re acquiring systems right now, where all that I care about is the satisfaction of a requirements sheet, irrespective of costs, irrespective of usability, irrespective of security and flexibility down the line, is not a good model. It’s a somewhat hyperbolic way of describing the current acquisitions process. But unfortunately it’s not as far off as it should be with a lot of the systems that we’re talking about.

Thanks Trey and Simon!

I’d like to thank Trey Herr and Simon Handler for coming on the podcast. Be sure to read their paper, “How do you fix a flying computer?” and find out more about their Cyber Statecraft Initiative. Here’s Trey on the Supply Chain Security show and at USENIX Enigma 2021. Here’s an article from Simon on Questioning basic assumptions in the cyber domain and watch him on C-Span discussing the future of the NATO alliance. Follow Cyber Statecraft and Simon on Twitter @CyberStatecraft and @SimonPHandler.

Full-Text Transcripts

Eric Lofgren: [00:00:00] I’m pleased to be speaking with Dr. Trey Herr and Simon Handler. They are the director and assistant director, respectively, of the Atlantic Council’s Cyber Statecraft Initiative. Along with some folks at MIT Lincoln Labs and Boston Cybernetics, they recently released a paper called “How do you fix a flying computer? Seeking resilience in software-intensive mission systems.”

And we’re here to talk about that today. So Trey, Simon, thanks for joining me on Acquisition Talk.

Trey Herr: [00:00:24] Thanks for having us.

Eric Lofgren: [00:00:25] So I wanted to introduce mission resilience by way of the example that you guys used. Your paper started off with a discussion of the F-35 ALIS system, and then it ended with a discussion of the CIA’s Corona satellite program.

So can you just give us the lessons for mission resilience that you were really getting out of each of those cases?

Trey Herr: [00:00:44] That’s a great question. For us, I think it was interesting to see how to start a story like this. It’s a big meaty topic and we’re talking about people, process and technology.

One of the challenges for us is not only getting into what is a massively complex environment with just the hardware and the software and the actual systems in place, but also really trying to weave that together with incentives and this idea of people making policy and organizational behavior.

What we wanted to do was to start with something to frame the narrative. And as we looked at the landscape for mission systems, and complex mission systems at that, where there’s a strong sort of technology component, the F-35 is a big part of that discussion right now. It’s difficult to really survey the defense landscape and not see that as one of the major parts of that environment, not just for the U.S. but, I think importantly for us at the Atlantic Council, also for allies.

And one of the things that jumps out of this multi-generational acquisitions process, I’ll leave others to say boondoggle, is the ALIS system. It is this sort of software-driven intelligence behind, or at least model for intelligence behind, the operating model that the [F-35] embraces.

So thinking about resilience, the process of developing ALIS for us exposes two things that became really important as we went through this. First, the intensity with which that development effort has pursued a specific defined outcome, as opposed to the kind of rapid experimentation and messy attempts to satisfy a user community that we profile in the report, for us was emblematic of an older model of thinking, this sort of more brittle model of thinking in defense acquisitions for technology.

And so it was a good showpiece to really say, hey, what is it about this? And I should say, it’s not our verdict that it’s not working well, it’s the U.S. government’s verdict, right, in entirely replacing the system and going to a new one with ODIN.

And so I think that was a significant virtue for us. It’s not to cast judgment on it, but really just to support what we’d seen from DOD in the policy community. So the first was really pushing for a defined set of requirements versus trying to experiment to what the user needed and what the user wanted and being tightly joined with the user community.

But the second is, I think, as we develop in the paper these principles of mission resilience and the practices associated with them, there’s a really significant emphasis, over and over again, on learning and on understanding not just what works but what doesn’t, and really deriving a feedback loop for that back into your development process.

And for us, I think, because there appear to be some really significant parallels between ALIS and ODIN, that there was the same vendor, that it was working under the same kind of contract, that there was this, again, repetitive model, that was something we wanted to call out, because it’s one of the real significant dangers that’s been observed in defense acquisitions for technology. The failure to learn the lessons of what does not work is incredibly costly.

To make those same mistakes over and over again is the kind of unnecessary but crippling cost that no entity can afford, regardless of how large the defense budget is.

And certainly not the United States in a time where there’s going to be significant downward pressure on that budget. So those two pieces: the first was this willingness to tolerate failure and experiment, but the second was what would seem like a limit on the learning process and a failure to learn effectively.

ALIS kind of defined some of the don’t-be-like-this model for us for resilience. I’ll let Simon talk about Corona, because I think in some ways, despite being much older and a much more significant technical challenge to really make an effective photo reconnaissance platform, it exhibits a lot more of these core mission resilience principles.

Simon Handler: [00:04:06] Yeah, sure. So after we talked about the F-35, which like Trey said is incredibly dependent on software, we then spent the bulk of the paper outlining four principles of resilience, which we identified as embracing failure, improve your speed, always be learning, and manage complexities and trade-offs.

With these principles, we outlined some recommendations and then put forth the Corona example at the end as the exemplar of resilience. So like Trey said, Project Corona was a joint CIA and Air Force project which launched the first spy satellites during the Cold War.

And while it was a much different type of technology, we saw it as a project that really encapsulated our four principles of resilience. It was plagued by a ton of problems. And every time they tried to solve old problems, new problems would surface in the new solutions.

But this constant embrace of learning and failure was really critical to growing the program and building that adaptive capacity that we emphasize so much through the paper. And though it had its problems, the commitment to continuous improvement made the program an overall success and had some great results for the U.S. strategically in the Cold War.

Eric Lofgren: [00:05:35] Yeah. And when you’re describing Corona, it makes me think about how that program actually started before the modern acquisition system. It got started in the late fifties, so it already got off on a footing that was different than the paradigm you would see today. And I think it was more closely related to what you guys had as the three pillars of mission resilience: robustness, responsiveness, adaptability.

And I think with what you guys were just saying, a lot of it revolves around that, but what it doesn’t revolve around is a defined outcome. And I think it really comes back to this static versus dynamic view of systems. What we have today, the model that the F-35 grew up under but Corona did not, was that you’re supposed to define all of your outcomes right up front.

And then the measure of success is how you executed compared to that plan, and that’s nowhere in your definition of mission resilience. So did you guys want to just comment on that?

Trey Herr: [00:06:31] Yeah. You hit the nail on the head, Eric. When we’re talking about this, it’s not that there’s no outcome, right?

We’re not asking DOD to start spending hundreds of billions of dollars on a journey without a destination. It would be impractical at best. I think what we’re really doing is to echo a couple of voices. Former assistant secretary Dr. Will Roper is one that jumps out. Ellen Lord has also made some serious efforts in this space. We’re trying to echo those voices in saying the way that we’re acquiring systems right now, where all that I care about is the satisfaction of a requirements sheet, irrespective of costs, irrespective of usability, irrespective of security and flexibility down the line, is not a good model.

It’s a somewhat hyperbolic way of describing the current acquisitions process, but unfortunately it’s not as far off as it should be with a lot of the systems that we’re talking about. And so I think for this, having a defined end state is important, but so is being willing to update that and, most importantly, embracing users as a part of the design process, so that what’s driving this is how it’s going to be employed, not what a committee at some point 10 years ago decided was the right combination of specifications and requirements. That is what makes this harder but, in the long term, much more useful.

And so I think a takeaway for us, and the way that we frame the recommendations, is that there is at the moment a tendency towards risk avoidance and risk minimization in a lot of the acquisitions process that is there from years of experience and layered oversight and risk mitigation. And the product, unfortunately, is something that really isn’t very satisfactory from the standpoint of the technology that’s produced, for the people that have to use it, or the process that it creates.

So again, I think these changes are, to your point, trying to form something that produces technologies and mission systems that are more useful, more flexible over time, and that engages with the needs that the requirements set or that end state is trying to realize, rather than ignoring them.

Eric Lofgren: [00:08:19] Yeah. Sometimes I feel the Department of Defense is all about minimizing risk rather than resolving risk. And I think those are two very different things, because when you resolve risk, you actually have to go out, speculate into the unknown, and then get back into that trial and error.

And then there’s that balance of, as you were saying, I can’t just start out with hundreds of millions or billions of dollars for this program with no end state in mind. So where’s that proper mix, and how do we get that? When you guys talk about mission resilience, I want you to kind of break this down.

How much of that is really about technology implementation, and how much of that is really organizational or cultural?

Simon Handler: [00:08:54] Yeah, sure. So this was a big point of emphasis in the report. We think that it’s important to understand a system as more than just the technology. It’s built on three elements, which are the people, the organizational processes, and the technology.

And we really stressed the people and the organizational processes in this, especially the people, because they set the missions, design the technologies, and can improve the organizational processes. I think all three are really critical, and we addressed all of them as equals in this report.

Trey Herr: [00:09:29] And I’d just add to that quickly. I think the reason organization ends up rising above this is really just to say that the devices that you produce, the systems that you’re engineering, they’re important, but they are shaped by the people and they’re shaped by the incentives around them. And for an organization the size of DOD, there are a lot of policymakers making good faith efforts to try and address these sorts of systemic deficiencies in the acquisitions pipeline for this sort of technology-intensive mission system who do not have a realistic hope of actually touching the hardware and the software itself.

That’s okay. No organization that big is going to be able to have everyone playing at every level of abstraction at the same time. But they have a huge role to play in shaping the kind of risk tolerance that these program managers are, for example, allowed to embrace, and in providing the kind of financial pathway, a lifeline in some cases, and just the expectation management for offices that are trying to work with new technologies or work with different vendors and contractors, that allows them to take risks or embrace some of these cultural changes that we’re talking about.

So I think it’s not that the technology is not important. It’s a big part of the conversation, but there are a lot of people working in this space that have to be part of these sorts of changes who are not going to be tinkering and hammering on things themselves.

Eric Lofgren: [00:10:39] I often hear from the commercial sector, they talk about people and teams, and if you get those people and organizations right, then the technology follows from that. And that seems to be some of the principle that you guys were also talking about in mission resilience, as opposed to the Department of Defense, where you set up the program and then the organization: you get a program office, you bring the people together, and then they go execute that.

And I think that potentially, when you stovepipe it that way, that creates a different set of incentives than if you have an organization and define how it interacts with other organizations, what those teams and responsibilities are, and, I guess, your anticipations of what they’ll be doing.

Trey Herr: [00:11:21] The expectation, again, I think you have it right. The expectation shaping is a big piece of this. Are you empowered to take a risk, are you empowered to change, are you empowered to try something new, or are you being given the very clear message that avoiding failure at all costs is your purpose?

And when we think about the rotational dynamics of some of these offices, not screwing up for the 18 months you’re on the job, that you own the contract, that you own the project, creates a really averse atmosphere to this kind of change that we’re talking about. So I think you’re right. And again, it works at multiple levels, right?

It’s almost like that command culture question.

Eric Lofgren: [00:11:55] Yeah. One part of your report was also that, I guess, when you have people that are making these decisions and they have more discretion, they can get into their own agile kind of feedback loops. And people often talk about, oh, it’s all about speed.

Software is about speed: delivering, learning, improving, and not just getting the one big bang thing. But many folks, I think, with the traditional way of looking at it, they’ll be like, okay, I hear about you speeding up the technology process, but really, I’ve seen this before. People have said let’s do it faster.

And it usually means cutting corners, taking risks in either cost or performance. So when you guys talk about speed and how that balances out with mission resilience, are you breaking the iron triangle or is there something else going on?

Simon Handler: [00:12:41] So I think you’ll note in our report, we titled the principle “improve your speed.”

It’s not just “go faster.” So we’re not suggesting cutting corners, but more cutting out the bottlenecks. We think there are a number of ways to do this, including measuring and analyzing everything to identify those points of inefficiency. And one of the points of emphasis here that we made is prioritizing continuous delivery, feedback, and learning.

We gave the example of Kessel Run, which has that objective to deliver software anytime, anywhere. And they sped up the process of developing and delivering their software to a point, I think it was 24 times faster than the rest of the Air Force was able to do it. So how fast the organization can patch and fix bugs and detect vulnerabilities is really important to resilience. And then other ways of improving speed without cutting corners include automation and cutting out those really labor-intensive and repetitive tasks so that you can free up important personnel for other tasks.
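As a rough illustration of “measuring and analyzing everything” to find points of inefficiency, the sketch below takes the hours a change spends in each delivery stage and reports which stage dominates the total. The stage names and durations are invented for this example and do not come from the paper or from Kessel Run.

```python
def find_bottleneck(stage_hours):
    """Return the slowest stage and its share of total lead time."""
    total = sum(stage_hours.values())
    stage = max(stage_hours, key=stage_hours.get)
    return stage, stage_hours[stage] / total


# Hypothetical hours a single change spends in each stage of delivery.
pipeline = {
    "build": 1.0,
    "automated tests": 2.0,
    "manual approval": 45.0,
    "deploy": 2.0,
}

stage, share = find_bottleneck(pipeline)
print(f"{stage} accounts for {share:.0%} of lead time")
```

In this made-up case the wait for approval, not the engineering work, is where nearly all the time goes, which is the kind of bottleneck that can be cut without cutting corners.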

Eric Lofgren: [00:13:59] We’re revolving a little bit around what I would almost call project Calvinism. I call it that because I think a lot of people have the view that there’s some kind of predestined cost, schedule, and performance, and we just don’t have a perfect view into what that is, but it’s out there. And I think when you try to predict that, you actually are creating some of the vulnerabilities in your mission, because you can’t predict everything perfectly. So an error in the plan becomes a vulnerability, and then it’s really hard to close that loop because you don’t have these predefined feedbacks.

So it’s as opposed to just stumbling through, but then closing these things rapidly when you find them. Do you believe in project Calvinism? Because if you can predict it, then that makes a whole ton of sense.

If you can’t, then you’re really building in all of these vulnerabilities that will be exploited by an adversary.

Trey Herr: [00:14:55] It’s interesting. One example recommendation that we made in here was to embrace the chaos monkey: chaos engineering as a discipline, invented, or really captured, by Netflix as a way of road testing production systems and subjecting them to a variety of stresses that they could see at the outer likely bands of an operational environment.

It was a way not just to try and understand how the systems would perform, but also how they would fail, and in how far of a degraded state they could function over time. And so it was a very intentional effort to break things, and not in a test environment. So, to play with live ammunition. And that, I think, is a little bit of a way of addressing your question.

What we’re talking about in some ways is not to try and ignore the natural human instinct, and the substantially natural organizational instinct, to try and know the unknown and to manage and minimize risk. What we are suggesting is that a lot of the existing mechanisms perceive or assume a knowability to a lot of this uncertainty, which is false.

And we could take various theological positions on that. But I think, realistically, we just can’t know a lot of the things that we’d like to, especially about a 10 or 15 year pipeline for high technology systems development. Given the pace, if we look at 10 years ago, the state of the iPhone, the state of mainframe computing, the state of cloud computing, it pales in comparison to what we have today.

And I think what we have today is on the edge of any foresight expert’s likely chart of what’s coming next. So really what this is, is to try to build systems that in some ways are more humble about their own shortcomings and about our own very human shortcomings in the ability to project forward what’s happening with them.

And really just to say, enable this connective tissue in these organizations to function more effectively. So back to that notion of chaos engineering: expose to your own organization problems that they are ready and willing to manage, that they are capable, in some ways, of taking advantage of if they have the ability to see them coming down the road. Don’t ask them to put on a magic hat and predict the future.

Give them the opportunity to see that system operating under unique failure modes and unusual conditions as a way of learning about not only your own organization, but actually the system that you’re trying to maintain. So I think a lot of it is interesting. We could maybe do a philosophical version of this paper, but a lot of this is acknowledging our lack of control and the unknown in the universe around us, and really just trying to sharpen our own tools to address that if and when we’re confronted with it in the real world, rather than trying to engineer it or spreadsheet it out of existence.

Eric Lofgren: [00:17:20] Yeah. I love that you guys had a great little bit in there about chaos engineering and the chaos monkey that Netflix uses. Potentially for the DOD it needs a little rebranding, because that might be scary for people. Maybe something like “mission resilience testing,” or something boring, so that they’ll do it.

When I think about the testing, it seems like a completely new paradigm for testing systems, because with what we have in the Department of Defense, the Test and Evaluation Master Plan, you’re basically supposed to say, before I develop it, how I’m going to test this thing.

And then of course all the incentives are to make sure the thing tests that way, but then you’re not looking at all the other ways that the thing could have been tested. There’s also this movement to automated testing of code, and that’s great, but it also just seems like a more automated way of the old style, as opposed to this chaos engineering, which really requires some kind of creativity in terms of poking at something in different ways and seeing where it stands up or doesn’t stand up.

Trey Herr: [00:18:17] I think it’s twofold. The first is systems in toto, right? So with this decomposed version of testing, you mentioned dynamic analysis for code, static analysis for code, right? It’s one thing to take a product and evaluate it on the bench, when I can segment it into its various parts and control their inputs and outputs and the functionality relatively cleanly.

Part of the philosophy of chaos engineering is to test on production systems, for two reasons. First, if you are not willing to subject it to a test, why are you willing to put it out there into the world and have it function with real users? But second, a lot of the challenge that I think the notion of mission resilience is trying to address is not the failure of isolated components.

If I receive FIPS certification for a well-known cryptographic algorithm, I can have some confidence that the design of that algorithm matches what is expected to be best practice in cryptographic design. My issue is if that algorithm is placed into a sequence of modules where the data that it’s passed is not secured well, and whatever is called from the result is handled in an improper fashion.

And so the attacker is able to take advantage of the system rather than just the magic collection of, excuse me, boxes. And so for this, it’s the edges between modules, it’s the complexity that emerges from a large and multifaceted system, that we’re trying to surface and address. And again, when we say adaptive and avoiding brittleness, right?

It’s about making the connectivity between those systems greater, such that they can tolerate something bad happening or something unexpected happening between them. But really it’s the edges between those links that I think in some ways are most important, and that we’re trying to expose with that.

Eric Lofgren: [00:19:52] Simon, did you have anything to chime in with?

Simon Handler: [00:19:54] [Trey] pretty much hit most of it, but we’ve seen this work well for Netflix in the private sector. There was a big AWS outage at one of their data centers a few years ago. I think it was in 2015. And Netflix didn’t experience much, if any, service-related interruptions, because they had gone through this chaos engineering process.

And they took those lessons to overcome that outage. I think by adopting the practice, the DOD could start to learn more about its systems, brace for those disruptions, and then become more resilient as a result.

Eric Lofgren: [00:20:35] So I want to move on to one of your next recommendations, which was actually something that I never thought about, but it was pretty interesting.

You guys said that, of course the government’s very serious about how it does classification and it thinks that by classifying in the way that it does, it’s actually securing the information. But what you guys seem to be arguing is that when you create all these rules around the classification, it actually drives cost and complexity into information systems and actually compounds some of the vulnerabilities rather than closing them.

So how do you see classification in the 21st century, or would you push back on that characterization?

Simon Handler: [00:21:12] Yeah. We think that classification and confidentiality are really important when we’re dealing with advanced technologies and national security. But overall the DOD can do a far better job of planning for the loss of its secrets.

It therefore needs to limit its secrets where possible, because determined adversaries are going to be able to overcome defenses eventually. DOD needs to accept that and plan for losing things like credentials and designs and plans and so on.

Trey Herr: [00:21:47] I’ll just say, I think the big thing here again is that when we talk about the levels of secrecy in the classification system and the grave, significant harm to national security that merits those, we’re forgetting the enemies foreign and domestic model. The project which is too complex to manage well, where a classification leak occurs simply because of a mismanaged exchange of information between two compartments or two levels of secrecy, is as significant as a foreign adversary getting into a safe, rifling through things, and pulling something out.

So I think part of what we’re suggesting with limiting secrecy isn’t that classification is bad per se, or that the threats being supposed out there, against which classification is meant to protect, aren’t true, but just that we need to recognize that these systems are not simple, that the people are a huge factor, and that we can help ourselves by minimizing the risk of these sorts of problems happening and speeding our own ability to develop these systems by addressing these kinds of limitations.

Eric Lofgren: [00:22:39] So on a related front here, your paper asks for resilience performance metrics to be put on contract. We already have this big thing called the Cybersecurity Maturity Model Certification, or CMMC. Can you guys just, at a high level, describe what distinguishes your metrics from CMMC, or how they complement each other?

Trey Herr: [00:23:01] Yeah, it’s a good question. CMMC, I think, is shooting at a different target, which is to say it’s the maturity and really the risk management sophistication of vendors. This is thinking much more about program management and less the sort of hygiene and integrity of the vendor itself. But I’ll say, I think that for us, and as has been said by many others, you manage what you measure, and so providing metrics for a concept like DevSecOps, for example, is the only real way to hold a vendor accountable for actually adhering to that performance philosophy.

So it can be an abstract thing at the wrong times. And what we don’t want to do is see, as we have, I think, in some cases, these kinds of concepts, which have roots in the engineering space and have been applied well by certain private sector and academic entities to really interesting ends, turned around and used as branding by existing components of the defense industrial base as a way to gain access to a contract rather than really drive change in their own engineering process and project management.

So in that sense, metrics are crucial, not just for what it is that we see out of the process in terms of efficacy or output, but to show that the process actually does implement these recommendations, that we actually do see these kinds of cultural shifts.

Eric Lofgren: [00:24:12] Are those metrics like the DORA metrics for DevSecOps? Or are they looking to see that you have a DevSecOps type of, I guess, process, as opposed to CMMC, which is more of just cybersecurity rules?

ZOOM0014_Tr1: [00:24:26] Yeah. CMMC is really focused on maturity. And it’s trying to describe the character of your performance.

I think this is going a step in a similar path, but in a different direction, which is to say, describe the philosophy of engineering, right? Show us that you’re actually embracing these techniques, which we believe are important and valuable to our process, but that we want to see more than lip service to. We want to see actual material changes, and those metrics are one way to try and address that, an imperfect one, for sure.

And again, as we go through the recommendations, one of the things that we call out there, that we ask for, is this notion of a center of excellence within DOD, or a center of expertise. One of its purposes, one of the reasons that we have it there, is that there are not clear metrics for some of these. I talk about continuous integration, continuous deployment in the paper. One measure of CI/CD adherence is the number of commits you do on a code base in a day. That’s interesting, but it’s pretty raw. And when we talk about different kinds of programs, different levels of sophistication, that may not be a great measure.
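
To make the commits-per-day measure concrete, here is a minimal Python sketch. It is an illustration only, not something taken from the paper: the function names and the sample timestamps are hypothetical, and it assumes commit times are available as ISO 8601 strings with an explicit UTC offset. As the conversation notes, a raw count like this is a crude signal by itself.

```python
from collections import Counter
from datetime import datetime, timezone


def commits_per_day(timestamps):
    """Count commits per UTC calendar day.

    `timestamps` is an iterable of ISO 8601 strings that carry an explicit
    offset, e.g. "2021-03-01T09:15:00+00:00" (an assumed input format).
    """
    counts = Counter()
    for ts in timestamps:
        day = datetime.fromisoformat(ts).astimezone(timezone.utc).date()
        counts[day.isoformat()] += 1
    return dict(counts)


def mean_commits_per_active_day(timestamps):
    """Average commits over the days that had at least one commit."""
    per_day = commits_per_day(timestamps)
    if not per_day:
        return 0.0
    return sum(per_day.values()) / len(per_day)


# Hypothetical commit times for illustration only.
sample = [
    "2021-03-01T09:15:00+00:00",
    "2021-03-01T16:40:00+00:00",
    "2021-03-02T11:05:00+00:00",
]
print(commits_per_day(sample))
print(mean_commits_per_active_day(sample))
```

A count like this says nothing about commit size, review quality, or whether changes reach users, which is part of why it may not transfer well across programs of different kinds and sophistication.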

And so part of what we’re looking for that center of excellence to do is to actually define ways to measure some of these constructs, and that’s leveraging work being done in academia and in industry and the FFRDC and national lab community. It’s not for them to be the only source of truth, but just to give an entity the remit to really start to pull these concepts together and generate some kind of measurable output for them.

Eric Lofgren: [00:25:46] So you’re saying, have there been some complaints from either government or other companies saying that, hey, these companies, they say they’re implementing DevSecOps, but really it’s just like lipstick and it’s just the same old agile-scrum-fall or whatever it is?

Trey Herr: [00:26:00] Yes. I think there was an explosion of agile branding at a certain point a couple of years ago that, from the outside, didn’t seem likely to have been matched by the significant change in organizational structure and philosophy that it would need to have been accompanied by.

And we’ve absolutely heard complaints from some of our partners that they see a lot of discussion and a lot of phrasing and a lot of framing, and then things are delivered to them in a quarterly waterfall, and nobody ever talks to the user. That is as unagile as it gets.

Eric Lofgren: [00:26:28] Yeah. I guess you put the DevSecOps in the proposal and then if you win, then you consider whether you’re actually going to implement it.

But by that time it’s, “We’ve got to get going,” and, “We’ve already been doing it this way.” Yeah, and we’ll cost it and then bill it as a change order. But so for this center of excellence, there are always new offices coming in here, there, and everywhere.

What is required to give it the status it needs to succeed, or at least the influence so that these metrics are adopted by the program offices or whatever it is?

Trey Herr: [00:26:58] Yeah, no, I think it’s right. It’s the same thing that we’ve seen work for DDS [Defense Digital Service], but I think also to some extent for 18F, which were digital and transformation focused entities that were trying to provide support for the transition, not necessarily do the heavy-duty systems engineering in lieu of program offices.

I think there are three keys. One is access to the secretary’s office, access to OSD, and in some cases to the secretary directly, and a lot of that is going to fall to Dr. Kathleen Hicks, who is familiar with some of these concepts and did some tremendous work while she was out at CSIS. But as she is organizing the department and really structuring the way that she wants it to function, it’s always challenging to give any entity exceptional access to the principal.

And I think there’ll be an opportunity for her to make some determination about how much influence this sort of function should have. But I think especially in its early days, to move mountains and really get the kind of buy-in you need, access to the secretary’s office is really crucial.

The second is not trying to solve the problem in DOD. A lot of these concepts exist already in the public domain. They’re being pushed on and developed by industry, and they’re coming from academia. The center of excellence does not need to be a high side bubble for this work to get relabeled and rebranded under a TS label.

And so I think an important, almost crucial aspect of this is that the center is kept relatively small and is dependent on these external entities to feed it information and to do a lot of this work, which is to say, it’s not a center of research. It’s a center for pulling together research and trying to distill best practices.

So it’s the last 20% of that pipeline. So keeping it small is crucial as well. But I think the last thing really is just taking it seriously: appropriated budget dollars for the office, a Senate-confirmed position. Giving it that status internal to the large warring family that is DOD is important both for its survival and for its longevity.

It’s a reasonable response for a program not to take new offices, new appointees, seriously until they’ve been around for a few years, because there is this kind of constant churn: change agents come in and they don’t change, and then they’re de-agented and they’re out. And so for this, its ability to convey that it’s going to be a part of the conceptual architecture for program development and for requirements creation in the DOD for some time to come, I think, will help both determine its success and drive a lot of credibility.

Eric Lofgren: [00:29:15] But potentially not another member on the JROC [Joint Requirements Oversight Council]?

ZOOM0014_Tr1: [00:29:20] It wouldn’t be our first place for now. Again, I think, laughingly, yes, but also realistically, this should be an entity that has the ability to drive this philosophy of change. That would probably be putting it more in a position of supporting the existing model.

Eric Lofgren: [00:29:34] So before we wrap up here, I want you guys to talk a little bit about this battle lab idea that you introduced in the paper. Can you just introduce what that is and what your vision is for it?

Trey Herr: [00:29:44] Yeah. Happy to take a swing and then turn over to Simon for any thoughts on it. The battle lab is intended to channel two things. First is to take advantage of this SoSITE program that DARPA runs, which has a focus already on pulling in service member insight on programs and opportunities and, importantly, repurposing existing weapons systems.

The battle lab’s intent is to try and develop an opportunity for staff-grade officers and field-grade officers, about 120 we suggest in a given year, to really think through what it is that existing assets and platforms can do, what missions exist for them now, and where they could be reapplied or re-engineered in future.

Part of the reason for this is, as we were looking through the notion of agile development and continuous integration, continuous deployment, there’s a really significant emphasis on user feedback. But the DOD has to develop these systems not just for what one pocket of users wants, and that’s to overcome service parochialism as much as it is to stay ahead of a complex and compound threat environment. It’s also trying to develop these systems to achieve broader doctrinal end states, right? There is a strategic intent, and doctrine is the way to actually execute that strategy. And so the hope with battle lab is to try and pull this agile development methodology into the doctrinal process in a natively joint way, in a more significant fashion, so that not just the need from a specific user community, but actually the CONOPS that user community exists in, are both subject to this kind of mission resilience principle, right? Intense feedback, self-learning, speedy adaptation, and iteration to create that flexibility.

I think the hope in the long term, and this is an experimental process here, so we’re curious to see, if we could get it implemented, where it goes, but the hope in the long term is that battle lab serves as a complement to some of the existing transformation organizations, as a source of guidance or insight on how to adapt existing programs and how to overcome some of the limitations of a new acquisition. One of the things that’s really striking about the Century Series as an example is it’s looking to take best of breed technologies from existing pots of commercial development and adapt those into something that works, not necessarily to do a sort of ground-up rebuild and reconceptualization of what a modern fighter aircraft, combat aircraft, should look like.

This, I think, is a similar space, to say, hey, if we have an aging tanker asset and we could string it up with some relatively cheap phased array radars, could we turn that into something else? You saw some really interesting work on this with light general aviation aircraft in the 2010s and twenties around creating these effective long-legged electronic warfare and intelligence platforms, right? Small Cessnas and Gulfstreams rather than from-scratch, ground-up ISR aircraft.

That’s the kind of adaptation where, not only with civilian technologies and COTS but actually with existing defense platforms, we could see some really... Can somebody find a mission for the C-130 that takes this thing into the 2150s, the 2160s? I believe that it’s out there.

So battle lab is to try and take that from the sort of mid-grade officer perspective and say, okay, as we think about a changing concept of operations, where could we slot in aging aircraft with remaining service life, or equipment from other domains, and apply them in creative, interesting ways to solve the problem?

And thus, hopefully, and I think this is where it becomes most important, so I’ll close with it, solve the problem of reducing or trying to reduce the number of overall programs that we need to acquire in a given year. If we can reuse something that exists and modify it, take a mature aircraft, take a mature ground vehicle, that is in many cases a better solution in terms of efficiency, speed of access to the technology, and responsiveness to the threat than clean-sheet designing a new platform.

So I think that’s battle lab trying to do a couple of different things, but hopefully it both limits the size of the pipe and brings some new ideas in at the top.

Eric Lofgren: [00:33:35] Trey Herr, Simon Handler, thanks for joining me on Acquisition Talk.

Trey Herr: [00:33:39] Great talking with you, Eric. Thanks for having us.

Simon Handler: [00:33:40] Thanks a lot, Eric.

Acquisition Talk

A daily blog on weapon systems acquisition