Is continuous testing the key to SpaceX software success?

In modern space flight, mission-critical systems increasingly depend on software to perform critical functions. Unfortunately, software failures are among the most common causes of failure in today’s mission-critical systems. Such failures can degrade mission performance or even cause complete mission failure, incurring a heavy scientific and economic penalty. Some major failures over the years include:

 

  • Mariner 1 – Missing hyphen in code
  • Ariane-5 – Unhandled floating point exception
  • Mars Pathfinder – Priority inversion /scheduling bug
  • Mars Climate Orbiter – Navigation failure caused by a metric/imperial unit mismatch

… There are four software teams at SpaceX:

 

  1. Flight Software
  2. Ground Software
  3. Avionics Test
  4. Enterprise Information Systems

The use of commodity components (x86, unhardened PPC processors and Linux) allows a single workstation to simulate every controller and processor, which in turn allows automated testing en masse. SpaceX tests all flight software on what can be called a table rocket: engineers lay out all the computers and flight controllers from the Falcon 9 on a table and connect them as they would be on the actual rocket. For integration testing, they run a complete simulated flight on the components, monitoring performance and potential failures. For stress testing, engineers perform what they call “cutting the strings,” where they randomly shut off a flight computer mid-simulation to see how the system responds. This level of simulation, combined with a significant amount of automation, lets these developers achieve high output. In fact, SpaceX can push software into production 17,000 times a day with confidence!

 

What we can derive from how SpaceX builds software is that reuse, DevOps, and continuous testing workflows are key to its success. In fact, more and more companies are deploying DevOps and continuous testing workflows similar to SpaceX’s. As a result, they have been able to make big leaps in innovation.

That was from a Coder’s Kitchen post, “How SpaceX develops software.” SpaceX’s software organization should look like the DoD’s to some degree. SpaceX has ground and flight software, which are equivalent to the software managed by DoD program platform verticals such as ships, aircraft, and ground vehicles. Then there is an Enterprise Information Systems group, which may play a role similar to DoD orgs like the Space Force Enterprise Corps, Air Force Platform One, or the PEOs for Enterprise Information Systems in the Army and Navy.

I suppose the missing link in the DoD is a dedicated mission system test software group. SpaceX’s test workstation seems to be a critical enabler of a DevSecOps approach to hardware/software integration problems. It seems like such a test regime needs (1) to be considered from the start, because it is hard to add onto an existing program; and (2) a modular design, so that families of systems can all use the same test workstation.
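
To make the “cutting the strings” idea from the quoted post more concrete, here is a minimal sketch of a fault-injection test in Python. It is not SpaceX’s harness. Everything in it is a hypothetical stand-in chosen for illustration: the `FlightComputer` class, the three-computer setup, the two-vote quorum, the toy altitude profile, and the command names. The source only says that a flight computer is randomly shut off mid-simulation to see how the system responds.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional

QUORUM = 2  # assumption: at least two computers must agree on a command


@dataclass
class FlightComputer:
    """Hypothetical simulated flight computer (not a real SpaceX component)."""
    name: str
    alive: bool = True

    def compute_command(self, altitude_m: float) -> Optional[str]:
        # A powered-off computer produces no output at all.
        if not self.alive:
            return None
        return "THROTTLE_DOWN" if altitude_m > 1000.0 else "THROTTLE_UP"


def vote(outputs: Iterable[Optional[str]]) -> Optional[str]:
    """Return the majority command, or None if the quorum is not met."""
    live = [o for o in outputs if o is not None]
    if not live:
        return None
    winner, count = Counter(live).most_common(1)[0]
    return winner if count >= QUORUM else None


def run_simulated_flight(steps: int, kill_steps: Iterable[int],
                         seed: int = 0) -> List[Optional[str]]:
    """Run a toy flight, shutting off one random live computer at each kill step."""
    rng = random.Random(seed)  # seeded so a failing run can be reproduced
    kill_set = set(kill_steps)
    computers = [FlightComputer(f"fc{i}") for i in range(3)]
    commands = []
    for step in range(steps):
        if step in kill_set:
            live = [c for c in computers if c.alive]
            if live:
                rng.choice(live).alive = False  # "cut the string"
        altitude_m = step * 100.0  # toy altitude profile, purely illustrative
        commands.append(vote(c.compute_command(altitude_m) for c in computers))
    return commands


if __name__ == "__main__":
    # One computer lost mid-flight: the remaining two should still agree.
    assert all(cmd is not None for cmd in run_simulated_flight(20, {7}))
    # Two computers lost: the quorum is gone and the test should detect it.
    assert run_simulated_flight(20, {5, 6})[6] is None
    print("fault-injection checks passed")
```

Because the simulation runs entirely in software and is seeded, a check like this could run on every commit, which is the kind of workflow a shared test workstation would make possible across a family of systems.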
