At any given time, you will have sectors where demand is growing faster than productivity (think of health care and education) and other sectors where productivity is growing faster than demand (think of manufacturing). In the sectors where demand is growing faster than productivity, you have rising relative prices, or “cost disease.”
That was from Arnold Kling. There seems to be evidence of cost disease in healthcare, education, and even construction. Interpretations abound. But what about defense?
Back in 2003, Kling considered whether defense was a cost disease sector. He said that Victor Hanson’s assessment of the Iraq war may have indicated that the DOD has overcome cost disease:
The United States military is now evolving geometrically as it gains experience from near-constant fighting and grafts new technology daily. Indeed, it seems to be doubling, tripling, and even quadrupling its lethality every few years.
Two things are going on: the human skill gained through constant fighting, and the new technologies. Both are extremely hard to measure.
Commanding general Schwarzkopf of Desert Storm allegedly said that the outcome with Iraq would have been the same even if the sides switched combat systems. The US had a qualitative superiority in personnel. But that must have been his feeling, right? Our systems must have also been so superior that the effect of each is difficult to disentangle.
Well, POGO tells us that the cheaper, and in some cases inferior, systems did much of the work for the US while the advanced technology was in some cases a headache.
The cheapest combat aircraft in the U.S. Air Force, the A-10, was responsible for over half of all Iraqi equipment losses, destroyed more tanks than any other aircraft (1,000), achieved the highest sortie rate, and the highest readiness rate [95%] of any U.S. Air Force combat aircraft in Desert Storm. [Note, the Air Force never wanted the A-10, it was forced on them like the F-16 was.]
Marines’ decades-old M60A1s, while inferior in many ways to the M1, cost only $1.2 million each, and handily destroyed Iraqi T-72s and suffered no losses to other tanks or missiles. Marine M60A1s units also stopped only once per day to refuel, and made daily maintenance checks with the engines running. In contrast, Army M1 units [$3 million] planned one hour fueling and maintenance halts every three to five hours [which “may have subsequently allowed key Iraqi armored units to escape certain destruction”].
Some advanced weapons gave trouble. Half of Apache attack helicopters were grounded worldwide to make the other half available at 90%, but they still flew only one hour a day. The Patriot missile defense system was first claimed to have a 96% success rate, but that was exaggerated. It failed to hit a Scud that killed 28 US soldiers. The success rate estimate was later lowered to 5-10 percent; software was to blame.
Think about the Patriot. Whatever its cost-effectiveness was at a 96% hit rate, that is roughly 19x higher than it would be at a 5% hit rate. That’ll sway any cost-benefit analysis.
This gets back to the measurement problem. What ways are available to measure cost disease in defense systems? I will outline three general methods and their evidence: (1) Cost Growth; (2) Input-Output; and (3) Econometric.
Cost Growth:
This is the DOD’s favorite way to report on its performance, but it is clearly the most irrelevant. How do you get rid of cost growth? Start with a higher cost target! Cost growth is answering whether or not the DOD can predict the cost of work. It does not answer whether military effectiveness is increasing relative to cost.
Nevertheless, let’s explore what the DOD says in its Performance of the Defense Acquisition System report. First, let’s look at contract cost growth.
Contract cost growth from 18,470 EV reports on 1,123 major contracts for 239 MDAPs
We see some trends, but cost growth has been under 10% in every year! That’s not so bad, right? What’s everyone complaining about?
Contract cost growth isn’t the statutory measure, however. That is based on program budgets authorized by Congress, as reported through the Selected Acquisition Report (SAR). Let’s see those trends:
Program Average Unit Cost (PAUC) Cost Growth by reform era.
OK, so this looks a little bit worse. But still, post-2001 median growth is under 10%. What’s there to worry about? Well, having worked with SAR and EVM data for years, I can tell you that these numbers could have looked very different under equally valid specifications (the garden of forking paths).
Many of those choices can be contested. Take quantity adjustment. Poor performance and cost growth lead to reduced quantities and program stretch-outs. It is not that things were going exactly to plan, then Congress unfairly cut their budget. Removing these effects from cost is arbitrary and misleading. Using the median is also misleading because when programs go wrong, they can go very wrong, out into the fat tail of the distribution. Here’s another look at cost growth by program:
RDT&E cost growth by program, by year of MS B. Measured this way, the F-35 only grew a cumulative 50% over 13 years.
Older programs have bad cost growth. Newer programs are still in early phases, so their growth hasn’t settled in yet, giving the false impression that things are getting better. Follow-up posts will describe EVM and SAR data and their quirks.
Cost growth reported by the DOD doesn’t seem too worrisome. But let’s say the next contract/program is based on realistic cost estimates from historical data [programs are funded to independent will-cost estimates]. 20% cost growth on my last contract will set the new, higher baseline for my next contract. 20% cost growth again then compounds on top of what the contract/program should have cost. So constant cost growth relative to “realistic” will-cost estimates actually means performance is deteriorating at an accelerating rate.
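To make the compounding concrete, here is a minimal sketch. It assumes, purely for illustration, that every contract overruns its baseline by the same 20% and that each new baseline is set equal to the previous contract’s actual cost. The function name and the dollar figures are hypothetical, not taken from any DOD data.

```python
def actual_costs(first_baseline, growth, n_contracts):
    """Actual cost of each successive contract when every new baseline
    is reset to the previous contract's actual cost (an assumption)."""
    costs = []
    baseline = first_baseline
    for _ in range(n_contracts):
        actual = baseline * (1 + growth)  # reported growth is always `growth`
        costs.append(actual)
        baseline = actual                 # the next will-cost estimate inherits the overrun
    return costs


costs = actual_costs(first_baseline=100.0, growth=0.20, n_contracts=4)
for i, cost in enumerate(costs, start=1):
    print(f"contract {i}: actual cost {cost:.1f} "
          f"({cost / 100.0 - 1:+.0%} vs. the original baseline)")
```

Every contract reports the same 20% growth, yet by the fourth contract the cost is more than double the original baseline.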
This doesn’t take into account the severe jumps in cost built into will-cost estimates when going from one platform to another, justified by “higher performance”. But was that performance worth the cost?
We can say that cost growth is a signal, but doesn’t tell us anything directly about cost disease.
Input-Output:
Linking the flow of resource inputs to military outputs is our first real attempt to measure cost disease. This can be done in a number of ways. First, the crudest. What has been the size of our force structure compared to the size of budgets? Here’s the quantity of aircraft orders over time from RAND:
Aircraft purchases by DOD, 1975-2005. Aircraft budgets started rebounding in 2000 to 1980s levels, quantities did not.
Now, I don’t have a direct link to show you the aircraft budget, but I’ve worked with those numbers and I can tell you that like the overall DOD budget, it rebounds in the late 90s/early 2000s to the constant-dollar budgets of the 1980s. So we are getting something less than 1/2 the aircraft for the same real budget. Here’s the total DOD budget (inflation adjusted), and aircraft got its share.
DOD Budget in constant 2018 dollars (billion). Data retrieved from here. Adjusted from 2000 to 2018 dollars using GDP Price Index from FredGraphs. Note that the constant dollar figures reported by the DOD diverge from the GDPPI used here in the 1980s, so under their unique deflation methods overall DOD spending looks much higher in the distant past, which downplays how much the DOD budget has grown in real terms.
Total active ships in US Navy forces, as reported by the Navy.
Again, I don’t have the Navy SCN budget over this period, but it has been growing. Be wary of the budget figures in the 30-year shipbuilding plan; they are deflated using a special index they develop with the BLS. It measures the increase in labor and commodity prices going into shipbuilding, and tends to grow faster than inflation. SCN budget figures are often deflated at a rate faster than economy-wide inflation.
Well, that’s a swag anyway. Not much data is available to the public, but no one in the DOD will debate the fact that force structure has shrunk in critical areas while budgets have been robust or growing.
Yet this doesn’t tell you everything because the newer classes of ships or aircraft could be many times as capable, or cheaper to maintain and operate. Whoa, one step at a time.
For now, let’s dive into sources of escalation in system production costs. For example, are labor and material input prices growing? Or, do new platforms require more hours of labor and more complex material types?
Here is RAND measuring Navy ship cost escalation. They break it down into economy-driven factors (labor/material/equipment escalation that firms face) and customer-driven factors (complexity, quantity changes, etc.). Let’s start with economy-driven factors. Here’s what they find in terms of cost of labor (burdened includes overhead costs).
The Employment Cost Index (ECI) is the national average for wage growth, which outpaces the cost of consumer goods (CPI) reflecting productivity. Shipbuilding labor rates grow less than one percent above the ECI. (Now, this is even an underestimate on the burdened labor because of accounting changes that haven’t been adjusted for.) How about ship material prices?
BLS PPI rates for generic commodities related to shipbuilding. Note that the x-axis says “real growth”, but this must be mistaken because the DOD/CPI growth rates were not 2-3% over general inflation.
The material escalation shown here is from the BLS Producer’s Price Index, not their specific measurements for Navy contractors. These are all growing less than the CPI (not the most relevant benchmark, but good enough). RAND found that contractor labor and material escalation doesn’t seem to have been too big a factor.
RAND concluded that economy-driven factors contribute to 4-5% annual growth in ship costs. That’s 1-2% real growth over inflation and corresponds to my knowledge of the Navy SCN figures. But these factors are thought to be outside the Navy contractor’s hands, just like doctor pay and the price of an MRI are outside the hospital’s hands.
But that’s not it. Actual ship production costs have increased more than that. RAND found an equally large driver of growth was in requirements and complexity. Let’s see the breakdown in annual cost escalation between the DDG-2 (1961) and DDG-51 (2002).
There you go, 9.2% annual growth between the DDG-2 and DDG-51. The DDG-51 cost about $1 billion. If we updated this to include the DDG-1000, which cost about $7.5 billion per ship, the escalation rate would have been substantially higher. (We are buying more DDG-51s to cover the gap created by the DDG-1000 cancellation, which had only 3 of 11 critical technologies successfully demonstrated.)
This 2006 RAND study doesn’t capture the newer Navy boondoggles of the DDG-1000, CVN-78, LCS, RMS, LHA-6, LPD-17, and others.
Even then, RAND found in 2006 that ship cost escalation was 7-11%. That’s on par with or higher than what they found for “cost disease” sectors like education and healthcare.
RAND did a similar study for aircraft cost escalation. The conclusions are roughly the same. Let’s take a glance at sources of cost escalation from the F-15 (1975) to the F-22 (2005).
Labor rates were growing at nearly 6% per year, but due to aerospace labor productivity and a general decrease in prime labor input (increased outsourcing), labor only contributed less than 1% growth to the escalation of the F-22 over the F-15. Economy-driven factors resulted in almost no real growth, but overall costs grew substantially.
For various types of aircraft, RAND again found roughly 7-11% annual cost escalation. 7-11% annual growth is, let’s call it, 4-8% real price growth over inflation. That means with a flat budget in real terms, we can afford to procure half as many ships and aircraft every 9-18 years that pass. In 30 years’ time, production rates could easily drop to a quarter or less with the same real budget. That seems to square with our overall look at force structure to budgets.
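The halving arithmetic is easy to check. The sketch below assumes a perfectly flat real budget and a constant real unit-cost growth rate, which are simplifications, and the function names are just for illustration.

```python
import math


def halving_time(real_growth):
    """Years until a flat real budget buys half as many units."""
    return math.log(2) / math.log(1 + real_growth)


def fraction_affordable(real_growth, years):
    """Share of the original quantity a flat real budget buys after `years`."""
    return (1 + real_growth) ** -years


for g in (0.04, 0.08):
    print(f"{g:.0%} real growth: quantity halves every {halving_time(g):.1f} years, "
          f"{fraction_affordable(g, 30):.0%} of the original quantity after 30 years")
```

At 4% real growth, quantities halve about every 18 years and fall to roughly 31% of the original after 30 years. At 8%, they halve about every 9 years and fall to roughly 10%.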
And yet this input-output look doesn’t tell the full story. Basically, if the price grew 10% and you could account for 4% with observable labor and material input escalation, you just assume the remaining growth is complexity, requirements, quantity changes, or something like that.
If higher technology isn’t reducing end item cost, must it not be vastly increasing its output (i.e., lethality)?
Knowing system cost isn’t good enough. We need to know the military value of the systems to see whether quality is outpacing cost. After all, an F-22 isn’t the same as an F-15. It has stealth features and supercruise, among other capabilities. How much of that 10% escalation is showing up as higher quality, and thus, needs to be adjusted for? Maybe leaps in capability indicate productivity gains.
Econometric:
William Nordhaus published a famous paper showing the quality-adjusted price of light. It showed a tremendous decrease in cost from firewood to candles to the electric bulb. The point was that we often miss how technology leads to radical reduction in prices if we don’t measure the right thing, such as a constant luminescence (lumens). He found that traditional measures of price indexes for light between 1800-2000 would be nearly 1,000x too high! Could this same error be made in defense by not accounting for constant quality?
Now, Nordhaus still didn’t control for everything. Firewood didn’t just create a unit of light, it was multifunctional. It provided warmth, heat for cooking, scared away animals, could be used as a signal, and so forth. A light-bulb is good for one thing, light.
That’s the problem, controlling for multiple incommensurable outputs of a good or service. The problem is compounded many times for defense systems because they have so many attributes and features which are constantly evolving. Even then, it is not even clear how systems will perform until they are put into actual combat.
There are a couple ways to measure effectiveness. You can look at technical specifications, such as size, weight, or thrust. You can also look at performance, such as speed, precision, or range. You’d then try to match up all relevant attributes between systems and measure their change. But is a speed twice as fast worth twice the quality, and thus, twice the cost? Maybe, maybe not. Can’t be said.
A bigger problem is new features. The F-15 didn’t have stealth features, the F-22 did. That’s basically an infinite increase in the quality parameter of stealth. Does adding stealth make the F-22 instantly twice as good as an F-15, all else equal? Can’t be said.
OK, so there are lots of attributes. How do we weight their importance? Well, we can’t. Quality measures in different units cannot be jammed together. Even if you knew the precise measurements of every attribute, the incommensurability problem is insurmountable. You can’t put it into one number. This basically caused the failure of DOD systems analysis for McNamara’s “whiz kids.”
What’s the best we can do to measure cost disease? It’s called a hedonic index, which is basically a regression where cost is the dependent variable and you add independent variables for quality, quantity, and time. The BLS uses it to determine prices of goods whose qualities are changing fast, like computers.
Let’s look at some data of unit costs over time. This will be the dependent variable we want to predict from technical specifications or performance attributes. Let’s focus on tactical aircraft for now from an IDA study using SAR data:
Tactical Aircraft unit recurring costs (inflation-adjusted to 2002) from IDA study.
First, this is on a constant dollar log scale, so those are big cost increases. Second, these are unit recurring flyaway costs. They exclude all RDT&E costs. They also exclude non-recurring costs, spares and repairs, and other categories, a split that is basically decided arbitrarily by the program office that reports the figures. Third, this shows annual production lots, whose unit costs decrease over time for a system because of learning and rate effects associated with increasing quantities.
We can see that costs are growing exponentially between systems, and decreasing over time within a system. So we need to back out not only quality differences, but perhaps even quantity differences. We are ordering only a handful of F-22s and F-35s per year, but ordered hundreds in the first year of the F-16.
IDA used the following independent variables to measure the quality effects: Empty Weight, Max Speed, % Advanced Materials, Stealth (5th Generation), and STOVL. It used cumulative quantity produced to measure quantity effects (ignoring annual rate). Finally, it used a set of dummy variables, one for each and every year of production. The coefficients on the time dummy variables measure all change in cost not attributed to quality or quantity parameters, and when put together form an escalation index.
Hedonic regression summary from IDA study.
Again, we find an aircraft escalation rate above 7% per year (nominal), controlling for quality and quantity. IDA’s hedonic study on ground vehicles also showed significant escalation.
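To show the mechanics of a hedonic regression, here is a toy sketch on synthetic data. It is not IDA’s model: the aircraft, weights, lot sizes, and “true” coefficients are all made up, only two quality variables are included, and a single linear time trend stands in for the per-year dummies described above. Log unit cost is regressed on log weight, a stealth flag, log2 of cumulative quantity, and the year. Because the synthetic data has no noise, the regression should recover the coefficients it was built with.

```python
import math


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


NAMES = ["const", "log_weight", "stealth", "log2_cum_qty", "year"]
# Made-up "true" effects: 84% learning curve, 0.03 log points of escalation per year.
TRUE = [1.0, 0.8, 0.5, math.log(0.84), 0.03]

# Hypothetical aircraft: (empty weight in thousands of lb, stealth flag,
# first production year, units per annual lot)
AIRCRAFT = [(19, 0, 1975, 120), (28, 0, 1980, 60), (43, 1, 1998, 20), (29, 1, 2007, 30)]
BASE_YEAR = 1975

rows, log_cost = [], []
for weight, stealth, first_year, lot_size in AIRCRAFT:
    for lot in range(1, 9):  # eight annual lots per aircraft
        x = [1.0,
             math.log(weight),
             float(stealth),
             math.log2(lot * lot_size),               # cumulative quantity to date
             float(first_year + lot - 1 - BASE_YEAR)]  # years since the base year
        rows.append(x)
        log_cost.append(sum(b * v for b, v in zip(TRUE, x)))  # noise-free log unit cost

est = dict(zip(NAMES, ols(rows, log_cost)))
escalation = math.exp(est["year"]) - 1       # annual escalation net of quality and quantity
learning = math.exp(est["log2_cum_qty"])     # cost multiplier per doubling of quantity
print(f"annual escalation: {escalation:.1%}, learning curve: {learning:.0%}")
```

The time coefficient is the escalation left over after the quality and quantity variables have absorbed what they can. With real data, that residual is only as trustworthy as the quality proxies.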
I will follow up with more thorough analysis, but let’s ponder on the methodology and results for a minute. Clearly weight, speed, stealth, and STOVL tell us just a very small part of aircraft effectiveness. What about its range, sortie rate, maintainability, agility, payload, visual footprint, radar, wing-loading? The list goes on indefinitely.
We can’t blame IDA for these deficiencies, they just don’t have the information, and if they did, it would reduce their degrees of freedom to zero pretty fast. But you can only have as much confidence in the escalation rate as you do in the measures of quality.
The T1 cost, or theoretical first unit cost, may also serve as the model’s prediction for the aircraft’s quality quoted in dollars. Because the F-14 is a swing-wing design, which the regression does not control for, it has a really high empty weight, leading to a higher measured “quality” than any aircraft besides the F-22 and F-35C variant. Clearly this is absurd.
The bottom line is that hedonic indexes say nothing about quality! They answer the question of how cost variation is correlated with quality-proxying parameters. This says nothing about benefits to the consumer/DOD. It merely assumes that if something is costly to produce, then it must be beneficial to the consumer. Otherwise, why is it done?
In the DOD, there is a monopsony/oligopoly structure. Decisions are made because people 10 years ago agreed to a requirement without understanding its technical implications.
Moving on. A learning curve of 84% is believable (if you “believe” in learning curves), but this removes a lot of cost increases that can arguably be left in. If we want more advanced technologies that are much more expensive, then we can afford fewer of them. New aircraft then get less benefit from “coming down the learning curve” because of their expense. They also get hit because of reduced annual quantities and program stretch-outs due to RDT&E problems. So that 7% cost escalation should be much higher given the portfolio we have chosen.
The hedonic regression could inform judgment on future choices. But that’s a story for another day.
[Technical note that is overly simplified: 84% learning means that for every doubling of production aircraft, the unit recurring cost is reduced to 84% of what it was. So if the first unit cost $100, the second will be $84, the fourth will cost $71, the eighth $59, etc…
If I can only afford half as many aircraft, then I’ll pay a premium. It is this concern that leads to joint, multi-mission (jack-of-all-trades) aircraft to take advantage of learning/rate effects. But those decisions lead to an overly complex system that actually underperforms relative to simpler, dedicated-mission aircraft. They are also much more costly to maintain per flight hour.]
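The learning-curve arithmetic in the note can be written as a one-line formula. This sketch uses the simple unit-theory form, where unit n costs the first unit cost times n raised to log2 of the learning rate. The function name is hypothetical, and real cost models also layer rate effects on top.

```python
import math


def unit_cost(first_unit_cost, unit_number, learning_rate=0.84):
    """Unit-theory learning curve: each doubling of cumulative quantity
    multiplies unit cost by `learning_rate`."""
    exponent = math.log2(learning_rate)  # negative for rates below 100%
    return first_unit_cost * unit_number ** exponent


for n in (1, 2, 4, 8):
    print(f"unit {n}: ${unit_cost(100, n):.0f}")
```

This reproduces the $100, $84, $71, $59 sequence from the note.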
____________
One final comparison, the BEA puts out deflators for defense goods. They are kind of hard to believe.
IDA comparison of BEA and BLS price indexes.
Defense aircraft prices, as measured by the BEA, are flat from 1985 to 2010. That’s something like 2% real price decay per year. Even civilian aircraft prices in the BLS PPI have more than doubled in that time. In another post, I’ll talk about the differences in price index methodologies between the BEA and the BLS. (The BLS does not use hedonic indexes for its civilian aircraft PPI.)
Don’t cite BLS or BEA figures on defense if you’re going after cost disease. They just aren’t made for that.
Hedonic is the best we can do, but perhaps it is worse than no measure. IDA concluded that the F-35 is just another fighter aircraft program following historical trends. If the hedonic index had been used to estimate F-35 production costs at Milestone B in 2002, IDA claimed there would have been zero cost growth! So in reality, if we had projected costs “correctly,” then the F-35 would be a spot-on program. Something about that feels both plausible and dangerous.
Conclusion:
It’s clear that defense acquisition costs are growing at least as fast, and probably much faster, than education and healthcare costs. Defense platform unit costs grow nominally at 7-11% per year. Doing some adjustments, DOD production costs probably grow at twice the rate of inflation.
This can lead to a structural problem, such as the one Norm Augustine pointed to in 1984. If aircraft costs kept growing at their pace, Augustine estimated that by 2054 the entire DOD budget would go to buying one aircraft. That says nothing about quality, but it’s hard to imagine one airplane doing the entire air mission needs of the DOD.* This raises structural problems about choices we make on new weapon systems.
Basically, I’ve given you little to conclude from quantitative data. System costs have grown exponentially, much faster than inflation or budgets. We can try econometric models to pull out quality differences, but those simply don’t give a quality-adjusted price index as though we were measuring the same thing over time. We haven’t connected this to O&S costs, readiness, combat effectiveness, and so forth.
Yet the same could be said for healthcare and education, which are an evolving set of goods and services which are difficult to quality-adjust. There is a lot more to this story. Stay tuned for updates which will explore various explanations for defense cost disease. We have only scratched the surface of the evidence.
*Final note:
Anything that grows exponentially in the real world hits constraints, or else it would consume everything. In terms of defense “output,” even if we had exponential growth in lethality but could only afford one system, there’s another exponential facing us down.
Lanchester’s square law finds that power is proportional not to the number of units engaged, but to the square of that number. So if my system is 3x more powerful than yours, but you can field twice as many, I might think that I still have a 3-to-2 advantage in power. But Lanchester would argue that you would win. The power score is 4-to-3 in your favor (2² × 1 against 1² × 3). The more units are fielded at that ratio, the more handily you’d beat me.
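The square-law comparison can be sketched as follows. It assumes the idealized aimed-fire model behind the square law (each side’s losses are proportional to the other side’s surviving numbers times its per-unit effectiveness), and the function names are hypothetical. In that model, the difference between the two sides’ values of effectiveness × units² stays constant during the fight, which gives the winner’s surviving force.

```python
import math


def fighting_strength(units, effectiveness):
    """Lanchester square-law strength: effectiveness times units squared."""
    return effectiveness * units ** 2


def outcome(units_a, eff_a, units_b, eff_b):
    """Return (winner, surviving units) for a fight to the finish."""
    sa = fighting_strength(units_a, eff_a)
    sb = fighting_strength(units_b, eff_b)
    if sa == sb:
        return ("draw", 0.0)
    if sa > sb:
        return ("A", math.sqrt((sa - sb) / eff_a))
    return ("B", math.sqrt((sb - sa) / eff_b))


# Side A: systems 3x as effective. Side B: twice as many systems.
for n in (1, 10, 100):
    winner, left = outcome(units_a=n, eff_a=3, units_b=2 * n, eff_b=1)
    print(f"A fields {n}, B fields {2 * n}: {winner} wins with {left:.0f} units left")
```

The strength ratio stays at 4-to-3 at every scale, but the winner’s surviving force grows with the number of units fielded.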