AI exposes challenges to cost estimation

Many people have written about how artificial intelligence can improve the business of cost estimating. Those papers, however, tend to focus on costing out industrial-era projects in construction or military hardware. In such cases, where most of the cost is in physical labor and raw materials, and where future steps can be planned relatively accurately, AI can probably generate more accurate cost figures.

Of course, that is true only when all the necessary data has been collected, and when previous efforts look very much like future projects. Let’s set this problem aside, since it affects the human cost estimator as well (though not as severely, because human estimators often do ad hoc data collection or normalization to fill the gaps).

Let’s presume that we have a clean set of historical cost data, identified by standard categories: a work breakdown structure, an organizational breakdown structure, labor and material pricing categories, recurring and nonrecurring costs, and so forth. For routine construction or hardware projects, I could argue that a trained algorithm would produce more accurate cost estimates, on average, than humans working without it.
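
To make that presumption concrete, here is a minimal sketch, in Python, of what one normalized record in such a data set might look like. The field names and categories are illustrative assumptions on my part, not any standard schema.

```python
from dataclasses import dataclass

@dataclass
class CostRecord:
    """One normalized historical cost observation (hypothetical schema)."""
    wbs_code: str           # work breakdown structure element, e.g. "1.2.3"
    obs_code: str           # organizational breakdown structure element
    labor_category: str     # labor pricing category, e.g. "fabrication"
    material_category: str  # material pricing category
    recurring: bool         # recurring vs. nonrecurring cost
    quantity: int           # cumulative production quantity at observation
    cost: float             # cost normalized to a base fiscal year
```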

The algorithm would find useful correlations in the data. For example, fabrication labor costs may fall 10 percent every time production quantity doubles (i.e., a 90 percent learning curve). It might find other correlations associated with the level of technology, such as a 10 percent increase in a system’s range or speed being associated with 50 percent higher engineering costs on the power plant. Putting all these correlations together could produce a fairly accurate cost estimate.
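
As a rough sketch of the first of those correlations, here is the standard learning-curve formula in Python: unit cost falls to a fixed fraction of its prior level each time cumulative quantity doubles. The dollar figures are invented for the example.

```python
import math

def unit_cost(first_unit_cost: float, unit_number: int, slope: float = 0.90) -> float:
    """Learning curve: the nth unit costs `slope` times as much as it did
    at half the cumulative quantity (a 90 percent curve by default)."""
    b = math.log(slope) / math.log(2)  # exponent, about -0.152 for 90 percent
    return first_unit_cost * unit_number ** b

# A $1,000,000 first unit on a 90 percent curve:
print(unit_cost(1_000_000, 2))  # ~900,000: cost falls 10 percent at unit 2
print(unit_cost(1_000_000, 4))  # ~810,000: and another 10 percent at unit 4
```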

However, note that the correlations associated with technology change are only predictive of the cost of existing technologies. The cost of attaining new performance levels outside the historical data cannot, in most cases, be predicted. Nature presents us with nonlinearities that force us to take entirely new approaches.

You cannot, for example, put a powerful jet engine into a subsonic airframe and expect to fly at supersonic speeds. The shock waves that appear at transonic speeds require a new airframe design, one in which the entire horizontal stabilizer is movable. The cost of discovering that fact cannot be found anywhere in a historical data set, no matter how smart the algorithm. And in turn, the lessons learned at transonic speeds will not be useful for predicting the cost of solving the problems of hypersonic vehicles.
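
A toy illustration of the point, with entirely made-up numbers: a model fit only on subsonic cost history extrapolates smoothly past Mach 1 and misses the cost jump that the airframe redesign imposes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical history: airframe cost grows gently with design speed
# in the subsonic regime the estimator has actually observed.
mach_hist = rng.uniform(0.3, 0.8, 50)
cost_hist = 10 + 5 * mach_hist + rng.normal(0, 0.2, 50)

# Fit a line to the subsonic history only.
slope, intercept = np.polyfit(mach_hist, cost_hist, 1)

def true_cost(mach: float) -> float:
    """Past Mach 1, shock waves force a redesign and costs jump."""
    return 10 + 5 * mach + (25 if mach > 1.0 else 0)

for mach in (0.7, 1.2):
    predicted = slope * mach + intercept
    print(f"Mach {mach}: predicted {predicted:.1f}, actual {true_cost(mach):.1f}")
# Inside the data (Mach 0.7) the fit is fine; outside it (Mach 1.2)
# the model misses the regime change entirely.
```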

The R&D of new technologies is just one instance of a general class of projects that are not amenable to statistical costing techniques. Any project whose aim is to develop intangible assets (not just R&D, but also software, databases, business processes, and company culture) cannot be costed accurately. Costing requires knowing the activities and resource costs involved in the project. Even when resource costs are known, the activities involved in creating intangible objects cannot be known in advance.

If the activities involved in building a new software system were known in advance, then the intangible asset would already have been created in the planning stage. It could be reproduced at zero marginal cost; there would be no costing problem. It wouldn’t be software development if there were no new aspects that had to be figured out by the “man-on-the-spot.”

By contrast, consider a construction project. Almost all of its intangible assets have already been produced before the cost estimate is performed: we have blueprints, specifications of material types, and so on. Cost estimating is still useful, however, because most of the project’s cost is not in the designs. Most of the budgeted dollars go to construction labor, capital equipment, and building materials; the bulk of the cost is in turning the design into physical reality.

For software and other intangible assets, the entire expenditure of effort is in the design. For construction and hardware projects, all the “intangibles” are assumed to have been completed before the costing work is done, or else we must assume that the intangible costs of future projects will be about the same as in past projects. That assumption is clearly problematic.

AI algorithms are very good at detecting correlations in data, so long as there is not too much environmental variation or uncertainty. Cost estimation likewise relies on finding correlations in historical data that are predictive of future projects (this is true even of engineering build-ups). But we know that AI algorithms are useful only in very narrow circumstances. An algorithm trained to identify cats cannot identify enemy missile sites. AI algorithms will therefore do well in costing projects that are relatively routine.

Algorithms cannot, however, generalize and make predictions where it matters most. When a project involves creating something new, well outside the range of observed data, project cost may have no relationship whatsoever to the progress being made. (See Frederick Brooks’s “The Mythical Man-Month.”) This is the case for the creation of any intangible asset, and it has become apparent in software projects like GPS OCX and the F-35’s ALIS, where additional resource inputs accomplished zero or even negative progress. Answering questions of cost for these projects requires simultaneously answering all of the technical questions as well, something that neither the cost estimator nor the AI algorithm is well suited to do.
