Philip Koopman and Michael Wagner, Edge Case Research
Self-driving cars promise improved road safety. But every publicized incident chips away at confidence in the industry’s ability to deliver on this promise, with zero-crash nirvana nowhere in sight. We need a way to balance long term promise vs. near term risk when deciding that this technology is ready for deployment. A “positive trust balance” approach provides a framework for making a responsible deployment decision by combining testing, engineering rigor, operational feedback, and transparent safety culture.
MILES AND DISENGAGEMENTS AREN’T ENOUGH
Too often, discussions about why the public should believe a particular self-driving car platform is well designed center around number of miles driven. Simply measuring the number of miles driven has a host of problems, such as distinguishing “easy” from “hard” miles and ensuring that miles driven are representative of real world operations. That aside, accumulating billions of road miles to demonstrate approximate parity to human drivers is an infeasible testing goal. Simulation helps, but still leaves unresolved questions about including enough edge cases that pose problems for deployment at scale.
By the time a self-driving car design is ready to deploy, the rate of potentially dangerous disengagements and incidents seen in on-road testing should approach zero. But that isn’t enough to prove safety. For example, a hypothetical ten million on-road test miles with no substantive incidents would still be a hundred times too little to prove that a vehicle is as safe as a typical human driver. So getting to a point that dangerous events are too rare to measure is only a first step.
In fact, competent human drivers are so good that there is no practical way to measure that a newly developed self-driving car has a suitably low fatality rate. This should not be news. We don’t fly new aircraft designs for billions of hours before deployment to measure the crash rate. Instead, we count on a combination of thorough testing, good engineering, and safety culture. Self-driving cars typically rely on machine learning to sense the world around them, so we will also need to add significant feedback from vehicles operating in the field to plug inevitable gaps in training data.
POSITIVE TRUST BALANCE
The self-driving car industry is invested in achieving a “positive risk balance” of being safer than a human driver. And years from now actuarial data will tell us if we succeeded. But there will be significant uncertainty about risk when it’s time to deploy. So we’ll need to trust development and deployment organizations to be doing the right things to minimize and manage that risk.
To be sure, developers already do better than brute force mileage accumulation. Simulations backed up by comprehensive scenario catalogs ensure that common cases are covered. Human copilots and data triage pipelines flag questionable self-driving behavior, providing additional feedback. But those approaches have their limits.
Rather than relying solely on testing, other industries use safety standards to ensure appropriate engineering rigor. While traditional safety standards were never intended to address self-driving aspects of these vehicles, new standards such as Underwriters Laboratories 4600 and ISO/PAS 21448 are emerging to set the bar on engineering rigor and best practices for self-driving car technology.
The bad news is that nobody knows how to prove that machine learning based technology will actually be safe. Although we are developing best practices, when deploying a self-driving car we’ll only know whether it is apparently safe, and not whether it is actually as safe as a human driver. Going past that requires real world experience at scale.
Deploying novel self-driving car technology without undue public risk will involve being able to explain why it is socially responsible to operate these systems in specific operational design domains. This requires addressing all of the following points:
Is the technology as safe as we can measure? This doesn’t mean it will be perfect when deployed. Rather, at some point we will have reached the limits of viable simulation and testing.
Has sufficient engineering rigor been applied? This doesn’t mean perfection. Nonetheless, some objective process such as establishing conformance to sufficiently rigorous engineering standards that go beyond testing is essential.
Is a robust feedback mechanism used to learn from real world experience? There must be proactive, continual risk management over the life of each vehicle based on extensive field data collection and analysis.
Is there a transparent safety culture? Transparency is required in evolving robust engineering standards, evaluating that best practices are followed, and ensuring that field feedback actually improves safety. A proactive, robust safety culture is essential. So is building trust with the public over time.
Applying these principles will potentially change how we engineer, regulate, and litigate automotive safety. Nonetheless, the industry will be in a much better place when the next adverse news event occurs if their figurative public trust account has a positive balance.
Philip Koopman is the CTO of Edge Case Research and an expert in autonomous vehicle safety. Including his role as a faculty member at Carnegie Mellon University, Koopman has been helping government, commercial and academic self-driving developers improve safety for over 20 years. He is a principal contributor to the Underwriters Laboratories 4600 safety standard.
Michael Wagner is the CEO of Edge Case Research. He started working on autonomy at Carnegie Mellon over 20 years ago.