Consequences and Tradeoffs—Dependability in Artificial Intelligence and Autonomy
Author: Amanda Andrei
Clinical diagnostic support. Loan approval. Predictive policing and parole.
These are all examples of consequential systems: systems whose decisions have immediate, long-term, and significant consequences for the people within them. And when artificial intelligence (AI) is added to the mix, helping authorities within these systems make decisions about medicine, mortgages, or freedom, that AI had better be dependable.
Dependable AI refers to the adaptation of capabilities from high assurance systems engineering to consequential AI, especially machine learning systems, to efficiently mitigate concerns about fairness, operational risk, technical debt, credibility, and more.
In other words: how do we know we can depend on the AI helping us make or understand a major, life-altering decision?
“One chief way we achieve high assurance or high dependability is through a laborious or time-intensive process,” explains Chuck Howell, Chief Scientist for Dependable AI. “For example, in avionics, you go through an FAA (Federal Aviation Administration) certification process. But AI engineering tends to be fast, iterative, exploratory. How do you marry that up with a deliberate, thoughtful certification process?”
Howell notes that much work is being done on agile assurance and on how to reach high confidence faster, but compromises must still be made, and assurance work can seem unimpressive or boring in a culture of flashy AI presentations. And often in AI research, “Everything is a tradeoff, and everyone’s under time pressure, and you have to get it running by the end of the month,” Howell says. “If you do a proof-of-concept prototype, you’ll have a glitzy demo you can give people in a week, but when you’re spending a lot of time doing a truly thorough hazard analysis or coming up with a test plan, you don’t see anything from that, it’s a terrible demo.”
And that glitzy demo can lead to technical debt, defined as “extra development work that arises when code that is easy to implement in the short run is used instead of applying the best overall solution.” Given the current emphasis on fast, iterative development and demos, machine learning and AI systems are especially vulnerable to technical debt. And while some debt can be a good thing, allowing for more flexibility, too much of it can seriously reduce the dependability of a system and the confidence placed in it, leading to real consequences for the users.
The dependability of AI also plays out over time, raising questions of fairness and bias in the system’s output. Recent examples of unintentional bias include job recruitment tools favoring men over women and chatbots repeating profanity and racism, problems that came to light only after the tools had been operating in the real world for some time. And the problem often reflects biases intrinsic to the data fed into the system.
“For example, I train a system, it works well enough, but I deploy it, and the data it sees is changing over time,” Howell remarks. “But say the system is learning and changes its behavior, or an adversary has a different tactic to use against the system. How do I have confidence after it’s deployed?”
One way to increase dependability in AI systems is to adjust expectations about the AI life cycle. While fast, iterative processes can provide remarkable results, assurance and testing also need to be folded into the life cycle, which requires accepting that the payoffs will come later rather than sooner.
“You have to be willing to say, ‘I know this isn’t satisfying to do right now because it won’t be in a cool demo by tomorrow,’ but you put it in place right now and you’ll be very glad you did when you demo it in a year from now. The complexity of the assurance processes and the need to trade off assurance with other features requires delayed gratification,” notes Howell. “To paraphrase a quote [from computer scientist C. A. R. Hoare]: simplicity is the price of reliability and it’s a price that the wealthy find hardest to pay.”
In other words, to have high dependability and confidence in a system, users and engineers should be willing to give up a long list of features that may have been tested only on small or specific datasets, and instead opt for a system with well-tuned, well-tested features that have been exercised on large and diverse datasets. That takes time, expertise, and patience, which is well worth it when major life consequences are at stake.
Amanda Andrei is a computational social scientist in the Department of Cognitive Sciences and Artificial Intelligence. She specializes in social media analysis, designing innovative spaces, and writing articles on cool subjects.
© 2019 The MITRE Corporation. All rights reserved. Approved for public release. Distribution unlimited. (Case # 19-0465)