Better Practices for Explainable Decisions: A Discussion of Explainable AI

Author: Dr. Jennifer DeCamp

In November 2019, an Apple Card customer complained that when he and his wife applied separately for Apple Card credit, his wife was given a credit limit twenty times lower than his, even though they jointly owned all assets in a community property state. According to Neil Vigdor in an article in the New York Times, an Apple Card representative checked into the matter and came back with the explanation that “It was the algorithm.” Goldman Sachs, which co-managed the card with Apple, provided a similar explanation, stating that the difference in credit limits was due to the use of an Artificial Intelligence (AI) Machine Learning (ML) algorithm, and that there was no bias in the algorithm, as employees had no knowledge of the gender of applicants. From this account and similar news stories, it appears that a key problem in this consumer, public relations, and legal nightmare was that no one (developers, testers, implementers, help desk personnel, management, or the customers) adequately understood how this algorithm, or AI ML algorithms in general, works. It is also clear that there was no means to adequately explain specific credit decisions to customers.

The right to a meaningful explanation is deeply embedded in our laws, practices, and culture. The Equal Credit Opportunity Act states that when an applicant experiences an “adverse action,” such as a low credit limit, the reasons for that action “must be specific and indicate the principal reason(s).” Moreover, “statements that the adverse action was based on the creditor’s internal standards or policies or that the applicant, joint applicant, or similar party failed to achieve a qualifying score on the creditor’s credit scoring system are insufficient.”

There is particular concern about understanding adverse actions that are a result of AI ML. For instance, the U.S. Federal Trade Commission Guidance for AI Adoption states that the agency’s “law enforcement actions, studies, and guidance emphasize that the use of AI tools should be transparent, explainable, fair, and empirically sound, while fostering accountability.” The Department of Defense Principles for Ethical AI states that “the Department’s AI capabilities will be developed and deployed such that relevant personnel possess an appropriate understanding of the technology, development processes, and operational methods applicable to AI capabilities, including with transparent and auditable methodologies, data sources, and design procedure and documentation.” The National AI Research and Development Strategic Plan: 2019 Update “emphasizes the need for explainable and transparent systems that are trusted by their users, perform in a manner that is acceptable to the users, and can be guaranteed to act as the user intended.” These science and technology gaps are being addressed by a range of research, including the Defense Advanced Research Projects Agency (DARPA) Explainable AI (XAI) program.

However, what constitutes explainability and transparency, or similar issues such as interpretability or auditability, often depends on what is being explained and to whom. Ideally, the Apple Card credit assessment system would have been designed and built to produce an answer understandable to the user. For instance, it might be designed and built to show the specific text or other information that was used to calculate the decision. The user could thus contact the company to talk about the validity and weighting of data as opposed to the performance of a black-box algorithm.
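
As a rough illustration of what "showing the information used to calculate the decision" could look like, the sketch below scores an applicant with a simple weighted sum and reports how much each input contributed. The feature names, weights, and the explain function are all hypothetical, chosen only for this example; nothing here reflects how the Apple Card system actually works.

```python
# Hypothetical weights for a toy linear credit score (assumed for illustration only).
WEIGHTS = {
    "income_thousands": 2,
    "years_of_credit_history": 5,
    "missed_payments": -30,
}

def explain(applicant):
    """Return the total score and each input's contribution, most harmful first."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    score = sum(contributions.values())
    # Sort ascending so the factors that lowered the score the most come first.
    ranked = sorted(contributions.items(), key=lambda item: item[1])
    return score, ranked

applicant = {"income_thousands": 90, "years_of_credit_history": 4, "missed_payments": 3}
score, ranked = explain(applicant)

print("score:", score)
for name, contribution in ranked:
    print(f"{name}: {contribution:+d}")
```

With an output of this kind, a customer could question a specific input (for example, whether the missed payments on file are accurate) rather than the algorithm as a whole. Real credit models are rarely this simple, so per-input contributions are usually harder to obtain than this sketch suggests.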

However, most XAI is directed towards developers so that they can debug and test their systems, as well as provide answers to their customer support team and management in situations such as with the Apple Card, when a decision outcome does not meet expectations. As described in detail by Clodéric Mars, developers often use software visualization or modeling tools to examine outcomes against major factors, such as gender, or to test the algorithm against controlled data sets.
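
A minimal sketch of testing against a controlled data set follows. It builds matched applicants who differ only in one factor and compares the average outcome per group. The toy_model function is a deliberately biased stand-in written for this example, and the field names are assumptions; a real test would call the system under review instead.

```python
from statistics import mean

def toy_model(applicant):
    """Hypothetical credit-limit model, deliberately biased for demonstration."""
    base = applicant["salary"] // 10
    return base // 2 if applicant["gender"] == "F" else base

def group_means(model, applicants, factor):
    """Average the model's output separately for each value of the given factor."""
    groups = {}
    for applicant in applicants:
        groups.setdefault(applicant[factor], []).append(model(applicant))
    return {value: mean(outputs) for value, outputs in groups.items()}

# Controlled data set: every salary appears once per gender, so any gap in the
# averages can only come from how the model treats the gender field.
controlled = [
    {"gender": gender, "salary": salary}
    for salary in (40_000, 80_000, 120_000)
    for gender in ("F", "M")
]

means = group_means(toy_model, controlled, "gender")
ratio = means["F"] / means["M"]
print(means, ratio)
```

A ratio far from 1.0 on matched inputs is a signal to investigate, not proof of the cause. In practice the factor of interest may not be an explicit input at all and may only be inferable from other fields.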

Chuck Howell, MITRE’s Chief Scientist for Dependable AI, observes that developers often look at which data would reverse an outcome (e.g., which data would have provided the wife with the same Apple Card credit limit as her husband). In the Apple Card example, if the answer was a male given name (e.g., “Bob” instead of “Sue”), the developers would know to look for gender bias. If another answer is to raise the wife’s salary in the application by $50,000, then additional analysis may be needed regarding comparative salaries and/or relationships between multiple factors that may be responsible for reaching this result.
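
The idea of searching for data that would reverse an outcome can be sketched as a simple probe: change one field at a time and record which changes lift the result to the target. The toy_credit_limit rule, the field names, and the candidate values below are all hypothetical and exist only to make the example run; they are not taken from any real credit system.

```python
def toy_credit_limit(applicant):
    """Hypothetical scoring rule, deliberately biased for demonstration."""
    limit = applicant["salary"] // 10
    if applicant["given_name"] == "Sue":  # stand-in for a hidden gender proxy
        limit //= 20
    return limit

def find_reversals(model, applicant, candidates, target):
    """List single-field changes that raise the model's output to at least target."""
    reversals = []
    for field, values in candidates.items():
        for value in values:
            probe = dict(applicant)  # copy so the original record is untouched
            probe[field] = value
            if model(probe) >= target:
                reversals.append((field, value))
    return reversals

wife = {"given_name": "Sue", "salary": 100_000}
husband_limit = toy_credit_limit({"given_name": "Bob", "salary": 100_000})

# Candidate changes to try: a different given name, or a salary raised by $50,000.
candidates = {"given_name": ["Bob"], "salary": [150_000]}
reversals = find_reversals(toy_credit_limit, wife, candidates, husband_limit)
print(reversals)
```

In this toy setup only the name change closes the gap, which would point developers toward gender bias. Had the salary change appeared in the list instead, that would call for the kind of further analysis of salaries and interacting factors described above.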

Some factors may be less obvious. For instance, when Amazon used an AI ML system to analyze qualities of its high performers and to compare those profiles with the profiles of job applicants, the system provided a set of recommendations that was biased towards men because most of the high performers were men. Moreover, because most high performers were male, there were no instances in this dataset of high performers who had attended all-female colleges. For new job applicants, the fact that they had attended an all-female college was thus given a lower score than if they had attended a male or mixed-gender college. This limitation in the data could be rectified, possibly by adding in data on graduates of all-female colleges, even though it does not represent Amazon’s experience.

However, for many AI ML decisions and for many other kinds of products, the situation is more complicated. As Howell points out, not all AI ML systems or other products can produce explanations understandable to the intended user. For instance, a system might identify the data from MRIs that was used to make a certain determination. However, the highly mathematical output would not be understandable by most patients or even by most doctors. It would require professionals with specific training to interpret it.

In addition, there is often a need for a system to access real-world knowledge, such as that gender discrimination in today’s world is viewed negatively by society and prohibited by law. DARPA sees the next phase of XAI as one “where machines understand the context and environment in which they operate.” In the meantime, humans need to work in close collaboration with machines in order to develop systems that can be explained by human and/or machine to the end user. In addition, developers, social scientists, and decision analysts need to better understand the kinds of explanations that would be acceptable and helpful to a user. They also need to better understand the conditions under which a user would require or not require certain types of information, and the conditions under which a user would accept an explanation or lack of explanation from a system, a human, or some combination.

Dr. Jennifer DeCamp is a principal AI engineer for The MITRE Corporation. In this position, she works across the U.S. government and across international standards bodies on AI practices and tools.

MITRE’s mission-driven teams are dedicated to solving problems for a safer world. Through our public-private partnerships and federally funded R&D centers, we work across government and in partnership with industry to tackle challenges to the safety, stability, and well-being of our nation.

Jun 29, 2020