Every day we experience the growing use of AI technology in our daily lives. There is increasing concern in society about the implications of AI, which has inspired a wave of journalism and literature. However, it is still a challenge for data practitioners to identify or predict the harmful repercussions of their models before they are deployed, or even once they are in production.
We have seen major problems around AI bias in recent years: African-Americans labelled as higher risk at twice the rate of whites despite not actually re-offending, women being 47% more likely to be seriously injured and 17% more likely to die than a man in a similar accident, and even models that lack accuracy for women with darker skin tones.
In 2018, Gartner predicted that through 2022, 85% of AI projects would deliver erroneous outcomes due to bias in data, algorithms, or the teams responsible for managing them. This is not just a problem of gender inequality: it also undermines the usefulness of AI.
In February 2020 the EU Commission released the white paper “On Artificial Intelligence - A European approach to excellence and trust”, stating that people should be able to trust AI. It describes an ecosystem of trust, meaning compliance with EU rules, including the rules protecting fundamental rights and consumers’ rights, for AI systems operated in the EU that pose a high risk. Seven requirements are identified: Human agency and oversight, Technical robustness and safety, Privacy and data governance, Transparency, Diversity, non-discrimination and fairness, Societal and environmental wellbeing, and Accountability.
Trust in AI: Explainable AI (XAI).
XAI is about making AI understandable to humans. In that sense, the EU introduced a right to explanation in 2018, and explanations are also common practice for credit scoring in the US. Some critics question why AI should be held to higher standards than humans; for me the answer is simple: AI is easier to change and improve than humans. Maintaining trust in AI is the goal of Explainable AI.
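To make this concrete, here is a minimal sketch of what a per-applicant explanation could look like, using the open-source shap library on a hypothetical credit-scoring model. The feature names and synthetic data are illustrative assumptions, not taken from any real lender or from the examples above.

```python
# A minimal sketch, assuming a hypothetical credit-scoring model;
# feature names and data below are illustrative only.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
feature_names = ["income", "debt_ratio", "age", "num_late_payments"]
X = rng.normal(size=(500, len(feature_names)))
# Synthetic "credit risk" driven mostly by debt ratio and late payments.
y = 0.7 * X[:, 1] + 0.3 * X[:, 3] + rng.normal(scale=0.2, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# SHAP attributes each individual prediction to the input features,
# which is one common way to give an applicant a "reason" for a score.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Per-applicant explanation: which features push this score up or down.
print(dict(zip(feature_names, shap_values[0].round(3))))
```

This is only one flavour of explanation (post-hoc feature attribution); inherently interpretable models or counterfactual explanations are alternatives, and which one satisfies a “right to explanation” is still an open question.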
The health AI support example.
Let us imagine the following hypothetical example: we have a great predictive model that can accurately provide early warning of adverse opioid-related outcomes among Medicaid recipients, even before their initial opioid prescription is written.
All good so far, right? Well, actually no. If we look at the features of the model, we can start to reflect on possible issues of confusing correlation with causation. It turns out that Angela E. Kilby has identified algorithmic unfairness in this specific context. I quote: “We plug our machine-generated opioid risk scores into a quasi-experimental setting to investigate [whether] risk scores correlate with heterogeneous treatment effects, finding that they do an extremely poor job of generating a cohort stratification with high versus low heterogeneous treatment effects. Results suggest that if doctors were to reallocate prescribing to become even more in concordance with risk scores, it is unlikely that improvements in opioid use disorder outcomes will be superior than if the reallocation happened randomly. Perversely, rates of opioid use disorder might actually worsen.”
In this case, the model audit consisted of correlating the model’s opioid risk scores with heterogeneous treatment effects estimated in a quasi-experimental setting.
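As an illustration of this kind of audit, here is a hedged sketch: it stratifies patients into high- and low-risk cohorts by a model’s risk score and checks whether separately estimated treatment effects actually differ between the cohorts. The data, column names and thresholds are hypothetical; Kilby’s study works with real Medicaid claims data and a quasi-experimental design, not the toy setup below.

```python
# Sketch of a risk-score audit against heterogeneous treatment effects.
# All data and column names are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 5_000

df = pd.DataFrame({
    "risk_score": rng.uniform(0, 1, n),           # model's predicted opioid risk
    "treatment_effect": rng.normal(0.0, 1.0, n),  # per-patient effect estimate
})

# Stratify patients into "high risk" vs "low risk" cohorts by the model score.
df["cohort"] = np.where(df["risk_score"] >= df["risk_score"].median(),
                        "high_risk", "low_risk")

# If the risk score were useful for targeting prescribing decisions, the
# average treatment effect should differ clearly between the two cohorts.
cohort_effects = df.groupby("cohort")["treatment_effect"].mean()
correlation = df["risk_score"].corr(df["treatment_effect"])

print(cohort_effects)
print(f"Correlation between risk score and treatment effect: {correlation:.3f}")
# A near-zero gap/correlation is the failure mode Kilby reports: reallocating
# prescriptions by risk score would do no better than reallocating at random.
```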
Looking for an AI Audit Framework?
Looking for an end-to-end framework for model accountability, I found this paper: “Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing”. I especially recommend the section on lessons learned from other industries (aerospace, medical and finance).
This is a starting point. However, the main limitation of an internal audit is ensuring objectivity; we all know that insider incentives can be perverse. A regulatory framework is needed, and we can use the above-mentioned industries as a reference, where independent companies and national and international institutions provide external oversight.
If we take the FDA as a reference, its primary focus is enforcement of the law. In the picture below you can see the Drug Product Lifecycle. From prototype to commercial product, the FDA’s mission is to facilitate development through the premarket review and evaluation of Investigational New Drug and New Drug Applications.
In a nutshell, it is a standardized approach to evidence-based review and evaluation. The FDA emphasizes a Quality System approach to study design: providing oversight and objective review, setting thresholds for product safety and effectiveness, and ensuring that organized data and appropriate labelling are present in support of the new drug’s intended clinical use.
We need more FDAs, EMAs or EASAs… to ensure excellence and trust in AI. There is still a long way to go. Don’t you think?
References:
Algorithmic Fairness in Predicting Opioid Use Disorder using Machine Learning. Angela Kilby, Northeastern University
Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing