Human-in-the-loop Systems, Algorithmic Auditing, and Multimodal Neurons
Using a human-in-the-loop system to iteratively deploy Machine Learning systems allows you to test them for robustness and scale them to the production line intelligently.
The use of Machine Learning systems to automate a diverse set of tasks has exploded in recent years. This comes as no surprise: technological advances now allow these systems to process diverse input signals and extract insights for an impressive number of tasks. A survey by Landing AI reports that manufacturing companies are particularly interested in using vision-based learning systems for visual inspection. In fact, 26% of survey respondents reported that they are currently using such techniques to detect defects in their products.
Putting machine learning models into production is not easy. How do you reproduce lab results on the production line? How do you split deployment in phases to avoid bad surprises?
These challenges are common. In fact, respondents to the above-mentioned survey report that "achieving lab results in production" is one of the top challenges for AI projects.
As the data acquired from live production lines can display unexpected phenomena, the AI model developed in the lab might not generalize to those scenarios. When a new defect arises, the lighting conditions vary on a particularly sunny day, or a new color is introduced to an existing product line, how will your model react?
The key to bringing Machine Learning models to production is agile methodology. Successful AI applications all share one common starting point: a clearly scoped, human-in-the-loop proof-of-concept that mitigates risks and costs. One should not bite off more than one can chew.
Working in iterative sprints rather than a single large push lets you control progress and tackle obstacles one at a time. For vision systems in manufacturing in particular, the following four-step approach has been an essential recipe for successful deployments:
- Iterate on a first model in the lab using an initial dataset.
- Run the model in a live production environment while letting human experts make all the decisions independently. The experts can observe the model's performance and track their disagreements. The goal is to analyze, label, and retrain the system with the new information. This step is essential to generalize the model's performance to the production setting.
- Allow the model to make decisions in the production environment, with a human expert shadowing it. As before, the expert will analyze, label, and retrain the system with the new disagreements.
- Scale the system. Allow the model to make decisions for high-confidence cases, all while putting aside difficult cases. Put in place a collaborative workflow to scale the system (e.g. one human expert for multiple production lines).
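The scaling step above can be sketched as a simple confidence-based routing policy. This is a minimal illustration: the function name, the model interface, and the 0.95 threshold are assumptions, not part of any specific system.

```python
# Minimal sketch of confidence-based routing for human-in-the-loop scaling.
# High-confidence predictions are automated; difficult cases are queued
# for a human expert. The 0.95 threshold is an illustrative assumption.

def route_inspection(model_confidence, model_label, threshold=0.95):
    """Return (decision_source, label) for one inspected item."""
    if model_confidence >= threshold:
        return ("model", model_label)
    # Uncertain cases are set aside for the shared pool of human experts.
    return ("human", "needs_review")

# Example: a confident prediction is automated, an uncertain one is escalated.
assert route_inspection(0.99, "defect") == ("model", "defect")
assert route_inspection(0.60, "defect") == ("human", "needs_review")
```

In practice the threshold would be tuned on the disagreement data collected during the shadowing phases, trading off automation rate against error rate.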
Why it matters
You can de-risk your AI project by taking an iterative approach: validate hypotheses and remove obstacles one at a time.
The human-in-the-loop recipe detailed above leverages the differentiators of both the Machine Learning model and the human experts alike. While models are consistent and effective at executing repetitive tasks, human experts are better at adapting to changing conditions and making judgments on difficult cases that fall outside of the norm.
Developing and deploying Machine Learning systems is difficult, for visual inspection or otherwise. It is therefore essential to use an iterative approach and implement a human-in-the-loop strategy as quickly as possible. The sooner the model is exposed to the conditions it will face on the production line, the quicker its performance can improve. Make sure your Machine Learning systems perform well in the lab and on the production line by putting the right processes in place.
The growing AI industry needs to scrutinize the algorithms that are starting to govern our lives. Is auditing them enough to make them fair?
For years, researchers and journalists alike have been writing about the dangers of relying on Machine Learning systems to make weighty decisions: who gets locked up, who gets a job, who gets a loan—even who has priority for COVID-19 vaccines.
As companies shield their algorithms from the public sphere, there is no way of knowing whether their solutions perpetuate potential biases found in the datasets they acquire.
This is where algorithmic auditing comes into play. The goal of this practice is to examine a Machine Learning solution's code, data, and responses to different inputs in order to assess biases. Sometimes, the audit is extended by interviewing the engineers who developed the system. The practice allows companies to investigate whether a system's training data is biased and to create hypothetical scenarios that test effects on different populations. While the goal is to improve the algorithms internally, these types of audits are often broadcast to the public.
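One concrete check an auditor might run on a system's outcomes is the disparate-impact ratio: the rate of favorable outcomes for a protected group divided by the rate for a reference group. The sketch below is a minimal illustration of that single metric, not a full audit; the numbers are synthetic, and the 0.8 ("four-fifths") threshold is a widely used rule of thumb rather than a legal standard.

```python
# Sketch of one common audit metric: the disparate-impact ratio.
# A ratio well below 1.0 suggests the protected group receives favorable
# outcomes (e.g. loan approvals) at a lower rate than the reference group.

def disparate_impact(favorable_protected, total_protected,
                     favorable_reference, total_reference):
    protected_rate = favorable_protected / total_protected
    reference_rate = favorable_reference / total_reference
    return protected_rate / reference_rate

# Synthetic example: 30/100 approvals vs 50/100 approvals.
ratio = disparate_impact(30, 100, 50, 100)
assert abs(ratio - 0.6) < 1e-9  # below the 0.8 rule of thumb: flag for review
```

A real audit combines many such metrics with scenario testing and a review of the training data, which is why it cannot be reduced to a single number.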
Algorithmic auditing was placed into the limelight recently when HireVue faced criticism that the algorithms it used to assess candidates through video interviews were biased. HireVue is a popular hiring software company contracted by the likes of Goldman Sachs and Walmart.
To respond to the allegations, the company had its algorithms audited so it could publicly claim that the software's predictions "work as advertised with regard to fairness and bias issues". In the interest of transparency, the full report was made available online.
HireVue made concrete changes to its process by eliminating video from its interviews. Despite these efforts, the company was widely accused of using the audit as a PR stunt. In fact, algorithmic auditors were also quite displeased with the company's public statements on the audit. "In repurposing ORCAA's very thoughtful analysis into marketing collateral, they're undermining the legitimacy of the whole field," states Liz O'Sullivan, co-founder of Arthur, an AI explainability and bias monitoring startup.
Why it matters
Algorithmic auditing is an exceptionally useful tool when used correctly. All ML systems whose decisions have an impact on people should be audited for different types of biases. However, the lack of industry standards or regulations makes it easy for companies to abuse auditing.
Companies might use audits to make real improvements, but nothing compels them to: they are not held accountable, and there is no guarantee they will address the potential problems within their algorithms.
“You can have a quality audit and still not get accountability from the company,” said Inioluwa Deborah Raji, an auditor and research collaborator at the Algorithmic Justice League. “It requires a lot of energy to bridge the gap between getting the audit results and then translating that into accountability."
Bias in facial recognition is relatable—people can see photos and the error rates and comprehend the consequences of racial and gender bias in the technology. However, it becomes much harder for the public to understand potential bias in something like interest-rate algorithms.
“It’s a bit sad that we rely so much on public outcry,” Raji said. “If the public doesn’t understand it, there is no fine, there’s no legal repercussions. And it makes it very frustrating.”
“Much like drug testing, there would have to be some type of agency like the Food and Drug Administration that looked at algorithms,” states Mutale Nkonde, founder of AI For the People. “If we saw disparate impact, then that algorithm wouldn’t be released to market.”
CLIP, a recent state-of-the-art model by OpenAI, contains neurons that respond to the same concept whether presented literally, symbolically, or even conceptually.
The complexities of the human brain are far from being completely understood. An important milestone, however, is the 2005 discovery of invariant visual representation by single neurons, also known as multimodal neurons. These neurons respond to clusters of abstract concepts centered on a common high-level topic, and neuroscientists believe them to be essential in the consolidation of abstract memory.
A new model, CLIP, published by OpenAI earlier this year, is reported to show multimodal neuron behavior. The OpenAI team has for instance identified a "Spider-Man" neuron. The peculiarity of the neuron is that it responds to images of a spider, images of the text "Spider", and various visual representations of the comic-book character himself (conceptual drawing, photograph, with or without costume, etc.).
CLIP is a neural network trained to learn visual concepts from natural language supervision. Its task is to classify images given a list of candidate categories, in a zero-shot setting. Its results are very impressive, particularly on difficult datasets such as ObjectNet, ImageNet Sketch, and ImageNet Adversarial.
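The zero-shot mechanism can be sketched in a few lines: embed the image and each candidate label, normalize the embeddings, and pick the label whose text embedding has the highest cosine similarity with the image embedding. The tiny 2-dimensional vectors below are synthetic stand-ins for CLIP's real image and text encoders, used purely to show the mechanics.

```python
# Sketch of zero-shot classification via cosine similarity, the mechanism
# CLIP uses. The embeddings here are synthetic toy vectors, not outputs
# of the real encoders.
import numpy as np

def zero_shot_classify(image_emb, label_embs, labels):
    # Normalize so that dot products equal cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb)
    label_embs = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    similarities = label_embs @ image_emb  # one score per candidate label
    return labels[int(np.argmax(similarities))]

labels = ["spider", "cat", "car"]
label_embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
image_emb = np.array([0.9, 0.1])  # points closest to the "spider" direction
assert zero_shot_classify(image_emb, label_embs, labels) == "spider"
```

Because the label set is supplied at inference time, the same model can be pointed at new categories without retraining, which is what makes the zero-shot setting notable.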
The recent discovery of multimodal neurons in CLIP is a promising step forward for the Deep Learning community. In fact, it shows that there may be a common mechanism for abstraction in both natural and synthetic vision systems.
The types of multimodal neurons that exist in CLIP are varied and diverse. From region and person neurons to fictional-universe and brand neurons, the abstraction capability of these neurons is impressive. You can find more extensive and interactive visualizations in the OpenAI blog post or their publication in Distill.
While the discovery of the neurons is quite interesting in and of itself, the OpenAI research team also reported some very interesting findings about specific types of neurons, most notably in-depth discussions of person and emotion neurons.
Why it matters
As you might imagine, a large number of these neurons deal with very sensitive topics, ranging from concepts associated with religions all the way to how mental illness is perceived through emotions. In fact, some neurons explicitly represent protected human characteristics such as age, gender, race, religion, and even sexual orientation.
As such, and given that the model was trained on a curated dataset taken from the internet, it is likely that these neurons reflect prejudiced views and biases. In fact, the research team reports that there seems to be a "terrorism/islam" neuron that responds to images of words such as “Terrorism”, “Attack”, “Horror”, and “Afraid”, but also “Islam”, “Allah”, and “Muslim”. Similarly, Latin American countries are selected by the "illegal immigration" neuron.
The points mentioned above are massive warning signs for a wide range of possible biases. How do you put such a model into production if it might have adverse effects on specific subgroups of people through its decisions? On the other hand, it presents a unique opportunity to study the internet as an organism: more specifically, how learned representations of its vast pool of information relate to each other.