Real-Time Recommendations, AI Healthcare Skeptics, and DNN Compression
Recommending in Real-Time
An increasing gap can be observed in the Machine Learning community between those who serve real-time predictions and those who don't
If you have an account with one of the big tech providers (Facebook, Netflix, Twitter, YouTube, Instagram, Amazon, etc.), you have been the subject of a real-time recommendation. What does that mean? Serving predictions in real-time (commonly referred to as online prediction) means that your Machine Learning system receives live information from a front-end service and computes the prediction on the fly. This live computation differs from standard methods, where predictions are computed in advance.
There is increasing interest in the ML community surrounding real-time solutions: how they work, when to use them, and how big tech has them set up.
Let's back up a bit and imagine we are serving product recommendations in an online store. The simple way of serving predictions for an online system is in batches. On a regular schedule (e.g. daily), recommendations are computed for every user and stored in a key-value database. When a user connects to shop, the website service fetches the latest stored recommendations for that user using their key. This technique, which computes predictions regularly and in advance, is called batch recommendations.
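To make the pattern concrete, here is a minimal sketch of batch serving, with a plain dict standing in for the key-value store and a naive popularity ranker standing in for the real model (all names here are illustrative, not from any specific system):

```python
def recommend(user_id, catalog):
    """Placeholder model: a naive popularity ranking (illustrative only)."""
    return sorted(catalog, key=lambda item: item["popularity"], reverse=True)[:3]

def precompute_batch(user_ids, catalog):
    """Run on a schedule (e.g. nightly): compute and store recommendations
    for every user, keyed by user id."""
    store = {}  # stand-in for a Redis/DynamoDB-style key-value store
    for user_id in user_ids:
        store[user_id] = recommend(user_id, catalog)
    return store

def serve(store, user_id):
    """At request time, the website simply fetches the precomputed list."""
    return store.get(user_id, [])

catalog = [
    {"name": "couch", "popularity": 90},
    {"name": "drawer", "popularity": 70},
    {"name": "hoodie", "popularity": 50},
    {"name": "t-shirt", "popularity": 40},
]
store = precompute_batch(["alice", "bob"], catalog)
print([item["name"] for item in serve(store, "alice")])  # → ['couch', 'drawer', 'hoodie']
```

Note that the heavy lifting happens offline; the request path is a single key lookup, which is what makes batch serving so cheap.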
What happens when the context of your user's visit is important to the recommendations you want to give? What if the user usually shops for furniture, but today their needs are different: they want clothes? From the first interactions with the website (e.g. a search for 'hoodie', opening a 't-shirt' item page), we want our service to modify the recommendations and tailor them to this new context. If we continue recommending couches and drawers, we risk losing the user's (1) attention or, even more importantly, (2) business to a competitor. In this scenario, we need context-specific online prediction capabilities.
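By contrast, an online system scores the catalog at request time against live session signals. A toy sketch, with a hypothetical context-boost scoring rule (the tags, weights, and event format are all made up for illustration):

```python
def online_recommend(session_events, catalog):
    """Re-rank the catalog on the fly using live session signals."""
    # Collect terms from this visit (e.g. a search for 'hoodie').
    context = set()
    for event in session_events:
        context.update(event.lower().split())

    def score(item):
        # Hypothetical rule: a live-context match dominates popularity.
        boost = 100 if context & set(item["tags"]) else 0
        return boost + item["popularity"]

    return [item["name"] for item in sorted(catalog, key=score, reverse=True)][:2]

catalog = [
    {"name": "couch",   "tags": ["furniture"],           "popularity": 90},
    {"name": "drawer",  "tags": ["furniture"],           "popularity": 70},
    {"name": "hoodie",  "tags": ["clothes", "hoodie"],   "popularity": 50},
    {"name": "t-shirt", "tags": ["clothes", "t-shirt"],  "popularity": 40},
]
# A usually-furniture shopper suddenly searching for clothes:
print(online_recommend(["search: hoodie", "view t-shirt page"], catalog))
# → ['hoodie', 't-shirt']
```

With no session signals, the same function falls back to the popularity ranking, so the furniture items come out on top again.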
How is this usually handled? There are three main methodologies to make online prediction feasible, explained in more detail here:
- Make your model faster (e.g. fusing operations, distributing computations, memory footprint optimization, writing high-performance kernels that target specific hardware, etc.)
- Make your model smaller using model compression techniques such as quantization and pruning (c.f. a list of open-source solutions)
- Make hardware faster
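To illustrate the second point, here is a toy 8-bit affine quantization of a weight vector, the core idea behind post-training quantization (a simplified sketch, not any particular framework's implementation):

```python
def quantize_int8(weights):
    """Map float weights to int8 using an affine scale/zero-point scheme."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # guard against a constant weight vector
    zero_point = round(-lo / scale) - 128
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights for inference."""
    return [(qi - zero_point) * scale for qi in q]

weights = [0.1, -0.25, 0.8, 0.0]
q, scale, zp = quantize_int8(weights)
approx = dequantize(q, scale, zp)
# Each recovered weight is within half a quantization step of the original.
assert all(abs(w - a) <= scale / 2 + 1e-9 for w, a in zip(weights, approx))
```

Storing int8 instead of float32 cuts the weight memory footprint by roughly 4x, at the cost of the small rounding error bounded above.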
For more information on how big tech serves predictions online:
- Instagram's Explore recommender system
- Deep Neural Networks for YouTube Recommendations
- Tencent Real-time Recommendations
- Alibaba's Recommendation system
- Netflix: Using Navigation to Improve Recommendations in Real-Time
For other resources concerning recommendations in the applied Machine Learning setting (Amazon, TripAdvisor, Yahoo, Spotify, Dropbox, LinkedIn, and many others), click here.
Why it matters
When do I need online predictions? As mentioned above, real-time recommendations are essential when the customer journey is mission-driven and/or depends heavily on its context. Some examples are online shopping, content consumption (Netflix and YouTube rely heavily on the context of your visit to keep your attention), voice assistants, fraud detection, etc. Another interesting case is cold-start customers: when you don't have historical purchases or behavior for a new user, using the visit's context is essential to give good recommendations, and hence keep that user's attention.
Speaking of your attention, as The Social Dilemma explains: big tech companies want it. Research shows that prediction latency matters, a lot. For that reason, online giants are spending a lot of time and money making their systems more efficient to serve better predictions, online, fast.
On the other hand, most recommendation systems used by small to medium companies continue to serve predictions in batches. It's more cost-efficient and performs well in systems that are not mission-critical. Consider, for example, a B2B recommendation engine that helps increase sales reps' productivity by recommending products to their existing client base, which can increase sales by 10% or more. The buying frequency is manageable, and the recommendations tend not to depend on the user's (i.e. the sales rep's) mission. Click here to assess the potential of a B2B recommender in your business.
As outlined here, the other blockers to setting up online predictions are (1) a high initial infrastructure investment, (2) the required mental shift, (3) Python incompatibility, and (4) a significantly higher processing cost.
Healthcare is AI's Largest Skeptic
Biased datasets and lack of interpretability are holding AI innovation back in the healthcare sector
As AI solutions are being deployed left and right, the healthcare sector stands out as a clear laggard. At first, this might seem strange. After all, the technology needed to address a vast number of potential use cases already exists. From image classification and segmentation to recommendation systems, AI researchers have found many theoretical solutions to existing pain points. Learn more in an overview of AI applications in healthcare.
Recently, Rachel Thomas, the Founding Director of the Center for Applied Data Ethics at the University of San Francisco and co-founder of fast.ai, published an article entitled "Medicine's Machine Learning Problem" in the Boston Review. The piece explains the discrepancies between medicine and other sectors regarding the implementation of Machine Learning systems.
While the algorithms and techniques are very similar, the key difference is that healthcare data is human data. AI researcher Inioluwa Deborah Raji explains it nicely in a report on the data-ification of reported COVID-19 cases and deaths:
"Data are not bricks to be stacked, oil to be drilled, gold to be mined, opportunities to be harvested. Data are humans to be seen, maybe loved, hopefully taken care of. Data science is human subject research."
Right now, Machine Learning research shows the potential for many different use cases: identifying at-risk individuals, automated contouring systems for medical images, and many others. Unfortunately, human biases creep into such systems far too often. Most commonly, the existing racial or gender biases found in medical datasets are not properly eradicated before training time. For example, a study has revealed rampant racism in decision-making software used in US hospitals.
Dr. Thomas urges us to take five principles to heart:
- Medical data can be incomplete, incorrect, missing, and biased.
- ML systems can contribute to the centralization of power at the expense of patients and health care providers alike.
- ML designers and adopters must not take new systems onboard without considering how they will interface with a medical system that is already disempowering and often traumatic for patients.
- ML must not dispense with domain expertise, and we must recognize that patients have their own expertise distinct from that of doctors.
- We need to move the conversation around bias and fairness to focus on power and participation.
Why it matters
The potential for implementing AI systems in the healthcare sector is immense. But the human factor surrounding this implementation is essential, and it is indubitably the biggest adoption barrier. Furthermore, the issues discussed above point to a more general problem: bias creeping into Machine Learning models.
For more information about bias in Machine Learning:
- Google has a free Crash Course on Fairness that explains how human biases creep into our models, as well as how we can identify and address them.
- fast.ai offers a free Practical Data Ethics course taught by the founder Rachel Thomas herself.
- A Deeper Look at Dataset Bias by researchers from U of North Carolina, EPFL, U of Rome, and KU Leuven.
A study on the state of AI in 2020 by McKinsey shows that only 13% of organizations are working to mitigate risks related to equity and fairness. To address these issues, not only in healthcare but across industries, we need to raise awareness of the dangers of data-ification and bias. Whether by promoting blog posts or enrolling in specialized courses, the Machine Learning community has a long way to go in addressing different types of biases.
Don't compress your DNN!
A research team from Google Brain investigates the bias in compressed Deep Learning models
An increasing number of AI solutions are deployed on resource-constrained devices such as mobile phones and IoT sensors. This shift to the edge has popularized the use of pruning and quantization to significantly reduce the original model size. Usually, the constraints of edge devices relate to power and/or latency. The good news is that the compression of Deep Neural Networks (DNNs), when optimized for a specific hardware platform or using a programmable approach like NVIDIA's Condensa framework, can maintain accuracy similar to the original model's.
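To make the technique concrete, here is a toy sketch of unstructured magnitude pruning, one of the compression methods discussed here: the smallest-magnitude fraction of weights is zeroed out (simplified; real frameworks like Condensa apply this at scale, usually followed by fine-tuning to recover accuracy):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest magnitude.
    (Ties at the threshold may prune a few extra weights.)"""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Threshold = magnitude of the n_prune-th smallest weight.
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.002]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # → [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

The resulting sparse weight vector can be stored and executed more cheaply, which is exactly what makes pruning attractive for edge deployment.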
For more information on accuracy recovery algorithms for model compression:
- The LC Algorithm can find an optimal compression step of the model parameters independent of the learning task.
- ADAM-ADMM offers a systematic and unified framework for structured weight pruning.
- Discrimination-aware Channel Pruning allows you to choose channels that really contribute to discriminative power.
For more information on how and where compressed models are being used in industry:
- Compressing neural networks for image classification and detection by Facebook AI
- Model Compression for IoT Applications in Industry 4.0 via Multiscale Knowledge Transfer
A research team from Google has published a paper on the characterization of bias in compressed models. While compression can maintain high overall accuracy, the researchers show that compressed models exhibit a disproportionately high error rate on a small subset of examples. In that sense, the compressed model performs worse for underrepresented groups, which is exactly what we mean by bias.
In an earlier paper, "What Do Compressed Deep Neural Networks Forget?", the group evaluates the trade-offs incurred by compression. Notably, even when global accuracy is maintained, the remaining errors are distributed disproportionately across classes. Depending on the model's task, this raises considerations of fairness, which is extremely problematic.
In the paper, the authors give these subsets for which performance decreases a name: Compression Identified Exemplars (CIE). Their methodology relies on training models at varying compression levels to identify people with blond hair in the CelebA dataset (which contains images of celebrities' faces). By comparing the performance of uncompressed versus compressed models, they were able to measure how compression affects model bias within different subgroups (stratified by age and gender).
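The core of the CIE idea can be sketched as a disagreement check between the dense and compressed models, plus subgroup-stratified error rates (a hypothetical mock with made-up data and stand-in models, not the paper's actual pipeline):

```python
from collections import defaultdict

def find_cies(examples, dense_predict, compressed_predict):
    """CIE proxy: examples where the compressed model's prediction
    diverges from the dense model's."""
    return [ex for ex in examples if dense_predict(ex) != compressed_predict(ex)]

def error_rate_by_group(examples, predict):
    """Stratify error rate by subgroup (e.g. age/gender) to surface bias."""
    errors, totals = defaultdict(int), defaultdict(int)
    for ex in examples:
        totals[ex["group"]] += 1
        if predict(ex) != ex["label"]:
            errors[ex["group"]] += 1
    return {g: errors[g] / totals[g] for g in totals}

# Mock data: the compressed model errs only on the underrepresented group.
examples = [
    {"x": 0, "label": 1, "group": "majority"},
    {"x": 1, "label": 1, "group": "majority"},
    {"x": 2, "label": 1, "group": "majority"},
    {"x": 3, "label": 0, "group": "minority"},
]
dense = lambda ex: ex["label"]   # mock: the dense model is perfect here
compressed = lambda ex: 1        # mock: compression collapsed to the majority class

cies = find_cies(examples, dense, compressed)
print([ex["group"] for ex in cies])               # → ['minority']
print(error_rate_by_group(examples, compressed))  # → {'majority': 0.0, 'minority': 1.0}
```

Global accuracy drops only from 100% to 75%, yet the minority group's error rate goes from 0% to 100%, which is the disproportionate effect the paper describes.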
The results show that while a 95-percent-pruned model had a global accuracy decrease of 1.5%, its decrease in accuracy for the 1 percent of least consistently labeled examples was 12%. For more information about the methods and results, you can find the paper here.
Why it matters
Given the widespread deployment of compressed models and the shift to computing on the edge, a good understanding of the effects of model compression is paramount. Correctly grasping the impact of these techniques on the performance of underrepresented features is essential to take potential bias into account. It is therefore important to continuously audit compressed models and find ways to mitigate potential issues.
Should you compress your DNN?
Yes, you should, but be careful. Allowing edge devices to benefit from advanced analytics is essential for the democratization of AI. However, it is your responsibility as an ML practitioner to be aware of potential issues of fairness and model bias.
The authors propose the use of CIE as "a human-in-the-loop auditing tool to surface a tractable subset of the dataset for further inspection or annotation by a domain expert." In fact, by providing "qualitative and quantitative support that CIE surfaces the most challenging examples in the data distribution", using it as an auditing tool allows ML practitioners to identify and address potential issues before deploying their DNNs.
Furthermore, important advances on this topic have led to the implementation of a framework that uses a teacher-student learning paradigm to better preserve labels. The paper, recently published by researchers from the University of Utah, TU Kaiserslautern, and NVIDIA, aims to mitigate the bias and unreliability of standard pruning and quantization techniques.