Posted on April 6, 2021

Interpretable X-rays, Causal ML, and Long-form QA

COVID-19 on X-rays: how can AI help?

A digital diagnostic tool combining Machine Learning and cloud computing can read chest X-rays rapidly and accurately, helping doctors identify, triage, and monitor COVID-19 patients.

Context

We are all too familiar with the impact COVID-19 has had over the past year. One of the key challenges for those working on the front line is identifying and isolating infected patients. While critical to controlling the virus' spread, this becomes a real challenge when testing procedures, hospital staff, and other resources are stretched thin by a wave of incoming patients.

Adding to the resource limitations found in many hospitals around the globe, suspected patients present a wide variety of symptoms with varying levels of intensity. Some show unmistakable signs such as coughing, fever, fatigue, aches, and pains; others have milder cases or appear completely asymptomatic. The latter can still unknowingly spread the virus to others, posing a serious threat to everyone around them.

What's new

A number of reputable hospitals, screening centers, and clinics around the world are increasingly turning to fast and accurate chest X-ray exam tools that incorporate Machine Learning algorithms to detect COVID-19.

One provider of such tooling is Lunit. Their solution, INSIGHT CXR, was trained on 3.5 million high-quality X-rays, clinically validated against additional CT scans. Approved for commercial sale in Europe, Australia, and parts of Asia, the solution supports the detection of 10 major chest diseases with 97-99% accuracy.


The technology is also interpretable, which is essential for healthcare solutions that aid doctors in their daily workflow. INSIGHT CXR localizes detected lesions in the form of heatmaps and assigns each lesion a probability score indicating how abnormal it is. This allows trained radiologists to further investigate uncertain cases.
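For intuition, heatmaps of this kind are often produced with class-activation techniques such as Grad-CAM. The sketch below shows the general idea in PyTorch using a deliberately tiny, hypothetical classifier; Lunit's actual architecture is proprietary, so this is illustrative only.

```python
# Minimal Grad-CAM-style heatmap sketch (illustrative only; Lunit's
# actual method is proprietary). TinyChestCNN is a hypothetical stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyChestCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        fmap = self.features(x)            # (B, 32, H, W) feature maps
        pooled = fmap.mean(dim=(2, 3))     # global average pooling
        return self.head(pooled), fmap

model = TinyChestCNN().eval()
xray = torch.randn(1, 1, 224, 224)          # placeholder X-ray image

logits, fmap = model(xray)
fmap.retain_grad()                          # keep gradients of the maps
logits[0, logits.argmax()].backward()       # gradient of the top class

weights = fmap.grad.mean(dim=(2, 3), keepdim=True)        # channel weights
cam = F.relu((weights * fmap).sum(dim=1))                 # weighted sum
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to 0..1
# `cam` can now be upsampled and overlaid on the X-ray as a heatmap.
```

In a setup like this, the per-lesion probabilities would come from the classifier's outputs, while the heatmap is what makes the prediction auditable by a radiologist.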


[Figure: an example of INSIGHT CXR output. Source: Lunit]

Why it matters

South Korean radiologist Dr. Kyu-mok Lee commented on the analyses made by the system. “An X-ray is a compressed two-dimensional rendering of three-dimensional human structures. Inevitably, organs and structures overlap in the images”, he said.

X-rays only appear in black and white, which means that there are cases where lesions aren't noticeable to the human eye. Lunit's solution has the advantage of displaying lesions in vivid color.
-- Dr. Kyu-mok Lee, radiologist

He then states that “the reality for radiologists, especially in Korea, is that it’s impossible to invest a lot of time in reading each X-ray as they would have to read hundreds or thousands every day.”

The added value of Lunit's solution is that it helps medical staff make informed decisions more rapidly. Additionally, the patient's fate is not put solely in the hands of AI. Difficult cases are flagged by the system and followed up in more detail by specialized doctors.

What's next

The technology has already been adopted in South Korea, Thailand, Indonesia, Mexico, Italy, and France. It has proven particularly useful in reducing the workload in hospitals with many patients and few radiologists.

“By enabling more accurate, efficient, and timely diagnosis of chest diseases, Lunit INSIGHT CXR can help reduce the workload of medical professionals. In this way, they can bring more value to the patients not only in difficult circumstances, such as the current pandemic crisis, but in routine clinical settings as well,” states Brandon Suh, the CEO of Lunit.

The solution's significant global deployment instills a lot of confidence in Lunit's developers. They believe it could take on a larger role in the future as an independent image reader to increase cancer detection.

When can we expect causality in Machine Learning?

A new paper from researchers at the Max Planck Institute, MILA, ETH Zurich, and Google Research discusses the intersection of the two fields of Machine Learning and graphical causality.

Context

Animal brains are intuitively strong at causal inference. Without being explicitly instructed to do so, animals learn from their environment by observation. For example, when we learn to play football, we understand that the movement of a player's leg is what causes the ball to change direction, not the other way around. By forming underlying causal representations, we can easily answer interventional and counterfactual questions. For instance: where does the ball go if the player tilts their foot slightly upwards during the kick? What would have happened if the ball had flown a bit higher and thus wasn't kicked by the player?

Machine Learning algorithms, on the other hand, have managed to outperform humans at very complex tasks in extremely controlled environments, such as chess. Using modern Deep Learning techniques and huge amounts of data, these algorithms can transcribe audio in real time, label thousands of images per second, and examine X-rays and MRI scans for disease indicators. However, they still struggle immensely with generalization to broader environments and with simple causal inferences like the football example described above.

What's new

Published in late February, a paper by researchers from Max Planck Institute, MILA, ETH Zurich, and Google Research discusses the intersection between Machine Learning and graphical causality. The objective is to explore and find potential solutions to Machine Learning's lack of causality. Overcoming this problem could be key to solving some of the most important challenges in the field of Artificial Intelligence.

Causal models are so powerful because they allow us to perform interventions and answer counterfactual questions.
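To make the distinction concrete, here is a toy structural causal model in Python; this is our own illustration, not code from the paper. Intervening on the cause (the kick) shifts the effect (the ball), while intervening on the effect leaves the cause untouched.

```python
# Toy structural causal model (our own illustration, not from the paper):
# kick_angle -> ball_direction. Interventions behave asymmetrically.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def sample(do_kick=None):
    """Sample (kick, ball); optionally intervene with do(kick = value)."""
    kick = rng.normal(0.0, 1.0, n) if do_kick is None else np.full(n, do_kick)
    ball = 2.0 * kick + rng.normal(0.0, 0.1, n)   # the ball follows the kick
    return kick, ball

kick, ball = sample()
print("observational E[ball]    ~", round(ball.mean(), 3))   # ~0

_, ball = sample(do_kick=1.0)                                # do(kick = 1)
print("after do(kick=1), E[ball] ~", round(ball.mean(), 3))  # ~2: effect shifts

kick, _ = sample()
ball = np.full(n, 1.0)                                       # do(ball = 1)
print("after do(ball=1), E[kick] ~", round(kick.mean(), 3))  # ~0: cause unchanged
```

Purely observational data cannot reveal this asymmetry, which is exactly what the i.i.d. pattern-recognition setting discussed below misses.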

“Machine Learning often disregards information that animals use heavily: interventions in the world, domain shifts, temporal structure — by and large, we consider these factors a nuisance and try to engineer them away, [...] In accordance with this, the majority of current successes of Machine Learning boil down to large scale pattern recognition on suitably collected independent and identically distributed (i.i.d.) data.”
-- Schölkopf et al., 2021
[Figure from Schölkopf et al., 2021]

A straightforward question arises: why does Machine Learning still rely on the i.i.d. assumption despite its flaws?

The answer is short and simple: scalability. Pattern recognition at scale based on observational approaches can be very powerful. When you frame your problem in a controlled setting, with strong compute and sufficient data (both in quantity and quality), you are bound to achieve relatively good results. It is no coincidence that the AI revolution arrived together with high-speed processors and widespread data availability.

As the environment grows in complexity, it becomes impossible to cover the entire distribution by adding more training examples. This is especially true in Reinforcement Learning applications. A widely shared clip of a Tesla on autopilot crashing into an overturned semi-truck (a scenario that probably never appeared during training) exemplifies this challenge.
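A toy numerical version of the same failure mode (our own illustration, not from the paper): a model that looks fine on a narrow training regime can fall apart as soon as the test distribution shifts.

```python
# Toy illustration of i.i.d. fragility: a linear model fit on a narrow
# input range degrades badly once the test distribution shifts.
import numpy as np

rng = np.random.default_rng(0)

def world(x):
    return np.sin(x)                        # the true, nonlinear mechanism

x_train = rng.uniform(0, 2, 500)            # narrow training regime
y_train = world(x_train) + rng.normal(0, 0.05, 500)

coeffs = np.polyfit(x_train, y_train, deg=1)    # a linear fit looks fine here

def mse(x):
    return np.mean((np.polyval(coeffs, x) - world(x)) ** 2)

print("in-distribution MSE:", round(mse(rng.uniform(0, 2, 500)), 4))  # small
print("shifted-domain MSE :", round(mse(rng.uniform(4, 6, 500)), 4))  # large
```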

The key strength of causal models is that they allow you to repurpose previously gained knowledge in new domains. If you are a good football player, you can reuse skills learned in football, such as running, team tactics, and passing strategy, when introduced to rugby or handball.

I can already hear Machine Learning enthusiasts clamouring: "that's what we call transfer learning!". While extremely useful, transfer learning is limited to extremely narrow use cases, most commonly an image classifier fine-tuned to detect more specific sub-classes of objects (sketched below). In more complex tasks, such as learning video games, Machine Learning models need huge amounts of training (thousands of years' worth of play) and respond poorly to minor changes in the environment (e.g., playing on a new map or with a slight rule change).
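For reference, that narrow fine-tuning pattern typically boils down to a few lines, sketched here with torchvision; the sub-class count is a hypothetical placeholder.

```python
# Sketch of the narrow transfer-learning pattern mentioned above: reuse a
# pretrained backbone and retrain only a new classification head.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in model.parameters():            # freeze the pretrained backbone
    param.requires_grad = False

num_subclasses = 5                          # hypothetical sub-class count
model.fc = nn.Linear(model.fc.in_features, num_subclasses)  # fresh head

# Only model.fc is trainable now; train it as usual on the new dataset.
```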

Why it matters

Broadly speaking, causal models can address the lack of generalization capability in Machine Learning.

“Generalizing well outside the i.i.d. setting requires learning not mere statistical associations between variables, but an underlying causal model,” the researchers write.

What's next

While the advantages of causal modeling are clear, it remains an uphill battle to implement these concepts in Machine Learning algorithms.

“Until now, Machine Learning has neglected a full integration of causality, and this paper argues that it would indeed benefit from integrating causal concepts.”
-- Schölkopf et al., 2021

The researchers discuss several challenges to the application of causal models with Machine Learning: “(a) in many cases, we need to infer abstract causal variables from the available low-level input features; (b) there is no consensus on which aspects of the data reveal causal relations; (c) the usual experimental protocol of training and test set may not be sufficient for inferring and evaluating causal relations on existing data sets, and we may need to create new benchmarks, for example with access to environment information and interventions; (d) even in the limited cases we understand, we often lack scalable and numerically sound algorithms.”

The promising signal is that these challenges are discussed and laid out concretely in papers like this one, slowly paving the way for future research in this domain.

Long-Form Question Answering benchmarks exposed

A joint paper from Google Research and UMass Amherst demonstrates SOTA results on the KILT Long-form Question Answering benchmark, all while pointing out flaws in the evaluation system itself.

Context

As the field of Natural Language Processing (NLP) progresses, research teams are showing impressive results on tasks that seemed impossible just a couple of years ago. One of these tasks is open-domain Long-form Question Answering (LFQA). As the name suggests, the goal is to produce an elaborate, paragraph-length answer to a question by retrieving relevant documents.

LFQA's short-form sibling, open-domain Question Answering (QA), has seen immense progress recently. The number of widely available datasets and well-defined benchmarks (e.g., SQuAD) is certainly partly responsible for these advances. This brings us to the following question: are current benchmarks and evaluation metrics suitable for stimulating progress on LFQA?

What's new

Last week, researchers from UMass Amherst and Google Research published “Hurdles to Progress in Long-form Question Answering”, a paper that will appear at NAACL 2021 (North American Chapter of the Association for Computational Linguistics).

The paper lays out the methodology behind their submission to KILT, a benchmark for Knowledge-Intensive Language Tasks. While their submission tops the leaderboard on ELI5 (the only publicly available LFQA dataset), the authors argue that the evaluation framework itself has some flaws.

The model presented by the authors leverages two recent advances in NLP to achieve SOTA results:

  1. A sparse-attention model, the Routing Transformer (RT), which scales attention to long sequences by reducing the complexity of the Transformer's attention mechanism from O(n²) to O(n^1.5), n being the sequence length. Compared to models like Transformer-XL, it enables each token to attend to tokens anywhere in the sequence, not only those in the immediate vicinity.
  2. A retrieval model based on REALM trained with a contrastive loss, aptly called c-REALM (a rough sketch follows the figure below). This retrieval method "facilitates retrievals of Wikipedia articles related to a given query."
[Figure: how the model works. Source: Google AI Blog]
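As a rough sketch of what REALM-style dense retrieval looks like, with random vectors standing in for c-REALM's learned contrastive encoders (this is not the paper's code):

```python
# Toy sketch of dense retrieval as used by REALM-style systems: embed the
# query and all documents, then retrieve by maximum inner product. Random
# vectors stand in for c-REALM's learned contrastive encoders.
import numpy as np

rng = np.random.default_rng(0)
dim, n_docs = 128, 10_000

doc_embeddings = rng.normal(size=(n_docs, dim))    # precomputed offline
doc_embeddings /= np.linalg.norm(doc_embeddings, axis=1, keepdims=True)

query = rng.normal(size=dim)
query /= np.linalg.norm(query)

scores = doc_embeddings @ query                    # inner-product scores
top_k = np.argsort(scores)[::-1][:5]               # 5 best candidate docs
print("retrieved document ids:", top_k)
# The retrieved passages are then fed to the generator (here, the RT model).
```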

For examples of LFQA pairs, refer to Google's blog post discussing the paper.

The KILT evaluation framework uses two metrics: (1) R-Precision (R-Prec), which scores the quality of the retrievals, and (2) ROUGE-L, which scores the generated answer against references.

Despite their SOTA results, the authors point out some remaining issues with the KILT evaluation framework.

  1. There is little evidence that models actually use the retrievals on which they are conditioned.
  2. Trivial baselines, such as copying the input or returning a random training-set answer, achieve relatively high ROUGE-L scores, even beating models such as BART + DPR. This can be observed in the first figure below, taken from Google's blog post (a toy demonstration follows the figures).
  3. There is implicit train/validation overlap in the ELI5 dataset: some validation questions appear to be paraphrased versions of training questions, as shown in the second figure below.
[Figure: trivial baselines get higher ROUGE-L scores than RAG and BART + DPR. Source: Google AI Blog]
[Figure: some validation questions are strikingly similar to training questions. Source: Google AI Blog]
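To see how easily ROUGE-L rewards trivial outputs, issue (2) can be reproduced in miniature with Google's rouge_score package. The question and answer strings below are invented placeholders, not actual ELI5 data:

```python
# Toy demonstration of issue (2): ROUGE-L can reward trivial answers,
# such as simply copying the question back. Uses Google's rouge_score
# package; this is not the paper's exact evaluation pipeline.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

question = "why do we dream and what purpose do dreams serve"
reference = ("scientists believe dreams serve several purposes, including "
             "memory consolidation and emotional processing during sleep")

for name, prediction in [("input copy", question),
                         ("real attempt", "dreams may help consolidate memory")]:
    score = scorer.score(reference, prediction)["rougeL"]
    print(f"{name:12s} rougeL f-measure: {score.fmeasure:.3f}")
```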

Why it matters

Achieving SOTA results in LFQA is quite impressive, and is a promising step forward for the NLP community. However, as pointed out in the paper, there remain several issues in the current benchmarking framework for the task. For concrete advances to be made, there needs to be a fruitful environment that allows researchers to compare models on solid datasets by using relevant evaluation metrics.

What's next

As stated by the authors themselves: "We hope that the community works together to solve these issues so that researchers can climb the right hills and make meaningful progress in this challenging but important task."
