Bio-inspired AI, COVID diagnosis, and NLP testing
The power of bio-inspired Artificial Intelligence
The imitation of the nematode's nervous system using only 19 neurons shows promising results in the context of autonomous driving
Deep Neural Networks perform remarkably well when there is enough data to train them. Unfortunately, these models often generalize poorly or inefficiently. Furthermore, they require heavy computational power to train hundreds of thousands of parameters.
Inspired by the nematode's nervous system, researchers developed a sparse recurrent neural network model called Neural Circuit Policies (NCP). The model first uses a small convolutional feature extractor to transform the camera's input into structured features. These features are then fed into the NCP network, whose 19 neurons output the motor commands that control the car.
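To make the data flow concrete, here is a minimal sketch of a sparse recurrent network that turns per-frame features into a single steering command. It is not the authors' implementation: the article does not describe the neuron model or the wiring, so a plain tanh update and a random sparse connectivity mask are assumed here. The feature size, the sparsity level, and every name in the code are hypothetical, and the convolutional feature extractor is replaced by a hand-written feature vector.

```python
import math
import random

NUM_NEURONS = 19   # network size reported in the article
NUM_FEATURES = 8   # assumption: length of the feature vector from the conv extractor
SPARSITY = 0.8     # assumption: fraction of possible connections that are absent


def sparse_matrix(rows, cols, sparsity, rng):
    """Random weights where roughly `sparsity` of the entries are set to zero."""
    return [
        [0.0 if rng.random() < sparsity else rng.uniform(-1.0, 1.0) for _ in range(cols)]
        for _ in range(rows)
    ]


class SparseRecurrentPolicy:
    """Toy sparse recurrent network: features in, one steering command out."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_in = sparse_matrix(NUM_NEURONS, NUM_FEATURES, SPARSITY, rng)
        self.w_rec = sparse_matrix(NUM_NEURONS, NUM_NEURONS, SPARSITY, rng)
        self.w_out = [rng.uniform(-1.0, 1.0) for _ in range(NUM_NEURONS)]
        self.state = [0.0] * NUM_NEURONS

    def step(self, features):
        """Update the hidden state from one frame's features; return a command in [-1, 1]."""
        new_state = []
        for i in range(NUM_NEURONS):
            total = sum(w * x for w, x in zip(self.w_in[i], features))
            total += sum(w * h for w, h in zip(self.w_rec[i], self.state))
            new_state.append(math.tanh(total))
        self.state = new_state
        return math.tanh(sum(w * h for w, h in zip(self.w_out, self.state)))

    def num_connections(self):
        """Count nonzero input and recurrent weights plus the output weights."""
        matrices = self.w_in + self.w_rec
        return sum(1 for row in matrices for w in row if w != 0.0) + len(self.w_out)


# Usage: feed one (made-up) feature vector per camera frame.
policy = SparseRecurrentPolicy(seed=42)
for _ in range(3):
    command = policy.step([0.1] * NUM_FEATURES)
```

Because the hidden state persists between calls to step, the command for a frame depends on earlier frames as well, which is what makes the network recurrent. The sparsity mask keeps the number of trainable connections small, which is the property the article highlights.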
This system shows promising results considering the size of the network. The worm's neural system is minuscule but allows for locomotion, motor control, and navigation. These abilities are exactly what is needed for applications like autonomous driving. The authors state that "the system shows superior generalizability, interpretability, and robustness compared with orders-of-magnitude larger black-box learning systems".
Why it matters
Autonomous driving is an important challenge for AI to tackle. Beyond the technical difficulty of combining complex systems, it raises important ethical questions. For these reasons, robustness and interpretability are key factors for the potential widespread adoption of autonomous driving.
Diagnose COVID-19 with Machine Learning
Scientists from Oxford have developed an extremely rapid Coronavirus diagnostic tool
The current testing framework for SARS-CoV-2 (more commonly referred to as Coronavirus) is mainly focused on viral testing. You have most probably already taken this type of test: it uses a nasal swab to detect the virus's nucleic acid or antigen. An important drawback of this testing method is the response delay. Samples are sent to a laboratory where a method called PCR (polymerase chain reaction) is performed. This protocol, followed by result extraction and communication to the patient, usually takes between 24 and 72 hours.
Scientists from Oxford University have recently developed an extremely rapid diagnostic test, which can detect and identify different viruses (including SARS-CoV-2) in less than five minutes. The method uses images captured using a wide-field fluorescence microscope. The images are processed using adaptive filtering algorithms and analyzed using Machine Learning. More specifically, a Convolutional Neural Network is used to classify the image as containing SARS-CoV-2 or not.
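As an illustration of the preprocessing step, the sketch below applies local-mean thresholding, one simple form of adaptive filtering, to pick out bright spots in a grayscale image. The article does not say which algorithm the Oxford team used, so the technique, the parameters (radius, offset), and all names here are assumptions. The classification step with a Convolutional Neural Network is omitted.

```python
def local_mean(image, row, col, radius):
    """Mean intensity in a square window around (row, col), clipped at the borders."""
    height, width = len(image), len(image[0])
    values = [
        image[r][c]
        for r in range(max(0, row - radius), min(height, row + radius + 1))
        for c in range(max(0, col - radius), min(width, col + radius + 1))
    ]
    return sum(values) / len(values)


def adaptive_threshold(image, radius=1, offset=10):
    """Mark a pixel as signal (1) when it exceeds its local mean by more than `offset`."""
    return [
        [
            1 if image[r][c] > local_mean(image, r, c, radius) + offset else 0
            for c in range(len(image[0]))
        ]
        for r in range(len(image))
    ]


# Usage: a dim 5x5 background with one bright spot in the middle.
example_image = [[10] * 5 for _ in range(5)]
example_image[2][2] = 200
example_mask = adaptive_threshold(example_image)
```

Comparing each pixel with its own neighborhood rather than with one global cutoff makes the mask less sensitive to uneven illumination. In a full pipeline, the regions marked in the mask would be cropped and passed to the classifier.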
While the method works considerably better for the Flu (85% accuracy), the results for detecting Coronavirus are promising (70% accuracy). Using state-of-the-art Computer Vision techniques could play an important role in speeding up viral testing.
It remains to be discussed how such a method could potentially be integrated in health systems around the world. Indeed, the integration and deployment of a Machine Learning project is often complex as it needs to take all parts of the data pipeline into account: extraction, aggregation, processing, analysis, and result communication.
Why it matters
Reducing the delay in viral test results during a pandemic could have a massive impact on the spread of the virus. Solving this problem by extracting the necessary information from images with a Convolutional Neural Network demonstrates the potential of data-driven techniques.
Re-inventing NLP model testing
Adding a human-centric approach to NLP testing is revealing flaws in the best state-of-the-art models
AI development is driven by benchmarks. Whether ImageNet for Computer Vision tasks or GLUE and SQuAD for Natural Language Understanding tasks, benchmarks have been instrumental in driving AI progress. By laying a solid basis for comparing model performance, they push researchers to improve models. However, many benchmarks, including the ones mentioned above, come with flaws: they contain artifacts, can be deceiving, are not human-centric, and invite overfitting by researchers. As Goodhart's law, in the form generalized by Marilyn Strathern, puts it: "When a measure becomes a target, it ceases to be a good measure".
Facebook has recently developed Dynabench, an online tool where users can try to fool language models. The goal is to gather human input dynamically to measure progress in NLP more accurately. This follows a trend of earlier efforts to test NLP models using human input such as Trick Me If You Can and Beat the AI from researchers at the University of Maryland and UCL respectively.
CheckList, a collaborative effort between Microsoft Research and the University of Washington, is a task-agnostic method for testing NLP model behavior. Inspired by typical testing methods from Software Engineering, the researchers developed a matrix for testing a large and diverse number of cases. It applies three types of tests across a large array of capabilities, such as specific vocabulary, negation, semantic role labeling, fairness, and many others:
- Minimum Functionality Test (MFT) to target a specific behavior (similar to unit testing),
- Invariance Test (INV) for testing small perturbations that should not modify the result, and
- Directional Expectation test (DIR) for testing perturbations that should produce an expected result.
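The three test types listed above can be sketched in plain Python. This is an illustration only: it does not use the CheckList library's API, the keyword-counting "model" is a deliberately naive stand-in, and all function names and example sentences are hypothetical.

```python
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "terrible", "hate"}


def toy_sentiment(text):
    """Hypothetical stand-in model: (# positive words) minus (# negative words)."""
    words = text.lower().replace(".", "").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(score):
    """Map a numeric score to a sentiment label."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def mft(model, cases):
    """Minimum Functionality Test: each input must receive its expected label."""
    return [text for text, expected in cases if label(model(text)) != expected]


def inv(model, pairs):
    """Invariance test: a label-preserving perturbation must not change the label."""
    return [(a, b) for a, b in pairs if label(model(a)) != label(model(b))]


def dir_test(model, pairs):
    """Directional Expectation test: here, adding a negative phrase must not raise the score."""
    return [(a, b) for a, b in pairs if model(b) > model(a)]


# Each call returns the list of failing cases (empty means the test passed).
mft_failures = mft(toy_sentiment, [
    ("The food was great.", "positive"),
    ("The food was not good.", "negative"),   # negation: the naive model misses this
])
inv_failures = inv(toy_sentiment, [
    ("I love this flight to Paris.", "I love this flight to Rome."),
])
dir_failures = dir_test(toy_sentiment, [
    ("The service was good.", "The service was good. The seats were terrible."),
])

print(mft_failures)  # ['The food was not good.']
print(inv_failures)  # []
print(dir_failures)  # []
```

Even this tiny suite exposes a negation bug in the toy model, which mirrors the article's point: targeted behavioral tests can surface failures that a single aggregate benchmark score hides.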
Some examples of these tests can be observed in the image below. The paper tests state-of-the-art models from Microsoft, Google, and Amazon, as well as BERT and RoBERTa. As can be observed in the table taken from the paper, results show some alarming failure rates, even in the best models. Additionally, researchers who used the tool ran more tests and identified more bugs.
A recent seminar hosted by Stanford ML Systems featuring the paper's first author can be found on YouTube. Furthermore, the CheckList repository is open source. (Pro tip: you can now find an arXiv paper's code directly on the arXiv website, thanks to a recently added code tab that resulted from a collaboration with PapersWithCode.)
Why it matters
Especially in NLP tasks, a human-centric approach to testing models is essential. For instance, users can use negation, specific entity names, and temporal vocabulary to attempt to trick modern state-of-the-art models. If a social network wants to deploy a hate-speech classifier, that classifier should be robust against adversarial statements using these techniques.