AI from Home, RL for MRI, and Death of Convolutions
Transition to remote-first seamlessly with AI
In light of the diverse, complex, and shifting restrictions put in place to combat COVID-19, companies around the world have moved towards allowing permanent remote work. Official statements from Microsoft, Shopify, Coinbase, Twitter, and many others point to a global trend of converting to remote-first.
To help this transition, Headroom, an AI-powered video-conferencing start-up that just raised a seed round of $5 million, aims to leverage Computer Vision and Natural Language Processing techniques to optimize video quality, make the current speaker feature more inclusive, and provide transcripts, summaries, as well as other interaction analytics.
Another AI startup, Exer Labs, uses state-of-the-art pose detection to optimize your workout form. It offers apps like Exer Plank and Exer Physical Therapy that give real-time feedback on your form, track your reps, and provide workout summaries. Moreover, it just secured $2 million in funding to bring its newest addition, Exer Studio, to market. The new tool is added to video conference meetings and optimizes virtual fitness classes by tracking performance and letting you compete with the rest of the class.
Finally, Google has recently been working on real-time sign language detection for video conferencing. This new model, based on PoseNet and optical flow detection with LSTM architecture, allows users who communicate using sign language to become the active speaker when they are signing. The research is a testament to Google's belief that video conferencing applications should be accessible to everyone.
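To make the pose-plus-motion idea concrete, here is a minimal sketch. It assumes that "optical flow" over pose landmarks can be approximated by how far each landmark moves between consecutive frames, and it swaps the LSTM for a plain threshold on average speed, so it illustrates the input signal rather than Google's model. The function names, landmark coordinates, and threshold are all hypothetical.

```python
import math


def landmark_speeds(prev_frame, curr_frame, fps):
    """Speed of each landmark between two consecutive frames (distance per second)."""
    return [math.dist(p, c) * fps for p, c in zip(prev_frame, curr_frame)]


def looks_like_signing(frames, fps=30.0, threshold=0.2):
    """Flag a clip as 'signing' when the mean landmark speed exceeds a threshold.

    The threshold is a stand-in for the learned LSTM classifier, which is
    assumed here rather than reproduced.
    """
    speeds = [
        s
        for prev, curr in zip(frames, frames[1:])
        for s in landmark_speeds(prev, curr, fps)
    ]
    return bool(speeds) and sum(speeds) / len(speeds) > threshold


# Each frame is a list of (x, y) landmarks in normalized image coordinates.
still = [[(0.50, 0.50), (0.40, 0.60)] for _ in range(5)]
waving = [[(0.50, 0.50), (0.40 + 0.05 * t, 0.60)] for t in range(5)]

print(looks_like_signing(still))   # False
print(looks_like_signing(waving))  # True
```

A sequence model such as an LSTM would consume the per-frame speeds directly instead of averaging them, which lets it tell sustained signing apart from a single incidental gesture.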
Why it matters: A growing number of companies are allowing their workforce to work remotely permanently, even after the pandemic. While working from home has some clear advantages, there are some pitfalls, especially with regard to teamwork and communication. AI has a major role to play in making current communication tools smarter, more inclusive, and better adapted to the modern workspace.
Reinforcement Learning accelerates MRI scanning
In August of 2018, Facebook AI Research and the NYU School of Medicine launched fastMRI, a collaborative research project whose goal is to investigate making MRI scans faster using Artificial Intelligence. More recently, they have published research demonstrating that the use of Reinforcement Learning has the potential to accelerate personalized scans (video below).
MRI scanners usually acquire k-space measurements, which can be seen as the building blocks of a scan, sequentially. These building blocks are then used to reconstruct the final image. Researchers use Reinforcement Learning to actively modify the order in which k-space measurements are acquired, in an effort to get a high-quality image from fewer measurements, effectively reducing scanning time.
As an analogy, imagine reading only a specific set of chapters in a book instead of reading it cover to cover. You start by reading a chapter, and an intelligent system tells you which chapter to read next to extract the most useful information, and so on until you have retrieved the needed information. The chapters are chosen so that you get the key takeaways while drastically reducing the reading time.
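The claim that acquisition order matters can be checked on a toy example. The sketch below builds an 8x8 image, computes its k-space with a naive Fourier transform, and reconstructs the image from only three k-space columns chosen in two different orders. The "by energy" ordering is an oracle that peeks at the full k-space, which a real scanner cannot do; it stands in for a smart policy and is not the reinforcement learning method from the research. All names and the toy image are assumptions made for illustration.

```python
import cmath


def dft(seq, inverse=False):
    """Naive 1-D discrete Fourier transform, O(n^2); fine for a toy image."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [
        sum(x * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t, x in enumerate(seq))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def dft2(image, inverse=False):
    """2-D transform: every row first, then every column."""
    rows = [dft(row, inverse) for row in image]
    return transpose([dft(col, inverse) for col in transpose(rows)])


def reconstruct(kspace, acquired):
    """Zero-filled reconstruction: unacquired k-space columns are set to 0."""
    masked = [[v if j in acquired else 0 for j, v in enumerate(row)] for row in kspace]
    return dft2(masked, inverse=True)


def squared_error(image, recon):
    return sum(abs(a - b) ** 2 for ra, rb in zip(image, recon) for a, b in zip(ra, rb))


size = 8
# Toy "scan": a bright 4x4 square on a dark background.
image = [[1.0 if 2 <= r < 6 and 2 <= c < 6 else 0.0 for c in range(size)] for r in range(size)]
kspace = dft2(image)

# Baseline: acquire columns left to right.
sequential = list(range(size))
# Oracle: acquire the columns carrying the most signal energy first.
energy = [sum(abs(row[j]) ** 2 for row in kspace) for j in range(size)]
by_energy = sorted(range(size), key=lambda j: -energy[j])

budget = 3  # number of columns we can afford to scan
err_sequential = squared_error(image, reconstruct(kspace, set(sequential[:budget])))
err_by_energy = squared_error(image, reconstruct(kspace, set(by_energy[:budget])))
print(err_sequential, err_by_energy)
```

With the same budget of three columns, the energy-ordered acquisition yields a lower reconstruction error than the left-to-right baseline. A production pipeline would use an FFT library and a learned reconstruction model rather than this naive transform and zero-filling.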
In more technical detail, the researchers framed the sequential acquisition problem as a Partially Observable Markov Decision Process (POMDP) and applied a state-of-the-art deep reinforcement learning method known as Double Deep Q-Networks (DDQN).
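For readers unfamiliar with DDQN, its defining step is how the learning target is computed: one network (the online network) picks the best next action, and a second, slowly updated network (the target network) scores that action. The sketch below shows only that step with hand-written Q-values. The interpretation of each action as a candidate k-space measurement to acquire next is an assumption, and the numbers are made up.

```python
def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN learning target for a single transition.

    q_online_next / q_target_next: Q-values for every action in the next
    state, as estimated by the online and target networks respectively.
    """
    if done:
        return reward  # no future value after a terminal step
    # Action selection uses the online network...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...while action evaluation uses the target network.
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for three candidate measurements in the next state.
q_online_next = [1.0, 3.0, 2.0]
q_target_next = [0.5, 1.5, 4.0]

target = double_dqn_target(1.0, False, 0.9, q_online_next, q_target_next)
# Plain DQN would take the max over the target network's own estimates.
naive = 1.0 + 0.9 * max(q_target_next)
print(target, naive)  # roughly 2.35 and 4.6
```

Separating selection from evaluation is what keeps Double DQN from chasing a single overestimated Q-value, which is the failure mode the plain DQN target is prone to.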
This milestone for fastMRI ties into the growing trend of using AI in medicine to make treatments and diagnostics more personalized. As a testament to opening research to the community, Facebook AI has released the reinforcement learning environment used in the paper.
Why it matters: While MRI is a fundamental tool of modern medicine and is key for diagnostics of a multitude of complex diseases, it is hindered by the time it currently takes to get a scan. Reducing scanning time to extract patient-specific key information will allow practitioners to serve more patients per day, expand the use of MRIs, and reduce patient discomfort.
The video shows two active MRI acquisition trajectories. The top uses the baseline, while the bottom uses our reinforcement learning approach. Left to right: acquired k-space columns, ground truth image, reconstruction from partial k-space measurements, and error map. Note how the reinforcement learning approach achieves higher-quality reconstructions earlier than the baseline. (Source: fastMRI data set)
The death of convolutions?
Transformers, a type of deep learning architecture designed to handle sequential data, have surged in popularity since the release of GPT-3 in May 2020. The autoregressive language model developed by OpenAI can produce stunning examples of human-like text.
Subsequently, researchers have been applying Transformers to Computer Vision tasks. Spearheaded by the release of Detection Transformers (DETR) by Facebook AI in May and Image GPT by OpenAI in June, this line of work aims to address the shortcomings of Convolutional Neural Networks (CNNs). The industry standard for the large majority of computer vision tasks over the past 10 years, CNNs are known to lose a lot of valuable information in pooling layers, and thus ignore potentially important relationships between specific parts of the image and the whole.
Recently, an anonymous paper submitted to ICLR 2021, "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", has gained a lot of traction and could mark the next significant milestone in computer vision. The highlights of the paper, which is currently under double-blind review, are that (1) it obtains high accuracy with less computation time for training, (2) its model architecture does not include a convolutional network, and (3) it can encode small patches efficiently. Furthermore, unlike CNNs, it is able to pay ‘attention’ to the whole image even at shallow network depths.
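The "16x16 words" in the title refers to cutting an image into fixed-size patches and treating each patch like a word in a sentence. The sketch below shows only that tokenization step for a single-channel image stored as nested lists. The learned projection, position information, and the Transformer itself are left out, and the function name and toy image are assumptions.

```python
def image_to_patches(image, patch=16):
    """Split an H x W image into non-overlapping patch x patch tiles.

    Returns one flat list of pixel values per tile, in row-major tile order,
    so each tile can play the role of a 'word' in the input sequence.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append(
                [image[r][c] for r in range(top, top + patch) for c in range(left, left + patch)]
            )
    return tokens


# Toy 32 x 48 image whose pixel value encodes its own position.
image = [[r * 48 + c for c in range(48)] for r in range(32)]
tokens = image_to_patches(image)
print(len(tokens), len(tokens[0]))  # 6 256
```

Because every patch becomes one element of the sequence, attention can relate any two patches in the very first layer, regardless of how far apart they are in the image.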
Why it matters: Convolutional Neural Networks are a great tool for Computer Vision tasks, but they require extremely long training times for data-heavy tasks. Furthermore, they lose information by considering only the local pixel environment in the first layers. Obtaining high accuracy by using only Transformers, thought of mainly as an architecture for NLP, is a huge milestone in Computer Vision, and will pave the way for many applications to come.