2020 AI Recap, Global ML Community, and Mobile Object Detection
Artificial Intelligence in 2020
As we reach the end of the year, we look back at the major Artificial Intelligence milestones of 2020
This has been AI's most exciting year yet. As of today, the implementation of Machine Learning solutions in global industries is still led by early adopters. However, amidst the Coronavirus pandemic, people across the world are beginning to grasp the impact of the massive digitization to come in the near future. As the milestones of 2020 have shown us, the role of AI in this transformation is both promising and unsettling.
Tens of thousands of Machine Learning papers are published each year, yet the real-world impact each of them will have remains unclear. Meanwhile, the Machine Learning algorithms run by tech giants (e.g. Apple, Amazon, Facebook, Google), whose real-world impact is immense, are developed behind closed doors. There remains quite a path to clear before the use of Machine Learning is democratized across industries and applications.
The most important language model yet: GPT-3
Despite its impressive performance, GPT-3 has sparked considerable debate about the monetary and environmental cost of large language models (training GPT-3 reportedly cost approximately $12 million), as well as their tendency to produce biased outputs.
In response to these concerns, researchers are starting to propose new Transformer-based methods such as Performers and Linformers. Their goal is to reduce the cost of training while maintaining high performance.
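The idea behind these efficient Transformers can be illustrated with a toy example: standard softmax attention materializes an n×n weight matrix, so its cost grows quadratically with sequence length, while kernel-based variants (the family Performers belong to) reorder the computation so it grows linearly. The sketch below uses an `elu(x)+1` feature map, a simple choice from the linear-attention literature rather than the exact Performer kernel, purely for illustration:

```python
import numpy as np

def full_attention(Q, K, V):
    """Standard softmax attention: builds an (n, n) matrix -> O(n^2) in length."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    """Kernelized attention: phi(Q) @ (phi(K).T @ V) -> O(n) in length.
    phi is elu(x)+1, a simple positive feature map (illustrative choice)."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                      # (d, d) summary, independent of n
    normalizer = Qp @ Kp.sum(axis=0)   # (n,) per-row normalization
    return (Qp @ kv) / normalizer[:, None]

rng = np.random.default_rng(0)
n, d = 512, 16
Q, K, V = rng.normal(size=(3, n, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (512, 16)
```

The key point is that `Kp.T @ V` is a small `(d, d)` summary: doubling the sequence length doubles the work instead of quadrupling it, which is exactly the scaling problem these papers target.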
AI for the good of society
As the adoption of AI increases and the understanding of ML Operations is refined, 2020 has seen many data-driven solutions for the good of society.
Whether it is diagnosing COVID from cough recordings, diagnosing tinnitus from brain scans, or solving the 50-year-old protein folding problem: Artificial Intelligence has clearly opened up some very interesting avenues for research.
The rise of the GANs: deepfakes
As working from home during the pandemic blurred the concepts of time and space, state-of-the-art GANs started blurring out faces and replacing them with near-indistinguishable fake content
Deepfakes aren't new, but they have seen some incredible advances this year. From Jordan Peele's fake Obama address to a synthetic Richard Nixon delivering an alternate speech about the moon landings, deepfakes are becoming increasingly convincing.
AI Governance: the most important challenge yet
Tools driven by Machine Learning can be extremely powerful, which makes these solutions double-edged swords. Luckily, awareness of the dangers and potential misuse of Machine Learning has increased in the past year, from addressing bias in datasets and evaluating the fairness of model outcomes to regulating the types of data that can be collected and tracked.
Recently, we have seen BMW release its code of ethics and California pass the AB-730 bill, designed to criminalize the use of deepfakes that give false impressions of politicians' words or actions. Moreover, public debate about ethics in AI has seen a recent jump after Google fired one of its prominent AI ethicists, Timnit Gebru.
We are looking forward to the AI advances coming in 2021. Hopefully, as model performance increases, the democratization of Machine Learning solutions will in turn put more emphasis on accountability and highlight possible biases and ethical misuses of the technology.
We will keep covering important AI milestones in 2021. We will curate, evaluate, and publish the three most relevant topics every two weeks. Feel free to join us by subscribing to the digest using the form below!
The Launch of MLCommons
50+ global technology and academic leaders in AI unite with the objective of accelerating innovation in Machine Learning
Machine Learning is a relatively young field. Over the years, many actors have attempted to create standardized material to unify certain aspects, from modeling and testing libraries to deployment toolkits and data versioning software. Some of these attempts, such as the GLUE benchmark for NLP or the PapersWithCode initiative on arXiv, have been very well received by the industry.
One of these attempts is MLPerf, a benchmarking tool for measuring the performance of hardware for Machine Learning tasks.
The founders of MLPerf have brought together an engineering consortium of companies, schools, and research labs to build open-source and standardized tools for machine learning.
This consortium, called MLCommons, includes representatives from Alibaba, Facebook AI, Google, Intel, Dell, Samsung, NVIDIA, and many others. The list of partnering schools mostly includes universities with a global reputation for leading AI research, such as UC Berkeley, Stanford, Harvard, and the University of Toronto.
MLCommons will focus on three pillars:
- Benchmarks and Metrics that are able to compare ML solutions, software, and systems transparently.
- Publicly available crowd-sourced Datasets and Models to build new state-of-the-art AI solutions and applications.
- Best Practices to allow sharing models between teams globally.
Today, MLCommons already includes one project for each of these pillars.
With regard to benchmarks and metrics, MLPerf has become an industry standard for evaluating training and inference performance across different infrastructures.
For datasets and models, MLCommons has released People's Speech, a large dataset containing 87,000 hours of speech in 57 different languages.
Finally, in the Best Practices category, MLCube is a set of common conventions that allow users to run and share models with anyone, anywhere.
Why it matters
Publicly available datasets and benchmarks have driven the majority of recent progress in the Machine Learning Industry. The production and maintenance of such resources are complex, expensive, and require input and feedback from many different actors. MLCommons takes on the challenge by bringing 50+ leading organizations together.
As David Kanter, the Executive Director of MLCommons states, “MLCommons is the first organization that focuses on collective engineering to build that infrastructure. We are thrilled to launch the organization today to establish measurements, datasets, and development practices that will be essential for fairness and transparency across the community.”
The global Machine Learning community eagerly awaits MLCommons' next releases. Hopefully, these standardized tools and methods will spearhead innovative initiatives in the field.
“MLCommons has a clear mission - accelerate Machine Learning innovation to ‘raise all boats’ and increase positive impact on society,” states Peter Mattson, the President of MLCommons.
Simultaneous Face, Hand, and Pose detection on Mobile
Google AI has developed an all-in-one face, hand, and pose detection solution for mobile using multiple interdependent neural networks
Some of the most important advances in Computer Vision revolve around accurately detecting human behavior in images. Correctly detecting and tracking objects has many potential applications across industries and at diverse steps of the value chain, and models that detect human pose, faces, and hand position are no exception.
MediaPipe, an open-source project by Google, offers cross-platform ML solutions for real-time media analysis. These ML solutions include standalone tasks such as Face Detection, Hair Segmentation, Object Detection, Iris detection, and many others.
Until recently, MediaPipe offered separate solutions for Face, Hand, and Pose Detection. Last week, Google researchers released MediaPipe Holistic, a solution that combines all three tasks.
The consolidated pipeline integrates three separate models for Face, Hand, and Pose Detection. While each model uses the same input image, the input is processed differently to achieve optimal task-specific results. For instance, the face and hand detection models require image cropping before the input's pixel resolution is reduced (which is done to allow real-time inference). In the end, the solution yields a total of 540+ keypoints for each analyzed frame.
Each task is run in real-time with minimal memory transfer between inference backends. Furthermore, the solution is modular in the sense that it allows for component interchangeability depending on your device's hardware performance.
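The keypoint tally can be reconstructed from the individual MediaPipe solutions: the face mesh predicts 468 landmarks, the pose model 33, and each hand model 21, for 543 in total. The pure-Python schematic below sketches how a combined pipeline of this kind could aggregate them; the function names are hypothetical stubs for illustration, not the real MediaPipe API:

```python
# Schematic of a combined face/hand/pose pipeline.
# Landmark counts match the individual MediaPipe solutions.
FACE_LANDMARKS = 468   # MediaPipe Face Mesh
POSE_LANDMARKS = 33    # BlazePose
HAND_LANDMARKS = 21    # per hand

def run_holistic(frame):
    """Run pose first, then crop face/hand regions based on the pose estimate
    before handing the re-sampled crops to the specialized models."""
    pose = detect_pose(frame)                       # full frame, low resolution
    face = detect_face(crop(frame, pose, "face"))   # cropped region
    hands = [detect_hand(crop(frame, pose, side)) for side in ("left", "right")]
    return pose + face + [kp for hand in hands for kp in hand]

# Stub detectors: return placeholder keypoints with the documented counts.
def detect_pose(frame):  return [("pose", i) for i in range(POSE_LANDMARKS)]
def detect_face(region): return [("face", i) for i in range(FACE_LANDMARKS)]
def detect_hand(region): return [("hand", i) for i in range(HAND_LANDMARKS)]
def crop(frame, pose, region): return frame  # placeholder for region extraction

keypoints = run_holistic(frame="dummy-frame")
print(len(keypoints))  # 543
```

The ordering mirrors the design described above: the pose model sees the whole (downscaled) frame, and its output guides the crops that keep the face and hand models working on high-detail regions, which is what makes the modular, interchangeable-component design possible.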
An additional note with regard to the solution's Model Card, a topic discussed in a previous Digest: this solution does not yet have one. However, as all other MediaPipe solutions do, it seems only a matter of time before the Google Research team adds the relevant documents to the MediaPipe documentation.
Why it matters
Using real-time detection models in cross-platform applications enables a large variety of impactful use cases. Some examples are sign language recognition, augmented reality effects, additional features in video-conferencing applications, fitness detection, and gesture control. Moreover, applications like this one prove the technical feasibility of integrating complex Machine Learning solutions on mobile and edge devices.
As stated by the researchers, "We hope the release of MediaPipe Holistic will inspire the research and development community members to build new unique applications. We anticipate that these pipelines will open up avenues for future research into challenging domains, such as sign-language recognition, touchless control interfaces, or other complex use cases."