Unsupervised Diagnoses, Amazon Monitoring, & Probabilistic Programming
Using your Doctor's notes for disease detection
Researchers from Stanford propose a method that uses the text paired with an unlabeled medical image to learn features for that image
Context
Helping doctors make key decisions using data-driven solutions is an important challenge. Although image classification models abound in other fields, health science applications still struggle to bring Machine Learning into their workflows. In large part, this delay can be attributed to a lack of labeled training data: medical images require a lot of domain knowledge to label accurately, and trained doctors and physicians are often too busy with higher-stakes, more tangible tasks. In a previous digest, we discussed how synthetic data augmentation could help resolve such issues.
What's new
A team of researchers from Stanford University has come up with an interesting unsupervised alternative. The method, called ConVIRT, leverages the naturally occurring pairing of images and textual data to classify medical imagery.
The text reports accompanying medical images often contain very useful information about the image's contents. This information can be used to infer the class associated with the image, without any additional expert input.

The authors built two separate pipelines: one for the textual input and another for the image. The NLP pipeline consists of a BERT variant. To compare the image encoding with the textual encoding in a consistent space, a single hidden layer was added on top of a ResNet-50 image encoder. For more information regarding the specific architecture, a PDF version of the paper is available.
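To make the idea of comparing both encodings in a shared space more concrete, here is a minimal sketch of a bidirectional contrastive loss, the kind of objective the ConVIRT paper describes. It is not the authors' code: the toy vectors stand in for the projected outputs of the image and text encoders, and the function names, the temperature `tau` and the weight `lam` are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(queries, keys, tau):
    """Mean cross-entropy of matching queries[i] to keys[i] among all keys."""
    total = 0.0
    for i, query in enumerate(queries):
        logits = [cosine(query, key) / tau for key in keys]
        # Numerically stable log-sum-exp over all candidate keys.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


def bidirectional_loss(image_vecs, text_vecs, tau=0.1, lam=0.75):
    """Weighted sum of the image-to-text and text-to-image losses.

    tau and lam are hypothetical values chosen for illustration.
    """
    image_to_text = contrastive_loss(image_vecs, text_vecs, tau)
    text_to_image = contrastive_loss(text_vecs, image_vecs, tau)
    return lam * image_to_text + (1 - lam) * text_to_image


# Toy projected embeddings: images[i] is paired with reports[i].
images = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
reports = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0], [0.0, 0.2, 1.1]]
shuffled_reports = [reports[1], reports[2], reports[0]]

aligned_loss = bidirectional_loss(images, reports)
misaligned_loss = bidirectional_loss(images, shuffled_reports)
print(aligned_loss < misaligned_loss)
```

The loss is low when each image is most similar to its own report and high when the pairs are shuffled, which is the signal that lets the encoders learn without manual labels.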
Why it matters
The proposed method was evaluated using four medical image classification tasks and two zero-shot retrieval tasks. The results indicate that the method considerably outperforms strong baselines (ResNet-50 pre-trained on ImageNet and in-domain initialization methods). In fact, the method requires only 10% as much labeled training data as its ImageNet-initialized counterpart to achieve better performance.
This improved data efficiency is very promising as it could help alleviate the high cost of medical data labeling.
How Amazon monitors factory workers and machines
In its expansion into the industrial sector, the tech giant uses state-of-the-art AI to monitor factories
Context
A large majority of companies still rely on scheduled maintenance procedures to verify the state of machinery. This is done in order to reduce the occurrence of line outages and factory shutdowns, which can bring enormous inconvenience or product unavailability to the end customers.
Furthermore, in factories around the world, compliance regulations need to be upheld. Employees are often required to wear Personal Protective Equipment (PPE) and follow specific guidelines such as staying out of unauthorized zones, maintaining social distancing, etc. More often than not, ignoring or circumventing these regulations and guidelines leads to potentially costly and dangerous accidents.
What's new
Cloud computing leader AWS has developed hardware-reliant systems to monitor the health of heavy machinery and to check worker compliance.
The former relies on a two-inch, 50-gram sensor called Monitron. It can record vibration and temperature, which a Machine Learning model then uses to flag anomalous behavior.
Leveraging data-driven solutions to predict machine failure allows companies to replace or maintain their machinery during set maintenance windows. This way, machines don't break down at unexpected times and customers are not affected.
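As a rough illustration of how vibration and temperature readings can be turned into anomaly flags, here is a minimal z-score sketch. It is not the model AWS ships with Monitron, whose details are not described here; the readings, units, function names and threshold are all hypothetical.

```python
import statistics


def fit_baseline(readings):
    """Return (mean, standard deviation) per channel from healthy readings."""
    channels = list(zip(*readings))
    return [(statistics.mean(c), statistics.stdev(c)) for c in channels]


def is_anomalous(reading, baseline, threshold=3.0):
    """Flag a reading if any channel deviates by more than `threshold` std devs."""
    return any(
        abs(value - mean) / std > threshold
        for value, (mean, std) in zip(reading, baseline)
    )


# Hypothetical healthy readings: (vibration in mm/s, temperature in deg C).
healthy = [
    (2.0, 40.0), (2.1, 41.0), (1.9, 39.5),
    (2.2, 40.5), (2.0, 40.2), (1.8, 39.8),
]
baseline = fit_baseline(healthy)

normal_flag = is_anomalous((2.1, 40.4), baseline)  # close to the baseline
fault_flag = is_anomalous((4.5, 55.0), baseline)   # far from the baseline
print(normal_flag, fault_flag)
```

A production system would likely learn richer patterns than a fixed threshold, but the principle is the same: learn what healthy looks like, then flag departures early enough to act during a planned maintenance window.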

Amazon has been testing 1,000 Monitron sensors at its fulfillment centers near Mönchengladbach in Germany, where the new system monitors conveyor belts that handle packages.
AWS's second addition to its industrial product line is called Panorama. The system pushes Machine Learning models to the edge by connecting to pre-installed camera systems. This way, managers can automate the monitoring of workers. The system can detect misused or missing PPE, vehicles parked in unauthorized spots, whether social distancing measures are respected, and so on.

Several companies are testing AWS Panorama. Siemens Mobility said it will use the new technology to monitor traffic flow in different cities. Furthermore, Deloitte has stated that it is working with a major North American seaport to use the tool for monitoring shipments.
Why it matters
These new Amazon products demonstrate the benefit of using data-driven solutions in a factory setting. They also show that implementing end-to-end solutions is crucial to ensuring that AI solutions add value.
“This idea of predictive analytics can go beyond a factory floor,” Mr. Thill said. “It can go into a car, on to a bridge, or on to an oil rig. It can cross fertilize a lot of different industries.” said Matt Garman (AWS’s head of sales and marketing) speaking to the Financial Times.
What's next
While the new products have raised some concerns with critics, the advantages they bring are hard to dispute. The concerns are mostly linked to the fact that the client company does not seem to have enough control over the Machine Learning models embedded in Monitron and Panorama. In fact, the capabilities seem extremely generalized. This is where AI providers such as Visium can provide solutions that are highly optimized to a client’s needs, all while using Amazon’s standardized and compliant hardware.
While Amazon states that no pre-packaged facial recognition capabilities are embedded within Panorama, there has been debate about the ethical issues surrounding packaged monitoring systems in general. To mitigate this issue, Amazon relies on a defined list of terms and regulations to ensure that its systems are used solely for safety purposes.
Facebook AI's evaluation framework for Probabilistic Programming Languages
Facebook AI introduces a new benchmark called PPL Bench for evaluating Probabilistic Programming Languages on a variety of statistical models
Context
Using Probabilistic Programming Languages, statisticians and data scientists alike are able to formulate probability models in a formal language. Using a probability model allows you to perform Bayesian Inference by computing the posterior probability of an event. More specifically, you combine a prior belief about an event with a set of observations to obtain its updated (posterior) probability.
The advantages of combining such techniques with Machine Learning algorithms are numerous. First, you can pool similar behavior together (e.g. a hierarchical structure in your dataset) to increase the accuracy of your model. Second, you can improve consistency and robustness by adding beliefs from professionals with expert domain knowledge. Finally, formulating Machine Learning problems using probability models allows you to leverage probabilistic output, taking uncertainty into account to assess risk.
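The prior-to-posterior update can be shown with a small worked example. The sketch below uses the conjugate Beta-Binomial model, where the posterior has a closed form, so no PPL is needed; the scenario, the prior and the counts are made up for illustration.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial outcomes.

    With a Beta(alpha, beta) prior on a success probability, the posterior
    is Beta(alpha + successes, beta + failures).
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior belief: the success rate is around 50%, held weakly.
prior = (2.0, 2.0)

# Hypothetical observations: 7 successes and 3 failures.
posterior = update_beta(*prior, successes=7, failures=3)

prior_mean = beta_mean(*prior)          # 2 / 4 = 0.5
posterior_mean = beta_mean(*posterior)  # 9 / 14, roughly 0.643
print(prior_mean, posterior_mean)
```

The observations pull the estimate from 0.5 toward the observed rate of 0.7, and the prior keeps it from going all the way there. PPLs automate this kind of update for models where no closed form exists.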
What's new
Researchers from Facebook AI have created an open-source benchmark framework for evaluating PPLs used for statistical modeling. PPL Bench has a dual objective: (1) evaluate improvements in PPLs in a standardized setting and (2) help users pick the right PPL for their modeling application.
Implemented in Python 3, PPL Bench handles Model Instantiation and Data Generation, Implementation and Posterior Sampling, and Evaluation. The modular workflow is explained graphically in the image below.

The PPL implementations are evaluated using several metrics:
- Predictive log-likelihood as a function of the number of samples. This allows users to evaluate how fast each PPL converges to its final predictions.
- Gelman-Rubin convergence statistic, which compares the variance within and between sampling chains to check that they have converged.
- Effective sample size, which measures how strongly the generated samples are correlated with one another. Ideally the samples are independent, so positive correlation should be kept to a minimum in practical implementations.
- Inference time, which indicates the runtime to expect in practical use cases.
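Two of the metrics above, the Gelman-Rubin statistic and the effective sample size, can be sketched in a few lines. This uses the textbook formulas with a simple truncation rule for the autocorrelation sum; it is not PPL Bench's implementation, and the simulated chains are stand-ins for real sampler output.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Gelman-Rubin statistic (R-hat) for equal-length chains; near 1 means converged."""
    n = len(chains[0])
    within = statistics.mean([statistics.variance(c) for c in chains])
    between = n * statistics.variance([statistics.mean(c) for c in chains])
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = statistics.mean(chain)
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag)) / n
    return cov / var


def effective_sample_size(chain):
    """n / (1 + 2 * sum of autocorrelations), stopping at the first non-positive lag."""
    n = len(chain)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = autocorrelation(chain, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


rng = random.Random(0)
n = 500

# Four chains sampling the same distribution versus chains stuck in two modes.
mixed = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(4)]
stuck = [[rng.gauss(mu, 1) for _ in range(n)] for mu in (0, 0, 3, 3)]
rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)

# Independent draws versus a strongly autocorrelated AR(1) chain.
independent = mixed[0]
correlated = [0.0]
for _ in range(n - 1):
    correlated.append(0.9 * correlated[-1] + rng.gauss(0, 1))
ess_independent = effective_sample_size(independent)
ess_correlated = effective_sample_size(correlated)

print(rhat_mixed, rhat_stuck, ess_independent, ess_correlated)
```

Chains that agree give an R-hat close to 1, while chains stuck in different regions push it well above 1. Likewise, the autocorrelated chain yields far fewer effective samples than its nominal length.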
Why it matters
Probabilistic Programming is a very powerful tool whose use has exploded in the last decade. By proposing an open-source evaluation framework for PPLs, the authors aim to create a standardized mechanism for comparing implementations. Not only does it raise awareness and spark discussions, but it also allows users to pick the right PPL for the task at hand using data-driven insights that follow the most common PPL considerations.
What's next
As stated by Bradford Cottel, Technical Program Manager at Facebook AI, "We hope that community contributions will help grow and diversify PPL Bench and encourage wider industrial deployments of PPLs."
Here are the relevant links to the paper and code.