Driver Surveillance, Dropbox Previews, and Data Literacy
Ceiling-mounted surveillance cameras leverage Machine Learning algorithms to flag driver misbehavior
As the economy sputters due to the adverse consequences of the coronavirus pandemic, business is booming at Amazon. The tech giant has recently hired 400'000 workers to stow, sort, pick, pack, and deliver packages around the United States. Unsurprisingly, 'stay at home' orders are increasing the number of goods sold online.
Amazon has recently rolled out the use of Driveri by Netradyne for its US drivers. The system leverages cameras to flag driver behavior such as texting, not using their seatbelt, speeding, ignoring stop signs, and many others. The full list can be found here. The solution is similar to Lytx DriveCam, the intelligent dashboard camera system used in UPS trucks.
The cameras record drivers 100% of the time. The goal is to improve safety in the Amazon delivery network. Drivers can also use the system to document potentially problematic events, such as a person approaching the vehicle or an inaccessible delivery location.
The system's provider, Netradyne, states that the system reduces collisions by two-thirds. Leveraging continuous learning, the algorithms are periodically updated to become smarter with additional use.
Why it matters
Some drivers state that the system violates their privacy. Amazon is also known for pressuring drivers to make deliveries faster: an article in The New York Times highlights how the company avoids responsibility for the human cost of pushing fast shipment. While many people believe that in-car surveillance is intrusive, others point out the advantages of reducing human error by assisting drivers on their stressful delivery routes.
AI systems carry great potential to automate performance audits. Furthermore, when configured correctly, they can alert drivers to prevent accidents. On the other hand, the invasive nature of some systems raises questions regarding the right to privacy. It remains important to create and distribute augmented intelligence solutions that empower users instead of dehumanizing them.
Dropbox is saving $1.7M a year on document previews
Cannes is Dropbox's new Machine Learning system that replaces $1.7M of infrastructure costs with $9k of ML infrastructure
Dropbox is a file storage company that offers cloud storage and file synchronization for individuals and enterprises. One of its features is previews for 100+ file types. The system behind the feature, called Riviera, pre-generates and caches previews for large files. For instance, it might rasterize a page from a multi-page PDF document to show a high-quality preview in the Dropbox web app. Due to the immense amount of processed data (tens of petabytes per day), the computing costs of this pre-generation (which Dropbox calls pre-warming) are considerable.
Dropbox saw an opportunity to reduce the infrastructure expense related to pre-warming by using Machine Learning. The project is called Cannes, "after the famous city on the French Riviera where international films are previewed." By predicting which files should be pre-warmed and which should not, Dropbox can reduce costs without damaging customer experience.
The line between cost savings and user experience degradation is a fine one. When a file is incorrectly classified as not needing pre-warming, Riviera has to generate the preview on the fly (as it was not cached!), leaving the user to wait for the result to appear in the app. To account for this tradeoff, the Riviera team set guardrails to prevent degrading user experience.
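To make the tradeoff concrete, here is a minimal sketch of how a guardrail could drive the choice of a decision threshold. It is not Dropbox's implementation: the function names, the scores, and the guardrail value are hypothetical, and it assumes the model outputs a score per file and that a file is pre-warmed when its score reaches the threshold.

```python
from typing import Optional, Sequence


def pick_threshold(
    scores: Sequence[float],
    was_previewed: Sequence[bool],
    max_miss_rate: float,
) -> Optional[float]:
    """Return the highest threshold whose miss rate stays within the guardrail.

    A "miss" is a file that the user previewed but that scored below the
    threshold, so it would not have been pre-warmed (hypothetical definition).
    """
    total_previewed = sum(was_previewed)
    best = None
    # The miss rate can only grow as the threshold rises, so the last
    # threshold that satisfies the guardrail is also the highest one.
    for threshold in sorted(set(scores)):
        missed = sum(
            1 for s, p in zip(scores, was_previewed) if p and s < threshold
        )
        miss_rate = missed / total_previewed if total_previewed else 0.0
        if miss_rate <= max_miss_rate:
            best = threshold
    return best


def prewarm_fraction(scores: Sequence[float], threshold: float) -> float:
    """Share of files that would still be pre-warmed at this threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Made-up validation data: model scores and whether each file was previewed.
scores = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
was_previewed = [False, False, True, False, True, True]

strict = pick_threshold(scores, was_previewed, max_miss_rate=0.0)
relaxed = pick_threshold(scores, was_previewed, max_miss_rate=0.34)
print(strict, prewarm_fraction(scores, strict))
print(relaxed, prewarm_fraction(scores, relaxed))
```

A looser guardrail allows a higher threshold, which means fewer pre-warmed files and lower cost, at the price of more previews generated on the fly.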
Another important trade-off that Dropbox took into account was the one between model complexity and interpretability. Before trying to increase performance with complex models, the Cannes team focused on saving costs in a fast and interpretable manner. The v1 model predicted previews with >70% accuracy, all while reducing the number of pre-warmed requests by 40%. Rolling out this simple, fast-to-train, and low-cost solution to 1% using A/B testing allowed them to validate their offline results. The online results were in line with the offline ones, and the cache hit rate dropped by only a couple of percentage points.
Deploying a Machine Learning model in production requires tracking some specific metrics in addition to the model performance (e.g. the confusion matrix). The Cannes team also tracks prediction serving infrastructure metrics (e.g. availability, data freshness) as well as preview metrics (e.g. preview latency distribution and cache hit rate) from the A/B test's holdout group. All these metrics are calculated hourly, visualized, and tracked using Apache Superset. This allows the Cannes team to be notified when model behavior shifts before it can cause a significant impact on user experience. The team has furthermore given clear instructions to non-ML teams about how they can identify Cannes vs. non-Cannes issues. If a Cannes issue does occur, there is also a clear path of escalation to mitigate the impact of the failure.
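As an illustration of the kind of metrics described above, the following sketch computes confusion-matrix counts and a cache hit rate for one batch of files. The names and the alert floor are hypothetical, and it makes a simplifying assumption that may not match Riviera: a preview request is a cache hit only if the file was pre-warmed.

```python
from collections import Counter
from typing import Optional, Sequence


def confusion_counts(
    prewarmed: Sequence[bool], previewed: Sequence[bool]
) -> Counter:
    """Count true/false positives and negatives for pre-warm decisions."""
    counts = Counter(tp=0, fp=0, fn=0, tn=0)
    for was_prewarmed, was_previewed in zip(prewarmed, previewed):
        if was_prewarmed and was_previewed:
            counts["tp"] += 1  # pre-warmed and actually previewed
        elif was_prewarmed:
            counts["fp"] += 1  # pre-warmed but never previewed (wasted work)
        elif was_previewed:
            counts["fn"] += 1  # previewed without a cached preview
        else:
            counts["tn"] += 1  # correctly skipped
    return counts


def cache_hit_rate(counts: Counter) -> Optional[float]:
    """Share of previewed files that were already cached (assumption above)."""
    requested = counts["tp"] + counts["fn"]
    return counts["tp"] / requested if requested else None


def needs_attention(hit_rate: Optional[float], floor: float) -> bool:
    """Flag a batch whose hit rate fell below a hypothetical floor."""
    return hit_rate is not None and hit_rate < floor


# Made-up hourly batch: the pre-warm decision and the observed user behavior.
prewarmed = [True, True, False, False, True]
previewed = [True, False, True, False, True]

counts = confusion_counts(prewarmed, previewed)
hit_rate = cache_hit_rate(counts)
print(dict(counts), hit_rate, needs_attention(hit_rate, floor=0.9))
```

In a setup like the one described, values such as these would be computed per hour and fed into a dashboard, with the floor chosen to match the agreed guardrail.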
Why it matters
There are lessons to learn from how Dropbox approached cost savings for its Riviera system. Cannes shows that with the right trade-offs in mind, as well as a solid validation and deployment plan, relatively simple Machine Learning models can create immense business value: the team was able to replace an estimated $1.7M in annual pre-warming costs with $9'000 in yearly ML infrastructure. It also demonstrates that it is possible to cut costs significantly while keeping customer experience and SLAs in check.
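For a sense of scale, the two figures quoted above can be compared directly. This is plain arithmetic on the article's numbers, taken at face value; the variable names are only for illustration.

```python
# Figures as quoted in the article (annual, in US dollars).
prewarm_cost_replaced = 1_700_000
ml_infrastructure_cost = 9_000

# Net annual saving once the ML infrastructure is paid for.
net_saving = prewarm_cost_replaced - ml_infrastructure_cost

# How many dollars of pre-warming cost each dollar of ML infrastructure replaces.
replacement_ratio = prewarm_cost_replaced / ml_infrastructure_cost

print(net_saving, round(replacement_ratio, 1))
```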
As stated by the Cannes team: "There are many exciting avenues to explore for the next iteration of this project. There are more complex model types we can experiment with now that the rest of the Cannes system is in production. [...] Another new Previews application we’ve discussed is using ML to make predictive decisions more granular than a binary prewarm/don’t-prewarm per file. We may be able to realize further savings by being more creative with predictive prewarming, reducing costs with no deterioration to the file preview experience from the user’s perspective.
We hope to generalize the lessons and tools built for Cannes to other infrastructure efforts at Dropbox. ML for infrastructure optimization is an exciting area of investment."
Building data literacy within your company
MIT Sloan sheds light on how to build data literacy, which has become an essential in-demand skill in recent years
One of the top three barriers to building strong data analytics teams is data literacy, as reported in a recent Chief Data Officer survey by Gartner. The other two are the cultural challenge of accepting change and the lack of resources to fund innovation. Sharing a common understanding and language around data, analytics, and Machine Learning is key to developing AI maturity within your company. A different report from Accenture indicates that only 21% of employees (of the 9'000 surveyed) are confident in their data literacy.
MIT Sloan School of Management has recently published an interesting article entitled How to build data literacy in your company. The article indicates that as data is becoming a modern currency, it is essential for companies to have a data-literate workforce.
In their paper Approaches to Building Big Data Literacy, MIT professor Catherine D’Ignazio and collaborating researcher Rahul Bhargava describe data literacy as being able to:
- read with data, which means understanding what data is and the aspects of the world it represents.
- work with data, including creating, acquiring, cleaning, and managing it.
- analyze data, which involves filtering, sorting, aggregating, comparing, and performing other analytic operations on it.
- argue with data, which means using data to support a larger narrative that is intended to communicate some message or story to a particular audience.
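To ground the "analyze data" item, here is a small sketch of filtering, sorting, aggregating, and comparing using only the Python standard library. The records, field names, and cutoff are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records: orders handled per region in separate batches.
records = [
    {"region": "north", "orders": 120},
    {"region": "south", "orders": 80},
    {"region": "north", "orders": 60},
    {"region": "east", "orders": 30},
]

# Filter: keep only batches with at least 50 orders.
large = [r for r in records if r["orders"] >= 50]

# Sort: largest batches first.
ranked = sorted(large, key=lambda r: r["orders"], reverse=True)

# Aggregate: total orders per region across all batches.
totals = defaultdict(int)
for r in records:
    totals[r["region"]] += r["orders"]

# Compare: find the region with the highest total.
top_region = max(totals, key=totals.get)

print(ranked)
print(dict(totals), top_region)
```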
As you can imagine, different roles employ data literacy in different ways. These differences can be identified using data literacy personas. From the skeptics and the citizen analysts to the enthusiasts and data-driven executives, you can evaluate your data literacy using this assessment.
For companies to become more AI mature, they must understand why data literacy is needed, and how to teach it to their workforce with a data literacy plan. An agile data literacy development loop is proposed with the following steps.
- Data literacy definition: make explicit which skills your employees should learn and the level of proficiency for each role.
- Literacy assessment: agree on a method for evaluating data literacy in your organization.
- Stratified learning: make resources available for your employees to improve their data literacy skills in a stratified manner.
- Communication: involve leaders, build a culture of learning, and reward curiosity.
- Evaluation: assess employees' new skills and find points of improvement for the next loop.
Why it matters
Data is everywhere, and as such, people everywhere should become data literate.
While improving data literacy is a lot of work, data training is worth it. “In a world of more data, the companies with more data-literate people are the ones that are going to win,” MIT Sloan senior lecturer Miro Kazakoff stated.
“Data literacy has always been a requirement in successful organizations. It's just that data illiteracy is more obvious now — or data illiteracy just causes more damage now than it used to,” he continued.
Need help building data literacy in your company? Assess your organisation's AI maturity using our AI maturity assessment.