ML Digest: OpenAI on Large Scale Reinforcement Learning.
Welcome to this week's Best of Machine Learning Digest. In this weekly newsletter, we resurface some of the best Machine Learning resources posted in the past week. This time, we've received 47 submissions, including 4 papers.
We are looking for helping hands! Get involved!

Papers

This week, 4 papers were posted on Best of ML. Below are the top posts of the week.
Dota 2 with Large Scale Deep Reinforcement Learning
 
On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months. By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
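The core pattern behind OpenAI Five is self-play: the agent improves by playing against frozen snapshots of its own past self. The sketch below is not OpenAI Five's actual algorithm (which runs PPO over LSTM policies at massive scale); it only illustrates the self-play loop on a toy game of rock-paper-scissors, with a deliberately crude reinforce-style update. All names (`self_play`, `snapshot_every`, `lr`) are illustrative choices, not from the paper.

```python
import random

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def sample(policy):
    # draw one action according to the current action probabilities
    return random.choices(ACTIONS, weights=[policy[a] for a in ACTIONS])[0]

def normalize(policy):
    total = sum(policy.values())
    return {a: p / total for a, p in policy.items()}

def self_play(episodes=5000, snapshot_every=500, lr=0.05):
    policy = {a: 1 / 3 for a in ACTIONS}
    opponent = dict(policy)  # frozen snapshot of our own past self
    for t in range(episodes):
        if t % snapshot_every == 0:
            opponent = dict(policy)  # periodically refresh the opponent
        mine, theirs = sample(policy), sample(opponent)
        # reward: +1 for a win, -1 for a loss, 0 for a draw
        reward = 1 if BEATS[mine] == theirs else (-1 if BEATS[theirs] == mine else 0)
        # crude update: shift probability mass toward (or away from) the chosen action
        policy[mine] = max(1e-3, policy[mine] + lr * reward)
        policy = normalize(policy)
    return policy
```

In rock-paper-scissors the self-play dynamics hover around the uniform equilibrium; the point is only the loop structure (snapshot, play, update), which OpenAI Five applies with vastly more compute and a far richer game.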
 
VIBE: Video Inference for Human Body Pose and Shape Estimation
 
Human motion is fundamental to understanding behavior. Despite progress on single-image 3D pose and shape estimation, existing video-based state-of-the-art methods fail to produce accurate and natural motion sequences due to a lack of ground-truth 3D motion data for training. To address this problem, we propose Video Inference for Body Pose and Shape Estimation (VIBE), which makes use of an existing large-scale motion capture dataset (AMASS) together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty is an adversarial learning framework that leverages AMASS to discriminate between real human motions and those produced by our temporal pose and shape regression networks. We define a temporal network architecture and show that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels. We perform extensive experimentation to analyze the importance of motion and demonstrate the effectiveness of VIBE on challenging 3D pose estimation datasets, achieving state-of-the-art performance.
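VIBE's key ingredient is an adversarial objective: a motion discriminator is trained to tell real AMASS motion sequences from regressed ones, while the regressor is trained to fool it. As a minimal sketch of that objective (not the paper's networks or training code), the two losses reduce to standard binary cross-entropy on the discriminator's outputs:

```python
import math

def bce(prediction, target):
    # binary cross-entropy for a single probability in (0, 1)
    eps = 1e-12
    return -(target * math.log(prediction + eps)
             + (1 - target) * math.log(1 - prediction + eps))

def discriminator_loss(d_real, d_fake):
    # the discriminator wants real motions scored 1 and generated motions scored 0
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    # the pose/shape regressor wants its sequences to be scored as real
    return bce(d_fake, 1.0)
```

Minimizing these two losses in alternation is the generic adversarial recipe; VIBE's contribution is applying it at the sequence level, with a temporal discriminator over motion, rather than per frame.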
 
Neural Voice Puppetry: Audio-driven Facial Reenactment
 
We present Neural Voice Puppetry, a novel approach for audio-driven facial video synthesis. Given an audio sequence of a source person or digital assistant, we generate a photo-realistic output video of a target person that is in sync with the audio of the source input. This audio-driven facial reenactment is driven by a deep neural network that employs a latent 3D face model space. Through the underlying 3D representation, the model inherently learns temporal stability while we leverage neural rendering to generate photo-realistic output frames. Our approach generalizes across different people, allowing us to synthesize videos of a target actor with the voice of any unknown source actor or even synthetic voices that can be generated utilizing standard text-to-speech approaches. Neural Voice Puppetry has a variety of use-cases, including audio-driven video avatars, video dubbing, and text-driven video synthesis of a talking head. We demonstrate the capabilities of our method in a series of audio- and text-based puppetry examples. Our method is not only more general than existing works since we are generic to the input person, but we also show superior visual and lip sync quality compared to photo-realistic audio- and video-driven reenactment techniques.
 
Common Voice: A Massively-Multilingual Speech Corpus
 
The Common Voice corpus is a massively-multilingual collection of transcribed speech intended for speech technology research and development. Common Voice is designed for Automatic Speech Recognition purposes but can be useful in other domains (e.g. language identification). To achieve scale and sustainability, the Common Voice project employs crowdsourcing for both data collection and data validation. The most recent release includes 29 languages, and as of November 2019 there are a total of 38 languages collecting data. Over 50,000 individuals have participated so far, resulting in 2,500 hours of collected audio. To our knowledge this is the largest audio corpus in the public domain for speech recognition, both in terms of number of hours and number of languages. As an example use case for Common Voice, we present speech recognition experiments using Mozilla's DeepSpeech Speech-to-Text toolkit. By applying transfer learning from a source English model, we find an average Character Error Rate improvement of 5.99 +/- 5.48 for twelve target languages (German, French, Italian, Turkish, Catalan, Slovenian, Welsh, Irish, Breton, Tatar, Chuvash, and Kabyle). For most of these languages, these are the first ever published results on end-to-end Automatic Speech Recognition.
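The Character Error Rate cited in the abstract is the character-level Levenshtein (edit) distance between the reference transcript and the hypothesis, divided by the reference length. A minimal sketch of that metric (the function name and percentage scaling are our choices, not tied to the DeepSpeech toolkit):

```python
def character_error_rate(reference, hypothesis):
    """CER = character-level edit distance / reference length, as a percentage."""
    m, n = len(reference), len(hypothesis)
    # dp[j] holds the edit distance between a prefix of reference and hypothesis[:j]
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[j] = min(dp[j] + 1,      # deletion
                        dp[j - 1] + 1,  # insertion
                        prev + cost)    # substitution (or match)
            prev = cur
    return 100.0 * dp[n] / max(m, 1)
```

A perfect hypothesis scores 0; one wrong character in a three-character reference scores 33.3.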
 

Blog Posts

This week, 32 blog posts were posted on Best of ML. Below are the top 2 posts of the week.
Bias in Machine Learning: How Facial Recognition Models Show Signs of Racism, Sexism and Ageism
 
Back in 2018, an article by the technology market research firm Counterpoint predicted that over one billion smartphones would be equipped with facial recognition by 2020. Today, Apple, Samsung, Motorola, OnePlus, Huawei, and LG all offer devices that feature facial recognition.

When we pocket our phones and step outside, public spaces are dotted with facial recognition cameras, and hundreds if not thousands of retail stores across the globe use them. Most large retailers are tight-lipped about their use of facial recognition for theft prevention, but articles like this one confirm that big names such as Target and Walmart have already experimented with facial recognition in their stores.
 
Deploying Machine Learning Models as Data, not Code: omega|ml
 
The data science community is on a mission to find the optimal approach to deploying machine learning solutions. My open-source MLOps framework, omega|ml, implements a novel approach: deploying models as data, which increases speed and flexibility while reducing the complexity of the tool chain.
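The "models as data" idea can be sketched in a few lines: a trained model is serialized and stored in a datastore under a name, and a serving process later fetches and rebuilds it, so updating a deployment means writing data rather than shipping code. This is only an illustration of the concept, not omega|ml's API; the plain dict stands in for a real backend such as MongoDB, and `ThresholdModel` is a made-up stand-in for a trained model.

```python
import pickle

class ThresholdModel:
    """A stand-in for a trained model: everything it needs lives in its state."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return int(x >= self.threshold)

# "Deploy" the model by storing its serialized bytes under a versioned name.
store = {}
store["models/threshold-v1"] = pickle.dumps(ThresholdModel(threshold=0.5))

# A serving process later fetches the bytes and rebuilds the model --
# no application code is redeployed, only data is updated.
model = pickle.loads(store["models/threshold-v1"])
```

Swapping in a retrained model is then a single datastore write under the same key, which is where the speed and flexibility claims come from. (In production, pickle's security caveats apply: only unpickle data from trusted sources.)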
 






Monn Ventures · Winterthurerstrasse 649 · Zürich 8051 · Switzerland
