WHAT'S IMPORTANT
📍 OpenAI shared two consequential papers combining language and computer vision, DALL·E and CLIP:
DALL·E
DALL·E is a 12-billion-parameter version of GPT-3 that generates images from text prompts, its name a blend of Pixar's WALL·E and Salvador Dalí. The generative model can manipulate and rearrange objects in a generated image and even create things that don't exist, such as an armchair in the shape of an avocado.
🤖 Make sure to try the online demo on the OpenAI blog (openai.com)
⚡️ Watch a short video summary by Two Minute Papers (8 min on youtube.com)
🐌 If you have the time, follow Yannic Kilcher for an extensive look through the accompanying paper (55 min on youtube.com)
CLIP
The second paper is CLIP, short for Contrastive Language–Image Pre-training, which introduces a zero-shot image classifier. While previous supervised classification models perform well on the classes they were trained on, they don't generalize well beyond them: shown an image of an object that wasn't in the training dataset, they generally do no better than random guessing. CLIP, on the other hand, can identify an impressive range of things it has never seen before, because it matches images against arbitrary text descriptions rather than a fixed set of labels.
🤖 Read the official blog publication by OpenAI (openai.com)
👩‍💻 Try CLIP yourself following this guide from Roboflow (roboflow.com)
🐌 Follow Yannic Kilcher, the one-man paper-discussion group, as he goes through the paper in depth (50 min on youtube.com)
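The zero-shot trick described above can be sketched in a few lines: CLIP embeds the image and a set of candidate text prompts into a shared vector space, then picks the prompt most similar to the image. The toy embeddings below are made up purely for illustration (the real model computes them with trained image and text encoders), but the classification step itself works the same way.

```python
import numpy as np

def normalize(v):
    """Scale vectors to unit length so dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical embedding of a photo of a dog (stand-in for CLIP's image encoder).
image_embedding = normalize(np.array([0.9, 0.1, 0.2]))

# Candidate classes phrased as text prompts, with made-up embeddings
# standing in for the output of CLIP's text encoder.
class_prompts = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text_embeddings = normalize(np.array([
    [0.8, 0.2, 0.1],  # "a photo of a dog"
    [0.1, 0.9, 0.3],  # "a photo of a cat"
    [0.2, 0.1, 0.9],  # "a photo of a car"
]))

# Zero-shot classification: the predicted class is the prompt whose
# embedding has the highest cosine similarity to the image embedding.
similarities = text_embeddings @ image_embedding
prediction = class_prompts[int(np.argmax(similarities))]
print(prediction)  # → a photo of a dog
```

Because the "classes" are just text, you can swap in entirely new labels at inference time without retraining, which is what lets CLIP classify objects it never saw during training.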