DeepMind Builds The First-ever Video-to-Audio AI
Plus: Runway launches new model Gen-3 Alpha, China’s DeepSeek Coder V2 beats GPT-4 Turbo.
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 299th edition of The AI Edge newsletter. This edition features DeepMind’s video-to-audio (V2A) technology that creates soundtracks for silent videos.
And a huge shoutout to our amazing readers. We appreciate you😊
In today’s edition:
🎬 Google DeepMind’s new AI can generate soundtracks for videos
🌟 Runway launches new model Gen-3 Alpha
🚀 China’s DeepSeek Coder V2 beats GPT-4 Turbo
🧠 Knowledge Nugget: Cyborg nostalgia by Nguyen
Let’s go!
Google DeepMind’s new AI can generate soundtracks for videos
DeepMind is developing video-to-audio (V2A) technology to generate rich soundtracks for silent videos generated by AI models. V2A combines video pixels with natural language text prompts to create synchronized audiovisual content. The technology offers enhanced creative control, allowing users to guide the audio output using positive and negative prompts.
What sets DeepMind's V2A apart is its ability to understand raw pixels and generate audio without manual alignment. However, V2A struggles with artifacts or distortions in source videos, and its generated audio is not yet fully convincing. As DeepMind continues to gather feedback from creators and filmmakers, it remains committed to developing this technology responsibly.
Why does it matter?
The technology could help revive and enhance historical footage, silent films, and other archival material. However, generative AI tools like V2A also threaten to disrupt the film and TV industry, potentially eliminating jobs without strong labor protections.
Runway launches new model Gen-3 Alpha
Runway launched Gen-3 Alpha, its latest AI model for generating video clips from text descriptions and still images. Gen-3 Alpha excels at generating expressive human characters with a wide range of actions, gestures, and emotions, and it can interpret various styles and cinematic terminology. However, it has limitations: a maximum video length of 10 seconds, difficulty with complex character and object interactions, and imprecise adherence to the laws of physics.
Runway partnered with entertainment and media organizations to create custom versions of Gen-3 for more stylistically controlled and consistent characters, targeting specific artistic and narrative requirements. They also have implemented safeguards, such as a moderation system to block attempts to generate videos from copyrighted images and a provenance system to identify videos coming from Gen-3.
Why does it matter?
As competition in AI video generation heats up, Runway's Gen-3 Alpha empowers artists and filmmakers to create high-quality, controllable videos with ease, pushing the boundaries of storytelling and creative possibilities.
China’s DeepSeek Coder V2 beats GPT-4 Turbo
Chinese AI startup DeepSeek has announced the release of DeepSeek Coder V2, an open-source code language model. It is built upon the DeepSeek-V2 MoE model and excels at coding and math tasks, supporting over 300 programming languages. It outperforms state-of-the-art closed-source models like GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro, making it the first open-source model to achieve this feat. DeepSeek Coder V2 also maintains comparable performance in general reasoning and language capabilities.
The model is being offered under an MIT license, which allows for research and unrestricted commercial use. It can be downloaded or accessed via API on DeepSeek's platform.
Why does it matter?
DeepSeek aims to "unravel the mystery of AGI with curiosity" and has quickly emerged as a notable Chinese player in the AI race. At only $0.14 per 1M input tokens and $0.28 per 1M output tokens, it will give notable models like GPT-4 Turbo intense competition.
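To see how those listed rates translate into a bill, here is a minimal sketch; the function name and the example token counts are hypothetical, and the rates are simply the ones quoted above.

```python
def deepseek_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate API cost at the listed rates:
    $0.14 per 1M input tokens, $0.28 per 1M output tokens."""
    return input_tokens / 1e6 * 0.14 + output_tokens / 1e6 * 0.28

# e.g. a job consuming 5M input and 1M output tokens:
# deepseek_cost_usd(5_000_000, 1_000_000)  # ≈ $0.98
```

At these prices, even a heavy workload of several million tokens costs well under a dollar, which is the crux of the competitive pressure on closed-source models.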
Enjoying the daily updates?
Refer your pals to subscribe to our daily newsletter and get exclusive access to 400+ game-changing AI tools.
When you use the referral link above or the “Share” button on any post, you'll get the credit for any new subscribers. All you need to do is send the link via text or email or share it on social media with friends.
Knowledge Nugget: Cyborg nostalgia
In her thought-provoking article, Nguyen explores the evolution of AI representation in films, from the dystopian, unruly entities of the 1970s to the more human-like, emotionally complex AIs in contemporary movies. She highlights how early AI films, such as Demon Seed, portrayed AI as a threatening, invasive force that infested people's bodies, while newer films like Her and After Yang focus on the emotional depth and empathy in human-machine relationships. Nguyen argues that this shift reflects the gradual removal of the computer's physicality and the increasing use of software in our daily lives. The author longs for the imaginative "other-ness" and unruly physicality of vintage AI and cyborg films, which explore the fear of the unknown alongside themes of love.
Why does it matter?
Films shape our perceptions and expectations of AI. The current sense of boredom toward AI may stem from the narrow applications promoted by tech giants, which focus on productivity and creative substitution rather than the more imaginative possibilities explored in older films.
What Else Is Happening❗
🔍 Perplexity now displays weather, currency conversion, and simple math directly through cards
This move aims to keep users from going to Google for such results. Perplexity's CEO, Aravind Srinivas, acknowledged that Google handles basic queries like weather, time, and live sports scores well, and his company had work to do in that area. (Link)
🛡️ U.S. government and private sector ran the first AI attack simulation
Federal officials, AI model operators, and cybersecurity companies ran the first joint simulation of a cyberattack on a critical AI system. It also involved experts from private sector companies like Microsoft, Nvidia, and OpenAI. It helped identify potential new threats and establish communication channels between the government and the private sector. (Link)
🚀 Adobe Acrobat got a major upgrade, bringing AI to PDFs and more
Adobe Firefly generative AI enables image generation and editing directly within Acrobat for the first time. Acrobat AI Assistant's new features, "insights across documents" and "enhanced meeting transcripts," help users extract insights and share information from various document types. Adobe is offering free, unlimited access to Acrobat AI Assistant from June 18 to June 28. (Link)
🤖 TikTok introduces gen AI avatars of creators and stock actors for ads
"Custom Avatars" allow creators to scale their likeness for multilingual avatars and brand collaborations, while brands can use pre-built "Stock Avatars" to add a human touch. Plus, the new "AI Dubbing" tool translates content into ten languages, helping creators and brands increase their global reach. (Link)
🧱 Pixelbot 3000 builds Lego art using simple AI prompts
YouTuber Creative Mindstorms designed and built the Pixelbot 3000, a Lego printer that automates the assembly of brick-built mosaics. It uses OpenAI's DALL-E 3 to generate images based on simple text prompts. First, it generates a simplified cartoon-style image, which is then divided into a 32 x 32 grid; the color of the center pixel in each square is sampled to create a high-contrast, scaled-down image for the mosaic. (Link)
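The grid-sampling step described above can be sketched in a few lines of Python with Pillow. This is a hypothetical reconstruction, not the creator's actual code; the function name is ours, and the real build presumably also snaps each sampled color to the available Lego brick palette, which is omitted here.

```python
from PIL import Image

GRID = 32  # mosaic resolution described in the video

def mosaic_colors(img: Image.Image, grid: int = GRID) -> list[list[tuple]]:
    """Divide the image into a grid x grid layout and sample the
    center pixel of each cell to pick one brick color per cell."""
    img = img.convert("RGB")
    w, h = img.size
    cell_w, cell_h = w / grid, h / grid
    colors = []
    for row in range(grid):
        line = []
        for col in range(grid):
            # center of the current cell
            cx = int((col + 0.5) * cell_w)
            cy = int((row + 0.5) * cell_h)
            line.append(img.getpixel((cx, cy)))
        colors.append(line)
    return colors
```

Sampling the center pixel (rather than averaging the cell) keeps colors crisp on the cartoon-style images DALL-E 3 is asked to produce, since flat-colored regions dominate and averaging would blur their edges.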
New to the newsletter?
The AI Edge keeps engineering leaders & AI enthusiasts like you on the cutting edge of AI. From machine learning to ChatGPT to generative AI and large language models, we break down the latest AI developments and how you can apply them in your work.
Thanks for reading, and see you tomorrow. 😊