Google’s VideoPoet Is the Ultimate All-in-One Video AI
Plus: Microsoft Copilot creates music with Suno, Runway introduces text-to-speech and video ratios.
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 172nd edition of The AI Edge newsletter. This edition brings you the ultimate all-in-one video AI, VideoPoet, by Google Research.
And a huge shoutout to our amazing readers. We appreciate you 😊
In today’s edition:
🎥 Google’s VideoPoet is the ultimate all-in-one video AI
🎵 Microsoft Copilot turns your ideas into songs with Suno
💡 Runway introduces text-to-speech and video ratios for Gen-2
📚 Knowledge Nugget: The Busy Person's Introduction to Large Language Models
Let’s go!
Google’s VideoPoet is the ultimate all-in-one video AI
To explore the application of language models in video generation, Google Research introduces VideoPoet, an LLM that is capable of a wide variety of video generation tasks, including:
Text-to-video
Image-to-video
Video editing
Video stylization
Video inpainting and outpainting
Video-to-audio
VideoPoet is a simple modeling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator. It demonstrates state-of-the-art video generation, in particular in producing a wide range of large, interesting, and high-fidelity motions.
Why does this matter?
Leading video generation models are almost exclusively diffusion-based. But VideoPoet uses LLMs’ exceptional learning capabilities across various modalities to generate videos that look smoother and more consistent over time.
Notably, it can also generate audio for video inputs, and it can produce longer-duration clips from a short input context, with strong object identity preservation not seen in prior work.
Microsoft Copilot turns your ideas into songs with Suno
Microsoft has partnered with Suno, a leader in AI-based music creation, to bring its capabilities to Microsoft Copilot. Users can enter prompts into Copilot and have Suno, via a plug-in, bring their musical ideas to life. Suno can generate complete songs, including lyrics, instrumentals, and singing voices.
This will open new horizons for creativity and fun, making music creation accessible to everyone. The experience will begin rolling out to users starting today, ramping up in the coming weeks.
Why does this matter?
While many of the ethical and legal issues around AI-synthesized music have yet to be ironed out, tech giants and startups are increasingly investing in GenAI-based music creation tech. DeepMind and YouTube partnered to release Lyria and Dream Track, Meta has published several experiments, Stability AI and Riffusion have launched platforms and apps; now, Microsoft is joining the movement.
Runway introduces text-to-speech and video ratios for Gen-2
Text-to-speech: Users can now generate voiceovers and dialogue with simple-to-use and highly expressive text-to-speech. It is available for all plans starting today.
Ratios for Gen-2: Quickly and easily change the ratio of your generations to better suit the channels you’re creating for. Choose from 16:9, 9:16, 1:1, 4:3, 3:4.
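To get a feel for what each ratio means in pixels, here is a minimal Python sketch. It is only an illustration: the `frame_size` helper and the 1280-pixel long side are assumptions made for this example, not Runway's API or its actual output resolutions.

```python
from fractions import Fraction

# The five ratios listed in the announcement.
RATIOS = ["16:9", "9:16", "1:1", "4:3", "3:4"]


def frame_size(ratio: str, long_side: int = 1280) -> tuple[int, int]:
    """Return (width, height) for a "W:H" ratio.

    The longer side is pinned to `long_side` pixels (an assumed value,
    chosen only for illustration) and the other side is scaled to match.
    """
    w, h = (int(part) for part in ratio.split(":"))
    r = Fraction(w, h)  # exact width / height
    if r >= 1:
        # Landscape or square: width is the long side.
        return long_side, round(long_side / r)
    # Portrait: height is the long side.
    return round(long_side * r), long_side


if __name__ == "__main__":
    for ratio in RATIOS:
        width, height = frame_size(ratio)
        print(f"{ratio:>5} -> {width} x {height}")
```

With the assumed long side, 16:9 and 9:16 are the same frame rotated, which is why switching between them suits landscape and vertical channels respectively.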
Why does this matter?
These new features add more control and expressiveness to creations inside Runway. Runway also plans to release more updates for improved control over the next few weeks. Certainly, audio and video GenAI is set to take off in the coming year.
Knowledge Nugget: The Busy Person's Introduction to Large Language Models
A large language model is simpler than you might think. Essentially, it boils down to just two files on a computer.
This informative and insightful article aims to break down complex ideas into simple, digestible concepts. Its in-depth explanation of LLMs, inspired by Andrej Karpathy's YouTube video, covers their core components, functionality, training process, generative nature, and evolution from document generators to AI assistants. The article also covers the advancement of LLMs, discussing RLHF, scaling laws, integration of external tools, multimodality, future directions, and more.
Why does this matter?
As we shift our focus to the future of AI, it is essential to understand LLMs and the current academic and research interest in them. This comprehensive guide aims to bring that knowledge to a broader audience and explain the fascinating world of LLMs simply.
What Else Is Happening❗
🌍 Google expands access to AI coding in Colab across 175 locales.
Google announced the expansion of code assistance features to all Colab users, including users on free-of-charge plans. Anyone in eligible locales can now try AI-powered code assistance in Colab.
🔐 Stability AI announces paid membership for commercial use of its models.
Stability AI is now offering a subscription service that standardizes and changes how customers can use its models for commercial purposes. With three tiers, the service aims to strike a balance between profitability and openness.
🎙️ TomTom and Microsoft develop an in-vehicle AI voice assistant.
Digital maps and location tech specialist TomTom partnered with Microsoft to develop an AI voice assistant for vehicles. It enables voice interaction with location search, infotainment, and vehicle command systems. It uses multiple Microsoft products, including Azure OpenAI Service.
🏠 Airbnb is using AI to help clamp down on New Year’s Eve parties globally.
The AI-powered technology will help enforce restrictions on certain NYE bookings in several countries and regions. Airbnb's anti-party measures have led to a decrease in the rate of party reports over NYE, as thousands of people globally were stopped from booking last year.
🤖 AI robot outmaneuvers humans in maze run breakthrough.
Researchers at ETH Zurich have created an AI robot called CyberRunner that they say has surpassed humans at the popular game Labyrinth. It navigates a small metal ball through a maze by tilting the maze's surface, avoiding holes across the board, and it mastered the toy in just six hours.
That's all for now!
If you are new to The AI Edge newsletter, subscribe to get daily AI updates and news directly sent to your inbox for free!
Thanks for reading, and see you tomorrow. 😊