Google Introduces W.A.L.T, AI for Photorealistic Videos
Plus: Runway introduces general world models, Alter3: A humanoid robot using GPT-4
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 166th edition of The AI Edge newsletter. This edition brings you W.A.L.T, AI for photorealistic video generation, by Google Research.
And a huge shoutout to our amazing readers. We appreciate you😊
In today’s edition:
🎥 Google introduces W.A.L.T, AI for photorealistic video generation
🌍 Runway introduces general world models
🤖 Alter3, a humanoid robot generating spontaneous motion using GPT-4
📚 Knowledge Nugget: Why Incumbents LOVE AI
Let’s go!
We need your help!
We are working on a Gen AI survey and would love your input.
It takes just 2 minutes.
The survey insights will help us both.
And hey, you might also win a $100 Amazon gift card!
Every response counts. Thanks in advance!
Google introduces W.A.L.T, AI for photorealistic video generation
Researchers from Google, Stanford, and Georgia Institute of Technology have introduced W.A.L.T, a diffusion model for photorealistic video generation. The model is a transformer trained on image and video generation in a shared latent space. It can generate photorealistic, temporally consistent motion from natural language prompts and also animate any image.
The model rests on two key design decisions. First, it uses a causal encoder to compress images and videos into a shared latent space. Second, for memory and training efficiency, it uses a window attention-based transformer architecture for joint spatial and temporal generative modeling in latent space.
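To make the window-attention idea concrete, here is a minimal PyTorch sketch of self-attention restricted to non-overlapping spatiotemporal windows over a latent video tensor. The shapes, names, and single-head attention are our own simplifications for illustration, not Google's actual code:

```python
import torch
import torch.nn.functional as F

def window_attention(latents, window):
    """Self-attention restricted to non-overlapping spatiotemporal windows.

    latents: (T, H, W, C) latent video, e.g. from a causal encoder.
    window:  (t, h, w) window size; each window attends only to itself,
             which keeps attention cost independent of full video length.
    """
    T, H, W, C = latents.shape
    t, h, w = window
    # Partition the latent volume into windows: (num_windows, t*h*w, C).
    x = latents.reshape(T // t, t, H // h, h, W // w, w, C)
    x = x.permute(0, 2, 4, 1, 3, 5, 6).reshape(-1, t * h * w, C)
    # Plain scaled dot-product attention within each window
    # (real models add projections, multiple heads, position biases).
    attn = F.scaled_dot_product_attention(x, x, x)
    # Undo the partitioning back to (T, H, W, C).
    attn = attn.reshape(T // t, H // h, W // w, t, h, w, C)
    return attn.permute(0, 3, 1, 4, 2, 5, 6).reshape(T, H, W, C)

# Example: 8 latent frames at 16x16 with 64 channels, 2x4x4 windows.
z = torch.randn(8, 16, 16, 64)
out = window_attention(z, (2, 4, 4))
print(out.shape)  # torch.Size([8, 16, 16, 64])
```

Restricting attention to local windows keeps compute from growing with the full spatiotemporal sequence length, which is what makes joint image-and-video training in latent space tractable.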
Why does this matter?
The end of the traditional filmmaking process may be near... W.A.L.T's results are incredibly coherent and stable. While the outputs shown here contain no human-like figures, that may be possible quite soon (we just saw Animate Anyone a few days ago, which can create an animation of a person from a single image).
Runway introduces general world models
Runway is starting a new long-term research effort around what it calls general world models. The belief behind this is that the next major advancement in AI will come from systems that understand the visual world and its dynamics.
A world model is an AI system that builds an internal representation of an environment and uses it to simulate future events within that environment. You can think of Gen-2 as a very early and limited form of a general world model; it still struggles with complex camera or object motions, among other things.
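As a toy illustration of that loop (encode an observation into an internal state, roll the state forward, decode the imagined observations), here is a minimal Python sketch. The class, names, and linear dynamics are entirely made up for illustration and have nothing to do with Runway's actual models:

```python
import numpy as np

class ToyWorldModel:
    """Toy world-model loop: compress observations into an internal
    state, then roll that state forward to 'imagine' future events
    without touching the real environment. Purely illustrative."""

    def __init__(self, state_dim, obs_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.encode_w = rng.normal(size=(obs_dim, state_dim)) * 0.1
        self.dynamics_w = rng.normal(size=(state_dim, state_dim)) * 0.1
        self.decode_w = rng.normal(size=(state_dim, obs_dim)) * 0.1

    def encode(self, obs):
        # Build an internal representation of the environment.
        return np.tanh(obs @ self.encode_w)

    def predict(self, state, steps=1):
        # Simulate future events inside the learned representation.
        states = []
        for _ in range(steps):
            state = np.tanh(state @ self.dynamics_w)
            states.append(state)
        return states

    def decode(self, state):
        # Map an imagined state back to observation space (e.g. frames).
        return state @ self.decode_w

model = ToyWorldModel(state_dim=32, obs_dim=128)
frame = np.random.default_rng(1).normal(size=128)
states = model.predict(model.encode(frame), steps=5)
imagined = [model.decode(s) for s in states]
print(len(imagined), imagined[0].shape)  # 5 (128,)
```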
Why does this matter?
Research in world models has so far focused on very limited and controlled settings: either toy simulated worlds (like those of video games) or narrow contexts (such as world models for driving). Runway aims to represent and simulate the wide range of situations and interactions encountered in the real world, which would also involve building realistic models of human behavior, empowering AI systems further.
Alter3, a humanoid robot generating spontaneous motion using GPT-4
Researchers from the University of Tokyo have integrated GPT-4 into their proprietary android, Alter3, effectively grounding the LLM in the robot's bodily movement.
Typically, low-level robot control is hardware-dependent and falls outside the scope of LLM corpora, presenting challenges for direct LLM-based robot control. However, in the case of humanoid robots like Alter3, direct control is feasible by mapping the linguistic expressions of human actions onto the robot's body through program code.
Remarkably, this approach enables Alter3 to adopt various poses, such as a 'selfie' stance or 'pretending to be a ghost,' and generate sequences of actions over time without explicit programming for each body part. This demonstrates the robot's zero-shot learning capabilities. Additionally, verbal feedback can adjust poses, obviating the need for fine-tuning.
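For intuition, here is a hypothetical Python sketch of that text-to-motion mapping. The `set_joint` API, the joint names, and the hard-coded pose table are stand-ins we invented; in the real system, GPT-4 itself generates the mapping from an action description to motor commands:

```python
# Hypothetical sketch of the text-to-motion loop described above.
# The robot API (set_joint, joint names) is invented for illustration
# and stands in for Alter3's real low-level control interface.

POSES = {
    # An LLM like GPT-4 can emit structured joint targets for a named
    # action, so no per-body-part programming is needed.
    "selfie": {"right_shoulder": 80, "right_elbow": 120, "head_yaw": -20},
    "ghost":  {"left_shoulder": 90, "right_shoulder": 90, "torso_lean": 10},
}

def set_joint(name: str, angle: float) -> None:
    # Stand-in for a hardware call; here we just log the command.
    print(f"set_joint({name!r}, {angle})")

def perform(action: str, feedback: dict | None = None) -> None:
    """Map a linguistic action onto joint commands, then apply verbal
    feedback as incremental corrections instead of fine-tuning."""
    targets = dict(POSES[action])
    if feedback:  # e.g. {"right_elbow": +15} from "raise your arm more"
        for joint, delta in feedback.items():
            targets[joint] = targets.get(joint, 0) + delta
    for joint, angle in targets.items():
        set_joint(joint, angle)

perform("selfie", feedback={"right_elbow": 15})
```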
Why does this matter?
It signifies a step forward in AI-driven robotics. It can foster the development of more intuitive, responsive, and versatile robotic systems that can understand human instructions and dynamically adapt their actions. Advances in this can revolutionize diverse fields, from service robotics to manufacturing, healthcare, and beyond.
Enjoying the daily updates?
Refer your pals to subscribe to our daily newsletter and get exclusive access to 400+ game-changing AI tools.
When you use the referral link above or the “Share” button on any post, you'll get the credit for any new subscribers. All you need to do is send the link via text or email or share it on social media with friends.
Knowledge Nugget: Why Incumbents LOVE AI
Since the release of ChatGPT, we have seen an explosion of startups like Jasper, Writer AI, Stability AI, and more. You might expect them to leave slow-moving incumbents behind.
Far from it: Adobe released Firefly, Intercom launched Fin, heck, even Coca-Cola embraced Stable Diffusion and made a freaking incredible ad!
So why are incumbents and enterprises able to move so quickly? The article offers some brief thoughts:
LLMs are not a new platform: Unlike massive tech AND org shifts like Mobile or Cloud, adopting AI doesn't entail a massive technical or organizational overhaul. It is an enablement shift built on data enterprises already have.
Talent retention is hard…except when AI is involved: AI is a retention tool. For incumbents, the best thing to happen is being able to tell their best engineers, who have been around for a while, that they get to work on something new.
The article also talks about the opportunities ahead.
Why does this matter?
The article emphasizes that incumbents' presence doesn't negate the vast opportunities for AI founders and startups: focus not just on current trends but on leveraging LLMs to redefine architectures and disrupt incumbents fundamentally.
What Else Is Happening❗
🍔An AI chatbot will take your order at more Wendy's drive-thrus.
Wendy’s is expanding its test of an AI-powered chatbot that takes orders at the drive-thru. Franchisees will get the chance to test the product in 2024. The tool, powered by Google Cloud’s AI software, is currently active in four company-operated restaurants near Columbus, Ohio. (Link)
🤝Microsoft and Labor Unions form a ‘historic’ alliance on AI and its work impact.
Microsoft is teaming up with labor unions to create “an open dialogue” on how AI will impact workers. It is forming an alliance with the American Federation of Labor and Congress of Industrial Organizations, which comprises 60 labor unions representing 12.5 million workers. Microsoft will also train workers on how the tech works. (Link)
🇻🇳Nvidia to expand ties with Vietnam, and support AI development.
The chipmaker will expand its partnership with Vietnam's top tech firms and support the country in training talent for developing AI and digital infrastructure. Reuters reported last week that Nvidia was set to discuss cooperation deals on semiconductors with Vietnamese tech companies and authorities in a meeting on Monday. (Link)
🛠️OpenAI is working to make GPT-4 less lazy.
The company acknowledged on Friday that ChatGPT has been phoning it in lately (again) and said it is working on a fix. Then overnight, it made a series of posts about the chatbot training process, saying it must evaluate the model using certain metrics (AI benchmarks, you might say) and calling the process “an artisanal multi-person effort.” (Link)
🚀Nvidia emerges as a leading investor in AI companies.
Nvidia, the world's most valuable chipmaker, has participated in 35 deals in 2023, almost six times more than last year. It is seeking to capitalize on its position as the dominant provider of AI processors, investing in everything from big new AI platforms valued in the billions of dollars to smaller start-ups applying AI to industries such as healthcare and energy. (Link)
That's all for now!
Subscribe now to join the prestigious readership of The AI Edge alongside professionals from Moody’s, Vonage, Voya, WEHI, Cox, INSEAD, and other top organizations.
Thanks for reading, and see you tomorrow. 😊