Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 174th edition of The AI Edge newsletter. This edition brings you Meta’s new research in video generation, Fairy.
And a huge shoutout to our amazing readers. We appreciate you😊
In today’s edition:
🎥 Meta’s Fairy can generate videos 44x faster
🤖 NVIDIA presents a new text-to-4D model
🌟 Midjourney V6 has enhanced prompting and coherence
📚 Knowledge Nugget: NLP Research in the Era of LLMs
Let’s go!
Meta’s Fairy can generate videos 44x faster
Meta’s GenAI research team has introduced Fairy, a minimalist yet robust adaptation of image-editing diffusion models, enhancing them for video editing applications. Fairy not only addresses the memory and processing-speed limitations of previous models but also improves temporal consistency through a unique data augmentation strategy.
Remarkably efficient, Fairy generates 120-frame 512x384 videos (4-second duration at 30 FPS) in just 14 seconds, outpacing prior works by at least 44x. A comprehensive user study, involving 1000 generated samples, confirms that the approach delivers superior quality, decisively outperforming established methods.
Why does this matter?
Fairy offers a transformative approach to video editing, building on the strengths of image-editing diffusion models. It tackles the memory and processing-speed constraints observed in preceding models while also improving quality, firmly establishing its superiority, as further corroborated by the extensive user study.
NVIDIA presents a new text-to-4D model
NVIDIA research presents Align Your Gaussians (AYG) for high-quality text-to-4D dynamic scene generation. It can generate diverse, vivid, detailed and 3D-consistent dynamic 4D scenes, achieving state-of-the-art text-to-4D performance.
AYG uses dynamic 3D Gaussians with deformation fields as its dynamic 4D representation. An advantage of this representation is its explicit nature, which allows different dynamic 4D assets to be easily composed into large scenes. AYG's dynamic 4D scenes are generated through score distillation, leveraging composed text-to-image, text-to-video, and 3D-aware text-to-multiview-image latent diffusion models.
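To make the score-distillation idea concrete, here is a minimal, illustrative PyTorch loop. This is not AYG's actual code: `NoisePredictor` and `render` are placeholder stand-ins for a frozen text-conditioned diffusion model and a differentiable renderer of deformable 3D Gaussians, and a real system would compose gradients from the text-to-image, text-to-video, and multiview models described above.

```python
# Minimal, illustrative score-distillation sketch (placeholders, not AYG's code).
import torch
import torch.nn as nn

class NoisePredictor(nn.Module):
    """Placeholder for a frozen, text-conditioned diffusion model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, 3, padding=1)
    def forward(self, noisy_image, t):
        return self.net(noisy_image)  # a real model also takes a text embedding

def render(scene_params):
    """Placeholder differentiable renderer: scene parameters -> image batch."""
    return torch.tanh(scene_params)

scene_params = torch.randn(1, 3, 64, 64, requires_grad=True)  # toy "4D scene"
diffusion = NoisePredictor().eval()
for p in diffusion.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam([scene_params], lr=1e-2)
for step in range(100):
    image = render(scene_params)
    t = torch.randint(1, 1000, (1,))
    alpha = 1.0 - t.item() / 1000.0                     # toy noise schedule
    eps = torch.randn_like(image)
    noisy = alpha ** 0.5 * image + (1 - alpha) ** 0.5 * eps
    with torch.no_grad():
        eps_pred = diffusion(noisy, t)
    # Score-distillation gradient: nudge the rendering toward what the
    # frozen diffusion model considers a plausible sample for the prompt.
    grad = eps_pred - eps
    loss = (grad * image).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key design choice is that the diffusion model stays frozen and only the scene parameters receive gradients, which is what lets one prompt-conditioned prior "sculpt" an explicit 4D representation.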
Why does this matter?
AYG can open up promising new avenues for animation, simulation, digital content creation, and synthetic data generation. It takes a step beyond the text-to-3D synthesis literature by also capturing our world's rich temporal dynamics.
Midjourney V6 has improved prompting and image coherence
Midjourney has started alpha-testing its V6 models. Here is what's new in MJ V6:
Much more accurate prompt following as well as longer prompts
Improved coherence, and model knowledge
Improved image prompting and remix
Minor text drawing ability
Improved upscalers, with both 'subtle' and 'creative' modes (increases resolution by 2x)
An entirely new prompting method has been developed, so users will need to re-learn how to prompt.
Why does this matter?
By the looks of it on social media, users seem to like version 6 much better. Midjourney’s prompting had long been somewhat esoteric and technical, which now changes. Plus, in-image text is something that has eluded Midjourney since its release in 2022, even as rival AI image generators such as OpenAI’s DALL-E 3 and Ideogram have launched this type of feature.
We need your help!
We are working on a Gen AI survey and would love your input.
It takes just 2 minutes.
The survey insights will help us both.
And hey, you might also win a $100 Amazon gift card!
Every response counts. Thanks in advance!
Knowledge Nugget: NLP Research in the Era of LLMs
NLP research has undergone a paradigm shift over the last year. A range of LLMs has validated the unreasonable effectiveness of scale. Currently, the state of the art on most benchmarks is held by LLMs that are expensive to fine-tune and prohibitive to pre-train outside of a few industry labs.
So researchers today are faced with a constraint that is much harder to overcome: compute.
What research is left for academics, PhD students, and newcomers to NLP without deep pockets? Should they focus on the analysis of black-box models and niche topics ignored by LLM practitioners?
In this article, the author first argues why the current state of research is not as bleak as it seems, but rather the opposite. He then highlights five research directions that are important for the field and do not require much compute, taking inspiration from various reviews of research directions in the era of LLMs.
Why does this matter?
Rather than waiting for compute costs to go down, making LLMs more efficient can have a wide impact. This post presents a selection of five research directions that are particularly crucial and points to further opportunities worth exploring in AI research.
What Else Is Happening❗
🆕Google AI research has developed 'Hold for Me' and a Magic Eraser update.
It is an AI-driven technology that processes audio directly on your Pixel device and can determine whether you've been placed on hold or if someone has picked up the call. Also, Magic Eraser now uses gen AI to fill in details when users remove unwanted objects from photos. (Link)
💬Google is rolling out ‘AI support assistant’ chatbot to provide product help.
When visiting the support pages for some Google products, now you’ll encounter a “Hi, I’m a new Al support assistant. Chat with me to find answers and solve account issues” dialog box in the bottom-right corner of your screen. (Link)
🏆Dictionary.com selected "hallucinate" as its 2023 Word of the Year.
This points to its AI context, meaning "to produce false information and present it as fact." AI hallucinations are important for the broader world to understand. (Link)
❤️Chatty robot helps seniors fight loneliness through AI companionship.
The robot ElliQ's creators, Intuition Robotics, and senior-assistance officials say it is the only device using AI specifically designed to lessen the loneliness and isolation experienced by many older Americans. (Link)
📉Google Gemini Pro falls behind free ChatGPT, says study.
A recent study by Carnegie Mellon University (CMU) shows that Google's latest large language model, Gemini Pro, lags behind GPT-3.5 and far behind GPT-4 in benchmarks. The results contradict the information provided by Google at the Gemini presentation. This highlights the need for neutral benchmarking institutions or processes. (Link)
That's all for now!
If you are new to The AI Edge newsletter, subscribe to get daily AI updates and news directly sent to your inbox for free!
Thanks for reading, and see you tomorrow. 😊