Meta’s Audiobox Advances Controllability for AI Audio
Plus: Mozilla's llamafile turns LLMs into single-file executables, and Alibaba's Animate Anyone pushes AI character animation forward.
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 160th edition of The AI Edge newsletter. This edition brings you Meta’s new foundation research model for audio generation.
And a huge shoutout to our amazing readers. We appreciate you😊
In today’s edition:
🧠 Meta’s Audiobox advances controllability for AI audio
📁 Mozilla lets you turn LLMs into single-file executables
🚀 Alibaba’s Animate Anyone may be the next breakthrough in AI animation
📚 Knowledge Nugget: Venture Capital's Race for AI Supremacy in 2023 Explained
Let’s go!
Meta’s Audiobox advances controllability for AI audio
Audiobox is Meta's new foundation research model for audio generation. The successor to Voicebox, it advances generative AI for audio by unifying generation and editing capabilities for speech, sound effects (short, discrete sounds like a dog bark, a car horn, or a crack of thunder), and soundscapes, using a variety of input mechanisms to maximize controllability.
Most notably, Audiobox lets you use natural language prompts to describe the sound or type of speech you want. You can also combine those prompts with voice inputs, making it easy to create custom audio for a wide range of use cases.
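Audiobox has no public API at the time of writing, so the sketch below is purely hypothetical: the `generate_audio` function and its parameters are invented to illustrate the two input mechanisms described above (a natural-language description, optionally paired with a reference voice sample), not a real interface.

```python
# Hypothetical sketch only: Audiobox exposes no public Python API.
# This stub just illustrates the described input mechanisms.

from typing import Optional

def generate_audio(description: str,
                   voice_sample: Optional[str] = None,
                   text: Optional[str] = None) -> bytes:
    """Hypothetical entry point returning generated audio bytes.

    description  -- natural-language prompt, e.g. a soundscape or a
                    speaking style ("speaks sadly, in a cathedral")
    voice_sample -- optional path to a reference recording to restyle
    text         -- optional transcript for speech generation
    """
    raise NotImplementedError("illustrative stub only")

# A soundscape from a pure text description:
# generate_audio("a running river with birds chirping nearby")

# Speech that restyles a reference voice:
# generate_audio("whispers urgently", voice_sample="me.wav",
#                text="The meeting starts in five minutes.")
```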
Why does this matter?
Audiobox demonstrates state-of-the-art controllability in AI speech and sound-effect generation. With it, developers can build a wider, more dynamic range of audio use cases without needing deep domain expertise. It could transform diverse media, from movies and podcasts to audiobooks and video games.
(Source)
Mozilla lets you turn LLMs into single-file executables
LLMs for local use are usually distributed as a set of weights in a multi-gigabyte file. Those weights cannot run on their own: they need separate inference software, which makes models harder to distribute and run than ordinary programs. A given model may also have undergone changes and tweaks, so different versions can produce different results.
To help with that, Mozilla's innovation group has released llamafile, an open-source tool that combines llama.cpp with Cosmopolitan Libc to turn a set of weights into a single binary that runs on six OSs (macOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD) without needing to be installed. This makes it dramatically easier to distribute and run LLMs and ensures that a particular version of an LLM remains consistent and reproducible forever. A minimal usage sketch follows below.
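As a concrete picture of the workflow, here is a minimal sketch of driving a llamafile from Python. The filename is a placeholder for whatever llamafile you download (and mark executable with `chmod +x`); the HTTP endpoint and JSON fields follow llama.cpp's built-in server, which llamafile embeds and which listens on localhost:8080 by default.

```python
# Minimal sketch: run a llamafile and query its embedded server.
# "mistral-7b.llamafile" is a placeholder for a real download.

import json
import subprocess
import time
import urllib.request

# Launch the self-contained binary -- no install step needed.
server = subprocess.Popen(["./mistral-7b.llamafile"])
time.sleep(10)  # crude wait for the embedded server to come up

try:
    # llama.cpp's server accepts POST /completion with a JSON body.
    req = urllib.request.Request(
        "http://localhost:8080/completion",
        data=json.dumps({"prompt": "Q: What is a llamafile?\nA:",
                         "n_predict": 64}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["content"])
finally:
    server.terminate()
```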
Why does this matter?
This makes open-source LLMs much more accessible to both developers and end users, allowing them to run models on their own hardware easily.
Alibaba’s Animate Anyone may be the next breakthrough in AI animation
Alibaba Group researchers have proposed a novel framework tailored for character animation: Animate Anyone (Consistent and Controllable Image-to-Video Synthesis for Character Animation).
Despite diffusion models' robust generative capabilities, challenges persist in image-to-video synthesis, especially in character animation, where maintaining temporal consistency in fine details remains a formidable problem.
This framework leverages the power of diffusion models. To preserve the intricate details of reference images, it uses a ReferenceNet that merges detail features via spatial attention (sketched below). To ensure controllability and continuity, it introduces an efficient pose guider. The method achieves state-of-the-art results on benchmarks for fashion video and human dance synthesis.
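To make the ReferenceNet idea concrete, here is a toy sketch of the spatial-attention merge as the paper describes it: reference-image features are concatenated with the denoising U-Net's features along the spatial dimension, self-attention runs over the combined sequence, and only the denoising half of the output is kept. This is an illustrative simplification, not the authors' code; all shapes and dimensions are made up.

```python
# Toy sketch of ReferenceNet-style spatial attention (illustrative
# only; shapes and channel counts are invented for the example).

import torch
import torch.nn as nn

class ReferenceSpatialAttention(nn.Module):
    def __init__(self, dim: int = 320, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
        # x:   (batch, h*w, dim) features from the denoising U-Net
        # ref: (batch, h*w, dim) ReferenceNet features at the same layer
        combined = torch.cat([x, ref], dim=1)   # join along spatial axis
        out, _ = self.attn(combined, combined, combined)
        return out[:, : x.shape[1]]             # keep the denoising half

# Toy shapes: batch 2, a 32x32 feature map, 320 channels.
x = torch.randn(2, 32 * 32, 320)
ref = torch.randn(2, 32 * 32, 320)
print(ReferenceSpatialAttention()(x, ref).shape)  # (2, 1024, 320)
```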
Why does this matter?
This could mark the beginning of the end for TikTok and Instagram as we know them. Some inconsistencies are still noticeable, but the results are more stable and consistent than earlier AI character animators, and given time to advance, they could look scarily real.
Enjoying the daily updates?
Refer your pals to subscribe to our daily newsletter and get exclusive access to 400+ game-changing AI tools.
When you use the referral link above or the “Share” button on any post, you'll get the credit for any new subscribers. All you need to do is send the link via text or email or share it on social media with friends.
Knowledge Nugget: Venture Capital's Race for AI Supremacy in 2023 Explained
We’ve seen a lot of high-octane hype trends in recent years: fintech, web3, VR, cloud, and more. Will Generative AI be any different?
Since Big Tech firms stand out here, this article shows how to visualize Big Tech's investments in Generative AI. In it, the author dives deep into the funding, aggressive investments, and strategic moves tech giants are making in a race with venture capital firms. It also covers the potential impact of GenAI on various industries, controversies, leadership issues, and much more.
Why does this matter?
Most importantly, the article discusses the role of venture capital in shaping the future of AI, fueling the growth and development of generative AI technology.
What Else Is Happening❗
🤖OpenAI to buy $51M worth of AI chips from a startup backed by CEO Sam Altman.
Documents show that OpenAI signed a letter of intent to spend $51 million on brain-inspired chips developed by startup Rain. OpenAI CEO Sam Altman previously made a personal investment in Rain. (Link)
📌Pinterest begins testing a ‘body type ranges’ tool to make searches more inclusive.
It will allow users to filter select searches by different body types. The feature, which will work with women’s fashion and wedding ideas at launch, builds on Pinterest’s new body type AI technology announced earlier this year. (Link)
📈Intel neural-chat-7b model achieves top ranking on LLM leaderboard.
At 7 billion parameters, neural-chat-7b is at the low end of today’s LLM sizes, yet it achieved accuracy scores comparable to models 2-3x larger. And while it was fine-tuned using Intel Gaudi 2 AI accelerators, its small size means you can deploy it to a wide range of compute platforms. (Link)
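As a sketch of what "deploy anywhere" looks like in practice, here is a minimal loading example with Hugging Face transformers. It assumes the checkpoint is published on the Hub under the ID shown; treat that ID (and the memory figures) as assumptions to verify before running.

```python
# Sketch: load the 7B model with transformers on commodity hardware
# (roughly 16 GB of RAM in float16). The Hub ID below is an assumption
# based on Intel's release naming; verify it before running.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Intel/neural-chat-7b-v3-1"  # assumed Hub ID
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain what an AI accelerator is in one sentence."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```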
🖼️Leonardo AI in real time is here, with two tiers for now.
Paid users get a "Realtime" mode that updates as you paint and as you move objects; free users get an "Interactive" mode that updates at the end of a brush stroke or once you let go of an object. The paid tier is live now, and the free tier goes live soon. (Link)
🔁Google delayed the Gemini launch from this week to January.
Gemini was supposed to get a big reveal this week with a series of events in California, New York, and Washington aimed at politicians and policymakers. Sundar Pichai decided to delay the launch after Google “found the AI didn’t reliably handle some non-English queries.” (Link)
That's all for now!
If you are new to The AI Edge newsletter, subscribe to get daily AI updates and news directly sent to your inbox for free!
Thanks for reading, and see you tomorrow. 😊