Apple Quietly Releases A Multimodal LLM
Plus: Microsoft introduces WaveCoder, Alibaba announces TF-T2V for text-to-video.
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 176th edition of The AI Edge newsletter. This edition brings you Apple’s unexpected entry into the open-source LLM landscape with Ferret.
And a huge shoutout to our amazing readers. We appreciate you😊
In today’s edition:
🎥 Apple quietly released an open-source multimodal LLM in October
🎵 Microsoft introduces WaveCoder, a fine-tuned Code LLM
💡 Alibaba announces TF-T2V for text-to-video generation
📚 Knowledge Nugget: AI & productivity: the economic effects
Let’s go!
Apple quietly released an open-source multimodal LLM in October
Researchers from Apple and Columbia University released an open-source multimodal LLM called Ferret in October 2023. At the time, the release, which included the code and weights but was licensed for research use only and not for commercial use, did not receive much attention.
The chatter increased recently after Apple announced a key breakthrough in deploying LLMs on iPhones: it released two new research papers introducing new techniques for 3D avatars and efficient language model inference. The advancements were hailed as potentially enabling more immersive visual experiences and allowing complex AI systems to run on consumer devices such as the iPhone and iPad.
Why does this matter?
Ferret is Apple’s unexpected entry into the open-source LLM landscape. Also, with open-source models from Mistral making recent headlines and Google’s Gemini model coming to the Pixel 8 Pro and eventually to Android more broadly, there has been increased chatter about the potential for local LLMs to power small devices.
Microsoft introduces WaveCoder, a fine-tuned Code LLM
New Microsoft research studies the effect of multi-task instruction data on the generalization ability of Code LLMs. It introduces CodeOcean, a dataset with 20K instruction instances covering four universal code-related tasks.
This method and dataset enable WaveCoder, which significantly improves the generalization ability of foundation models on diverse downstream tasks. WaveCoder has shown the best generalization ability among open-source models on code repair and code summarization tasks, while maintaining high efficiency on previous code generation benchmarks.
Why does this matter?
This research offers a significant contribution to the field of instruction data generation and fine-tuning models, providing new insights and tools for enhancing performance in code-related tasks.
Alibaba announces TF-T2V for text-to-video generation
Diffusion-based text-to-video generation has made impressive progress in the past year, yet it still lags behind text-to-image generation. One of the key reasons is the limited scale of publicly available data, given the high cost of video captioning. By contrast, collecting unlabeled clips from video platforms like YouTube could be far easier.
Motivated by this, Alibaba Group’s researchers have come up with a novel text-to-video generation framework, termed TF-T2V, which can learn directly from text-free videos. They also explore the framework's scaling trend. Experimental results demonstrate the effectiveness and potential of TF-T2V in terms of fidelity, controllability, and scalability.
Why does this matter?
Most prior works rely heavily on video-text data and train models on widely used datasets that are watermarked and low-resolution. TF-T2V instead opens up new possibilities for optimizing with text-free videos or partially paired video-text data, making it more scalable and versatile in widespread scenarios such as high-definition video generation.
We need your help!
We are working on a Gen AI survey and would love your input.
It takes just 2 minutes.
The survey insights will help us both.
And hey, you might also win a $100 Amazon gift card!
Every response counts. Thanks in advance!
Knowledge Nugget: AI & productivity: the economic effects
There’s a report from Goldman Sachs titled “Generative AI could raise global GDP by 7%.”
One of the bullish claims about AI is that it will generate a sustained period of higher economic productivity. But how accurate are the forecasts? Any of the variables could be off.
There is something to the claim that AI increases productivity. Just look at all the programmers who tell us how amazing GitHub Copilot is. Here’s research from GitHub about how productive users of…GitHub Copilot are:
This article is the author’s take on the economic effects of AI on productivity. It touches on the uncertain nature of forecasts, drawing parallels with his experience as a financial analyst.
Why does this matter?
It emphasizes that the future is too high variance for forecasts to be of much use. What is more useful to know is this: AI technology is rapidly improving, and it is making many people much more productive at their jobs. Over time this will affect the economy in ways which will surprise us, and which will redound to our benefit.
What Else Is Happening❗
📱Apple’s iPhone design chief enlisted by Jony Ive & Sam Altman to work on AI devices.
Sam Altman and legendary designer Jony Ive are enlisting Apple Inc. veteran Tang Tan to work on a new AI hardware project to create devices with the latest capabilities. Tan will join Ive’s design firm, LoveFrom, which will shape the look and capabilities of the new products. Altman plans to provide the software underpinnings. (Link)
🤖Microsoft Copilot AI gets a dedicated app on Android; no sign-in required.
Microsoft released a new dedicated app for Copilot on Android devices. The free app is available for download today, and an iOS version will launch soon. Unlike Bing, the app focuses solely on delivering access to Microsoft’s AI chat assistant. There’s no clutter from Bing’s search experience or rewards, but you will still find ads. (Link)
🌐Salesforce posts a new AI-enabled commercial promoting “Ask More of AI”.
It is part of its “Ask More of AI” campaign featuring Salesforce pitchman and ambassador Matthew McConaughey. (Link)
📚AI is telling bedtime stories to your kids now.
AI can now tell tales featuring your kids' favorite characters. However, it's copyright chaos, and a major headache for parents and guardians. One such story generator, called Bluey-GPT, begins each session by asking kids their name, age, and a bit about their day, then churns out personalized tales starring Bluey and her sister Bingo. (Link)
🧙‍♂️Researchers have a magic tool to understand AI: Harry Potter.
J.K. Rowling’s Harry Potter is finding renewed relevance in a very different body of literature: AI research. A growing number of researchers are using the best-selling series to test how generative AI systems learn and unlearn certain pieces of information. A notable recent example is a paper titled “Who's Harry Potter?”. (Link)
That's all for now!
If you are new to The AI Edge newsletter, subscribe to get daily AI updates and news directly sent to your inbox for free!
Thanks for reading, and see you tomorrow. 😊
Keep up the good work! Thanks!
Hey, thanks for sharing my piece about AI & productivity. I wouldn't quite say that forecasts are of no use. I think that they become problematic when they're overly specific. "AI will increase productivity because of factors a, b, c..." is, I think, reasonable. "AI will increase productivity by 5.67% over the next 5 years, and 10.7% in perpetuity thereafter" is...useless. A forecast should indicate general trends, or directional arrows of progress (to use Josh Wolfe's phrase).