NVIDIA’s Parakeet Beats OpenAI's Whisper v3
Plus: Tencent released LLaMA-Pro-8B, and TinyLlama, a 1.1B Llama model trained on 3T tokens.
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 183rd edition of The AI Edge newsletter. This edition brings you NVIDIA’s latest open-source speech recognition model that beats OpenAI’s Whisper v3.
And a huge shoutout to our amazing readers. We appreciate you😊
In today’s edition:
🎙️ NVIDIA’s Parakeet Beats OpenAI's Whisper v3
🚀 Tencent released LLaMA-Pro-8B on Hugging Face
🦙 TinyLlama: A 1.1B Llama model trained on 3 trillion tokens
📚 Knowledge Nugget: Fine Tuning Mistral 7B on Magic the Gathering Drafts
Let’s go!
NVIDIA’s Parakeet Beats OpenAI's Whisper v3
NVIDIA NeMo, a leading open-source toolkit for conversational AI, has released Parakeet, a family of state-of-the-art automatic speech recognition (ASR) models capable of transcribing spoken English with exceptional accuracy.
Developed in collaboration with Suno.ai, the four Parakeet models range from 0.6 to 1.1 billion parameters and were trained on 64,000 hours of audio spanning diverse accents, vocal ranges, and acoustic conditions. They are resilient to non-speech segments such as music and silence, which helps prevent hallucinated transcripts, and they post state-of-the-art numbers on the Hugging Face Open ASR Leaderboard.
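If you want to kick the tires, here’s a minimal sketch of transcribing an audio file with one of the Parakeet checkpoints via NeMo; the model id and audio path are assumptions, so check the NeMo docs and Hugging Face model cards for the exact names:

```python
# Minimal sketch: transcribe English audio with a Parakeet checkpoint via NVIDIA NeMo.
# The model id and audio file below are assumptions; confirm them on the model card.
import nemo.collections.asr as nemo_asr

# Load a pretrained Parakeet model (here, an assumed 1.1B CTC variant) from the hub.
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-ctc-1.1b")

# Transcribe one or more 16 kHz mono WAV files; returns one transcript per file.
transcripts = asr_model.transcribe(["sample_audio.wav"])
print(transcripts[0])
```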
Why does this matter?
Parakeet is a major step forward in the evolution of conversational AI. Its accuracy, coupled with the flexibility and ease of use offered by NeMo, empowers developers to create more natural and intuitive voice-powered applications. The possibilities are endless, from enhancing the accuracy of virtual assistants to enabling seamless real-time communication.
Tencent released LLaMA-Pro-8B on Hugging Face
Developed by Tencent’s ARC Lab, LLaMA-Pro is an 8.3-billion-parameter model. It is a progressive version of the original LLaMA model, expanded with additional Transformer blocks and further trained on code and math corpora totaling 80 billion tokens.
It integrates general language understanding with domain-specific knowledge, particularly in programming and mathematics. LLaMA-Pro and its instruction-following counterpart (LLaMA-Pro-Instruct) achieve strong results across a range of benchmarks, outperforming existing open models in the LLaMA family and demonstrating considerable potential for reasoning and tackling diverse tasks as an intelligent agent.
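Since the weights are on Hugging Face, a minimal sketch of trying the model with transformers might look like this (the repo id is an assumption; confirm it on the model card):

```python
# Minimal sketch: run LLaMA-Pro-8B locally with Hugging Face transformers.
# The repo id is an assumption; confirm the exact name on Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TencentARC/LLaMA-Pro-8B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# A code-flavored prompt, playing to the model's programming and math training.
prompt = "Write a Python function that returns the nth Fibonacci number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```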
Why does this matter?
Humans generally acquire new skills without compromising the old, which LLMs can’t (e.g., from LLaMA to CodeLLaMA). This novel method of post-pretraining efficiently and effectively improves the model’s knowledge without catastrophic forgetting. It balances both general and specific capabilities, laying a solid foundation for developing advanced AI agents that operate effectively in various environments.
TinyLlama: A 1.1B Llama model trained on 3 trillion tokens
New research has introduced TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. With the same architecture and tokenizer as Llama 2, TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention), achieving better computational efficiency.
Despite its relatively small size, TinyLlama demonstrates remarkable performance in a series of downstream tasks. It significantly outperforms existing open-source language models with comparable sizes.
Why does this matter?
Small Language Models (SLMs) are attracting a lot of attention for their computational efficiency, adaptability, and accessibility. Contributing further, TinyLlama can enable end-user applications on mobile devices and serve as a lightweight platform for testing a wide range of innovative AI ideas.
Additionally, TinyLlama can be dropped into many open-source projects built upon Llama as a plug-and-play replacement.
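As a rough illustration of that plug-and-play quality, here’s a minimal sketch of using TinyLlama’s chat variant in a standard transformers text-generation pipeline (the repo id is an assumption; confirm it on Hugging Face):

```python
# Minimal sketch: use TinyLlama's chat variant in a standard transformers pipeline.
# The repo id is an assumption; confirm it on Hugging Face.
from transformers import pipeline

pipe = pipeline("text-generation",
                model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # assumed repo id
                device_map="auto")

# Format a chat message with the model's built-in chat template, then generate.
messages = [{"role": "user", "content": "Summarize why small language models are useful."}]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(pipe(prompt, max_new_tokens=128)[0]["generated_text"])
```

Because it shares Llama 2’s architecture and tokenizer, the same checkpoint should slot into most Llama-compatible tooling with little or no modification.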
Enjoying the daily updates?
Refer your pals to subscribe to our daily newsletter and get exclusive access to 400+ game-changing AI tools.
When you use the referral link above or the “Share” button on any post, you'll get the credit for any new subscribers. All you need to do is send the link via text or email or share it on social media with friends.
Knowledge Nugget: Fine Tuning Mistral 7B on Magic the Gathering Drafts
Fine-tuning is enticing: it promises to fill the gaps in GPT-4’s capabilities while also being faster and cheaper. But there is surprisingly little content out there to help reason about how effective fine-tuning is and how hard it is to successfully fine-tune new capabilities into language models.
To test a model’s ability to reason (i.e., perform a somewhat complex task that requires deep contextual understanding) about out-of-distribution (i.e., unseen) data, the author turns to his hobby: Magic the Gathering (specifically, drafting).
Why does this matter?
As intended, the experiment demonstrates the effectiveness of fine-tuning. But more importantly, it serves as a practical case study full of valuable insights, revealing that getting fine-tuning “right” is a fundamentally experimental process and may require a specialized skill set (in particular, one harder to learn than prompt engineering).
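For a feel of what such an experiment involves, here is a minimal, illustrative sketch of LoRA fine-tuning Mistral 7B with Hugging Face transformers and peft. The dataset file, its “text” field, and the hyperparameters are assumptions for illustration, not the setup from the original write-up:

```python
# Minimal, illustrative LoRA fine-tuning sketch for Mistral 7B.
# The dataset file, its "text" field, and all hyperparameters are assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Attach small LoRA adapters to the attention projections instead of updating all 7B weights.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

# Hypothetical JSONL file where each line has a "text" field (draft-pick prompt + completion).
dataset = load_dataset("json", data_files="draft_picks.jsonl", split="train")
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                      remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mistral-mtg-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=2e-4, logging_steps=10),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Much of the “experimental” work the article describes lives outside a snippet like this: formatting the draft data, deciding exactly what the model should predict, and iterating on hyperparameters until the behavior is right.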
What Else Is Happening❗
🖼️Microsoft is adding a new image AI feature to Windows 11 Copilot.
The new “add a screenshot” button in the Copilot panel lets you capture the screen and upload it directly to the Copilot or Bing panel. You can then ask Bing Chat to discuss the screenshot or answer questions about it. The feature is rolling out to the general public but may be available only to select users for now. (Link)
🚗Ansys collaborates with Nvidia to improve sensors for autonomous cars.
Pittsburgh-based Ansys, a simulation software company, has made its Ansys AVxcelerate Sensors available within Nvidia Drive Sim, a scenario-based autonomous vehicle (AV) simulator powered by Nvidia’s Omniverse. The integration gives car makers access to highly accurate sensor simulation outputs. (Link)
🗣️New version of Siri with generative AI is again rumored for WWDC.
Apple is preparing to preview a new version of Siri with generative AI and a range of new capabilities at its Worldwide Developers Conference (WWDC), according to a user (on Naver) with a track record of posting accurate Apple rumors. The revamped Siri is reportedly Ajax-based and touts natural conversation capabilities as well as increased user personalization. (Link)
🛡️NIST identifies types of cyberattacks that manipulate behavior of AI systems.
In a new publication, computer scientists from the National Institute of Standards and Technology (NIST) identify the types of attacks adversaries can use to deliberately confuse or even “poison” AI and ML systems. A collaboration among government, academia, and industry, the publication is intended to help AI developers and users understand the attacks they might expect, along with approaches to mitigate them, with the understanding that there is no silver bullet. (Link)
🧬Isomorphic Labs partners with pharma giants to discover new medications with AI.
Isomorphic Labs, the London-based, drug-discovery-focused spin-out of Google’s AI R&D division DeepMind, has partnered with pharmaceutical giants Eli Lilly and Novartis to apply AI to discovering new medications for treating disease. The collaboration harnesses the companies’ unique strengths to unlock new possibilities in AI-driven drug discovery. (Link)
New to the newsletter?
The AI Edge keeps engineering leaders & AI enthusiasts like you on the cutting edge of AI. From machine learning to ChatGPT to generative AI and large language models, we break down the latest AI developments and how you can apply them in your work.
Thanks for reading, and see you tomorrow. 😊