Meta’s 400B+ Llama 3: A Watershed Moment for AI
Plus: Mistral AI has unveiled Mixtral 8x22B, Meta researchers propose an alternative to transformers.
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 257th edition of The AI Edge newsletter. This edition brings you Meta’s new, open-source Llama 3 models.
And a huge shoutout to our amazing readers. We appreciate you😊
In today’s edition:
🦙 Meta’s Llama 3 models are here; 400B+ models in training!
📈 Mixtral 8x22B claims highest open-source performance and efficiency
🦈 Meta’s Megalodon to solve the fundamental challenges of the Transformer
📚 Knowledge Nugget: An introduction to Large Language Models
Let’s go!
Meta’s Llama 3 models are here; 400B+ models in training!
Llama 3 is finally here! Meta introduced the first two models of the Llama 3 family for broad use: pretrained and instruction-fine-tuned language models with 8B and 70B parameters. Meta claims these are the best open models available today at the 8B and 70B parameter scales, with greatly improved reasoning, code generation, and instruction following, making Llama 3 more steerable.
But that’s not all. Meta is also training larger models with over 400B parameters. Over the coming months, it will release multiple models with new capabilities, including multimodality, the ability to converse in multiple languages, a much longer context window, and stronger overall capabilities.
Llama 3 models will soon be available on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake, and with support from hardware platforms offered by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm.
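For readers who want to kick the tires once access is granted, here's a minimal sketch of querying the 8B instruct model through Hugging Face transformers. It assumes you've accepted Meta's license on the Hub and have a GPU with enough memory; the model ID shown reflects the Hub listing at the time of writing.

```python
# Minimal sketch: prompting Llama 3 8B Instruct via Hugging Face transformers.
# Assumes license acceptance on the Hub and sufficient GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain sparse mixture-of-experts in one paragraph."},
]
# Build the Llama 3 chat prompt and generate a reply
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```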
Why does this matter?
While Llama 400B+ is still in training, it is already trending. Its release could mark a watershed moment for AI, as the open-source community would gain access to a GPT-4-class model. It would be a powerful foundation for research efforts, and it could be a win for open source in the longer run if startups and businesses start building more local, tailored models on top of it.
Mixtral 8x22B claims highest open-source performance and efficiency
Mistral AI has unveiled Mixtral 8x22B, a new open-source language model that the startup claims achieves the highest open-source performance and efficiency. Its sparse mixture-of-experts (SMoE) design actively uses only 39 billion of its 141 billion parameters. As a result, it offers an exceptionally good price/performance ratio for its size.
The model’s other strengths include multilingualism, with support for English, French, Italian, German, and Spanish, as well as strong math and programming capabilities.
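Why only 39B of 141B? In an SMoE layer, a small gating network routes each token to just a couple of expert feed-forward blocks, so most parameters sit idle on any given forward pass. Below is an illustrative top-2 routing sketch in PyTorch; the layer sizes and expert count are made up for clarity, and this is not Mistral's implementation.

```python
# Illustrative top-2 sparse mixture-of-experts layer (not Mistral's code).
# Each token is routed to 2 of 8 expert MLPs, so only a fraction of the
# layer's parameters participate in any single forward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.gate(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):            # run only the selected experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

x = torch.randn(4, 512)                        # 4 tokens
print(SparseMoE()(x).shape)                    # torch.Size([4, 512])
```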
Why does this matter?
While Mistral AI’s claims may be true, there’s a new competitor on the market: Llama 3. So the claims about the best open-source model available right now may need a second look. But whatever the benchmarks say, only practical usefulness will tell which model is truly superior.
Meta’s Megalodon to solve the fundamental challenges of the Transformer
Researchers at Meta and the University of Southern California have proposed a new model that aims to solve some of the fundamental challenges of the Transformer, the deep learning architecture that gave rise to the age of LLMs.
The model, called Megalodon, allows language models to extend their context window to millions of tokens without requiring huge amounts of memory. Experiments show that Megalodon outperforms Transformer models of equal size at processing long texts. The researchers have also obtained promising results in small- and medium-scale experiments on other data modalities and will later work on adapting Megalodon to multimodal settings.
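One ingredient in this family of long-context architectures is chunk-wise attention: tokens attend only within fixed-size chunks, so memory grows linearly with sequence length rather than quadratically. Here's a toy sketch of that general idea, not Meta's Megalodon implementation:

```python
# Toy chunk-wise attention (illustrative of the chunking idea only, not
# Meta's Megalodon code): attention is computed inside fixed-size chunks,
# so peak memory scales with seq_len * chunk_size instead of seq_len ** 2.
import torch
import torch.nn.functional as F

def chunked_attention(q, k, v, chunk_size=1024):
    # q, k, v: (seq_len, d_head)
    outputs = []
    for start in range(0, q.shape[0], chunk_size):
        sl = slice(start, start + chunk_size)
        # each chunk attends only within itself
        attn = F.softmax(q[sl] @ k[sl].T / k.shape[-1] ** 0.5, dim=-1)
        outputs.append(attn @ v[sl])
    return torch.cat(outputs)

q = k = v = torch.randn(8192, 64)
print(chunked_attention(q, k, v).shape)  # torch.Size([8192, 64])
```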
Why does this matter?
Scientists have been looking for alternative architectures that can replace transformers, and Megalodon is the latest in that line of work. However, much research has already gone into making transformers more efficient; Google’s Infini-attention, released this week, is one example. So the alternatives have a lot of catching up to do. For now, transformers remain the dominant architecture for language models.
Enjoying the daily updates?
Refer your pals to subscribe to our daily newsletter and get exclusive access to 400+ game-changing AI tools.
When you use the referral link above or the “Share” button on any post, you'll get the credit for any new subscribers. All you need to do is send the link via text or email or share it on social media with friends.
Knowledge Nugget: An introduction to Large Language Models
Since the release of ChatGPT, LLMs have become ubiquitous in our daily lives. In this post, the author gives a high-level introduction to Large Language Models, the technical details behind them, and the current challenges facing them. He talks about the backbone of LLMs, the artificial neural network (ANN), and in particular the Transformer architecture. He also dives into the standard training approach for LLMs, next-token prediction, and the challenges of bias and hallucination in LLMs.
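In code terms, next-token prediction is just cross-entropy between the model's output at each position and the token that actually comes next. A tiny sketch with stand-in logits (no real model involved):

```python
# Minimal sketch of the next-token-prediction objective: at each position,
# the model's logits are scored against the token that actually follows.
import torch
import torch.nn.functional as F

vocab_size, seq_len = 1000, 16
tokens = torch.randint(0, vocab_size, (seq_len,))  # toy token ids
logits = torch.randn(seq_len, vocab_size)          # stand-in for model output

# shift by one: the prediction at position t is scored against token t+1
loss = F.cross_entropy(logits[:-1], tokens[1:])
print(loss.item())
```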
Why does this matter?
LLMs are at the forefront of the AI field. While they are powerful AI models impacting our daily lives (in good and bad ways), they are still being figured out. Understanding them gives us a window into the potential and challenges of this evolving field.
What Else Is Happening❗
🔍Meta adds its AI chatbot, powered by Llama 3, to the search bar in all its apps.
Meta has upgraded its AI chatbot with its newest LLM, Llama 3, and added it to the search bar of its apps (Facebook, Messenger, Instagram, and WhatsApp) in multiple countries. It also launched a new meta.ai site for users to access the chatbot, along with other new features such as faster image generation and access to web search results. (Link)
🚗Wayve introduces LINGO-2, a groundbreaking AI model that drives and narrates its journey.
LINGO-2 merges vision, language, and action, resulting in every driving maneuver coming with an explanation. This provides a window into the AI’s decision-making, deepening trust and understanding of our assisted and autonomous driving technology. (Link)
🤖Salesforce updates Slack AI with smart recaps and more languages.
Salesforce rolled out generative AI updates for Slack. The new features build on the native AI capabilities, collectively dubbed Slack AI, announced in February, and give users easy-to-digest recaps to stay on top of their day-to-day work interactions. Salesforce also confirmed that Slack AI is expanding to more languages. (Link)
✈️US Air Force tests AI-controlled jets against human pilots in simulated dogfights.
The Defense Advanced Research Projects Agency (DARPA) revealed that an AI-controlled jet successfully flew against a human pilot in an in-air dogfight test last year. The agency has conducted 21 test flights so far and says the tests will continue through 2024. (Link)
🔋Google Maps will use AI to find out-of-the-way EV chargers for you.
Google Maps will use AI to summarize customer reviews of EV chargers and to display more specific directions to certain chargers, such as those in parking garages or other hard-to-find spots. The app will also show more prompts encouraging users to submit feedback after using an EV charger. (Link)
New to the newsletter?
The AI Edge keeps engineering leaders & AI enthusiasts like you on the cutting edge of AI. From machine learning to ChatGPT to generative AI and large language models, we break down the latest AI developments and how you can apply them in your work.
Thanks for reading, and see you tomorrow. 😊