Meta Releases Code Llama 70B, Rivals GPT-4
Plus: Neuralink implants brain chip in first human, Alibaba upgrades Qwen-VL.
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 199th edition of The AI Edge newsletter. This edition brings you Meta’s Code Llama 70B, an open-source behemoth built to rival proprietary AI models.
And a huge shoutout to our amazing readers. We appreciate you😊
In today’s edition:
Meta releases Code Llama 70B, rivals GPT-4
🧠 Neuralink implants its brain chip in the first human
🚀 Alibaba announces Qwen-VL; beats GPT-4V and Gemini
📚 Knowledge Nugget: An introduction to evaluating LLMs by
Let’s go!
Meta releases Code Llama 70B, rivals GPT-4
Meta has released Code Llama 70B, a new, more performant version of its LLM for code generation. It comes in three versions, all available under the same license as previous Code Llama models:
CodeLlama-70B
CodeLlama-70B-Python
CodeLlama-70B-Instruct
CodeLlama-70B-Instruct achieves 67.8 on HumanEval, making it one of the highest-performing open models available today. CodeLlama-70B is the most performant base for fine-tuning code generation models.
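HumanEval scores like the 67.8 above are usually reported as pass@1: the share of the benchmark's programming problems for which a generated solution passes that problem's unit tests. Below is a minimal sketch of the unbiased pass@k estimator introduced alongside HumanEval in OpenAI's Codex paper. The function names, the `(n, c)` input format, and the sample counts are our own illustrative assumptions, not part of Meta's or OpenAI's evaluation harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of completions sampled for the problem
    c: how many of those completions pass the unit tests
    k: number of completions the metric is allowed to try
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains a passing one.
        return 1.0
    # 1 - P(all k drawn samples fail), drawing without replacement.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts: 10 samples per problem, with 3 and 10 of them passing.
print(pass_at_k(10, 3, 1))                        # roughly 0.3
print(benchmark_score([(10, 3), (10, 10)], k=1))  # roughly 0.65
```

With 10 samples of which 3 pass, a single randomly chosen sample is correct 30% of the time, which is what pass@1 measures; averaging that over every problem gives the benchmark score.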
Why does this matter?
On HumanEval, this makes Code Llama 70B the best-performing open-source model for code generation, with a score that edges past those originally reported for GPT-4 and Gemini Pro. That could have a significant impact on code generation and the software development industry, since it offers a powerful, openly available tool for writing and improving code.
Neuralink implants its brain chip in the first human
In a first, Elon Musk’s brain-machine interface startup, Neuralink, has successfully implanted its brain chip in a human. In a post on X, Musk said "promising" brain activity had been detected after the procedure and that the patient was "recovering well". He shared more details in a follow-up post.
The company's goal is to connect human brains to computers to help tackle complex neurological conditions. The FDA gave it permission to begin testing the chip in humans in May 2023.
Why does this matter?
As Musk put it, imagine if Stephen Hawking could communicate faster than a speed typist or auctioneer. That is the goal. According to Musk, the first product will enable control of your phone or computer, and through them almost any device, just by thinking. Initial users will be people who have lost the use of their limbs.
Alibaba announces Qwen-VL; beats GPT-4V and Gemini
Alibaba’s Qwen-VL series has received a significant upgrade with the launch of two enhanced versions, Qwen-VL-Plus and Qwen-VL-Max. The key technical advances in these versions include:
A substantial boost in image-related reasoning;
Considerably better recognition, extraction, and analysis of details within images and the text they contain;
Support for high-definition images above one million pixels and for images of various aspect ratios.
Compared with the open-source Qwen-VL, both new models are a major step up: they perform on par with Gemini Ultra and GPT-4V on multiple text-image multimodal tasks, well ahead of the previous best results from open-source models.
Why does this matter?
This sets a new bar for multimodal AI research and applications. These models match GPT-4V and Gemini on many tasks and clearly outperform previous open-source models.
Knowledge Nugget: An introduction to evaluating LLMs
As more and more LLMs have been released over the last six months, comparing model quality has become a favorite pastime. We all have different experiences with models and use different models for different tasks: Bard for analysis and synthesis, Claude for code generation, and so on.
Many of us have probably also looked at rankings like the LMSys leaderboard or the Hugging Face Open LLM Leaderboard, both of which are full of numbers.
Despite all the comparisons, quantitative and qualitative, it’s still not clear how we should think about model quality. Is there one best model, or different models for different tasks? The answer, as always, is that it depends.
In this article, the authors give an overview of how LLM evaluations work today.
Why does this matter?
Evaluating LLMs is a critical task, and its importance will only grow. As more companies adopt the technology (and as more models come out), understanding which models work well for which tasks is going to be critical. While plenty of ink has been spilled on new techniques, there’s a ton of work left to be done, both in general-purpose model evaluation and in specific domains.
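Leaderboards built from human preference votes, such as LMSys's Chatbot Arena, rank models with Elo-style ratings computed from head-to-head comparisons. Here is a minimal sketch of a plain sequential Elo update over pairwise "battles". The K-factor of 32, the starting rating of 1000, the function names, and the model names are illustrative assumptions, not LMSys's actual settings or code.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0) -> tuple[float, float]:
    """Return updated (r_a, r_b). outcome is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    delta = k * (outcome - expected_score(r_a, r_b))
    # Zero-sum: whatever A gains, B loses.
    return r_a + delta, r_b - delta


def rate(battles, initial: float = 1000.0, k: float = 32.0) -> dict[str, float]:
    """Process (model_a, model_b, outcome) votes in order and return final ratings."""
    ratings: dict[str, float] = {}
    for a, b, outcome in battles:
        r_a = ratings.setdefault(a, initial)
        r_b = ratings.setdefault(b, initial)
        ratings[a], ratings[b] = elo_update(r_a, r_b, outcome, k)
    return ratings


# Placeholder model names and votes, for illustration only.
votes = [("model_a", "model_b", 1.0), ("model_b", "model_a", 0.5)]
print(rate(votes))
```

Because sequential Elo depends on the order in which votes arrive, a leaderboard may instead fit ratings over all votes at once (for example, with a Bradley-Terry model); this sketch keeps the simpler online version.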
What Else Is Happening❗
🤝OpenAI partners with Common Sense Media to collaborate on AI guidelines.
OpenAI will work with Common Sense Media, the nonprofit organization that reviews and ranks the suitability of various media and tech for kids, to collaborate on AI guidelines and education materials for parents, educators, and young adults. OpenAI will also curate “family-friendly” GPTs based on Common Sense’s rating and evaluation standards. (Link)
🚀Apple's 'biggest' iOS update may bring a lot of AI to iPhones.
Apple's upcoming iOS 18 update is expected to be one of the biggest in the company's history. It will leverage generative AI to provide a smarter Siri and enhance the Messages app. Apple Music, iWork apps, and Xcode will also incorporate AI-powered features. (Link)
🆕Shortwave email client will show AI-powered summaries automatically.
Shortwave, an email client built by former Google engineers, is launching new AI-powered features: instant summaries that appear atop an email, a writing assistant that echoes your writing, multi-select AI actions, and an AI assistant that now extends to iOS and Android. All these features start rolling out this week. (Link)
🌐OpenAI CEO Sam Altman explores AI chip collaboration with Samsung and SK Group.
Sam Altman has traveled to South Korea to meet with Samsung Electronics and SK Group to discuss the formation of an AI semiconductor alliance and investment opportunities. He is also said to have expressed a willingness to purchase HBM (High Bandwidth Memory) technology from them. (Link)
🎯Generative AI is seen as helping to identify M&A targets, Bain says.
Deal makers are turning to AI and generative AI tools to source data, screen targets, and conduct due diligence at a time of heightened regulatory concerns around mergers and acquisitions, Bain & Co. said in its annual report on the industry. In the survey, 80% of respondents plan to use AI for deal-making. (Link)
New to the newsletter?
The AI Edge keeps engineering leaders & AI enthusiasts like you on the cutting edge of AI. From machine learning to ChatGPT to generative AI and large language models, we break down the latest AI developments and how you can apply them in your work.
Thanks for reading, and see you tomorrow. 😊
I asked 70B a fairly simple question about a training script I'd written. It gave a canned answer that showed it had missed my point. Worse, it said the script used MSE, but the script computes the error like this: truth - prediction.
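The comment hinges on the difference between a raw, signed error (truth - prediction) and mean squared error (MSE). Without seeing the commenter's script, here is a minimal illustration of why the two are not interchangeable: averaged signed errors can cancel out, while squaring makes every miss count. The function names and sample values are hypothetical.

```python
def signed_errors(truth: list[float], prediction: list[float]) -> list[float]:
    """Per-example error computed as truth - prediction (the sign is kept)."""
    if len(truth) != len(prediction) or not truth:
        raise ValueError("truth and prediction must be non-empty and the same length")
    return [t - p for t, p in zip(truth, prediction)]


def mean_signed_error(truth: list[float], prediction: list[float]) -> float:
    errs = signed_errors(truth, prediction)
    return sum(errs) / len(errs)


def mean_squared_error(truth: list[float], prediction: list[float]) -> float:
    errs = signed_errors(truth, prediction)
    # Squaring makes every miss count positively, so errors cannot cancel.
    return sum(e * e for e in errs) / len(errs)


truth, prediction = [1.0, 1.0], [0.0, 2.0]
print(signed_errors(truth, prediction))       # [1.0, -1.0]
print(mean_signed_error(truth, prediction))   # 0.0
print(mean_squared_error(truth, prediction))  # 1.0
```

Both predictions are off by one, yet the mean signed error is zero; only MSE reports the misses.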