NVIDIA’s new software boosts LLM performance by 8x 🔥
Plus: Google DeepMind's language models as optimizers. Meta plans to rival OpenAI's GPT-4.
Hello, Engineering Leaders and AI Enthusiasts!
Welcome to the 102nd edition of The AI Edge newsletter. This edition brings you NVIDIA’s new software, which boosts LLM performance by 8x 🔥.
And a huge shoutout to our incredible readers. You all rock! 😊
In today’s edition:
📈 NVIDIA’s new software boosts LLM performance by 8x
👏🔧 Google DeepMind introduces language models as optimizers
🆕 Meta plans to rival OpenAI's GPT-4 with its new model
💡 Knowledge Nugget: The LLM build phase - These tools will help you build faster
Let’s go!
NVIDIA’s new software boosts LLM performance by 8x
NVIDIA has developed TensorRT-LLM, software that supercharges LLM inference on H100 GPUs. It includes optimized kernels, pre- and post-processing steps, and multi-GPU/multi-node communication primitives for high performance. It allows developers to experiment with new LLMs without deep knowledge of C++ or NVIDIA CUDA. The software also offers an open-source, modular Python API for easy customization and extensibility.
(The performance figures below compare an NVIDIA A100 and an NVIDIA H100.)
Additionally, it allows users to quantize models to FP8 format for better memory utilization. TensorRT-LLM aims to boost LLM deployment performance and is available in early access, soon to be integrated into the NVIDIA NeMo framework. Users can apply for access through the NVIDIA Developer Program, with a focus on enterprise-grade AI applications.
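For a back-of-the-envelope sense of why FP8 helps with memory, here is a tiny sketch. The 70B parameter count is a hypothetical example model size, not a figure from NVIDIA’s announcement:

```python
# Rough weight-memory comparison for FP16 vs. FP8 storage.
# The 70B parameter count is a hypothetical example, not from NVIDIA's post.
params = 70e9

fp16_gb = params * 2 / 1e9  # FP16 stores each weight in 2 bytes
fp8_gb = params * 1 / 1e9   # FP8 stores each weight in 1 byte

print(f"FP16 weights: ~{fp16_gb:.0f} GB")  # ~140 GB
print(f"FP8 weights:  ~{fp8_gb:.0f} GB")   # ~70 GB
```

Halving the bytes per weight roughly halves the memory needed just to hold the model, which is why FP8 quantization improves memory utilization and lets larger models fit on fewer GPUs.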
Why does this matter?
The H100 alone is 4x faster than the A100. Adding TensorRT-LLM and its benefits, including in-flight batching, results in an 8x total increase, delivering the highest throughput. On Meta’s Llama 2, TensorRT-LLM can accelerate inference performance by 4.6x compared to A100 GPUs.
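In-flight (continuous) batching is the main software trick behind that throughput gain: finished requests are evicted and new ones admitted at every decoding step instead of waiting for the whole batch to drain. The sketch below is a conceptual illustration only, not TensorRT-LLM’s actual scheduler; `generate_step` and `is_finished` are hypothetical stand-ins:

```python
from collections import deque

def serve(request_queue: deque, max_batch_size: int, generate_step, is_finished):
    """Simplified in-flight batching loop: the batch is refilled at every
    decoding step instead of waiting for all requests to finish."""
    active = []
    while request_queue or active:
        # Admit new requests as soon as slots free up.
        while request_queue and len(active) < max_batch_size:
            active.append(request_queue.popleft())

        # One decoding step for all active requests (batched on the GPU).
        generate_step(active)

        # Evict finished requests immediately, freeing slots mid-flight.
        active = [req for req in active if not is_finished(req)]
```

Because short requests exit early instead of padding out the batch, the GPU spends far less time idle, which is where much of the extra throughput comes from.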
Companies like Databricks have found TensorRT-LLM to be easy to use, feature-packed, and efficient, enabling cost savings for customers.
Google Deepmind introduces language models as optimizers
Google DeepMind introduces the concept of using language models as optimizers, in work called Optimization by PROmpting (OPRO). The approach describes the optimization problem in natural language, and the model generates new solutions based on the problem description and previously found solutions.
This is applied to linear regression, traveling salesman problems, and prompt optimization tasks. The results show that the prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K and up to 50% on Big-Bench Hard tasks.
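At its core, the OPRO loop keeps a trajectory of scored solutions in the meta-prompt and asks the model to propose a better one. A minimal sketch, assuming a hypothetical `llm()` completion function and a task-specific `evaluate()` scorer (neither comes from DeepMind’s code):

```python
def opro(task_description, llm, evaluate, steps=20, top_k=8):
    """Optimization by PROmpting: the LLM acts as the optimizer, and the
    meta-prompt carries a trajectory of (solution, score) pairs."""
    history = []  # (solution, score) pairs found so far
    for _ in range(steps):
        # Show the best recent solutions, sorted so the highest score is last.
        scored = "\n".join(f"solution: {s}  score: {v}" for s, v in history[-top_k:])
        meta_prompt = (
            f"{task_description}\n"
            f"Here are previous solutions and their scores:\n{scored}\n"
            "Propose a new solution that achieves a higher score."
        )
        candidate = llm(meta_prompt)
        history.append((candidate, evaluate(candidate)))
        history.sort(key=lambda pair: pair[1])  # ascending by score
    return history[-1]  # best (solution, score) found
```

For prompt optimization, `evaluate()` would measure task accuracy (e.g., on a GSM8K training split), so each round the model sees which prompts worked and tries to beat them.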
Why does this matter?
Google DeepMind's OPRO can revolutionize problem-solving in various fields. It improves task accuracy, outperforming human-designed prompts and giving end users more efficient solutions.
Meta plans to rival OpenAI's GPT-4 with its new model
Meta is reportedly planning to train a new chatbot model that it hopes will rival OpenAI's GPT-4. The company is acquiring Nvidia H100 AI-training chips so it won’t need to rely on Microsoft’s Azure cloud platform to train the new chatbot, and it is expanding its data centers to create a more powerful chatbot.
CEO Mark Zuckerberg wants the model to be free for companies to create AI tools. Meta is building the model to speed up the creation of AI tools that can emulate human expressions.
Why does this matter?
Meta's pursuit of a GPT-4 rival and its acquisition of Nvidia H100 AI-training chips signal intensified competition for the other AI giants in the market. By expanding its data centers, Meta seeks to reduce dependence on external cloud platforms, a strategic move toward AI self-sufficiency.
📢 Invite friends and get rewards 🤑🎁
Enjoying the daily AI updates? Refer friends and get perks and special access to The AI Edge.
Get 400+ AI tools and 500+ prompts for 1 referral.
Get a free shoutout for 3 referrals!
Get The Ultimate Gen AI Handbook for 5 referrals.
When you use the referral link above or the “Share” button on any post, you'll get the credit for any new subscribers. All you need to do is send the link via text or email or share it on social media with friends.
Knowledge Nugget: The LLM build phase - These tools will help you build faster
This read-worthy article highlights various tools and resources that can help you build faster with LLMs. It mentions repositories like Generative Model Programming and YALCC, which offer interesting ideas for API interfaces and constrained predictions. It also discusses frameworks like Demonstrate-Search-Predict Python and LLFn, which aim to simplify experimenting with and shipping AI applications built on LLMs.

Additionally, it mentions papers such as AdaTape, which presents a dynamic approach to Transformers, and BOLAA, which focuses on benchmarking LLM-augmented autonomous agents. The article also provides resources for training and fine-tuning LLMs, including code explanations and video tutorials. It concludes by discussing the performance of AMD GPUs for LLM inference, which can outperform NVIDIA GPUs in terms of cost per performance.
Why does this matter?
The article offers valuable insights into tools and resources for faster LLM development, helping readers stay aware of the latest options for working with LLMs in their own projects.
What Else Is Happening❗
✅ Reddit launched an AI-powered keyword research tool that will help advertisers. (Link)
✅ Infosys is likely to collaborate with NVIDIA to train 3 lakh+ (300,000+) employees on AI! (Link)
✅ India’s Reliance partners with Nvidia to develop a new LLM. (Link)
✅ Researchers at Humboldt University in Berlin have developed a biased GPT model called OpinionGPT. (Link)
✅ Nasdaq has received SEC approval for its AI-powered order type, the first of its kind for an exchange! (Link)
🧐 Monday Musings: The LLM Collection, from the Prompt Engineering Guide.
This comprehensive list of LLMs will make it easier for users to find the latest LLMs, papers, and checkpoints.
The LLM Collection is a valuable resource for those interested in prompt engineering and improving language models. With this, users can quickly access the most up-to-date information and stay informed about the latest developments in the field.
🛠️ Trending Tools
DocuSpeed: AI-powered tool for quick insights and translations from PDFs.
Gen Expert: Enhance your GPT experience with customizable models and AI auto-completion.
Movie Mania: Daily game where ChatGPT describes a movie plot for players to guess.
Essay Builder AI: Free tool for generating high-quality essays effortlessly.
WebBrevity AI: Summarize lengthy web pages in seconds with this open-source app.
Castly: Your AI-enhanced learning and writing companion.
iWish: Automate tech support and transform customer experience with iWish AI.
AI Score My Site: Discover your website’s discoverability and ranking potential with AI.
That's all for now!
If you are new to ‘The AI Edge’ newsletter, subscribe to receive the ‘Ultimate AI tools and ChatGPT Prompt guide’, designed specifically for Engineering Leaders and AI Enthusiasts.
Thanks for reading, and see you tomorrow. 😊