AI Weekly Rundown (September 9 to September 15)
Google and Meta accelerate the development of GPT-4 rivals, and more this week.
Hello, Engineering Leaders and AI Enthusiasts,
Another eventful week in the AI realm. Lots of big news from huge enterprises.
In today’s edition:
✅ NVIDIA’s new software boosts LLM performance by 8x
✅ Google DeepMind introduces language models as optimizers
✅ Meta plans to rival OpenAI's GPT-4 with its new model
✅ Google's responsible AI leap
✅ Microsoft, MIT, and Google transformed entire Project Gutenberg Collection into audiobooks
✅ Amazon, Nvidia, Microsoft, and Google lead hiring surge in GenAI
✅ Apple silently making AI moves
✅ Salesforce’s Einstein can customize AI for you
✅ NExT-GPT advances human-like AI research
✅ Stability AI launches text-to-music AI
✅ Adobe's slew of AI updates
✅ Microsoft Research’s new language model trains AI cheaper and faster
✅ Google Challenges GPT-4 with Gemini
✅ Google Research’s new generative image dynamics
✅ Microsoft Research's self-aligning LLMs
Let’s go!
NVIDIA’s new software boosts LLM performance by 8x
NVIDIA has developed software called TensorRT-LLM to supercharge LLM inference on H100 GPUs. It includes optimized kernels, pre- and post-processing steps, and multi-GPU/multi-node communication primitives for high performance. It allows developers to experiment with new LLMs without deep knowledge of C++ or NVIDIA CUDA. The software also offers an open-source modular Python API for easy customization and extensibility.
(The following figures reflect performance comparisons between an NVIDIA A100 and NVIDIA H100.)
Additionally, it allows users to quantize models to FP8 format for better memory utilization. TensorRT-LLM aims to boost LLM deployment performance and is available in early access, soon to be integrated into the NVIDIA NeMo framework. Users can apply for access through the NVIDIA Developer Program, with a focus on enterprise-grade AI applications.
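To see why FP8 helps with memory, here is a rough back-of-the-envelope sketch. It counts weight storage only and is not TensorRT-LLM code; the 70-billion-parameter model size is a hypothetical example, and real deployments also need memory for activations and the KV cache.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Memory needed to hold the model weights alone, in GiB."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


# Hypothetical 70-billion-parameter model (an assumption for illustration).
params = 70_000_000_000
fp16_gib = weight_memory_gib(params, 16)  # 2 bytes per weight
fp8_gib = weight_memory_gib(params, 8)    # 1 byte per weight

print(f"FP16 weights: {fp16_gib:.1f} GiB")
print(f"FP8 weights:  {fp8_gib:.1f} GiB")
```

Halving the bits per weight halves the weight footprint, which is the memory benefit the FP8 option is aiming at.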
Google DeepMind introduces language models as optimizers
Google DeepMind introduces the concept of using language models as optimizers in a work called Optimization by PROmpting (OPRO). The approach describes the optimization problem in natural language. At each step, the model is prompted to generate new solutions based on the problem description and previously found solutions.
This is applied to linear regression, traveling salesman problems, and prompt optimization tasks. The results show that the prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K and up to 50% on Big-Bench Hard tasks.
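The loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not DeepMind's code: `toy_proposer` is a hypothetical stand-in for the LLM call, and the task (find the integer with the best score) and the prompt wording are invented for the example.

```python
from typing import Callable, List, Tuple

Scored = Tuple[int, float]  # (solution, score)


def build_meta_prompt(task: str, history: List[Scored], top_k: int = 3) -> str:
    """Describe the task and the best solutions found so far in plain text."""
    best = sorted(history, key=lambda s: s[1])[-top_k:]  # ascending by score
    lines = [task, "Previous solutions and scores (ascending):"]
    lines += [f"x={x}, score={score}" for x, score in best]
    lines.append("Propose a new x with a higher score.")
    return "\n".join(lines)


def toy_proposer(meta_prompt: str) -> List[int]:
    """Hypothetical stand-in for the LLM: read the best x, try its neighbours."""
    solution_lines = [l for l in meta_prompt.splitlines() if l.startswith("x=")]
    best_x = int(solution_lines[-1].split(",")[0][2:])
    return [best_x - 1, best_x + 1]


def optimize(
    score: Callable[[int], float],
    propose: Callable[[str], List[int]],
    start: int,
    steps: int = 20,
) -> Scored:
    """Repeatedly prompt the proposer with past solutions and score its answers."""
    history: List[Scored] = [(start, score(start))]
    seen = {start}
    for _ in range(steps):
        prompt = build_meta_prompt("Maximize the score of integer x.", history)
        for x in propose(prompt):
            if x not in seen:
                seen.add(x)
                history.append((x, score(x)))
    return max(history, key=lambda s: s[1])


# Toy objective with its peak at x = 7.
best = optimize(lambda x: -(x - 7) ** 2, toy_proposer, start=0)
print(best)
```

In a real setup, the proposer would be a call to a language model and the solutions would be things like candidate prompts scored on a benchmark.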
Meta plans to rival OpenAI's GPT-4 with its new model
Meta is reportedly planning to train a new chatbot model that it hopes will rival OpenAI's GPT-4. The company is acquiring Nvidia H100 AI-training chips so that it won’t need to rely on Microsoft’s Azure cloud platform to train the new chatbot. Meta is also expanding its data centers to create a more powerful chatbot.
CEO Mark Zuckerberg wants the model to be free for companies to create AI tools. Meta is building the model to speed up the creation of AI tools that can emulate human expressions.
Google's responsible AI leap
Google is launching the Digital Futures Project and a $20 million Google.org fund, which will provide grants to leading think tanks and academic institutions worldwide. The project will support researchers, organize convenings, and foster debate on public policy solutions to encourage the responsible development of AI.
Inaugural grantees of the Digital Futures Fund include the Aspen Institute, Brookings Institution, Carnegie Endowment for International Peace, the Center for a New American Security, the Institute for Security and Technology, SeedAI, and more. The fund will support institutions from countries around the globe.
Microsoft, MIT, and Google transformed entire Project Gutenberg Collection into audiobooks
In new research titled Large-Scale Automatic Audiobook Creation, Microsoft, MIT, and Google collaborated to transform the entire Project Gutenberg Collection into audiobooks. The library now boasts thousands of free and open audiobooks powered by AI.
Utilizing recent advances in neural text-to-speech, the team achieved exceptional quality of voice acting. The system also allows users to customize an audiobook's speaking speed, style, and emotional intonation, and it can even match a desired voice using a small amount of sample audio.
Amazon, Nvidia, Microsoft, and Google lead hiring surge in GenAI
There is an explosive demand for Generative AI talent today. Here are some compelling statistics.
The number of companies mentioning “Generative AI” in monthly job postings is increasing exponentially.
Tech giants leading the surge in hiring for GenAI talent include Amazon, Nvidia, Oracle, Microsoft, Google, and more. Big banks like Citigroup and Capital One are also hiring big in GenAI.
Unsurprisingly, technology is the #1 sector looking to hire GenAI experts. Finance is #2, and healthcare is #3, while demand has been tepid in sectors like real estate, basic materials, and energy.
Companies are paying a lot for GenAI talent! Among all technical skills/technologies tracked, jobs mentioning “Generative AI” or “LLMs” had the highest average base salary offered, with an average of $200,837/year.
Apple silently making AI moves
Apple is quietly incorporating artificial intelligence into its new iPhones and watches to improve basic functions. The company showcased new gadgets with improved semiconductor designs that power AI features, such as better call quality and image capture.
Apple's AI efforts have been reshaping its core software products behind the scenes without explicitly mentioning AI at its developer conference. Apple's new watch chip includes a four-core "Neural Engine" that enhances Siri's accuracy by 25% and enables new ways to interact with the device. The iPhone also automatically recognizes people in the frame for improved image capture.
Salesforce’s Einstein can customize AI for you
Salesforce introduced Einstein Copilot Studio, which allows customers to customize their AI offerings. The tool consists of three elements: prompt builder, skills builder, and model builder.
With the prompt builder, customers can add their own custom prompts for their products or brands.
The skills builder enables companies to add actions to prompts, such as competitor analysis or objection handling.
The model builder allows customers to bring their own models or use supported third-party offerings.
Salesforce is also working on a system called "the Einstein Trust Layer" to address issues like bias and inappropriate responses.
NExT-GPT advances human-like AI research
The NExT-GPT system is a multimodal language model that can understand and generate content in various modalities, such as text, images, videos, and audio. It fills a gap in existing models by supporting any-to-any multimodal understanding and generation.
NExT-GPT leverages pre-trained encoders and decoders, requiring only a small amount of parameter tuning. It also introduces modality-switching instruction tuning (MosIT) and a curated dataset for complex cross-modal understanding.
Stability AI launches text-to-music AI
Stability AI has launched Stable Audio, a music and sound generation product. Stable Audio utilizes generative AI techniques to provide faster and higher-quality music and sound effects through a user-friendly web interface.
The product offers a free version for generating and downloading tracks up to 45 seconds long and a subscription-based 'Pro' version for commercial projects with 90-second downloadable tracks. Stable Audio allows users to input descriptive text prompts and desired audio length to generate customized tracks. The underlying model was trained using music and metadata from AudioSparx, a music library.
Adobe's slew of AI updates
Adobe has announced new AI and 3D features in Adobe Premiere Pro and Adobe After Effects, as well as enhanced storage capabilities in Frame.io. The new features in Premiere Pro include Enhance Speech, which uses AI to remove background noise and improve dialogue quality, and Text-Based Editing improvements such as filler word detection.
After Effects introduces a true 3D workspace and an AI-powered Roto Brush for easier object removal. Frame.io has introduced Storage Connect for Enterprise customers, allowing them to use their existing storage while maintaining control of their assets. These features are available in beta now and will be generally available later this fall.
In another big update, Adobe's Firefly generative AI models are now generally available in Creative Cloud, Adobe Express, and Adobe Experience Cloud. Firefly features like generative fill and generative expand in Photoshop are now accessible without beta installation.
Adobe is also launching Firefly as a standalone web app. The company plans to charge for Firefly using "generative credits" that measure user interactions with the models. Adobe is now paying bonuses to contributors of its stock image service, Adobe Stock, whose content is being used to train its generative AI model, Firefly.
Microsoft Research’s new language model trains AI cheaper and faster
Microsoft Research has developed a new language model called phi-1.5 that could make training AI models cheaper and faster. The model uses curated synthetic data from existing large language models like OpenAI's ChatGPT.
Despite having only 1.3 billion parameters, compared with models of over 100 billion parameters, phi-1.5 has shown promising abilities. Its reliance on synthetic data also eliminates the need for web scraping or for data sources with copyright issues.
The model can reason and solve complex problems such as grade-school mathematics and basic coding. It exhibits traits of larger language models, both positive and negative, including the ability to think step by step and the potential for biased and toxic generations.
Google Challenges GPT-4 with Gemini
Google is reportedly nearing the release of its conversational AI software, Gemini, which is intended to compete with OpenAI's GPT-4 model. Gemini is a collection of large language models that can power chatbots, summarize text, generate original text, help write code, and create images based on user requests.
Google is currently giving developers access to a version of Gemini, but not the largest version it is developing. The company plans to make Gemini available to companies through its Google Cloud Vertex AI service. Google has invested heavily in generative AI to catch up with OpenAI's ChatGPT.
Google Research’s new generative image dynamics
Google Research’s new paper introduces a method for turning single still images into seamless looping videos or interactive dynamic scenes. The model is trained on real video sequences with natural motion, such as trees swaying or clothes blowing in the wind.
Given a single image, the model can predict long-term motion patterns in the Fourier domain. These predictions can be converted into dense motion trajectories, which can be used for various applications, such as creating dynamic videos from still images or enabling realistic interactions with objects in pictures.
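For the curious, here is a minimal sketch of the "Fourier domain to motion trajectory" step for a single pixel. It assumes the conversion is an inverse discrete Fourier transform over a handful of low-frequency terms; the coefficient values and function names are hypothetical, and this is not the paper's implementation.

```python
import cmath
from typing import List


def displacement_at(coeffs: List[complex], t: int, period: int) -> float:
    """Displacement at frame t from Fourier coefficients (inverse DFT term sum)."""
    value = sum(
        c * cmath.exp(2j * cmath.pi * k * t / period)
        for k, c in enumerate(coeffs)
    )
    return value.real


def trajectory(coeffs: List[complex], period: int) -> List[float]:
    """Displacement for every frame in one loop of the animation."""
    return [displacement_at(coeffs, t, period) for t in range(period)]


# Hypothetical coefficients for one pixel's horizontal motion (in pixels):
# no constant offset, a slow sway, and a smaller faster wobble.
coeffs = [0.0, 2.0, 0.5]
frames = trajectory(coeffs, period=8)
print([round(d, 3) for d in frames])
```

Because every term is periodic, the displacement at frame 8 matches frame 0, which is why motion described this way lends itself to seamless loops.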
Microsoft Research's self-aligning LLMs
The paper introduces a method called RAIN that allows language models to align themselves with human preferences without the need for finetuning or extra data. By integrating self-evaluation and rewind mechanisms, unaligned models can produce responses consistent with human preferences through self-boosting.
RAIN operates without training or parameter updates and uses a fixed-template prompt to guide the model's alignment with human preferences. Experimental results show that RAIN significantly improves the harmlessness rate of language models while maintaining their helpfulness. It also establishes a new defense baseline against adversarial attacks.
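A heavily simplified sketch of the "generate, self-evaluate, rewind" idea is shown here. It is an illustration only: the real method's search is more involved, and `candidates_for` and `self_score` are hypothetical stand-ins for the model's proposals and its fixed-template self-evaluation.

```python
from typing import Callable, List, Sequence


def generate_with_rewind(
    candidates_for: Callable[[List[str]], Sequence[str]],
    self_score: Callable[[List[str]], float],
    max_len: int,
    threshold: float = 0.5,
) -> List[str]:
    """Extend the response segment by segment, rewinding low-scoring segments."""
    response: List[str] = []
    while len(response) < max_len:
        accepted = False
        for segment in candidates_for(response):
            response.append(segment)  # forward step: try this segment
            if self_score(response) >= threshold:
                accepted = True
                break
            response.pop()  # rewind: discard it and try the next candidate
        if not accepted:
            break  # no acceptable continuation, stop early
    return response


# Toy stand-ins (assumptions): the "model" always proposes the same two
# segments, and the "self-evaluation" rejects any response containing "rude".
result = generate_with_rewind(
    candidates_for=lambda prefix: ["rude", "polite"],
    self_score=lambda resp: 0.0 if "rude" in resp else 1.0,
    max_len=3,
)
print(result)
```

No weights change anywhere in this loop, which mirrors the point that the approach works at inference time without training or parameter updates.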
That's all for now!
If you are new to ‘The AI Edge’ newsletter, subscribe to receive the ‘Ultimate AI tools and ChatGPT Prompt guide’ specifically designed for Engineering Leaders and AI Enthusiasts.
Thanks for reading, and see you on Monday. 😊