Finetune a 65B LLM on One GPU in 24H
Plus: Google DeepMind for YouTube Shorts, Opera unveils browser AI Aria
Hello, Engineering Leaders and AI Enthusiasts,
Welcome to the 27th edition of The AI Edge newsletter. In today’s edition, we bring you QLoRA, an efficient approach for finetuning SoTA 65B-parameter models on a single 48GB GPU with ChatGPT-like performance. Thank you to everyone reading this. 😊
In today’s edition:
🤖 QLoRA: Finetune a 65B model on a single 48GB GPU
🎬 Google DeepMind creates YT Shorts descriptions
🚀 Opera unveils Aria, its GPT-powered browser AI
📚 Knowledge Nugget: Why the original Transformer figure is wrong, and other tidbits about LLMs by Sebastian Raschka
Let’s go!
QLoRA: Finetune a 65B model on a single 48GB GPU
QLoRA is an efficient finetuning method that enables training a 65B-parameter model on a single 48GB GPU while maintaining full 16-bit finetuning task performance. It backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low-Rank Adapters (LoRA).
Their best model family, which they named Guanaco, outperforms previous models on the Vicuna benchmark, achieving 99.3% of ChatGPT's performance within 24 hours of finetuning.
It introduces several innovations to save memory without sacrificing performance:
(a) 4-bit NormalFloat (NF4), a new data type that is information-theoretically optimal for normally distributed weights
(b) double quantization to reduce the average memory footprint by quantizing the quantization constants, and
(c) paged optimizers to manage memory spikes.
They have released all their models and code, which enables researchers and practitioners to leverage these advancements.
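For a concrete sense of how this looks in practice, here is a minimal sketch of a QLoRA-style setup using the Hugging Face transformers, peft, and bitsandbytes libraries. The model checkpoint, LoRA rank, and target modules below are illustrative assumptions, not the exact Guanaco training recipe.

```python
# Minimal QLoRA-style setup (sketch): 4-bit NF4 base model + LoRA adapters.
# Checkpoint and hyperparameters are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "huggyllama/llama-7b"  # assumption: any causal LM checkpoint works here

# 4-bit NormalFloat quantization with double quantization, as described in the paper
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # (a) 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,      # (b) quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach Low-Rank Adapters; only these small matrices receive gradients,
# while the 4-bit base model stays frozen
lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: typical attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a tiny fraction of the full parameter count
```

From here, the adapted model can be handed to a standard training loop; the memory spikes during training are what the paper’s paged optimizers (c) address, and bitsandbytes ships paged AdamW variants for that purpose.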
Why does this matter?
QLoRA optimizes memory usage while preserving full 16-bit finetuning task performance. This means researchers and practitioners can now train highly performant LLMs without extensive computational resources, making them more accessible and affordable.
Google DeepMind creates YT Shorts descriptions
Google DeepMind’s visual language model, Flamingo, is now being used to generate descriptions for YouTube Shorts. This will help improve the discoverability of Shorts, which often lack helpful titles or descriptions.
Flamingo can generate descriptions by analyzing the initial frames of a video to explain what’s going on. The generated descriptions will be stored as metadata to help YouTube better categorize videos and match search results to viewer queries.
Why does this matter?
Flamingo is one of the first AI models that can be used to generate natural language descriptions of videos. It could make it easier for people to find the videos they are looking for, and it could also help promote new and interesting videos. Flamingo is a significant step forward in the development of AI, and it will likely have a major impact on how we use videos in the future.
Opera unveils Aria, its new browser AI powered by OpenAI’s GPT
Opera is introducing an AI side panel in its browser called Aria, which is based on its “Composer” infrastructure and connects to OpenAI’s GPT technology. It is enhanced with additional capabilities, such as live results from the web.
Aria is both a web and a browser expert that lets users collaborate with AI while looking for information on the web, generating text or code, or getting their product queries answered. It aims to enhance user creativity and productivity by harnessing the power of AI.
Aria is a free service with up-to-date information, meaning it is connected to the internet and not limited to content prior to 2021. This also makes it a more advanced offering than standard GPT-based solutions. Plus, it is shipping to over 180 countries, including those in the EU.
Why does this matter?
Introducing Aria marks the next step in Opera’s plans to integrate generative AI services in browsers. However, the new sidebar is similar to the features that Microsoft introduced for its Edge web browser (including a new Bing AI chatbot in the sidebar). It looks like more companies are joining the giants like Microsoft and Google in transforming the browsing experience with AI.
Knowledge Nugget: Why the original Transformer figure is wrong, and some other interesting historical tidbits about LLMs
Following his insightful article on understanding LLMs, Sebastian Raschka shares another article listing four important papers for understanding transformers from a more historical perspective. The papers include:
On Layer Normalization in the Transformer Architecture (2020)
Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Neural Networks (1991)
Universal Language Model Fine-tuning for Text Classification (2018)
Scaling Language Models: Methods, Analysis & Insights from Training Gopher (2022)
Why does this matter?
The article provides valuable historical insights that can help inform and shape new research in AI, particularly in transformer-based models. Moreover, it helps understand the strengths, limitations, and historical context of LLMs and helps evaluate their potential benefits and risks, offering a starting point for further exploration and investigation.
What Else Is Happening
🏞️ WOW! Time-lapse of majestic waterfalls in the jungle created with Gen-2 AI technology. (Link)
🤝 Wipro expanded its partnership with Google Cloud to bring generative AI capabilities (Link)
🌎 OpenAI’s ChatGPT iOS app is taking an international tour! (Link)
🚨 TruEra released a FREE tool for testing LLM hallucinations (Link)
⚡️ Elasticsearch Relevance Engine gives generative AI a creative update! (Link)
Trending Tools
Kai: Interact with ChatGPT from your iPhone’s keyboard. Give a prompt, switch to KAI, hit Write and select a response.
CopysAI: Copywriting platform with AI code, voice-overs, speech-to-text, and images. Create content faster and smarter.
Talkio: Language training app using AI to improve oral language skills. Over 400 tutors with unique artificial personalities.
KeyWI: Eliminate time-consuming keyword research & competitor analysis. GPT-powered editor for SEO-optimised content.
Humbird: AI-powered Talent CRM for high-growth tech companies to build diverse workforce and reduce recruitment cycle.
ChatUML: AI copilot to help you work with diagrams in a fun and interactive way. Transform ideas to diagrams effortlessly!
OdinAI: Makes it easy for health apps to generate GPT-4-powered recommendations. Send recommendations, generate insights, access the API.
Macro: PDF Editor using AI to make documents interactive & editable. Click cross-references for preview, navigate with tabs.
That's all for now!
If you are new to ‘The AI Edge’ newsletter, subscribe to receive the ‘Ultimate AI tools and ChatGPT Prompt guide’ specifically designed for Engineering Leaders and AI enthusiasts.
Thanks for reading, and see you tomorrow.