AI Weekly Rundown (May 20 to May 26)
Microsoft, Google, Meta, Mind-Video, DragGAN, LIMA, and more AI shake-ups.
Hello, Engineering Leaders and AI Enthusiasts.
Another eventful week in the AI realm. Lots of big news from huge enterprises.
In today’s edition:
🧠 Mind-Video: High-quality video reconstruction from brain activity
🖼️ DragGAN: A new AI model for interactive point-based image manipulation
🔊 Meta scaling Speech Technology to 1,100+ languages
🌐 LIMA: Meta's powerful 65B language model
💡 Microsoft unveils major AI updates at Build 2023
⚙️ Latest updates from Google: Product Studio, AI search ads & Bard’s new update
🎛️ QLoRA: Finetune a 65B model on a single 48GB GPU
🦍 Gorilla beats GPT-4: LLM makes AI better with efficient API calling
🚶♂️ Man walks after 12 years using AI brain-spine interface
Let’s go
Mind-Video: High-quality video reconstruction from brain activity
Mind-Video is a method for reconstructing continuous visual experiences in videos from non-invasive brain recordings, specifically continuous fMRI data of the cerebral cortex. The proposed approach combines masked brain modeling, multimodal contrastive learning with spatiotemporal attention, and co-training with an augmented Stable Diffusion model.
Adversarial guidance is utilized to achieve high-quality video reconstruction with arbitrary frame rates. The reconstructed videos were evaluated using semantic and pixel-level metrics, showing an average accuracy of 85% in semantic classification tasks and a structural similarity index (SSIM) of 0.19, surpassing the previous state-of-the-art by 45%. The model is biologically plausible and interpretable, aligning with established physiological processes.
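For readers unfamiliar with the pixel-level metric, here is a minimal sketch of the standard SSIM formula in plain Python. It is a simplification, not the paper's evaluation code: it treats each image as one flat list of grayscale values and computes a single global score, whereas common implementations average SSIM over small sliding windows. The function name global_ssim and the 0-255 value range are assumptions made for illustration.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equal-length lists of pixel values.

    Simplified sketch: real evaluations usually average SSIM over many
    local windows instead of computing one global score.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")

    n = len(x)
    # Stabilizing constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


same = global_ssim([10, 50, 200, 255], [10, 50, 200, 255])
inverted = global_ssim([0, 255], [255, 0])
print(same, inverted)
```

On this scale a score close to 1 means a near pixel-perfect match, and an inverted image scores below 0.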
DragGAN: A new AI model for interactive point-based image manipulation
Researchers from Google, MIT, and Max Planck Institute for Informatics have proposed DragGAN. The novel approach lets a user interactively "drag" any points of an image so that they precisely reach chosen target points.
Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc.
Meta scaling Speech Technology to 1,100+ languages
Meta’s Massively Multilingual Speech (MMS) project aims to address the lack of speech recognition models for most of the world's languages. It introduces speech-to-text and text-to-speech models, combining self-supervised learning techniques with a new dataset containing labeled data for over 1,100 languages and unlabeled data for nearly 4,000 languages.
The MMS models outperform existing ones and cover 10 times as many languages. The project's goal is to increase accessibility to information for people who rely on voice as their primary means of accessing information. The models and code are publicly available for further research and development. The project aims to contribute to the preservation of the world's diverse languages.
LIMA: Meta's powerful 65B language model
Meta’s AI researchers introduced LIMA, a new 65B parameter language model fine-tuned on 1,000 curated prompts and responses. It doesn't use reinforcement learning, yet generalizes well to unseen tasks. LIMA's responses are either equivalent or preferred to GPT-4's in 43% of cases, and the share is even higher against Bard and davinci003. This simple approach shows that limited instruction tuning can achieve high-quality output. However, scaling up examples remains challenging despite LIMA's strong performance.
Microsoft unveils major AI updates at Build 2023
AI was the central theme at Microsoft Build, the annual flagship event for developers. The company announced major updates in integrating AI throughout the entire technology framework, empowering developers to make the most of the new AI era.
Here are the initial AI-focused announcements from the event.
Windows Copilot for Windows 11
Windows 11 will be the first PC platform to centralize AI assistance with the introduction of Windows Copilot. With Bing Chat and first- and third-party plugins, users can work across multiple applications through simple prompts.
Connected AI plugin ecosystem for MS and OpenAI
Microsoft will adopt the same open plugin standard that OpenAI introduced for ChatGPT, enabling interoperability across ChatGPT and the breadth of Microsoft’s copilot offerings.
Developers can now use one platform to build plugins that work across both consumer and business surfaces, including ChatGPT, Bing, Dynamics 365 Copilot, and Microsoft 365 Copilot.
Plus, Bing is coming to ChatGPT as the default search experience.
Azure AI Studio to build and deploy AI models
As part of its new Azure AI tooling, Microsoft introduced Azure AI Studio, a full life cycle tool to build, train, evaluate, and deploy the latest next-generation models responsibly with just a few clicks.
Microsoft Fabric for unified data and analytics
Fabric brings your data into the era of AI: it can unify experiences, reduce costs, and deploy intelligence faster on a single, AI-powered platform. It is an end-to-end, unified analytics platform that brings together all the data and analytics tools that organizations need.
Dev Home for a single project dashboard
Dev Home will help developers streamline and manage any type of project they are working on – Windows, cloud, web, mobile, or AI – providing all the information they need right at their fingertips in one customizable dashboard.
Latest updates from Google: Product Studio, AI search ads & Bard’s new update
The search engine giant Google has unveiled some exciting AI advancements, including:
Google will use generative AI to make Search ads more relevant to the context of a query.
Google's Bard AI chatbot now responds with images.
Google introduced Product Studio, which creates product imagery using AI.
QLoRA: Finetune a 65B model on a single 48GB GPU
QLoRA is an efficient finetuning method that enables training a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. It backpropagates gradients through a frozen, 4-bit quantized pre-trained language model into Low-Rank Adapters (LoRA).
The authors' best model family, named Guanaco, outperforms previous openly released models on the Vicuna benchmark, reaching 99.3% of ChatGPT's performance level while requiring only 24 hours of finetuning on a single GPU.
It introduces several innovations to save memory without sacrificing performance:
(a) 4-bit NormalFloat (NF4), a new data type that is information-theoretically optimal for normally distributed weights
(b) double quantization to reduce the average memory footprint by quantizing the quantization constants, and
(c) paged optimizers to manage memory spikes.
They have released all their models and code, which enables researchers and practitioners to leverage these advancements.
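To see why 4-bit storage makes a 48GB GPU feasible, the memory arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope estimate, not the authors' code: it counts weight storage only (no activations, adapters, or optimizer state). The block sizes of 64 and 256 and the 32-bit and 8-bit constant widths are assumptions based on the defaults described in the QLoRA paper, and the function names are hypothetical.

```python
# Rough memory estimate for the weights of a 65B-parameter model.
# Assumptions: weight storage only; one 32-bit quantization constant per
# block of 64 weights; with double quantization, those constants are stored
# in 8 bits plus one 32-bit constant per 256 first-level constants.

N_PARAMS = 65e9


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Storage in GB (10^9 bytes) at a given number of bits per parameter."""
    return n_params * bits_per_param / 8 / 1e9


def constant_overhead_bits(block_size: int = 64,
                           double_quant: bool = False,
                           second_block_size: int = 256) -> float:
    """Extra bits per parameter spent on quantization constants."""
    if not double_quant:
        return 32 / block_size
    return 8 / block_size + 32 / (block_size * second_block_size)


fp16_gb = weight_memory_gb(N_PARAMS, 16)
nf4_gb = weight_memory_gb(N_PARAMS, 4)
plain_overhead_gb = weight_memory_gb(N_PARAMS, constant_overhead_bits())
dq_overhead_gb = weight_memory_gb(
    N_PARAMS, constant_overhead_bits(double_quant=True)
)

print(f"16-bit weights:               {fp16_gb:.1f} GB")
print(f"4-bit weights:                {nf4_gb:.1f} GB")
print(f"constants, no double quant:   {plain_overhead_gb:.2f} GB")
print(f"constants, with double quant: {dq_overhead_gb:.2f} GB")
```

Under these assumptions, the 16-bit weights alone need about 130 GB, while the 4-bit weights plus double-quantized constants come to roughly 33.5 GB, leaving headroom on a 48GB card for adapters, activations, and optimizer state.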
Gorilla beats GPT-4: LLM makes AI better with efficient API calling
Gorilla is a fine-tuned LLaMA-based model that outperforms GPT-4 at writing API calls. According to the paper, it adapts well to test-time document changes, so it can keep up with user updates or version changes. It also substantially mitigates hallucination, a common problem when prompting LLMs directly.
Note that the training data used to fine-tune Gorilla was generated with self-instruct on GPT-4. So while Gorilla is better at API calling than GPT-4, it’s highly likely that GPT-4 excels in all the other categories.
Man walks after 12 years using AI brain-spine interface
A man who was paralyzed by a spinal cord injury in a motorcycle accident 12 years ago is now able to walk again with an AI-powered intervention. The system consists of two implants and a base unit that converts brain signals into muscle stimuli.
The paper reports 74% accuracy in converting brain signals to stimuli at an impressive latency of 1.1 seconds. The man can now walk up ramps, climb stairs, stand, sit, and more.
That's all for now!
If you are new to ‘The AI Edge’ newsletter, subscribe to receive the ‘Ultimate AI tools and ChatGPT Prompt guide’, specifically designed for Engineering Leaders and AI enthusiasts.
Thanks for reading, and see you Monday.