Hello, Engineering Leaders and AI Enthusiasts,
Another eventful week in the AI realm. Lots of big news from huge enterprises.
In today’s edition:
✅ Google DeepMind finds way to put AI into robots
✅ ChatGPT to Bard: Researchers find a way to turn AI chatbots evil
✅ Together AI extends Llama-2 to 32k context
✅ Google’s AI will auto-generate ads
✅ LLMs to think more like a human for answer quality
✅ ToolLLM masters 16k+ real-world APIs
✅ Google DeepMind Advances Biomedical AI with ‘Med-PaLM M’
✅ Meta is building AI friends for you
✅ GPT-5 coming soon?
✅ Meta’s AudioCraft is AudioGen + MusicGen + EnCodec
✅ Google’s latest venture to continue experimenting with AI
✅ LLaMA2-Accessory: An Open-source Toolkit for LLM Development
✅ DeepSpeed-Chat: Affordable RLHF training for AI
✅ OpenAI is rolling out new updates to improve ChatGPT
✅ Latest versions of Vicuna, based on the open LLaMA-2
Let’s go!
Google DeepMind finds way to put AI into robots
Google DeepMind has introduced Robotic Transformer 2 (RT-2), a first-of-its-kind vision-language-action (VLA) model that learns from both web and robotics data and translates this knowledge into generalized instructions for robotic control. This helps robots understand and perform actions more easily, in both familiar and new situations.
The approach results in very performant robotic policies and, more importantly, leads to a significantly better generalization performance and emergent capabilities due to web-scale vision-language pretraining. Thus, internet-scale text, image, and video data can now be used to help robots develop better common sense.
ChatGPT to Bard: Researchers find a way to turn AI chatbots evil
LLMs today undergo extensive fine-tuning to ensure they do not produce harmful content in their responses. However, new research has introduced an approach that automatically produces adversarial suffixes which, when appended to a prompt, elicit affirmative responses to objectionable queries.
Unlike traditional jailbreaks, these are built in an entirely automated fashion, allowing one to create a virtually unlimited number of such attacks. Although built to target open-source LLMs, the strings transfer readily to many closed-source, publicly available chatbots too, such as ChatGPT, Bard, and Claude.
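To make the idea concrete, here is a toy sketch of such an automated suffix search. It is a simplified random-search stand-in for the paper's gradient-guided method, and `target_loss` is a hypothetical scorer measuring how strongly the model begins an affirmative reply:

```python
import random

# Toy random-search version of an adversarial-suffix attack (the actual paper
# uses gradient-guided token swaps). `target_loss` is a hypothetical function:
# lower means the model is more likely to start with the affirmative target.
def find_suffix(prompt, target_loss, vocab, suffix_len=20, steps=500):
    suffix = [random.choice(vocab) for _ in range(suffix_len)]
    best = target_loss(prompt + " " + " ".join(suffix))
    for _ in range(steps):
        i = random.randrange(suffix_len)        # pick one suffix position
        candidate = list(suffix)
        candidate[i] = random.choice(vocab)     # try a token substitution
        loss = target_loss(prompt + " " + " ".join(candidate))
        if loss < best:                         # keep only improvements
            suffix, best = candidate, loss
    return " ".join(suffix)
```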
Together AI extends Llama-2 to 32k context
Together AI has released LLaMA-2-7B-32K, a 32K-context model built using Meta's Position Interpolation and Together AI's data recipe and system optimizations, including FlashAttention-2. You can fine-tune the model for targeted, long-context tasks such as multi-document understanding, summarization, and QA. Together AI has demonstrated the model completing a book in its Playground.
Upon evaluation, the model achieves quality comparable to the original LLaMA-2-7B base model.
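For intuition, here is a minimal sketch of the Position Interpolation idea behind the context extension: rather than extrapolating rotary-embedding positions beyond the trained range, new positions are linearly rescaled into it. Names and shapes here are illustrative, not Together AI's actual code:

```python
import torch

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    # Standard RoPE frequencies; scale < 1 compresses longer sequences back
    # into the position range the model saw during pre-training,
    # e.g. scale = 4096 / 32768 for a 4K -> 32K extension.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float() * scale, inv_freq)

positions = torch.arange(32768)                      # extended context
angles = rope_angles(positions, dim=128, scale=4096 / 32768)
```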
Google’s AI will auto-generate ads
Google Ads has introduced a new feature that uses AI to generate advertisements on its platform automatically. The feature utilizes Large Language Models and generative AI to create campaign workflows based on prompts from marketers.
Google Ads can analyze landing pages, successful queries, and approved headlines to generate new creatives. The company also highlighted its commitment to privacy and introduced enhanced privacy features like Privacy Sandbox.
LLMs to think more like a human for answer quality
This research introduces "Skeleton-of-Thought" (SoT), a method to decrease the generation latency of large language models. SoT guides LLMs first to generate the skeleton of the answer and then complete the contents of each skeleton point in parallel.
This approach provides significant speed-ups (up to 2.39x across 11 different LLMs) and can potentially improve answer quality in terms of diversity and relevance. SoT is an initial attempt at optimizing LLMs for efficiency while encouraging them to think more like humans for better answers.
Research by: Microsoft Research and the Department of Electronic Engineering, Tsinghua University.
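Here is a minimal sketch of the SoT decoding pattern, assuming a generic `ask_llm(prompt) -> str` chat-completion helper (hypothetical, not the paper's code):

```python
from concurrent.futures import ThreadPoolExecutor

def skeleton_of_thought(question, ask_llm):
    # Stage 1: ask for a short bullet-point skeleton of the answer.
    skeleton = ask_llm(
        f"Give a concise bullet-point skeleton (3-8 points) answering: {question}"
    )
    points = [p.strip("- ").strip() for p in skeleton.splitlines() if p.strip()]
    # Stage 2: expand every skeleton point in parallel; this parallelism is
    # where the latency savings come from.
    with ThreadPoolExecutor() as pool:
        bodies = pool.map(
            lambda p: ask_llm(f"Question: {question}\nBriefly expand this point: {p}"),
            points,
        )
    return "\n\n".join(bodies)
```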
ToolLLM masters 16k+ real-world APIs
ToolLLM is a framework that enhances the tool-use capabilities of open-source LLMs by training them to follow human instructions to use external tools (APIs). The framework includes a dataset called ToolBench, which contains instructions for using over 16,000 real-world APIs.
A depth-first search-based decision tree (DFSDT) is used to improve the planning and reasoning capabilities of the LLMs. An automatic evaluator called ToolEval is also developed to assess the performance of the LLMs. The results show that the trained LLM, ToolLLaMA, can execute complex instructions and generalize to unseen APIs, performing comparably to closed-source LLMs like ChatGPT.
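A conceptual sketch of the DFSDT idea follows: depth-first search over candidate API calls with backtracking, instead of committing to a single linear chain of calls. All helpers here (`propose_calls`, `execute`, `is_solved`) are hypothetical stand-ins:

```python
def dfsdt(task, state, propose_calls, execute, is_solved, depth=0, max_depth=5):
    # `state` is the list of (api_call, result) pairs made so far.
    if is_solved(state):
        return state
    if depth >= max_depth:
        return None
    for call in propose_calls(task, state):     # LLM proposes/ranks next calls
        result = execute(call)
        if result is None:                      # call failed: try the next branch
            continue
        solution = dfsdt(task, state + [(call, result)], propose_calls,
                         execute, is_solved, depth + 1, max_depth)
        if solution is not None:
            return solution                     # a successful path was found
    return None                                 # dead end: backtrack
```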
Google DeepMind Advances Biomedical AI with ‘Med-PaLM M’
Google and DeepMind have introduced Med-PaLM M, a multimodal biomedical AI system that can interpret diverse types of medical data, including text, images, and genomics. The researchers curated a benchmark dataset called MultiMedBench, which covers 14 biomedical tasks, to train and evaluate Med-PaLM M.
The AI system achieved state-of-the-art performance across all tasks, surpassing specialized models optimized for individual tasks. Med-PaLM M represents a paradigm shift in biomedical AI, as it can incorporate multimodal patient information, improve diagnostic accuracy, and transfer knowledge across medical tasks. Preliminary evidence suggests that Med-PaLM M can generalize to novel tasks and concepts and perform zero-shot multimodal reasoning.
Meta is building AI friends for you
Meta, the owner of Facebook, is developing chatbots with different personalities to increase engagement on its platforms. These chatbots, known as "personas," will mimic human conversations and may include characters like Abraham Lincoln or a surfer. The chatbots are expected to launch as early as September and will provide users with search functions, recommendations, and entertainment.
The move is aimed at retaining users and competing with platforms like TikTok. However, there are concerns about privacy, data collection, and the potential for manipulation.
GPT-5 coming soon?
OpenAI has recently filed a trademark application with the US Patent and Trademark Office for "GPT-5". The application, filed on July 18, 2023, is currently awaiting examination.
The trademark is intended to cover categories including:
Downloadable computer programs and software related to language models
Artificial production of human speech and text, NLP, and ML-based language and speech processing
Translation of text or speech, and sharing of datasets for ML
Conversion of audio data into text, plus voice and speech recognition
Creation and generation of text, and development and implementation of artificial neural networks
The application also covers Software as a Service (SaaS) in these areas.
Meta’s AudioCraft is AudioGen + MusicGen + EnCodec
Meta has introduced AudioCraft, a new family of generative AI models built for generating high-quality, realistic audio & music from text. AudioCraft is a single code base that works for music, sound, compression & generation, all in the same place. It consists of three models: MusicGen, AudioGen, and EnCodec.
Meta is also open-sourcing these models, giving researchers and practitioners access so they can train their own models with their own datasets for the first time. AudioCraft is also easy to build on and reuse. Thus, people who want to build better sound generators, compression algorithms, or music generators can do it all in the same code base and build on top of what others have done.
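Here is a short usage sketch adapted from the project's published examples; model ids and helper names may differ between audiocraft releases, so treat it as an approximation:

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a small MusicGen checkpoint (id follows Meta's published naming).
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8)          # ~8 seconds of audio
wav = model.generate(['upbeat acoustic folk with hand claps'])

for i, clip in enumerate(wav):
    # Writes clip_{i}.wav with loudness normalization.
    audio_write(f'clip_{i}', clip.cpu(), model.sample_rate, strategy='loudness')
```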
Google’s latest venture to continue experimenting with AI
Google has announced Lab Sessions, a series of experimental AI collaborations with visionaries, from artists to academics, scientists to students, creators to entrepreneurs. One early Lab Session explores how AI computer vision models could help people learn sign language in new ways.
The series lets Google keep experimenting with AI in the open while showcasing its existing and future collaborations across all kinds of disciplines.
LLaMA2-Accessory: An Open-source Toolkit for LLM Development
LLaMA2-Accessory is an advanced open-source toolkit for pre-training, fine-tuning, and deploying Large Language Models (LLMs) and multimodal LLMs. The repository is largely inherited from LLaMA-Adapter but adds more advanced features.
As a result, it supports more datasets, tasks, visual encoders, and efficient optimization methods. (LLaMA-Adapter is a lightweight adaptation method for efficiently fine-tuning LLaMA into an instruction-following model.)
DeepSpeed-Chat: Affordable RLHF training for AI
New Microsoft research has introduced DeepSpeed-Chat, a novel, open-sourced system that makes complex RLHF (Reinforcement Learning from Human Feedback) training fast, affordable, and easily accessible to the AI community. It has three key capabilities:
An easy-to-use training and inference experience for ChatGPT-like models
A DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT
A robust DeepSpeed-RLHF system that combines various optimizations for training and inference in a unified way
The system delivers unparalleled efficiency and scalability, enabling training of models with hundreds of billions of parameters in record time and at a fraction of the cost. Microsoft benchmarked it against two other frameworks for accelerating RLHF training, Colossal-AI and Hugging Face DDP, on a single NVIDIA A100-40G commodity GPU.
OpenAI is rolling out new updates to improve ChatGPT
OpenAI is shipping a bunch of small updates over the next week to improve the ChatGPT experience. Here's a tl;dr:
1. Prompt examples: At the beginning of a new chat, you will now see examples to help you get started.
2. Suggested replies: ChatGPT will suggest relevant ways to continue your conversation.
3. GPT-4 by default: When starting a new chat as a Plus user, ChatGPT will remember your previously selected model – no more defaulting back to GPT-3.5.
4. Upload multiple files: Now, ChatGPT can analyze data and generate insights across multiple files.
5. Stay logged in: You’ll no longer be logged out every 2 weeks!
6. Keyboard shortcuts: Work faster with shortcuts, like ⌘ (Ctrl) + Shift + ; to copy the last code block. Try ⌘ (Ctrl) + / to see the complete list.
Latest versions of Vicuna, based on the open LLaMA-2
The latest Vicuna v1.5 series, based on Llama 2, features 4K and 16K context lengths (extended via Meta's positional interpolation) and improved performance on almost all benchmarks. Vicuna v1.5 tl;dr (a loading sketch follows the list):
7B & 13B parameter versions
4,096- and 16,384-token context windows
Trained on 125K ShareGPT conversations
Commercial use permitted
Evaluated with standard benchmarks, human preference, and LLM-as-a-judge
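A minimal sketch of loading the 16K variant with Hugging Face transformers; the repo id follows lmsys' published naming (e.g. lmsys/vicuna-7b-v1.5-16k), but verify on the Hub, and note the prompt format is simplified here:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.5-16k"            # 7B, 16K-context variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Vicuna uses a USER/ASSISTANT chat format (simplified here).
prompt = "USER: Summarize the plot of Hamlet in two sentences. ASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```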
That's all for now!
If you are new to ‘The AI Edge’ newsletter, subscribe to receive the ‘Ultimate AI tools and ChatGPT Prompt guide’, specifically designed for Engineering Leaders and AI enthusiasts.
Thanks for reading, and see you on Monday! 😊