AI Weekly Rundown (February 24 to March 01)
Major AI announcements from Microsoft, Meta, Mistral, DeepMind, NVIDIA, and more.
Hello Engineering Leaders and AI Enthusiasts!
Another eventful week in the AI realm. Lots of big news from huge enterprises.
In today’s edition:
🛡️ Microsoft eases AI testing with new red teaming tool
🧠 Transformers learn to plan better with Searchformer
👀 YOLOv9 sets a new standard for real-time object recognition
🌪️ Mistral Large: The new rival to GPT-4; 2nd best LLM of all time
🎮 DeepMind’s new gen-AI model creates video games in a flash
📱 Meta’s MobileLLM enables on-device AI deployment
🏆 NVIDIA's Nemotron-4 beats 4x larger multilingual AI models
👩‍💻 GitHub launches Copilot Enterprise for customized AI coding
⏱️ Slack study shows AI frees up 41% of time spent on low-value work
📸 Alibaba's EMO makes photos come alive (and lip-sync!)
💻 Microsoft introduces 1-bit LLM
🖼️ Ideogram launches text-to-image model version 1.0
🎥 Sora generates videos with stunning geometrical consistency
💼 Microsoft introduces Copilot for Finance, its newest AI offering
🤖 Figure & OpenAI to develop next-gen AI models for humanoid robots
Let’s go!
Microsoft eases AI testing with new red teaming tool
Microsoft has released an open-source automation toolkit called PyRIT to help security researchers test for risks in generative AI systems before public launch. Historically, "red teaming" AI has been an expert-driven manual process requiring security teams to create edge case inputs and assess whether the system's responses contain security, fairness, or accuracy issues. PyRIT aims to automate parts of this tedious process for scale.
Transformers learn to plan better with Searchformer
A new paper from Meta introduces Searchformer, a Transformer model that exceeds the performance of traditional algorithms like A* search in complex planning tasks such as maze navigation and Sokoban puzzles. Searchformer is trained in two phases: first imitating A* search to learn general planning skills, then fine-tuning the model via expert iteration to find optimal solutions more efficiently.
The key innovation is the use of search-augmented training data that provides Searchformer with both the execution trace and final solution for each planning task. This enables more data-efficient learning compared to models that only see solutions. However, encoding the full reasoning trace substantially increases the length of training sequences. Still, Searchformer shows promising techniques for training AI to surpass symbolic planning algorithms.
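To make the idea of search-augmented data concrete, here is a minimal sketch of A* on a tiny grid maze that logs every search step next to the final plan. The maze, the `astar_with_trace` helper, and the "create"/"close"/"plan" token names are illustrative assumptions, not the paper's exact format.

```python
import heapq

def astar_with_trace(grid, start, goal):
    """Run A* on a 4-connected grid ('#' is a wall) and log each search step.

    Returns (trace, plan): trace is a list of (op, cell, g, h) tuples,
    plan is the shortest path as a list of cells (None if unreachable).
    """
    def h(cell):
        # Manhattan distance: admissible for unit-cost, 4-connected moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    trace = [("create", start, 0, h(start))]
    frontier = [(h(start), 0, start)]  # entries are (f, g, cell)
    best_g = {start: 0}
    parent = {start: None}
    closed = set()

    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale queue entry
        closed.add(cell)
        trace.append(("close", cell, g, h(cell)))
        if cell == goal:
            plan = []
            while cell is not None:
                plan.append(cell)
                cell = parent[cell]
            return trace, plan[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == "#":
                continue
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                parent[nxt] = cell
                trace.append(("create", nxt, ng, h(nxt)))
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return trace, None

maze = ["..#",
        "..#",
        "..."]
trace, plan = astar_with_trace(maze, (0, 0), (2, 2))

solution_only = plan                                      # solution-only training target
search_augmented = trace + [("plan", p) for p in plan]    # trace followed by the plan
print(len(solution_only), len(search_augmented))
```

Even on a 3x3 maze, the search-augmented sequence is noticeably longer than the plan alone, which is the sequence-length cost mentioned above.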
YOLOv9 sets a new standard for real-time object recognition
YOLO (You Only Look Once) is open-source software that enables real-time object recognition in images, allowing machines to “see” like humans. Researchers have launched YOLOv9, the latest iteration that achieves state-of-the-art accuracy with significantly less computational cost.
By introducing two new techniques, YOLOv9 reduces parameters by 49% and computations by 43% versus predecessor YOLOv8, while boosting accuracy on key benchmarks by 0.6%.
Mistral Large: The new rival to GPT-4, 2nd best LLM of all time
The French AI startup Mistral has launched its largest-ever LLM and flagship model to date, Mistral Large, with a 32K context window. The model has top-tier reasoning capabilities, and you can use it for complex multilingual reasoning tasks, including text understanding, transformation, and code generation.
Thanks to its strong multitasking capability, Mistral Large is the world's second-ranked model on MMLU (Massive Multitask Language Understanding).
It is natively fluent in English, French, Spanish, German, and Italian, with a nuanced understanding of grammar and cultural context. It also shows top performance in coding and math tasks.
Mistral Large is now available via the in-house platform "La Plateforme" and Microsoft's Azure AI via API.
DeepMind’s new gen-AI model creates video games in a flash
Google DeepMind has launched Genie (Generative Interactive Environments), a new generative AI model that can create playable video games from a simple prompt after learning game mechanics from hundreds of thousands of gameplay videos. Using a single image as the prompt, it can create side-scrolling 2D platformer games in the style of Super Mario Brothers and Contra.
Genie can be prompted with images it has never seen before, such as real-world photographs or sketches, enabling people to interact with their imagined virtual worlds. In effect, it acts as a foundation world model. This is possible despite training without any action labels.
Meta’s MobileLLM enables on-device AI deployment
Meta has released a research paper that addresses the need for efficient LLMs that can run on mobile devices. The focus is on designing high-quality models with under 1 billion parameters, as this is feasible for mobile deployment.
By using deep and thin architectures, embedding sharing, and grouped-query attention, they developed a strong baseline model called MobileLLM, which achieves 2.7%/4.3% higher accuracy compared to previous 125M/350M state-of-the-art models.
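For a rough feel of why embedding sharing and grouped-query attention matter at this scale, the sketch below counts parameters for a small model. The vocabulary size, width, and head counts are hypothetical values chosen for illustration, not MobileLLM's actual configuration, and the helper functions are made up for this example.

```python
def embedding_params(vocab_size, dim, shared):
    # Input embedding and output projection each hold vocab_size * dim weights;
    # sharing reuses one matrix for both roles.
    return vocab_size * dim * (1 if shared else 2)

def attention_params(dim, n_heads, n_kv_heads):
    # Query and output projections are dim x dim;
    # key and value projections shrink when several query heads share a KV head.
    head_dim = dim // n_heads
    kv_dim = head_dim * n_kv_heads
    return 2 * dim * dim + 2 * dim * kv_dim

# Hypothetical configuration (assumption, not taken from the paper)
vocab_size, dim, n_heads = 32_000, 512, 8

saved_by_sharing = (embedding_params(vocab_size, dim, shared=False)
                    - embedding_params(vocab_size, dim, shared=True))
mha = attention_params(dim, n_heads, n_kv_heads=8)  # standard multi-head attention
gqa = attention_params(dim, n_heads, n_kv_heads=2)  # 4 query heads per KV head

print(f"embedding sharing saves {saved_by_sharing:,} parameters")
print(f"attention per layer: {mha:,} (multi-head) vs {gqa:,} (grouped-query)")
```

In a sub-billion-parameter model, savings like these are a meaningful share of the budget and can be spent on extra layers, which fits the "deep and thin" design.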
NVIDIA's Nemotron-4 beats 4x larger multilingual AI models
NVIDIA has announced Nemotron-4 15B, a 15-billion parameter multilingual language model trained on 8 trillion text tokens. It shows exceptional performance in English, coding, and multilingual datasets. It outperforms all other open models of similar size on 4 out of 7 benchmarks. It has the best multilingual capabilities among comparable models, even better than larger multilingual models.
Nemotron-4 scales model training data in line with parameters instead of just increasing model size. Thus, inferences are computed faster, and latency is reduced.
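A quick back-of-the-envelope calculation shows the trade-off. The parameter and token counts come from the announcement; the "4x larger" comparison model is hypothetical, and the 2-FLOPs-per-parameter figure is a common rule of thumb for a forward pass, not a measured number.

```python
def tokens_per_parameter(train_tokens, params):
    return train_tokens / params

def forward_flops_per_token(params):
    # Rule of thumb (assumption): about 2 FLOPs per parameter per token
    return 2 * params

nemotron_params = 15e9                 # from the announcement
nemotron_tokens = 8e12                 # from the announcement
larger_params = 4 * nemotron_params    # hypothetical 4x larger model

ratio = tokens_per_parameter(nemotron_tokens, nemotron_params)
speedup = forward_flops_per_token(larger_params) / forward_flops_per_token(nemotron_params)

print(f"{ratio:.0f} training tokens per parameter")
print(f"{speedup:.0f}x fewer FLOPs per generated token than the 4x larger model")
```

Under these assumptions, the smaller model sees hundreds of training tokens per parameter and needs about a quarter of the compute per generated token.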
GitHub launches Copilot Enterprise for customized AI coding
GitHub has launched Copilot Enterprise, an AI assistant for developers at large companies. It provides customized code suggestions and other programming support based on an organization's codebase and best practices. It integrates across the coding workflow to boost productivity. Early testing by partners like Accenture found major efficiency gains, with a 50% increase in builds from autocomplete alone.
Slack study shows AI frees up 41% of time spent on low-value work
Slack's latest workforce survey shows a surge in the adoption of AI tools among desk workers. There has been a 24% increase in usage over the past quarter, and 80% of users are already seeing productivity gains. However, fewer than half of companies have guidelines around AI adoption, which may inhibit experimentation. The survey also spotlights an opportunity to use AI to automate the 41% of workers' time spent on repetitive, low-value tasks, so they can focus their efforts on meaningful, strategic work.
Alibaba's EMO makes photos come alive (and lip-sync!)
Researchers at Alibaba have introduced an AI system called “EMO” (Emote Portrait Alive) that can generate realistic videos of you talking and singing from a single photo and an audio clip. It captures subtle facial nuances without relying on 3D models.
EMO uses a two-stage deep learning approach with audio encoding, facial imagery generation via diffusion models, and reference/audio attention mechanisms.
Experiments show that the system significantly outperforms existing methods in terms of video quality and expressiveness.
Microsoft introduces 1-bit LLM
Microsoft has launched a radically efficient AI language model dubbed 1-bit LLM. It uses only 1.58 bits per parameter instead of the typical 16, yet performs on par with traditional models of equal size for understanding and generating text.
Building on research like BitNet, this drastic reduction in bits per parameter improves cost-effectiveness in terms of latency, memory, throughput, and energy usage by 10x. Despite using a fraction of the bits per parameter, 1-bit LLM maintains accuracy.
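Where does 1.58 come from? Assuming each weight is restricted to one of three values (-1, 0, or +1), it carries log2(3) ≈ 1.58 bits of information. The sketch below shows one way to round full-precision weights to those three values by scaling with the mean absolute weight. The `ternary_quantize` helper and the sample weights are illustrative assumptions, not Microsoft's implementation.

```python
import math

def ternary_quantize(weights, eps=1e-8):
    """Map float weights to {-1, 0, +1} using mean-absolute-value scaling.

    Returns (ternary_weights, scale); scale * ternary approximates the input.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    # Divide by the scale, round to the nearest integer, then clip to [-1, 1]
    ternary = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return ternary, scale

# Three possible values per weight carry log2(3) bits of information
bits_per_weight = math.log2(3)

weights = [0.42, -0.07, -0.91, 0.03, 0.55, -0.38]  # made-up example values
ternary, scale = ternary_quantize(weights)

print(ternary)                      # [1, 0, -1, 0, 1, -1]
print(round(bits_per_weight, 2))    # 1.58
```

With weights limited to -1, 0, and +1, multiplying by a weight reduces to adding, subtracting, or skipping a value, which is where much of the efficiency gain is expected to come from.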
Ideogram launches text-to-image model version 1.0
Ideogram has launched a new text-to-image model called Ideogram 1.0. It's their most advanced model yet. Dubbed a "creative helper," it generates highly realistic images from text prompts with minimal errors.
Ideogram 1.0 cuts image generation errors in half compared to other apps. Users can also choose custom image sizes and styles, so it can do memes, logos, old-timey portraits, anything.
Magic Prompt takes basic prompts like "vegetables orbiting the sun" and turns them into full scenes with backstories. That would take regular people hours to write out word-for-word.
Tests show that Ideogram 1.0 beats DALL-E 3 and Midjourney V6 at matching prompts, making sensible pictures, looking realistic, and handling text.
Sora showcases jaw-dropping geometric consistency
Sora from OpenAI has been remarkable in video generation compared to other leading models like Pika and Gen-2. In a recent benchmarking test conducted by ByteDance Inc. in collaboration with Wuhan University and Nankai University, Sora showcased video generation with high geometric consistency.
The benchmark assesses the quality of generated videos based on how well they adhere to the principles of physics in real-world scenarios.
Microsoft introduces Copilot for Finance
Microsoft has launched Copilot for Finance. It aims to transform how finance teams approach their daily work with intelligent workflow automation, recommendations, and guided actions. This Copilot aims to simplify data-driven decision-making and free up time for finance professionals by automating manual tasks in apps like Excel and Outlook.
Dentsu, Northern Trust, Schneider Electric, and Visa plan to use it alongside Copilot for Sales and Service to increase productivity, reduce case handling times, and gain better decision-making insights.
OpenAI and Figure team up to develop AI for robots
Figure has raised $675 million in Series B funding with investments from OpenAI, Microsoft, and NVIDIA. In conjunction with the investment, Figure and OpenAI have entered a collaboration agreement to develop next-generation AI models for humanoid robots. This combines OpenAI's research with Figure's deep understanding of robotics hardware and software.
That's all for now!
Subscribe to The AI Edge and gain exclusive access to content enjoyed by professionals from Moody’s, Vonage, Voya, WEHI, Cox, INSEAD, and other esteemed organizations.
Thanks for reading, and see you on Monday. 😊