13 Most Important AI Updates From 2024's First Quarter
A recap of the biggest news and advancements in AI so far in 2024.
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 238th edition of The AI Edge newsletter. This edition will speed you up on all the major happenings in AI in 2024 so far. Buckle up!
Here’s a quick rundown:
🎉 AI takes over CES 2024 with big reveals
🖥️ Rabbit’s AI device can do things phones never will
🏥 Google’s new medical AI, AMIE, beats doctors
🧠 Google DeepMind AI solves Olympiad-level math
🌐 Meta to build open-source AGI, Zuckerberg says
🚀 OpenAI launches Sora, a text-to-video model
💎 Google announces Gemini 1.5 with 1 million tokens!
🏆 Anthropic launches Claude 3 models; beats GPT-4
👨‍💻 Devin: The first AI software engineer is here
🤖 ChatGPT gets a body, thanks to OpenAI-Figure collaboration
💻 Nvidia launches 'world's most powerful AI chip' yet
🤔 Sam Altman hints at an “amazing model”; could be GPT-5
👤 Nvidia unveils GR00T that acts as the minds of robots
Let’s go!
AI takes over CES 2024 with big reveals
This year’s CES, the world's largest tech event, showcased cutting-edge technologies and innovations from various sectors with a focus on AI. Here are some standout highlights.
Samsung’s AI-enabled visual display products and digital appliances will introduce novel home experiences. Samsung also announced Ballie, a pet-like robotic companion.
LG announced AI Smart Home Agents. Plus, it revealed its new Alpha 11 AI processor.
Nvidia unveiled the GeForce RTX 40 Super series of desktop graphics cards and a new wave of AI-ready laptops.
AMD debuted its new Ryzen 8000G processors for the desktop, with a big focus on their AI capabilities.
Volkswagen plans to integrate ChatGPT into its cars and SUVs equipped with its IDA voice assistant.
BMW’s operating system will feature AR and AI to enhance car and driver communication.
Swift Robotics unveiled AI-powered strap-on shoes called 'Moonwalkers' that increase walking speed while maintaining a natural gait.
Amazon partnered with Character.AI to bring conversational AI companions to its devices.
L'Oreal revealed an AI chatbot that gives beauty advice based on an uploaded photograph.
Swarovski's $4,799 smart AI-powered binoculars can identify birds and animals for you.
Why does this matter?
The wide range of AI innovations at CES 2024 shows how rapidly this technology is evolving and seeping into our daily lives. From pet-like companions to smart binoculars, AI is being applied across industries to enhance products and services in healthcare, beauty, transportation, and more.
Rabbit unveils r1, an AI pocket device to do tasks for you
Tech startup Rabbit unveiled r1, an AI-powered companion device that does digital tasks for you. The r1 is a standalone device, but its software is the real deal: it runs Rabbit OS and the AI tech underneath. Rather than a ChatGPT-like LLM, the OS is built on a “Large Action Model” (LAM), a sort of universal controller for apps.
Rabbit OS introduces “rabbits”: AI agents that execute a wide range of tasks, from simple inquiries to intricate errands like travel research or grocery shopping. The LAM removes the need for complex integrations like APIs and per-service apps, enabling seamless task execution across platforms without having to download multiple apps. A rough sketch of the idea follows below.
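To make the concept concrete, here is a purely hypothetical sketch of the LAM idea in Python. Every name in it is invented for illustration; Rabbit has not published how Rabbit OS or its model work internally.

```python
# Hypothetical sketch of a "Large Action Model": instead of per-app API
# integrations, one model maps a natural-language request to generic UI
# actions. All names here are invented; Rabbit OS internals are not public.

def large_action_model(utterance: str) -> list[tuple[str, str]]:
    """Pretend model: turn a user's intent into (app, action) steps."""
    if "playlist" in utterance:
        return [
            ("spotify", "open"),
            ("spotify", "search: road trip songs"),
            ("spotify", "tap: first result"),
        ]
    return [("browser", f"search: {utterance}")]

def execute(actions: list[tuple[str, str]]) -> None:
    # A real agent would drive each app's actual interface (taps, clicks),
    # which is what removes the need for bespoke API integrations.
    for app, action in actions:
        print(f"[{app}] {action}")

execute(large_action_model("make me a road trip playlist"))
```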
Why does this matter?
If Humane can’t do it, Rabbit just might. This could usher in a new era of human-device interaction where AI doesn’t just understand natural language; it performs actions based on users’ intentions to accomplish tasks. It could revolutionize the online experience by efficiently navigating multiple apps using natural language commands.
Google’s new medical AI, AMIE, beats doctors
Google developed Articulate Medical Intelligence Explorer (AMIE), an LLM-based research AI system optimized for diagnostic reasoning and conversations.
AMIE's performance was compared to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors.
Why does this matter?
While further research is required before AMIE can be translated to real-world settings, it represents a milestone towards conversational diagnostic AI. If successful, AI systems such as AMIE can be at the core of next-generation learning health systems that help scale world-class healthcare to everyone.
Google DeepMind AI solves Olympiad-level math
DeepMind unveiled AlphaGeometry, an AI system that solves complex geometry problems at a level approaching a human Olympiad gold medalist. It is a breakthrough in AI performance.
In a benchmarking test of 30 Olympiad geometry problems, AlphaGeometry solved 25 within the standard Olympiad time limit. For comparison, the previous state-of-the-art system solved 10 of these geometry problems, and the average human gold medalist solved 25.9 problems.
Why does this matter?
It marks an important milestone towards advanced reasoning, a key prerequisite for AGI. Moreover, its ability to learn from scratch without human demonstrations is particularly impressive. It hints that AI may be approaching human-like reasoning, and perhaps even surpassing humans, at least in geometry.
Meta to build open-source AGI, Zuckerberg says
Meta CEO Mark Zuckerberg shared the company’s recent AI efforts:
Meta is working on artificial general intelligence (AGI) and Llama 3, an improved open-source large language model.
The FAIR AI research group will be merged with the GenAI team to pursue the AGI vision jointly.
Meta plans to deploy 350,000 Nvidia H100 GPUs for AI training by the end of the year, bringing its total available compute to roughly 600,000 H100 equivalents.
He also highlighted the importance of AI in the metaverse and the potential of Ray-Ban smart glasses.
Why does this matter?
Meta's pursuit of AGI could accelerate AI capabilities far beyond current systems. It may enable transformative metaverse experiences while also raising concerns about technological unemployment.
OpenAI launches Sora, a text-to-video model
Out of nowhere, OpenAI drops a video generation model. Sora can create 1-minute videos from text or a still image while maintaining visual quality and adherence to the user’s prompt. It can also “extend” existing video clips, filling in the missing details.
Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. It understands not only what the user has asked for in the prompt but also how those things exist in the physical world.
Sora is currently in research preview, and OpenAI is working with red teamers who are adversarially testing the model.
Why does this matter?
OpenAI has entered the video generation race alongside Runway, Pika, and others, and it might completely change the field (probably for the better). Its cherry-picked samples do look quite impressive compared to the competition. And since Sora builds on past research from the DALL·E and GPT models, it gives OpenAI an edge.
Google announces Gemini 1.5 with 1 million tokens!
After launching Gemini Advanced last week, Google has now launched Gemini 1.5. It delivers dramatically enhanced performance, with a breakthrough in long-context understanding across modalities. It can process up to 1 million tokens consistently!
Gemini 1.5 is more efficient to train and serve thanks to a new Mixture-of-Experts (MoE) architecture. (In simple terms, an MoE model routes each query through only the most relevant parts of the network, giving faster, more focused answers.)
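To make the MoE idea concrete, here is a minimal top-k routing layer in Python. It is an illustrative sketch of the general technique only; Gemini 1.5’s actual architecture has not been published.

```python
# Minimal top-k Mixture-of-Experts routing (illustrative only; Gemini 1.5's
# real architecture is not public beyond the "MoE" label).
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 16, 8, 2

# Each "expert" is a small feed-forward layer; here, just a weight matrix.
experts = [rng.standard_normal((D, D)) * 0.1 for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS)) * 0.1  # learned gating weights

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router                # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]  # keep only the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the chosen experts
    # Only the selected experts run, which is where the efficiency comes from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

print(moe_layer(rng.standard_normal(D)).shape)  # (16,)
```

Because only TOP_K of the eight experts execute per token, compute scales with k rather than with the total parameter count.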
Gemini 1.5 Pro comes with a standard 128,000 token context window. However, a limited group of developers and enterprise customers can try it with 1 million tokens via AI Studio and Vertex AI in private preview.
Why does this matter?
Google has achieved the longest context window of any large-scale foundation model yet. More information in a prompt means more consistent, relevant, and useful output. A million tokens open up huge possibilities for developers: upload hundreds of pages of text, entire code repos, or long videos, and let Gemini reason across them. It could probably learn a whole new skill from a single prompt!
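For developers with preview access, that could look roughly like the sketch below, using Google’s generative AI Python SDK. The model name and input file are placeholders, and long-context access was gated at the time of writing.

```python
# Rough sketch of long-context use via Google's generative AI Python SDK.
# Model name, file, and preview access are assumptions; adjust to your account.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Feed an entire code repository (or book, or transcript) in one prompt.
with open("whole_repo_dump.txt") as f:
    source = f.read()

response = model.generate_content(
    [source, "Summarize the architecture and list any concurrency bugs."]
)
print(response.text)
```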
Anthropic’s Claude 3 beats OpenAI’s GPT-4
Anthropic has launched Claude 3, a new family of models that sets new industry benchmarks across a wide range of cognitive tasks. The family comprises three state-of-the-art models in ascending order of capability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. Each step up offers more performance, so you can choose the balance of intelligence, speed, and cost that fits your application.
Opus and Sonnet are now available via claude.ai and the Claude API in 159 countries, and Haiku will join that list soon.
Claude 3 also displays solid visual processing capabilities and can handle a wide range of visual formats, including photos, charts, graphs, and technical diagrams. Compared to Claude 2.1, Claude 3 delivers roughly twice the accuracy on challenging, open-ended questions.
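Trying the new models takes only a few lines with Anthropic’s Python SDK (pip install anthropic). A minimal sketch; the model id shown is the launch-era Opus id, so check Anthropic’s docs for current names:

```python
# Minimal Claude 3 call via Anthropic's Python SDK.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-opus-20240229",  # launch-era Opus id; may have changed
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize the OSCE study format."}],
)
print(message.content[0].text)
```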
Why does it matter?
In 2024, Gemini and ChatGPT caught the spotlight, but now Claude 3 has emerged as the leader on AI benchmarks. While benchmarks matter, only practical use will tell whether Claude 3 is truly superior. This might also prompt OpenAI to release a new ChatGPT upgrade. However, with AI models becoming more common and diverse, it's unlikely that a single model will emerge as the ultimate winner.
Devin: The first AI software engineer
In a groundbreaking development, US-based startup Cognition AI has unveiled Devin, billed as the world’s first AI software engineer. Devin is an autonomous agent that solves engineering tasks using its own shell, code editor, and web browser, planning, coding, debugging, and deploying projects on its own.
When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art (SOTA) model performance of 1.96% unassisted and 4.80% assisted. It has also successfully passed practical engineering interviews with leading AI companies and even completed real Upwork jobs.
Why does it matter?
Now that Devin is here, the debate over whether AI will replace software engineers is inevitable. However, most production-grade software is too complex, unique, or domain-specific to be fully automated. Perhaps, for now, Devin could handle simpler tasks and assist developers in quickly prototyping, bootstrapping, and autonomously launching MVPs for apps and websites.
ChatGPT now has a body: Figure 01
Figure, in collaboration with OpenAI, has developed a groundbreaking robot called "Figure 01" that can engage in full conversations and execute tasks based on verbal requests, even those that are ambiguous or context-dependent. This is made possible by connecting the robot to a multimodal AI model trained by OpenAI, which integrates language and vision.
The AI model processes the robot's entire conversation history, including images, enabling it to generate appropriate verbal responses and select the most suitable learned behaviors to carry out given commands. The robot's actions are controlled by visuomotor transformers that convert visual input into precise physical movements.
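As a purely illustrative sketch of that perception-to-action loop in Python; none of these names come from Figure or OpenAI, whose actual system has not been published:

```python
# Purely illustrative control loop for a language-and-vision robot like
# Figure 01. All names are invented; the real system is not public.
from dataclasses import dataclass, field

@dataclass
class RobotState:
    history: list = field(default_factory=list)  # interleaved images + speech

def step(state, image, utterance, multimodal_llm, behaviors, policy):
    """One perception-to-action cycle."""
    state.history.append((image, utterance))
    # The multimodal model sees the full history and returns a spoken reply
    # plus the name of a learned behavior to run next.
    reply, behavior_name = multimodal_llm(state.history)
    # A visuomotor policy turns the chosen behavior and the current camera
    # frame into low-level motor commands.
    motor_commands = policy(behaviors[behavior_name], image)
    return reply, motor_commands

# Dummy stand-ins so the sketch runs end to end.
def dummy_llm(history):
    return "Sure, handing you the apple.", "pick_and_place"

def dummy_policy(behavior, image):
    return [0.1, -0.2, 0.0]  # pretend joint velocities

state = RobotState()
print(step(state, "frame_0", "Can I have something to eat?",
           dummy_llm, {"pick_and_place": "weights.pt"}, dummy_policy))
```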
Why does this matter?
As robots become more adept at understanding and responding to human language, questions arise about their autonomy and potential impact on humanity. Collaboration between the robotics industry and AI policymakers is needed to establish regulations for the safe deployment of AI-powered robots. If deployed safely, these robots could become trusted partners, enhancing productivity, safety, and quality of life in various domains.
Nvidia launches 'world's most powerful AI chip'
Nvidia has revealed its new Blackwell B200 GPU and GB200 "superchip", claiming it to be the world's most powerful chip for AI. Both B200 and GB200 are designed to offer powerful performance and significant efficiency gains.
Key takeaways:
The B200 offers up to 20 petaflops of FP4 horsepower, and Nvidia says it can reduce costs and energy consumption by up to 25 times over an H100.
The GB200 "superchip" can deliver 30x the performance of an H100 for LLM inference workloads while also being more efficient.
Nvidia claims that just 2,000 Blackwell chips working together could train a GPT-4-like model of 1.8 trillion parameters in just 90 days (see the rough sanity check below).
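As a back-of-envelope sanity check of that claim, here is the widely used 6 × parameters × tokens approximation of training FLOPs worked through in Python. The token count and use of peak FP4 throughput are our assumptions, not Nvidia’s figures.

```python
# Back-of-envelope check of the "2,000 Blackwell chips, 90 days" claim using
# the common 6 * params * tokens training-FLOPs approximation.
params = 1.8e12       # 1.8T-parameter model, per Nvidia's example
tokens = 13e12        # assumed training tokens (not stated by Nvidia)
required = 6 * params * tokens

gpus = 2_000
peak_per_gpu = 20e15  # 20 PFLOPS FP4 peak; real training precision would differ
seconds = 90 * 86_400
available = gpus * peak_per_gpu * seconds

print(f"Required:  {required:.2e} FLOPs")
print(f"Available: {available:.2e} FLOPs at peak")
print(f"Implied utilization: {required / available:.0%}")  # roughly 45%
```

An implied utilization of around 45% is plausible for a well-tuned cluster, so under these assumptions the claim passes this crude check.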
Why does this matter?
A major leap in AI hardware, the Blackwell GPU boasts redefined performance and energy efficiency. This could lead to lower operating costs in the long run, making high-performance computing more accessible for AI research and development, all while promoting eco-friendly practices.
OpenAI CEO hints at “amazing model”, maybe GPT-5
OpenAI CEO Sam Altman has announced that the company will release an "amazing model" in 2024, although the name has not been finalized. Altman also mentioned that OpenAI plans to release several other important projects before discussing GPT-5, one of which could be the Sora video model.
Altman declined to comment on the Q* project, which is rumored to be an AI breakthrough related to logic. He also expressed his opinion that GPT-4 Turbo and GPT-4 "kind of suck" and that the jump from GPT-4 to GPT-5 could be as significant as the improvement from GPT-3 to GPT-4.
Why does this matter?
This could mean that after Google's Gemini and the latest Claude 3, a new model, possibly GPT-5, could be released in 2024. Altman's candid remarks about the current state of AI models also offer valuable context for understanding the anticipated advancements and challenges in the field.
Nvidia’s GR00T acts as the minds of robots
Nvidia introduced Project GR00T, a general-purpose multimodal foundation model for humanoid robots that acts as their mind, making them capable of learning skills to solve a variety of helpful tasks. It enables humanoid robots to take text, speech, videos, or even live demonstrations as input and process them into actions. It was developed with Nvidia’s Isaac Robotics Platform tools, including a new Isaac Lab for reinforcement learning.
Why does this matter?
The robots are coming! This could make it much easier to develop and deploy humanoid robots. When the most prominent players in generative AI, like OpenAI and Nvidia, are working on embodying AI in the physical world, it's a clear sign that the future of robotics isn't some distant prospect anymore.
That's all for now!
If you are new to The AI Edge, subscribe now and gain exclusive access to content enjoyed by professionals from Moody’s, Vonage, Voya, WEHI, Cox, INSEAD, and other esteemed organizations.
Thanks for reading, and see you tomorrow.😊