AI Weekly Rundown (April 27 to May 3)
Major AI announcements from Apple, Amazon, Google, OpenAI, Scale AI, and more.
Hello Engineering Leaders and AI Enthusiasts!
Another eventful week in the AI realm. Lots of big news from huge enterprises.
In today’s edition:
🍎 iOS 18 may have OpenAI-powered gen AI capabilities
🎥 China's Vidu generates 16-second 1080P videos, matching OpenAI's Sora
🤖 New S1 robot mimics human-like movements, speed, and precision
🚀 Gradient AI releases Llama-3 8B with 1M context
🤔 Mysterious “gpt2-chatbot” AI model bemuses experts
💻 GitHub’s Copilot Workspace turns ideas into AI-powered software
🏆 Amazon launches Amazon Q, the world’s most capable Gen AI assistant
🏥 Google’s Med-Gemini models outperform doctors
🕵️‍♂️ Apple has set up a secretive AI lab in Switzerland
📈 Better and faster LLMs via multi-token prediction: New research
📱 Anthropic launches an iOS app and a new plan for teams
💸 Google's AI advancements urged Microsoft's billion-$ OpenAI investment
🔍 Scale AI’s study finds popular LLMs overfit public benchmarks
🌍 Ukraine debuts the world's first AI diplomat, Victoria Shi
🧠 Sam Altman is ready to spend $50 billion a year to build AGI
Let’s go!
iOS 18 may have OpenAI-powered gen AI capabilities
Apple has reportedly reinitiated talks with OpenAI to incorporate generative AI capabilities into the upcoming iOS 18 operating system, which will power the next generation of iPhones. The tech giant has been quietly exploring ways to enhance Siri and introduce new AI-powered features across its ecosystem. As of now, the companies are reportedly actively negotiating the terms of the agreement.
Apple is also in discussions with Google about licensing its Gemini chatbot technology. As of now, Apple hasn't made a final decision on which partners it will work with, and there's no guarantee that a deal will be finalized. The company may ultimately reach agreements with both OpenAI and Google or choose another provider entirely.
China's Vidu generates 16-second 1080P videos, matching OpenAI's Sora
At the ongoing Zhongguancun Forum in Beijing, Chinese tech firm ShengShu-AI and Tsinghua University have unveiled Vidu, a text-to-video AI model. Vidu is said to be the first Chinese AI model on par with OpenAI's Sora, capable of generating 16-second 1080P video clips with a single click. The model is built on a self-developed architecture called Universal Vision Transformer (U-ViT), which combines two approaches to generative modeling: Diffusion and Transformer.
During a live demonstration, Vidu showcased its ability to simulate the real physical world, generating scenes with complex details that adhere to real physical laws, such as realistic light and shadow effects and intricate facial expressions. Vidu is also said to have a deep understanding of Chinese cultural elements and can generate distinctly Chinese imagery such as pandas and loong (Chinese dragons).
New S1 robot mimics human-like movements, speed, and precision
Chinese robotics firm Astribot, a subsidiary of Stardust Intelligence, has previewed its advanced humanoid robot assistant, the S1. In a recently released video, the S1 shows remarkable agility, dexterity, and speed while doing various household tasks, marking a significant milestone in the development of humanoid robots.
Utilizing imitation learning, the S1 robot can execute intricate tasks at a pace matching adult humans. The video showcases the robot's impressive capabilities, like smoothly pulling a tablecloth from beneath a stack of wine glasses, opening and pouring wine, delicately shaving a cucumber, flipping a sandwich, etc. Astribot claims that the S1 is currently undergoing rigorous testing and is slated for commercial release in 2024.
Gradient AI releases Llama-3 8B with 1M context
Gradient AI has released a new version of the Llama-3 8B language model called Llama-3-8B-Instruct-Gradient-1048k. Its key feature is the ability to handle extremely long contexts of up to 1 million tokens.
To extend the context window to 1 million tokens, Gradient AI used techniques like NTK-aware initialization of positional encodings, progressive training on increasing context lengths similar to prior work on long context modeling, and optimizations to train on huge GPU clusters efficiently. The model was trained on 1.4 billion tokens, a tiny fraction of Llama-3's original pretraining data.
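For readers curious what "NTK-aware" scaling of positional encodings looks like in practice, here is a minimal sketch of the commonly cited formula for rotary position embeddings (RoPE): instead of compressing positions, the rotary base is enlarged so that slow-rotating dimensions stretch to cover the longer context. The head dimension (128), original base (500,000), and original context length (8,192) are assumed values for illustration, and the function names are hypothetical. This is not Gradient AI's actual training recipe.

```python
def ntk_scaled_base(base, scale, head_dim):
    """NTK-aware RoPE scaling: enlarge the rotary base by scale^(d / (d - 2))."""
    return base * scale ** (head_dim / (head_dim - 2))


def rope_inv_freq(base, head_dim):
    """Inverse frequencies of rotary embeddings, one per pair of dimensions."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


# Assumed values, for illustration only (not taken from Gradient AI's release).
HEAD_DIM = 128
ORIGINAL_BASE = 500_000.0
SCALE = 1_048_576 / 8_192  # target context / assumed original context = 128

new_base = ntk_scaled_base(ORIGINAL_BASE, SCALE, HEAD_DIM)
old_freqs = rope_inv_freq(ORIGINAL_BASE, HEAD_DIM)
new_freqs = rope_inv_freq(new_base, HEAD_DIM)

print(f"scaled base: {new_base:.3e}")
print(f"fastest-dimension frequency ratio: {old_freqs[0] / new_freqs[0]:.1f}")
print(f"slowest-dimension frequency ratio: {old_freqs[-1] / new_freqs[-1]:.1f}")
```

The point of the design: the fastest-rotating dimensions, which encode fine-grained local order, are left untouched, while the slowest ones are stretched by roughly the full context-extension factor.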
Mysterious “gpt2-chatbot” AI model bemuses experts
A mysterious new AI model called “gpt2-chatbot” is going viral. It was released without official documentation, and there is speculation that it could be OpenAI's next model.
gpt2-chatbot shows incredible reasoning skills. It also gets difficult AI questions right with a more human-like tone.
On a math test, gpt2-chatbot solved an International Math Olympiad (IMO) problem in one try. It cannot solve every IMO problem, but the result is still impressive.
Many AI experts also say gpt2-chatbot codes better than the newest versions of GPT-4 or Claude Opus. Without official documentation, we still don’t know who released it or for what purpose.
However, two theories about gpt2-chatbot are circulating in the industry:
It's secretly GPT-5, released early so OpenAI can benchmark it
It's OpenAI's GPT-2 from 2019, fine-tuned with modern assistant datasets
You can try out gpt2-chatbot for free in direct chat at https://chat.lmsys.org. Unfortunately, with so many people trying it right now, response times are slow and conversations are capped at 8 turns.
GitHub’s Copilot Workspace turns ideas into AI-powered software
GitHub is releasing a new AI-powered developer environment called Copilot Workspace. It allows developers to turn an idea into software code using natural language and provides AI assistance throughout the development process—planning the steps, writing the actual code, testing, debugging, etc.
The developer just needs to describe what they want in plain English, and Copilot Workspace will generate a step-by-step plan and the code itself. By automating repetitive tasks and providing step-by-step plans, Copilot Workspace aims to reduce developers' cognitive strain and enable them to focus more on creativity and problem-solving. This new Copilot-native developer environment is designed for any device, making it accessible to developers anywhere.
Amazon has launched Amazon Q, a Gen AI assistant for businesses and developers
Amazon has launched Amazon Q, a generative AI assistant designed for developers and businesses. It comes in three distinct offerings:
Amazon Q Developer frees up precious time by handling tedious tasks like testing, debugging, and optimizing AWS resources so developers can focus on core coding and innovation.
Amazon Q Business connects to 40+ enterprise data sources and equips employees with a data-driven digital assistant to answer questions, create reports, and provide insights based on enterprise data repositories.
Amazon Q Apps allows non-technical employees to build generative AI applications using natural language prompts.
Amazon offers a free tier for Q Developer and reports early customer productivity gains of over 80%. Amazon Q Developer Pro is available for $19/user/month and Amazon Q Business Pro for $20/user/month. A free trial of both Pro tiers is available until June 30, 2024.
Google’s Med-Gemini models outperform doctors
Researchers from Google and DeepMind have introduced Med-Gemini, a family of highly capable multimodal AI models specialized in medicine. Building on the strengths of the Gemini models, Med-Gemini shows significant improvements in clinical reasoning, multimodal understanding, and long-context understanding. The models can be customized to fit novel medical modalities through specialized encoders, and they can use web search for up-to-date information.
Med-Gemini has shown state-of-the-art performance on 10 of 14 medical benchmarks, including text, multimodal, and long-context applications. Moreover, the models achieved 91.1% accuracy on the MedQA (USMLE) benchmark, exceeding the previous best models by 4.6 percentage points. Strong performance in summarizing medical notes, generating clinical referral letters, and answering electronic health record questions points to Med-Gemini's potential for real-world use.
Apple has set up a secretive AI lab in Switzerland
Since 2018, Apple has quietly hired 36 AI experts from Google, including notable figures like Samy Bengio and Ruoming Pang, for its secretive "Vision Lab." The lab focuses on building advanced AI models and products, and it is particularly interested in text and visual-based AI systems akin to OpenAI's ChatGPT. Apple has also acquired AI startups FaceShift and Fashwell, which likely contributed to the establishment of the new lab.
Better and faster LLMs via multi-token prediction: New research
New research from Meta has proposed a novel approach to training language models (LMs). It suggests that training LMs to predict multiple future tokens at once, instead of only the next token in a sequence, results in higher sample efficiency. The architecture is simple, with no training-time or memory overhead.
Figure: Overview of multi-token prediction
The research also provides experimental evidence that this training paradigm is increasingly useful for larger models and in particular, shows strong improvements for code tasks. Multi-token prediction also enables self-speculative decoding, making models up to 3 times faster at inference time across a wide range of batch sizes.
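To make the idea concrete, here is a toy sketch of how multi-token training targets and the combined loss could be set up: at each position the model has several output heads, head k predicts the token k steps ahead, and the per-head cross-entropy losses are summed. The function names and the dictionary-based probability format are hypothetical simplifications, not the paper's implementation.

```python
import math


def multi_token_targets(tokens, n_future):
    """For each position t, collect the next n_future tokens (t+1 .. t+n_future)."""
    last = len(tokens) - n_future
    return [tokens[t + 1 : t + 1 + n_future] for t in range(last)]


def multi_token_loss(head_probs, targets):
    """Sum cross-entropy over heads, then average over positions.

    head_probs[t][k] maps token -> probability assigned by head k at position t.
    """
    total = 0.0
    for probs_at_t, target_at_t in zip(head_probs, targets):
        for head_dist, token in zip(probs_at_t, target_at_t):
            total += -math.log(head_dist[token])
    return total / len(targets)


tokens = [10, 11, 12, 13, 14]
targets = multi_token_targets(tokens, n_future=2)

# Toy predictions: every head puts probability 0.5 on the correct token.
head_probs = [[{tok: 0.5} for tok in target] for target in targets]
loss = multi_token_loss(head_probs, targets)
print(targets)
print(loss)
```

With a single head (n_future=1) this reduces to ordinary next-token prediction, which is why the extra heads can be dropped or reused for speculative decoding at inference time.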
Anthropic launches an iOS app and a new plan for teams
Anthropic, the creator of the Claude 3 AI models, released a new iOS app named Claude. The app enables users to access AI models, chat with them, and analyze images by uploading them.
Anthropic also introduced a paid team plan, offering enhanced features like more chat queries and admin control for groups of five or more. The app is free for all users of Claude AI models, including free users, Claude Pro subscribers, and team plan members. The company will also roll out an Android version soon.
Google's AI advancements may have urged Microsoft's billion-dollar OpenAI investment
Internal emails have revealed that Microsoft invested $1 billion in OpenAI in 2019 out of fear that Google was significantly ahead in its AI efforts.
Microsoft CTO Kevin Scott sent a lengthy email to CEO Satya Nadella and Bill Gates stating Google’s AI-powered “auto complete in Gmail” was getting “scarily good” and added that Microsoft was years behind in terms of ML scale.
The emails, with the subject line “Thoughts on OpenAI,” were made public on Tuesday as part of the Department of Justice's antitrust case against Google. A large section of Scott's email was redacted.
How much do LLMs overfit public benchmarks?
A new study by Scale AI raises concerns about the reliability of LLM benchmark tests. It probes for overfitting by evaluating LLMs on GSM1k, a new dataset designed from scratch to mimic the popular GSM8k benchmark.
Key findings:
Many LLMs performed significantly worse on GSM1k than on GSM8k, with accuracy drops of up to 13% for some models. This suggests they may have partly memorized benchmark problems rather than learned true reasoning skills.
Certain LLM families, particularly Mistral and Phi, showed consistent overfitting across different model sizes.
Newer, more advanced LLMs showed minimal signs of overfitting, suggesting they may be achieving genuine reasoning abilities.
Analysis suggests data contamination from benchmark sets may be one factor contributing to overfitting.
Even overfitting models exhibited some capability to solve novel problems, although not at the level their benchmark scores suggested.
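The comparison behind these findings is easy to reproduce on your own evaluations: score each model on the public benchmark and on a fresh look-alike set, then examine the gap. The sketch below uses made-up model names and accuracies, and the 5-point flagging threshold is an arbitrary assumption rather than a value from the study.

```python
def overfit_gap(public_acc, fresh_acc):
    """Accuracy lost when moving from the public benchmark to the fresh set."""
    return public_acc - fresh_acc


def flag_overfit(scores, threshold=0.05):
    """Return (model, gap) pairs whose gap exceeds the threshold, largest gap first."""
    gaps = {
        model: overfit_gap(public_acc, fresh_acc)
        for model, (public_acc, fresh_acc) in scores.items()
    }
    flagged = [(model, gap) for model, gap in gaps.items() if gap > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical (GSM8k-style accuracy, GSM1k-style accuracy) pairs.
scores = {
    "model-a": (0.80, 0.67),
    "model-b": (0.91, 0.90),
    "model-c": (0.75, 0.68),
}

for model, gap in flag_overfit(scores):
    print(f"{model}: drops {gap:.0%} on the fresh set")
```

A small gap does not prove a model is free of contamination, but a large one is a strong hint that the public score overstates its ability.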
Ukraine debuts the world's first AI diplomat
Ukraine has deployed the world's first AI-generated digital spokesperson named Victoria Shi to deliver official statements on behalf of the country's Ministry of Foreign Affairs.
While the visual avatar is AI-generated, the statements will be written and verified by human diplomats. This move aims to save Ukrainian diplomats time and resources.
The main points about the AI diplomat are:
Victoria Shi's voice and tone are modeled after Rosalie Nombre, a Ukrainian singer and TV celebrity who participated free of charge.
Each statement read by Shi will include a unique QR code linking to the official text on the Ministry's website to guard against deepfakes.
Shi was created by a team called The Game Changers, who previously made content related to the war in Ukraine.
Sam Altman’s stance on the future of AI
During a recent appearance at Stanford University, Altman talked about the future of AI, calling GPT-4, currently an impressive AI model, the “dumbest model” compared to future iterations. According to Altman, the future will be dominated by “intelligent agents,” AI companions that can not only follow instructions but also solve problems, brainstorm solutions, and even ask clarifying questions.
OpenAI's next-generation model, GPT-5, is rumored for a mid-2024 release and might offer video generation capabilities alongside text and image.
But the real moonshot is OpenAI's active participation in developing AGI.
Despite the significant costs involved, Altman remains undeterred. He believes that the potential benefits, such as solving complex problems across various industries, outweigh the financial burden.
That's all for now!
Subscribe to The AI Edge and gain exclusive access to content enjoyed by professionals from Moody’s, Vonage, Voya, WEHI, Cox, INSEAD, and other esteemed organizations.
Thanks for reading, and see you on Monday. 😊