Microsoft's Orca AI Beats 10x Bigger Models In Math
Plus: GPT-4V wins at turning designs into code, and DeepMind alums launch Haiper
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 225th edition of The AI Edge newsletter. This edition brings you Microsoft’s new math model and how it is outperforming larger models.
And a huge shoutout to our amazing readers. We appreciate you 😊
In today’s edition:
🏆 Microsoft's Orca AI beats 10x bigger models in math
🎨 GPT-4V wins at turning designs into code
🎥 DeepMind alums' Haiper joins the AI video race
📚 Knowledge Nugget: Is Synthetic Data the Key to AGI? by
Let’s go!
Microsoft's Orca AI beats 10x bigger models in math
Microsoft's Orca team has developed Orca-Math, an AI model that excels at solving math word problems despite its compact size of just 7 billion parameters. It outperforms models ten times larger on the GSM8K benchmark, achieving 86.81% accuracy without relying on external tools or tricks. The model's success is attributed to training on a high-quality synthetic dataset of 200,000 math problems created using multi-agent flows and an iterative learning process involving AI teacher and student agents.
The Orca team has made the dataset publicly available under the MIT license, encouraging researchers and developers to innovate with the data. The small dataset size highlights the potential of using multi-agent flows to generate data and feedback efficiently.
Why does this matter?
Orca-Math's breakthrough performance shows the potential for smaller, specialized AI models in niche domains. This development could lead to more efficient and cost-effective AI solutions for businesses, as smaller models require less computational power and training data, giving companies a competitive edge.
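As a rough illustration of why parameter count matters for cost, the sketch below estimates the memory needed just to hold a model's weights. It assumes 16-bit weights (2 bytes per parameter) and takes "ten times larger" to mean 70 billion parameters. Both are assumptions for illustration, not figures from the Orca team, and real deployments also need memory for activations and caches. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory (in GB) needed to store model weights alone.

    Assumes every parameter is stored at the same precision
    (2 bytes per parameter by default, i.e. 16-bit weights).
    """
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # 7B is Orca-Math's stated size; 70B is an assumed "10x bigger" model.
    for label, params in [("7B model", 7e9), ("70B model (assumed)", 70e9)]:
        print(f"{label}: about {weight_memory_gb(params):.0f} GB of weights")
```

Under these assumptions, the smaller model's weights take roughly a tenth of the memory, which is one reason a compact model can run on far more modest hardware.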
GPT-4V wins at turning designs into code
With unprecedented capabilities in multimodal understanding and code generation, GenAI can enable a new paradigm of front-end development where LLMs directly convert visual designs into code. New research formalizes this as the “Design2Code” task and conducts comprehensive benchmarking. It also:
Introduces the Design2Code benchmark, consisting of diverse real-world webpages as test examples
Develops comprehensive automatic metrics that complement human evaluations
Proposes new multimodal prompting methods that improve over direct prompting baselines
Fine-tunes an open-source Design2Code-18B model that matches the performance of Gemini Pro Vision on both human and automatic evaluation
Moreover, it finds that 49% of the GPT-4V-generated webpages were good enough to replace the original references, while 64% were judged better designed than the originals.
Why does this matter?
This research could make web development accessible to anyone, letting them build websites from visual designs using AI, much like word processors made writing accessible. For enterprises, automating this front-end coding process could improve collaboration between teams and speed up time-to-market across industries if implemented responsibly alongside human developers.
DeepMind alums' Haiper joins the AI video race
DeepMind alums Yishu Miao and Ziyu Wang have launched Haiper, a video generation tool powered by their own AI model. The startup offers a free website where users can generate short videos using text prompts, although there are limitations on video length and quality.
The company has raised $19.2 million in funding and focuses on improving its AI model to deliver high-quality, realistic videos. They aim to build a core video generation model that can be offered to developers and address challenges like the "uncanny valley" problem in AI-generated human figures.
Why does this matter?
Haiper's launch signals an intensifying race to develop video AI models that can disrupt industries like marketing, entertainment, and education by allowing businesses to generate high-quality video content cost-effectively. However, the technology is still at an early stage, which leaves plenty of room for improvement and highlights the need for responsible development.
Enjoying the daily updates?
Refer your pals to subscribe to our daily newsletter and get exclusive access to 400+ game-changing AI tools.
When you use the referral link above or the “Share” button on any post, you'll get the credit for any new subscribers. All you need to do is send the link via text or email or share it on social media with friends.
Knowledge Nugget: Is Synthetic Data the Key to AGI?
In this article,
explores how modern AI models rely heavily on vast amounts of high-quality training data, and we may soon exhaust the available supply. Estimates suggest we currently have about 9 trillion words of useful data, growing around 4-5% annually, but achieving human-level AI could require 100,000 to 1,000,000 times more. Challenges like paywalled datasets, copyright claims, and dilution by AI-generated content further complicate data acquisition.
However, synthetic data - artificially generated by machines for self-training - could be the key to overcoming this limitation. Examples like AlphaZero for chess and OpenAI's Sora for video synthesis show the potential. Techniques like AI-assisted dataset pruning and rephrasing are also being explored to improve efficiency. If these approaches succeed for text data, AI progress could speed up dramatically, reshaping internet business models, causing antitrust concerns, and potentially limiting open-source initiatives without coordinated data access efforts.
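To get a feel for the data-supply estimates above (about 9 trillion words, growing 4-5% a year, against a need of 100,000 to 1,000,000 times more), the back-of-the-envelope sketch below asks how long organic growth alone would take to close that gap. It assumes the growth compounds at a constant annual rate, which is a simplification, and the function name `years_to_multiply` is illustrative rather than anything from the article.

```python
import math


def years_to_multiply(factor: float, annual_growth: float) -> float:
    """Years for a quantity to grow by `factor` at a constant compound rate.

    Solves (1 + annual_growth) ** years == factor for years.
    """
    return math.log(factor) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # Multipliers and growth rates are the ranges quoted in the summary above.
    for factor in (100_000, 1_000_000):
        for growth in (0.04, 0.05):
            years = years_to_multiply(factor, growth)
            print(f"{factor:,}x at {growth:.0%} per year: about {years:.0f} years")
```

Under these assumptions every combination comes out in the hundreds of years, which is why the article looks to synthetic data rather than waiting for the supply to grow on its own.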
Why does this matter?
Limited high-quality training data could slow down AI development. Companies with exclusive access to valuable datasets, like social media giants or publishers, might build powerful AI models. But this will widen the gap between large and small players, forcing businesses to strategically acquire data or form partnerships to stay competitive.
What Else Is Happening❗
📸 Kayak's AI finds cheaper flights from screenshots
Kayak introduced two new AI features: PriceCheck, which lets users upload flight screenshots to find cheaper alternatives, and Ask Kayak, a ChatGPT-powered travel advice chatbot. These additions position Kayak alongside other travel sites that use generative AI to improve trip planning and flight price comparisons in a competitive market. (Link)
🎓 Accenture invests $1B in LearnVantage for AI upskilling
Accenture is launching LearnVantage, investing $1 billion over three years to provide clients with customized technology learning and training services. Accenture is also acquiring Udacity to scale its learning capabilities and meet the growing demand for technology skills, including generative AI, so organizations can achieve business value using AI. (Link)
🤝 Snowflake brings Mistral's LLMs to its data cloud
Snowflake has partnered with Mistral AI to bring Mistral's open LLMs into its Data Cloud. This move allows Snowflake customers to build LLM apps directly within the platform. It also marks a significant milestone for Mistral AI, which has recently secured partnerships with Microsoft, IBM, and Amazon. The deal positions Snowflake to compete more effectively in the AI space and increases Mistral AI's visibility. (Link)
🛡️ Dell & CrowdStrike unite to fight AI threats
Dell and CrowdStrike are partnering to help businesses fight cyberattacks using AI. By integrating CrowdStrike's Falcon XDR platform into Dell's MDR service, they aim to protect customers against threats like generative AI attacks, social engineering, and endpoint breaches. (Link)
📱 AI app diagnoses ear infections with a snap
Physician-scientists at UPMC and the University of Pittsburgh have developed a smartphone app that uses AI to accurately diagnose ear infections (acute otitis media) in young children. The app analyzes short videos of the eardrum captured by an otoscope connected to a smartphone camera. It could help decrease unnecessary antibiotic use by providing a more accurate diagnosis than many clinicians. (Link)
New to the newsletter?
The AI Edge keeps engineering leaders & AI enthusiasts like you on the cutting edge of AI. From machine learning to ChatGPT to generative AI and large language models, we break down the latest AI developments and how you can apply them in your work.
Thanks for reading, and see you tomorrow. 😊