OpenAI Launches Free Fine-Tuning for GPT-4o
Plus: Microsoft launches a trio of Phi 3.5 models, Meta abandons its AR/VR headset production, and more.
Hello Engineering Leaders and AI Enthusiasts!
This newsletter brings you the latest AI updates in just 4 minutes! Dive in for a quick summary of everything important that happened in AI over the last week.
And a huge shoutout to our amazing readers. We appreciate you 😊
In today’s edition:
🚀 OpenAI launches fine-tuning for GPT-4o
🤖 Microsoft launches a trio of smaller AI models
🎥 Will D-ID revolutionize AI video translation?
🦾 China’s robot conference unveils cutting-edge humanoid robots
🚫 Meta calls quits on AR/VR headset La Jolla
📚 Knowledge Nugget: AI ends in 2050
Let’s go!
OpenAI launches fine-tuning for GPT-4o
Developers can fine-tune GPT-4o models with custom datasets to improve performance, address specific use cases, and train them to follow domain-specific instructions.
Training costs $25 per million tokens. To encourage adoption, OpenAI is also offering organizations 1 million free training tokens per day.
By fine-tuning GPT-4o, Cosine’s Genie achieved a SOTA score of 43.8% on the SWE-bench Verified benchmark, while Distyl ranked 1st on the BIRD-SQL benchmark.
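As a rough illustration of the pricing above, here is a minimal Python sketch that estimates what a fine-tuning workload might cost. The two price figures come from this article. The function name, the sample workload, and the assumption that the free allowance resets daily without carrying over are ours and may not match OpenAI’s actual billing terms.

```python
# Pricing figures are taken from the article; everything else is illustrative.
PRICE_PER_MILLION_TOKENS_USD = 25.0
FREE_TOKENS_PER_DAY = 1_000_000


def estimate_training_cost(tokens_per_day: list[int]) -> float:
    """Estimate the billable cost (USD) for a series of daily training runs.

    Assumption: the free allowance resets every day and unused free
    tokens do not carry over to the next day.
    """
    total = 0.0
    for tokens in tokens_per_day:
        # Only tokens beyond the daily free allowance are billed.
        billable = max(0, tokens - FREE_TOKENS_PER_DAY)
        total += billable / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD
    return total


if __name__ == "__main__":
    # Hypothetical workload: three days of training with different volumes.
    workload = [800_000, 3_000_000, 1_500_000]
    print(f"Estimated cost: ${estimate_training_cost(workload):.2f}")
```

Under these assumptions, a team that stays under 1 million training tokens a day would pay nothing for training.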
Why does it matter?
Organizations prefer developing custom models according to their industry, business, and use case. OpenAI’s launch of GPT-4o’s fine-tuning capabilities will allow organizations worldwide to have their own AI models, making AI impactful and accessible to businesses.
Microsoft launches a trio of smaller AI models
The tech giant has launched three models: the 3.82 billion-parameter Phi-3.5-mini-instruct, the 41.9 billion-parameter Phi-3.5-MoE-instruct (a Mixture of Experts model), and the 4.15 billion-parameter Phi-3.5-vision-instruct. They are designed for basic, fast, and powerful reasoning, respectively, alongside vision tasks like image and video analysis.
Here are some key features of each model:
Phi-3.5-mini-instruct is multilingual and has surpassed models like Llama and Gemini on math benchmarks.
Phi-3.5-MoE-instruct combines several specialized “expert” sub-networks and routes each input to only a few of them, which lets it perform a variety of tasks well. It beat Gemini 1.5 Flash.
Phi-3.5-vision-instruct supports multi-frame image understanding and reasoning. It can perform detailed image comparisons and summarize multiple images.
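To make the Mixture of Experts idea more concrete, here is a toy Python sketch of top-k expert routing. It is a generic illustration of the technique, not Microsoft’s implementation. The experts, router weights, and function names are all made up for the example, and real models route vectors through neural sub-networks rather than scalars through simple functions.

```python
import math


def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_weights, experts, top_k=2):
    """Route a scalar input to the top_k highest-scoring experts and blend their outputs."""
    # Router: one score per expert (here a simple linear function of x).
    probs = softmax([w * x for w in router_weights])
    # Keep only the top_k experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalize the gate values over the chosen experts.
    norm = sum(probs[i] for i in chosen)
    output = sum(probs[i] / norm * experts[i](x) for i in chosen)
    return output, chosen


if __name__ == "__main__":
    # Hypothetical experts: each one is just a small function.
    experts = [lambda v: 2 * v, lambda v: v + 1, lambda v: v * v, lambda v: -v]
    router_weights = [0.1, 0.5, 1.0, -1.0]
    output, chosen = moe_forward(2.0, router_weights, experts)
    print(f"experts used: {chosen}, output: {output:.4f}")
```

Because only the selected experts run for a given input, a model built this way can have a large total parameter count while doing much less computation per input.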
Why does it matter?
Microsoft’s release of the Phi-3.5 models shows that strong AI capabilities can be achieved with far fewer parameters than larger LLMs require. This opens up new possibilities for deploying AI in compute-constrained environments such as IoT devices and smartphones.
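For a rough sense of why parameter count matters on small devices, the Python sketch below estimates the memory needed just to hold each model’s weights at different numeric precisions. The parameter counts come from this article. The helper name is ours, and the estimate ignores activations, caches, and runtime overhead, so real memory use will be higher.

```python
# Parameter counts are taken from the article.
MODELS = {
    "Phi-3.5-mini-instruct": 3.82e9,
    "Phi-3.5-MoE-instruct": 41.9e9,
    "Phi-3.5-vision-instruct": 4.15e9,
}
# Storage size of one parameter in each numeric format.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, params in MODELS.items():
        sizes = ", ".join(
            f"{precision}: {weight_memory_gb(params, precision):.2f} GB"
            for precision in BYTES_PER_PARAM
        )
        print(f"{name} -> {sizes}")
```

By this estimate, the mini model’s weights take roughly 7.6 GB at 16-bit precision and under 2 GB at 4-bit precision.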
Will D-ID revolutionize AI video translation?
The AI video creation platform’s new tool translates videos into other languages, cloning the speaker’s voice and changing their lip movements to match the translated words. It currently supports 30 languages, including Arabic, Mandarin, Japanese, Hindi, Spanish, and French.
D-ID subscribers can currently use these features for free, letting creators translate their videos into other languages.
Why does it matter?
The new technology could save creators and customers localization costs as they scale their video campaigns to reach a global audience. Creators can quickly produce multilingual content without the hassle and expense of traditional video production.
China’s robot conference unveils cutting-edge humanoid robots
The World Robot Conference, held in Beijing, China, unveiled over 27 models of humanoid robots. It was a five-day event that displayed the latest advancements in robotics and showcased around 600 innovative products.
Let’s review some key ones that stood out:
Astribot S1: A humanoid assistant robot that demonstrated calligraphy, along with tasks like folding a shirt and pouring wine into a glass.
NAVIAI: This humanoid robot is designed to behave in a quasi-human way and displayed skills like delivering a speech, making tea, and playing chess.
Agibot: This robotics startup revealed five new robots at WRC 2024 that can handle delivery tasks and also work as sales personnel and gallery guides.
Wanda: Designed by a Japanese company, Wanda is a dual-arm humanoid robot butler that functions as a home service bot, helping with household chores like laundry and cooking.
Why does it matter?
China’s focus on humanoid robots makes clear its ambition to become a global leader in robotics. Tesla’s flagship humanoid was also showcased at the exhibition, prompting speculation about whether China will leave the USA behind in the humanoid race.
AMD to battle NVIDIA by acquiring server builder ZT Systems
AMD plans to acquire server maker ZT Systems for $4.9 billion to advance its AI chips portfolio and hardware, which will allow it to test and roll out its latest AI GPUs quickly. The move results from tech companies needing to connect several chips to handle heavy AI computing demands, making server system design an essential factor.
Why does it matter?
The acquisition allows AMD to strengthen its position in the AI and data center markets by enhancing its custom server solutions, helping it compete more effectively against rivals like NVIDIA and Intel.
Meta calls quits on AR/VR headset La Jolla
Meta Reality Labs has abandoned its La Jolla project, a premium AR/VR headset meant to compete against Apple’s $3,499 Vision Pro. Meta’s goal was to price the headset below $1,000, but it required a more expensive micro OLED display, which made that price hard to reach.
However, Meta still intends to launch more headsets and mixed-reality tech. According to reports, Meta might unveil its new AR glasses at the Meta Connect event next month.
Why does it matter?
By abandoning La Jolla, Meta might be aiming to surpass Apple by focusing on more accessible, mass-market devices. That contest could determine which tech giant has the upper hand in bringing mixed reality to the masses. It will be interesting to see whether Meta can develop a headset that leaves Vision Pro behind or whether Apple will continue to dominate the market.
Knowledge Nugget: AI ends in 2050
In a satirical tale set in 2050, the author narrates the story of an AI researcher, Yudkowsky, whose brain became the first Artificial General Intelligence (AGI) in 2031 and evolved into an Advanced Superintelligence (ASI) by 2050. The story humorously explores the debates surrounding Artificial Omni Intelligence (AOI) and its potential to solve trivial issues like the tabs vs. spaces debate. Amidst these discussions, Yudkowsky's ASI invents faster-than-light space travel, quickly followed by an "infinite virtual space-time compression machine" (IVSTCM) that renders physical space travel obsolete.
The narrative takes an ironic twist when a white dwarf star accidentally consumes Earth while attempting to take the perfect selfie using the IVSTCM.
Why does it matter?
This satirical piece is a thought-provoking commentary on the potential risks and unforeseen consequences of rapidly advancing AI technology. It humorously highlights the importance of responsible AI development and the need to consider the long-term implications of technological progress, even as we pursue groundbreaking innovations.
What Else Is Happening❗
📜 Anthropic released system prompts for Claude 3 Opus, Claude 3.5 Sonnet, and Claude 3 Haiku in an attempt to establish itself as an ethical and transparent AI vendor.
👤 Pindrop Security’s new feature can distinguish between AI and human voices. The startup claims it can detect AI speech in phone calls and digital media with 99% accuracy.
⚠️ McAfee’s Deepfake Detector tool has now been rolled out to Lenovo’s Copilot+ PCs, helping consumers spot deepfakes while browsing the internet or social media.
🖥️ Midjourney released a temporary free trial for users of its website, which offers a user-friendly interface with easy navigation and access to favorite tools.
⚡ xAI’s developers have rewritten Grok 2’s inference code stack, making it twice as fast at analyzing information and generating responses.
🚀 Perplexity plans to launch ads on its AI search platform. The company has been working on this feature since January and seeks to have 30 publishers by the end of 2024.
🤖 Ideogram’s new AI model, Ideogram 2.0, reportedly outperforms Flux and Midjourney across metrics such as image-text alignment, subjective preference, and text-rendering accuracy.
🤝 Noam Shazeer, Character.AI’s former CEO, has been appointed co-technical lead on Google’s Gemini AI model to work alongside its research team and advance Gemini.
🏛️ California has passed an AI Safety Bill to increase the accountability and liability of developers who spend over $100 million to build an AI model.
☁️ Chinese entities are accessing advanced US chips and AI capabilities through cloud services offered by Amazon and its rivals.
New to the newsletter?
The AI Edge keeps engineering leaders & AI enthusiasts like you on the cutting edge of AI. From machine learning to ChatGPT to generative AI and large language models, we break down the latest AI developments and how you can apply them in your work.
Thanks for reading, and see you next week! 😊