Google Says "Step Aside, ChatGPT!"
Google Speeds Up AI Game: Launches new methods for training robots, announces Instruct-Imagen, and reportedly develops an advanced paid version of Bard.
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 182nd edition of The AI Edge newsletter. This edition brings you Google speeding up its AI game to overtake GPT-4.
And a huge shoutout to our incredible readers. We appreciate you 😊
In today’s edition:
🤖 Google’s new methods for training robots with video and LLMs
📢 Google DeepMind announced Instruct-Imagen for complex image-gen tasks
💰 Google reportedly developing paid Bard powered by Gemini Ultra
💡 Knowledge Nugget: 2024 AI Predictions
Let’s go!
Google’s new methods for training robots with video and LLMs
Google DeepMind's robotics researchers have announced three advancements in robotics research: AutoRT, SARA-RT, and RT-Trajectory.
1) AutoRT combines large foundation models with robot control models to train robots for real-world tasks. It can direct multiple robots to carry out diverse tasks and has been successfully tested in various settings, with up to 20 robots at once and more than 77,000 collected trials.
2) SARA-RT converts Robotics Transformer (RT) models into more efficient versions, improving speed and accuracy without losing quality.
3) RT-Trajectory adds visual outlines to training videos, helping robots understand specific motions and improving performance on novel tasks. This training method had a 63% success rate compared to 29% with previous training methods.
Why does this matter?
Google’s three advancements bring us closer to a future where robots can understand and navigate the world like humans. They could unlock automation's benefits across sectors like manufacturing, healthcare, and transportation.
Google DeepMind announced Instruct-Imagen for complex image-gen tasks
Google released "Instruct-Imagen: Image Generation with Multi-modal Instruction," a model for image generation that uses multi-modal instruction to articulate a range of generation intents. The model is built by fine-tuning a pre-trained text-to-image diffusion model with a two-stage framework.
- First, the model is adapted using retrieval-augmented training to enhance its ability to ground generation in an external multimodal context.
- Second, the model is fine-tuned on diverse image generation tasks paired with multi-modal instructions.
Human evaluation shows that Instruct-Imagen performs as well as or better than prior task-specific models and demonstrates promising generalization to unseen and more complex tasks.
Why does this matter?
Instruct-Imagen highlights Google's command of the AI needed for next-gen applications. It demonstrates Google's lead in multi-modal AI - using both images and text to generate new visual content. For end users, it enables the creation of custom visuals from descriptions. For creative industries, Instruct-Imagen points to AI tools that expand human imagination and productivity.
Google reportedly developing paid Bard powered by Gemini Ultra
Google is reportedly working on an upgraded version of Bard, "Bard Advanced," which will be available through a paid subscription to Google One. It might include features like creating custom bots, an AI-powered "power up" feature, a "Gallery" section to explore different topics, and more. However, it is unclear when these features will be officially released.
All screenshots were leaked by @evowizz on X.
Why does this matter?
This shows Google upping its AI game to compete directly with ChatGPT. For end users, it means potentially more advanced conversational AI. Pressure from competitors like OpenAI pushes Google to stay ahead. And across sectors like education, finance, and healthcare, Bard Advanced could enable smarter applications.
Enjoying the daily updates?
Refer your pals to subscribe to our daily newsletter and get exclusive access to 400+ game-changing AI tools.
When you use the referral link above or the “Share” button on any post, you'll get the credit for any new subscribers. All you need to do is send the link via text or email or share it on social media with friends.
Knowledge Nugget: 2024 AI Predictions
This interesting article predicts various developments in the field of generative AI and language models for the year 2024. The predictions are:
- OpenAI's progress in cost-cutting will continue, but it will not release GPT-5.
- GPT-4's costs will drop significantly, and it will be at the top of the LMSys Leaderboard.
- OpenAI is expected to have more enterprise usage than Amazon and Google combined.
The article also discusses the future of open-source LLMs, funding rounds, and the involvement of government agencies. Additionally, it predicts advancements in fine-tuning services and multimodal capabilities. The article concludes with some wild predictions generated by an LLM.
Why does this matter?
These predictions point to AI developments that could transform how we work, create, and communicate. They indicate how AI advancements may enable new applications and value creation.
What Else Is Happening❗
🔑 Microsoft introduces a new Copilot key to Windows 11 PCs
This marks a significant step toward a more personal and intelligent computing future. Microsoft aims to integrate AI seamlessly into Windows and to simplify and amplify the computing experience. The Copilot key, joining the Windows key, will invoke the Copilot in Windows experience, making it easier for users to engage with AI. The key will be available on new Windows 11 PCs from ecosystem partners, including upcoming Surface devices. (Link)
🛒 OpenAI will launch its custom ChatGPT store next week
It will allow users to share and sell their customized AI agents. The store was initially scheduled to launch in November but was delayed for various reasons. It will enable users to monetize their GPT creations and share them with others. The company plans to pay GPT creators based on the usage of their AI agents, although further details about the payment plan have not been disclosed. The store will be available to ChatGPT Plus and enterprise subscribers. (Link)
💰 OpenAI offers media outlets as little as $1M to use their news articles to train AI models like ChatGPT
The proposed licensing fees of $1 million to $5 million are considered small even for small publishers. OpenAI is reportedly negotiating with up to a dozen media outlets, focusing on global news operations. The company has previously signed deals with Axel Springer and the Associated Press, with Axel Springer receiving tens of millions of dollars over several years. (Link)
🖼️ Researchers from the University of California, Los Angeles, and Snap have developed a method for personalized image restoration called Dual-Pivot Tuning
It is an approach used to customize a text-to-image prior in the context of blind image restoration. It leverages personal photos to customize image restoration models, better preserving individual facial features. (Link)
🤖 CES 2024 tech trade show in Las Vegas will focus on AI: What To Expect?
AI will be the show's major theme and focus, with companies like Intel, Walmart, Best Buy, and Snap expected to showcase AI-enabled products and services.
Generative AI art was used to create the CES 2024 promotional imagery. GenAI more broadly will have a big presence.
AR & VR headsets will be showcased, with companies like Meta, Vuzix, and others exhibiting. This is timed with the expected launch of Apple's headset in 2024.
Robots across categories like vacuums, bartenders, and restaurants will be present, and much more. (Link)
New to the newsletter?
The AI Edge keeps engineering leaders & AI enthusiasts like you on the cutting edge of AI. From machine learning to ChatGPT to generative AI and large language models, we break down the latest AI developments and how you can apply them in your work.
Thanks for reading, and see you tomorrow. 😊