AI Weekly Rundown (February 3 to February 9)
Major AI announcements from Google, Hugging Face, Apple, OpenAI, and more.
Hello Engineering Leaders and AI Enthusiasts!
Another eventful week in the AI realm. Lots of big news from huge enterprises.
In today’s edition:
📱 Google’s MobileDiffusion: AI Image generation in <1s on phones
🤖 Hugging Face enables custom chatbot creation in 2-clicks
🚀 Google to release ChatGPT Plus competitor 'Gemini Advanced'
🆕 Qwen 1.5: Alibaba's 72B, multilingual GenAI model
🏛️ AI software reads ancient words unseen since Caesar's era
⌚️ Roblox users can chat cross-lingually in milliseconds
🖌️ Apple’s MGIE: A Breakthrough in image editing AI
🏷️ Meta will label AI-generated images on all its platforms
👑 Smaug-72B: The king of open-source AI is here!
🦾 Microsoft pushes Copilot ahead of the Super Bowl
🧠 DeepMind presents ‘self-discover’ framework for LLM improvement
🎥 YouTube reveals plans to use AI tools to empower human creativity
🎭 Google Bard is dead, and Gemini Advanced is in!
🔄 OpenAI is developing 2 types of AI agents to automate work
👓 Brilliant Labs announces multimodal AI glasses, with Perplexity's AI
Let’s go!
Google MobileDiffusion: AI Image generation in <1s on phones
Google Research introduced MobileDiffusion, which can generate 512×512-pixel images on Android and iPhone devices in about half a second. What’s impressive is its comparatively small model size of just 520M parameters, which makes it well suited to mobile deployment. That is significantly smaller than Stable Diffusion and SDXL, which have a billion parameters or more.
MobileDiffusion can also enable a rapid image generation experience while typing text prompts. Real-time!
Hugging Face enables custom chatbot creation in 2-clicks
Hugging Face tech lead Philipp Schmid announced on X that users can now create custom chatbots in “two clicks” using “Hugging Chat Assistant.” Users’ creations are then publicly available. Schmid compares the feature to OpenAI’s GPTs feature and adds they can use “any available open LLM, like Llama2 or Mixtral.”
Google to release ChatGPT Plus competitor 'Gemini Advanced' next week
Leaked web text suggested that Google might release a ChatGPT Plus competitor named "Gemini Advanced" and rename Bard. The Gemini Advanced chatbot will be powered by the eponymous Gemini model in its Ultra 1.0 release.
According to Google, Gemini Advanced is far more capable at complex tasks like coding, logical reasoning, following nuanced instructions, and creative collaboration. Google also wants to include multimodal capabilities, coding features, and detailed data analysis. Currently, the model is optimized for English but is expected to respond in other languages soon.
Qwen 1.5: Alibaba's 72B, multilingual GenAI model
Alibaba released Qwen 1.5, the latest iteration of its open-source generative AI model series. Key upgrades include a range of model sizes up to 72 billion parameters, integration with Hugging Face Transformers for easier use, and multilingual capabilities covering 12 languages.
Comprehensive benchmarks demonstrate significant performance gains over the previous Qwen version across metrics like reasoning, human preference alignment, and long-context understanding.
The unified release aims to give researchers and developers an advanced foundation model for downstream applications. Quantized versions allow low-resource deployment. Overall, Qwen 1.5 represents steady progress towards Alibaba's goal of creating a "truly 'good'" generative model aligned with ethical objectives.
AI software reads ancient words unseen since Caesar's era
Nat Friedman (former CEO of GitHub) is using AI to decode ancient Herculaneum scrolls charred in the 79 AD eruption of Mount Vesuvius. These unreadable scrolls are believed to contain a vast trove of texts that could reshape our view of figures like Caesar and Jesus Christ.
A $1 million AI contest was launched ten months ago, attracting coders worldwide. The winning method successfully reconstructed over a dozen readable columns of Greek text from one scroll using AI.
Roblox users can chat cross-lingually in milliseconds
Roblox has developed a real-time multilingual chat translation system, allowing users speaking different languages to communicate seamlessly while gaming. It required building a high-speed unified model covering 16 languages rather than separate models. Comprehensive benchmarks show the model outperforms commercial APIs in translating Roblox slang and linguistic nuances.
Roblox aims to eventually support all linguistic communities on its platform as translation capabilities expand. Long-term goals include exploring automatic voice chat translation to better convey tone and emotion.
Apple’s MGIE: Making the sky bluer with each prompt!
Apple released a new open-source AI model called MGIE (MLLM-Guided Image Editing), which edits images based on natural language instructions. MGIE leverages multimodal large language models to interpret user commands and perform pixel-level image manipulation. It can handle editing tasks like Photoshop-style modifications, optimizations, and local editing.
Meta will label your content if you post an AI-generated image
Meta is developing tools to label AI-generated images posted on its platforms, including Instagram, Facebook, and Threads. Labeling will be aligned with the “AI-generated” information in the C2PA and IPTC technical standards. These standards will allow Meta to detect AI-generated images made with tools from other companies like Google, OpenAI, Microsoft, Adobe, Midjourney, and Shutterstock.
Smaug-72B: The king of open-source AI is here!
Abacus AI released a new open-source language model called Smaug-72B. It outperforms GPT-3.5 and Mistral Medium in several benchmarks. According to the latest rankings from Hugging Face, one of the leading platforms for NLP research and applications, Smaug-72B is the first open-source model with an average score of over 80 in major LLM evaluations. Smaug-72B is a fine-tuned version of Alibaba’s Qwen 72B.
Microsoft pushes Copilot ahead of the Super Bowl
Microsoft announced updates to its Copilot applications for Android and iOS that make the user interface sleeker and more user-friendly, along with a carousel for follow-up prompts.
It also introduced new features to Designer in Copilot to take image generation a step further with the option to edit generated images using follow-up prompts. The customizations can be anything from highlighting the image subject to enhancing colors and modifying the background. For Copilot Pro users, additional features such as resizing the images and changing the aspect ratio are also available.
Google DeepMind presents GPT-4 performance improvement using ‘self-discover’ framework
Google DeepMind, with the University of Southern California, has proposed a ‘self-discover’ prompting framework to enhance the performance of LLMs. With it, models such as GPT-4 and Google’s PaLM 2 improved on challenging reasoning benchmarks by as much as 32% compared to the Chain of Thought (CoT) framework.
The framework also requires 10-40 times less inference compute than inference-intensive methods such as CoT with self-consistency, which means output can be generated faster using the same computational resources.
YouTube reveals plans to use AI tools to empower human creativity
YouTube CEO Neal Mohan revealed 4 new bets they have placed for 2024, with the first bet being on AI tools to empower human creativity on the platform. These AI tools include:
Dream Screen lets content creators generate custom AI backgrounds from a simple prompt describing an idea.
Dream Track lets content creators generate custom music by typing in the music theme and the artist they want to feature.
These new tools are aimed mainly at YouTube Shorts and highlight a priority on short-form content.
Google Bard is dead, Gemini Advanced is in!
Google has rebranded Bard to Gemini.
Google launches Gemini Advanced
Google launched the Gemini Advanced chatbot with its Ultra 1.0 AI model.
Google rolls out Gemini mobile apps
Gemini’s also moving onto Android and iOS phones as a pocket pal ready to share creative fire 24/7 via voice commands, screen overlays, or camera scans. The Android rollout has started in the US and some Asian countries and will gradually expand globally.
OpenAI is developing AI agents to automate work
OpenAI is developing AI "agents" that can autonomously take over a user's device and execute multi-step workflows.
One type of agent takes over a user's device and automates complex workflows between applications, like transferring data from a document to a spreadsheet for analysis. This removes the need for manual cursor movements, clicks, and typing between apps.
Another agent handles web-based tasks like booking flights or creating itineraries without needing access to APIs.
While OpenAI's ChatGPT can already do some agent-like tasks using APIs, these AI agents will be able to do more unstructured, complex work with little explicit guidance.
Brilliant Labs announces multimodal AI glasses, with Perplexity's AI
Brilliant Labs announces Frame
While Apple hogged the spotlight with its chunky new Vision Pro, a Singapore startup, Brilliant Labs, quietly showed off its AR glasses packed with a multi-modal voice/vision/text AI assistant named Noa.
These lightweight smart glasses, dubbed “Frame,” are powered by models like GPT-4 and Stable Diffusion. The best part is that programmers can build on these AI glasses thanks to their open-source design.
Perplexity to integrate AI chatbot into Frame
Noa will also provide rapid answers using Perplexity's real-time chatbot so Frame responses stay sharp.
That's all for now!
Subscribe to The AI Edge and gain exclusive access to content enjoyed by professionals from Moody’s, Vonage, Voya, WEHI, Cox, INSEAD, and other esteemed organizations.
Thanks for reading, and see you on Monday. 😊