xAI's First Multimodal Model with a Unique Dataset
Plus: Infini-Attention, Google's breakthrough that gives LLMs limitless context, and Adobe's Firefly AI trained on competitors' images (Bloomberg report)
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 253rd edition of The AI Edge newsletter. This edition brings you details on xAI’s first multimodal AI model and its unique dataset.
And a huge shoutout to our incredible readers. We appreciate you😊
In today’s edition:
📊 xAI’s first multimodal model with a unique dataset
♾️ Infini-Attention: Google's breakthrough gives LLMs limitless context
⚠️ Adobe's Firefly AI trained on competitors' images: Bloomberg report
💡 Knowledge Nugget: The State of Progress in AI by Arnold Kling
Let’s go!
xAI’s first multimodal model with a unique dataset
xAI, Elon Musk’s AI startup, has released the preview of Grok-1.5V, its first-generation multimodal AI model. This new model combines strong language understanding capabilities with the ability to process various types of visual information, like documents, diagrams, charts, screenshots, and photographs.
The startup claims Grok-1.5V has shown competitive performance across several benchmarks, including tests for multidisciplinary reasoning, mathematical problem-solving, and visual question answering. One notable achievement is its exceptional performance on the RealWorldQA dataset, which evaluates real-world spatial understanding in AI models.
Developed by xAI, this dataset features over 700 anonymized images from real-world scenarios, each accompanied by a question and verifiable answer. The release of Grok-1.5V and the RealWorldQA dataset aims to advance the development of AI models that can effectively comprehend and interact with the physical world.
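To give a sense of how a benchmark like this is typically consumed, here is a minimal scoring loop over image/question/answer triples. The JSONL layout, field names, and the model.answer() call are illustrative assumptions for the sketch, not part of xAI's actual release format.

```python
import json

def exact_match_accuracy(dataset_path: str, model) -> float:
    """Score a vision-language model on image/question/answer triples.

    Assumes a JSONL file where each record looks like
    {"image": "scene_001.jpg", "question": "...", "answer": "..."} and a
    model object exposing answer(image_path, question) -> str. Both are
    hypothetical stand-ins for whatever format the real dataset ships in.
    """
    correct, total = 0, 0
    with open(dataset_path) as f:
        for line in f:
            example = json.loads(line)
            prediction = model.answer(example["image"], example["question"])
            # Verifiable answers allow a simple normalized exact-match check.
            correct += prediction.strip().lower() == example["answer"].strip().lower()
            total += 1
    return correct / total
```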
Why does this matter?
What makes Grok-1.5V unique is its integration with the RealWorldQA dataset, which focuses on real-world spatial understanding crucial for AI systems in physical environments. The public availability of this dataset could significantly advance the development of AI-driven robotics and autonomous systems. With Musk's backing, xAI could lead in multimodal AI and contribute to reshaping human-AI interaction.
Infini-Attention: Google's breakthrough gives LLMs limitless context
Google researchers have developed a new technique called Infini-attention that allows LLMs to process text sequences of unlimited length. By elegantly modifying the Transformer architecture, Infini-attention enables LLMs to maintain strong performance on input sequences exceeding 1 million tokens while keeping memory requirements bounded and avoiding runaway growth in computation time.
The key innovation behind Infini-attention is the addition of a "compressive memory" module that efficiently stores old attention states once the input sequence grows beyond the model's base context length. This compressed long-range context is then aggregated with local attention to generate coherent and contextually relevant outputs.
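To make the mechanism concrete, here is a minimal, single-head PyTorch sketch of the general idea: standard softmax attention over the current segment, a linear-attention readout from an associative "compressive memory" of past segments, and a memory update after each segment. The fixed gate value, ELU feature map, and simple additive update are simplifications; the actual paper uses learned per-head gating and a delta-rule memory update.

```python
import torch
import torch.nn.functional as F

def segment_attention_with_memory(q, k, v, memory, norm):
    """Process one segment: combine local softmax attention with a
    linear-attention readout from the compressive memory, then update
    the memory with this segment's key/value states.

    q, k, v: (seq_len, d) tensors for the current segment
    memory:  (d, d) associative matrix accumulated over past segments
    norm:    (d,) normalization term accumulated over past segments
    """
    d = q.shape[-1]

    # 1) Local attention within the current segment (standard softmax attention).
    local = F.softmax(q @ k.T / d**0.5, dim=-1) @ v

    # 2) Long-range readout: retrieve compressed past context from memory
    #    using an ELU+1 feature map, as in linear attention.
    sigma_q = F.elu(q) + 1.0
    from_memory = (sigma_q @ memory) / (sigma_q @ norm).clamp(min=1e-6).unsqueeze(-1)

    # 3) Blend the two streams (the paper learns a gate; fixed at 0.5 here).
    beta = 0.5
    out = beta * from_memory + (1 - beta) * local

    # 4) Update memory with the current segment so later segments can recall it.
    sigma_k = F.elu(k) + 1.0
    memory = memory + sigma_k.T @ v
    norm = norm + sigma_k.sum(dim=0)
    return out, memory, norm

# Usage: stream an arbitrarily long sequence segment by segment.
d, seg_len = 64, 128
memory, norm = torch.zeros(d, d), torch.zeros(d)
for segment in torch.randn(10, seg_len, d):   # 10 segments of a long input
    q = k = v = segment                       # toy stand-in for learned projections
    out, memory, norm = segment_attention_with_memory(q, k, v, memory, norm)
```

The point of the sketch is the bookkeeping: the memory matrix stays a fixed size no matter how many segments stream through, which is why total memory stays bounded as context grows.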
In benchmark tests on long-context language modeling, summarization, and information retrieval tasks, Infini-attention models significantly outperformed other state-of-the-art long-context approaches while using up to 114 times less memory.
Why does this matter?
Infini-attention can help AI systems expertly organize, summarize, and surface relevant information from vast knowledge bases. Additionally, infinite contextual understanding can help AI systems generate more nuanced and contextually relevant long-form content like articles, reports, and creative writing pieces. Overall, we can expect AI tools to generate more valuable and less generic content with this technique.
Adobe's Firefly AI trained on competitors' images: Bloomberg report
In a surprising revelation, Adobe's AI image generator Firefly was found to have been trained not just on Adobe's own stock photos but also on AI-generated images from rival platforms like Midjourney and DALL-E. The Bloomberg report, which cites insider sources, notes that while these AI images made up only 5% of Firefly's training data, their inclusion has sparked an internal ethics debate within Adobe.
The news is particularly noteworthy given Adobe's public emphasis on Firefly's "ethical" sourcing of training data, a stance that aimed to differentiate it from competitors. The company had even set up a bonus scheme to compensate artists whose work was used to train Firefly. However, the decision to include AI-generated images, even if labeled as such by the submitting artists, has raised questions about the consistency of Adobe's ethical AI practices.
Why does this matter?
As AI systems learn from one another in a continuous feedback loop, the distinction between original creation, inspiration, and imitation becomes blurred. This raises complex issues around intellectual property rights, consent, and the difference between remixing and replicating. Moreover, the increasing prevalence of AI-generated content in training data sets could lead to a homogenization of AI outputs, potentially stifling creativity and diversity.
Enjoying the daily updates?
Refer your pals to subscribe to our daily newsletter and get exclusive access to 400+ game-changing AI tools.
When you use the referral link above or the “Share” button on any post, you'll get the credit for any new subscribers. All you need to do is send the link via text or email or share it on social media with friends.
Knowledge Nugget: The State of Progress in AI
In this thought-provoking piece, Arnold Kling discusses Amar Bhide's skeptical view on the current state and future of artificial intelligence. While AI has made big technical advances recently, Bhide argues that truly transformative technologies develop slowly as they spread through society. He notes that AI has been gradually added to various applications over many decades, but progress in solving real problems has been uneven.

Moreover, Kling says, there's a gap between how quickly AI is improving technically, as with large language models (LLMs), and how slowly we're turning those improvements into practical uses. That's because it's hard to predict how AI will improve when we design applications, and we run into unexpected problems when we try to use AI in meaningful ways. New technologies like AI also tend to spread slowly as businesses and people get used to them.
Kling concludes by noting that while leading-edge AI research achievements hint at transformative potential, the actual existential risks will depend on the real-world applications that emerge more gradually.
Why does this matter?
Bhide's perspective is a critical counterpoint to the AI hype, reminding us that the path from technical breakthroughs to real-world impact is long and uncertain. As businesses and investors pour resources into AI, they should balance their expectations and plan for a gradual, iterative process of application development and user adoption.
What Else Is Happening❗
🤖 Meta trials AI chatbot on WhatsApp, Instagram, and Messenger
Meta is testing its AI chatbot, Meta AI, with WhatsApp, Instagram, and Messenger users in India and parts of Africa. The move allows Meta to leverage its massive user base across these apps to scale its AI offerings. Meta AI can answer user queries, generate images from text prompts, and assist with Instagram search queries. (Link)
🎨 Ideogram introduces new features to its AI image generation model
Ideogram's AI image generation model now offers enhanced capabilities like description-based referencing, negative prompting, and options for generating images at varying speeds and quality levels. The upgrade aims to improve image coherence, photorealism, and text rendering quality, with human raters showing a 30-50% preference for the new version over the previous one. (Link)
🖼️ New Freepik AI tool redefines image generation with realism and versatility
Freepik has launched the latest version of its AI Image Generator, offering real-time generation, infinite variations, and photorealistic results. Users can create endless variations of an image with intuitive prompts, combining colors, settings, characters, and scenarios, all within a streamlined workflow featuring real-time previews and infinite scrolling. (Link)
💼 OpenAI promoted ChatGPT Enterprise to corporations with roadshow-style events
OpenAI CEO Sam Altman recently hosted events in San Francisco, New York, and London, pitching ChatGPT Enterprise and other AI services to hundreds of Fortune 500 executives. This move is part of OpenAI's strategy to diversify revenue streams and compete with partner Microsoft in selling AI products to enterprises. The events showcased applications such as call center management, translation, and custom AI solutions. (Link)
📔 Google's Notes tool now offers custom AI-generated backgrounds
Google has introduced an AI-powered background generation feature for its experimental Notes tool, allowing users to personalize their notes with custom images created from text prompts. The feature, currently available for select users in the US and India, utilizes Google's Gemini AI model for image generation. (Link)
New to the newsletter?
The AI Edge keeps engineering leaders & AI enthusiasts like you on the cutting edge of AI. From ML to ChatGPT to generative AI and LLMs, we break down the latest AI developments and how you can apply them in your work.
Thanks for reading, and see you tomorrow. 😊