Meta Merges ChatGPT & Midjourney into One
Plus: NaViT reshaping the future of Vision Transformers & Introducing Air AI.
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 64th edition of The AI Edge newsletter. This edition brings you Meta’s CM3leon, a breakthrough in multimodal AI.
Our amazing readers deserve a special mention. Your support is invaluable to us!😊
In today’s edition:
🚀 Meta merges ChatGPT & Midjourney into one
🌟 NaViT: AI processes images in any resolution, any aspect ratio
💬 Air AI: AI to replace sales & CSM teams
📚 Knowledge Nugget: How to write your code so that LLMs can extend it
Let’s go!
Meta merges ChatGPT & Midjourney into one
Meta has launched CM3leon (pronounced chameleon), a single foundation model that does both text-to-image and image-to-text generation. So what’s the big deal about it?
LLMs largely use the Transformer architecture, while image generation models rely on diffusion. CM3leon is a multimodal language model based on the Transformer architecture rather than diffusion, which makes it the first multimodal model trained with a recipe adapted from text-only language models.
CM3leon achieves state-of-the-art performance despite being trained with 5x less compute than previous transformer-based methods. It performs a variety of tasks, all with a single model:
Text-guided image generation and editing
Text-to-image
Text-guided image editing
Text tasks
Structure-guided image editing
Segmentation-to-image
Object-to-image
Why does this matter?
This greatly expands the functionality of previous models, which were either only text-to-image or only image-to-text. Moreover, Meta’s new approach to image generation is more efficient. It opens up possibilities for generating and manipulating multimodal content with a single model and paves the way for advanced AI applications.
NaViT: AI processes images in any resolution, any aspect ratio
NaViT (Native Resolution ViT) by Google DeepMind is a Vision Transformer (ViT) model that can process images of any resolution and aspect ratio. Unlike traditional models that resize images to a fixed resolution, NaViT uses sequence packing during training to handle inputs of varying sizes.
This approach improves training efficiency and leads to better results on tasks like image and video classification, object detection, and semantic segmentation. NaViT offers flexibility at inference time, allowing for a smooth trade-off between cost and performance.
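To make sequence packing concrete, here is a minimal sketch of the idea, not NaViT's actual implementation. It assumes 16x16 patches, a 256-token sequence budget, and a simple first-fit greedy packer. The function names and image sizes are hypothetical and chosen only for illustration.

```python
from math import ceil

def num_patches(height, width, patch=16):
    """Number of patch tokens for one image at its native resolution."""
    return ceil(height / patch) * ceil(width / patch)

def pack_first_fit(token_counts, max_len):
    """Greedily place each image in the first sequence that still has room.

    Returns a list of packed sequences, each a list of image indices.
    """
    sequences = []  # image indices held by each packed sequence
    used = []       # tokens already consumed in each packed sequence
    for idx, n in enumerate(token_counts):
        if n > max_len:
            raise ValueError(f"image {idx} needs {n} tokens, limit is {max_len}")
        for s, filled in enumerate(used):
            if filled + n <= max_len:
                sequences[s].append(idx)
                used[s] += n
                break
        else:
            # No existing sequence has room, so start a new one.
            sequences.append([idx])
            used.append(n)
    return sequences

# Hypothetical (height, width) sizes in pixels; none are resized to a square.
images = [(224, 224), (128, 256), (64, 64), (160, 96), (96, 96)]
counts = [num_patches(h, w) for h, w in images]
packed = pack_first_fit(counts, max_len=256)
print(counts)  # tokens per image
print(packed)  # image indices grouped per packed sequence
```

In this toy run, five images of different shapes fit into two sequences instead of five padded ones. A real model would also need attention masking so that patches from different images in the same sequence do not attend to each other.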
Why does this matter?
NaViT showcases the versatility and adaptability of ViTs, thereby influencing the development and training of future AI architectures and algorithms. It can be a transformative step towards more advanced, flexible, and efficient computer vision and AI systems.
Air AI: AI to replace sales & CSM teams
Air AI is a conversational AI that can carry out full sales and customer service calls of 5 to 40 minutes over the phone while sounding like a human. It can also perform actions autonomously across 5,000 unique applications.
According to one of its co-founders, Air is already on live calls with real people and producing profitably for real businesses. It is not limited to any one use case: you can create an AI SDR, 24/7 CS agent, Closer, Account Executive, etc., or prompt it for your specific use case and get creative (therapy, talk to Aristotle, etc.).
Why does this matter?
Adoption of such AI systems marks a significant milestone in the advancement and evolution of AI technologies, transforming how businesses interact with their customers. It also paves the way for AI developers and builders to create novel applications and solutions on top of it, accelerating innovation in AI.
Knowledge Nugget: How to write your code so that LLMs can extend it
Coding LLMs are here to stay. But while they show remarkable coding abilities in ideal conditions, they often fall short in real-world scenarios because of limited context and complex codebases.
This insightful article proposes six principles for adapting coding style to optimize LLM performance. The improved code quality not only benefits LLM performance but also enhances human collaboration and understanding within the codebase, leading to a better overall coding experience.
Why does this matter?
By adhering to these coding principles, developers create codebases that are more conducive to LLMs' capabilities and enable them to generate more accurate, relevant, and reliable code. It can also lead to broader adoption and integration of AI in the software development landscape.
What Else Is Happening❗
🎯Ensuring accuracy in AI and 3D tasks with ReshotAI keypoints! (Link)
🧪Samsung could be testing ChatGPT integration for its own browser (Link)
🧑‍💻ChatGPT becomes study buddy for Hong Kong school students (Link)
⚠️WormGPT, the cybercrime tool, unveils the dark side of generative AI (Link)
🏦Bank of America is using AI, VR, and Metaverse to train new hires (Link)
🤗Transformers now supports dynamic RoPE-scaling to extend the context length of LLMs (Link)
🇮🇱Israel has started using AI to select targets for air strikes and organize wartime logistics (Link)
🛠️ Trending Tools
Sidekik: AI assistant for enterprise apps like Salesforce, Netsuite, and Microsoft. Get instant answers tailored to your org.
Domainhunt AI: Describe your startup idea and let AI find the perfect domain name for your business.
Indise: Create stunning interior images using AI. Explore design options in a virtual environment.
Formsly: Build forms and surveys with Formsly AI Builder. Try the beta version.
AI Mailman: Craft powerful emails in seconds by filling out a small form. Get an email template generated by AI.
PhotoEcom: Snap a picture of your product and let the advanced AI algorithms work their magic.
Outboundly: Research prospects, website, and social media. Generate hyper-personalized messages using GPT-4 with this Chrome extension.
BrainstormGPT: Streamline topic-to-meeting report conversion with multi-agent, LLM & auto-search. Custom topics, user-defined roles, and more.
That's all for now!
If you are new to ‘The AI Edge’ newsletter, subscribe to receive the ‘Ultimate AI tools and ChatGPT Prompt guide’ specifically designed for Engineering Leaders and AI enthusiasts.
Thanks for reading, and see you tomorrow. 😊