AI Weekly Rundown (July 15 to July 21)
News from Meta, Apple, Google, OpenAI, Wix, and other big players.
Hello, Engineering Leaders and AI Enthusiasts,
Another eventful week in the AI realm. Lots of big news from huge enterprises.
In today’s edition:
✅ Meta merges ChatGPT & Midjourney into one
✅ NaViT: AI generates images in any resolution, any aspect ratio
✅ Air AI: AI to replace sales & CSM teams
✅ Wix’s new AI tool creates entire websites
✅ MedPerf makes AI better for Healthcare
✅ LLMs benefiting robotics and beyond
✅ Meta unveils Llama 2, a worthy rival to ChatGPT
✅ Microsoft furthers its AI ambitions with major updates
✅ How is ChatGPT's behavior changing over time?
✅ Apple Trials a ChatGPT-like AI Chatbot
✅ Google AI’s SimPer unlocks potential of periodic learning
✅ OpenAI doubles GPT-4 message cap to 50
✅ Google presents brain-to-music AI
✅ ChatGPT will now remember who you are & what you want
✅ Meta-Transformer lets AI models process 12 modalities
Let’s go!
Meta merges ChatGPT & Midjourney into one
Meta has launched CM3leon (pronounced chameleon), a single foundation model that does both text-to-image and image-to-text generation. So what’s the big deal about it?
LLMs largely use Transformer architecture, while image generation models rely on diffusion models. CM3leon is a multimodal language model based on Transformer architecture, not Diffusion. Thus, it is the first multimodal model trained with a recipe adapted from text-only language models.
CM3leon achieves state-of-the-art performance despite being trained with 5x less compute than previous transformer-based methods. It performs a variety of tasks– all with a single model:
Text-guided image generation and editing
Text-to-image
Text-guided image editing
Text tasks
Structure-guided image editing
Segmentation-to-image
Object-to-image
NaViT: AI generates images in any resolution, any aspect ratio
NaViT (Native Resolution ViT) by Google Deepmind is a Vision Transformer (ViT) model that allows processing images of any resolution and aspect ratio. Unlike traditional models that resize images to a fixed resolution, NaViT uses sequence packing during training to handle inputs of varying sizes.
This approach improves training efficiency and leads to better results on tasks like image and video classification, object detection, and semantic segmentation. NaViT offers flexibility at inference time, allowing for a smooth trade-off between cost and performance.
Air AI: AI to replace sales & CSM teams
Introducing Air AI, a conversational AI that can perform full 5-40 minute long sales and customer service calls over the phone that sound like a human. And it can perform actions autonomously across 5,000 unique applications.
According to one of its co-founders, Air is currently on live calls talking to real people, profitably producing for real businesses. And it’s not limited to any one use case. You can create an AI SDR, 24/7 CS agent, Closer, Account Executive, etc., or prompt it for your specific use case and get creative (therapy, talk to Aristotle, etc.)
Wix’s new AI tool creates entire websites
Website-building platform Wix is introducing a new feature that allows users to create an entire website using only AI prompts. While Wix already offers AI generation options for site creation, this new feature relies solely on algorithms instead of templates to build a custom site. Users will be prompted to answer a series of questions about their preferences and needs, and the AI will generate a website based on their responses.
By combining OpenAI's ChatGPT for text creation and Wix's proprietary AI models for other aspects, the platform delivers a unique website-building experience. Upcoming features like the AI Assistant Tool, AI Page, Section Creator, and Object Eraser will further enhance the platform's capabilities. Wix's CEO, Avishai Abrahami, reaffirmed the company's dedication to AI's potential to revolutionize website creation and foster business growth.
MedPerf makes AI better for Healthcare
MLCommons, an open global engineering consortium, has announced the launch of MedPerf, an open benchmarking platform for evaluating the performance of medical AI models on diverse real-world datasets. The platform aims to improve medical AI's generalizability and clinical impact by making data easily and safely accessible to researchers while prioritizing patient privacy and mitigating legal and regulatory risks.
MedPerf utilizes federated evaluation, allowing AI models to be assessed without accessing patient data, and offers orchestration capabilities to streamline research. The platform has already been successfully used in pilot studies and challenges involving brain tumor segmentation, pancreas segmentation, and surgical workflow phase recognition.
LLMs benefiting robotics and beyond
This study shows that LLMs can complete complex sequences of tokens, even when the sequences are randomly generated or expressed using random tokens, and suggests that LLMs can serve as general sequence modelers without any additional training. The researchers explore how this capability can be applied to robotics, such as extrapolating sequences of numbers to complete motions or prompting reward-conditioned trajectories. Although there are limitations to deploying LLMs in real systems, this approach offers a promising way to transfer patterns from words to actions.
Meta unveils Llama 2, a worthy rival to ChatGPT
Meta has introduced Llama 2, the next generation of its open-source large language model. Here’s all you need to know:
It is free for research and commercial use. You can download the model here.
Microsoft is the preferred partner for Llama 2. It is also available through AWS, Hugging Face, and other providers.
Llama 2 models outperform open-source chat models on most benchmarks tested, and based on human evaluations for helpfulness and safety, they may be a suitable substitute for closed-source models.
Meta is opening access to Llama 2 with the support of a broad set of companies and people across tech, academia, and policy who also believe in an open innovation approach for AI.
Microsoft furthers its AI ambitions with major updates
At Microsoft Inspire, Meta and Microsoft announced support for the Llama 2 family of LLMs on Azure and Windows. In other news, Microsoft announced major updates for AI-powered Bing, Copilot, and more.
It announced Bing Chat Enterprise, which gives organizations AI-powered chat for work with commercial data protection.
Microsoft 365 Copilot will now be available for commercial customers for $30 per user per month. Copilot is also coming to Teams phone and chat.
It launched Vector Search in preview through Azure Cognitive search, which will capture the meaning and context of unstructured data to make search faster.
It is rolling out multimodal capabilities via Visual Search in Chat. Leveraging OpenAI’s GPT-4 model, the feature lets anyone upload images and search the web for related content.
How is ChatGPT's behavior changing over time?
GPT-3.5 and GPT-4 are the two most widely used LLM services, but how updates in each affect their behavior is unclear. A new study evaluated the behavior of the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on four tasks. And here are the findings:
Solving math problems- GPT-4 got much worse, while GPT-3.5 greatly improved.
Answering sensitive/dangerous questions- GPT-4 became less willing to respond directly, while GPT-3.5 was slightly more willing.
Code generation- Both systems made more mistakes that stopped the code from running in June compared to March.
Visual reasoning- Both systems improved slightly from March to June.
It shows that the behavior of the same LLM service can change substantially in a relatively short period (and for the worse in some tasks), highlighting the need for continuous monitoring of LLM quality.
Apple Trials a ChatGPT-like AI Chatbot
Apple is developing AI tools, including its own large language model called "Ajax" and an AI chatbot named "Apple GPT." They are gearing up for a major AI announcement next year as it tries to catch up with competitors like OpenAI and Google.
The company has multiple teams developing AI technology and addressing privacy concerns. While Apple has been integrating AI into its products for years, there is currently no clear strategy for releasing AI technology directly to consumers. However, executives are considering integrating AI tools into Siri to improve its functionality and keep up with advancements in AI.
Google AI’s SimPer unlocks potential of periodic learning
Google research team’s this paper introduces SimPer, a self-supervised learning method that focuses on capturing periodic or quasi-periodic changes in data. SimPer leverages the inherent periodicity in data by incorporating customized augmentations, feature similarity measures, and a generalized contrastive loss.
SimPer exhibits superior data efficiency, robustness against spurious correlations, and generalization to distribution shifts, making it a promising approach for capturing and utilizing periodic information in diverse applications.
OpenAI doubles GPT-4 message cap to 50
OpenAI has doubled the number of messages ChatGPT Plus subscribers can send to GPT-4. Users can now send up to 50 messages in 3 hours, compared to the previous limit of 25 messages in 2 hours. And they are rolling out this update next week.
Google presents brain-to-music AI
New research called Brain2Music by Google and institutions from Japan has introduced a method for reconstructing music from brain activity captured using functional magnetic resonance imaging (fMRI). The generated music resembles the musical stimuli that human subjects experience with respect to semantic properties like genre, instrumentation, and mood.
The paper explores the relationship between the Google MusicLM (text-to-music model) and the observed human brain activity when human subjects listen to music.
ChatGPT will now remember who you are & what you want
OpenAI is rolling out custom instructions to give you more control over how ChatGPT responds. It allows you to add preferences or requirements that you’d like ChatGPT to consider when generating its responses.
ChatGPT will remember and consider the instructions every time it responds in the future, so you won’t have to repeat your preferences or information. Currently available in beta in the Plus plan, the feature will expand to all users in the coming weeks.
Meta-Transformer lets AI models process 12 modalities
New research has proposed Meta-Transformer, a novel unified framework for multimodal learning. It is the first framework to perform unified learning across 12 modalities, and it leverages a frozen encoder to perform multimodal perception without any paired multimodal training data.
Experimentally, Meta-Transformer achieves outstanding performance on various datasets regarding 12 modalities, which validates the further potential of Meta-Transformer for unified multimodal learning.
That's all for now!
If you are new to ‘The AI Edge’ newsletter. Subscribe to receive the ‘Ultimate AI tools and ChatGPT Prompt guide’ specifically designed for Engineering Leaders and AI enthusiasts.
Thanks for reading, and see you on Monday! 😊