Hello, Engineering Leaders and AI Enthusiasts!
Most people think only of Llama when it comes to Meta’s AI initiatives, but Meta may well be the most active tech giant in the AI space.
Here’s a rundown of what Zuckerberg and company have been up to.
Meta researchers find AI “Déjà Vu”ing: Suggested ways to address the privacy risks
Researchers at Meta recently discovered an anomaly common across most self-supervised learning (SSL) algorithms and named it "Déjà Vu." They found that SSL models can unintentionally memorize specific parts of individual training samples rather than learning semantically meaningful associations.
The report details their studies of this unintended memorization and explores ways to mitigate it.
Meta's ImageBind: The ultimate fusion of 6 data types in 1 AI model
Meta has announced a new open-source AI model called ImageBind that links together multiple data streams: text, audio, visual data, depth, thermal (temperature), and movement (IMU) readings. ImageBind is the first model to combine six data types into a single embedding space.
The company also notes that other streams of sensory input could be added to future models, including touch, speech, smell, and brain fMRI signals.
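To give a sense of what a single embedding space means in practice, here is a minimal sketch based on the usage shown in the open-source ImageBind repository; the checkpoint loader, file paths, and exact function names are assumptions to verify against the repo before use.

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained ImageBind model (checkpoint name assumed from the repo README).
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval().to(device)

# Embed text, an image, and an audio clip into the same space (paths are placeholders).
inputs = {
    ModalityType.TEXT: data.load_and_transform_text(["a dog barking"], device),
    ModalityType.VISION: data.load_and_transform_vision_data(["dog.jpg"], device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(["bark.wav"], device),
}

with torch.no_grad():
    embeddings = model(inputs)

# Because all modalities share one embedding space, cross-modal similarity is a dot product.
text_audio_sim = embeddings[ModalityType.TEXT] @ embeddings[ModalityType.AUDIO].T
print(text_audio_sim)
```

The key design point is that no paired data between every modality combination is needed: each modality is aligned to images, and the shared space falls out of that.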
Meta's Sandbox: Where AI meets advertising
Meta has introduced an AI Sandbox for advertisers, which includes features such as alternative copy generation, background creation through text prompts, and image cropping for Facebook or Instagram ads. This new tool aims to assist advertisers in creating more diverse and engaging content using AI.
The tools are still in beta, but they have the potential to revolutionize how ads are created and delivered.
Meta bets big on AI with custom chips & a supercomputer
Meta is making a big bet on AI by developing custom chips and a supercomputer. The company is developing its own chip, called the Meta Training and Inference Accelerator (MTIA), which will be optimized for AI workloads and allow for more efficient training and running of complex models.
In addition, Meta is building a supercomputer, which will be used to train large-scale AI models for natural language processing and computer vision. These investments aim to enable the development of more advanced products and services, such as virtual assistants and augmented reality applications.
Meta scaling Speech Technology to 1,100+ languages
Meta’s Massively Multilingual Speech (MMS) project addresses the lack of speech recognition models for most of the world's languages, introducing speech-to-text and text-to-speech for more than 1,100 of them. It combines self-supervised learning techniques with a new dataset containing labeled data for over 1,100 languages and unlabeled data for nearly 4,000 languages.
The MMS models outperform existing ones and cover 10 times as many languages. The project's goal is to make information more accessible to people who rely on voice as their primary interface. The models and code are publicly available for further research and development, and the project aims to contribute to the preservation of the world's diverse languages.
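As an illustration of how the released checkpoints can be used for speech recognition, here is a rough sketch with the MMS models published on Hugging Face; the checkpoint ID, language code, and audio handling are assumptions to check against Meta's documentation.

```python
import torch
import torchaudio
from transformers import AutoProcessor, Wav2Vec2ForCTC

# Checkpoint ID assumed; MMS ASR models are published under the facebook/ namespace.
model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Switch the tokenizer and language adapter to a target language (ISO code assumed, e.g. French).
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

# Load a recording and resample to 16 kHz mono, as the model expects.
waveform, sr = torchaudio.load("speech.wav")
waveform = torchaudio.functional.resample(waveform, sr, 16_000).mean(dim=0)

inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(ids))
```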
Meta's AI Segmentation Game Changer
Researchers have developed HQ-SAM (High-Quality Segment Anything Model), a new model that improves the segmentation quality of Meta’s existing SAM. SAM struggles to segment complex objects accurately, despite being trained with 1.1 billion masks. HQ-SAM is trained on a dataset of 44,000 fine-grained masks from various sources and achieves impressive results on nine segmentation datasets across different tasks.
HQ-SAM retains SAM's prompt design, efficiency, and zero-shot generalizability while requiring minimal additional parameters and computation. Training HQ-SAM on the provided dataset takes only 4 hours on 8 GPUs.
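Because HQ-SAM keeps SAM's prompt design, using it looks essentially like the original segment-anything API. The sketch below uses the original SAM package with a placeholder checkpoint path and point prompt; the sam-hq repository exposes a near-identical interface.

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Checkpoint path is a placeholder; download weights from the SAM (or sam-hq) release page.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground point prompt (x, y); label 1 means "this pixel is on the object".
masks, scores, _ = predictor.predict(
    point_coords=np.array([[480, 320]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
print(masks.shape, scores)  # candidate masks of shape (3, H, W) with confidence scores
```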
Meta plans to put AI everywhere on its platforms
Meta has announced plans to integrate generative AI into its platforms, including Facebook, Instagram, WhatsApp, and Messenger. The company shared a sneak peek of AI tools it was building, including ChatGPT-like chatbots planned for Messenger and WhatsApp that could converse using different personas. It will also leverage its image generation model to let users modify images and create stickers via text prompts.
Meta’s MusicGen: The LLaMA moment for music AI
Meta released MusicGen, a controllable music generation model for producing high-quality music. MusicGen can be prompted by both text and melody.
The best part: anyone can try it for free right now. It uses a single-stage transformer language model with efficient token interleaving patterns, eliminating the need for multiple models.
MusicGen will generate 12 seconds of audio based on the description provided. You can optionally provide a reference audio from which a broad melody will be extracted. Then the model will try to follow both the description and melody provided. You can also use your own GPU or a Google Colab by following the instructions on their repo.
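If you would rather run it locally than use the hosted demo, generation with the open-source audiocraft package looks roughly like this; the model size and output handling here are assumptions drawn from the project README.

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained checkpoint; the "melody" variant additionally accepts a reference audio clip.
model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=12)  # seconds of audio to generate

descriptions = ["lo-fi hip hop beat with warm piano chords"]
wav = model.generate(descriptions)  # one waveform per description

for i, one_wav in enumerate(wav):
    # Saves clip_0.wav at the model's sample rate, with loudness normalization.
    audio_write(f"clip_{i}", one_wav.cpu(), model.sample_rate, strategy="loudness")
```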
Meta’s new human-like AI model for image creation
Meta has introduced a new model, Image Joint Embedding Predictive Architecture (I-JEPA), based on Meta’s Chief AI Scientist Yann LeCun’s vision to make AI systems learn and reason like animals and humans. It is a self-supervised computer vision model that learns to understand the world by predicting it.
The core idea: it learns by building an internal model of the outside world and comparing abstract representations of images. It uses background knowledge about the world to fill in missing pieces of images, rather than looking only at nearby pixels like other generative AI models (a simplified sketch of this idea follows the key takeaways below).
Key takeaways: The model
Captures patterns and structures through self-supervised learning from unlabeled data.
Predicts missing information at a high level of abstraction, avoiding generative model limitations.
Delivers strong performance on multiple computer vision tasks while also being computationally efficient. Less data, less time, and less compute.
Can be used for many different applications without needing extensive fine-tuning and is highly scalable.
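To make "predicting at a high level of abstraction" concrete, here is a deliberately simplified, unofficial sketch of a JEPA-style training step; the encoder, target encoder, and predictor are stand-in modules, not Meta's released code.

```python
import torch
import torch.nn.functional as F

def jepa_step(context_encoder, target_encoder, predictor, image_patches, ctx_idx, tgt_idx):
    """One simplified JEPA-style update: predict target-patch embeddings from context patches."""
    # Encode the visible (context) patches.
    ctx_repr = context_encoder(image_patches[:, ctx_idx])

    # Encode target patches with a frozen/EMA target encoder; no gradients flow through it.
    with torch.no_grad():
        tgt_repr = target_encoder(image_patches[:, tgt_idx])

    # Predict the target representations (not pixels) from the context, given the target positions.
    pred = predictor(ctx_repr, tgt_idx)

    # The loss lives entirely in embedding space, which is what avoids pixel-level generative limitations.
    return F.mse_loss(pred, tgt_repr)
```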
Meta’s all-in-one generative speech AI model
Meta introduces Voicebox, the first generative AI model that can perform speech-generation tasks it was not specifically trained to accomplish, with state-of-the-art performance. It can perform:
Text-to-speech synthesis in 6 languages
Noise removal
Content editing
Cross-lingual style transfer
Diverse sample generation
One of the main limitations of existing speech synthesizers is that they can only be trained on data that has been prepared expressly for that task. Voicebox is built on Flow Matching, Meta’s latest advance in non-autoregressive generative models, which can learn the highly non-deterministic mapping between text and speech.
Using an input audio sample of just two seconds in length, Voicebox can match the sample’s audio style and use it for text-to-speech generation.
Meta discloses the AI behind Facebook and Instagram recommendations
Meta is sharing 22 system cards that explain how AI-powered recommender systems work across Facebook and Instagram. These cards contain information and actionable insights everyone can use to understand and customize their specific AI-powered experiences in Meta’s products.
Moreover, Meta shared details of its ten most important prediction models rather than everything in the system, since diving into too much technical detail can sometimes obscure rather than aid transparency.
Meta plans to dethrone OpenAI and Google
Meta plans to release a commercial AI model to compete with OpenAI, Microsoft, and Google. The model will generate language, code, and images. It might be an updated version of Meta's LLaMA, which is currently only available under a research license.
Meta's CEO, Mark Zuckerberg, has expressed the company's intention to use the model for its own services and make it available to external parties. Safety is a significant focus. The new model will be open source, but Meta may reserve the right to license it commercially and provide additional services for fine-tuning with proprietary data.
Meta merges ChatGPT & Midjourney into one
Meta has launched CM3leon (pronounced chameleon), a single foundation model that does both text-to-image and image-to-text generation. So what’s the big deal about it?
LLMs largely use the Transformer architecture, while image generation models rely on diffusion. CM3leon is a multimodal language model based on the Transformer architecture, not diffusion. Thus, it is the first multimodal model trained with a recipe adapted from text-only language models.
CM3leon achieves state-of-the-art performance despite being trained with 5x less compute than previous transformer-based methods. It performs a variety of tasks– all with a single model:
Text-to-image generation
Text-guided image editing
Text tasks
Structure-guided image editing
Segmentation-to-image
Object-to-image
Meta unveils Llama 2, a worthy rival to ChatGPT
Meta has introduced Llama 2, the next generation of its open-source large language model. Here’s all you need to know:
It is free for research and commercial use. You can download the model here (a minimal loading sketch follows this list).
Microsoft is the preferred partner for Llama 2. It is also available through AWS, Hugging Face, and other providers.
Llama 2 models outperform open-source chat models on most benchmarks tested, and based on human evaluations for helpfulness and safety, they may be a suitable substitute for closed-source models.
Meta is opening access to Llama 2 with the support of a broad set of companies and people across tech, academia, and policy who also believe in an open innovation approach for AI.
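For readers who want to try the chat variant, loading it through Hugging Face transformers looks roughly like the sketch below; the checkpoint name is the one listed on the hub, and access requires accepting Meta's license there first.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Gated checkpoint: accept the Llama 2 license on the Hugging Face hub before downloading.
model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = "Explain in one sentence why open-sourcing large language models matters."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```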
Meta-Transformer lets AI models process 12 modalities
New research has proposed Meta-Transformer, a novel unified framework for multimodal learning. It is the first framework to perform unified learning across 12 modalities, and it leverages a frozen encoder to perform multimodal perception without any paired multimodal training data.
Experimentally, Meta-Transformer achieves strong performance on datasets spanning the 12 modalities, validating its potential for unified multimodal learning.
Meta collabs with Qualcomm to enable on-device AI apps using Llama 2
Meta and Qualcomm Technologies, Inc. are working to optimize the execution of Meta’s Llama 2 directly on-device, without relying solely on cloud services. The ability to run generative AI models like Llama 2 on devices such as smartphones, PCs, VR/AR headsets, and vehicles allows developers to save on cloud costs and to provide users with private, more reliable, and personalized experiences.
Qualcomm Technologies is scheduled to make Llama 2-based AI implementations available on Snapdragon-powered devices starting in 2024.
Meta is building AI friends for you
Meta, the owner of Facebook, is developing chatbots with different personalities to increase engagement on its platforms. These chatbots, known as "personas," will mimic human conversations and may include characters like Abraham Lincoln or a surfer. The chatbots are expected to launch early in September and will provide users with search functions, recommendations, and entertainment.
The move is aimed at retaining users and competing with platforms like TikTok. However, there are concerns about privacy, data collection, and the potential for manipulation.
Meta’s AudioCraft is AudioGen + MusicGen + EnCodec
Meta has introduced AudioCraft, a new family of generative AI models built for generating high-quality, realistic audio & music from text. AudioCraft is a single code base that works for music, sound, compression & generation — all in the same place. It consists of three models– MusicGen, AudioGen, and EnCodec.
Meta is also open-sourcing these models, giving researchers and practitioners access so they can train their own models with their own datasets for the first time. AudioCraft is also easy to build on and reuse. Thus, people who want to build better sound generators, compression algorithms, or music generators can do it all in the same code base and build on top of what others have done.
MetaGPT tackling LLM hallucination
MetaGPT is a new framework that improves multi-agent collaboration by incorporating human workflows and domain expertise. It addresses the problem of hallucination in LLMs by encoding Standardized Operating Procedures (SOPs) into prompts, ensuring structured coordination.
The framework also mandates modular outputs, allowing agents to validate outputs and minimize errors. By assigning diverse roles to agents, MetaGPT effectively deconstructs complex problems.
Meta beats ChatGPT in language model generation
Shepherd is a language model designed to critique and improve the outputs of other language models. It uses a high-quality feedback dataset to identify errors and provide suggestions for refinement. Despite its smaller size, Shepherd's critiques are either equivalent or preferred to those from larger models like ChatGPT. In GPT-4-based evaluations, Shepherd achieves a win rate of 53-87% against competitive alternatives.
Shepherd outperforms other models in human evaluation and is on par with ChatGPT. Shepherd offers a practical and valuable tool for enhancing language model generation.
Meta challenges OpenAI with free code-gen software
Meta is set to release Code Llama, an open-source code-generating AI model that competes with OpenAI's Codex. The software builds on Meta's Llama 2 model and allows developers to automatically generate programming code and develop AI assistants that suggest code.
Llama 2 disrupted the AI industry by enabling companies to create AI apps without relying on proprietary software from major players like OpenAI, Google, or Microsoft. Code Llama is expected to launch next week, further challenging the dominance of existing code-generating AI models in the market.
Meta AI’s new RoboAgent with 12 skills
New robotics research from Meta and CMU’s Robotics Institute: RoboAgent, a universal robotic agent that can efficiently learn and generalize a wide range of non-trivial manipulation skills. It can perform 12 skills across 38 tasks, including object manipulation and re-orientation, and adapt to unseen scenarios involving different objects and environments.
The development of the RoboAgent was made possible through a distributed robotics infrastructure, a unified framework for robot learning, and a high-quality dataset. The agent also utilizes a language-conditioned multi-task imitation learning framework to enhance its capabilities. Meta is open-sourcing RoboSet, a large, high-quality robotics dataset collected with commodity hardware, to support and accelerate open-source research in robot learning.
Meta’s SeamlessM4T: The first all-in-one, multilingual multimodal AI
Meta has introduced SeamlessM4T, the first all-in-one multilingual multimodal AI translation and transcription model. This single model can perform speech-to-text, speech-to-speech, text-to-speech, text-to-text translations, and more for up to 100 languages without relying on multiple separate models.
Compared to cascaded approaches, SeamlessM4T's single-system design reduces errors and delays, improving translation efficiency and quality and delivering state-of-the-art results.
Meta is also releasing its training dataset called SeamlessAlign and sharing the model publicly to allow researchers and developers to build on this technology.
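As a rough illustration of the all-in-one idea, here is a sketch of text-to-speech translation using the SeamlessM4T support in Hugging Face transformers; the checkpoint ID, language codes, and generation call are assumptions to confirm against the official docs (Meta also ships its own seamless_communication package).

```python
from scipy.io import wavfile
from transformers import AutoProcessor, SeamlessM4TModel

# Checkpoint ID assumed from the Hugging Face hub.
model_id = "facebook/hf-seamless-m4t-medium"
processor = AutoProcessor.from_pretrained(model_id)
model = SeamlessM4TModel.from_pretrained(model_id)

# Translate English text into spoken French (text-to-speech translation); language codes assumed.
inputs = processor(text="Machine translation keeps getting better.", src_lang="eng", return_tensors="pt")
audio = model.generate(**inputs, tgt_lang="fra")[0].cpu().numpy().squeeze()

wavfile.write("translated.wav", rate=model.config.sampling_rate, data=audio)
```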
Meta’s coding version of Llama-2
Meta has released Code Llama, a state-of-the-art LLM capable of generating code and natural language about code from both code and natural language prompts. It is free for research and commercial use. It can also be used for code completion and debugging and supports many of the most popular languages today, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, and Bash.
Code Llama is built on top of Llama 2 and is available in three models:
Code Llama, the foundational code model;
Code Llama - Python, specialized for Python;
and Code Llama - Instruct, fine-tuned for understanding natural language instructions.
It is being released in three sizes, with 7B, 13B, and 34B parameters. Meta’s benchmark testing showed that Code Llama performed better than open-source, code-specific LLMs and outperformed Llama 2. However, it seems no 70B variant was released.
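As a quick sketch of what code completion looks like in practice, here is infilling with the base checkpoint via Hugging Face transformers; the checkpoint name and the <FILL_ME> convention follow the model card, but treat the details as assumptions to verify.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Base 7B checkpoint; -Python and -Instruct variants are published alongside it.
model_id = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# <FILL_ME> marks the span the model should fill in, given both the prefix and the suffix.
prompt = 'def remove_non_ascii(s: str) -> str:\n    """ <FILL_ME>\n    return result'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

generated = model.generate(input_ids, max_new_tokens=128)
filling = tokenizer.batch_decode(generated[:, input_ids.shape[1]:], skip_special_tokens=True)[0]
print(prompt.replace("<FILL_ME>", filling))
```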
Meta to rival GPT-4 with a free Llama 3?
According to an early rumor, Meta is working on Llama 3, which is intended to compete with GPT-4, but will remain largely free under the Llama license. Jason Wei, an engineer associated with OpenAI, indicated that Meta has the computational capacity to train Llama 3 to a level comparable to GPT-4. Furthermore, Wei suggests that the feasibility of training Llama 4 is already within reach.
Despite Wei's credibility, it's important to acknowledge the possibility of inaccuracies in his statements or the potential for shifts in these plans.
That's all for now!
Be in the company of industry frontrunners! Subscribe to The AI Edge and join the ranks of respected readers from Moody’s, Vonage, Voya, WEHI, Cox, INSEAD, and other notable organizations.
Thanks for reading, and see you tomorrow. 😊