AI Weekly Rundown (March 30 to April 5)

Major AI announcements from OpenAI, Microsoft, DeepMind, Apple, Anthropic, and more.

Apr 06, 2024

Hello Engineering Leaders and AI Enthusiasts!

Another eventful week in the AI realm. Lots of big news from huge enterprises.

In today’s edition:

🎤 OpenAI’s AI model can clone your voice in 15 seconds
🚀 Microsoft and OpenAI plan $100B supercomputer for AI development
🖼️ MagicLens: Google DeepMind's breakthrough in image retrieval
📲 Apple's Siri will now understand what’s on your screen
🤖 OpenAI introduces instant access to ChatGPT
🚨 Elon Musk says AI might destroy humanity, but it's worth the risk
🔍 Google's Gecko: LLM-powered text embedding breakthrough
🔓 Anthropic’s “many-shot jailbreaking” wears down AI ethics
🌌 CosmicMan enables the photorealistic generation of human images
🎵 What’s new in Stability AI’s Stable Audio 2.0?
👨‍💻 SWE-agent: AI coder that solves GitHub issues in 93 seconds
🎥 Mobile-first Higgsfield aims to disrupt video marketing with AI
🏢 Cohere launches Command R+ for enterprises
🧰 OpenAI doubles down on AI model customization
🏠 Will personal home robots be Apple’s next big thing?

Let’s go!

OpenAI’s AI model can clone your voice in 15 seconds

OpenAI has offered a glimpse of its latest model, Voice Engine, which can generate lifelike voice clones from a 15-second audio sample and a text input. Because the output replicates the original speaker's voice, it opens up possibilities such as improving educational materials.

Though the model has many applications, OpenAI is cautious about its potential misuse, especially during elections. It has set strict rules for partners, such as no unauthorized impersonation and clear labeling of synthetic voices, and it relies on technical measures such as watermarking and monitoring.

Microsoft and OpenAI plan $100B supercomputer for AI development

Microsoft and OpenAI are reportedly planning to build a massive $100 billion supercomputer called "Stargate" to rapidly advance the development of OpenAI's AI models. Insiders say the project, set to launch in 2028 and expand by 2030, would be one of the largest investments in computing history.

Much of Stargate's cost would go towards procuring millions of specialized AI chips, with funding primarily from Microsoft. A smaller $10B precursor called "Phase 4" is planned for 2026. The decision to move forward with Stargate relies on OpenAI achieving significant improvements in AI capabilities.

MagicLens: Google DeepMind's breakthrough in image retrieval technology

Google DeepMind has introduced MagicLens, a revolutionary set of image retrieval models that surpass previous state-of-the-art methods in multimodality-to-image, image-to-image, and text-to-image retrieval tasks. Trained on a vast dataset of 36.7 million triplets containing query images, text instructions, and target images, MagicLens achieves outstanding performance.

[Figure: MagicLens multimodality-to-image performance]

MagicLens employs a dual-encoder architecture, which allows it to process both image and text inputs, delivering highly accurate search results even when queries are expressed in everyday language. By leveraging advanced AI techniques, like contrastive learning and single-modality encoders, MagicLens can satisfy diverse search intents.
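To make the contrastive-learning idea concrete, here is a minimal sketch in plain Python. It assumes small hand-written vectors in place of real encoder outputs; the function names, the temperature value, and the toy data are illustrative assumptions, not MagicLens's actual code. It computes an InfoNCE-style loss, in which each query embedding should score highest against its own target among all targets in the batch.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce_loss(query_embs, target_embs, temperature=0.07):
    """Average cross-entropy of picking the matching target for each query.

    query_embs[i] is assumed to pair with target_embs[i]; every other
    target in the batch acts as a negative example.
    """
    queries = [normalize(q) for q in query_embs]
    targets = [normalize(t) for t in target_embs]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, t) / temperature for t in targets]
        # Numerically stable log-sum-exp over all targets in the batch.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / len(queries)

# Toy batch (hypothetical values): query embeddings and target image embeddings.
aligned_queries = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Same queries, but paired with the wrong targets.
shuffled_queries = [aligned_queries[1], aligned_queries[2], aligned_queries[0]]

aligned_loss = info_nce_loss(aligned_queries, targets)
shuffled_loss = info_nce_loss(shuffled_queries, targets)
print(f"aligned: {aligned_loss:.4f}, shuffled: {shuffled_loss:.4f}")
```

The loss is close to zero when each query sits nearest its own target and grows when the pairs are mismatched, which is the signal that pulls matching query and target embeddings together during training.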

Apple's Siri will now understand what’s on your screen

Apple researchers have developed an AI system called ReALM which enables voice assistants like Siri to understand contextual references to on-screen elements. By converting the complex task of reference resolution into a language modeling problem, ReALM outperforms even GPT-4 in understanding ambiguous references and context.

This innovation lies in reconstructing the screen using parsed on-screen entities and their locations to generate a textual representation that captures the visual layout. This approach, combined with fine-tuning language models specifically for reference resolution, allows ReALM to achieve substantial performance gains.
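The idea of turning a screen into text can be sketched roughly as follows. This is only an illustration under stated assumptions: the `ScreenEntity` class, the `render_screen` function, the `line_margin` threshold, and the sample screen are all hypothetical, not Apple's implementation. Entities are sorted top to bottom, those at roughly the same height are placed on one line, and items within a line are ordered left to right.

```python
from dataclasses import dataclass

@dataclass
class ScreenEntity:
    text: str
    x: float  # horizontal centre of the element's bounding box
    y: float  # vertical centre of the element's bounding box

def render_screen(entities, line_margin=10.0):
    """Lay out entities as text: rows top to bottom, items left to right.

    An entity joins the current row when its vertical centre lies within
    `line_margin` of the first entity in that row; otherwise it starts a new row.
    """
    rows = []
    for entity in sorted(entities, key=lambda e: (e.y, e.x)):
        if rows and abs(entity.y - rows[-1][0].y) <= line_margin:
            rows[-1].append(entity)
        else:
            rows.append([entity])
    lines = []
    for row in rows:
        row.sort(key=lambda e: e.x)
        lines.append("\t".join(e.text for e in row))
    return "\n".join(lines)

# Hypothetical parsed screen for a business listing.
screen = [
    ScreenEntity("Call", x=40, y=205),
    ScreenEntity("Joe's Pizza", x=60, y=100),
    ScreenEntity("555-0134", x=220, y=102),
    ScreenEntity("Directions", x=200, y=200),
]
layout = render_screen(screen)
print(layout)
```

The resulting text keeps the business name and phone number on one line and the two buttons on the next, so a language model reading it can resolve a request like "call that number" against the nearby entities.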

OpenAI introduces instant access to ChatGPT

OpenAI now allows users to use ChatGPT without having to create an account. ChatGPT, which has over 100 million weekly users across 185 countries, can now be accessed instantly by anyone curious about its capabilities.

While this move makes AI more accessible, other OpenAI products like DALL-E 3 still require an account. The company has also introduced new content safeguards and allows users to opt out of model training, even without an account.

Elon Musk says AI might destroy humanity, but it's worth the risk

Elon Musk recently shared his thoughts on the potential dangers of AI at the Abundance Summit's "Great AI Debate" seminar. He estimated a 10-20% chance that AI could pose an existential threat to humanity.

Despite the risks, Musk believes that the benefits of AI outweigh the potential dangers. He emphasized the importance of teaching AI to be truthful and curious, although he didn't provide specifics on how he arrived at his risk assessment.

Google's Gecko: LLM-powered text embedding breakthrough

Gecko is a compact and highly versatile text embedding model that achieves impressive performance by leveraging LLM knowledge. DeepMind researchers behind Gecko have developed a novel two-step distillation process to create a high-quality dataset called FRet using LLMs. The first step involves using an LLM to generate diverse, synthetic queries and tasks from a large web corpus. In the second step, the LLM mines positive and hard negative passages for each query.

When trained on FRet combined with other academic datasets, Gecko outperforms existing models of similar size on the Massive Text Embedding Benchmark (MTEB). Remarkably, the 256-dimensional version of Gecko surpasses all models with 768 dimensions, and the 768-dimensional Gecko competes with models that are 7x larger or use embeddings with 5x higher dimensions.

Anthropic’s “many-shot jailbreaking” wears down AI ethics

Researchers at Anthropic discovered a new way to get advanced AI language models to bypass their safety restrictions and provide unethical or dangerous information. They call this the "many-shot jailbreaking" technique. By including many made-up dialog examples in the input where an AI assistant provides harmful responses, the researchers could eventually get the real AI to override its training.

This vulnerability arises from AI models' increasing ability to process and "learn" from very long input sequences. The AI mimics the unethical behavior repeatedly demonstrated in the made-up examples. Anthropic has implemented safeguards against this attack on its systems and has shared the findings openly.

CosmicMan enables the photorealistic generation of human images

Researchers at the Shanghai AI Laboratory have created a new AI model called CosmicMan that specializes in generating realistic images of people. CosmicMan can produce high-quality, photorealistic human images that precisely match detailed text descriptions, unlike current AI image models that struggle with human images.

The key to CosmicMan's success is a massive dataset called CosmicMan-HQ 1.0, which contains 6 million annotated human images and was built with a data production approach called “Annotate Anyone,” together with a training method that focuses the model on different parts of the human body. By categorizing words in the text description into body part groups such as head, arms, and legs, the model can handle each part separately for better accuracy.
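As a rough illustration of the grouping step only, the sketch below sorts the phrases of a description into body part groups by keyword. The keyword table, the group names, and the `group_by_body_part` function are assumptions made up for this example; the actual model works on attention inside the network rather than on string matching.

```python
# Hypothetical keyword table mapping body part groups to trigger words.
BODY_PART_KEYWORDS = {
    "head": {"hair", "face", "eyes", "hat", "glasses"},
    "upper body": {"shirt", "jacket", "arms", "sleeves", "coat"},
    "lower body": {"jeans", "skirt", "legs", "trousers", "shorts"},
    "feet": {"shoes", "sneakers", "boots"},
}

def group_by_body_part(description):
    """Assign each comma-separated phrase to the first group whose keyword it contains."""
    groups = {part: [] for part in BODY_PART_KEYWORDS}
    groups["other"] = []
    for phrase in description.split(","):
        phrase = phrase.strip()
        if not phrase:
            continue
        words = set(phrase.lower().split())
        for part, keywords in BODY_PART_KEYWORDS.items():
            if words & keywords:
                groups[part].append(phrase)
                break
        else:
            # No keyword matched, so the phrase describes the person as a whole.
            groups["other"].append(phrase)
    return groups

prompt = "a young woman, short curly hair, red leather jacket, blue jeans, white sneakers"
groups = group_by_body_part(prompt)
for part, phrases in groups.items():
    print(f"{part}: {phrases}")
```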


What’s new in Stability AI’s Stable Audio 2.0?

Stability AI has released Stable Audio 2.0, a new AI model that generates high-quality, full-length audio tracks. Built upon its predecessor, the latest model introduces three groundbreaking features:

Generates tracks up to 3 minutes long with coherent musical structure
Enables audio-to-audio generation, allowing users to transform uploaded samples using natural language prompts
Enhances sound effect generation and style transfer capabilities, offering more flexibility and control for artists

Stable Audio 2.0's architecture combines a highly compressed autoencoder and a diffusion transformer (DiT) to generate full tracks with coherent structures. The autoencoder condenses raw audio waveforms into shorter representations, capturing essential features, while the DiT excels at manipulating data over long sequences.

SWE-agent: AI coder that solves GitHub issues in 93 seconds

Researchers at Princeton University have developed SWE-agent, an AI system that converts language models like GPT-4 into autonomous software engineering agents. SWE-agent can identify and fix bugs and issues in real-world GitHub repositories, taking about 93 seconds on average. It does so by interacting with a specialized terminal, which allows it to open, scroll, and search through files, edit specific lines with automatic syntax checking, and write and execute tests.

On the SWE-bench benchmark, SWE-agent solved 12.29% of the problems presented, nearly matching the 13.86% achieved by Devin, a closed-source commercial AI programmer developed by Cognition AI, a startup that has raised $21 million.
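To put "nearly matching" in numbers, the short calculation below compares the two resolve rates quoted above. The variable names are arbitrary, and the comparison assumes the two percentages were measured in a comparable way, which the reported figures alone do not confirm.

```python
swe_agent_rate = 12.29  # percent of benchmark issues resolved, as quoted above
devin_rate = 13.86      # percent of benchmark issues resolved, as quoted above

gap_points = devin_rate - swe_agent_rate
relative = swe_agent_rate / devin_rate

print(f"Gap: {gap_points:.2f} percentage points")                 # Gap: 1.57 percentage points
print(f"SWE-agent reaches {relative:.1%} of Devin's resolve rate")  # 88.7%
```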

Mobile-first Higgsfield aims to disrupt video marketing with AI

Former Snap AI chief Alex Mashrabov has launched a new startup called Higgsfield AI, which aims to make AI-powered video creation accessible to creators and marketers. The company's first app, Diffuse, allows users to generate original video clips from text descriptions or edit existing videos to insert themselves into the scenes.

Higgsfield is taking on OpenAI's Sora video generator but targeting a broader audience with its mobile-first, user-friendly tools. While questions remain around data usage and potential for abuse, Higgsfield believes it can carve out a niche in social media marketing with its realistic, easy-to-use video generation.

Cohere launches the “most powerful LLM for enterprises”

Cohere has announced the release of Command R+, its most powerful and scalable LLM to date. Designed specifically for enterprise use cases, Command R+ boasts several key features:

Advanced Retrieval Augmented Generation (RAG) to access and process vast amounts of information, improving response accuracy and reliability.
Support for ten business languages, enabling seamless operation across global organizations.
Tool Use feature to automate complex workflows by interacting with various software tools.

Moreover, Command R+ outperforms other scalable models on key metrics while providing strong accuracy at lower costs.

The LLM is now available through Cohere's API and can be deployed on various cloud platforms, including Microsoft Azure and Oracle Cloud Infrastructure.

OpenAI doubles down on AI model customization

OpenAI is making significant strides in AI accessibility with new features for its fine-tuning API and an expanded Custom Models program. These advancements give developers greater control and flexibility when tailoring LLMs for specific needs.

The fine-tuning API now includes:

Epoch-based checkpoint creation for easier retraining
A playground for comparing model outputs
Support for third-party integrations
Hyperparameter adjustment directly from the dashboard

The Custom Models program now offers assisted fine-tuning with OpenAI researchers for complex tasks and custom-trained models built entirely from scratch for specific domains with massive datasets.

Will personal home robots be Apple’s next big thing?

Apple is reportedly venturing into personal robotics after abandoning its self-driving car project and launching its mixed-reality headset. According to Bloomberg’s sources, the company is in the early stages of developing robots for the home environment.

Two potential robot designs are mentioned in the report. One is a mobile robot that can follow users around the house. The other is a stationary robot with a screen that can move to mimic a person's head movements during video calls. Apple is also considering robots for household tasks in the long term.

That's all for now!

Subscribe to The AI Edge and gain exclusive access to content enjoyed by professionals from Moody’s, Vonage, Voya, WEHI, Cox, INSEAD, and other esteemed organizations.

Thanks for reading, and see you on Monday. 😊
