This AI Model Can Clone Your Voice in 15 Seconds
Plus: Microsoft and OpenAI plan a $100B supercomputer for AI development, and MagicLens, Google DeepMind's breakthrough in image retrieval technology
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 243rd edition of The AI Edge newsletter. This edition features an AI model that can clone your voice in 15 seconds!
And a huge shoutout to our incredible readers. We appreciate you😊
In today’s edition:
🎤 This AI model can clone your voice in 15 seconds
🚀 Microsoft and OpenAI plan $100B supercomputer for AI development
🖼️ MagicLens: Google DeepMind's breakthrough in image retrieval technology
💡 Knowledge Nugget: Fine-tune an open-source LLM from Postgres data in 5 minutes
Let’s go!
This AI model can clone your voice in 15 seconds
OpenAI has offered a glimpse into its latest breakthrough - Voice Engine, an AI model that can generate stunningly lifelike voice clones from a mere 15-second audio sample and a text input. This technology can replicate the original speaker's voice, opening up possibilities for improving educational materials, making videos more accessible to global audiences, assisting with communication for people with speech impairments, and more.
[Audio samples: the 15-second reference clip and the generated voice clone]
Though the model has many applications, the AI giant is cautious about its potential misuse, especially during elections. They have strict rules for partners, like no unauthorized impersonation, clear labeling of synthetic voices, and technical measures like watermarking and monitoring. OpenAI hopes this early look will start a conversation about how to address potential issues by educating the public and developing better ways to trace the origin of audio content.
Why does this matter?
OpenAI's Voice Engine can transform industries from gaming and entertainment to education and healthcare. Imagine video games with non-player characters that sound like real people, animated films with AI-generated voiceovers, or personalized voice assistants for individuals with speech impairments. But as AI-generated voices become more human-like, questions about consent, privacy, and robust authentication measures must be addressed to prevent misuse.
Microsoft and OpenAI plan $100B supercomputer for AI development
Microsoft and OpenAI are reportedly planning to build a massive $100 billion supercomputer called "Stargate" to rapidly advance the development of OpenAI's AI models. Insiders say the project, set to launch in 2028 and expand by 2030, would be one of the largest investments in computing history, requiring several gigawatts of power - equivalent to multiple large data centers.
Much of Stargate's cost would go towards procuring millions of specialized AI chips, with funding primarily from Microsoft. A smaller $10B precursor called "Phase 4" is planned for 2026. The decision to move forward with Stargate relies on OpenAI achieving significant improvements in AI capabilities and potential "superintelligence." If realized, Stargate could enable OpenAI's AI systems to recursively generate synthetic training data and become self-improving.
Why does this matter?
The Stargate project will give OpenAI and Microsoft a massive advantage in creating AI systems that are far more capable than what we have today. This could lead to breakthroughs in areas like scientific discovery, problem-solving, and the automation of complex tasks. But it also raises concerns about the concentration of power in the AI industry. We'll need new frameworks for governing advanced AI to ensure it benefits everyone, not just a few giants.
MagicLens: Google DeepMind's breakthrough in image retrieval technology
Google DeepMind has introduced MagicLens, a revolutionary set of image retrieval models that surpass previous state-of-the-art methods in multimodality-to-image, image-to-image, and text-to-image retrieval tasks. Trained on a vast dataset of 36.7 million triplets containing query images, text instructions, and target images, MagicLens achieves outstanding performance while meeting a wide range of search intents expressed through open-ended instructions.
Multimodality-to-Image performance
Image-to-Image performance
MagicLens employs a dual-encoder architecture, which allows it to process both image and text inputs, delivering highly accurate search results even when queries are expressed in everyday language. By leveraging advanced AI techniques, like contrastive learning and single-modality encoders, MagicLens can satisfy diverse search intents and deliver relevant images with unprecedented efficiency.
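To make the dual-encoder idea concrete, here is a minimal sketch of the retrieval step such an architecture performs. It assumes one encoder has already fused a query image and text instruction into a single embedding, and another has embedded the candidate images into the same space; the embeddings and dimensions here are toy values, not MagicLens's actual weights or internals.

```python
import numpy as np

def rank_by_similarity(query_emb, index_embs):
    """Rank candidate image embeddings against a query embedding.

    In a dual-encoder setup, retrieval reduces to nearest-neighbour
    search by cosine similarity in the shared embedding space.
    """
    q = query_emb / np.linalg.norm(query_emb)
    idx = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    scores = idx @ q                      # cosine similarity per candidate
    return np.argsort(-scores), scores   # best match first

# Toy example: 3 candidate images, 4-dim embeddings.
candidates = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0],
                       [0.7, 0.7, 0.0, 0.0]])
query = np.array([0.9, 0.1, 0.0, 0.0])
order, scores = rank_by_similarity(query, candidates)
print(order)  # candidate 0 ranks first
```

Contrastive training pulls matching (query, target) pairs together and pushes mismatched pairs apart, which is what makes this simple similarity ranking work at query time.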
Why does this matter?
The release of MagicLens highlights the growing importance of multimodal AI systems that can process both text and visual information. We can expect to see more seamless integration between language and vision, enabling the development of more sophisticated AI applications. This trend could have far-reaching implications for fields such as robotics, autonomous vehicles, and augmented reality, where the ability to interpret and respond to visual data is crucial.
Enjoying the daily updates?
Refer your pals to subscribe to our daily newsletter and get exclusive access to 400+ game-changing AI tools.
When you use the referral link above or the “Share” button on any post, you'll get the credit for any new subscribers. All you need to do is send the link via text or email or share it on social media with friends.
Knowledge Nugget: Fine-tune an open-source LLM from Postgres data in 5 minutes
In a recent article, the author shows how fine-tuning open-source language models has become incredibly easy using Together.ai's API. He walks through an example of fine-tuning an LLM on product catalog data stored in Postgres, which enables capabilities like asking questions about products, generating creative content, and providing customer support. The process involves:
Exporting the Postgres data to JSONL format using a provided TypeScript script
Uploading the JSONL file using Together.ai's command line tool
Initiating the fine-tuning job on a model like llama-2-7b-chat
Testing the fine-tuned chatbot using Together.ai's interface
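The export step above can be sketched as a small script that turns product rows into JSONL training examples. The field names, question template, and prompt/completion schema below are illustrative assumptions, not the article's exact script; check Together.ai's documentation for the precise format its fine-tuning endpoint expects before uploading.

```python
import json

def rows_to_jsonl(rows):
    """Convert product rows (as exported from Postgres) into JSONL
    training examples, one JSON object per line."""
    lines = []
    for row in rows:
        example = {
            "prompt": f"What can you tell me about {row['name']}?",
            "completion": f"{row['name']}: {row['description']} "
                          f"Price: ${row['price']:.2f}",
        }
        lines.append(json.dumps(example))
    return "\n".join(lines)

# Toy rows standing in for a SELECT over the products table.
rows = [
    {"name": "Trail Mug", "description": "Insulated steel mug.", "price": 19.5},
    {"name": "Camp Stove", "description": "Single-burner stove.", "price": 42.0},
]
jsonl = rows_to_jsonl(rows)
print(jsonl)
```

The resulting file is what gets uploaded with Together.ai's command line tool before kicking off the fine-tuning job.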
The article also mentions the potential for incremental fine-tuning and its integration with Sort, a database collaboration platform, to improve data quality and collaboration workflows.
Why does this matter?
The increasing accessibility of LLM fine-tuning may accelerate the development of multi-modal AI systems that seamlessly integrate language, vision, and other modalities. By making it easy to fine-tune language models and combine them with other AI components, such as computer vision and speech recognition models, we could see the emergence of more sophisticated AI applications, even from smaller players.
What Else Is Happening❗
🧠 TCS aims to build the largest AI-ready workforce
Tata Consultancy Services (TCS) has announced that it has trained 3.5 lakh (350,000) employees, more than half of its workforce, in generative AI skills. The company set up a dedicated AI and cloud business unit in 2023 to address customers' growing needs for cloud and AI adoption, offering a comprehensive portfolio of GenAI services and solutions. (Link)
🔗 ChatGPT introduces hyperlinked source citations in the latest update
OpenAI has introduced a feature for ChatGPT premium users that makes source links more prominent in the bot's responses. The update hyperlinks words within ChatGPT's answers, directing users to the source websites — a feature already present in other chatbot search resources like Perplexity. (Link)
✏️ OpenAI's DALL·E now allows users to edit generated images
OpenAI has launched a new image editing feature for DALL·E, enabling users to modify generated images by selecting areas and describing changes. The editor offers tools to add, remove, or update objects within the image using either the selection tool or conversational prompts. (Link)
🚇 NYC to test Evolv's AI gun detection technology in subways
New York City plans to test Evolv's AI-powered gun detection scanners in subway stations within 90 days, according to Mayor Eric Adams. However, Evolv is under scrutiny for the accuracy of its technology, facing reports of false positives and missed detections. (Link)
🚫 Microsoft Copilot banned in US House due to potential data breaches
The US House of Representatives has banned its staffers from using Microsoft Copilot due to concerns about possible data leaks to unauthorized cloud services. This decision mirrors last year's restriction on the use of ChatGPT in congressional offices, with no other chatbots currently authorized. Microsoft has indicated that it plans to address federal government security and compliance requirements for AI tools like Copilot later this year. (Link)
New to the newsletter?
The AI Edge keeps engineering leaders & AI enthusiasts like you on the cutting edge of AI. From ML to ChatGPT to generative AI and LLMs, we break down the latest AI developments and how you can apply them in your work.
Thanks for reading, and see you tomorrow. 😊