What’s New in Stability AI’s Stable Audio 2.0?
Plus: SWE-agent is an AI coder that solves GitHub issues in 93 seconds; mobile-first Higgsfield aims to disrupt video marketing with AI
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 246th edition of The AI Edge newsletter. This edition explores the latest features in Stability AI’s Stable Audio 2.0.
And a huge shoutout to our incredible readers. We appreciate you😊
In today’s edition:
🎵 What’s new in Stability AI’s Stable Audio 2.0?
🤖 SWE-agent: AI coder that solves GitHub issues in 93 seconds
📲 Mobile-first Higgsfield aims to disrupt video marketing with AI
💡 Knowledge Nugget: Are large language models on the trajectory of word processing or digital advertising?
Let’s go!
What’s new in Stability AI’s Stable Audio 2.0?
Stability AI has released Stable Audio 2.0, a new AI model that generates high-quality, full-length audio tracks. Built upon its predecessor, the latest model introduces three groundbreaking features:
Generates tracks up to 3 minutes long with coherent musical structure
Enables audio-to-audio generation, allowing users to transform uploaded samples using natural language prompts
Enhances sound effect generation and style transfer capabilities, offering more flexibility and control for artists
Stable Audio 2.0's architecture combines a highly compressed autoencoder and a diffusion transformer (DiT) to generate full tracks with coherent structures. The autoencoder condenses raw audio waveforms into shorter representations, capturing essential features, while the DiT excels at manipulating data over long sequences. This combination allows the model to recognize and reproduce the large-scale structures essential for creating high-quality musical compositions.
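Conceptually, this is a latent diffusion pipeline: compress the waveform into a short latent sequence, refine that sequence with a transformer-based denoiser, then decode back to audio. A minimal NumPy sketch of the idea (the compression factor, shapes, and denoising loop are illustrative assumptions, not Stability AI's actual implementation):

```python
import numpy as np

COMPRESSION = 64  # assumed downsampling factor, not the real model's


def encode(waveform: np.ndarray) -> np.ndarray:
    """Toy 'autoencoder' encoder: average-pool the raw waveform into a
    much shorter latent sequence that keeps coarse structure."""
    n = len(waveform) // COMPRESSION * COMPRESSION
    return waveform[:n].reshape(-1, COMPRESSION).mean(axis=1)


def denoise_with_dit(latents: np.ndarray, steps: int = 4) -> np.ndarray:
    """Stand-in for the diffusion transformer: start from noisy latents
    and iteratively pull them toward the target structure."""
    x = latents + np.random.default_rng(0).normal(0.0, 1.0, latents.shape)
    for _ in range(steps):
        x = 0.5 * x + 0.5 * latents  # each step removes part of the noise
    return x


def decode(latents: np.ndarray) -> np.ndarray:
    """Toy decoder: upsample the latent sequence back to waveform length."""
    return np.repeat(latents, COMPRESSION)


waveform = np.sin(np.linspace(0, 100, 44_100))  # ~1 s of audio at 44.1 kHz
latents = encode(waveform)          # short sequence the transformer can handle
audio_out = decode(denoise_with_dit(latents))
```

The point of the compression step is visible in the shapes: the transformer operates on a sequence tens of times shorter than the raw waveform, which is what makes long-range (minutes-scale) structure tractable.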
Trained exclusively on a licensed dataset from AudioSparx, Stable Audio 2.0 prioritizes creator rights by honoring opt-out requests and ensuring fair compensation. You can explore the capabilities of the model for free on the Stable Audio website.
Why does this matter?
Stable Audio 2.0’s capability to generate 3-minute songs is a big step forward for AI music tools. But it still has some issues, like occasional glitches and "soulless" vocals, showing that AI has limits in capturing the emotion of human-made music. Also, a recent open letter from artists like Billie Eilish and Katy Perry raises concerns about the ethics of AI-generated music.
SWE-agent: AI coder that solves GitHub issues in 93 seconds
Researchers at Princeton University have developed SWE-agent, an AI system that converts language models like GPT-4 into autonomous software engineering agents. SWE-agent can identify and fix bugs and issues in real-world GitHub repositories in 93 seconds! It does so by interacting with a specialized terminal, which allows it to open, scroll, and search through files, edit specific lines with automatic syntax checking, and write and execute tests. This custom-built agent-computer interface is critical for the system's strong performance.
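The agent-computer interface idea can be illustrated with a tiny harness: the model emits structured commands, and the harness executes them against files, rejecting any edit that breaks syntax. This is a hypothetical sketch, not SWE-agent's actual interface (the class, command names, and behavior are assumptions):

```python
import ast


class FileSession:
    """Toy agent-computer interface: view a window of a file and apply
    line edits that must pass an automatic syntax check."""

    def __init__(self, name: str, text: str):
        self.name = name
        self.lines = text.splitlines()

    def view(self, start: int, count: int = 5) -> str:
        """Show a scrollable window of lines (1-indexed), like the
        agent's open/scroll commands."""
        return "\n".join(self.lines[start - 1 : start - 1 + count])

    def edit(self, lineno: int, new_line: str) -> bool:
        """Replace one line; revert and report failure if the result
        is no longer valid Python (the automatic syntax check)."""
        old = self.lines[lineno - 1]
        self.lines[lineno - 1] = new_line
        try:
            ast.parse("\n".join(self.lines))
            return True
        except SyntaxError:
            self.lines[lineno - 1] = old
            return False


session = FileSession("bug.py", "def add(a, b):\n    return a - b\n")
ok = session.edit(2, "    return a + b")   # valid fix: accepted
bad = session.edit(2, "    return a +")    # invalid syntax: reverted
```

Guardrails like the reverted edit above are why the interface matters: instead of dumping a whole malformed file, the agent gets immediate, structured feedback it can act on.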
In the SWE-Bench benchmark test, SWE-agent solved 12.29% of the problems presented, nearly matching the 13.86% achieved by Devin, a closed-source $21 million commercial AI programmer developed by Cognition AI. While Devin is currently only available to select developers, the Princeton team has made SWE-agent open-source to gather feedback and encourage collaboration in advancing this technology.
Why does this matter?
The rise of SWE-agent shows AI systems are becoming more sophisticated in assisting human programmers. Over time, they may change the nature of software development roles, requiring developers to focus more on high-level problem-solving and architectural design while delegating routine tasks to AI assistants. This change could make software development faster and more creative, but it might also require significant upskilling within the developer community.
Mobile-first Higgsfield aims to disrupt video marketing with AI
Former Snap AI chief Alex Mashrabov has launched a new startup called Higgsfield AI, which aims to make AI-powered video creation accessible to creators and marketers. The company's first app, Diffuse, allows users to generate original video clips from text descriptions or edit existing videos to insert themselves into the scenes.
Higgsfield is taking on OpenAI's Sora video generator but targeting a broader audience with its mobile-first, user-friendly tools. The startup has raised $8 million in seed funding and plans to further develop its video editing capabilities and AI models. While questions remain around data usage and potential for abuse, Higgsfield believes it can carve out a niche in social media marketing with its realistic, easy-to-use video generation.
Why does this matter?
Higgsfield's mobile-first approach to AI video generation could be a game-changer regarding accessibility and ease of use. The company is positioning itself to capture a significant portion of the creator economy by prioritizing consumer-friendly features and social media integration. As more users embrace these tools, we can expect to see an explosion of AI-generated content across social media platforms, which could have far-reaching implications for content authenticity and user engagement.
Enjoying the daily updates?
Refer your pals to subscribe to our daily newsletter and get exclusive access to 400+ game-changing AI tools.
When you use the referral link above or the “Share” button on any post, you'll get credit for any new subscribers. Just send the link via text or email, or share it on social media with friends.
Knowledge Nugget: Are large language models on the trajectory of word processing or digital advertising?
In a recent newsletter post, Karpf compares the trajectory of LLMs like ChatGPT to two very different technological developments: transformative tools like VisiCalc and Microsoft Word, and persistently overhyped microtargeted digital advertising. Despite the growing usage of LLMs, Karpf is doubtful that the technology will follow the path of word processors, where market incentives led to bugs being steadily eliminated. Instead, he sees parallels to online advertising, where problems with inaccurate targeting never seem to get fixed, even as the industry balloons. Karpf argues the business models around LLMs incentivize hype over quality, meaning the "hallucinations" and errors may not get corrected. He ultimately ties the future of AI to broader questions about the current state of capitalism: will market dynamics inevitably drive AI improvement, or will they lead to lower quality as industries get hollowed out? Karpf suspects the latter is more likely without strong regulatory intervention.
Why does this matter?
If Karpf is correct and the economics of LLMs drive hype over quality, the AI community may face an ongoing struggle against overpromising, underdelivering, and eroding trust. Persistent errors and "hallucinations" could limit the practical utility of LLMs and slow adoption. The industry would also face growing pressure from policymakers and the public to address these issues.
What Else Is Happening❗
👨‍💻 Codiumate offers secure, compliant AI-assisted coding for enterprises
Codium AI, an Israeli startup, has launched Codiumate, a semi-autonomous AI agent, to help enterprise software developers with coding, documentation, and testing. It can help with creating development plans from existing code, writing code, finding duplicate code, and suggesting tests. Codiumate aims to make development faster and more secure, with features like zero data retention and the ability to run on private servers or air-gapped computers. (Link)
🖥️ Opera One browser becomes the first to offer local AI integration
Opera now supports 150 local LLM variants in its Opera One browser, making it the first major browser to offer access to local AI models. This feature lets users process their input locally without sending data to a server. Opera One Developer users can select and download their preferred local LLM, which typically requires 2-10 GB of storage space per variant, instead of using Opera's native browser AI, Aria. (Link)
🧠 AWS expands Amazon Bedrock with Mistral Large model
AWS has included Mistral Large in its Amazon Bedrock managed service for generative AI and app development. Mistral Large is fluent in English, French, Spanish, German, and Italian, and can handle complex multilingual tasks like text understanding, transformation, and code generation. AWS also mentioned that Mistral AI will use its Trainium and Inferentia silicon chips for future models, and that Amazon Bedrock is now available in France. (Link)
🚀 Copilot gets GPT-4 Turbo upgrade and enhanced image generation
Microsoft is providing GPT-4 Turbo access to business subscribers of its AI-powered Copilot assistant, without daily limits on chat sessions. The company is also improving image generation capabilities in Microsoft Designer for Copilot subscribers, increasing the limit to 100 images per day using OpenAI's DALL-E 3 model. These upgrades are part of the $30 per user, per month pricing of Copilot for Microsoft 365. (Link)
🌐 Status invests in Matrix to create a decentralized messaging platform
Status, a mobile Ethereum client, has invested $5 million in New Vector, the company behind the open-source, decentralized communication platform Matrix.org. They plan to create a secure messaging solution for users to control their data and communicate across apps and networks. (Link)
New to the newsletter?
The AI Edge keeps engineering leaders & AI enthusiasts like you on the cutting edge of AI. From ML to ChatGPT to generative AI and LLMs, we break down the latest AI developments and how you can apply them in your work.
Thanks for reading, and see you tomorrow. 😊