NVIDIA Redefines LLM Training with New Synthetic Data Model

Plus: Meta pauses AI model training in EU due to regulatory pushback, Spotify launches ‘Creative Lab’ to test Gen AI voiceover ads.

Jun 17, 2024

Hello Engineering Leaders and AI Enthusiasts!

Welcome to the 298th edition of The AI Edge newsletter. This edition features how NVIDIA redefines LLM training with a new synthetic data model.

And a huge shoutout to our amazing readers. We appreciate you 😊

In today’s edition:

💻 NVIDIA's AI model for synthetic data generation rivals GPT-4
⚠️ Meta pauses AI model training in EU due to regulatory pushback
🎵 Spotify launches 'Creative Lab' to test Gen AI voiceover ads
🧠 Knowledge Nugget: How to use Perplexity in your PM work by Lenny Rachitsky

Let’s go!

NVIDIA’s AI model for synthetic data generation rivals GPT-4

NVIDIA has released Nemotron-4 340B, an open-source pipeline for generating high-quality synthetic data. It includes a base model trained on 9 trillion tokens, an instruct model, and a reward model.

The instruct model can generate diverse synthetic data that mimics real-world data.
The reward model then scores the generated data so that only high-quality responses are kept.
This interaction between the two models produces better training data over time.
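
For readers who want to see the generate-then-filter idea in code, here is a minimal sketch. It is not NVIDIA's implementation: `filter_synthetic_data`, `toy_generate`, `toy_score`, and the `threshold` value are all hypothetical names and numbers chosen for illustration, and the toy functions simply stand in for an instruct model and a reward model.

```python
from typing import Callable, List, Tuple


def filter_synthetic_data(
    prompts: List[str],
    generate: Callable[[str], List[str]],
    score: Callable[[str, str], float],
    threshold: float,
) -> List[Tuple[str, str]]:
    """For each prompt, keep the top-scoring candidate if it clears the threshold."""
    kept: List[Tuple[str, str]] = []
    for prompt in prompts:
        candidates = generate(prompt)  # stand-in for the instruct model
        if not candidates:
            continue
        # Score every candidate once (stand-in for the reward model).
        scored = [(score(prompt, response), response) for response in candidates]
        best_score, best_response = max(scored, key=lambda item: item[0])
        if best_score >= threshold:
            kept.append((prompt, best_response))
    return kept


# Toy stand-ins so the sketch runs without any model. Both are hypothetical.
def toy_generate(prompt: str) -> List[str]:
    return [f"{prompt} -> short", f"{prompt} -> a longer, more detailed answer"]


def toy_score(prompt: str, response: str) -> float:
    # Pretend longer responses are better; a real reward model would judge quality.
    return float(len(response))


pairs = filter_synthetic_data(
    ["Explain overfitting"], toy_generate, toy_score, threshold=30.0
)
print(pairs)
```

In a real setup, the two callables would wrap calls to the instruct and reward models, and the kept pairs would feed the next round of fine-tuning.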

Note: 98% of the training data used to fine-tune the Instruct model is synthetic and was created using NVIDIA’s pipeline.

In benchmarks such as MT-Bench, MMLU, GSM8K, HumanEval, and IFEval, the Instruct model generally performs better than other open-source models such as Llama-3-70B-Instruct, Mixtral-8x22B-Instruct-v0.1, and Qwen-2-72B-Instruct, and in some tests, it even outperforms GPT-4o.

It also performs comparably to or better than OpenAI's GPT-4-1106 in human evaluation for various text tasks, such as summarization and brainstorming. The technical report provides detailed benchmarks.

Why does it matter?

This development allows businesses to create powerful, domain-specific LLMs without the need for extensive, costly real-world datasets. It has significant potential impacts across various industries, such as healthcare (drug discovery, personalized medicine, medical imaging), finance (fraud detection, risk assessment, customer service), manufacturing (predictive maintenance, supply chain optimization), and retail (personalized customer experiences).

Source

Meta pauses AI model training in EU due to regulatory pushback

In response to regulatory pressure from the Irish Data Protection Commission and the UK's Information Commissioner's Office, Meta has paused its plans to train its large language model, Llama, on public content shared by Facebook and Instagram users in the European Union and the UK.

The regulators expressed concerns about Meta's plan to use this user-generated content to train its AI systems without obtaining explicit user consent. Meta relied on a GDPR provision called "legitimate interests" to justify this data usage, but the regulators felt this was insufficient. Meta has decided to delay the launch of its AI chatbot in Europe until it can address the regulators' concerns and establish a more transparent user consent process.

Why does it matter?

Meta's inability to use EU user data for AI training is a setback for its regional AI ambitions. It could disadvantage Meta against competitors who can leverage such data. This situation highlights the ongoing tensions between tech companies' desire to utilize consumer data for AI development and regulators' efforts to protect user privacy. Striking the right balance between innovation and privacy will be a major challenge as the AI race intensifies.

Source

Spotify launches ‘Creative Lab’ to test Gen AI voiceover ads

Spotify has launched a new in-house creative agency called “Creative Lab.” This agency will help brands and advertisers create custom campaigns for Spotify's platform. Creative Lab teams in different markets will provide local insights and collaborate with brands to develop campaigns through workshops, inspiration sessions, and collaborative ideation.

Spotify is also testing a new AI tool called "Quick Audio" that will allow brands to create scripts and voiceovers using generative AI. This new capability will be integrated into Spotify's ad manager platform, giving advertisers more options to produce audio ads for Spotify's audience of over 615 million listeners.

Why does it matter?

This move emphasizes Spotify's ambition to become a full-service advertising platform. Marketers and advertisers will have new creative and production capabilities available through Spotify to better reach the platform's large and engaged user base in unique ways, including potentially using AI-generated audio ads. This could disrupt traditional advertising models and open new possibilities for how brands connect with consumers on audio platforms.

Source

Enjoying the daily updates?

Refer your pals to subscribe to our daily newsletter and get exclusive access to 400+ game-changing AI tools.

Refer a friend

When you use the referral link above or the “Share” button on any post, you'll get the credit for any new subscribers. All you need to do is send the link via text or email or share it on social media with friends.

Knowledge Nugget: How to use Perplexity in your PM work

Chatbots like ChatGPT and Perplexity are becoming increasingly popular among product managers, with over 50% of PMs using a chatbot daily and 85% using one weekly. The author, Lenny Rachitsky, recently did a deep dive into how Perplexity builds products and spoke with their team about how PMs use Perplexity in their work.

Based on a survey of over 300 product managers and follow-up calls, Rachitsky has compiled a comprehensive collection of 27 ways PMs are using Perplexity, including understanding and crafting growth strategy, finding benchmarks, doing market research, learning best practices, evaluating popular tools, and understanding technical jargon.

Here are some of the prompts that PMs can use:

1. Explain growth accounting to a product manager

2. What is the average open rate for push notifications on Android and iOS?

3. Notion’s AI go-to-market strategy

4. List of brainstorming techniques for product managers

Why does it matter?

The widespread adoption of these AI-powered chatbots is a significant shift in how PMs work. It allows them to quickly get information, insights, and ideas to be more productive and effective in their roles. PMs should start experimenting with these tools, as they will likely become integral to the PM workflow soon.

Source

What Else Is Happening❗

🍎 Apple enters the AI icon race to find a logo that makes sense

Apple has joined other tech giants like Google, OpenAI, Anthropic, and Meta in the race to find an iconic visual representation for AI. No company has yet created an unambiguous "AI logo" that conveys the concept to users. AI's lack of a clear visual identity reflects the difficulty of representing such a broad and evolving technology in a simple icon. (Link)

📝 Niloom.AI launches gen AI content creation platform for spatial computing

The platform allows users without extensive technical expertise to create, prototype, edit, and instantly publish sophisticated AR/VR content using text or speech prompts. It consolidates the entire creative process, from ideation to publishing, and integrates with various third-party tools to provide a one-stop solution for spatial computing content creation. (Link)

🏟️ AI to delete abusive posts against athletes during the 2024 Paris Olympics

The International Olympic Committee (IOC) will deploy AI at the 2024 Paris Olympics to automatically detect and erase abusive social media posts directed at athletes and officials. The AI tool will monitor posts about 15,000 athletes and officials and immediately remove any content involving hate speech, bullying, or political attacks. (Link)

🖼️ Picsart and Getty team up to counter Adobe’s “commercially-safe” AI

Picsart has partnered with Getty Images to develop a "responsible, commercially-safe" AI image generator tool. The AI model will be trained exclusively on Getty's licensed stock content to address concerns about AI-generated content violating copyright laws. Picsart hopes to provide a viable alternative to Adobe's Firefly by leveraging Getty's library of licensed images. (Link)

📰 Yahoo News gets an AI-powered revamp with Artifact integration

Yahoo has acquired the technology behind the Artifact news aggregation app and is launching a new AI-powered Yahoo News app. The app will feature a personalized news feed based on user interests and a "Key Takeaways" feature that provides bullet-point summaries of articles. Users can also flag problematic content, which the AI will then try to rewrite. (Link)

New to the newsletter?

The AI Edge keeps engineering leaders & AI enthusiasts like you on the cutting edge of AI. From machine learning to ChatGPT to generative AI and large language models, we break down the latest AI developments and how you can apply them in your work.

Thanks for reading, and see you tomorrow. 😊

The AI Edge
