Meta AI's New Dataset Understands 122 Languages
Plus: Stability AI’s 1st Japanese Vision-Language Model. Transformers as Support Vector Machines.
Hello, Engineering Leaders and AI Enthusiasts!
Welcome to the 98th edition of The AI Edge newsletter. This edition brings you Meta AI's new dataset, ‘Belebele,’ which understands 122 languages.
And a huge shoutout to our incredible readers. You all rock! 😊
In today’s edition:
🗺️ Meta AI's New Dataset Understands 122 Languages
👏 Stability AI’s 1st Japanese Vision-Language Model
🤖 Transformers as Support Vector Machines
🧠 Knowledge Nugget: The market for AI companies
Let’s go!
Meta AI's New Dataset Understands 122 Languages
Meta AI announced Belebele, a multilingual reading comprehension dataset with 122 language variants. It allows for evaluating text models in high, medium, and low-resource languages, expanding the language coverage of natural language understanding benchmarks.
The Belebele dataset consists of questions based on short passages from the Flores-200 dataset, each with four multiple-choice answers. The questions were designed to probe different levels of general language comprehension. Because every passage and question appears in all 122 language variants, the dataset enables direct comparison of model performance across languages; it was used to evaluate both multilingual masked language models and large language models. The results suggest that smaller models trained on balanced multilingual data understand more languages than larger, English-centric models.
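Evaluating on a Belebele-style item amounts to scoring each of the four candidate answers against the passage and question and picking the highest. Here is a minimal sketch of that loop; the word-overlap scorer is a toy stand-in for a real model's answer likelihood, and the sample item is invented for illustration, not drawn from the dataset:

```python
import re

def score(passage: str, question: str, candidate: str) -> float:
    """Toy stand-in scorer: fraction of candidate words that appear in the passage.
    A real evaluation would use a model's log-likelihood of the candidate instead."""
    passage_words = set(re.findall(r"\w+", passage.lower()))
    cand_words = re.findall(r"\w+", candidate.lower())
    if not cand_words:
        return 0.0
    return sum(w in passage_words for w in cand_words) / len(cand_words)

def predict(item: dict) -> int:
    """Return the 1-based index of the highest-scoring candidate answer."""
    scores = [score(item["passage"], item["question"], c) for c in item["candidates"]]
    return scores.index(max(scores)) + 1

# Invented sample item in the Belebele format (passage, question, 4 candidates).
sample = {
    "passage": "The river floods every spring, so the village built a levee.",
    "question": "Why did the village build a levee?",
    "candidates": [
        "Because the river floods every spring",
        "To attract tourists",
        "Because of a drought",
        "To honor a festival",
    ],
    "answer": 1,
}

print(predict(sample))  # 1 — the overlap heuristic picks the first candidate here
```

Swapping the heuristic for per-candidate model scores turns this into the standard multiple-choice protocol, which is what makes cross-language comparison on a parallel dataset straightforward.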
Why does this matter?
The Belebele dataset expands language coverage, benefiting end users with better AI understanding in various languages. It sets a benchmark for AI models, potentially reshaping competition as smaller models outperform larger ones. It provides new opportunities for evaluating and analyzing the multilingual capabilities of NLP systems.
Stability AI’s 1st Japanese Vision-Language Model
Stability AI has released Japanese InstructBLIP Alpha, a vision-language model that generates textual descriptions for input images and answers questions about them. It is built upon the Japanese StableLM Instruct Alpha 7B and leverages the InstructBLIP architecture.
(Figure: model output — “Two persons sitting on a bench looking at Mt. Fuji”)
The model can accurately recognize Japan-specific objects and process text input, such as questions. It is available on Hugging Face Hub for inference and additional training, exclusively for research. This model has various applications, including search engine functionality, scene description, and providing textual descriptions for blind individuals.
Why does this matter?
This breakthrough brings improved image understanding and greater accessibility for the visually impaired to the Japanese-speaking community. It also serves as a pioneering model that may pave the way for similar innovations in other languages and expand the reach of image-to-text AI models globally. This not only benefits end users but also sets a new benchmark for AI model performance and availability, potentially affecting the competitive landscape across different language markets.
Transformers as Support Vector Machines
This paper establishes a formal equivalence between the optimization geometry of self-attention in transformers and a hard-margin Support Vector Machine (SVM) problem. It shows that optimizing the attention layer of a transformer converges toward an SVM solution that minimizes the nuclear norm of the combined key-query weight matrix.
The study also proves the convergence of gradient descent under suitable conditions and introduces a more general SVM equivalence for nonlinear prediction heads. These findings suggest that transformers can be interpreted as a hierarchy of SVMs that separate and select optimal tokens.
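The correspondence can be sketched schematically as follows; the notation here is illustrative rather than the paper's exact formulation. For input sequences with tokens $x_{i,t}$, a query token $z_i$, and an "optimal" token index $\mathrm{opt}_i$, the attention weights $W = KQ^{\top}$ that gradient descent converges toward (in direction) solve a hard-margin SVM that separates the optimal token from the rest:

```latex
\begin{aligned}
\min_{W}\ & \|W\|_{\star}
  && \text{(nuclear norm, arising from the $(K,Q)$ parameterization)} \\
\text{s.t.}\ & (x_{i,\mathrm{opt}_i} - x_{i,t})^{\top} W\, z_i \;\ge\; 1
  && \text{for all sequences } i \text{ and tokens } t \ne \mathrm{opt}_i .
\end{aligned}
```

The margin constraints are what give the "token selection" reading: softmax attention asymptotically puts its mass on the token that the SVM separates from the others.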
Why does this matter?
This uncovers a deep connection between transformers and Support Vector Machines, shedding light on how transformers optimize attention layers. It can lead to improved AI models that better understand and select tokens, potentially benefiting end users with more accurate and efficient language processing.
📢 Invite friends and get rewards 🤑🎁
Enjoying the daily AI updates? Refer friends and get perks and special access to The AI Edge.
Get 400+ AI tools and 500+ prompts for 1 referral.
Get a free shoutout for 3 referrals!
Get The Ultimate Gen AI Handbook for 5 referrals.
When you use the referral link above or the “Share” button on any post, you'll get the credit for any new subscribers. All you need to do is send the link via text, email, or share it on social media with friends.
Knowledge Nugget: The market for AI companies
The author explains that the market for AI companies is highly competitive, with a few winners dominating their respective markets. Most AI applications aim to improve existing companies and products rather than create new markets. The scarcity of GPUs and the accumulation of data and research give established companies an advantage. The current AI funding scene is driven by FOMO and logo hunting, with valuations that may not be justified. AI companies fall into two categories: those that require massive capital and those that can be self-funded. Seed-stage investments in AI companies can be risky, as many may never find product-market fit.
Why does this matter?
This article provides valuable insights into the competitive landscape of the AI market and the dynamics that shape it. Understanding these factors is crucial for investors, entrepreneurs, and anyone interested in the AI industry who wants to make informed decisions, mitigate risks, and seize opportunities.
What Else Is Happening❗
🌐 Anguilla is generating tens of millions of dollars by leasing out domain names with the ".ai" extension. (Link)
🐦 X (previously Twitter) has confirmed in its revised policy that it will use public data to train AI models. (Link)
📸 Pika Labs has introduced a new -fps N parameter for controlling the frame rate of generated videos. (Link)
🧠 Google DeepMind's founder sees great potential for AI in mental health. (Link)
🎒 Microsoft has filed a patent for AI-assisted wearables, including a backpack. (Link)
🛠️ Trending Tools
UTMStack: Launch your own 24x7 Security Operations Center with this Threat Management, SIEM, and Compliance Solution.
Income Statement Generator: Generate an income statement for your company that you can print, export to a spreadsheet, and analyze with AI.
VideoBox: All-in-one Video Creation Hub with free resources, stunning effects, and AI tools.
Talklab: AI-driven insights from customer chats for detailed, actionable reports.
Artvisio AI: Create legible call-to-action text in seconds for any purpose with a simple prompt.
Audiosonic: An AI voice generator that converts text to lifelike speech in seconds.
CandyIcons: AI design tool to quickly generate high-quality app icons in 3 easy steps. Extensive collection of pre-made icons available.
Bot Butcher: Tired of contact form spam? Our easy-to-use API classifies spam using AI. Stop spam bot messages before they reach your inbox.
🧐 Monday Musings: How Coders Can Survive and Thrive in a ChatGPT World
Generative AI powered by large language models could affect coders' work. However, experts believe that AI won't replace human programmers anytime soon, and coders can take certain steps to stay ahead in a world shaped by generative AI.
These include specializing in niche areas, focusing on high-level problem-solving skills, collaborating with AI systems, and continuously learning and adapting to new technologies. By implementing these strategies, programmers can survive and thrive in an AI-driven coding landscape.
Here are 4 tips for programmers to stay ahead of generative AI:
Stick to Basics and Best Practices
Find the Tool That Fits Your Needs
Clear and Precise Conversations Are Crucial
Be Critical and Understand the Risks
That's all for now!
By subscribing to The AI Edge, you join the company of readers from Moody’s, Vonage, Voya, WEHI, Cox, INSEAD, and other reputable organizations.
Thanks for reading, and see you tomorrow. 😊