Mind2Web: AI Automates Web Tasks
Plus: Salesforce launches AI cloud, Hugging Face Transformers latest features
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 40th edition of The AI Edge newsletter. This edition brings you Mind2Web, the first dataset and model for building generalist web agents.
And a special thank you to our amazing readers. Your ongoing support fuels our passion for delivering quality content. 😊
In today’s edition:
🤖 Mind2Web: AI automates your tedious web tasks
☁️Salesforce brings genAI to enterprises with AI Cloud
🤗Transformers v4.30 gets impressive new features
📚 Knowledge Nugget: Why are Large Language Models general learners?
Let’s go!
Mind2Web: AI automates your tedious web tasks
Mind2Web is a newly introduced dataset aimed at developing and evaluating generalist agents for the web. It provides a diverse range of over 2,000 open-ended tasks collected from 137 real-world websites across 31 domains. The dataset includes crowdsourced action sequences for these tasks, making it suitable for training agents that can follow language instructions to complete complex tasks on any website.
Unlike existing datasets, Mind2Web focuses on real-world websites rather than simulated ones, offering a broad spectrum of user interaction patterns. Using Mind2Web, researchers have explored building generalist web agents with LLMs, which show decent performance even on unseen websites.
Why does this matter?
Mind2Web tackles the limitations of existing datasets, enabling the development of versatile web agents. Unlike previous datasets that relied on simulated or simplified websites, Mind2Web incorporates the complexity and diversity of actual websites, making it more applicable to real-world scenarios.
Salesforce brings genAI to enterprises with AI Cloud
Salesforce has launched AI Cloud, a suite of capabilities to bring trusted generative AI to the enterprise. The new Einstein GPT Trust Layer sets an industry standard for secure generative AI, addressing privacy, data security, and compliance concerns. The AI Cloud's features powered by Einstein enable various departments to automate tasks and personalize interactions. By filling the trust gap associated with generative AI, Salesforce aims to provide customers with innovative, efficient, and secure AI-powered solutions.
Salesforce has also committed to investing $500 million in startups focused on generative AI. This expansion would enable it to work with even more entrepreneurs and accelerate the development of transformative AI solutions for the enterprise.
Why does this matter?
Salesforce's AI Cloud brings trusted GenAI to the enterprise with the Einstein GPT Trust Layer. It promises to improve productivity and customer experiences while addressing privacy and data security concerns.
Additionally, Salesforce's $500 million investment will have a significant impact on businesses, driving AI adoption and fueling growth in the industry.
Hugging Face Transformers v4.30 gets impressive new features
The latest version of Hugging Face Transformers (v4.30) includes some impressive new features:
4-bit quantization, which allows you to run LLMs on much smaller devices. You can now run a 30B model on an off-the-shelf 24GB GPU.
Support for conditional image-to-text generation with the pipeline. You can use it to steer the generation in a certain direction or for visual question answering (VQA).
Agents that run locally. The control is yours, with no dependency on external APIs.
Safetensors as a core dependency. Safetensors' security has been audited by an external company and will become the default serialization solution, regardless of the ML framework.
Speech recognition for 1,000+ languages. Meta's MMS has been incorporated into 🤗 Transformers, allowing you to handle language diversity.
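The 24GB claim in the 4-bit item above can be checked with some quick arithmetic. The Python sketch below estimates the memory needed for the weights alone at different precisions. It is a rough back-of-the-envelope calculation under stated assumptions: it ignores activations, the KV cache, and quantization overhead, and the function name and the 1 GB = 10^9 bytes convention are choices made for this example.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


PARAMS_30B = 30e9    # a 30B-parameter model, as in the item above
GPU_MEMORY_GB = 24   # the off-the-shelf GPU mentioned above

for bits in (16, 8, 4):
    needed = weight_memory_gb(PARAMS_30B, bits)
    fits = needed <= GPU_MEMORY_GB
    print(f"{bits}-bit weights: {needed:.0f} GB, fits in {GPU_MEMORY_GB} GB: {fits}")
```

Under these assumptions the weights take about 60 GB at 16-bit, 30 GB at 8-bit, and 15 GB at 4-bit, so only the 4-bit version leaves headroom on a 24GB card.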
Why does this matter?
This reinforces Hugging Face Transformers' position as a comprehensive and cutting-edge library for working with transformer models. For instance, the 4-bit format will make models more resource-efficient at inference and training time. Overall, these upgrades expand what the library can do, strengthen its security, and support advanced AI applications across various domains.
Knowledge Nugget: Why are Large Language Models general learners?
This interesting article explores why large language models (LLMs) are considered general learners rather than mere next-token predictors. Predicting the next token accurately requires an understanding of the underlying reality, which enables LLMs to expand their knowledge into various domains. Using the examples of solving addition problems and predicting words in haiku poems, the article illustrates how a deeper understanding of reality simplifies next-token prediction tasks. This is how LLMs expand their knowledge into complex fields like medicine or law and even pass exams like the MCAT or LSAT.
Why does this matter?
The article provides intuition regarding the general learning capabilities of LLMs and highlights the role of understanding underlying phenomena in improving prediction tasks. It can guide AI engineers and researchers in developing versatile models, expanding task capabilities, improving transfer learning, enhancing problem-solving abilities, and facilitating domain adaptation.
What Else Is Happening
🔄Replace anything you want with Segment Anything + ControlNet (Link)
💥ChatGPT added 112 new plugins in a single day, total count 390 (Link)
📝TikTok launches free AI tools that can create ad scripts in seconds (Link)
🤖Amazon is using AI to identify fake reviews and comments (Link)
🚀Oracle is developing a new cloud service with AI startup, Cohere (Link)
🛡️OpenAI, DeepMind, and Anthropic to give UK early access to their models for AI safety research (Link)
Trending Tools
Bito AI: It understands your codebase and uses GPT-4 to help devs write code, tests, comments, and more.
Taiga: An AI coding mentor that integrates into Slack to provide instant feedback and guidance.
AI-Powered LinkedIn Carousel Creator: Create educational carousel posts for LinkedIn with our AI-Assistant in minutes.
Octopulse AI: Octopulse maximizes user-facing notifications and emails for growth teams by getting five Rights right.
Arvin 2.0: A freemium Chrome extension that provides automatic summaries and generates social media posts.
Geometrik: An AI-powered app that helps restaurants boost revenue, cut costs, and enhance customer experience.
Polyglot: One-step localization for your mobile app. Drop the SDK into your project and run the build.
Mems AI Photo Enhancer: A free one-click photo editing app that uses machine learning to retouch photos flawlessly.
That's all for now!
Subscribe to The AI Edge and join the impressive list of readers that includes professionals from Moody’s, Vonage, Voya, WEHI, Cox, INSEAD, and other reputable organizations.
Thanks for reading, and see you tomorrow.