Google's AI Gives Visual Captions to Meetings
Plus: GGML for AI at the edge. Tafi's text-to-3D character AI engine.
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 36th edition of The AI Edge newsletter. This edition brings you Google’s on-the-fly Visual Captions for meetings and more.
And a special thank you goes out to our amazing readers. Your ongoing support fuels our passion for delivering quality content. 😊
In today’s edition:
👀Google’s AI gives visual captions to meetings
🔥GGML for AI at the edge
🎨Tafi announced a text-to-3D character AI engine
💡MeZO redefines LM training: less memory, comparable results
Let’s go!
Google’s AI gives Visual Captions to meetings
What if AI could show context-relevant visuals in online meetings? Google Research introduced Visual Captions, a system for real-time visual augmentation of verbal communication. It uses verbal cues to augment synchronous video communication with interactive visuals on the fly.
Researchers fine-tuned an LLM on a dataset curated for this purpose so that it proactively suggests relevant visuals in open-vocabulary conversations. The system is also robust to the typical errors that appear in real-time speech-to-text transcription. Visual Captions is open-sourced as part of the ARChat project.
Why does this matter?
This shows that language models can be fine-tuned for specialized tasks, such as predicting which visuals fit a spoken conversation. By sharing the training dataset with the wider community, Google also encourages further research and innovation in this space.
GGML for AI at the edge
GGML is a C library that provides a set of machine learning (ML) tools and defines low-level ML primitives, such as tensor types. It also introduces a binary format for distributing large language models (LLMs), which makes them easy to share and run on a wide range of hardware.
GGML relies on a technique called "quantization," which stores model weights at lower precision (for example, 4-bit integers instead of 16-bit floats) so that large language models can run on consumer-grade hardware. It powers fast, local inference of Meta’s LLaMA (through llama.cpp) and of OpenAI's Whisper (through whisper.cpp).
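To make quantization concrete, here is a minimal Python sketch of block-wise 4-bit quantization. It is loosely modeled on the idea behind GGML's 4-bit formats (weights split into small blocks, each stored as low-bit integers plus one shared scale), but the block size of 32, the rounding scheme, and the function names `quantize` and `dequantize` are illustrative assumptions, not GGML's actual code or file layout.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # illustrative assumption: each block of 32 weights shares one scale


def quantize_block(block: List[float]) -> Tuple[float, List[int]]:
    """Symmetric 4-bit quantization of one block -> (scale, codes in [-8, 7])."""
    amax = max(abs(w) for w in block)
    # Map the largest magnitude in the block onto 7; guard all-zero blocks.
    scale = amax / 7.0 if amax > 0 else 1.0
    # Round to the nearest integer and clamp to the signed 4-bit range.
    codes = [max(-8, min(7, round(w / scale))) for w in block]
    return scale, codes


def quantize(weights: List[float]) -> List[Tuple[float, List[int]]]:
    """Split a flat weight list into blocks and quantize each one independently."""
    return [quantize_block(weights[i:i + BLOCK_SIZE])
            for i in range(0, len(weights), BLOCK_SIZE)]


def dequantize(blocks: List[Tuple[float, List[int]]]) -> List[float]:
    """Reconstruct approximate float weights: w ~= scale * code."""
    out: List[float] = []
    for scale, codes in blocks:
        out.extend(scale * q for q in codes)
    return out


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.9, -0.07]
    blocks = quantize(weights)
    print(blocks)
    print(dequantize(blocks))
```

Assuming each block's scale is stored as a 16-bit float, a 32-weight block takes 32 × 4 bits + 2 bytes = 18 bytes instead of 64 bytes at 16-bit precision, roughly a 3.5x reduction. The price is a rounding error of at most half a scale step per weight.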
Why does this matter?
LLMs are typically resource-intensive and require high-performance hardware to run efficiently. By using quantization, GGML aims to democratize access to LLMs, putting them within reach of users who lack powerful hardware or cloud resources.
Tafi launches a text-to-3D character AI engine
Tafi announced its latest development: a groundbreaking text-to-3D character engine that will transform how artists and developers create high-quality 3D characters. By converting text input into 3D characters, it makes bringing ideas to life easier and faster than ever before.
It will help:
Create top-notch 3D characters quickly and effortlessly using plain text.
Generate billions of unique variations of 3D characters.
Export characters directly into Blender, Unreal, or Unity.
In an upcoming update, Tafi plans to extend compatibility with popular game engines and 3D software applications, including support for NVIDIA's Omniverse.
Why does this matter?
Tafi’s text-to-3D engine simplifies the most challenging parts of 3D creation, making it accessible to creators of all skill levels and letting them bring their ideas to life easily.
Plus, broader compatibility with popular game engines and 3D software applications, such as NVIDIA's Omniverse, should make it easier to move characters between tools and collaborate across disciplines, pushing the boundaries of 3D character design.
MeZO redefines LM training: less memory, comparable results
MeZO, a memory-efficient zeroth-order optimizer, adapts the classical zeroth-order SGD method to operate in place. Instead of backpropagating, it estimates the gradient from two forward passes with randomly perturbed weights, so it can fine-tune language models with the same memory footprint as inference.
With a single 80GB A100 GPU, MeZO can fine-tune a 30-billion-parameter OPT model.
Achieves performance comparable to fine-tuning with backpropagation across multiple tasks, with up to 12x memory reduction.
Can effectively optimize non-differentiable objectives (e.g., maximizing accuracy or F1).
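The in-place idea can be sketched in a few lines. The toy Python example below assumes a model's parameters are a plain list of floats and uses the standard `random` module; `mezo_step` is a hypothetical helper, not the authors' code. Each step estimates the gradient from two loss evaluations at theta + eps*z and theta - eps*z (a two-sided SPSA-style estimate). Instead of storing the random direction z, it regenerates z from a seed whenever it is needed, so no extra per-parameter buffers are allocated.

```python
import random
from typing import Callable, List


def mezo_step(params: List[float],
              loss_fn: Callable[[List[float]], float],
              lr: float, eps: float, seed: int) -> float:
    """One in-place zeroth-order SGD step in the spirit of MeZO (toy sketch).

    The random direction z is never stored: it is regenerated from `seed`
    every time it is needed, so no per-parameter buffers are allocated.
    Returns the projected gradient estimate (a scalar).
    """

    def perturb(scale: float) -> None:
        # A fresh generator with the same seed yields the same z on each call.
        rng = random.Random(seed)
        for i in range(len(params)):
            params[i] += scale * rng.gauss(0.0, 1.0)

    perturb(+eps)                     # params = theta + eps * z
    loss_plus = loss_fn(params)
    perturb(-2.0 * eps)               # params = theta - eps * z
    loss_minus = loss_fn(params)
    perturb(+eps)                     # params = theta (up to float rounding)

    # Two-sided finite-difference estimate of the directional derivative along z.
    projected_grad = (loss_plus - loss_minus) / (2.0 * eps)

    # SGD update theta -= lr * projected_grad * z, regenerating z from the seed.
    rng = random.Random(seed)
    for i in range(len(params)):
        params[i] -= lr * projected_grad * rng.gauss(0.0, 1.0)
    return projected_grad


if __name__ == "__main__":
    # Toy objective: minimize sum((x - 3)^2) starting from all zeros.
    theta = [0.0, 0.0]
    loss = lambda p: sum((x - 3.0) ** 2 for x in p)
    for step in range(2000):
        mezo_step(theta, loss, lr=0.01, eps=1e-3, seed=step)
    print(theta, loss(theta))
```

Because `loss_fn` is only evaluated and never differentiated, it could just as well be a non-differentiable metric such as negative accuracy. For scale, assuming 16-bit weights, a 30-billion-parameter model needs about 30 × 10^9 × 2 bytes = 60 GB for its weights alone, which fits on one 80GB A100 only when training adds little beyond inference memory.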
Why does this matter?
MeZO addresses the memory limitations of backpropagation when training large language models. Because it needs only inference-level memory, it offers a practical way to fine-tune much larger LMs on existing hardware, letting researchers push the boundaries of model size across a range of language tasks.
What Else Is Happening
🤖 A robot self-learns in 1 hour! (after struggling like a roach)
🤗 HuggingChat, ChatGPT’s 100% open-source alternative, adds a web search feature
✏️ Google Chat now has Smart Compose to help autocomplete your sentences
🚀 GitLab to add AI-powered “ModelOps” to its DevSecOps platform
💬 Instagram might be working on an AI chatbot
🏢 Introducing Glean Chat, an enterprise-grade generative AI chat assistant
🔐 LlamaIndex adds private data to large language models
🧠 Edtech giant Byju’s launches transformer models in AI push
Trending Tools
ClientZen: Automated tagging for real-time reporting on customer issues. Works with Intercom, Zendesk, and 80+ sources.
MeBoom: AI avatar generator. Create avatar portraits in multiple styles and customize them within seconds.
Pitch Avatar: AI-powered platform for slides. Generate scripts, voice-overs, and analytics, and make an avatar present for you.
Storly.ai: Generates meaningful writing prompts to help you write stories and about your experiences.
AI Questions Generator: Generate multiple-choice, true-or-false, fill-in-the-blank, and open-ended questions from any text.
Kai by Gleap: AI bot that automates customer support using website context, FAQs, and help articles.
Campaign Assistant by HubSpot: AI-powered tool that generates landing-page, email, and ad copy for free.
Puppies AI: Generate cute puppy images quickly with AI. API is available for developers.
That's all for now!
If you are new to ‘The AI Edge’ newsletter, subscribe to receive the ‘Ultimate AI tools and ChatGPT Prompt guide’, specifically designed for Engineering Leaders and AI enthusiasts.
Thanks for reading, and see you tomorrow.