MotionGPT: A versatile text-to-motion AI
Plus: DragDiffusion's interactive image editing. Google Cloud’s pgvector support powers AI applications.
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 50th edition of The AI Edge newsletter. This edition brings you “MotionGPT: A versatile text-to-motion AI.”
And a big thanks to all our incredible readers!😊
In today’s edition:
🚀 MotionGPT: A versatile text-to-motion AI
🎨 DragDiffusion: Giving Diffusion models interactive point-based image editing
⚡️ Google Cloud’s new pgvector support powers AI applications
📊 The Verge polled 2k people about using AI
Let’s go!
MotionGPT: A versatile text-to-motion AI
New research proposes MotionGPT, a unified, versatile, and user-friendly motion-language model for multiple motion-relevant tasks. It is a generative pre-trained model that treats human motion as a foreign language and brings natural language models into motion-relevant generation.
It is driven by the insight that fusing motion and language data into a single vocabulary makes the relationship between motion and language more apparent. This enhances the performance of motion-related tasks, even with larger-scale data and models.
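To make the "single vocabulary" idea concrete, here is a minimal sketch in Python. It is not MotionGPT's actual tokenizer: the tiny word list, the 2-D motion features, the codebook, and the nearest-entry lookup are all assumptions chosen for illustration. It only shows how motion snippets could become tokens that share one id space with words.

```python
import math

# Hypothetical text vocabulary, chosen only for illustration.
TEXT_VOCAB = ["<pad>", "a", "person", "walks", "forward", "jumps"]

# Toy motion codebook (an assumption): each entry stands for a short motion
# snippet, described here by a 2-D feature. A real system would learn these.
MOTION_CODEBOOK = [
    (0.0, 0.0),   # standing still
    (1.0, 0.0),   # step forward
    (0.0, 1.0),   # move up
    (0.0, -1.0),  # move down
]

# One shared vocabulary: text tokens first, then motion tokens.
VOCAB = TEXT_VOCAB + [f"<motion_{i}>" for i in range(len(MOTION_CODEBOOK))]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def quantize_motion(frames):
    """Map each motion feature to the id of its nearest codebook entry."""
    ids = []
    for frame in frames:
        nearest = min(
            range(len(MOTION_CODEBOOK)),
            key=lambda i: math.dist(frame, MOTION_CODEBOOK[i]),
        )
        ids.append(TOKEN_TO_ID[f"<motion_{nearest}>"])
    return ids


def encode_text(sentence):
    """Look up each lowercase word in the shared vocabulary."""
    return [TOKEN_TO_ID[word] for word in sentence.lower().split()]


# A caption and a motion clip end up in the same token sequence.
text_ids = encode_text("a person jumps")
motion_ids = quantize_motion([(0.1, 0.9), (0.05, -1.2), (0.0, 0.1)])
sequence = text_ids + motion_ids
print([VOCAB[i] for i in sequence])
```

Because both kinds of tokens live in one id space, a single language model can in principle read or write either, which is what lets one model handle text-to-motion and motion-to-text through prompts.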
The research also proposes a general motion benchmark for multi-task evaluation. On it, MotionGPT achieves competitive performance across diverse tasks, including text-to-motion, motion-to-text, motion prediction, and motion in-between, with code and data made available.
Why does this matter?
Building a unified model for language and other multimodal data, such as motion, remains challenging and underexplored. This pre-trained model, capable of supporting numerous motion-relevant tasks through prompts, is a step forward and should benefit diverse fields like gaming, robotics, virtual assistants, and human behavior analysis.
DragDiffusion: Giving Diffusion models interactive point-based image editing
Recently, DragGAN enabled interactive point-based image editing, i.e., “drag” editing. Although it achieves impressive results with pixel-level precision, its applicability is limited by the inherent capacity of the pre-trained generative adversarial network (GAN) models it is built on.
To remedy this, DragDiffusion extends such an editing framework to diffusion models. Plus, it achieves precise spatial control by optimizing the diffusion latent.
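As a rough intuition for "optimizing the latent," here is a toy sketch in Python. It is not DragDiffusion's implementation. The 1-D latent, the squared-error loss, the step sizes, and the way the handle point advances are all assumptions made for illustration; the real method works on a diffusion model's latent and features. The sketch only shows gradient descent nudging a latent so that content at a handle point appears closer and closer to a target point.

```python
# Toy latent (an assumption): a 1-D row of positions, each with a 2-D feature.
latent = [[0.0, 0.0] for _ in range(6)]
latent[1] = [1.0, 0.5]  # distinctive content sitting at the handle point

handle, target = 1, 4   # hypothetical user-chosen handle and target points
LEARNING_RATE = 0.25
STEPS_PER_MOVE = 20


def drag_step(latent, handle, target):
    """Advance the handle one position toward the target by optimizing the latent."""
    direction = 1 if target > handle else -1
    nxt = handle + direction
    reference = list(latent[handle])  # treated as a constant (no gradient)
    for _ in range(STEPS_PER_MOVE):
        # Loss = ||latent[nxt] - reference||^2, so the gradient is 2 * difference.
        grad = [2.0 * (a - b) for a, b in zip(latent[nxt], reference)]
        latent[nxt] = [a - LEARNING_RATE * g for a, g in zip(latent[nxt], grad)]
    # Simplification: assume the content now sits at the next position.
    return nxt


while handle != target:
    handle = drag_step(latent, handle, target)

print(handle, [round(v, 3) for v in latent[target]])
```

Note that this toy copies the handle's feature along the path rather than truly moving it, and it skips the tracking step a real editor would need to locate the handle after each update.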
Why does this matter?
Since most previous diffusion-based image editing methods mainly rely on controlling text embeddings, they can only achieve high-level semantic editing instead of precise pixel-level spatial control. DragDiffusion remedies that, and it enables leveraging large-scale pre-trained diffusion models, thus broadening the applicability of “drag” editing in real-world scenarios.
Google Cloud’s new pgvector support powers AI-enabled applications
Google Cloud has announced support for storing and querying vectors in Cloud SQL for PostgreSQL and AlloyDB for PostgreSQL. It allows users to store and index vector embeddings generated by LLMs using the pgvector PostgreSQL extension. It enables efficient searching for similar items using exact and approximate nearest-neighbor search algorithms.
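The difference between exact and approximate nearest-neighbor search can be sketched in plain Python. This is not pgvector's code or API. The item names, the 3-D vectors, and the hand-picked centroids are assumptions for illustration, and the approximate version is a simplified inverted-file scheme that scans only the closest group of vectors.

```python
import math

# Hypothetical 3-D embeddings; real LLM embeddings have far more dimensions.
ITEMS = {
    "red shoes": (0.9, 0.1, 0.0),
    "crimson sneakers": (0.8, 0.2, 0.1),
    "blue jacket": (0.1, 0.9, 0.2),
    "navy coat": (0.2, 0.8, 0.3),
    "garden hose": (0.0, 0.1, 0.9),
}


def exact_search(query, k):
    """Exact k-NN: compare the query with every stored vector."""
    ranked = sorted(ITEMS, key=lambda name: math.dist(query, ITEMS[name]))
    return ranked[:k]


# Approximate search: group vectors under the nearest of a few centroids
# ahead of time, then scan only the closest group(s) at query time.
CENTROIDS = [(0.85, 0.15, 0.05), (0.15, 0.85, 0.25), (0.0, 0.1, 0.9)]


def nearest_centroids(vector, n):
    """Return the indices of the n centroids closest to the vector."""
    order = sorted(
        range(len(CENTROIDS)), key=lambda i: math.dist(vector, CENTROIDS[i])
    )
    return order[:n]


BUCKETS = {i: [] for i in range(len(CENTROIDS))}
for name, vec in ITEMS.items():
    BUCKETS[nearest_centroids(vec, 1)[0]].append(name)


def approximate_search(query, k, probes=1):
    """Rank only the items stored in the `probes` closest buckets."""
    candidates = [n for c in nearest_centroids(query, probes) for n in BUCKETS[c]]
    candidates.sort(key=lambda name: math.dist(query, ITEMS[name]))
    return candidates[:k]


query = (0.85, 0.1, 0.05)
print(exact_search(query, 2))
print(approximate_search(query, 2))
```

The trade-off is speed against recall: the approximate search looks at fewer vectors, so it can miss neighbors stored in buckets it never probes, while raising `probes` moves it back toward the exact result.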
Vector embeddings provide numerical representations of complex user-generated content, such as text, audio, and video, making storing, manipulating, and indexing data easier.
An accompanying step-by-step tutorial shows how to add GenAI features to your own applications with just a few lines of code, using pgvector, LangChain, and LLMs on Google Cloud.
Why does this matter?
With only a few lines of code, developers can seamlessly integrate AI features into their applications. This accessibility is a game-changer, as it democratizes using LLMs and eliminates the need for extensive ML expertise or custom model training. This means that even developers without prior ML knowledge can tap into the vast capabilities of LLMs and create AI-driven solutions tailored to their specific needs.
The Verge polled 2k people about using AI
The potential impact of AI on the world is uncertain. While some envision opportunities to remove constraints, automate tasks, and revolutionize learning, others worry about its potential to spread misinformation, displace jobs, and pose safety risks if not properly managed.
To find out what people think about AI and what they want from it, The Verge worked with Vox Media's Insights and Research team and the research consultancy The Circus to survey over 2,000 US adults.
The results reveal a mix of uncertainty, fear, and optimism surrounding this emerging technology. Many respondents have limited experience with AI and express concerns about its potential but also hold high expectations for its future benefits.
How is AI being used?
How do people feel about AI’s impact on society?
Why does this matter?
By understanding the public's attitudes and expectations towards AI, stakeholders can work towards aligning AI with societal needs, addressing fears, and ensuring the technology is user-friendly and ethically sound. Ultimately, the survey contributes to a better understanding of how AI can be beneficially integrated into society.
What Else Is Happening❗
🔎 WALDO 2.0, a powerful AI tool that detects objects extremely fast from drone footage
💰 Databricks picks up MosaicML, an OpenAI competitor, for $1.3B
❄️ Snowflake and NVIDIA team up to help businesses harness their data for generative AI
🧠 Merlyn Mind launches education-focused LLMs for classroom integration of generative AI
💡 ChatGPT Plus users can use Browsing Mode to navigate around paywalled articles
🤖 NASA to roll out ChatGPT-like chatbot for astronauts to talk to spacecraft
🛠️ Trending Tools
tl;dr AI: Summarize web articles with tl;dr. Get straight to the point.
Web2chat: Engage customers with AI chatbots, live chat, and analytics.
Slated AI: Efficiently manage large meetings, no more back-and-forth.
Swantide: Deploy customized workflows to your CRM easily with AI-powered RevOps assistant.
Optic: Realtime transcription, AI-generated meeting summaries for professional communication.
Angry Email Translator: Transform angry emails into polite and professional ones.
Tantl: Simplify SQL queries with generative AI that learns your team's knowledge.
Iconwizard AI: Bring your icon and logo ideas to life with AI-powered Icon Generator.
That's all for now!
Stay ahead of the curve! Subscribe to The AI Edge and gain exclusive access to content enjoyed by professionals from Moody’s, Vonage, Voya, WEHI, Cox, INSEAD, and other esteemed organizations.
Thanks for reading, and see you tomorrow. 😊