AI Weekly Rundown (October 14 to October 20)
Major AI announcements from NVIDIA, Microsoft, Meta, Google, Amazon, and more.
Hello Engineering Leaders and AI Enthusiasts!
Another eventful week in the AI realm. Lots of big news from huge enterprises.
In today’s edition:
🤝 NVIDIA's new collab for text-to-3D AI
📈 MemGPT boosts LLMs by extending context window
🤑 Microsoft’s new AI Bug Bounty Program with rewards up to $15k
🍏 42% of Mac users use AI-based apps daily, finds new report
📽️ ElevenLabs partners with Pictory AI for realistic AI video voices
🚀 China's Baidu unveils Ernie 4.0 to rival GPT-4
⚡ NVIDIA makes AI 4x faster with TensorRT-LLM
🏥 ChatGPT outperforms doctors in depression treatment
🔐 BlackBerry announces Gen AI cybersecurity assistant
🧠 Meta’s new AI for real-time decoding of images from brain activity
🤖 Fuyu-8B: A simple, superfast multimodal model for AI agents
🔥 GPT-4V got even better with Set-of-Mark (SoM)
📦 Amazon’s 2 AI-based robots for rapid deliveries & workplace safety
🔬 DeepMind’s MuJoCo 3.0 is the ultimate tool for robotics research
🗣️ Google Search’s new AI-powered feature to improve language skills
Let’s go!
NVIDIA's new collab for text-to-3D AI
NVIDIA and Masterpiece Studio have launched a new text-to-3D AI playground called Masterpiece X - Generate. The tool aims to make 3D art more accessible by using generative AI to create 3D models based on text prompts. It is browser-based and requires no prior knowledge or skills.
Users simply type in what they want to see, and the program generates the 3D model. While it may not be suitable for high-fidelity or AAA game assets, it is great for quickly iterating and exploring ideas.
The resulting assets are compatible with popular 3D software. The tool is available on mobile and works on a credit basis. By creating an account, you'll get 250 credits and will be able to use Generate freely.
MemGPT boosts LLMs by extending context window
MemGPT is a system that enhances the capabilities of LLMs by allowing them to use context beyond their limited window. It uses virtual context management inspired by hierarchical memory systems in traditional operating systems.
MemGPT intelligently manages different memory tiers to provide an extended context within the LLM's window and uses interrupts to manage control flow. It has been evaluated in document analysis and multi-session chat, where it outperforms traditional LLMs. The code and data for MemGPT are also released for further experimentation.
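To make the memory-tier idea concrete, here is a minimal Python sketch. It is not MemGPT's code: the class name, the word-count "tokenizer", and the keyword-based recall are all assumptions chosen for illustration, and the interrupt-driven control flow is left out. It only shows the paging pattern, where old messages are evicted from a bounded main context into an archive and paged back in when needed.

```python
from collections import deque


class TieredMemory:
    """Toy two-tier memory: a bounded main context plus unbounded archival storage.

    Hypothetical illustration only; names and behavior are not MemGPT's API.
    """

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens   # budget for the in-window tier
        self.main_context = deque()    # what the LLM would actually see
        self.archive = []              # out-of-window storage

    @staticmethod
    def count_tokens(text):
        # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
        return len(text.split())

    def used_tokens(self):
        return sum(self.count_tokens(m) for m in self.main_context)

    def add(self, message):
        """Append a message, evicting the oldest ones to the archive if over budget."""
        self.main_context.append(message)
        while self.used_tokens() > self.max_tokens and len(self.main_context) > 1:
            self.archive.append(self.main_context.popleft())

    def recall(self, keyword):
        """Page archived messages that mention the keyword back into the main context."""
        hits = [m for m in self.archive if keyword.lower() in m.lower()]
        for m in hits:
            self.archive.remove(m)
            self.add(m)
        return hits


mem = TieredMemory(max_tokens=10)
mem.add("my dog is named Rex")
mem.add("I live in Lisbon")
mem.add("what is the weather today")   # pushes the oldest message into the archive
mem.recall("dog")                      # pages the dog message back into the window
print(list(mem.main_context))
print(mem.archive)
```

A real system would let the LLM itself decide when to evict or recall, and would search the archive with something better than a keyword match.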
Microsoft’s new AI Bug Bounty Program with rewards up to $15k
Microsoft has launched a new AI program called the Microsoft AI Bug Bounty Program, offering rewards of up to $15,000. The program focuses on the AI-powered Bing experience, with eligible products including Bing Chat, Bing Image Creator, Microsoft Edge, Microsoft Start Application, and Skype Mobile Application.
The program is part of Microsoft's ongoing efforts to protect customers from security threats and reflects the company's investment in AI security research. Security researchers can submit their findings through the MSRC Researcher Portal and earn rewards. Microsoft says it hopes to learn from the submissions and improve its vulnerability management process for AI systems.
42% of Mac users use AI-based apps daily, finds new report
Setapp, the curated app subscription service for macOS and iOS by MacPaw, has released its 3rd annual Mac Apps Report. The report collected responses from Mac users, mostly in the US. Its findings highlight that 42% of respondents use AI-based apps daily. And 63% of AI-based app users believe AI tools are more beneficial.
Its latest Mac Developer Survey also showed that 44% of Mac developers have already implemented AI/ML models in their apps, while 28% are working on it.
ElevenLabs partners with Pictory AI for realistic AI video voices
ElevenLabs has been focused on pushing the boundaries of what's possible with AI voice technology. And Pictory AI is renowned for its proprietary algorithms that transform text into video.
With the integration of ElevenLabs' advanced AI voice technology, Pictory users will now be able to add 51 new hyper-realistic AI voices to their videos, enhancing engagement and personalizing the viewer's experience.
China's Baidu unveils Ernie 4.0 to rival GPT-4
Baidu, China’s Google equivalent, unveiled Ernie 4.0, the newest version of its generative AI model, this week, saying its capabilities are on par with those of OpenAI’s pioneering GPT-4 model. The reveal focused on the model’s memory capabilities and showed it writing a martial arts novel in real time, but no concrete benchmark figures were disclosed.
NVIDIA makes AI 4x faster with TensorRT-LLM
NVIDIA is bringing TensorRT-LLM, its open-source library for accelerating LLM inference, to Windows. The company says it makes LLMs run up to 4x faster on consumer PCs with GeForce RTX and RTX Pro GPUs. The update includes a new scheduler called in-flight batching, which lets smaller queries be processed dynamically alongside larger, compute-intensive tasks.
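To see why in-flight batching can help, here is a toy Python simulation. It is only a sketch under simple assumptions: every request needs a known number of generation steps, and the GPU has a fixed number of batch slots. The function names are made up for illustration, and nothing here uses the TensorRT-LLM API.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Each batch runs until its longest request finishes before the next batch starts."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def in_flight_batching_steps(lengths, batch_size):
    """Finished requests free their slot at once; waiting requests fill it on the next step."""
    waiting = deque(lengths)
    running = []  # remaining generation steps for each active request
    steps = 0
    while waiting or running:
        # Fill any free slots from the queue before the next step.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # Run one generation step for every active request and drop the finished ones.
        running = [r - 1 for r in running if r - 1 > 0]
        steps += 1
    return steps


# One long request (8 steps) mixed with three short ones (2 steps each), two slots.
requests = [2, 8, 2, 2]
print(static_batching_steps(requests, batch_size=2))     # 10
print(in_flight_batching_steps(requests, batch_size=2))  # 8
```

In this toy case the short requests no longer wait behind the long one, so the same work finishes in 8 steps instead of 10. Real-world gains depend on the mix of request lengths.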
Optimized open-source models are now available for download, with larger speedups at larger batch sizes. TensorRT-LLM can speed up everyday productivity tasks such as chat, document summarization, email drafting, data analysis, and content generation. Paired with a local library of specific datasets, it can also help address the problem of outdated or incomplete answers. TensorRT acceleration is now available for Stable Diffusion as well, speeding up the diffusion model by up to 2x.
The company has also released RTX Video Super Resolution version 1.5, an update to its AI-based video upscaling feature.
ChatGPT outperforms doctors in depression treatment
According to a new study, ChatGPT makes unbiased, evidence-based treatment recommendations for depression that are consistent with clinical guidelines and outperform those of human primary care physicians. The study compared the evaluations and treatment recommendations for depression generated by ChatGPT-3.5 and ChatGPT-4 with those of primary care physicians.
Vignettes describing patients with different attributes and depression severity were input into the chatbot interfaces.
However, further research is needed to refine the chatbot recommendations for severe cases and to address potential risks and ethical issues associated with using artificial intelligence in clinical decision-making.
BlackBerry announces Gen AI cybersecurity assistant
BlackBerry has announced a new generative AI-powered cybersecurity assistant for its Cylance AI customers. The solution predicts customer needs and proactively provides information, eliminating the need for manual questions. It compresses research hours into seconds and offers a natural workflow instead of an inefficient chatbot experience.
BlackBerry, known for its innovation in the technology industry, holds more than five times as many AI/ML patents as its competitors. The company was also one of the first signatories of Canada's voluntary Code of Conduct on the responsible development and management of advanced Generative AI systems. The cybersecurity assistant will initially be available to a select group of customers.
Meta’s new AI for real-time decoding of images from brain activity
New Meta research has showcased an AI system that can be deployed in real time to reconstruct, from brain activity, the images perceived and processed by the brain at each instant.
Using magnetoencephalography (MEG), this AI system can decode the unfolding of visual representations in the brain with an unprecedented temporal resolution.
The results:
While the generated images remain imperfect, overall results show that MEG can be used to decipher, with millisecond precision, the rise of complex representations generated in the brain.
Fuyu-8B: A simple, superfast multimodal model for AI agents
Adept is releasing Fuyu-8B, a small version of the multimodal model that powers its product. The model is available on Hugging Face. What sets Fuyu-8B apart is:
Its extremely simple architecture doesn’t have an image encoder. This allows easy interleaving of text and images, handling arbitrary image resolutions, and dramatically simplifies both training and inference.
It is super fast for copilot use cases where latency really matters. You can get responses for large images in less than 100 milliseconds.
Despite being optimized for Adept’s use case, it performs well at standard image understanding benchmarks such as visual question-answering and natural-image-captioning.
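One way to picture the "no image encoder" design is to treat an image as a plain sequence of patches that can sit next to text tokens. The Python sketch below is an illustration under that assumption, not Adept's code: the function name and the "<row_end>" marker are hypothetical, and a real model would go on to project each patch into the transformer's embedding space.

```python
def image_to_patch_sequence(image, patch_size):
    """Flatten a 2D image (a list of pixel rows) into a raster-order sequence of patches.

    Each patch is a flat list of pixel values. The string "<row_end>" (a made-up
    marker) closes every row of patches, so images of any width or height can be
    represented as one sequence.
    """
    height, width = len(image), len(image[0])
    sequence = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[y][x]
                for y in range(top, min(top + patch_size, height))
                for x in range(left, min(left + patch_size, width))
            ]
            sequence.append(patch)
        sequence.append("<row_end>")
    return sequence


# A tiny 2x4 "image" split into 2x2 patches.
tiny_image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
print(image_to_patch_sequence(tiny_image, patch_size=2))
# [[1, 2, 5, 6], [3, 4, 7, 8], '<row_end>']
```

Because the sequence simply grows with the image, nothing has to be resized to a fixed resolution first.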
GPT-4V got even better with Set-of-Mark (SoM)
New research has introduced Set-of-Mark (SoM), a new visual prompting method, to unleash extraordinary visual grounding abilities in large multimodal models (LMMs), such as GPT-4V.
Researchers employed off-the-shelf interactive segmentation models, such as SAM, to partition an image into regions at different levels of granularity and overlay these regions with a set of marks, e.g., alphanumerics, masks, boxes.
The experiments show that SoM significantly improves GPT-4V’s performance on complex visual tasks that require grounding.
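The marking step can be sketched in a few lines of Python. This is a simplified illustration, not the researchers' pipeline: the regions are hand-written bounding boxes rather than output from a segmentation model, the marks are plain numbers, and the prompt wording and function names are assumptions made for the example.

```python
def assign_marks(regions):
    """Give each region a numeric mark placed at the center of its bounding box.

    `regions` is a list of (x_min, y_min, x_max, y_max) boxes. In practice these
    would come from a segmentation model; here they are supplied by hand.
    """
    marks = []
    for index, (x0, y0, x1, y1) in enumerate(regions, start=1):
        center = ((x0 + x1) // 2, (y0 + y1) // 2)
        marks.append({"mark": index, "box": (x0, y0, x1, y1), "position": center})
    return marks


def build_prompt(question, marks):
    """Compose a text prompt that tells the model which marks are drawn on the image."""
    labels = ", ".join(str(m["mark"]) for m in marks)
    return (
        f"The image is annotated with numbered marks: {labels}. "
        f"{question} Answer with a mark number."
    )


# Two hypothetical regions, e.g. a cup and a laptop.
marks = assign_marks([(0, 0, 10, 20), (30, 40, 50, 60)])
print(marks[0]["position"])   # (5, 10)
print(build_prompt("Which object is the cup?", marks))
```

The marked-up image and the prompt would then be sent to the multimodal model together, so its answer can point at a specific region by number.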
Amazon’s 2 AI-based robots for rapid deliveries & workplace safety
Amazon has announced two new robotic solutions, Sequoia and Digit, to assist employees and improve delivery for customers. Sequoia, operating at a fulfillment center in Houston, Texas, helps store and manage inventory up to 75% faster, allowing for quicker listing of items on Amazon.com and faster order processing. It integrates multiple robot systems to containerize inventory and features an ergonomic workstation to reduce the risk of injuries.
Sparrow, a new robotic arm, consolidates inventory in totes. Amazon is also testing mobile manipulator solutions and the bipedal robot Digit to enhance collaboration between robots and employees further.
DeepMind’s MuJoCo 3.0 is the ultimate tool for robotics research
Google DeepMind has released MuJoCo 3.0, an updated version of their open-source tool for robotics research. This new release offers improved simulation capabilities, including better representation of various objects such as clothes, screws, gears, and donuts.
Additionally, MuJoCo 3.0 now supports GPU and TPU acceleration through JAX, enabling faster and more powerful computations.
Google Search’s new AI-powered feature to improve language skills
Google Search is introducing a new feature that allows English learners to practice speaking and improve their language skills. Android users in select countries can engage in interactive speaking practice sessions, receiving personalized feedback and daily reminders to keep practicing.
The feature is designed to supplement existing learning tools and is created in collaboration with linguists, teachers, and language experts. It includes contextual translation, personalized real-time feedback, and semantic analysis to help learners communicate effectively. The technology behind the feature, including a deep learning model called Deep Aligner, has led to significant improvements in alignment quality and translation accuracy.
That's all for now!
If you are new to ‘The AI Edge’ newsletter, subscribe to receive the ‘Ultimate AI tools and ChatGPT Prompt guide’ specifically designed for engineering leaders and AI enthusiasts.
Thanks for reading, and see you on Monday. 😊