OpenAI’s Secret to Developing Models like GPT-4
Plus: Google SGE can now generate images and drafts, AI tool can predict new viral variants.
Hello Engineering Leaders and AI Enthusiasts!
Welcome to the 125th edition of The AI Edge newsletter. This edition brings you OpenAI's secret to how it develops models like GPT-4.
And a huge shoutout to our incredible readers. We appreciate you!😊
In today’s edition:
🧠 OpenAI reveals how it develops models like GPT-4
🆕 Google SGE can now generate images and drafts
😲 New AI tool can predict viral variants before they emerge
📚 Knowledge Nugget: How I think about LLM prompt engineering
Let’s go!
OpenAI reveals how it developed the GPT-4 model
If you're looking for a simple, straightforward breakdown of what goes on at OpenAI and how, here’s an explainer from the maker of ChatGPT. OpenAI explains how it develops its foundation models, makes them safer, and much more.
Developing an advanced language model like GPT-4 requires:
Pre-training: to teach models intelligence, such as the ability to predict, reason, and solve problems, by showing them a vast amount of human knowledge over months.
Post-training: to incorporate human choice into the model to make it safer and more usable.
Before publicly releasing GPT-4, OpenAI spent 6 months on post-training. During that time, it developed techniques to teach the model to refuse requests that may lead to potential harm. OpenAI made GPT-4 82% less likely to respond to such requests compared to GPT-3.5. OpenAI also used this time to increase the likelihood of factual responses by 40%, make the model more conversational, and improve its performance on low-resource languages.
Why does this matter?
Apart from offering a surface-level (but insightful) understanding of how it develops its foundation models, OpenAI makes a definitive statement about the essence of its work. Moreover, there’s so much misinformation about it out there that this statement serves as a vital corrective. A must-read for every AI enthusiast!
Google SGE can now generate images and drafts
Google is bringing new capabilities to its AI-powered Search experience (SGE).
Image generation: Now SGE can whip up images if you type a description in search. Every image generated through SGE will have metadata labeling and embedded watermarking to indicate that it was created by AI. Google is also coming up with a tool called About this Image that will help people easily assess the context and credibility of images.
Written drafts in SGE: To spare you longer-running searches for writing ideas and inspiration, SGE will write drafts for you and can also make them shorter or change the tone. From there, it's easy to export your draft to Google Docs or Gmail.
Why does this matter?
Google Search has long been a place where you go with life’s questions or problems, and AI is letting Google do more with it through these nice-to-have features. But does it really matter? Google still has a 91.58% share of the search engine market, a stat OpenAI couldn’t budge even if its ChatGPT and DALL-E are better for the above tasks.
New AI tool can predict viral variants before they emerge
A new AI tool named EVEscape, developed by researchers at Harvard Medical School and the University of Oxford, can predict new viral variants before they actually emerge, as well as how they would evolve.
In the study, researchers show that had it been deployed at the start of the COVID-19 pandemic, EVEscape would have predicted the most frequent mutations and identified the most concerning variants for SARS-CoV-2. The tool also made accurate predictions about other viruses, including HIV and influenza.
Why does this matter?
The information from this AI tool will help scientists develop more effective, future-proof vaccines and therapies. If only this AI boom had happened a little earlier, it could have prevented the COVID-19 pandemic. But I guess no more pandemics, thanks to AI?
Enjoying the daily updates?
Refer your pals to subscribe to our daily newsletter and get exclusive access to 400+ game-changing AI tools.
When you use the referral link above or the “Share” button on any post, you'll get the credit for any new subscribers. All you need to do is send the link via text or email or share it on social media with friends.
Knowledge Nugget: How I think about LLM prompt engineering
To get information out of an LLM, you have to prompt it. If an LLM is like a database of millions of vector programs, then a prompt is like a search query in that database. Part of your prompt can be interpreted as a “program key”, the index of the program you want to retrieve, and part can be interpreted as a program input.
Consider the following example prompt:
Now, keep in mind that the LLM-as-program-database analogy is only a mental model; there are other models you can use. The author suggests a new, useful one, prompt engineering as a program search process, in a unique take in this article. The article also draws a parallel with Word2Vec's word embeddings to highlight the underlying principles shared by Word2Vec and LLMs.
Why does this matter?
The article highlights the need to experiment with prompts to achieve desired results from LLMs. It also provides insights into the mechanics of LLMs, their capabilities, and the role of prompt engineering in leveraging their power while cautioning against attributing human-like understanding to these models.
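To make the "program key" plus "program input" idea concrete, here is a minimal Python sketch of prompt engineering as a search over candidate keys. Everything in it is hypothetical and for illustration only: build_prompt, search_program_key, the toy_model stand-in, and the toy scoring rule are our own assumptions, not code from the article or any real LLM API. In practice you would swap toy_model for a call to your LLM of choice and toy_score for a metric that fits your task.

```python
from typing import Callable


def build_prompt(program_key: str, program_input: str) -> str:
    """Join the instruction (the "program key") with the text it should act on."""
    return f"{program_key}\n\n{program_input}"


def search_program_key(
    candidate_keys: list[str],
    program_input: str,
    run_model: Callable[[str], str],
    score: Callable[[str], float],
) -> tuple[str, float]:
    """Try every candidate key on the same input and keep the best-scoring one."""
    if not candidate_keys:
        raise ValueError("candidate_keys must not be empty")
    best_key, best_score = candidate_keys[0], float("-inf")
    for key in candidate_keys:
        output = run_model(build_prompt(key, program_input))
        current = score(output)
        if current > best_score:
            best_key, best_score = key, current
    return best_key, best_score


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: uppercases the input only if asked to."""
    key, _, text = prompt.partition("\n\n")
    return text.upper() if "uppercase" in key.lower() else text


def toy_score(output: str) -> float:
    """Hypothetical metric: fraction of characters that are uppercase letters."""
    return sum(ch.isupper() for ch in output) / max(len(output), 1)


keys = ["Repeat this text:", "Rewrite this text in uppercase:"]
best_key, best_score = search_program_key(keys, "hello world", toy_model, toy_score)
print(best_key)
```

The point of the sketch is the loop, not the toy model: the input stays fixed while the key varies, and the winner is picked empirically by a score rather than by intuition about which wording "should" work.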
What Else Is Happening❗
💰OpenAI’s revenue is on pace to reach $1.3 billion a year.
This remark by CEO Sam Altman indicates the company is generating more than $100 million per month, up 30% from this summer, when it generated revenue at a $1 billion-a-year pace. The revenue is largely from subscriptions to ChatGPT. (Link)
📜Google to defend generative AI users from copyright claims.
It has pledged to protect users of its generative AI systems on Google Cloud and Workspace platforms against accusations of intellectual property violations. The move aligns it with tech giants like Microsoft and Adobe. (Link)
🖌️Microsoft's Paint Cocreator begins rolling out to Windows Insiders in the Beta Channel.
It is a new AI experience powered by DALL-E that helps you create amazing artwork in Paint by describing in a few words what you’d like to create. (Link)
🛡️Lakera launches to protect large language models from malicious prompts.
With $10M in backing, the Swiss startup launches an API to protect companies from prompt injections and more. (Link)
🤖Llama models have been downloaded 32M+ times on 🤗 in the last 30 days.
The leaderboards at Hugging Face show that open source is more vibrant than ever, with downloads and model submissions rocketing to record highs. (Link)
That's all for now!
Subscribe to The AI Edge and gain exclusive access to content enjoyed by professionals from Moody’s, Vonage, Voya, WEHI, Cox, INSEAD, and other esteemed organizations.
Thanks for reading, and see you tomorrow. 😊