Gradient AI Releases Llama-3 8B With 1M Context

Plus: Mysterious “gpt2-chatbot” AI model bemuses experts, GitHub’s Copilot Workspace turns ideas into AI-powered software.

Apr 30, 2024

Hello Engineering Leaders and AI Enthusiasts!

Welcome to the 264th edition of The AI Edge newsletter. This edition features “Gradient AI Releases Llama-3 8B With 1M Context.”

And a huge shoutout to our amazing readers. We appreciate you😊

In today’s edition:

🚀 Gradient AI releases Llama-3 8B with 1M context
🤔 Mysterious “gpt2-chatbot” AI model bemuses experts
💻 GitHub’s Copilot Workspace turns ideas into AI-powered software
📚 Knowledge Nugget: Llama 3: Scaling open LLMs to AGI by Nathan Lambert

Let’s go!

Gradient AI releases Llama-3 8B with 1M context

Gradient AI has released a new version of the Llama-3 8B language model called Llama-3-8B-Instruct-Gradient-1048k. Its key feature is the ability to handle extremely long contexts of up to 1 million tokens.

To extend the context window to 1 million tokens, Gradient AI used techniques like NTK-aware initialization of positional encodings, progressive training on increasing context lengths similar to prior work on long context modeling, and optimizations to train on huge GPU clusters efficiently. The model was trained on 1.4 billion tokens, a tiny fraction of Llama-3's original pretraining data.
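For readers curious what "NTK-aware" scaling of positional encodings can look like in practice, here is a minimal sketch of the commonly cited rule of thumb for rotary position embeddings (RoPE). It is not Gradient AI's actual recipe. The function names are hypothetical, and the head dimension (128), original RoPE base (500,000), and original context length (8,192) are assumed values for a Llama-3-8B-style model. The value a real training run uses may also be tuned empirically.

```python
def ntk_scaled_base(base: float, scale: float, head_dim: int) -> float:
    """NTK-aware rule of thumb: enlarge the RoPE base so that the
    lowest rotation frequency is slowed down by a factor of `scale`."""
    return base * scale ** (head_dim / (head_dim - 2))


def rope_inv_freq(base: float, head_dim: int) -> list[float]:
    """Rotation frequency for each pair of dimensions in a RoPE head."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


# Assumed values, for illustration only (not taken from the newsletter).
ORIGINAL_CONTEXT = 8_192      # assumed original context length
TARGET_CONTEXT = 1_048_576    # the 1M-token target
HEAD_DIM = 128                # assumed attention head dimension
ORIGINAL_BASE = 500_000.0     # assumed original RoPE base

scale = TARGET_CONTEXT / ORIGINAL_CONTEXT  # 128.0
new_base = ntk_scaled_base(ORIGINAL_BASE, scale, HEAD_DIM)

old_freqs = rope_inv_freq(ORIGINAL_BASE, HEAD_DIM)
new_freqs = rope_inv_freq(new_base, HEAD_DIM)

print(f"context scale factor: {scale:.0f}x")
print(f"suggested starting RoPE base: {new_base:.3e}")
# The fastest-rotating pair is left unchanged, while the slowest one is
# stretched by the full scale factor, so nearby tokens stay distinguishable.
print(f"fastest frequency ratio (new/old): {new_freqs[0] / old_freqs[0]:.3f}")
print(f"slowest frequency ratio (old/new): {old_freqs[-1] / new_freqs[-1]:.1f}")
```

The idea is that changing the base, rather than uniformly squeezing all positions, spreads the stretch unevenly across frequencies, which gives progressive fine-tuning on longer contexts a reasonable starting point.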

Why does it matter?

The 1M context window allows the Llama-3 8B model to process and generate text based on much larger inputs, like entire books or long documents. This could enable new applications in summarizing lengthy materials, answering questions that require referencing an extensive context, and analyzing or writing on topics that require a large amount of background information.

Source

Mysterious “gpt2-chatbot” AI model bemuses experts

A mysterious new AI model called “gpt2-chatbot” is going viral. It was released without official documentation, and there is speculation that it could be OpenAI's next model.

gpt2-chatbot shows incredible reasoning skills. It also gets difficult AI questions right with a more human-like tone.

On a math test, gpt2-chatbot solved an International Math Olympiad (IMO) problem in one try. This does not apply to all IMO problems, but it is still insanely impressive.

Many AI experts also report that gpt2-chatbot's coding skills are better than those of the newest GPT-4 version or Claude Opus. Without official documentation, we still don’t know who released it or for what purpose.

However, a couple of theories about gpt2-chatbot are going around in the industry:

It's secretly GPT-5, released early so OpenAI can benchmark it
It's OpenAI's GPT-2 from 2019, fine-tuned with modern assistant datasets

You can try out gpt2-chatbot for free in the direct chat at https://chat.lmsys.org. Unfortunately, with so many people trying it right now, response times are slow and conversations are capped at 8 turns.

Why does it matter?

If the "gpt2-chatbot" model truly represents a major advancement in language generation and conversational abilities, it could accelerate the development of more advanced virtual assistants, chatbots, and other natural language processing applications. However, if the model's capabilities are overstated or have significant limitations, it may lead to disappointment and a temporary setback in the progress of conversational AI.

Source

GitHub’s Copilot Workspace turns ideas into AI-powered software

GitHub is releasing a new AI-powered developer environment called Copilot Workspace. It allows developers to turn an idea into software code using natural language and provides AI assistance throughout the development process—planning the steps, writing the actual code, testing, debugging, etc.

The developer just needs to describe what they want in plain English, and Copilot Workspace will generate a step-by-step plan and the code itself. By automating repetitive tasks and providing step-by-step plans, Copilot Workspace aims to reduce developers' cognitive strain and enable them to focus more on creativity and problem-solving. This new Copilot-native developer environment is designed for any device, making it accessible to developers anywhere.

Why does it matter?

By automating much of the coding work, Copilot Workspace could significantly lower the barrier to creating software. This could potentially enable a future with 1 billion developers on GitHub building software simply by describing what they want. Copilot Workspace could also make software development more accessible to non-technical people.

Source

Knowledge Nugget: Llama 3: Scaling open LLMs to AGI

In this article, Nathan Lambert argues that Meta's release of the Llama 3 language model, with versions ranging from 8 billion to an upcoming 400 billion parameters, shows that the open LLM ecosystem can scale its models to compete with the largest proprietary models from companies like Google and OpenAI. This challenges the common criticism that open LLMs cannot keep up with the scaling capabilities of tech giants.

The author believes Llama reduces open LLMs' barriers to AGI-level capabilities. Meta's focus on scaling up open models through massive datasets and model sizes makes it very costly for proprietary players to try to outpace the open ecosystem through pure scaling alone. This brings the prospect of open-source AGI closer to reality, though the author cautions against over-hyping the current technical progress.

Why does it matter?

Open LLMs will continue to improve and may soon challenge the capabilities of the biggest closed and proprietary models. While there are still some limitations around the licensing and openness of the Llama models, this release shows that open LLMs are closing the performance gap with the tech giants, which could lead to more competition and innovation in the AI industry. It also gives the open-source AI community a real path to developing AGI.

Source

What Else Is Happening❗

📰 OpenAI collaborates with Financial Times to use its content in ChatGPT

The Financial Times has signed a deal with OpenAI to license its content for developing AI models and allow ChatGPT to answer queries with summaries attributable to the newspaper. It will help OpenAI enhance the ChatGPT chatbot with archived content from the FT, and the firms will work together to develop new AI products and features for FT readers. (Link)

🚀 Cohere's Command R model family is accessible through Amazon Bedrock

Amazon Bedrock developers can access Cohere’s Command R and Command R+ LLMs via APIs. This addition gives enterprise customers more LLM options, joining Claude 3 Sonnet, Haiku, Opus, Mistral 7B, Mixtral 8x7B, and Mistral Large. The Command R and R+ models are highly scalable, RAG-optimized, and multilingual across 10 languages. (Link)

📊 NIST launches a new platform for generative AI evaluation

NIST announced the launch of NIST GenAI, a new program to assess generative AI technologies, including text- and image-generating AI. NIST GenAI will release benchmarks, help create “content authenticity” detection (i.e., deepfake-checking) systems, and encourage the development of software to spot the source of fake or misleading AI-generated information. (Link)

🧬 ‘ChatGPT for CRISPR’ creates new genome-editing tools

ChatGPT has a specialized version called "GenomeGuide for CRISPR Research," focusing on genetic engineering. It aims to assist researchers in designing new gene-editing tools that are more versatile than the standard ones. It also serves as an AI assistant dedicated to genetic discoveries and provides R&D support in genetic engineering and CRISPR technology. (Link)

💰 Microsoft to invest $1.7 billion in Indonesia’s AI and cloud infrastructure

Microsoft will invest $1.7 billion over the next 4 years in cloud and AI infrastructure in Indonesia, as well as AI skilling opportunities for 840,000 people and support for the nation’s growing developer community. These initiatives aim to achieve the Indonesian government’s Golden Indonesia 2045 Vision to transform the nation into a global economic powerhouse. (Link)

New to the newsletter?

The AI Edge keeps engineering leaders & AI enthusiasts like you on the cutting edge of AI. From machine learning to ChatGPT to generative AI and large language models, we break down the latest AI developments and how you can apply them in your work.

Thanks for reading, and see you tomorrow. 😊

The AI Edge
