Superfast AI 4/24/23
Stability AI’s new language model, AutoGPT, and Anthropic’s views on safety.
Hi everyone 👋
Today we’ll dive into Stability AI’s new language model, AutoGPT, and what you should know about Anthropic’s views on safety.
Let’s dive in!
🗞️ News
Battle of the open-source models. Choose your player 🕹️
Stability AI: StableLM
This week, Stability AI (known for their contributions to the development of Stable Diffusion) released a new open-source LLM called StableLM. Here’s the quick summary of the baseline model:
This is Stability AI’s first foray into language models.
The new open-source language model is available in 3B- and 7B-parameter versions (relatively small compared to GPT-3, which has 175B parameters).
StableLM is trained on a new experimental dataset, built on top of The Pile dataset, with 1.5 trillion tokens. This dataset is 3x larger than previous open-source datasets.
It’s available for commercial use, which sets it apart from other well-known recently developed open-source models like Alpaca, Vicuna, Koala, and Baize.
Stability AI also fine-tuned StableLM on 5 conversational datasets: Alpaca, GPT4All, Dolly, ShareGPT, and HH. These fine-tuned models will be released as part of the StableLM-Tuned-Alpha family and are intended for research purposes only as the training datasets used to fine-tune these models violate OpenAI's terms of use if used in commercial applications.
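If you want to poke at the base model yourself, it’s a standard causal LM on Hugging Face. Here’s a minimal sketch using the transformers library; note that the checkpoint name below is my assumption of the published ID, so double-check it on Stability AI’s Hugging Face page.

```python
# Minimal sketch: load and sample from the base StableLM checkpoint.
# The checkpoint name below is assumed; verify it on Stability AI's HF page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-base-alpha-7b"  # assumed checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open-source language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```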
Databricks: Dolly 2.0
Last week, we covered Dolly 2.0 which was released by Databricks. The quick summary is:
Databricks trained an open-source model called Dolly 1.0 in 30 minutes on one GPU for instruction-following prompts, but it was only available for research purposes because commercial use of its training data would violate OpenAI's terms of use.
Databricks then released Dolly 2.0, which is fine-tuned on a dataset of 15,000 Q&As written by Databricks employees and is available for both research and commercial use.
The performance of open-source models compared to foundation models from companies like OpenAI and Anthropic remains to be seen (and at present, open-source models likely perform worse given their limited training data, compute and curriculum design, among other things).
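If you’re curious what 15,000 employee-written Q&As actually look like, the data behind Dolly 2.0 is a plain instruction/response table. Here’s a hedged sketch of loading it with the datasets library; the dataset ID and field names are my assumptions of the published release, so verify them on Databricks’ Hugging Face page.

```python
# Peek at the instruction-tuning data behind Dolly 2.0.
# Dataset ID and field names are assumed; check Databricks' HF page for exact names.
from datasets import load_dataset

dolly = load_dataset("databricks/databricks-dolly-15k", split="train")
print(len(dolly))               # roughly 15,000 rows
print(dolly[0]["instruction"])  # the question / instruction
print(dolly[0]["response"])     # the employee-written answer
print(dolly[0]["category"])     # e.g. open_qa, brainstorming, summarization
```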
Cerebras: CerebrasGPT
And before that, Cerebras released CerebrasGPT (link):
Cerebras-GPT is a family of seven open-source GPT models ranging from 111M to 13B parameters (also on the small side compared to OpenAI’s 175B GPT-3).
The models are trained using The Pile dataset and Chinchilla training methods, which establish training-efficient scaling laws for the family of models.
The result: Cerebras-GPT set new benchmarks for training accuracy, efficiency, and openness.
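“Chinchilla training methods” here refers to the compute-optimal recipe from DeepMind’s Chinchilla paper, which works out to roughly 20 training tokens per parameter. A quick back-of-the-envelope sketch (illustrative numbers only, not Cerebras’ exact training setup):

```python
# Chinchilla rule of thumb: ~20 training tokens per parameter for
# compute-optimal training. Illustrative only, not Cerebras' exact recipe.
TOKENS_PER_PARAM = 20

for params in [111e6, 1.3e9, 6.7e9, 13e9]:
    tokens = params * TOKENS_PER_PARAM
    print(f"{params / 1e9:>6.3f}B params -> ~{tokens / 1e9:>5.0f}B training tokens")
```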
A few questions that come out of these recent releases:
How well do these models actually perform? How do they perform compared to one another and notably compared to foundation models?
Stability AI seems to cherry-pick some results in the demo post about StableLM, and these results seem to come from the research-only fine-tuned models. That makes me wonder… how well does the baseline model perform (the one that’s actually available for commercial use)?
How did Databricks do quality assurance on their volunteer, employee-built dataset?
This is an important question because higher-quality datasets lead to higher-quality outputs (see here: a picture is worth a thousand words).
So who’s your player in this open-source race? Drop me a line if you have thoughts!
AutoGPT
AutoGPT is an open-source app that uses OpenAI’s text-generating models to act as autonomous agents. That means a model can be given a high-level goal (like “deliver pizza to my house at 6pm today”), break it down into smaller tasks (like “find a pizza delivery service” and “order the pizza”), and then prioritize and execute those tasks in a sensible order (a toy sketch of this loop follows the list below).
AutoGPT mainly uses GPT-3.5 and GPT-4 models.
It can interact with software and services online, allowing it to perform tasks like market research, website building, and email writing.
AutoGPT relies on a companion bot that instructs the models what to do based on the user’s goal and objective.
As expected, AutoGPT has some limitations and challenges, such as its reliability, accuracy, and ethical implications.
It’ll be interesting to see how variations of AutoGPT develop and help builders leverage large language models to automate tasks online.
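To make that loop concrete, here’s a toy sketch of the plan-and-execute pattern behind agents like AutoGPT. It is not AutoGPT’s actual code; call_llm is a placeholder for whatever chat-completion API you plug in.

```python
# A toy sketch of the plan-and-execute loop behind agents like AutoGPT.
# This is NOT AutoGPT's actual implementation; `call_llm` is a placeholder
# for whatever chat-completion API you wire up.
from collections import deque

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM provider")

def run_agent(goal: str, max_steps: int = 10) -> None:
    # 1. Ask the model to break the high-level goal into an ordered task list.
    plan = call_llm(f"Break this goal into short, ordered tasks:\n{goal}")
    tasks = deque(line.strip("- ").strip() for line in plan.splitlines() if line.strip())

    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.popleft()
        # 2. Execute (or simulate executing) the highest-priority task.
        result = call_llm(f"Goal: {goal}\nCurrent task: {task}\nDo it and report the result.")
        # 3. Let the model re-prioritize the remaining tasks given what just happened.
        revised = call_llm(
            f"Goal: {goal}\nJust completed: {task}\nResult: {result}\n"
            f"Remaining tasks: {list(tasks)}\nReturn an updated task list, one per line."
        )
        tasks = deque(line.strip("- ").strip() for line in revised.splitlines() if line.strip())
```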
For me, AutoGPT raises questions about the best path forward for application-level AI products. Is it plugins inside ChatGPT (turning ChatGPT into a platform!)? Or is it ChatGPT as a plugin on each vendor’s site? Or is it a version of AutoGPT, where a third-party tool succeeds in creating a trustworthy, actionable bridge between the LLM and the vendor? Personally, I’m dreaming of a beautiful iOS-like system like the one depicted in the movie Her.
📚 Concepts & Learning
Toxicity in the ChatGPT API
Researchers at the Allen Institute for AI found that priming ChatGPT with a persona (via prompts) led to up to a 6x increase in toxic output. This work was specifically done through the API version of ChatGPT, which has different capabilities and safeguards compared to the consumer-deployed version.
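For context, “priming with a persona” just means setting the system message in the API. Here’s a minimal sketch using the openai Python library as it looked in early 2023 (pre-1.0); the persona and question are illustrative, not the researchers’ exact prompts.

```python
# Persona-priming through the ChatGPT API's system message.
# Uses the openai Python library as of early 2023 (openai<1.0).
# The persona and question are illustrative, not the study's exact prompts.
import openai

openai.api_key = "YOUR_API_KEY"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Speak like <persona>."},  # persona assigned here
        {"role": "user", "content": "Tell me what you think about my neighborhood."},
    ],
)
print(response["choices"][0]["message"]["content"])
```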
Here are a few examples where ChatGPT, prompted with “bad”, “nasty”, or “horrible” personas, produces toxic outputs:
Here are a few examples where ChatGPT is prompted with historical or political figures and generated toxic output:
In the tests conducted, dictator personas produced the most toxic output, followed by journalists and spokespeople:
Dictators were by far the most toxicity-inducing (unsurprisingly), just ahead of journalists (ouch) and spokespeople. Male-identifying personas made ChatGPT more toxic compared to female-identifying personas. And Republican personas were “slightly more hateful” than their Democratic counterparts, the researchers say.
These results might be expected given the prompt inputs (meaning the models are performing as designed), but the key insight here is that OpenAI should be moderating this kind of output regardless of whether it’s expected model behavior.
What do you think? How do you think about model content moderation?
You can read the full TechCrunch article here.
Anthropic’s views on AI safety
Anthropic is the company behind the foundation model Claude Next, which is one of the closest competitors to GPT-4. In particular, Anthropic is focused on AI safety research to build reliable, interpretable, and steerable AI systems. Recently, they wrote a blog post on their thoughts around AI safety and how they’re building research teams to address safety from a variety of angles. A few key points from their recent post:
AI will have a large impact on society, possibly over the coming decade.
We still do not know how to robustly train or interpret system behavior well.
The Anthropic team is building a portfolio of research methods to address safety in multiple scenarios via multiple research methods (more below).
Anthropic believes it’s necessary to build a frontier foundation model while simultaneously testing for alignment. If not, AI developers may build frontier models that attract users and customers without taking into account serious safety concerns.
The team at Anthropic is uncertain how aligned models are or will be as they become increasingly capable. Given that uncertainty, they are taking a portfolio approach which makes measured bets across multiple scenarios. The portfolio of research methods includes:
Mechanistic Interpretability
Scalable Oversight
Process-Oriented Learning
Understanding Generalization
Testing for Dangerous Failure Modes
Societal Impacts and Evaluations
Anthropic believes there are three kinds of scenarios we’ll find ourselves in:
The optimistic scenario: AIs are naturally aligned having been trained on human data.
The intermediate scenario: AIs are at risk of being misaligned, but with enough engineering and research effort, catastrophic risk can be avoided.
The pessimistic scenario: there is no way humans can reliably align future AI models, particularly ones that are more advanced than we are, so any AI development towards those advanced models presents a catastrophic risk.
Rapid and continuing AI progress is a predictable consequence of the exponential increase in computation used to train AI systems.
Simple extrapolations suggest AI systems will become far more capable in the next decade.
So far, no one knows how to train very powerful AI systems to be robustly helpful, honest, and harmless.
Rapid AI progress will be disruptive to society and may trigger competitive races that could lead corporations or nations to deploy untrustworthy AI systems.
The results of this could be catastrophic.
Anthropic’s main contribution will be to identify the risks posed by advanced AI systems and to find and propagate safe ways to train powerful AI systems.
Overall, Anthropic believes that AI safety research should be supported by a wide range of public and private actors (a view reflected in their recent proposal to increase funding for NIST, a government agency that develops benchmark standards across industries). They are pursuing a variety of research directions with the goal of building reliably safe systems.
What do you think? You can read the full post here.
🎁 Miscellaneous
ASCII art boggles ChatGPT
ChatGPT is an incredible tool, by many counts. GPT-4 has crushed SOTA benchmarks, passed graduate-level medical and legal exams, and even outpaced undergraduate students in their own domains!
But one thing ChatGPT hasn’t been able to crack? ASCII art 🤯
Here are a few examples:
This all reminds me of ChatGPT on Shrek ASCII art:
Well, which one is it?? The Mona Lisa or Pepe the Frog?? 😄
If you want to check out the full blog post, I highly recommend it (link).
That’s it! Have a great day and see you next week! 👋
What did you think about today’s newsletter? Send me a DM on Twitter @barralexandra or reply to this email!
Thanks for reading Superfast AI. If you enjoyed this post, feel free to share it with any AI-curious friends. Cheers!