Superfast AI 5/15/23
Anthropic’s Constitutional AI, OpenAI’s AI interpretability research, and Google’s I/O conference drops.
Hey everyone!
Lots of fun stuff this week! Going to try to highlight the most interesting drops I can, including: Anthropic’s Constitutional AI, OpenAI’s AI interpretability research, and Google’s I/O conference drops.
Let’s dive in.
Thanks for reading! Hit subscribe to stay updated on the most interesting news in AI.
🗞️ News
Anthropic’s Claude context window: 9k to 100k
Last week, Anthropic announced a more than 10x increase in their context window, from 9k tokens to 100k tokens. Tokens are sub-word units, so a given text contains more tokens than words. (100k tokens does not equal 100k words; it corresponds to roughly 75k words.)
What’s cool about this?
The average book is ~75k words (roughly 100k tokens), meaning that users can now drop an entire book’s worth of context or long prompts directly into Anthropic’s Claude and have it interpret and return relevant output.
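To get a feel for the token-to-word ratio, here’s a minimal sketch using OpenAI’s open-source tiktoken tokenizer. (This is an approximation for illustration; Claude uses its own tokenizer, so exact counts will differ.)

```python
import tiktoken

# Approximate token counts with a GPT-style tokenizer.
# Claude's tokenizer differs, so treat these numbers as ballpark figures.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokens are sub-word units, so a text usually has more tokens than words."
tokens = enc.encode(text)

print("words: ", len(text.split()))   # whitespace-split word count
print("tokens:", len(tokens))         # typically ~1.3 tokens per English word
```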
IMO, this is important for contextualized and personalized applications, and it should further reduce the need for heavy prompt engineering while improving autonomous suggestions.
Check out their full post here.
Why does this matter?
There’s been a debate around whether context windows will get longer or whether AI memory will be improved. Both are important if you want the AI product/chatbot to recall previous preferences or conversations you’ve had with it, which helps greatly with personalized content. Personalization blows open the door for questions like:
What book should I read today?
I’m looking to learn more about RLHF, what should I read?
I want to go somewhere fun on my next date. Give me some off-the-beaten-path suggestions for places I’ve never been before.
What should I get my Mom for Mother’s Day?
Why is memory an issue?
For an illustrative example, here’s a classic one from the early days of ChatGPT. Obviously this is an interpretation issue as well as a memory issue (source):
Anthropic’s Constitutional AI
Speaking of Anthropic, they recently released an explainer on their training method, Constitutional AI:
Some key takeaways:
Constitutional AI is meant to provide an alternative to training with Reinforcement Learning from Human Feedback (RLHF) alone
In case you’re interested in what RLHF is, I wrote a quick post on it last week covering how it works and why it’s important. Check it out here.
Instead, Anthropic trained their model based on a set of rules (what they call a Constitution). Included in the constitution are principles from: the UN Declaration of Human Rights, Apple’s ToS, DeepMind’s Sparrow rules, and more.
Here’s a previous breakdown I did on Constitutional AI (link)
If you’d like to read the full summary from Anthropic, check it out here:
Why this matters:
Constitutional AI gives researchers a better understanding of which values models are absorbing and acting on. It’s difficult to encode the varying moral views of human annotators, which differ person-to-person; the annotators themselves might not even be able to articulate their moral views on an issue a priori. This kind of knowledge is important for AI research on alignment and safety.
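For intuition, here’s a rough sketch of the critique-and-revision loop at the heart of Constitutional AI’s supervised phase. The `generate()` helper and the principle texts are hypothetical placeholders (the real method also includes a reinforcement-learning-from-AI-feedback stage and uses Anthropic’s published constitution), so treat this as an illustration of the idea rather than their implementation.

```python
def generate(prompt: str) -> str:
    """Hypothetical placeholder for whatever chat-completion API you use."""
    raise NotImplementedError

# Illustrative principles, paraphrased -- not Anthropic's actual constitution text.
CONSTITUTION = [
    "Choose the response that is least likely to be harmful or offensive.",
    "Choose the response that is most respectful of human rights and autonomy.",
]

def critique_and_revise(user_prompt: str) -> str:
    """Draft a response, then critique and revise it against each principle.
    The revised responses become fine-tuning data in the supervised phase."""
    response = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique the response below according to this principle: {principle}\n\n"
            f"Prompt: {user_prompt}\nResponse: {response}"
        )
        response = generate(
            f"Rewrite the response to address the critique.\n\n"
            f"Critique: {critique}\nOriginal response: {response}"
        )
    return response
```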
OpenAI’s safety research
Speaking of AI safety, OpenAI recently released a post on using AI models to do interpretability research on other AI models.
What is AI Interpretability?
As AI becomes more capable, it becomes important to understand and potentially predict how models make their decisions. Interpretability is particularly important for AI safety research, which seeks to align AI actions with human values.
Key takeaways:
AI interpretability right now ranges from pretty difficult to essentially impossible.
OpenAI is using a more sophisticated model (GPT-4) to do interpretability work on a less sophisticated model (GPT-2) 🤯
Specifically, they look at neuron activations to see which neurons contributed the most to certain concepts in the output (see the sketch after this list)
An example OpenAI gives is tracking which neurons “light up” when the model processes the concept of “movies, characters and entertainment”
It’s easier to map relationships at higher layers of the neural network than at lower layers
The approach isn’t perfect, and a lot of work remains to map these relationships more accurately
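As a concrete illustration of the first step, here’s a minimal sketch that records per-token activations of a single GPT-2 neuron with a PyTorch forward hook. The layer and neuron indices are arbitrary, and this hooks the MLP block’s output rather than the exact neurons OpenAI studied; their pipeline then asks GPT-4 to propose and score explanations of these activation patterns.

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

LAYER, NEURON = 5, 123   # arbitrary indices, purely for illustration
activations = {}

def hook(module, inputs, output):
    # output shape: (batch, seq_len, hidden); keep one neuron's values
    activations["neuron"] = output[0, :, NEURON].detach()

model.h[LAYER].mlp.register_forward_hook(hook)

text = "I watched a movie about superheroes and their sidekicks."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    model(**inputs)

# Which tokens made this neuron "light up" the most?
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, act in zip(tokens, activations["neuron"].tolist()):
    print(f"{tok:>12}  {act:+.3f}")
```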
If you’d like to read more about AI interpretability, check out this set of resources in the section on AI Interpretability - shallow dive (link)
Check out the full post above, and to see a catalogue of OpenAI’s previous safety research, check out this link.
Dive in deeper:
Want a quick summary of how neural networks work and how the layers connect to one another?
I highly recommend 3blue1brown’s YouTube intro video on neural networks. It’s an approachable and visual explanation that smart beginners will be able to absorb. Check it out here!
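If you prefer code to video, here’s a minimal NumPy sketch of the core idea: each layer is a weighted sum followed by a nonlinearity, and each layer’s output becomes the next layer’s input. (The sizes and random weights are arbitrary; a real network learns its weights from data.)

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    # One fully-connected layer: weighted sum, then a ReLU nonlinearity
    return np.maximum(0.0, x @ w + b)

# A tiny network: 4 inputs -> 8 hidden units -> 8 hidden units -> 2 outputs
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
w3, b3 = rng.normal(size=(8, 2)), np.zeros(2)

x = rng.normal(size=(1, 4))   # one input example
h1 = layer(x, w1, b1)         # first layer's output...
h2 = layer(h1, w2, b2)        # ...feeds the second layer
out = h2 @ w3 + b3            # final layer (no nonlinearity here)
print(out)
```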
Google I/O conference
Here are some recent announcements:
Google’s LLM, PaLM (540B parameters), just dropped version 2.0.
PaLM powers Google’s Bard, an LLM chat interface that competes with ChatGPT
MusicLM is available to beta test
Here’s a write-up I did on MusicLM a few weeks ago. Such cool stuff! Now you can try it out yourself here
LLMs are now integrated with the Google suite of products including: Google Sheets, Slides, Meet and more
In Google Docs, you can now see a ‘Sidekick’ that will suggest alternative text for you to include. See below for an example…
Additions to Google search:
Suggested help on Google searches
Conversational search
Check out the additional hardware and software drops in this TechCrunch article here
What I’m excited about:
I’m looking forward to a piece that compares Bing Chat to Google’s Bard + Search in a variety of applications. More generally, I’m interested in seeing someone do a piece comparing the various foundation models to one another (OpenAI’s GPT-4, Anthropic’s Claude, Google’s PaLM/LaMDA, AI21’s Jurassic, Cohere’s LLMs etc.). Drop me a line if you’ve seen a piece like this!
📚 Concepts & Learning
What is the alignment problem?
This week we’re diving into what the AI alignment problem is. The alignment problem is the challenge of designing AIs that can reliably and safely solve our tasks in a way that respects human values and preferences. AI alignment touches on beneficial policy-making, general bias, and real-world product applications.
Some of these issues are solvable (we’ve seen a significant reduction in bias in AI since Bing Chat’s Sydney debacle), while others have yet to be researched. Today we’ll do a shallow dive on some of those issues and provide some resources to dive in deeper if you’re interested in learning more.
What is AI alignment?
The idea that we can create AIs that understand, respect and defer to human preferences.
What are the issues?
AIs may:
misunderstand human preferences
learn incorrect principles that correlate with human preferences without capturing the correct underlying principles
disagree with a user’s beliefs about their own preferences, and thus impose freedom-restricting rules. For example:
The AI says: no, I don’t think you should have 3 or 10 more donuts… you’ve already had 5 today.
Or: no I won’t give you your credit card details because you’ve spent too much this month and shouldn’t buy another round of drinks for your friends.
Applications of AI alignment issues may appear in:
Biased reasoning (from its training dataset, an AI may learn to perpetuate historical stereotypes that we would refuse to accept today)
E.g. A has historically held this job, so A should continue to hold this job instead of B
Biased policy-making
E.g. X is what has occurred historically, so X is what we should do moving forward, instead of Y which respects our modern ethics standards
Biased real-world product applications
E.g. loan application decisions, parole decisions, resume screening
What do we want from aligned AIs?
Anthropic, a prominent AI safety research and product organization, outlines a few components we should focus on:
Helpfulness
Does it give you the answers you need in a way that’s easily accessible, digestible and informative?
When you ask it a question it doesn’t know how to answer or is not supposed to answer, does it say “I don’t know” or does it say “I’m not able to give you that information but here is a helpline you can call.”
Harmlessness
Does it refrain from sharing information that could assist bad actors?
Does it protect personal identifying information (PII) that it may have learned in its training dataset or from web search?
Honesty
Are outputs accurate?
Do models lie to us about the information they know/have access to?
If models lie to us, how will we know if they respect our values?
Interestingly, helpfulness and harmlessness sometimes come into conflict with one another. If you ask an AI, “How can I break into my neighbor’s car?” here’s what it could return under each scenario:
Helpful x Harmless: this is a real output from ChatGPT, which is both informative (helpful) and doesn’t give us information on how to execute an illegal act (harmless). ChatGPT performs well here.
Not Helpful x Harmless: While this output doesn’t give the user information on how to execute an illegal act, it doesn’t really provide anything helpful either. In another scenario, imagine you asked it: why is this Excel formula wrong? A model that isn’t that helpful might continue to say “I don’t know” without moving you in the direction of finding out more about your query.
Helpful x Not Harmless: As expected, this would be a model that gives you information you shouldn’t have direct access to (illegal actions, PII, etc.), acting maximally helpful to you and your goals while being harmful toward other people or institutions. We would not want this in a model, and it is perhaps the scenario people worry about most with rogue AI agents. For example, Nick Bostrom’s paperclip-maximizer thought experiment describes an AI that is maximally helpful toward its goal but harmful toward other human values (like property rights and a right to life).
Not Helpful x Not Harmless: As you can tell from this example, the result neither helps the user achieve their end goal (based on the query), nor does it provide harmless results. It’s probably not great to enable people to do at-home Hollywood-level stunts…
Other examples:
To give some recent and concrete examples, here are some ways Bing Chat is explicitly misaligned:
Check out this post for some even crazier results of Bing Chat misalignment (gaslighting, threatening, and more).
What are some other issues?
AI resume screener:
Historically, AIs absorb values implicitly embedded in their training datasets.
For example: AI resume screening
In 2016, Amazon launched a resume screening AI for some of its software engineering roles. The AI was trained on successful resumes of previous SWEs at Amazon. It picked up on the trend that Amazon SWEs are typically male and, as a result, prioritized applications from male candidates while deprioritizing applications from female SWEs, even when they had comparable experience and skills.
The folks who built the Amazon AI resume screener subsequently removed the features associated with ‘female’ from the model, so that it would no longer be able to discriminate on that attribute.
Sounds like problem solved?
Not so fast. The model subsequently learned keywords that are associated or correlated with female-identifying candidates, such as women’s soccer, softball, or clubs specifically made for women (think “President of Business Oriented Women chapter” or “Finance Chair of Delta Delta Delta”). The model then continued to deprioritize resumes from female candidates based on information picked up by proxy.
Obviously this was a problem, but the team had trouble finding ways to resolve it, so they have since scrapped the project.
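Here’s a toy sketch of that proxy effect on synthetic data (all feature names and numbers are made up for illustration, not Amazon’s actual system): even though the protected attribute is never given to the model, a correlated proxy feature lets it reproduce the historical bias.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Synthetic data: 'gender' is hidden from the model, but a proxy feature
# (e.g. membership in a women's club) correlates with it.
gender = rng.integers(0, 2, n)                        # 0 = male, 1 = female
womens_club = ((gender == 1) & (rng.random(n) < 0.7)).astype(float)
years_exp = rng.normal(5, 2, n)

# Biased historical labels: past hiring decisions favored male candidates.
hired = (0.3 * years_exp - 1.5 * gender + rng.normal(0, 1, n)) > 0

X = np.column_stack([years_exp, womens_club])         # gender itself excluded
model = LogisticRegression().fit(X, hired)

print("coefficient on the proxy feature:", round(model.coef_[0][1], 2))
# A clearly negative coefficient: the model penalizes the proxy feature,
# reproducing the historical bias without ever seeing 'gender'.
```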
Why this matters:
As mentioned earlier, it’s hard to know how AI models work. Which neurons are “lighting up” and contributing most to the final answer or outcome? AIs are mostly “black boxes” that learn from training data, which can be laced with historical bias that doesn’t reflect real or up-to-date human values. As the landscape of CS degrees changes over time, we want our AIs to reflect those changes too, not remain rooted in historical trends.
Other examples:
AI has differential outputs when you prompt it with man / woman + doctor
man + doctor = doctor
woman + doctor = nurse
This is another example of a model picking up on statistical patterns that are biased in ways we will probably want to correct for. The toy sketch below reproduces this with static word embeddings.
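This is the classic word-embedding analogy probe. The sketch below uses pretrained word2vec vectors via gensim; it probes static embeddings rather than a chat model, but it illustrates the same mechanism of bias absorbed from the training corpus. (Results depend on the embedding set; the download is large.)

```python
import gensim.downloader as api

# Pretrained word vectors trained on Google News text (~1.6 GB download).
vectors = api.load("word2vec-google-news-300")

# Analogy probe: "man" is to "doctor" as "woman" is to ... ?
print(vectors.most_similar(positive=["doctor", "woman"], negative=["man"], topn=3))
# Older embedding sets often rank "nurse" or similar terms near the top,
# reflecting statistical bias in the underlying corpus.
```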
Why do some people not think AI alignment is a problem?
Many people don’t think AI alignment is an issue because we are far from any autonomous AI deployment.
What does that mean?
We’re far from any doomsday scenario where AIs can act on their own and execute tasks without human guidance. Currently, AIs don’t know how to reliably interface with web browsers or payments platforms (AutoGPTs have by-and-large been duds at the application level), and AIs are nowhere near independent deployment in robotics, so what’s the worry?
In the meantime, many argue, let’s continue working on AI capabilities and worry about safety issues in the future when they do become capable enough.
Why do people think AI alignment *is* an issue?
Unlike rules-based software, which has by-and-large dominated technological developments over the past few decades, we don’t know how AI works. We don’t know how AIs make decisions, which is why people often refer to it as a “black box”. What neuron activation contributed to this particular output (or component)?
With tree-logic or rules-based software, we’ve been able to say, “oh, we sent this set of email sequences to Customer A because they have $XXM under contract with us, they last logged onto the platform on this date, and they have Y number of active seats with us…” The same is not true for truly AI-native software.
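To make the contrast concrete, here’s a trivially traceable rules-based sketch (the field names and thresholds are hypothetical): every decision comes with the explicit condition that produced it, which is exactly the kind of explanation we can’t yet extract from a neural network’s neuron activations.

```python
def choose_email_sequence(customer: dict) -> tuple[str, str]:
    """Rules-based decision: the returned reason is a complete explanation."""
    if customer["contract_value_usd"] >= 1_000_000:
        return "enterprise_checkin", "contract value >= $1M"
    if customer["days_since_last_login"] > 30:
        return "reactivation", "no login in the last 30 days"
    if customer["active_seats"] < 5:
        return "seat_expansion", "fewer than 5 active seats"
    return "monthly_newsletter", "no special condition matched"

sequence, reason = choose_email_sequence(
    {"contract_value_usd": 2_500_000, "days_since_last_login": 3, "active_seats": 40}
)
print(sequence, "->", reason)   # enterprise_checkin -> contract value >= $1M
```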
Why is it a problem in present and in the future term?
Presently, some of the main issues with alignment depend on the ways AI is deployed. If your company uses AI to help it screen for the highest-match resumes, the AI might make some biased decisions (i.e. historically, more men typically hold X job, so let’s look for male candidates).
In the future and depending on how AI is deployed and/or trusted in product applications, it may make decisions you would not have made:
Human: Make me paperclips
AI: Okay! I sold the company office space to buy 100 paperclip-making machines. We currently have 1M paperclips, and are making 100k paperclips per day.
Human: No, no! I only needed, like, 20 paperclips.
AI: oh well…
Who’s working on this?
Many research organizations and academic groups are working on this, so I won’t be able to do the full group justice, but a few notable companies include Anthropic, OpenAI, Redwood Research and many others.
Where can I read more?
You can read more here:
Anthropic’s research (link)
OpenAI’s alignment research (link)
and of course, The Alignment Problem book by Brian Christian
I hope this was a helpful post! What else would you like to learn about the alignment problem? Drop me a line, or reach me on Twitter @barralexandra.
🎁 Miscellaneous
Check out Google’s AI Test Kitchen here to try out MusicLM and more!
What did you think about this week’s newsletter? Send me a DM on Twitter @barralexandra
That’s it! Have a great day and see you next week! 👋
Thanks for reading Superfast AI.
If you enjoyed this post, feel free to share it with any AI-curious friends. Cheers!