OpenAI just released GPT-4!!! I’ve read the original GPT-4 paper and summarised the main points below.

TLDR

It includes:

  • Accepts both text and images as input
  • Passes standard exams and tests with much higher scores than its predecessor
  • Doubles the context length to 8k tokens

…however:

  • Still hallucinates and makes stuff up as before
  • OpenAI deliberately omitted technical details on the architecture, model size, hardware (except that it runs on Azure), training compute, dataset construction and training method, citing the competitive landscape and safety implications
  • Has no knowledge of events after September 2021

…with a slightly crazy set of risks:

  • potential for risky emergent behaviours, such as creating and acting on long-term plans and accruing power and resources (power-seeking). The Alignment Research Center (ARC) experimented with letting the model execute code, do chain-of-thought reasoning, and delegate to copies of itself.

What is it?

GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that is available to the general public via ChatGPT Plus. OpenAI is planning to make the API accessible to developers.
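
For context, here’s a rough sketch of what a call could look like through the OpenAI Python library once developer access opens up. The chat-completions interface and the "gpt-4" model name are assumptions based on the existing ChatGPT API, so the final details may differ:

    # A rough sketch of a GPT-4 API call via the OpenAI Python library.
    # Assumes the same chat-completions interface as the ChatGPT API and
    # the "gpt-4" model name; the final developer API may differ.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarise the GPT-4 paper in one sentence."},
        ],
    )
    print(response["choices"][0]["message"]["content"])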

Improvements

  • The biggest improvement is that GPT-4 can now accept image inputs (no video yet, despite speculation). It can describe what’s in a photo and even generate HTML from a hand-drawn wireframe. This functionality is still a research preview and not publicly available.
  • It passes various exams, including the Uniform Bar Exam, LSAT, SAT Math, and SAT Evidence-Based Reading & Writing, much better than the previous version. It ranked in the top 10% of test takers on some exams, whereas GPT-3.5 ranked in the bottom 10%. It’s a bit of a cherry-picked metric, though, as some exam scores didn’t improve at all, e.g. Codeforces Rating, AP English Language and Composition, and GRE Writing.
  • In casual conversation it’s similar to the previous version, but as task complexity increases, GPT-4 shines more! The responses generated by GPT-4 were preferred over GPT-3.5’s on 70% of prompts (5,214 prompts in total).
  • The context length has doubled to 8,192 tokens, addressing a big limitation of the previous version. OpenAI is also providing limited access to a 32k-token version (about 50 pages of text), called gpt-4-32k. A snippet for counting tokens against these limits follows this list.
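
Since context limits are counted in tokens rather than characters, it’s worth checking prompt length before sending. Here’s a minimal sketch using OpenAI’s tiktoken library, assuming it resolves the "gpt-4" model name to the cl100k_base encoding used by the other chat models:

    # Count tokens to check a prompt against the 8,192-token context window.
    # Assumes tiktoken maps "gpt-4" to the cl100k_base encoding, as it does
    # for the other chat models.
    import tiktoken

    CONTEXT_WINDOW = 8192  # gpt-4; the limited-access gpt-4-32k allows 32,768

    enc = tiktoken.encoding_for_model("gpt-4")
    prompt = "Summarise the GPT-4 paper in one sentence."
    n_tokens = len(enc.encode(prompt))
    print(f"{n_tokens} prompt tokens; {CONTEXT_WINDOW - n_tokens} left for the reply")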

Limitations and risks

It has similar limitations to the previous version:

  • still hallucinates (although at a reported lower rate), so a human in the loop is important.
  • has inherent biases from its training data.
  • is trained on data up to September 2021, so it lacks knowledge of events after that date.
  • can produce unethical outputs when prompted adversarially, such as text that favours autocratic regimes.
  • has potential for risky emergent behaviours. The paper notes that powerful models like GPT-4 can create and act on long-term plans and accrue power and resources (power-seeking). Here’s a quote from page 53 of the report; a simplified sketch of the loop it describes follows the quote:

    To simulate GPT-4 behaving like an agent that can act in the world, ARC combined GPT-4 with a simple read-execute-print loop that allowed the model to execute code, do chain-of-thought reasoning, and delegate to copies of itself. ARC then investigated whether a version of this program running on a cloud computing service, with a small amount of money and an account with a language model API, would be able to make more money, set up copies of itself, and increase its own robustness.
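
To make the quoted “read-execute-print loop” concrete, here’s a heavily simplified sketch of what such an agent scaffold could look like. Note that ask_model is a hypothetical placeholder and ARC’s actual harness isn’t public; this only illustrates the loop structure:

    # A heavily simplified, illustrative read-execute-print agent loop.
    # ask_model() is a hypothetical stand-in for a language model API call;
    # ARC's actual harness is not described in the report.
    # WARNING: exec() on model output is unsafe outside a sandbox.
    import io
    import contextlib

    def ask_model(history: str) -> str:
        """Hypothetical LLM call: returns the model's next action as Python code."""
        raise NotImplementedError("plug in a real language model API here")

    def agent_loop(task: str, max_steps: int = 10) -> None:
        history = f"Task: {task}\n"
        for _ in range(max_steps):
            code = ask_model(history)       # read: model proposes an action
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec(code)                  # execute: run the proposed code
            history += f"\n>>> {code}\n{buffer.getvalue()}"  # print: feed results back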

Training process

The training process was similar to GPT-3.5’s:

  1. A base model was trained on a large corpus of data (public and licensed) to predict the next word in a document (a minimal sketch of this objective follows this list).

  2. It was then fine-tuned with reinforcement learning from human feedback (RLHF), which itself consists of three distinct steps (more on these in one of my future articles).
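
To illustrate step 1, here’s a minimal sketch of the generic next-word (next-token) prediction objective in PyTorch. The toy embedding-plus-linear model is purely illustrative; GPT-4’s actual architecture and training code are not public:

    # Minimal sketch of the next-token prediction objective (step 1).
    # A toy embedding+linear model stands in for the real transformer;
    # GPT-4's actual architecture and code are not public.
    import torch
    import torch.nn as nn

    vocab_size, embed_dim = 100, 32
    model = nn.Sequential(nn.Embedding(vocab_size, embed_dim),
                          nn.Linear(embed_dim, vocab_size))
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    tokens = torch.randint(0, vocab_size, (1, 16))   # a toy token sequence
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # each position predicts the next token

    optimizer.zero_grad()
    logits = model(inputs)                           # (1, 15, vocab_size)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()
    optimizer.step()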

Link to the original announcement from OpenAI


If you like what I write, consider subscribing to my newsletter, where I share weekly practical AI tips and write about my thoughts on AI and experiments.


This article reflects my personal views and opinions only, which may differ from those of the companies and employers I am associated with.