GPT-5, GPT-6 and 7 trillion dollars
OpenAI’s current state:
There are four important states of affairs I think OpenAI are dealing with:
GPT-5 is being worked on and will be a fundamentally different product than GPT-4. It will have agentic abilities and be smart enough to learn on the fly with users. It will be capable enough to really speed up productivity across a wide range of professional use cases.
They’re trying to figure out how to release it in the right product form, considering how different it is, and they want it out as soon as they can.
I’m guessing it will be out by July or August. The sooner it’s released, the more experience they can get servicing a novel product. It also gives them more lead time on the competition, who are far from having a model this capable.
The election concerns are bullshit. What I mean is, some people think they’re going to wait until AFTER the election so they don’t get accused of interfering with it, but that doesn’t make any sense. GPT-3.5 and GPT-4 already exist, and GPT-5 is just another addition to the ecosystem. It’s a problem they’re already stuck with.
GPT-5 will learn about and cater to individual users. It will get better at the kind of work they do because it will store the relevant contextual data in a RAG store. I’m guessing they’re trying to make this as good as possible and to find a good feedback mechanism that improves it over time. (A sketch of what I mean is below.)
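To make that concrete, here’s a minimal sketch of per-user memory via retrieval. Everything in it is my own illustration, not anything OpenAI has described: the `UserMemory` class and the bag-of-words `embed` function stand in for a real embedding model and vector database.

```python
import math
from collections import Counter

# Toy per-user memory: embed snippets, retrieve the most relevant ones, and
# prepend them to the model's context. The bag-of-words "embedding" is purely
# illustrative; a real system would use a learned embedder and a vector DB.

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class UserMemory:
    def __init__(self):
        self.entries = []                     # (text, embedding) pairs

    def remember(self, text: str):
        self.entries.append((text, embed(text)))

    def recall(self, query: str, k: int = 3):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

memory = UserMemory()
memory.remember("User prefers SQL answers written as CTEs.")
memory.remember("User is migrating a Django app to Postgres 16.")
print(memory.recall("help me write a query for my Django migration"))
```

The point of the design is just that whatever the user teaches it persists outside the context window and gets pulled back in when relevant.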
GPT-5 is already close to what most people consider AGI.
It’s not that it’s unmistakably AGI, but it’s really not far off. It can learn to do pretty much anything within professional roles (ignoring robotics tasks) if you sit down and teach it. It can use RAG as a hacky long-term memory. Its raw intelligence is quite a bit higher than GPT-4’s, and that wasn’t low in the first place.
It can self-correct a decent share of its own reasoning errors and build larger programs than we’ve seen any AI system produce yet. It will feel like an actual entity working on projects because of how much it can manage on its own. It will also pick up on inefficiencies in its past work and go back and fix them (e.g., if it finds an inefficient implementation it had saved in its RAG store). A sketch of that loop follows.
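Here’s a hedged sketch of that go-back-and-fix-it loop: periodically re-read saved artifacts, ask the model whether each can be improved, and write back the improvements. `llm_critique` and `llm_rewrite` are hypothetical stand-ins for actual model calls.

```python
# Hypothetical maintenance pass over saved work: re-read each artifact in the
# store, ask the model whether it can be improved, and replace it if so.
# llm_critique / llm_rewrite are stand-ins for real model calls.

def llm_critique(artifact: str) -> bool:
    """Stand-in: would the model flag this as improvable?"""
    return "O(n^2)" in artifact           # toy heuristic for the example

def llm_rewrite(artifact: str) -> str:
    """Stand-in: return the model's improved version."""
    return artifact.replace("O(n^2)", "O(n log n) via sorting")

def maintenance_pass(store: dict[str, str]) -> None:
    for name, artifact in store.items():
        if llm_critique(artifact):
            store[name] = llm_rewrite(artifact)   # fix past work in place

store = {"dedupe.py": "# naive O(n^2) pairwise comparison over all records"}
maintenance_pass(store)
print(store["dedupe.py"])
```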
The biggest missing piece is that it doesn’t have much data stemming from real-world experience, and the things you only pick up from that. Fortunately for them, they’ll get that kind of data from users applying GPT-5 to a much wider range of work tasks, they’ll keep having experts curate and label data in their fields, and they’ll keep building up the model’s skills with synthetic data.
The point is it’s all conceptually close to AGI. It’s just an engineering problem that needs some time. They need more data, probably a somewhat larger model, and to keep laying bricks until it’s undeniable.
GPT-6, or whatever architecture it uses, is probably AGI, or so close that it would look idiotic to deny we’re on the cusp of transformative AI.
My guess is there are two important things at play that make it vastly more capable than scraped data alone could: 1) agentic environments and 2) pure synthetic data generation.
First, a mixture of agents playing around in little environments where they can: 1) learn and figure out how the environment works, 2) interact with each other, 3) attempt to create synthetic data to get better at whatever they’re doing, and 4) create lots of synthetic data with the intent of meta-improving how much a system can teach itself over time. Think many GPT-5s, scaled up and down in size, alongside even weirder models trained in totally different ways, all generating data and interacting together.
The environment containers get more or less complex as models learn to solve harder problems and find the best ways to do so. The environments and tasks are themselves partially generated by a model that gets better at making them over time, so it all builds on itself. Both the models and their environments serve as training targets. It’s fundamentally different when you consider it’s a mix of gradient descent and models actively learning and competing against each other to produce the best reasoning and learning data. The point is that all of the data generated in the end goes into a behemoth model (which maybe itself uses other models as aids). A schematic of the loop is below.
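Here’s a schematic of the kind of loop I’m imagining, with everything speculative and every function a stand-in: a generator proposes tasks, agents attempt them, the outcomes become synthetic training data, and difficulty ratchets up as the agents improve.

```python
import random

# Schematic self-improvement loop (my speculation, not a known OpenAI
# pipeline): a generator proposes tasks, agents attempt them, outcomes are
# kept as synthetic training data, and difficulty ratchets up with skill.

def generate_task(difficulty: int) -> int:
    # Stand-in "environment": a task is just a number to beat.
    return random.randint(0, difficulty)

def agent_attempt(task: int, skill: int) -> bool:
    # Stand-in agent: higher skill means better odds of solving the task.
    return random.randint(0, max(task, 1)) <= skill

def training_loop(rounds: int = 5):
    difficulty, skill = 10, 3
    dataset = []                        # accumulated (task, solved) records
    for r in range(rounds):
        tasks = [generate_task(difficulty) for _ in range(100)]
        results = [(t, agent_attempt(t, skill)) for t in tasks]
        dataset.extend(results)
        success = sum(ok for _, ok in results) / len(results)
        if success > 0.5:
            difficulty *= 2             # curriculum ratchets up with competence
        skill += 1                      # stand-in for training on the new data
        print(f"round {r}: success={success:.2f}, next difficulty={difficulty}")
    return dataset

training_loop()
```

The interesting property is the feedback: the task generator and the agents improve against each other, so the data distribution keeps shifting toward the frontier of what the agents can do.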
I also think GPT-4 got an early infusion of synthetic data, which partially explains how good it was for its time, but GPT-5 is where they’re going to push synthetic data harder, and GPT-6 will be the product of at least a full year of data generated specifically for it.
Second, there will be lots of simpler synthetic data, like AlphaGeometry but for other types of math and other scientific fields (chemistry, biology, the social sciences, etc.). It’s all about finding methods that work and push the needle across all domains. The pattern is sketched below.
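The AlphaGeometry pattern generalizes to any domain with a cheap verifier: propose candidate problem/solution pairs, keep only the ones that check out. Here’s a toy version for arithmetic, where Python itself plays the role of the verifier; the deliberately flaky `propose` solver is my invention to show the filtering.

```python
import random

# Generate-and-verify in miniature: propose (problem, solution) pairs, keep
# only pairs the verifier confirms. AlphaGeometry does this with a symbolic
# deduction engine; here the "verifier" is just Python evaluating arithmetic.

def propose() -> tuple[str, int]:
    a, b = random.randint(1, 99), random.randint(1, 99)
    op = random.choice(["+", "-", "*"])
    problem = f"{a} {op} {b}"
    # A deliberately unreliable solver, so the verifier has work to do.
    answer = eval(problem) + random.choice([0, 0, 0, 1])
    return problem, answer

def verify(problem: str, answer: int) -> bool:
    return eval(problem) == answer      # cheap, exact ground truth

def make_dataset(n: int) -> list[tuple[str, int]]:
    data = []
    while len(data) < n:
        p, a = propose()
        if verify(p, a):                # only verified pairs survive
            data.append((p, a))
    return data

print(make_dataset(5))
```

The verifier is what makes the data trustworthy at scale: the generator can be wrong often, as long as wrong outputs never make it into the training set.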
GPT-6 will be a transformative system. I don’t really see any way around it. It won’t be solo-solving brand-new physics problems or inventing lots of new technologies overnight (though it will SPEED UP that process some), but it will be able to do pretty much any cognitive work people do. GPT-5 will be close, but won’t quite cross the Rubicon. GPT-6 will be human-level in many of the ways that matter.
The biggest thing is that synthetic data is how OpenAI wins, and since it comes from a mix of focused talent and compute, they know they need to secure as much compute as possible, fast.
More compute is also necessary to serve models to tens of millions, hundreds of millions, and eventually billions of people.
The 7 trillion dollar thing is Sam’s long-term plan to build enough compute to serve the entire world. For whatever reason, that’s his specific estimate, or not far off from it. (Rough math below.)
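For a sense of scale, some back-of-envelope serving math. Every constant below is an assumption I’m making purely for illustration; none are known OpenAI figures.

```python
# Back-of-envelope serving math. Every constant is an assumption for
# illustration only; none are known OpenAI figures.
daily_users        = 1e9        # assume a billion daily users
queries_per_user   = 20         # assumed queries per user per day
tokens_per_query   = 1_000      # assumed tokens generated per query
gpu_tokens_per_sec = 100        # assumed throughput for a big future model

tokens_per_day = daily_users * queries_per_user * tokens_per_query   # 2e13
accelerators = tokens_per_day / (gpu_tokens_per_sec * 86_400)
print(f"~{accelerators:,.0f} accelerators running flat out")         # ~2.3M
```

Even with these generous assumptions, that’s millions of accelerators running around the clock, and chips are only part of it: the 7 trillion presumably also covers fabs, power, and datacenters, plus much larger future models.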
I personally don’t see how they lose this race unless something dramatic happens in the next two to three years.