He has this introduction here where he talks about how the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to even trillion-dollar clusters, and every six months another zero is added to the boardroom plans.


The AGI race has begun.


We are building machines that can think and reason. By 2025 to 2026 these machines will outpace college graduates, and by the end of the decade they will be smarter than you or I, and we will have superintelligence in the true sense of the word.


I'm going to say that again: by the end of the decade we will have superintelligence in the truest sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long the Project will be on.


These are some very fascinating predictions, but just trust me: once we get into some of the charts and some of the data that he's been analyzing, I think it really does make sense, and this is why this document is called Situational Awareness.


Just read this part before we get into everything. He says: before long the world will wake up, but right now there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that actually have situational awareness, and through whatever peculiar forces of fate, he has found himself amongst them. This is why this document is really important. We're really lucky that someone could leave a company like OpenAI and then publish something that gives us the details on how superintelligence is likely to arise and when it is likely to arise.


So this is section one, "From GPT-4 to AGI: Counting the Orders of Magnitude."


When you see OOMs, that's what it stands for.
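

To make the term concrete: an order of magnitude is a factor of ten, so counting OOMs just means taking the base-10 logarithm of a ratio. Here is a minimal Python sketch of that idea, using the jump from a $10 billion cluster to a trillion-dollar cluster mentioned in the introduction. The function name `ooms` is made up for this illustration and does not come from the document.

```python
import math

def ooms(before: float, after: float) -> float:
    """Return how many orders of magnitude (factors of 10) separate two positive quantities."""
    if before <= 0 or after <= 0:
        raise ValueError("both quantities must be positive")
    return math.log10(after / before)

# Going from a $10 billion cluster to a $1 trillion cluster is a 100x jump.
jump = ooms(10e9, 1e12)
print(f"{jump:.1f} OOMs")  # 2.0 OOMs
```

So "another zero added to the boardroom plans" is exactly one more OOM.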


He clearly states here his AGI prediction.


AGI by 2027 is strikingly plausible.


GPT-2 to GPT-4 took us from preschooler to smart high schooler abilities in just four years.


If we trace the trend lines of compute, algorithmic efficiencies, and unhobbling gains, we should expect another preschooler-to-high-schooler-sized qualitative jump by 2027.


This is where we get into our first very important chart because this shows us exactly where things may go.


He says, I make the following claim: it is strikingly plausible that by 2027 models will be able to do the work of an AI researcher/engineer. That doesn't require believing in sci-fi.


It just requires believing in straight lines on a graph.


What we can see here is a graph of the base scale-up of effective compute, counting from GPT-2 all the way up to GPT-4, and projecting the effective compute that we're going to continue to scale up.


One thing that is fascinating from here is that I think there is going to be an even steeper curve for this.


The reason I state that is because during the period from 2022 to 2023, there was something that I would like to call an awareness.


This period right here marks the move from GPT-3 to GPT-4, and this put a real giant spotlight on the AI era.


These weren't just research projects.


GPT-3 was, but ChatGPT with GPT-3.5 and GPT-4 were actual products that were available to the public.


Since then, we've seen an explosion in how many people are intrigued by AI and in how many companies are piling billions of dollars of resources into the technology and into the compute clusters, just so that they can capture the economic value that's going to be created during this era.


That's why I believe it wouldn't be surprising if, during the period from 2024 to 2028, we see a lot more growth than we've had in this period.


That means having an automated AI research engineer by 2027 to 2028 is not something that is far off.


If we're just looking at the straight lines and the effective compute, then this is definitely where we could get to.


The implications of this are quite stark.


If we can have an automated AI research engineer, that means it wouldn't take that long to get to superintelligence after that.


If we can automate AI research, we're effectively able to recursively self-improve, just without that crazy runaway loop that makes superintelligence explode.


Here's where he states one of the things that I think is really, really important to understand.


I stated this in a video before this document was released, and I'm glad to see that someone else is voicing one of the same points that I originally raised.


He states that the next generation of models has been in the oven, leading some to proclaim stagnation and that deep learning is hitting a wall.


But by counting the orders of magnitude, we get a peek at what we should actually expect.


In a video around three weeks ago, I clearly stated that look, things are slowing down externally, but things are not slowing down internally at all.


Just because some of the top AI labs may not have presented their most recent research, that doesn't mean the breakthroughs aren't being made every single month.


He states here that while the inference is simple, the implication is striking.


Another jump like that very well could take us to AGI, to models as smart as PhDs or experts that can work beside us as a co-worker.


Perhaps most importantly, if these AI systems could automate AI research itself, that would set in motion intense feedback loops.


That's, of course, where we get automated AI researchers to make breakthroughs in AI research, then we apply those breakthroughs to the AI systems, and they become smarter.


The loop continues from there.


Basically, recursive self-improvement, but on a slower scale.


Here he clearly states that even now barely anyone is pricing this in, but situational awareness on AI isn't actually that hard once you step back and look at the trends. If you keep being surprised by AI capabilities, just start counting the orders of magnitude.


Here's where we talk about the last four years, so you can see he speaks about GPT-2 to GPT-4.


GPT-2 was essentially like a preschooler: it could string together a few plausible sentences.


These are the GPT-2 examples people found very impressive at the time, and yet it could barely count to five without getting tripped up.


We had GPT-3 which was in 2020, and this was as smart as an elementary schooler, and this was something that once again impressed people quite a lot.


This is where we get to GPT-4 in 2023, and this is where we get a smart high schooler.


It can write some pretty sophisticated code and iteratively debug it, and it can write intelligently and with sophistication about complicated subjects.


It can reason through difficult high school competition math, and it's beating the vast majority of high schoolers on whatever tests we give it.


Remember, there was the Sparks of AGI paper, which showed some capabilities suggesting that we weren't too far away from AGI, and that this GPT-4 level system showed the first sparks of artificial general intelligence.


The thing here is, he clearly states, and I'm glad he's stating this because a lot of people don't realize it, that the limitations come down to obvious ways that models are still hobbled.


Basically, he's talking about the way that models are used and the current frameworks that they have: the raw intelligence behind the model, the raw cognitive capabilities of these models, if you even want to call it that, are artificially constrained.


In the future, if you factor in that these constraints are going to be removed, it's going to be very fascinating to see how that raw intelligence applies across different applications.


One of the clear things I think that most people aren't realizing is that we're running out of benchmarks.


He gives an anecdote: his friends Dan and Collin made a benchmark called MMLU a few years ago, in 2020.


They hoped to finally make a benchmark that would stand the test of time equivalent to all the hardest exams we give high school and college students.


Just three years later, models like GPT-4 and Gemini get around 90%.


GPT-4 mostly cracks all the standard high school and college aptitude tests.


You can see here the test scores of AI systems on various capabilities relative to human performance.


You can see that in recent years there has been a stark increase.


It's absolutely crazy how many different areas AI is improving in, in terms of capabilities.


It's really fascinating to see, and also potentially quite concerning.


One of the things that most people did actually miss about going from GPT-4 to AGI was a benchmark that actually did shock me.


There is this benchmark called the MATH benchmark, a set of difficult mathematics problems from high school math competitions.


When the benchmark was released in 2021, GPT-3 only got 5%. The crazy thing is that the researchers wrote at the time that to have more traction on mathematical problem solving, we will likely need new algorithmic advancements from the broader research community. In other words, we were going to need fundamental new breakthroughs to solve MATH, or so they thought, and minimal progress was predicted over the coming years.


But by mid-2022 we got to 50% accuracy, and now, with the recent math results from Gemini 1.5 Pro, we know that this is at 90%, which is absolutely incredible.

Here's something that you can screenshot and share with your friends or colleagues, or whatever community you might be in.


You can see the performance on common exams, as a percentile compared to human test takers.


We can see that GPT-4 ranks above the 90th percentile for pretty much all of them except calculus and chemistry, which is a remarkable feat given that we went from GPT-3 to GPT-4 in such a short amount of time.


This is a true jump in capabilities that many people simply wouldn't have expected.


Here's where we start to get to some of the predictions that we can make based on the nature of deep learning.


Essentially the magic of deep learning is that it just works and the trend lines have been astonishingly consistent despite the naysayers at every turn.


We can see here that these are screenshots from OpenAI's Sora showing the effect of scaling compute, and at each level we can see an increase in quality and consistency.


The base compute results in a pretty terrible image/video. Four times compute results in something that is pretty coherent and consistent. But 30 times compute is remarkable in terms of the quality, the consistency, and the level of video that we get, which shows us that these trend lines are very consistent.


He says that if we can reliably count the orders of magnitude of compute that we're going to use to train these models, we can extrapolate the capability improvements, and that's how some people actually saw GPT-4 level capabilities coming.


One of the things that he talks about is, of course, things like chain of thought, tools, and scaffolding, which can unlock significant latent capabilities. Basically, GPT-4, or whatever the base model is, has some base cognitive capability for its architecture.


We can unlock latent capabilities by adding different steps in front of that system.


For example, when you use GPT-4 with chain-of-thought reasoning, you significantly improve its ability to answer certain questions in different scenarios.


It's things like that where you can unlock more knowledge from the system by using different ways to interact with it, which means that the raw knowledge behind the system is a lot bigger than people think.


This is what you call un-hobbling gains.


One of the things that's really important, and that doesn't get enough attention even though it's going to make up a lot of the gains that you won't see, is algorithmic efficiency.


While massive investments in compute get all the attention, algorithmic progress is a similarly important driver of progress and is dramatically underrated.


To see just how big of a deal algorithmic progress can be, consider the following illustration.


This one right here: the drop in the price to attain 50% accuracy on the MATH benchmark over just two years.


For comparison, a computer science PhD student who didn't particularly like math scored 40%.


So this is already quite good, and the inference efficiency improved by nearly three orders of magnitude, or a thousand x, in less than two years.
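

To put that figure in yearly terms, here is a small Python sketch of the arithmetic. It assumes the improvement was spread evenly over the full two years, which is a simplification of mine and not something the chart claims, and `annual_factor` is just an illustrative helper name.

```python
import math

def annual_factor(total_factor: float, years: float) -> float:
    """Constant yearly improvement factor that compounds to total_factor over `years`."""
    return total_factor ** (1.0 / years)

total = 1000.0  # "a thousand x", i.e. three orders of magnitude
years = 2.0     # upper bound taken from "less than two years"

per_year = annual_factor(total, years)
print(f"about {per_year:.1f}x per year")               # about 31.6x per year
print(f"about {math.log10(per_year):.2f} OOMs/year")   # about 1.50 OOMs/year
```

Because the real window was shorter than two years, the true yearly rate would be somewhat higher than this.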


What we have here is something that is incredibly more efficient for the same result in just two years.


That is absolutely incredible.


These algorithmic efficiencies are going to drive a lot more gains than you think.


As someone who looks at arXiv, which is where a lot of these research papers get published, just trust me, there are probably 50 to 80 different research papers that get published every single day.


A few of those give you, you know, a 10 to 20 percent gain, or a 30 percent gain.


If you factor in that all of these algorithmic efficiencies are going to compound on each other, we're really going to see more cases like this.
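

Here is a rough Python sketch of what compounding means for gains of that size. It assumes the gains are independent and stack multiplicatively, which is an optimistic simplification since real improvements often overlap, and both function names are made up for this example.

```python
import math

def compounded_gain(gains):
    """Multiply independent fractional gains, e.g. 0.10 for a 10 percent improvement."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total

def gains_needed(target_factor, gain_each):
    """Smallest number of equal-sized gains whose product reaches target_factor."""
    return math.ceil(math.log(target_factor) / math.log(1.0 + gain_each))

# One 10%, one 20%, and one 30% gain stacked together: roughly 1.72x overall.
print(f"{compounded_gain([0.10, 0.20, 0.30]):.2f}x")

# Number of separate 10% gains needed to reach a full order of magnitude (10x).
print(gains_needed(10.0, 0.10))  # 25
```

Under those assumptions, 25 separate 10 percent improvements are enough for a full order of magnitude.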


Here's where he talks about the API cost, and you basically look at how efficient it becomes to run these models.


GPT-4 on release cost about the same as GPT-3 did when it was released.


But since the GPT-4 release a year ago, the prices for GPT-4 level models have fallen six times/four times for input/output with the release of GPT-4o.


GPT-3.5 level is basically Gemini 1.5 Flash, and this is 85 times cheaper than what we used to have.

We can see here on this graph that if we want to calculate exactly how much progress is going to be made, we can clearly see that there are two main things here.


The first is physical compute scaling, which is going to be things like these data centers and the hardware that we throw at the problems.


The second is algorithmic progress, which is going to be the efficiencies where people rewrite these algorithms in crazy ways that solve problems we previously didn't know how to solve.


That's why in the future, where we do get an automated AI researcher to do that, this gap is going to widen even more.
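

The way those two pieces combine can be sketched in a few lines of Python. Physical compute and algorithmic efficiency multiply, so on a log scale their OOMs simply add. The yearly rates below are hypothetical placeholders chosen for round numbers, not values read off the graph, and the function name is made up for this illustration.

```python
def effective_compute_ooms(physical_ooms_per_year, algorithmic_ooms_per_year, years):
    """Total OOMs of effective compute gained, assuming both trends stay constant.

    Multiplying two growth factors is the same as adding their OOMs.
    """
    return (physical_ooms_per_year + algorithmic_ooms_per_year) * years

# Hypothetical placeholder rates, NOT figures taken from the graph.
physical = 0.5     # OOMs per year from bigger clusters and better hardware
algorithmic = 0.5  # OOMs per year from algorithmic efficiency

total = effective_compute_ooms(physical, algorithmic, years=4)
print(f"{total:.1f} OOMs, i.e. about {10 ** total:,.0f}x effective compute")
```

If automated AI researchers raised the algorithmic rate, only that second argument would change, which is the widening gap described above.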


This is where we talk about hobbling.


