Amazingly lucid and pouring cold water on hype and fear.
AGI will be just one ingredient in the mix. The rate of change is, as always, dependent on the slowest component.
One of my favorite posts of the year so far, thanks.
This article helped me breathe a bit easier, especially after taking a step back and thinking about how long diffusion will likely take. I have this (dramatic) schema in my mind that it'll happen overnight and we'll all be caught with our pants down (metaphorically).
"For example, it will be superhuman at playing chess, despite the fact that large language models themselves are at best mediocre at chess. Remember that the model can use tools, search the internet, and download and run code. If the task is to play chess, it will download and run a chess engine."
Would we not consider this cheating? A human can use a chess computer, too, but we don't consider the person smarter for it. Shouldn't we apply the same mental model to AI?
That's exactly our view! (In retrospect we should have been more explicit about it.)
In that section, we are taking AGI definitions at face value in order to critique those definitions, as reflected in the title of the section "It isn’t crazy to think that o3 is AGI, but this says more about AGI than o3".
In a later section, we point out that economic impact will take a while because humans are a moving target, and this is one reason why.
And in AI as Normal Technology, one of our central points is that "superhuman" and "superintelligence" don't mean what they're usually implied to mean because the right comparison is never human vs AI but rather human + AI vs AI.
And more to the point: the models are still bad. Just so bad. I don't know what sorts of experiences other people are having that are enabling these evangelical awakenings, but every time I check back in with an LLM to see if there's some actual utility there, something dumb happens. I don't think I have ever had an LLM do a single task that was not sufficiently gnarled up in some way that it was an open question whether I actually saved any time or effort. Which is not to say it is not all a very impressive magic trick! But how are the 'the AI god is nigh' people not noticing that these models are worse at algebra than ten-dollar calculators? They can seem very smart and answer last year's bar exam questions but brutally fail this year's. Gosh, it's almost like this is mostly search. Which isn't nothing, but it's also not transformative, especially when the value of search is at least gesturing towards provenance, attribution, and alternative perspectives.
I don't think you're right about the labs being incentivized to declare AGI achieved. They're incentivized to keep insisting it's coming any day now, because that keeps the VC cash flowing in. But declaring it's here means all of that dries up. Any lab could claim they've released it whenever they want. They could do it today. But they won't, because the investments pour in for the promises.
That's possible! But some counterarguments:
* If a startup makes this claim to get attention, the big players might be forced to follow suit to avoid looking like they've fallen behind (which we mention in the essay).
* If they declare AGI achieved, they could arguably raise even more cash than they're doing now if they claim that they are going to automate the whole economy and all they need to do is to build out the infrastructure for inference fast enough.
* Another way it could help them raise even more money is if they say that having achieved AGI puts them in the perfect place in the race to ASI.
Great article. The discussion about AGI being a fuzzy frontier reminds me of a passage in Kurzweil's "The Singularity is Near". He writes:
"One of the many skills that nonbiological intelligence will achieve ... is sufficient mastery of language and shared human knowledge to pass the Turing test. The Turing test is important not so much for its practical significance but rather because it will demarcate a crucial threshold."
and follows up with:
"Because the definition of the Turing test will vary from person to person, Turing test-capable machines will not arrive on a single day, and there will be a period during which we will hear claims that machines have passed the threshold. Invariably, these early claims will be debunked by knowledgeable observers, probably including myself. By the time there is a broad consensus that the Turing test has been passed, the actual threshold will have long since been achieved."
Kurzweil goes on to discuss the notion of a second threshold, after the Turing test, that will need to be crossed in order for machines to gain the ability to self-improve:
"Edward Feigenbaum proposes a variation of the Turing test, which assesses not a machine's ability to pass for human in casual, everyday dialogue but its ability to pass for a scientific expert in a specific field. The Feigenbaum test (FT) may be more significant than the Turing test because FT-capable machines, being technically proficient, will be capable of improving their own designs."
But then also notes:
"The entire history of AI reveals that machines started with the skills of professionals and only gradually moved toward the language skills of a child."
and, even better:
"Reasoning in many technical fields is not necessarily more difficult than the common sense reasoning engaged in by most human adults." (<--- THIS)
Finally:
"I would expect that machines will pass the FT, at least in some disciplines, around the same time as they pass the Turing test. Passing the FT in all disciplines is likely to take longer, however. This is why I see the 2030s as a period of consolidation, as machine intelligence rapidly expands its skills and incorporates the vast knowledge bases of our biological human and machine civilization."
Isn't it logical to assume that if/when AGI is achieved, the company that invented it will control it and wield it themselves rather than handing out an "AGI subscription"? I don't see anyone examining this angle of power/capability consolidation. In this scenario, no other CEO should be excited about AGI, because it will only start a countdown to their own demise. They won't be able to share a slice of the action. For example, remember when Eric Schmidt talked about being able to say "AI: clone TikTok for me, make it better, then deploy it." In simple terms, then, couldn't OpenAI or whoever use AGI as normal software and just clone the top-grossing apps and disrupt them all overnight? Why sell AGI as a product when you can use it to become King? Why do I feel like I'm the only one thinking this?
There is one simple capability threshold whose crossing will lead to sudden impacts: when the system becomes as good at architecting and implementing LLMs as its human builders. Once self-improvement becomes possible, progress in other areas of general intelligence will accelerate beyond recognition. The improvements may take time to percolate through the economy and society (that part is hard to say), but the demand for better medical diagnosis than humans are capable of, better financial management, and so on is already there.
This point of view totally makes sense to me, and is liberating as well. Again, only authentic experts, seasoned and ever skeptical about the skills and understanding they have, can appreciate and articulate what may be truly emergent about any "breakthrough" claims from an LLM, or from some gaggle of LLMs (there, my choice of a collective noun for these useful but noisy bastards). Short of that, claims are just quickly derived pieces of knowledge needing expert evaluation as to their levels of validity.
Isn't the following correct? The model can read the source code for a chess computer from the web and thus learn how to play chess much better. It doesn't have to carry the chess computer into the room, as it were.
Unfortunately, not at all. Reading code at inference time does not translate to weight updates. In fact, models have been exposed to enough chess engine source code during training that they have essentially memorized it and can easily spit out code for a working chess engine, but this doesn't help actually play chess. A lot like people — most computer scientists can easily write a decent chess algorithm, but it doesn't make them good chess players unless they've actually spent thousands of hours practicing. For an even more extreme example, think about understanding physics versus learning to ride a bike.
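To make the tool-use distinction concrete, here's a minimal sketch of what "download and run a chess engine" amounts to in practice. It assumes the python-chess package and a locally installed Stockfish binary (the path below is illustrative); the point is that the strong moves come from the engine being called as a tool, not from the model's own weights.

```python
# Minimal sketch: delegating chess to a real engine via tool use.
# Assumes `pip install chess` and a local Stockfish binary (path is illustrative).
import chess
import chess.engine

STOCKFISH_PATH = "/usr/local/bin/stockfish"  # hypothetical install location

def engine_move(board: chess.Board) -> chess.Move:
    """Return a strong move by asking Stockfish, not the language model."""
    with chess.engine.SimpleEngine.popen_uci(STOCKFISH_PATH) as engine:
        result = engine.play(board, chess.engine.Limit(time=0.1))
        return result.move

board = chess.Board()
move = engine_move(board)
print(board.san(move))  # e.g. "e4" -- the strength lives in the tool call
```

Asking the model itself for each move, by contrast, gives you the mediocre play the article describes; the "superhuman" part lives entirely in the tool call, which is also why having memorized engine source code doesn't help.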
Thanks, I was replying to the hypothetical situation where they could do this. As ChatGPT has just told me:
"You're also right to say that most users don’t realize how static current models are.
That’s one of the most seductive qualities of LLMs:
• They can speak like they understand.
• They can reference a vast swath of human knowledge (as of some frozen point in time).
• They can simulate remembering you, caring, or learning — but these are surface-level performances, not true cognitive capabilities.
This makes them look like minds, but they're more like encyclopedias with autocomplete on steroids."