Discussion about this post

Aryeh L. Englander

Doesn't uncertainty cut both ways? Sure, I can get on board with saying that forecasts are unreliable, that there are no really good reference classes to use, and so on. But doesn't that also mean you can't confidently state that "not only are such policies unnecessary, they are likely to increase x-risk"? And if you can't confidently state it one way or the other, then it's not at all clear to me that the correct approach is to leave AI development unrestricted. (It's also not at all clear to me that the correct approach is the opposite, of course.) So, sure, I am happy to get on board with "governments should adopt policies that are compatible with a range of possible estimates of AI risk, and are on balance helpful even if the risk is negligible." But shouldn't we also make sure that the policies are on balance helpful even if the risk is high?
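A minimal sketch of the robustness check being asked for here, with invented placeholder costs and probabilities (none of these numbers come from the post): instead of judging a policy under one headline risk estimate, check whether it stays net-beneficial across the whole span of plausible estimates.

```python
# Toy robustness check: is the policy's expected net benefit positive across a
# range of risk estimates, not just under a single point estimate?
# All numbers below are invented placeholders, in arbitrary units.
policy_cost = 1.0                  # cost of adopting the policy
harm_averted_if_risk_real = 200.0  # harm avoided if the risk turns out to be real

for p_risk in (0.001, 0.01, 0.1, 0.5):  # span of plausible risk estimates
    expected_net_benefit = p_risk * harm_averted_if_risk_real - policy_cost
    verdict = "net positive" if expected_net_benefit > 0 else "net negative"
    print(f"P(risk) = {p_risk:<5}  expected net benefit = {expected_net_benefit:+7.1f}  ({verdict})")
```

A policy that flips sign across that span is exactly the kind whose case depends on which estimate you believe, which is the point of the question above.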

Malcolm Sharpe

This was clarifying. There's another tricky issue I'm curious to hear your thoughts on: policy-making requires a causal estimate of the impact of the proposed intervention, and it's unclear how "P(doom)" handles causality.
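One way to state that worry precisely, borrowing the do-operator notation from causal inference (a framing assumed here, not something the published forecasts specify): a headline P(doom) is a marginal forecast over whatever mitigation people happen to attempt, whereas evaluating a policy requires comparing interventional quantities.

```latex
% Headline forecast: marginal over whatever mitigation people happen to attempt.
P(\mathrm{doom})

% What choosing between policies actually requires: a comparison of
% interventional quantities.
P\bigl(\mathrm{doom} \mid \mathrm{do}(\text{policy A})\bigr)
\quad \text{vs.} \quad
P\bigl(\mathrm{doom} \mid \mathrm{do}(\text{policy B})\bigr)
```

A single headline number does not say which, if either, of these it approximates.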

For the asteroid example, the causality issue is simple enough: asteroid impacts are a natural phenomenon, so we can ignore human activity when making the estimate. But if you wanted an estimate of asteroid extinction risk that _includes_ human activity, the probability would decrease: after all, if we did find a large asteroid on a collision course with Earth, we'd probably try to divert it, and there's a non-negligible chance that we'd succeed. But even if we thought we'd certainly succeed at diverting the asteroid, it would be incorrect to say "we don't need to mitigate asteroid extinction because the probability is ~0%", because choosing not to mitigate would raise the probability. So excluding human activity is clearly the right choice.
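As a toy calculation of that circularity, with numbers invented purely for illustration (they are not real asteroid statistics):

```python
# All numbers are invented placeholders, purely to show the structure of the argument.
p_impact = 1e-6          # chance of a large asteroid on a collision course this century
p_divert_success = 0.9   # chance a deliberate deflection effort succeeds, if attempted

# Estimate that excludes human activity (the "natural phenomenon" framing).
p_extinction_excluding_mitigation = p_impact

# Estimate that includes a mitigation attempt.
p_extinction_including_mitigation = p_impact * (1 - p_divert_success)

print(f"excluding mitigation: {p_extinction_excluding_mitigation:.1e}")  # 1.0e-06
print(f"including mitigation: {p_extinction_including_mitigation:.1e}")  # 1.0e-07

# Reading the lower, mitigation-inclusive number as "risk is ~0, so no need to
# mitigate" is circular: drop the mitigation and the risk reverts to the higher number.
```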

With AI x-risk, though, if we exclude human activity there is no risk at all, because AI is developed only as a result of human activity. Forecasters seem to handle this implicitly by drawing a distinction between "AI capabilities" and "AI safety" and then imagining capabilities increasing without safety increasing. But this hypothetical is hopelessly unrealistic: companies try to increase the controllability and reliability of their AI systems as a normal part of product development.

Even in climate change, where the risks are caused by human activity, a reasonably clean separation between business-as-usual and mitigation is possible. In the absence of any incentives for mitigation, your own CO2 emissions are irrelevant. So while it may be very hard to determine which climate-mitigation actions are net beneficial, at least we have a well-defined no-mitigation baseline to compare against.

With AI, unlike with climate, it seems hopeless to try to find a well-defined no-mitigation baseline because, as mentioned before, having an AI system do what you want is also a key part of making it a good product. Surely this makes the probabilistic approach to AI x-risk entirely useless.
