20 Comments
Mar 13 · edited Mar 13 · Liked by Sayash Kapoor

Thank you, this was a great read.

"The assumption that AI safety is a property of AI models is pervasive in the AI community."

Isn't the reason for this that a lot of AI companies claim they're on the pathway to "AGI", and AGI is understood to be human-level intelligence (at least), which translates, in the minds of most people, into a human-level understanding of context and thus a human level of responsibility? It's hard to claim a model is as smart as a human, yet not so smart that it cannot discern the purpose another human is putting it to.

Put another way, many (though not all) AI companies want you to see their models as being as capable as humans, able to do the tasks humans can, at human or near-human levels, only automated, at broader scale, and without the pesky demands human employees make.

Acknowledging that AI models cannot be held responsible for their risky uses puts them in their appropriate place: as a new form of computing with great promise and interesting use cases, but nowhere close to replacing humans, or to equaling the role humans play in thinking through and mitigating risk when performing the kinds of tasks AI may be used for.

But that runs contrary to the AGI-in-a-few-years/singularity narrative, so safety is de facto expected to be an aspect of the intelligence of Artificial Intelligence models. The snake oil salesmen are being asked to drink their own concoctions. Hopefully, at some point, they'll be forced to acknowledge reality.

Mar 12 · Liked by Arvind Narayanan

Great post! There are a lot of important ideas here that I hadn't seen clearly expressed before.

You note that in many cases, an important aspect of risk reduction may lie in adapting to risks, rather than attempting to develop riskless models. I have recently been thinking along somewhat similar lines. It seems to me that for many risks, there are adaptations available that are worthwhile on their own merits, even without taking AI risk into account. For instance, public health measures such as improved building ventilation and pandemic preparedness seem justifiable solely on the basis of existing viruses, but would also reduce the danger from AIs capable of assisting with bioengineering. Not all risks can feasibly be eliminated in this fashion, but it seems to me that many can be substantially reduced.

Thoughtful piece with some good takeaways, but a few comments in response:

- As you point out, this is a post about misuse, not safety risks in general. You suggest your argument applies broadly, but you don't make a convincing case for that broader claim. There are a variety of AI safety risks that are treatable at the model level, especially if that model is trained to be effective on a specific use case or for a specific persona, each of which would imply embedding some amount of context as part of the training process. Things like toxicity, slander, hate speech, bias and fairness, and other areas of safety are things that model teams can and should be trying to detect and, where relevant, resolve at the model level, or at least to provide information on vulnerabilities to others in the application chain so that application-level support (e.g., adversarial judges, application-level rails; see the sketch at the end of this comment) can be considered.

- Your point on marginal risk (here and elsewhere) is a reasonable addition to the debate, but the fact that a model adds little or no marginal risk need not mean that no action should be taken to reduce that risk. Just because a model doesn't today provide much more risky information to a user than that user can already get from the Internet hardly means we should throw our hands in the air and conclude that all new things should be as unsafe as whatever is already possible. That is a lowest-common-denominator approach to technology development. We need not even get into questions of responsibility here -- developing and consuming organizations have good corporate reasons to avoid being vectors for unwanted outcomes, and therefore lots of reasons to be better than, say, what the general Internet makes possible. So marginal risk, defined as risk beyond some public baseline, is often actually the wrong way for organizations to think about risk.
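
A minimal sketch of the kind of application-level support mentioned above, assuming hypothetical `assistant_model` and `judge_model` callables (each mapping a prompt string to a response string) rather than any particular vendor's API: an adversarial judge reviews the assistant's draft reply against an application-specific policy before it reaches the user.

```python
# Hypothetical application-level rail: a judge model reviews each draft reply
# against a policy that belongs to the application, not the underlying model.

POLICY = (
    "Customer-support assistant for a retail bank. Block replies that are "
    "toxic, defamatory, reveal personal data, or give unapproved financial advice."
)

def judged_reply(user_msg: str, assistant_model, judge_model) -> str:
    """Generate a draft with the assistant, then let the judge allow or block it."""
    draft = assistant_model(user_msg)

    verdict = judge_model(
        f"Policy: {POLICY}\n"
        f"User message: {user_msg}\n"
        f"Draft reply: {draft}\n"
        "Does the draft comply with the policy? Answer ALLOW or BLOCK."
    )

    # Fail closed: anything other than an explicit ALLOW is withheld.
    if verdict.strip().upper().startswith("ALLOW"):
        return draft
    return "Sorry, I can't help with that."
```

The point of the sketch is that the policy lives in the application layer: the same off-the-shelf model could sit behind very different policies in different deployments.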

Interesting! You basically argue for government-funded regulators to build (and presumably test) offensive pipelines, right? It would be interesting to hear you speculate about what that would mean/require in detail. Hundreds or thousands of government-funded, highly capable people using AI to try to cyberattack nuclear power plants, steal credit card information, or build biological weapons? Not trying to caricature your argument here. I actually agree, but it might make some people quite uncomfortable. So it's interesting and useful to speculate about what it should look like...

There is a category of risk that you have omitted, but which features heavily in the ai-notkilleveryone discourse: what if the model decides to do something malicious, without being asked to do it by the model user?

Now, maybe, models are safe by default in this sense, in that you have to do some work to create a malicious model that turns on its users. If we're unlucky, it's an emergent property.

In any case, the user of the model would probably like to know that the model isn't going to try to kill them, and that does seem like a model property.

AI safety that goes beyond "model properties" can potentially yield contextually safe models with appropriate 1) training, 2) computational resources, 3) abstraction, 4) emergent contextuality through relations and correspondences, and 5) iterative improvements. These elements are underpinned by 6) a nuanced epistemological and ontological risk-based framework that carefully measures context in relation to individual settings, fostering what we recognize as "good actions under free will" and, subsequently, "agency". In other words, I believe that as AI systems become more powerful, if current approaches to safety are maintained (including the arguments the authors make), safety will scale such that self-iteration (self-improvement, in terms of safety) leaves adversaries with proportionally reduced capability to cause real harm, as opposed to adversaries using a variety of "narrow AI systems" with fewer safety guardrails. However, the path to truly safe deep neural network architectures and functional agentic systems is hampered by economic imperatives that prioritize deployment, leaving us perpetually one step behind in achieving "provable safety" (or security; the two are distinct, but harder to tell apart the more powerful the AI system). The complexity escalates when we consider critical, open systems where the timing of decisions, whether 50 milliseconds or 50 minutes, is crucial.

In a typical Internet firewall setup, the firewall sits between the wilds of the outside Internet (where the attackers live) and the machines on your corporate internal network; the firewall is supposed to keep the bad stuff out. But to do that, it needs an accurate model of the machines on your internal network so it can figure out what's actually bad. There is a whole class of attacks where something that is actually bad isn't recognised by the firewall.

Ok, now imagine doing this with LLMs. The attacker types in some stuff, a firewall that you have written checks whether it is bad, and if not passes it on to the LLM.

Problem: your firewall needs an accurate model of the LLM, and therefore is as complex as the LLM. Oh dear.
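
A minimal sketch of that setup, with a hypothetical `llm` callable and a toy pattern blocklist standing in for the firewall, makes the problem concrete:

```python
import re

# Toy "firewall": screen user input against a blocklist before it reaches the LLM.
BLOCKLIST = [
    r"\bbuild\s+a\s+bomb\b",
    r"\bsteal\s+credit\s+card\b",
]

def firewall_flags(user_input: str) -> bool:
    """Return True if the input matches a known-bad pattern."""
    return any(re.search(p, user_input, re.IGNORECASE) for p in BLOCKLIST)

def guarded_call(user_input: str, llm) -> str:
    # llm is any callable mapping a prompt string to a response string.
    if firewall_flags(user_input):
        return "Request blocked."
    return llm(user_input)

# The difficulty described above: a paraphrased or obliquely worded request
# slips past the patterns, and catching it would require the firewall to model
# intent about as well as the LLM it is guarding.
```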

If true, this would be absolutely disastrous for AI safety.

So, suppose every application has its own idiosyncratic safety requirement.

Can you use an off-the-shelf model?

No! By hypothesis, they are unsafe wrt your application requirement.

Oh, so you need to build your own model that meets your requirement.

Trouble is, that costs you on the order of one hundred million dollars.

For every application.

Have you got a hundred million dollars for each application?

No?

Ok, so you're negligent in deploying the system, and you should be liable - and maybe go to jail - if anything goes wrong.

Stronger version: you are criminally negligent and should go to jail as a preventative measure before anything bad happens.

I teach law, and just posted on Boeing & Complexity on my own substack, so this post really caught my eye. I liked it, as usual, but think the case, or at least the title, is overstated. Of course, any technology can be used for bad things. Consider planes (9/11), or pharmaceuticals (opioid crisis), or cars, or household appliances, or . . . But we care a great deal about the safety of planes and drugs and the rest. Modern societies have elaborate regulatory structures to ensure that such technologies are safe in ordinary, and perhaps even modestly negligent, use, quite apart from malicious use. So yeah, the properties of technologies matter. Your recognition of an exception to the "safety is not a model property" rule for child sexual abuse seems insufficient.

That said, and after reading past the title, I think your overarching point is that context matters, and that developers need to bear responsibility for ensuring safe use. An analogy might be controlled substances of various sorts. How much responsibility, of course, is a social/legal problem. And I agree, from a developer's perspective, it would be convenient to limit the question of responsibility to model properties, and society should not allow that.

Holding the problems of context to one side, which your post interestingly addresses in preliminary fashion, I am left wondering: what do you think model safety would look like? Keep up the good work!

The writers have made out a great case here, and in an easy-to-understand manner at that.

An interesting read!

I'd add that there is a misalignment of incentives to begin with, in AI and in tech more generally.

How do you explain to someone about the risks of AI when they're in a headlong rush towards wealth and power?

There is just so much money to be made, quite a bit of arrogance, and not enough consequences.
