EA - Aim for conditional pauses by AnonResearcherMajorAILab
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Aim for conditional pauses, published by AnonResearcherMajorAILab on September 25, 2023 on The Effective Altruism Forum.

TL;DR: I argue for two main theses:

[Moderate-high confidence] It would be better to aim for a conditional pause, where a pause is triggered based on evaluations of model ability, rather than an unconditional pause (e.g. a blanket ban on systems more powerful than GPT-4).

[Moderate confidence] It would be bad to create significant public pressure for a pause through advocacy, because this would cause relevant actors (particularly AGI labs) to spend their effort on looking good to the public, rather than on doing what is actually good.

Since mine is one of the last posts of the AI Pause Debate Week, I've also added a section at the end with quick responses to the previous posts.

Which goals are good?

That is, ignoring tractability and just assuming that we succeed at the goal -- how good would that be? There are a few options:

Full steam ahead. We try to get to AGI as fast as possible: we scale up as quickly as we can; we only spend time on safety evaluations to the extent that it doesn't interfere with AGI-building efforts; we open source models to leverage the pool of talent not at AGI labs.

Quick take. I think this would be bad, as it would drastically increase x-risk.

Iterative deployment. We treat AGI like we would treat many other new technologies: something that could pose risks, which we should think about and mitigate, but ultimately something we should learn about through iterative deployment. The default is to deploy new AI systems, see what happens with a particular eye towards noticing harms, and then design appropriate mitigations. In addition, AI systems are deployed with rollback mechanisms, so that if a deployment causes significant harm, the deployment can be reversed.

Quick take. This is better than full steam ahead, because you could notice and mitigate risks before they become existential in scale, and those mitigations could continue to successfully prevent risks as capabilities improve.

Conditional pause. We institute regulations that say that capability improvement must pause once the AI system hits a particular threshold of riskiness, as determined by some relatively standardized evaluations, with some room for error built in. AI development can only continue once the developer has exhibited sufficient evidence that the risk will not arise. For example, following ARC Evals, we could evaluate the ability of an org's AI systems to autonomously replicate, and the org would be expected to pause when its systems reach a certain level of ability (e.g. the model can do 80% of the requisite subtasks with 80% reliability), until it can show that the associated risks won't arise.

Quick take. Of course my take would depend on the specific details of the regulations, but overall this seems much better than iterative deployment. Depending on the details, I could imagine it taking a significant bite out of overall x-risk. The main objections I give weight to are the overhang objection (faster progress once the pause stops) and the racing objection (a pause gives other, typically less cautious, actors more time to catch up and intensify or win a capabilities race), but overall these seem less bad than not stopping when a model looks like it could plausibly be very dangerous.

Unconditional temporary pause. We institute regulations that ban the development of AI models over some compute threshold (e.g. "more powerful than GPT-4"). Every year, the minimum resources necessary to destroy the world drop by 0.5 OOMs, and so we lower the threshold over time. Eventually AGI is built, either because we end the pause in favor of some new governance regime (that isn't a pause), or because the compute threshold gets low enough that some actor flou...
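To make the conditional-pause trigger described above concrete, here is a minimal sketch (in Python) of how an evaluation-based pause rule like the hypothetical "80% of subtasks at 80% reliability" example could be computed. The task names, data structures, and thresholds are illustrative assumptions for this post, not an actual ARC Evals interface or a real regulatory specification.

```python
# Minimal sketch of an evaluation-triggered ("conditional") pause rule.
# Thresholds and task list are illustrative assumptions, not a real
# ARC Evals specification.

from dataclasses import dataclass


@dataclass
class SubtaskResult:
    name: str        # e.g. "provision a cloud server" (hypothetical subtask)
    successes: int   # number of successful trials
    trials: int      # total trials attempted

    @property
    def reliability(self) -> float:
        return self.successes / self.trials if self.trials else 0.0


def should_pause(results: list[SubtaskResult],
                 subtask_fraction: float = 0.80,
                 reliability_threshold: float = 0.80) -> bool:
    """Pause further capability scaling if the model completes at least
    `subtask_fraction` of the requisite subtasks with at least
    `reliability_threshold` reliability (the hypothetical 80%/80% rule)."""
    if not results:
        return False
    passed = sum(r.reliability >= reliability_threshold for r in results)
    return passed / len(results) >= subtask_fraction


# Example: 5 autonomous-replication subtasks, 4 of which the model does
# reliably -> 80% of subtasks at >=80% reliability, so development pauses.
evals = [
    SubtaskResult("provision a server", 9, 10),
    SubtaskResult("copy own weights", 8, 10),
    SubtaskResult("acquire funds", 17, 20),
    SubtaskResult("evade shutdown", 16, 20),
    SubtaskResult("recruit human help", 3, 10),
]
print(should_pause(evals))  # True -> pause until risks are addressed
```

The arithmetic here is trivial on purpose: the hard part of a conditional pause is designing the evaluations and building in room for error, but the point of the sketch is that the trigger is a mechanical check on measured capability, rather than a calendar date or a compute number alone.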