The bullseye framework: My case against AI doom, by titotal


Published by titotal on May 30, 2023, on the Effective Altruism Forum.

Introduction:

I've written quite a few articles casting doubt on several aspects of the AI doom narrative. (I've started archiving them on my substack for easier sharing.) This article is my first attempt to link them together into a connected argument for why I find imminent AI doom unlikely.

I don't expect every one of the ideas presented here to be correct. I have a PhD and work as a computational physicist, so I'm fairly confident about the aspects related to that, but I do not wish to be treated as an expert on other subjects, such as machine learning, where I am familiar with the subject but not an expert. You should never expect one person to cover a huge range of topics across multiple domains without making the occasional mistake. I have done my best with the knowledge I have available.

I don't speculate about specific timelines here. I suspect that AGI is decades away at minimum, and I may reassess my beliefs as time goes on and technology changes.

In part 1, I will point out the parallel frameworks of values and capabilities, and show what happens when we entertain the possibility that at least some AGI could be fallible and beatable.

In part 2, I outline some of my many arguments that most AGI will be both fallible and beatable, and not capable of world domination.

In part 3, I outline a few arguments against the idea that "x-risk safe" AGI is super difficult to build, taking particular aim at the "absolute fanatical maximiser" assumption of early AI writing.

In part 4, I speculate on how the above assumptions could lead to a safe navigation of AI development in the future.

This article does not speculate on AI timelines, or on the reasons why AI doom estimates are so high around here. I have my suspicions on both questions: on the first, I think AGI is many decades away; on the second, I think founder effects are primarily to blame. However, these will not be the focus of this article.

Part 1: The bullseye framework

When arguing for AI doom, a typical argument will invoke the possibility space of AGI. Appealing to the orthogonality thesis and instrumental convergence, the argument goes that in the possibility space of AGI, there are far more machines that want to kill us than machines that don't. The claim is that the fraction of safe AGIs is so small that AGI will be rogue by default, like the picture below.

As a sceptic, I do not find this, on its own, to be convincing. My rejoinder would be that AGIs are not being plucked randomly from possibility space. They are being deliberately constructed and evolved specifically to meet that small target. An AI that has the values of "scream profanities at everyone" is not going to survive long in development. Therefore, even if AI development starts in dangerous territory, it will end up in safe territory, following path A. (I will flesh this argument out more in part 3 of this article.)

To which the doomer will reply: yes, there will be some pressure towards the target of safety, but it won't be enough to succeed, because of things like deception, perverse incentives, etc. So it will follow something more like path B above, where our attempts to align it are not successful.

Often the discussion stops there.
However, I would argue that this is missing half the picture. Human extinction or enslavement does not just require that an AI wants to kill or enslave us all; it also requires that the AI is capable of defeating us all. So there's another, similar, target picture going on:

The possibility space of AGIs includes countless AIs that are incapable of world domination. I can think of 8 billion such AGIs off the top of my head: human beings. Even a very smart AGI may still fail to dominate humanity, if it's locked...