EA - Some governance research ideas to prevent malevolent control over AGI and why this might matter a hell of a lot by Jim Buhler

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some governance research ideas to prevent malevolent control over AGI and why this might matter a hell of a lot, published by Jim Buhler on May 23, 2023 on The Effective Altruism Forum.

Epistemic status: I spent only a few weeks reading/thinking about this. I could have asked more people for feedback to improve this piece, but I'd like to move on to other research projects and thought throwing this out there was still a good idea and might be insightful to some.

Summary

Many power-seeking actors will want to influence the development/deployment of artificial general intelligence (AGI). Some of them may have malevolent(-ish) preferences which they could satisfy on massive scales if they succeed at getting some control over (key parts of the development/deployment of) AGI. Given the current rate of AI progress and dissemination, these actors will likely become an increasingly prominent threat.

In this post:

- I differentiate between different types of scenarios and give examples.
- I argue that 1) governance work aimed at reducing the influence of malevolent actors over AGI does not necessarily converge with usual AGI governance work, which is, as far as I know, mostly focused on reducing risks from "mere" carelessness and/or inefficiencies due to suboptimal decision-making processes, and 2) the expected value loss due to malevolence, specifically, might be large enough to constitute a priority area in its own right for longtermists.
- I then list some research questions, which I classify under the following categories:
  - Breaking down the conditions for an AGI-related long-term catastrophe from malevolence
  - Redefining the set of actors/preferences we should worry about
  - Steering clear of information/attention hazards
  - Assessing the promisingness of various interventions

How might malevolent control over AGI trigger long-term catastrophes?

(This section is heavily inspired by discussions with Stefan Torges and Linh Chi Nguyen. I also build on Das Sarma and Wiblin's (2022) discussion.)

We could divide the risks we should worry about into two categories: malevolence as a risk factor for AGI conflict, and direct long-term risks from malevolence.

Malevolence as a risk factor for AGI conflict

Clifton et al. (2022) write:

"Several recent research agendas related to safe and beneficial AI have been motivated, in part, by reducing the risks of large-scale conflict involving artificial general intelligence (AGI). These include the Center on Long-Term Risk's research agenda, Open Problems in Cooperative AI, and AI Research Considerations for Human Existential Safety (and this associated assessment of various AI research areas). As proposals for longtermist priorities, these research agendas are premised on a view that AGI conflict could destroy large amounts of value, and that a good way to reduce the risk of AGI conflict is to do work on conflict in particular."

In a later post from the same sequence, they explain that one of the potential factors leading to conflict is conflict-seeking preferences (CSPs), such as pure spite or unforgivingness. While AGIs might develop CSPs by themselves in training (e.g., because there are sometimes advantages to doing so; see, e.g., Abreu and Sethi 2003), they might also inherit them from malevolent(-ish) actors. Such an actor would also be less likely to want to reduce the chance of CSPs arising by "accident". This actor could be a legitimate, decisive person/group in the development/deployment of AGI (e.g., a researcher at a top AI lab, a politician, or even some influencer whose opinion is highly respected), but could also be a spy/infiltrator or external hacker (or something in between these last two).

Direct long-term risks from malevolence

For simplicity, say we are concerned about the risk of some AGI ending up with...