EA - AGI Catastrophe and Takeover: Some Reference Class-Based Priors by zdgroff
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AGI Catastrophe and Takeover: Some Reference Class-Based Priors, published by zdgroff on May 24, 2023 on The Effective Altruism Forum.

This is a linkpost.

I am grateful to Holly Elmore, Michael Aird, Bruce Tsai, Tamay Besiroglu, Zach Stein-Perlman, Tyler John, and Kit Harris for pointers or feedback on this document.

Executive Summary

Overview

In this document, I collect and describe reference classes for the risk of catastrophe from superhuman artificial general intelligence (AGI). On some accounts, reference classes are the best starting point for forecasts, even though they often feel unintuitive. To my knowledge, nobody has previously attempted this for risks from superhuman AGI, in large part because superhuman AGI is in a real sense unprecedented. Yet there are some reference classes, or at least analogies, that people have cited to think about the impacts of superhuman AI, such as the impacts of human intelligence, corporations, or, increasingly, the most advanced current AI systems.

My high-level takeaway is that different ways of integrating and interpreting reference classes generate priors on AGI-caused human extinction by 2070 anywhere between 1/10,000 and 1/6 (mean of ~0.03%-4%). Reference classes offer a non-speculative case for concern with AGI-related risks: on this account, AGI risk is not a case of Pascal's mugging, but most reference classes do not support greater-than-even odds of doom. The reference classes I look at generate a prior for AGI control over current human resources anywhere between 5% and 60% (mean of ~16-26%). The latter is a distinctive result of the reference-class exercise: on these priors, the expected degree of AGI control over the world exceeds the odds of human extinction by a sizable margin. The extent of existential risk, including permanent disempowerment, should fall somewhere between these two ranges.

This effort is a rough, non-academic exercise and requires a number of subjective judgment calls. At times I play a bit fast and loose with the exact model I am using, and the work lacks the ideal level of theoretical grounding. Nonetheless, I think the appropriate prior is likely to look something like what I offer here. I encourage intuitive updates and do not recommend these priors as the final word.

Approach

I collect sets of events that superhuman-AGI-caused extinction or takeover would plausibly be representative of, ex ante. Interpreting and aggregating them requires a number of data-collection decisions, the most important of which I detail here:

For each reference class, I collect benchmarks for the likelihood of one or two things:
Human extinction
AI capture of humanity's available resources

Many risks and reference classes are properly thought of as annualised risks (e.g., the yearly chance of a major AI-related disaster, or of extinction from an asteroid), but some make more sense as risks from a one-time event (e.g., the chance that a given invention leads to a major AI-related disaster, or that a given asteroid hit causes human extinction). For this reason, I aggregate three types of estimates (see the full document for the latter two types, and the sketch after this list for how they relate):
50-Year Risk (e.g., risk of a major AI disaster in 50 years)
10-Year Risk (e.g., risk of a major AI disaster in 10 years)
Risk Per Event (e.g., risk of a major AI disaster per invention)
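To make the relationship between these estimate types concrete, here is a minimal sketch. It is not from the original post: it assumes a constant, independent risk per year (or per event), and all probabilities in it are hypothetical placeholders.

```python
# Illustrative sketch (not from the original post): relating per-period risk
# to cumulative risk over N periods, assuming constant, independent risk.

def cumulative_risk(per_period_risk: float, n_periods: float) -> float:
    """Chance of at least one occurrence over n_periods,
    given an independent per-period (per-year or per-event) risk."""
    return 1 - (1 - per_period_risk) ** n_periods

# e.g., a hypothetical annualised risk of 0.1% implies roughly a 4.9% 50-year risk
annual = 0.001
print(cumulative_risk(annual, 50))   # ~0.0488 (50-Year Risk)
print(cumulative_risk(annual, 10))   # ~0.00996 (10-Year Risk)

# a per-event risk works the same way, with n_periods = number of relevant events
per_invention = 0.0005                    # hypothetical Risk Per Event
print(cumulative_risk(per_invention, 20)) # ~0.0100 over ~20 such events
```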
Given that there are dozens or hundreds of reference classes, I summarise them in a few ways (a code sketch of these summaries appears at the end of this excerpt):
Minimum and maximum
Weighted arithmetic mean (i.e., weighted average)
  I "winsorise", i.e. replace 0 or 1 with the next-most extreme value.
  I intuitively downweight some reference classes. For details on weights, see the methodology.
Weighted geometric mean

Findings for Fifty-Year Impacts of Superhuman AI

See the full document and spreadsheet for further details on how I arrive at these figures. ...
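As referenced above, here is a rough sketch (not from the original post) of the summary statistics described in the Approach section: winsorising exact 0s and 1s, then taking weighted arithmetic and geometric means. The reference-class priors and weights below are made-up placeholders, not the post's actual figures.

```python
# Illustrative sketch (not from the original post): summarising a set of
# reference-class priors. All numbers below are hypothetical.
import numpy as np

def winsorise_zeros_ones(probs):
    """Replace exact 0s and 1s with the next-most extreme value in the set."""
    p = np.asarray(probs, dtype=float)
    inner = p[(p > 0) & (p < 1)]
    p[p == 0] = inner.min()
    p[p == 1] = inner.max()
    return p

def weighted_arithmetic_mean(probs, weights):
    p, w = np.asarray(probs, dtype=float), np.asarray(weights, dtype=float)
    return float(np.sum(p * w) / np.sum(w))

def weighted_geometric_mean(probs, weights):
    p, w = np.asarray(probs, dtype=float), np.asarray(weights, dtype=float)
    return float(np.exp(np.sum(w * np.log(p)) / np.sum(w)))

# Hypothetical reference-class priors and weights (not the post's numbers);
# smaller weights reflect intuitive downweighting of some classes.
priors  = [0.0, 0.001, 0.02, 0.1, 1.0]
weights = [1.0, 1.0, 0.5, 1.0, 0.25]

p = winsorise_zeros_ones(priors)
print(p.min(), p.max())                      # minimum and maximum
print(weighted_arithmetic_mean(p, weights))  # weighted average
print(weighted_geometric_mean(p, weights))   # weighted geometric mean
```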