Summary: If alignment is unlikely to be solved before the creation of AGI, then there’s no advantage in being first (because no specific, useful action can be performed), and so a frame of “caution” (slow, careful progress) makes the most sense.
In “How to make the best of the most important century?” Holden Karnofsky presents two frames for thinking about the prospect of AGI: “caution” and “competition”.
Caution
In the “caution” frame, the main risk is that some actor (good or bad) will unintentionally create misaligned AGI. Therefore, the best strategy is to slow the advancement of AGI capabilities, increase global coordination, strengthen cooperation between relevant actors, and think carefully about the nature of alignment.
Competition
In the “competition” frame, the main risk is that a bad actor will develop AGI first. For example, if an authoritarian country wins the race to AGI, it can (according to this frame) impose its values on the rest of the world and gain control over the long-term future. Therefore, the best strategy is to ensure that AGI is first developed by good actors.
Given the current playing field, I expect the “caution” frame to be far more applicable in the near future. In particular, the difficulty of alignment makes certain arms race dynamics irrelevant, and by extension, makes the “competition” frame less applicable.
Arms Races
Before going into what makes AGI unique, I’ll address the standard risks of arms race dynamics.
Two famous examples of arms races are the Manhattan Project and the Cold War. Haydn Belfield gives a detailed account of both in “Are you really in a race? The Cautionary Tales of Szilárd and Ellsberg”. In the case of the Manhattan Project, the United States began its nuclear program at the urging of Einstein and Szilárd, out of fear that Germany would develop nuclear capabilities first. After WWII, the U.S. continued building up its military power out of fear that the Soviet Union would become the world’s dominant military force.
Both cases can be seen as examples of racing against an exaggerated adversary. The U.S. overestimated its opponents, and as a result it accelerated the arrival of dangerous technology. Of course, this is easy to say in retrospect, but it’s important to recognize the risk of overestimation. If the U.S. government were to accelerate AGI research out of fear that an adversarial country might develop it first, the alignment field would have even less time to solve its various core problems. This would also apply if an individual company or organization were to accelerate AGI research in hopes of ensuring that the transition to AGI is positive.
But beyond these standard risks, there’s an additional reason to abstain from participating in an AGI arms race.
Symmetrical Risk
Unlike nuclear weapons, which can serve as a deterrent, an AGI without basic alignment provides no benefit to the group that creates it. The appropriate analogy here might be a nuclear weapon with a blast radius covering the entire planet: it can only be used as a kind of one-sided mutual assured destruction (“if you attack us, we’ll activate the doomsday device”).
Unless we can instruct the AGI to do something specific, like “stop the adversarial actors from developing AGI”, we have no way of exploiting the technology to our advantage. Our position is the same as if we had abstained, except with the added risk of a misaligned AGI, which would be disastrous if deployed.
The kind of specific, useful action that we would need to perform for AGI to be advantageous is often called a “pivotal act”.
Pivotal Acts
Pivotal acts are defined as actions that will “make a large positive difference a billion years later”. The term communicates the basic difficulty of reducing AGI risk over the long term. Even if we can control our own AGI, there’s still a possibility that some other actor will develop unsafe AGI that doesn’t properly employ alignment protocols. Without preventative measures, that risk persists until it eventually results in disaster.
The necessity of a “pivotal act” is also implied in the “competition” frame. If being the first to AGI allows us to prevent bad actors from using it themselves, then there must be some action we can perform after developing AGI that achieves this.
However, we don’t know of any pivotal acts that don’t require basic alignment (the kind that lets us tell an AGI to do some specific task and then stop). So accelerating AGI research in democratic countries, or within reasonable AI companies, will not allow us to perform some kind of pivotal act unless alignment is already well understood.
Regulation
Within the pivotal acts framework, regulation is only helpful if it delays the creation of AGI long enough for alignment to be solved.
In the long term, regulation will probably have only a limited effect, much as nuclear nonproliferation measures have. Many states have been successfully prevented from developing nuclear weapons, but there are currently nine nuclear states (Russia, United States, China, France, United Kingdom, Pakistan, India, North Korea, and Israel), and the total number of nuclear weapons in the world is over 12,000. This level of success in AGI nonproliferation would be insufficient, because unlike with nuclear weapons, it would only take one misaligned AGI to pose an existential risk.
Trends and Predictions
One important question is whether alignment is likely to be solved before the arrival of AGI given current trends. If not, then we should focus on slowing progress, increasing cooperation, and studying alignment.
This Metaculus question gives a 10% chance that “expert consensus is that the control problem is solved before the public demonstration of ‘weak’ artificial general intelligence.”
Although “weak” AGI might not pose an existential risk, this basically matches my impression of the two fields. Currently, deep learning seems to be advancing faster than alignment research, and people in the deep learning field are more optimistic about AI progress than people in the alignment field are about alignment progress.
The alignment field also has several disadvantages: it’s smaller, less mainstream, and less lucrative.
Given the state of each field, it’s unlikely that we’ll be in a situation where alignment is solved, AGI hasn’t been developed yet, and some adversarial actor is on the verge of getting it first.
Recommendations
If the “caution” frame appears more applicable in the near future, there are several things we should do:
Avoid accelerating progress towards AGI
Avoid stoking fears about bad actors developing AGI (which could spark an AGI arms race)
Strengthen cooperation and coordination between AI companies, governments, and alignment researchers
Try to make progress on alignment
These aren’t exactly deviations from the current strategy, but it’s important not to change course while the logic of the situation stays the same, even if an adversarial scenario starts to look more likely.