
Professor Russell’s book starts out with an entertaining journey through the history of AI and automation, as well as of cautionary thinking about them. This discussion is well informed: he is a renowned AI academic and co-author of a comprehensive and widely used AI textbook.
After this historical background, the remainder of the book argues two main points: (1) the current approach to AI development is having dangerous side effects, and it could get much worse; and (2) what we need to do is build AIs that can learn to satisfy human preferences.
Concerning the dangers of AI, the author first addresses current perils: misuse of surveillance, persuasion, and control; lethal autonomous weapons; eliminating work as we know it; and usurping other human roles. I found this part of the book an informative and well-reasoned analysis.
Beyond AI’s current perils, the author next addresses the possibility of AIs acquiring superhuman intelligence and eventually ruling and perhaps exterminating humankind. The author believes this is a definite possibility, placing him in basic agreement with works such as Bostrom’s Superintelligence and Tegmark’s Life 3.0. AI’s existential threat is the subject of continuing debate in the AI community, and Russell attempts to refute the arguments made against his position.
Russell bases his case for AI’s existential threat on two basic premises. The first is that, in spite of all the scientific breakthroughs required to achieve superintelligence (well documented by Russell), you cannot rule out humans achieving these breakthroughs. While I appreciate this respect for science and engineering, clearly some human achievements are more within reach than others. Understanding human intelligence, let alone creating human-level machine intelligence, seems to me too distant a goal to speculate about except in science fiction.
Russell’s second premise is that, unless we change course, superintelligence will be achieved using what he calls the standard model, which creates AIs by optimizing them to meet explicit objectives. This would pose a threat to humanity, because a powerful intellect pursuing explicitly defined objectives can easily spell trouble: an AI might, for example, decide to fix global warming by killing all the people.
I don’t follow this reasoning. I find it contradictory that an AI would somehow be both superintelligent and bound by fixed concrete objectives. In fact, in the last part of the book, Russell goes to great lengths to illustrate how human behavior, and presumably human-level intelligence, is far more complicated than sequences of explicit objectives.
In the last part of the book Russell advocates developing provably beneficial AI, a new approach that would build AIs that learn to satisfy human preferences instead of optimizing explicit objectives. While I can see how this would be an improvement over homicidal overlords, I don’t think Russell makes the case that this approach would be even remotely feasible.
To show how we might grapple with provably beneficial AI, he spends a good deal of time reviewing mathematical frameworks that address human behavior, such as utility theory and game theory, giving very elementary examples of their application. I believe these examples are intended to make this math accessible to a general audience, which I applaud. However, what they mainly illustrate is how much more complicated real life is than these trivial examples. Perhaps this is another illustration of Russell’s faith that human ingenuity can reach almost any goal, as long as it knows where to start, such as scaling up a two-person game to billions of interacting people.
I was very pleased to read Russell’s perspective on the future of AI. He is immersed in the game, and he is definitely worth listening to. However, I have real difficulty following his extrapolations from where we are today to either superintelligence or provably beneficial AI.