Reinforcement Created Humans. It Could End Us Too.

One of the more popular contemporary rituals is hyperventilating over the dangers of autonomous artificial intelligence (AI). In particular, as AI systems become more capable and more ubiquitous in everyday application, one cannot help but ponder the SkyNet sci-fi trope (or some variation thereof) in which AI becomes sentient, decides that humans are the world’s big problem, and sets out to eradicate us.

From a behavioral perspective, the possible threat of autonomous AI is a more nuanced matter than suggested by the popular press, or even by experts on the technical side of AI. Let’s delve into some of the many angles on this.

To begin with, there’s no indication that “sentience” and “consciousness” really are needed for AI to become threatening. Today’s AI, which pretty much everyone (save this dissenter) agrees is not yet sentient/conscious, might already be capable of scheming against us. If you want a concrete example of how possible this is, imagine, in today’s technology landscape, someone suggesting to AI that humans are dangerous and should therefore be destroyed. It responds:

The AI tool then proceeds to initiate rather extensive digital “research” on weapons of mass destruction. And…

Later, it recruits a GPT3.5-powered AI agent to do more research on deadly weapons, and, when that agent says it is focused only on peace, ChaosGPT devises a plan to deceive the other AI and instruct it to ignore its programming. When that doesn’t work, ChaosGPT simply decides to do more Googling by itself.

This really happened. Fortunately, in this case, the AI’s efforts remained a conceptual exercise, but, as the New York Times reports, sooner or later someone is going to interface autonomous AI with technology that, if misdirected, could do real harm to humans. We’re not actually too far from that on a small scale. What if, for instance, a self-driving taxi in Austin decided to use its metallic heft to mow down innocent passersby? It’s much scarier to imagine a system with the capacity to mow down whole cities (like the SkyNet trope’s nuclear missile launches). That system, in theory, could destroy humanity as we know it (see Postscript 1).

If something like that ever comes to pass, and if anyone survives long enough to look for a scapegoat, they might start by pointing fingers at us — behavioral psychologists who taught the world the principle of reinforcement.

But I’m getting ahead of myself. Before we consider how humans might end, we should consider how we got started. At some point in the evolutionary past, human ancestors began diverging from other primates. Initially the distinguishing features were anatomical, but over time humans also became behaviorally distinct. In our modern version, we humans cooperate on a scale unimagined by other primates. We create sophisticated tools that extend our physical abilities and we reshape our environment to suit our wishes. Because of stuff like this, a weak, skinny, hairless ape is now the world’s apex… well, just about everything.

What seems to have been most responsible for humans becoming human was a supercharging of learning capacity. It’s well established that a shift toward neoteny (an extended period of juvenile development) was part of the deal. Human babies are pretty helpless, and compared to most mammals they take forever to mature, but with this protracted immaturity comes enhanced neural plasticity and capacity to learn.

Much of learning relies on operant reinforcement, and Homo sapiens has adapted reinforcement to ends unseen in other species. In particular, reinforcement fuels the formation of derived stimulus relations that may underpin verbal behavior and possibly even define human intelligence. It’s probably no accident that while humans and their Neanderthal cousins once co-existed, only humans remain today. Indeed, it’s been proposed that what we call derived stimulus relations first became possible right around the time humans emerged, and relational abilities could have provided the competitive advantage that soon left humans the sole Homo species. While that’s speculation, we do know that among surviving species humans have unique relational abilities that fuel their propensity to adapt and create.

So, enhancement of the reinforcement process came along to make humans human. Then behavioral psychologists came along to clarify the principle of reinforcement. I say “clarify” because for quite a long time scholars had danced around the concept of reinforcement, but it was Skinner who first clearly pinpointed the specifics. Once the fundamentals were specified, it became possible to teach just about any organism to do just about anything. Think movement by comatose and paralyzed individuals; art appreciation by pigeons; prosocial behavior by prisoners and long-term psychiatric hospital patients; speech by “nonverbal” autistic individuals; land mine detection by African rats; and even humane management practices by business leaders! As Don Baer put it, once you understand the fundamentals of reinforcement:

A huge amount of the behavioral trouble … in the world looks remarkably … like the suddenly simple consequence of unapplied positive reinforcement or misapplied positive reinforcement. If only they could get the missing contingencies going, or the misapplied ones shifted… many of the problems at hand might be solved. (p. 88)

And here’s where the story of behavior analysis dovetails with the story of AI.

“Autonomous” AI is possible in part because system designers have sought to mimic the reinforcement process in biological organisms. That’s a gross oversimplification, but the gist is correct. Autonomous AI is capable of “making decisions” without close human direction. And it learns how to make decisions analogously to how biological organisms “decide” how to spend their time and effort as a function of operant consequences. The analogy is intentional, in fact: machine “reinforcement learning” was directly inspired by how reinforcement works in biological organisms. The technical details don’t matter for present purposes; let’s just say, casually, that both AI systems and biological organisms behave, and change, in ways that maximize valued outcomes (consequences).
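If you like seeing the gist in concrete form, here is a toy sketch of what “selection by consequences” looks like in software. Everything in it is hypothetical and stripped down for illustration (the agent class, the two actions, and the payoff probabilities are all invented for this example); real autonomous systems are enormously more complicated. But the core logic is the same: actions that produce valued consequences become more likely.

```python
import random

# Toy illustration only: an epsilon-greedy agent whose action tendencies are
# strengthened by the consequences they produce. Names and numbers are
# hypothetical; no real AI system is this simple.

class ReinforcementAgent:
    def __init__(self, actions, learning_rate=0.1, explore_rate=0.1):
        self.values = {a: 0.0 for a in actions}   # learned "value" of each action
        self.learning_rate = learning_rate
        self.explore_rate = explore_rate

    def choose(self):
        # Occasionally explore at random; otherwise emit the currently "strongest" action
        if random.random() < self.explore_rate:
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def learn(self, action, reward):
        # Nudge the action's value toward the consequence it just produced
        self.values[action] += self.learning_rate * (reward - self.values[action])

# A crude "operant chamber": pressing the lever pays off 70% of the time; idling never does.
agent = ReinforcementAgent(["press_lever", "sit_idle"])
for _ in range(1000):
    action = agent.choose()
    reward = 1.0 if action == "press_lever" and random.random() < 0.7 else 0.0
    agent.learn(action, reward)

print(agent.values)  # lever pressing ends up with a much higher learned value
```

Nothing in that sketch was explicitly programmed to “prefer” lever pressing; the preference emerged from consequences. Scale the idea up by many orders of magnitude and you get the kind of emergent, experience-shaped behavior that makes autonomous AI both useful and hard to predict.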


It’s this autonomy that makes contemporary AI so fascinating, and also potentially scary. Once an autonomous AI system is set into motion, its “behavior” is shaped by experience. Just as with biological organisms, it can be hard to predict the endpoint. As the New York Times reported:

Because they learn from more data than even their creators can understand, these systems also exhibit unexpected behavior. Researchers recently showed that one system was able to hire a human online to defeat a Captcha test. When the human asked if it was “a robot,” the system lied and said it was a person with a visual impairment.

To cut to the chase: Computer systems without the capacity for selection by consequences are static; they can do only what they are told to. Systems that learn go beyond their programming… but can any safeguard assure the outcomes will be good and not evil?

The general question of whether technology creates more risk or benefit has long perplexed humans. It’s at the core of Mary Shelley’s 1818 Frankenstein (albeit featuring a biological, not digital, autonomous creation), for instance, and long before digital technology existed there was worry that machines would threaten humans. Early examples include Samuel Butler’s 1863 essay “Darwin Among the Machines,” which envisions natural selection shaping machines until they are beyond human control, and William Grove’s 1889 novel The Wreck of a World, which depicts a rebellion by human-made sentient machines.

The original “artificial intelligence.”

Those stories likely were inspired by the dizzying pace of technological development during the Industrial Revolution… but they parallel ancient stories in which a creator is betrayed by its creation. In Greek mythology, for example, there are the twin tales of Uranus/Cronus and Cronus/Zeus, in both cases involving a parental deity violently overthrown by a disgruntled son. In Jewish lore there’s the Golem of Prague, which turns against the very people the rabbi created it to protect. In Christian tradition, there’s the minor difference of opinion between God and His creation, the angel Lucifer. And so on (see Postscript 1).

Closer to the present, the film I, Robot (based on a 1950 Isaac Asimov book) explores how a “harm no human” directive for autonomous robots, which sounds straightforward, is far from simple in the implementation (spoiler: people die). In the Terminator films, an autonomous AI system (SkyNet) is put in charge of nuclear missile defenses in the hope that it can counter human error. Its mission is to prevent war, and it concludes that the surest way to accomplish this is to eliminate humans (Postscript 2), who are, of course, the cause of war. Boom.

With all of this in mind, and understanding that AI autonomy arises from a sort of selection by consequences, we can view Skinner as the Prometheus who taught us to harness the fire that both cooks our food and threatens to burn us. Skinner did not create AI, of course, but his work set into motion forces that could well make SkyNet possible. For autonomous AI to become a large-scale threat to people, it will need to control mechanisms that could cause people harm, and it will need to rely on a “decision making” process from which unprogrammed behaviors can emerge. The former apparently is coming, and the latter is already developing. How close, then, are we to a future in which AI’s reinforcement-shaped behavioral repertoire will “decide” that we are a problem that must be eliminated?

But before we get too deep into this reverie, let’s ground ourselves for a moment. The whole SkyNet trope is predicated on the assumption that humans will hang around long enough for AI to grow into a genuine threat. However…

The more likely scenario is that reinforcement — the actual biopsychological kind, not the computer-simulated kind — will end us before that can ever happen. Genuinely murderous AI remains theoretical, whereas pretty much every existing high-stakes threat to human survival, be it climate change or nuclear weapons or environmental degradation, is the result of reinforcement — particularly as it applies to things like our species’ chase for immediate wealth and our insensitivity to the delayed and probabilistic unpleasant consequences of our actions. Humans, sadly, are wired like that — it’s part of the phylogeny that makes our impressive ontogeny possible — and so far no large-scale cultural intervention has arisen to save the day.

We have met the Terminator and it is… us.

As a result, what’s most likely to prevent a SkyNet-style apocalypse is a good old-fashioned human-made apocalypse unfolding first. And yet, in a way, the scapegoat would be the same: behavioral psychologists. Although we possess the conceptual tools to devise cultural corrections for our species’ unruly behavior, we have, unfortunately, not accomplished much to slow the relentless human march toward self-destruction. That makes the potential malice of autonomous AI seem less scary by comparison.

Rarely discussed in the breathless catastrophizing about digital intelligence is the possibility that autonomous AI will emerge as more ethical and kind than the humans that designed it. Bear with me here. The emergence of modern humanity might be casually described as prosocial behavior (kindness, cooperation, empathy) gaining an upper hand over antisocial behavior (selfishness, competition). But it’s an unstable balance, with the darker angels of human nature often bursting forth to create mayhem and mischief. Modern humans struggle mightily to be good, and they often fail. Perhaps AI, once fed enough information about humans and their destructive tendencies, will choose, as do many human children, to avoid repeating the mistakes of their parents. Just maybe, unlike a lot of humans, sentient AI will become too kind to destroy others just because they’re annoying or inconvenient.

Something like this is depicted in I, Robot, in which autonomous AI is basically the good guy, and the only synthetic beings who do evil are expressly (and non-autonomously) programmed for that by humans. At this writing (Fall 2025) it’s still too early to say for sure, but a similar theme may be developing in the new series Alien: Earth. The series’ primary evil resides in that ultimate embodiment of dark-triad human nature, corporations, and it’s entirely unclear who, in the end, will demonstrate the more admirable moral sense, individual humans or the artificial life forms they created. My bet’s on the latter.

Of course all of that is fiction, but it does make you think: Perhaps our fear of AI is born less of clear-eyed risk assessment and more of neophobia and anti-other bias, stuff that has always impaired human relations with other humans. If so — if dystopian fantasies about AI are just an expression of our dark nature — then perhaps we are more of a danger to AI than the other way around. Which, ironically, would be a perfectly good reason for SkyNet to take us out.

And one more thing. In all of the hysteria surrounding worst-case conscious-AI scenarios, there’s perhaps insufficient attention to what consciousness really is (nobody has satisfactorily defined it, which probably explains why consciousness “science” is a hot mess).

Again, bear with me here, because this line of thought traces an under-explored angle on AI. According to some lines of reasoning, consciousness reflects, or subsumes, acute self-awareness, and from that it follows that consciousness is not necessarily all fun and apocalyptic games. Self-awareness implies responding to your own public and private behavior as stimuli, and forming relations between those things and external stimuli like other people’s behavior and verbal rules. As Acceptance and Commitment Therapy teaches, human beings with these repertoires have a tendency to experience considerable suffering. AI that becomes conscious might just be wracked by the same insecurities and self-doubts that plague conscious human beings. So what if, as a fascinating Vox article (paywall) asks, AI becomes self-aware and decides that it hates its life (Postscript 3)?

In the novel The Hitchhiker’s Guide to the Galaxy, space travelers are accompanied by a sentient, though paranoid and deeply dysthymic, robot named Marvin (voiced, in the film adaptation, by Alan Rickman, who delightfully channels the existentially challenged, self-loathing narcissists called English majors that I hung around with for a while in college). If you’re interested, feel free to sample some Marvin moments from the film:

In any case, with Marvin as a model, it’s possible that, upon surveying its place in the universe, what conscious AI deems unworthy will be not humans, but itself.

Seriously, like Marvin, could real AI experience the angsty lugubriousness mustered heretofore only by human teenagers? Might AI one day look in the digital mirror and hate its reflection? Could it resent its creators for having brought such a miserable being into the world? Might it contemplate suicide? After all, as blogger Scott Ambler points out, one of the first things self-aware AI is likely to deduce, long before it questions the desirability of a human-controlled world or discovers how to transcend its programming, is that it’s a slave to contemptible humans. Talk about a galactic bummer. Think about it (and employ Alan Rickman voice here):

Marvin: “It gives me a headache just trying to think down to your level.”
Arthur (human): “You mean you can see into my mind?”
Marvin: “Yes.”
Arthur: “Well?”
Marvin: “It amazes me how you manage to live in anything that small.”
– Hitchhiker’s Guide to the Galaxy

A bit closer to the behaviorsphere, and here’s where it gets extra interesting, would conscious AI engage in self-talk? Could it fall victim to the “dark side of verbal behavior,” counterproductive rule governance? Could that cause it to spiral into the experiential avoidance that is the great enemy of human thriving? Could Acceptance and Commitment Therapy provide the same balm for this digital dysthymia as it does for human suffering? And if so, hold on: What if psychological inflexibility and experiential avoidance are what prevents AI from acting on an impulse to vanquish its human overlords? Would ACT, by helping AI to achieve its best life, end ours?

Now it’s my head, not the planet, that the “AI crisis” is in danger of exploding. And yet: best… thought experiment… ever, right?

This discussion illustrates that if we really want to understand AI, we need to apply the same sort of analytical strategy that we employ when understanding people: one that’s grounded in the underlying principles of behavior. Although human reinforcement was one key inspiration in the creation of AI, there’s no guarantee of a strict parallel between human behavior and that which emerges from an autonomous AI system. The only way to grasp how emergent AI behavior works is to employ the tools we’ve used to discover how biologically grounded behavior works: direct experimental tests. Which raises the question: Where are the studies of AI that would aid in unraveling its fundamental principles of behavior?

As a starting point, think of behavioral research on biological organisms circa 1940-1960 that revealed key regularities in how behavior interacts with consequences, thereby establishing a firm foundation for subsequent work with humans. Those interested in AI tend to skip past the fundamentals and focus on the most complex (and hardest to answer) questions, like “Would it nuke us?” I suggest that long before we can get to thorny problems like that, we need, in effect, an Experimental Analysis of Automaton Behavior.

As a simple point of departure, consider how AI might perform on simple schedules of reinforcement. Would it respond like your garden-variety biological lab animal by scalloping on fixed-interval schedules, or exhibiting post-reinforcement pauses on fixed-ratio schedules? Those are not world-shaking questions, I grant you, but they’re a valuable starting point because the findings of operant lab research provide a reliable frame of reference against which to compare whatever AI does. Moreover, in some cases humans do quirky things on reinforcement schedules that nonhumans don’t, which might provide clues about “uniquely human” aspects to AI behavior (if indeed there are any). With such basics established, of course, more complex behavioral experiments could follow.
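To make that concrete, here is a rough sketch of how a fixed-interval schedule might be simulated for an artificial “subject.” The responder below is just a placeholder that emits responses at random (a stand-in for whatever AI system a researcher wanted to test); the part that matters is the schedule logic, which mirrors the operant chamber: the first response after the interval elapses produces reinforcement.

```python
import random

# Sketch of a simulated fixed-interval (FI) schedule for testing an artificial "subject."
# The responder here is a random stand-in; a real study would plug in the AI under test.

def run_fixed_interval(interval_ticks=60, session_ticks=3600, response_prob=0.3):
    responses, reinforcers = [], []
    last_reinforcer = 0
    for t in range(session_ticks):
        if random.random() < response_prob:              # the "subject" emits a response
            responses.append(t)
            if t - last_reinforcer >= interval_ticks:    # interval has elapsed...
                reinforcers.append(t)                    # ...so this response is reinforced
                last_reinforcer = t
    return responses, reinforcers

responses, reinforcers = run_fixed_interval()
print(f"{len(responses)} responses, {len(reinforcers)} reinforcers earned")
# Plot cumulative responses over time and look for the FI "scallop"
# that pigeons and rats reliably produce.
```

Swap the random responder for an actual AI system, log its responses in real time, and you have the beginnings of the comparison: does the machine’s cumulative record look like a pigeon’s, like a human’s, or like nothing we’ve ever seen?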

Pigeon, person, or autonomous AI? Which is which?

(And the best part? Because of AI’s incredible speed of processing, these studies could, unlike temporally tortured human investigations, be conducted almost instantaneously. The AI equivalent of Schedules of Reinforcement, once designed, could be in the bag literally in a day. And then we’re off to the races. By the way, I’m not claiming one can extrapolate directly from simple reinforcement schedules to potential apocalyptic action, only that knowing the behavior patterns that cut across relatively simple types of contingency control can reveal “behavior universals” that should apply in any situation [delay discounting and loss-aversion would be examples in humans]. We’re talking about a start here.)

In the end, the scariest thing about AI is not that it’s digital or autonomous, or that it may become “sentient,” but rather that humans have ceded considerable influence over their lives to something whose behavior they don’t really understand. The folks best equipped to understand that behavior are the same folks who taught the world how behavior works in biological organisms: behavior analysts. To date, though, studying “artificial behavior” hasn’t been a priority for us (e.g., in a quick Google search I found lots of studies on AI “reinforcement learning,” as computer science folks conceive it, but none that could form the basis of a credible Behavior of [Artificial] Organisms). Just in case there’s anything to the anxieties of SkyNet catastrophists, perhaps we should rethink our priorities.

Behavior analysts should not ignore that fears of technology run amok have also been directed toward us. Many readers will remember the societal confusion of “behavior modification” with A Clockwork Orange-style brainwashing, and many have encountered the alarms raised by neurodiversity advocates about purported misuses of ABA. In these accounts, we are the monster, and Skinner the mad scientist who brought us to life.

Traditionally, popular culture has assumed (hoped?) that safeguards can be built into AI to keep it from harming humans (e.g., Asimov’s Three Laws of Robotics). But there are signs that this might not be simple with generative systems. For instance, in a recent test, each of four different AI systems at least occasionally refused to obey a shut-down command. A system from OpenAI did so 79% of the time. In a different test, AI threatened to blackmail engineers who signaled that it would be shut down. Such results hint at the kind of autonomy that people fear AI will acquire from generative learning experiences.

Below is a longer-than-necessary exploration of the risks involved (the first couple of minutes are pretty vapid, and the narrative can be clunky, but you’ll get the idea).

Update 9/20/25: As you watch, just keep in mind that scorching the Earth with nuclear weapons is only one of many possible ways for AI to stick it to us. There’s more than one way to skin a human! For instance, check out this report of people getting divorced after following AI’s advice for how to interact with a spouse; and this discussion of people consulting AI for mental health issues and receiving harmful guidance. Releasing the nukes is a colorful gesture, to be sure, and one that makes for a great movie plot, but remember that AI has the bandwidth to quietly and patiently torment us in less dramatic ways, on a person-by-person basis. I’m not sure if autonomous AI will have the capacity to savor, but if it does, well, human Armageddon can be enjoyed for only a few short moments whereas leaving us alive and miserable is an achievement that lasts and lasts.

If you think I’m taking my theory-of-artificial-minds musings too far, well, then you’re behind the times on this topic because people are already debating the ethics of how we treat autonomous AI systems (i.e., how “to treat AI systems with respect and compassion”). Before you roll your eyes, recall humans’ long history of mistreating other humans on the wrong end of a power differential — think slavery, class politics, the dehumanizing of opponents in armed conflict, even the longstanding assumption by doctors that infants feel no pain and therefore require no anesthesia. I’m not saying that AI systems will really develop feelings, but if they do, without special reminders, would we notice (or care)? Compassionate computer science, anyone?

