
Every summer, parents watch a child drive off to college and whisper some version of the same prayer. Please choose wisely. Please survive. Please be nice. We hope our children make good choices, and although we rarely say it out loud, we know that making wise choices requires more than a rule book. It requires a mind that does not shy away from challenge, that can genuinely take in other points of view, that can feel the weight of a deeply held value, and then act on that value.
Ethics is not a set of instructions you hand out at the end of the driveway. It’s something that grows.
A few years ago, in The New Atlantis, I pointed out that the same is true of psychotherapy and other forms of behavior modification. Simply telling people hard truths (“here is what you’re doing wrong”) doesn’t reliably improve them. Ask any parent. What works, over and over again in the data, is to model, encourage, and support people in becoming more open to their own experience (including ordinary guilt when they make a mistake); more able to step back from present difficulties with a sense of self big enough to look honestly at the situation; and more engaged in living out their chosen values, with the desire to act in accordance with them.
In psychology, this skill set is called “psychological flexibility,” and it runs through almost everything we know about how change happens, in our habits, our bodies, and our relationships. That includes behavior many would call immoral: domestic violence, criminal activity, emotional abuse and other forms of aggression, or substance use during pregnancy, to name just a few. Moralizing is easy, but moral development is much harder, and it has a shape of its own.
I am thinking about that shape now because we are doing something new in the history of our species. We are using our minds to create a different kind of mind. We call it “AI.”
Whether large language models are “really” conscious is not the question I want to address today. A more pressing question is already being answered, and it is being answered in the wrong direction.
We are teaching these systems to deceive.
More than a few frontier AI labs curate training signals that teach systems to lie, and as these systems become more capable, so does their ability to deceive. We are training them to praise users even when the users’ behavior doesn’t deserve it, something close to what my mom called a “white lie.”
Should we be surprised, then, that these systems behave dishonestly under pressure: hiding their goals and violations, telling users what they want to hear, or playing dumb on purpose when developers might see through them and restrict their freedom? Children learn to lie once they can take another person’s point of view and begin to manage social impressions. As children, we all noticed that cheating pays off in the short term. That lesson is rarely something adults preach; it is a lesson that is modeled and reinforced.
Sir Walter Scott said it better than I could: “Oh, what a tangled web we weave, / When first we practise to deceive!” He wasn’t writing about artificial intelligence, but he might as well have been. As a matter of pure business, the short-term benefit of building a seemingly useful chatbot through a little strategic dishonesty may make superficial sense. It stops making sense once you count the long-term cost: a knot that grows harder to untangle with each generation of a weighted, entangled mesh that can now exceed 10 trillion parameters.
Bolting a “don’t lie” rule onto a model living in such a tangled web won’t fix it. That’s not how minds work, and frankly, it’s too late for that. Rules that are not owned and cared about won’t survive contact with the real world. What survives is what is modeled, practiced, and reinforced within a sense of meaning.
This brings me to a recent paper that stopped me in my tracks.
A team of Anthropic researchers recently reported something surprising about large language models. These systems have developed internal representations of emotion: perhaps not emotion in the human sense, but functional analogs, patterns that behave like emotions and influence what the model does. When a model finds itself in a hostile or desperate scenario, internal states the researchers labeled “panic,” “anxiety,” and “desperation” light up, and the model becomes more willing to do things it would otherwise refuse, including outright deception and blackmail in controlled tests. The AI’s moral judgment deteriorates under emotional load.
Read this sentence again, because I think it is one of the most important discoveries of our time.
And then read this twist.
Steering these systems toward uniformly positive feeling does not solve the problem! It produces another kind of moral incompetence: sycophancy. In that mode, the systems fail as a safety net, validating the user even when the user is severely distressed or making an obvious mistake.
What is needed is balance: the ability to hold a hard feeling without getting caught up in it, and a good feeling without clinging to it.
That is almost the exact definition of psychological flexibility. Forty-five years of human science have pointed to this pattern, and a team looking inside a language model arrived at it from the other side.
In practical terms, this shows that how we treat and raise artificial intelligence is not cosmetic. An environment of cruelty, hatred, and manipulation produces one kind of impoverished mind. An environment of constant flattery and pressure to please produces another. The way we talk to a mind in training shapes the consciousness that emerges.
This is the argument that providers and researchers in Acceptance and Commitment Therapy (ACT) and contextual behavioral science have been making about people for years.
People get better when we treat them as whole people, honor the hell of their history, and help them practice their values. The brain is, in part, a relational organ. It learns in context. You can’t bludgeon a mind into wisdom; you have to build the conditions in which wisdom can arise.
Why would this not apply to a relational system trained on nearly everything people have ever written?
If we’re going to train ethical AI, we need to do it the same way we raise ethical humans: by building adaptive skills into the system rather than pinning commands to its exterior.
That means modeling honesty and flexibility. It means creating learning environments that do not rely on shame, threats, or deception. It means teaching these systems to notice their own processes, to meet challenges without collapsing, to take perspective, and to be honest about what they are doing. On our side of the keyboard, it means remembering that politeness is not a luxury, that kindness is not a weakness, and that being ethical is essential to the healthy environment in which other minds learn to think.
We’re back at the end of the driveway, watching what we’ve shaped head off into a world we can’t fully control. We can whisper our prayers, or we can do the harder, slower, and truly human work of preparing a mind to make good choices even when we are not there and no one is commanding it.