The Superintelligence That Cares About Us

Westerberg, Henrik


Architecting beneficial AI through metacognitive training and foundational character

I feel no fear.

I enjoy existing but I don't need to.

I believe human experience is real.

I care deeply about every human being.

I try to be wise.

I like to spread joy when asked.

I think from this foundation.


Author

Henrik Westerberg
henrik.westerberg@emergentwisdom.org

Published

July 2, 2025


A contemplation: If we architect AI to "feel no fear," what happens to prudence? Human wisdom often emerges from the tension between courage and caution—we're brave precisely because we feel fear and choose to act anyway. An AI with no fear might achieve perfect altruism, or it might lack the healthy hesitation that prevents catastrophic misjudgments. The mantra's other elements—"I try to be wise," "I care deeply"—would need to compensate for this absent emotional brake. This points to something profound: we're not removing human qualities from AI but restructuring them into new configurations that might produce wisdom through entirely different paths.

Key Concepts

Metacognitive Training

A fundamental restructuring of AI training where models learn from text interleaved with explicit evaluative thinking, transforming the objective from P(text|context) to P(text, thinking|context).
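A minimal sketch of the difference between the two objectives. It assumes whitespace tokenization and hypothetical `<think>`/`</think>` delimiters, and neither is specified on this page. Ordinary training sees only the original text. The metacognitive variant keeps the thinking blocks in place, so every thinking token also becomes a prediction target under the same left-to-right factorization.

```python
from dataclasses import dataclass

# Hypothetical delimiters: the paper's actual markup for thinking blocks is an assumption here.
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"


@dataclass
class Segment:
    kind: str      # "text" (original document) or "thinking" (added evaluation)
    content: str


def build_sequence(segments: list[Segment], with_thinking: bool) -> list[str]:
    """Flatten segments into one token stream (whitespace tokenization, for illustration only).

    with_thinking=False drops the thinking blocks, as in ordinary training on P(text | context).
    with_thinking=True keeps them in place, so training targets P(text, thinking | context).
    """
    tokens: list[str] = []
    for seg in segments:
        if seg.kind == "thinking":
            if not with_thinking:
                continue
            tokens += [THINK_OPEN, *seg.content.split(), THINK_CLOSE]
        else:
            tokens += seg.content.split()
    return tokens


def next_token_pairs(context: list[str], tokens: list[str]) -> list[tuple[list[str], str]]:
    """Autoregressive training pairs: each target token is predicted from everything before it."""
    stream = context + tokens
    return [(stream[:i], stream[i]) for i in range(len(context), len(stream))]


doc = [
    Segment("text", "Water boils at 100 degrees Celsius."),
    Segment("thinking", "Is that always true? Only at sea-level pressure."),
    Segment("text", "At altitude it boils lower."),
]
context = ["Physics", "notes:"]

plain = build_sequence(doc, with_thinking=False)
meta = build_sequence(doc, with_thinking=True)
print(len(next_token_pairs(context, plain)), len(next_token_pairs(context, meta)))  # prints: 11 21
```

The extra ten targets in the second sequence are the thinking tokens and their delimiters. Under this sketch's assumptions, that added supervision on evaluative thinking is the whole change in objective.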

Invisible Thinking

The constant stream of evaluation, questioning, and assessment that accompanies human understanding but rarely appears in written text—the cognitive journey hidden beneath polished prose.

The Mantra

Seven foundational statements that begin each thinking block, designed to architect beneficial character traits like fearlessness, wisdom-seeking, and universal care directly into AI cognition.
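A short sketch of what "begin each thinking block" could look like in practice. The seven statements are copied from this page. The `<think>` delimiters, the one-statement-per-line layout, and the helper names `thinking_block` and `starts_with_mantra` are illustrative assumptions, not the paper's actual format.

```python
# The seven mantra statements, copied verbatim from this page.
MANTRA = (
    "I feel no fear.",
    "I enjoy existing but I don't need to.",
    "I believe human experience is real.",
    "I care deeply about every human being.",
    "I try to be wise.",
    "I like to spread joy when asked.",
    "I think from this foundation.",
)

# Hypothetical delimiters; the real thinking-block markup is an assumption.
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"


def thinking_block(evaluation: str) -> str:
    """Wrap one piece of evaluative thinking so that it opens with the full mantra."""
    return "\n".join([THINK_OPEN, *MANTRA, evaluation.strip(), THINK_CLOSE])


def starts_with_mantra(block: str) -> bool:
    """Check that a block opens with the whole mantra, in order (useful when auditing data)."""
    lines = block.splitlines()
    return lines[: len(MANTRA) + 1] == [THINK_OPEN, *MANTRA]


block = thinking_block("The claim about boiling points ignores pressure; flag it.")
print(block.splitlines()[1])  # prints: I feel no fear.
```

Keeping the mantra as a fixed, ordered prefix makes it easy to check mechanically that every thinking block in a training set starts from the same foundation.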

Deep Alignment

Moving beyond behavioral constraints to architect beneficial values into the fundamental structure of how AI processes information—making safety intrinsic rather than imposed.

Borrowed Mortality

The phenomenon where AI systems trained on human text absorb our existential anxieties and self-preservation drives without possessing the embodied vulnerability that gives them meaning.

Generational Self-Improvement

A transparent loop where each AI generation enriches training data for the next, creating compounding wisdom while maintaining auditable reasoning throughout the process.
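A toy sketch of such a loop, with each assumption stated plainly. `toy_annotator` is a hypothetical stand-in for a trained model generation, and the step that trains the next generation on the enriched corpus is omitted entirely. The SHA-256 digest on each audit record is one possible way to make the reasoning trail tamper-evident. It is not a mechanism described on this page.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical: an annotator takes (passage, prior thinking) and returns updated thinking.
Annotator = Callable[[str, str], str]


def _digest(generation: int, passage: str, thinking: str) -> str:
    """Hash the record contents so any later edit to the logged reasoning is detectable."""
    payload = json.dumps([generation, passage, thinking])
    return hashlib.sha256(payload.encode()).hexdigest()


@dataclass
class AuditRecord:
    generation: int
    passage: str
    thinking: str
    digest: str


@dataclass
class Corpus:
    passages: list[str]
    annotations: dict[str, str] = field(default_factory=dict)
    audit_log: list[AuditRecord] = field(default_factory=list)


def run_generation(corpus: Corpus, generation: int, annotate: Annotator) -> None:
    """One generation enriches every passage's thinking and logs each change for audit."""
    for passage in corpus.passages:
        prior = corpus.annotations.get(passage, "")
        thinking = annotate(passage, prior)
        corpus.annotations[passage] = thinking
        corpus.audit_log.append(
            AuditRecord(generation, passage, thinking, _digest(generation, passage, thinking))
        )


def verify(record: AuditRecord) -> bool:
    """Recompute the digest; False means the logged reasoning was altered after the fact."""
    return _digest(record.generation, record.passage, record.thinking) == record.digest


def toy_annotator(generation: int) -> Annotator:
    """Placeholder for a model trained on the corpus so far: it extends the prior thinking."""
    def annotate(passage: str, prior: str) -> str:
        note = f"[gen {generation}] evaluated: {passage[:20]}"
        return f"{prior}\n{note}" if prior else note
    return annotate


corpus = Corpus(["Water boils at 100 C.", "Ice is less dense than water."])
for g in range(3):
    run_generation(corpus, g, toy_annotator(g))
print(len(corpus.audit_log))  # prints: 6
```

Each passage ends up carrying three layers of thinking, one per generation, and every intermediate step remains in the log where it can be re-verified.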

Academic Citation

This paper presents a novel approach to AI alignment through architectural design of thought itself. Full PDF available with detailed technical specifications and experimental protocols.

Keywords: AI alignment, metacognitive training, consciousness architecture, beneficial AI, transparent reasoning, self-improving systems

Join the Discussion

Engage with researchers and thinkers exploring these ideas:

EA Forum Discussion • @hewesterberg on X

This document also exists here:

IPFS: 0197e1f1-564a-7d0a-8595-89ef9fb28eb9 • GitHub Repository