Part 2.5: AI reinforcement learning vs human governance

Published on 17 October 2024 | Updated on 27 February 2025

Author: Anita Lamprecht

This post is part of the AI Apprenticeship series.

By Dr Anita Lamprecht, supported by DiploAI and Gemini

Inspired by the AI Apprenticeship online course, I wanted to write a post on the relationship between AI reinforcement learning and human governance. By analysing their similarities and differences, we can better understand the potential impacts of AI.

Reinforcement learning in hide-and-seek

Reinforcement learning (RL) is a subset of machine learning in which agents learn to optimise their behaviour through trial and error. A fascinating application of RL is the game of hide-and-seek, in which agents learn to either hide or seek effectively. This game, often used as a benchmark in AI research, demonstrates how agents develop complex strategies and adapt to dynamic environments, mirroring certain aspects of human learning and decision-making.

In a study conducted by OpenAI, agents were placed in a virtual environment with movable objects and tasked with playing hide-and-seek. The agents used RL to develop strategies over millions of iterations. The hiders learned to block entrances with objects to create safe zones, while the seekers learned to use ramps to overcome obstacles. This emergent behaviour demonstrates the power of RL in discovering complex strategies from simple rules.

Human governance and behavioural regulation

Human governance involves establishing rules, norms, and institutions to regulate behaviour within societies. Unlike RL, which relies on computational algorithms to optimise behaviour, human governance is a complex interplay of cultural, legal, and ethical considerations. Different societies may adopt varying governance models, from democratic systems emphasising citizen participation to more centralised structures, each with its own power dynamics and cultural values.

Governance structures are designed to maintain order, protect rights, and promote welfare, often requiring consensus and compliance from the governed population. Governance systems are typically evaluated based on their effectiveness in achieving societal goals, such as justice, security, and economic prosperity. These systems rely on a combination of incentives and deterrents, similar to the reward and penalty system in RL, to influence behaviour. However, human governance also involves negotiation, persuasion, and the balancing of competing interests, which adds layers of complexity not present in RL environments.

Comparison of learning and adaptation

Both RL in hide-and-seek and human governance involve learning and adaptation, but they differ significantly in their mechanisms and outcomes. In RL, learning is driven by a clear, quantifiable reward function, and adaptation occurs through trial and error over numerous iterations. The agents in hide-and-seek adapt by exploring different strategies and retaining those that maximise their reward.
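The reward-driven loop described above can be made concrete with a small sketch. It is not the method used in the hide-and-seek study. It is a minimal illustration using tabular Q-learning, a standard RL technique, on a hypothetical five-cell corridor in which the agent is rewarded only for reaching the last cell. The function names, the corridor, and all parameter values are assumptions chosen for illustration.

```python
import random


def train_corridor_agent(n_states=5, episodes=500, alpha=0.5,
                         gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (illustrative only).

    The agent starts in cell 0 and receives a reward of 1 when it reaches
    the last cell. Actions: 0 = step left, 1 = step right.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Trial: act randomly with probability epsilon, or when the
            # two value estimates are tied; otherwise take the best action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Feedback: move the estimate towards the observed reward plus
            # the discounted value of the best action in the next cell.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next
                                         - q[state][action])
            state = next_state
            if state == goal:
                break
    return q


def greedy_policy(q):
    """Return the preferred action for every non-goal cell."""
    return ["left" if left > right else "right" for left, right in q[:-1]]


q_table = train_corridor_agent()
print(greedy_policy(q_table))
```

Nothing in the code tells the agent to walk right. That preference emerges because actions that lead towards the reward accumulate higher value estimates over many episodes, which is the sense in which strategies are retained.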

In contrast, human governance involves learning through historical experience, cultural evolution, and institutional development. Adaptation in human governance is often slower and more deliberate, as it requires changes in laws, policies, and social norms. The feedback mechanisms in governance are less direct and quantifiable than in RL, often involving complex social dynamics and political processes. These feedback loops range from election results and public opinion polls to social movements and protests, which shape policies and help ensure responsiveness to the needs of the population.

Emergent strategies and unintended consequences

One of the fascinating aspects of RL in hide-and-seek is the emergence of strategies that were not explicitly programmed. This emergent behaviour results from the agents’ interactions with their environment and each other, leading to innovative solutions to the challenges they face. For example, the hiders’ use of objects to block entrances was an emergent strategy that evolved from the basic rules of the game.

Similarly, human governance can lead to emergent behaviours and unintended consequences. Policies designed to achieve specific goals can have ripple effects throughout society, leading to outcomes that were not anticipated. For instance, raising the retirement age, while intended to address economic concerns related to an ageing population, might disrupt traditional family structures and caregiving arrangements. This shift could strain families and further increase existing inequalities, especially if the system lacks adequate and affordable childcare options or support services for elderly dependents. The complexity of human societies means that governance must be adaptive and responsive to these emergent challenges.

The diagram illustrates the interconnectedness of an AI system (left) and how it learns through trial and error (right). Similarly, human governance involves a complex web of interactions, where policies can have unintended ripple effects across societies (Cornell University).

Innovation and the spectrum of hallucinations

The line between visionary thinking and hallucinations can be blurry for both humans and AI systems. For humans, this spectrum ranges from visionary eureka moments to distorted, delusional perceptions of reality, as well as creative explorations of fantasy. ‘Foresight’ is also on the spectrum, bridging visionary thinking and the potential for hallucinations. It is a method used to explore future scenarios, and is currently a highly sought-after skill. Foresight is one of the five essential skills mentioned in the UN 2.0’s Quintet of Change, particularly because the needed innovations require exploring new ideas and going beyond the conventional.

AI agents operate differently from humans, particularly as they do not have inherent natural boundaries, such as common sense, cognitive limits, ethical considerations, or physical and biological constraints. This lack of human boundaries allows the system to deliver unexpected perspectives and results, so novel that their consequences might be unpredictable for us. We interpret the outputs as creative or even revolutionary if they sound good and promising, but label them as ‘hallucinations’ if they sound or prove to be wrong.

This potential for AI ‘hallucinations’ highlights the need for responsible AI use in governance, ensuring that AI-generated ideas, like human ideas, are evaluated and validated against human values to avoid unintended consequences. But is it truly accurate to speak of ‘hallucinations’, or are we once again falling for the trap of anthropomorphism by attributing human processes to AI, instead of staying with the more neutral metaphor of hide-and-seek?

Part of the game: Randomness and bias

It is fun to watch the AI agents play hide-and-seek. When walls and boundaries are removed, some agents simply run off into the infinite world of data. Will they ever return with any results? To prevent agents from getting lost in the vastness of information, the system introduces a trade-off by controlling the probability of random behaviour. This is a core principle in RL and a challenge when designing AI systems: finding a balance between randomness (usually referred to as ‘hallucinations’) and directed exploration to produce usable results.
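One common way to express this trade-off in RL is epsilon-greedy selection, where a single number sets the probability of acting at random instead of following the best-known option. The sketch below is an assumption for illustration, since the post does not specify which mechanism the agents used. The three options, their success rates, and the function name run_bandit are all hypothetical.

```python
import random


def run_bandit(epsilon, true_means=(0.2, 0.5, 0.8), steps=2000, seed=0):
    """Epsilon-greedy choice among hypothetical options (illustrative only).

    epsilon is the probability of picking an option at random; otherwise
    the option with the highest estimated payoff so far is chosen.
    Returns the average reward and how often each option was picked.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # running average reward per option
    counts = [0] * len(true_means)
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))  # random behaviour
        else:
            arm = max(range(len(true_means)),
                      key=lambda i: estimates[i])  # directed choice
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps, counts


for eps in (1.0, 0.1):
    average, picks = run_bandit(eps)
    print(f"epsilon={eps}: average reward {average:.2f}, picks {picks}")
```

With epsilon set to 1.0 the agent only wanders and its average reward stays near the mean of all options. With a small epsilon it still samples alternatives occasionally but spends most of its time on whichever option has paid off best so far.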

Regarding usability, both RL and human governance raise ethical considerations. In RL, bias in training data can lead to AI systems perpetuating societal biases. However, this bias can also be used positively to surface hidden societal biases, allowing for analysis and improvement. Indeed, it is often through exploring randomness that we uncover such biases, revealing hidden patterns and challenging our assumptions.

While for humans, bias is often ingrained and non-random, stemming from complex personal and societal factors, for AI, bias is a more direct reflection of the data it is trained on. This could make it easier to identify and address bias. In human governance, ensuring fairness and accountability is an ongoing challenge. As AI plays an increasingly prominent role, it is essential to address these ethical implications proactively, ensuring that AI is used in a manner that aligns with human values and promotes well-being.
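Because this kind of bias sits in the data, a first check can be as simple as comparing outcome rates across groups before any model is trained. The sketch below is one possible check under stated assumptions, not a complete fairness audit. The records are made up for illustration, and the group labels and function names are hypothetical.

```python
from collections import defaultdict


def positive_rate_by_group(records):
    """Share of positive labels per group in (group, label) records."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, label in records:
        totals[group] += 1
        positives[group] += label
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(rates):
    """Difference between the highest and lowest group rates."""
    return max(rates.values()) - min(rates.values())


# Made-up toy records: (group, label), where label 1 is a favourable outcome.
toy_records = ([("A", 1)] * 8 + [("A", 0)] * 2
               + [("B", 1)] * 3 + [("B", 0)] * 7)

rates = positive_rate_by_group(toy_records)
gap = parity_gap(rates)
print(rates, round(gap, 2))
```

A large gap does not prove unfair treatment on its own, but it flags a pattern that a model trained on these records would likely reproduce and that deserves human review.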

This leaves us with the question of how to use AI for human governance.

The AI Apprenticeship online course is part of the Diplo AI Campus programme.
