IndirectReciprocity-v0
Category: Reciprocity Environment (TR-4)
Agents: 4
Difficulty: Advanced
Source: coopetition_gym/envs/reciprocity_envs.py
Overview
IndirectReciprocity-v0 implements a four-agent population with reputation-mediated cooperation and TR-4 reciprocity dynamics. Cooperation with any partner is observed by all members, enabling indirect reciprocity: “I cooperate with you because you cooperated with someone else.”
The environment tests whether agents can learn that reputation matters: cooperation with any single partner builds reputation that encourages cooperation from all others.
MARL Classification
| Property | Value |
|---|---|
| Game Type | 4-player Markov Game (general-sum) |
| Cooperation Structure | Mixed-Motive with reputation externalities |
| Observability | Full (all actions visible to all agents) |
| Communication | Implicit (through observed actions) |
| Agent Symmetry | Symmetric (identical capabilities) |
| Reward Structure | Integrated utility with multi-partner reciprocity |
| Action Space | Continuous: $A_i = [0, 100]$ |
| State Dynamics | Deterministic |
| Horizon | Finite, T = 150 steps |
| Canonical Comparison | Indirect reciprocity; Nowak & Sigmund (1998, 2005); Panchanathan & Boyd (2004) |
Formal Specification
Population Structure
Four symmetric agents with uniform interdependence:
\[\mathbf{D} = \begin{pmatrix} 0 & 0.4 & 0.4 & 0.4 \\ 0.4 & 0 & 0.4 & 0.4 \\ 0.4 & 0.4 & 0 & 0.4 \\ 0.4 & 0.4 & 0.4 & 0 \end{pmatrix}\]
Indirect Reciprocity Mechanism
Indirect reciprocity emerges from Eq 44’s multi-agent summation. Agent $i$’s reciprocity modifier aggregates signals from all partners:
\[U_{\text{recip},i} = \lambda_R \sum_{j \neq i} T_{ij} \cdot (1 + \omega D_{ij}) \cdot \rho_{ij} \cdot \varphi(s_{ij})\]
When agent $j$ cooperates with agent $k$ (not $i$), agent $i$ observes $j$’s high action level. This creates a positive memory average for $j$, generating positive cooperation signals when $i$ evaluates $j$, even without direct interaction history.
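As a concrete sketch, the modifier can be evaluated numerically. Here $\varphi$ is assumed to be a bounded tanh response scaled by $\kappa$, and the trust and signal values are illustrative placeholders, not the library's internals:

```python
import numpy as np

def reciprocity_modifier(i, trust, D, signals,
                         lambda_R=1.5, omega=0.5,
                         rho_0=0.8, eta=1.0, kappa=1.0):
    """U_recip,i = lambda_R * sum_{j != i} T_ij (1 + omega D_ij) rho_ij phi(s_ij)."""
    n = D.shape[0]
    total = 0.0
    for j in range(n):
        if j == i:
            continue
        rho_ij = rho_0 * D[i, j] ** eta       # dependency-scaled sensitivity
        phi = np.tanh(kappa * signals[i, j])  # assumed bounded response
        total += trust[i, j] * (1 + omega * D[i, j]) * rho_ij * phi
    return lambda_R * total

# Illustrative values: uniform D = 0.4, mid trust, uniformly positive signals
D = np.full((4, 4), 0.4)
np.fill_diagonal(D, 0.0)
trust = np.full((4, 4), 0.5)
signals = np.full((4, 4), 1.0)
print(reciprocity_modifier(0, trust, D, signals))
```

With all three partners sending positive signals, each contributes $0.5 \cdot 1.2 \cdot 0.32 \cdot \varphi(1)$ to agent 0's modifier, showing how the multi-partner sum amplifies the modest per-pair sensitivity.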
Reciprocity Sensitivity
With $\rho_0 = 0.8$, $\eta = 1.0$, and $D_{ij} = 0.4$:
\[\rho_{ij} = 0.8 \cdot 0.4^{1.0} = 0.32 \quad \text{for all pairs}\]
This is a lower individual sensitivity, but with 3 partners contributing to each agent’s modifier, the aggregate effect is substantial.
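The per-pair sensitivity and its three-partner aggregate can be checked directly:

```python
# Per-pair reciprocity sensitivity from rho_0, eta, and D_ij
rho_0, eta, D_ij = 0.8, 1.0, 0.4
rho_ij = rho_0 * D_ij ** eta      # 0.32 per pair
partners = 3
aggregate = partners * rho_ij     # ~0.96 summed across all partners
print(rho_ij, aggregate)
```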
TR-4 Parameters
| Parameter | Symbol | Value | Rationale |
|---|---|---|---|
| Base reciprocity | $\rho_0$ | 0.8 | Lower per-pair (but 3 pairs sum) |
| Dependency elasticity | $\eta$ | 1.0 | Linear dependency effect |
| Response sensitivity | $\kappa$ | 1.0 | Standard bounded response |
| Memory window | $k$ | 7 | Longer memory for reputation tracking |
| Reciprocity weight | $\lambda_R$ | 1.5 | Higher weight (reputation matters more) |
| Dependency amplification | $\omega$ | 0.5 | Moderate dependency boost |
Distinction from PlatformEcosystem-v0
| Aspect | PlatformEcosystem-v0 | IndirectReciprocity-v0 |
|---|---|---|
| Mechanism | Market competition (TR-1/TR-2) | Reputation-mediated reciprocity (TR-4) |
| Topology | Hub-spoke (platform + developers) | Fully connected peer network |
| Agent Roles | Heterogeneous (platform vs. developer) | Homogeneous (all symmetric) |
| Key Dynamic | Ecosystem health management | Reputation building across population |
| Cooperation Driver | Structural dependency on platform | Indirect reciprocity via observed actions |
Game-Theoretic Background
Indirect Reciprocity Theory
Nowak & Sigmund (1998, 2005) established that cooperation can be sustained in populations through image scoring:
- Direct reciprocity: “I help you because you helped me” (TFT)
- Indirect reciprocity: “I help you because you helped someone” (reputation)
Strategic Implications
Reputation as Public Good:
- Each cooperative act is observed by all 3 other agents
- Cooperation builds reputation, encouraging future cooperation from all partners
- Defection damages reputation with all observers simultaneously
The Scoring Problem:
- With $k = 7$ memory window, recent defection is remembered for 7 steps
- Longer memory enables more nuanced reputation assessment
- Higher $\lambda_R = 1.5$ amplifies reputation effects
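The role of the $k = 7$ window can be sketched as a fixed-length buffer of observed action levels whose mean drives the reputation signal; the normalization around the action midpoint of 50 is an assumption for illustration:

```python
from collections import deque

k = 7
memory = deque(maxlen=k)  # most recent k observed action levels for one agent

# Four cooperative acts followed by three defections
for a in [80, 80, 80, 80, 10, 10, 10]:
    memory.append(a)

mean_level = sum(memory) / len(memory)
signal = (mean_level - 50.0) / 50.0  # positive => cooperative reputation
print(mean_level, signal)
```

Here three recent defections exactly cancel a longer cooperative history, illustrating how the 7-step window keeps reputation responsive to recent behavior.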
Environment Specification
Basic Usage
```python
import coopetition_gym
import numpy as np

# Create environment
env = coopetition_gym.make("IndirectReciprocity-v0")
obs, info = env.reset(seed=42)

# All agents cooperate
for step in range(150):
    actions = np.array([60.0, 60.0, 60.0, 60.0])
    obs, rewards, terminated, truncated, info = env.step(actions)
    if terminated or truncated:
        break

print(f"Mean trust: {info['mean_trust']:.3f}")
```
Parameters
| Parameter | Default | Description |
|---|---|---|
| `max_steps` | 150 | Extended horizon for reputation dynamics |
| `render_mode` | None | Rendering mode |
Spaces
Observation Space
Type: Box
Dtype: float32
Includes actions, trust matrix (4×4), reputation (4×4), interdependence (4×4), and step info.
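Under the assumption that these components are flattened into a single vector, with a one-dimensional step counter (the actual ordering and step-info size are defined by the environment, not documented here), the observation size works out as:

```python
# Back-of-the-envelope observation layout; component sizes follow the
# description above, but the flattening order and step-info size are assumptions.
n = 4
components = {
    "actions": n,              # last joint action
    "trust": n * n,            # 4x4 trust matrix
    "reputation": n * n,       # 4x4 reputation matrix
    "interdependence": n * n,  # 4x4 D matrix
    "step": 1,                 # step counter (size assumed)
}
print(sum(components.values()))  # 53 under these assumptions
```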
Action Space
Type: Box
Shape: (4,)
Dtype: float32
Range: [0.0, 100.0] for each agent
Metrics and Info
The info dictionary contains:
| Key | Type | Description |
|---|---|---|
| `step` | int | Current timestep |
| `mean_trust` | float | Average trust level |
| `cooperation_signals` | dict | Per-pair $s_{ij}$ values (12 pairs) |
| `reciprocity_effects` | dict | Per-pair reciprocity contributions |
| `memory_averages` | dict | Per-pair memory averages |
| `tr4_memory_window` | int | Memory window $k = 7$ |
Key Dynamics
Reputation Cascade
- Agent $j$ cooperates with agent $k$ at high level
- Agents $i$ and $l$ observe $j$’s high action in memory
- When evaluating $j$, both $i$ and $l$ compute positive $s_{ij}$ and $s_{lj}$
- Both $i$ and $l$ receive positive reciprocity modifier when cooperating with $j$
- This encourages all agents to cooperate with $j$: indirect reciprocity
Defection Contagion
A single defection creates negative signals with all 3 partners simultaneously, triggering faster punishment than in 2-agent settings.
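A minimal sketch of this symmetry, again assuming signals are normalized around the midpoint action level of 50:

```python
# One defection by agent 0 is observed by all three partners at once,
# so every observer forms the same negative signal simultaneously.
action = 5.0                    # agent 0's low (defecting) action level
observers = [1, 2, 3]
signals = {(j, 0): (action - 50.0) / 50.0 for j in observers}
print(signals)
```

Because all three pairs turn negative in the same step, punishment arrives from the whole population rather than from a single wronged partner, which is what accelerates it relative to 2-agent settings.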
Research Applications
IndirectReciprocity-v0 is suitable for studying:
- Indirect Reciprocity: Can agents learn reputation-based cooperation?
- Image Scoring: Do agents develop consistent cooperation patterns?
- Population Dynamics: How does group size affect cooperation stability?
- Reputation Cascades: How does one agent’s behavior propagate through the network?
Related Environments
- DynamicPartnerSelection-v0: Reputation-based matching (TR-1/TR-2)
- ReciprocalDilemma-v0: Direct reciprocity baseline
- GraduatedSanction-v0: Larger population with graduated sanctions
References
- Pant, V. & Yu, E. (2026). Computational Foundations for Strategic Coopetition: Formalizing Sequential Interaction and Reciprocity. arXiv:2604.01240.
- Nowak, M. A. & Sigmund, K. (1998). Evolution of Indirect Reciprocity by Image Scoring. Nature.
- Nowak, M. A. & Sigmund, K. (2005). Evolution of Indirect Reciprocity. Nature.
- Panchanathan, K. & Boyd, R. (2004). Indirect Reciprocity Can Stabilize Cooperation Without the Second-Order Free Rider Problem. Nature.