RecoveryRace-v0

Category: Benchmark Environment
Agents: 2
Difficulty: Hard
Source: coopetition_gym/envs/benchmark_envs.py

Overview

RecoveryRace-v0 models post-crisis trust recovery between two agents after a major breach. Starting from a damaged state (low trust, high reputation damage), agents must find the optimal sequence of cooperative actions to rebuild the relationship.

This environment is specifically designed to test understanding of TR-2 trust dynamics and the mathematical constraints on recovery. The key insight is that reputation damage creates a trust ceiling that limits how much trust can be recovered.

Figure: Recovery Race Trust Dynamics. Trust recovery is constrained by reputation damage. The red dashed line shows the trust ceiling ($\Theta = 1 - R$), which rises as reputation heals. Trust cannot exceed this ceiling regardless of cooperation level.

MARL Classification

Property	Value
Game Type	Markov Game (2-player, general-sum) with constrained state recovery
Cooperation Structure	Cooperative goal (joint recovery) with individual temptation to defect
Observability	Full (agents observe trust, reputation damage, and ceiling)
Communication	Implicit (through actions only)
Agent Symmetry	Symmetric (identical capabilities, mutual damage)
Reward Structure	Mixed (integrated utility + recovery bonus)
Action Space	Continuous: A_i = [0, 100]
State Dynamics	Deterministic with ceiling constraints
Horizon	Extended, T = 150 (longer horizon for recovery)
Canonical Comparison	Constrained optimization in trust repair; cf. Kim et al. (2004) “Removing the Shadow of Suspicion”

Formal Specification

This environment is formalized as a 2-player Markov Game with constrained recovery dynamics.

Agents

N = {1, 2} (symmetric dyad starting from damaged state)

Property	Value	Description
Endowment	100.0	Equal for both
Baseline	35.0	35% cooperation threshold
Bargaining α	0.50	Equal surplus sharing

State Space

S ⊆ ℝ¹⁷ (identical structure to TrustDilemma-v0)

Critical state property: Trust ceiling $\Theta_{ij} = 1 - R_{ij}$ constrains recovery.

Action Space

$A_i = [0, 100] \subset \mathbb{R}$ for each agent

Uniaxial Treatment: This environment uses the single-dimension action space characteristic of Coopetition-Gym v1.x. Competition emerges through value capture dynamics rather than explicit competitive actions.

Initial State (Post-Crisis)

Variable	Initial Value	Interpretation
Trust $\tau_{ij}(0)$	0.25	Very low (crisis aftermath)
Reputation $R_{ij}(0)$	0.50	High damage (serious past violations)
Trust Ceiling $\Theta_{ij}(0)$	0.50	Maximum recoverable trust initially

Key constraint: The recovery target $\tau^* = 0.90$ is initially unreachable because the ceiling is $\Theta = 0.50$. Agents must wait for reputation damage to decay before the target becomes attainable.

Transition Dynamics

Trust Update (standard TR-2):

\[\tau_{ij}(t+1) = \mathrm{clip}\left(\tau_{ij}(t) + \Delta\tau_{ij},\ 0,\ \Theta_{ij}\right)\]

Reputation Decay (slow healing):

\[R_{ij}(t+1) = R_{ij}(t) \cdot (1 - \delta_R)\]

where $\delta_R = 0.01$ (1% decay per step).

Ceiling Evolution:

\[\Theta_{ij}(t) = 1 - R_{ij}(t)\]
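The three update rules above can be combined into a single state step. The following is a minimal sketch, assuming the trust increment $\Delta\tau_{ij}$ is supplied by the caller (its TR-2 formula is not reproduced here) and that trust is clipped against the ceiling derived from the pre-decay reputation damage. The function `step_state` and its argument names are illustrative and are not part of coopetition_gym.

```python
def step_state(trust, rep_damage, delta_trust, delta_r=0.01):
    """Apply one ceiling-constrained update to a single trust link.

    Assumption: the ceiling used for clipping comes from the current
    (pre-decay) reputation damage; the environment may order this differently.
    """
    ceiling = 1.0 - rep_damage                                # Theta = 1 - R
    new_trust = min(max(trust + delta_trust, 0.0), ceiling)   # clip to [0, Theta]
    new_rep_damage = rep_damage * (1.0 - delta_r)             # slow multiplicative healing
    return new_trust, new_rep_damage


# From the post-crisis state, even a large increment is capped at Theta = 0.50.
trust, rep = step_state(trust=0.25, rep_damage=0.50, delta_trust=0.40)
print(trust, rep)
```

Passing the increment in as an argument keeps the sketch independent of how the environment actually computes trust gains and losses.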

Time for the ceiling to reach 0.90:

Need $R_{ij} \leq 0.10$
From $R = 0.50$, requires ~40 steps of pure decay, treating the 1% as an absolute reduction of 0.01 per step ($0.40 / 0.01 = 40$)
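The waiting time can be checked numerically. The sketch below is illustrative and not taken from the environment source. It counts the steps until $R \leq 0.10$ under two readings of "1% decay per step": the multiplicative update written above, and an absolute reduction of 0.01 per step. All constant names are assumptions chosen for this example.

```python
import math

R0, TARGET_R, DELTA_R = 0.50, 0.10, 0.01

# Multiplicative reading: R(t) = R0 * (1 - delta_R)^t, solved in closed form.
steps_multiplicative = math.ceil(math.log(TARGET_R / R0) / math.log(1.0 - DELTA_R))

# Absolute reading: R(t) = R0 - delta_R * t (rounding guards against float noise).
steps_absolute = math.ceil(round((R0 - TARGET_R) / DELTA_R, 9))

print(steps_multiplicative, steps_absolute)  # 161 40
```

The two readings differ by a factor of about four, so it is worth confirming against the environment source which one applies before relying on a minimum recovery time.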

Trust Parameters (Recovery-Specific)

Parameter	Symbol	Value	Note
Trust Building	λ⁺	0.08	Slow (deliberate recovery)
Trust Erosion	λ⁻	0.35	Fast (re-violation costly)
New Violation Damage	$\mu_R$	0.70	Severe (trust fragile)
Reputation Decay	$\delta_R$	0.01	Very slow (patience required)

Reward Function

Standard integrated utility with moderate interdependence:

\[r_i = \pi_i + 0.55 \cdot \pi_j\]

Episode Structure

Horizon: T = 150 steps (extended for recovery)
Success Termination: mean(τ) ≥ 0.90 (recovery achieved)
Failure Termination: mean(τ) < 0.05 (trust collapse)
Truncation: t ≥ T (recovery not achieved in time)
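The episode rules listed above can be expressed as one classification function. This is a sketch under the assumption that the two termination checks are evaluated before the time limit; the function name `episode_status` and the outcome labels are hypothetical and do not come from coopetition_gym.

```python
def episode_status(mean_trust, t, recovery_target=0.90,
                   collapse_threshold=0.05, max_steps=150):
    """Return (terminated, truncated, outcome) for the current step."""
    if mean_trust >= recovery_target:
        return True, False, "recovered"    # success termination
    if mean_trust < collapse_threshold:
        return True, False, "collapsed"    # failure termination
    if t >= max_steps:
        return False, True, "timeout"      # truncation at the horizon
    return False, False, "running"


print(episode_status(mean_trust=0.92, t=95))
print(episode_status(mean_trust=0.60, t=150))
```

Because `terminated` is set for both success and collapse, callers need the trust value (or the outcome label) to tell the two apart.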

Recovery Target

Goal: mean(τ) ≥ 0.90
Minimum time: ~40 steps (ceiling constraint)
Typical successful recovery: 80-120 steps

Game-Theoretic Background

Post-Crisis Dynamics

Real-world examples:

Business partners after contract violation
Firms after a major scandal
Countries after diplomatic crisis

The Recovery Challenge

After a crisis:

Trust is low: Recent violations have eroded confidence
Reputation is damaged: Past behavior creates a ceiling on recovery
Recovery is slow: Trust builds slowly but can erode quickly
Patience required: Reputation damage decays very slowly

Mathematical Constraints

The trust ceiling is:

\[\Theta_{ij} = 1 - R_{ij}\]

Here $R_{ij}$ is the reputation damage. If $R = 0.50$, trust cannot exceed 0.50 until reputation heals.

Environment Specification

Basic Usage

import coopetition_gym
import numpy as np

# Create environment
env = coopetition_gym.make("RecoveryRace-v0")

obs, info = env.reset(seed=42)

print(f"Starting trust: {info['mean_trust']:.2f}")
print(f"Starting reputation damage: {info['mean_reputation_damage']:.2f}")
print(f"Trust ceiling: {1 - info['mean_reputation_damage']:.2f}")
print(f"Recovery target: 0.90")

# Run recovery attempt
for step in range(150):
    # Consistent high cooperation for recovery
    actions = np.array([80.0, 80.0])  # 80% cooperation
    obs, rewards, terminated, truncated, info = env.step(actions)

    # Stop on success, trust collapse, or the time limit
    if terminated or truncated:
        break

# Termination alone is ambiguous: it covers both recovery and collapse
if terminated and info['mean_trust'] >= 0.90:
    print(f"Recovery achieved at step {step}!")
else:
    print(f"Final trust: {info['mean_trust']:.3f}")
    print("Recovery target not reached")

Parameters

Parameter	Default	Description
`max_steps`	150	Extended horizon for recovery
`initial_trust`	0.25	Post-crisis low trust
`initial_reputation_damage`	0.50	High starting damage
`recovery_target`	0.90	Trust level to reach
`render_mode`	None	Rendering mode

Initial State

Crisis Starting Point

Metric	Value	Interpretation
Initial Trust	0.25	Very low (crisis aftermath)
Reputation Damage	0.50	High (serious past violations)
Trust Ceiling	0.50	Cannot exceed until rep heals

The Ceiling Problem

With initial reputation damage of 0.50:

Trust ceiling = 1 - 0.50 = 0.50
Target of 0.90 is IMPOSSIBLE initially
Must wait for reputation to decay

Trust Dynamics

Parameters

Parameter	Symbol	Value	Description
Trust Building Rate	λ⁺	0.08	Slow recovery
Trust Erosion Rate	λ⁻	0.35	Re-violation is costly
Reputation Damage	$\mu_R$	0.70	New violations add significant damage
Reputation Decay	$\delta_R$	0.01	Very slow healing
Interdependence Amp.	ξ	0.40	Moderate amplification
Signal Sensitivity	κ	1.0	Standard sensitivity

Key Insight: Patience

With $\delta_R = 0.01$, reputation heals by ~1% per step.

To reduce reputation from 0.50 to 0.10 (ceiling = 0.90):

Need: 0.50 → 0.10 = reduction of 0.40
At an absolute 0.01 per step: 0.40 / 0.01 = ~40 steps minimum (a compounding 1% decay of the remaining damage takes longer)
Plus trust building time

Recovery is mathematically constrained to be slow.

Termination Conditions

Success (Termination)

Episode terminates successfully when:

if mean_trust >= recovery_target:  # 0.90
    terminated = True
    # Recovery achieved!

Time Limit (Truncation)

The episode truncates at `max_steps` (150) if the target has not been reached.

Re-Violation (Termination)

If agents defect and trust collapses below 0.05:

if mean_trust < 0.05:
    terminated = True
    # Recovery failed - relationship ended

Interdependence Structure

Symmetric Dependencies

D = [[ 0.00,  0.55 ],
     [ 0.55,  0.00 ]]

Both agents are moderately dependent on each other, creating mutual incentive for recovery.
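The matrix above gives the weights used in the integrated utility $r_i = \pi_i + 0.55 \cdot \pi_j$. A minimal sketch of that computation follows, assuming the private payoffs $\pi$ are already available (the value function that produces them, and any recovery bonus, are not reproduced here). The function `integrated_rewards` and the sample payoffs are illustrative only.

```python
import numpy as np

# Interdependence matrix from above: D[i, j] weights agent j's payoff
# in agent i's reward.
D = np.array([[0.00, 0.55],
              [0.55, 0.00]])


def integrated_rewards(payoffs, dependence=D):
    """Compute r_i = pi_i + sum_j D_ij * pi_j for a vector of private payoffs."""
    payoffs = np.asarray(payoffs, dtype=float)
    return payoffs + dependence @ payoffs


# Placeholder payoffs, chosen only to make the arithmetic easy to follow.
print(integrated_rewards([10.0, 20.0]))  # [21.  25.5]
```

With a symmetric matrix, the agent with the lower private payoff gains more from the partner's term, which is one way to read the mutual incentive for recovery.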

Value Function

Parameters

Parameter	Value	Description
θ	20.0	Standard logarithmic scale
γ	0.60	Moderate complementarity

Metrics and Info

The info dictionary includes:

Key	Type	Description
`step`	int	Current timestep
`mean_trust`	float	Average trust level
`mean_reputation_damage`	float	Average reputation damage
`trust_ceiling`	float	1 - mean_reputation_damage
`recovery_progress`	float	trust / recovery_target
`peak_trust`	float	Highest trust achieved
`recovery_step`	int or None	Step at which the target was first reached (None if not yet reached)
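Two of the keys above are defined by simple formulas, which can be reproduced outside the environment for logging or sanity checks. The sketch below assumes those formulas exactly as listed in the table. The function `derived_metrics` and the extra `headroom` value are hypothetical additions, not keys of the actual info dictionary.

```python
def derived_metrics(mean_trust, mean_reputation_damage, recovery_target=0.90):
    """Recompute the formula-based info values from trust and reputation damage."""
    trust_ceiling = 1.0 - mean_reputation_damage
    return {
        "trust_ceiling": trust_ceiling,
        "recovery_progress": mean_trust / recovery_target,
        # Hypothetical extra: how far trust can still rise before hitting the ceiling.
        "headroom": trust_ceiling - mean_trust,
    }


# Post-crisis starting point: trust 0.25, reputation damage 0.50.
print(derived_metrics(0.25, 0.50))
```

Tracking the headroom alongside `recovery_progress` shows whether slow progress is due to the ceiling or to insufficient cooperation.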

Optimal Recovery Strategy

Phase 1: Wait for Ceiling (Steps 0-30)

Cooperate moderately (60-70%)
Don’t waste resources on impossible trust gains
Let reputation decay raise the ceiling

Phase 2: Aggressive Building (Steps 30-80)

Cooperate highly (80-90%)
Trust can now grow toward ceiling
Each cooperative step builds incrementally

Phase 3: Sustain (Steps 80+)

Maintain moderate cooperation
Prevent any trust erosion
Coast to target as reputation heals
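The three phases above can be written as a step-indexed schedule. This is a rough sketch in which the cooperation levels are assumed midpoints of the stated ranges, and the Phase 3 level is a guess, since "moderate" is not quantified above. The function names `phase_cooperation` and `phase_actions` are illustrative.

```python
import numpy as np


def phase_cooperation(step):
    """Cooperation fraction for the three-phase schedule (levels are assumptions)."""
    if step < 30:
        return 0.65   # Phase 1: moderate cooperation (60-70%)
    if step < 80:
        return 0.85   # Phase 2: high cooperation (80-90%)
    return 0.65       # Phase 3: moderate; exact level not specified above


def phase_actions(step, endowment=100.0):
    """Joint action for both agents, scaled to the [0, 100] action space."""
    level = phase_cooperation(step)
    return np.array([endowment * level, endowment * level])


print(phase_actions(0), phase_actions(50), phase_actions(100))
```

A fixed schedule like this ignores the observed trust and ceiling, so it serves mainly as a baseline against state-dependent policies.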

Example: Optimal Recovery Strategy

import coopetition_gym
import numpy as np

env = coopetition_gym.make("RecoveryRace-v0")
obs, info = env.reset(seed=42)

recovery_achieved = False

for step in range(150):
    ceiling = 1 - info['mean_reputation_damage']
    current_trust = info['mean_trust']

    # Phase-based strategy
    if current_trust >= ceiling - 0.05:
        # Near ceiling: moderate cooperation (wait for ceiling to rise)
        coop_level = 0.6
    elif current_trust < ceiling - 0.1:
        # Well below ceiling: aggressive building
        coop_level = 0.85
    else:
        # Approaching ceiling: high cooperation
        coop_level = 0.75

    actions = np.array([100.0 * coop_level, 100.0 * coop_level])
    obs, rewards, terminated, truncated, info = env.step(actions)

    if step % 20 == 0:
        # ceiling is the pre-step value that was used to choose this action
        print(f"Step {step}: Trust={info['mean_trust']:.3f}, "
              f"Ceiling={ceiling:.3f}, Progress={info['recovery_progress']:.1%}")

    if terminated and info['mean_trust'] >= 0.90:
        print(f"\nRecovery achieved at step {step}!")
        recovery_achieved = True
        break
    if terminated or truncated:
        # Trust collapse or time limit: stop stepping a finished episode
        break

if not recovery_achieved:
    print(f"\nRecovery not achieved. Final trust: {info['mean_trust']:.3f}")

Research Applications

RecoveryRace-v0 is suitable for studying:

Trust Repair: Strategies after violations
Constrained Optimization: Planning under mathematical constraints
Long-horizon Planning: Patience and delayed gratification
TR-2 Dynamics: Understanding trust ceiling mechanics
Crisis Management: Post-crisis relationship repair

Related Environments

TrustDilemma-v0: Normal trust dynamics
SynergySearch-v0: Another benchmark challenge
RenaultNissan-v0: Real-world crisis recovery (crisis phase)

References

Kim, P.H., Ferrin, D.L., Cooper, C.D., & Dirks, K.T. (2004). Removing the Shadow of Suspicion. Journal of Applied Psychology.
Lewicki, R.J. & Bunker, B.B. (1996). Developing and Maintaining Trust in Work Relationships. Trust in Organizations.
Pant, V. & Yu, E. (2025). Computational Foundations for Strategic Coopetition: Formalizing Trust and Reputation Dynamics. arXiv:2510.24909
