Coopetition-Gym Documentation
Multi-Agent Reinforcement Learning Environments for Strategic Coopetition
Compatibility and Requirements
Framework Compatibility
| Framework | Version | Status | Notes |
|---|---|---|---|
| Python | 3.9, 3.10, 3.11 | Tested | 3.9+ required |
| Gymnasium | 0.29+ | Compatible | Farama Foundation standard |
| PettingZoo | 1.24+ | Compatible | Parallel and AEC APIs |
| NumPy | 1.21+ | Required | Core dependency |
| SciPy | 1.7+ | Required | Mathematical functions |
MARL Framework Integration
| Framework | Integration | Notes |
|---|---|---|
| Stable-Baselines3 | Direct | Use Gymnasium API with VecEnv |
| RLlib | Direct | Use PettingZoo API with MultiAgentEnv |
| TorchRL | Compatible | Use Gymnasium API |
| CleanRL | Compatible | Single-file implementations |
Verification
import coopetition_gym
import gymnasium
import pettingzoo
# Verify installation
print(f"Coopetition-Gym environments: {len(coopetition_gym.list_environments())}")
print(f"Gymnasium version: {gymnasium.__version__}")
print(f"PettingZoo version: {pettingzoo.__version__}")
# Quick environment test
env = coopetition_gym.make("TrustDilemma-v0")
obs, info = env.reset(seed=42)
print(f"Observation shape: {obs.shape}")
print(f"Action space: {env.action_space}")
Overview
Coopetition-Gym is a Python research library providing multi-agent reinforcement learning environments for studying coopetitive dynamics: scenarios where agents must simultaneously cooperate and compete. The library implements mathematical frameworks from published research:
- TR-1: Computational Foundations for Strategic Coopetition: Formalizing Interdependence and Complementarity
- TR-2: Computational Foundations for Strategic Coopetition: Formalizing Trust and Reputation Dynamics
- TR-3: Computational Foundations for Strategic Coopetition: Formalizing Collective Action and Loyalty
- TR-4: Computational Foundations for Strategic Coopetition: Formalizing Sequential Interaction and Reciprocity
Key Features
- 20 Specialized Environments spanning dyadic relationships to multi-agent ecosystems
- Validated Case Studies based on real business partnerships (Samsung-Sony, Renault-Nissan, Apache, Apple App Store)
- Trust Dynamics with asymmetric updating and reputation hysteresis
- Multiple APIs: Gymnasium (single-agent), PettingZoo Parallel, and PettingZoo AEC
- Configurable Parameters for research flexibility
Modeling Approach
Coopetition-Gym v1.x implements the uniaxial treatment of coopetition, modeling strategic choice along the cooperation-defection continuum (Bengtsson & Kock, 2000). Agents choose cooperation levels in [0, endowment], with competitive dynamics emerging through structural parameters (interdependence matrix, bargaining shares, trust evolution). This foundational approach enables computational tractability while capturing core coopetitive phenomena validated against real-world cases.
Future versions will introduce biaxial treatment with independent cooperation and competition dimensions, following Brandenburger & Nalebuff (1996). See Scope and Strategic Roadmap for theoretical rationale and extension plans.
Quick Start
Installation
# Clone the repository
git clone https://github.com/your-org/strategic-coopetition.git
cd strategic-coopetition/coopetition_gym
# Install in development mode
pip install -e .
# Install with all dependencies
pip install -e ".[dev,viz,rl]"
Basic Usage
import coopetition_gym
import numpy as np
# Create environment
env = coopetition_gym.make("TrustDilemma-v0")
# Reset and run episode
obs, info = env.reset(seed=42)
done = False
while not done:
    # Agents choose cooperation levels
    actions = np.array([50.0, 50.0])  # 50% cooperation each
    obs, rewards, terminated, truncated, info = env.step(actions)
    done = terminated or truncated
print(f"Final trust: {info['mean_trust']:.2f}")
PettingZoo APIs
# Parallel API (simultaneous moves)
env = coopetition_gym.make_parallel("PlatformEcosystem-v0")
observations, infos = env.reset()
actions = {agent: env.action_space(agent).sample() for agent in env.agents}
observations, rewards, terminations, truncations, infos = env.step(actions)
# AEC API (sequential moves)
env = coopetition_gym.make_aec("TrustDilemma-v0")
env.reset()
for agent in env.agent_iter():
    obs, reward, term, trunc, info = env.last()
    # PettingZoo requires a None action once an agent is terminated or truncated
    action = policy(obs) if not (term or trunc) else None
    env.step(action)
Environment Categories
Coopetition-Gym provides 20 environments organized into 7 categories:
Dyadic Environments (2-Agent)
Micro-level scenarios modeling direct partnerships between two agents.
| Environment | Description | Key Challenge |
|---|---|---|
| TrustDilemma-v0 | Continuous Prisoner’s Dilemma with trust dynamics | Long-horizon impulse control |
| PartnerHoldUp-v0 | Asymmetric power relationship | Power dynamics and exploitation |
Ecosystem Environments (N-Agent)
Macro-level scenarios with multiple interacting agents.
| Environment | Description | Key Challenge |
|---|---|---|
| PlatformEcosystem-v0 | Platform with N developers | Ecosystem health management |
| DynamicPartnerSelection-v0 | Reputation-based partner matching | Social learning and signaling |
Benchmark Environments
Research-focused environments for algorithm evaluation.
| Environment | Description | Key Challenge |
|---|---|---|
| RecoveryRace-v0 | Post-crisis trust recovery | Planning under trust constraints |
| SynergySearch-v0 | Hidden complementarity discovery | Exploration vs. exploitation |
Validated Case Studies
Environments with parameters validated against real business data.
| Environment | Description | Validation |
|---|---|---|
| SLCD-v0 | Samsung-Sony S-LCD Joint Venture | 58/60 accuracy |
| RenaultNissan-v0 | Renault-Nissan Alliance phases | Multi-phase dynamics |
Extended Environments
Advanced scenarios with additional mechanics.
| Environment | Description | Key Mechanics |
|---|---|---|
| CooperativeNegotiation-v0 | Multi-round negotiation | Commitment and breach penalties |
| ReputationMarket-v0 | Market with reputation tiers | Reputation as strategic asset |
Collective Action Environments (TR-3)
Team production and collective action scenarios with loyalty dynamics.
| Environment | Description | Key Challenge |
|---|---|---|
| TeamProduction-v0 | Team production with free-rider dynamics | Nash equilibrium baseline |
| LoyaltyTeam-v0 | Team production with loyalty mechanisms | Sustaining above-Nash cooperation |
| CoalitionFormation-v0 | Dynamic coalition with entry/exit | Coalition stability under exclusion |
| ApacheProject-v0 | Apache HTTP Server case study (52/60) | Phase-dependent contributor dynamics |
| PublicGoods-v0 | Classic public goods game | Contribution and punishment dynamics |
Reciprocity Environments (TR-4)
Sequential interaction and reciprocity scenarios with bounded memory.
| Environment | Description | Key Challenge |
|---|---|---|
| ReciprocalDilemma-v0 | Continuous PD with direct reciprocity | Conditional cooperation via memory |
| GiftExchange-v0 | Asymmetric employer-worker exchange | Asymmetric reciprocity sensitivity |
| IndirectReciprocity-v0 | 4-agent reputation-mediated cooperation | Indirect reciprocity via image scoring |
| GraduatedSanction-v0 | 6-agent commons with graduated sanctions | Proportional punishment and escalation |
| AppleAppStore-v0 | Apple iOS App Store (validated 48/55) | Platform power and reciprocity dynamics |
Core Concepts
For Researchers: Full mathematical derivations, proofs, and validation methodology are available in the Theoretical Foundations documentation and the published technical reports.
For Practitioners: The summaries below provide the essential intuition needed to use the environments effectively.
Coopetitive Dynamics
Coopetition occurs when entities simultaneously cooperate (to create value) and compete (to capture value). As Brandenburger and Nalebuff articulated: actors “cooperate to grow the pie and compete to split it up.”
Real-World Examples:
- Technology Standards: Competitors collaborate on standards while competing in products (e.g., Bluetooth SIG members)
- Joint Ventures: Partners invest jointly but negotiate surplus division (e.g., Samsung-Sony S-LCD)
- Platform Ecosystems: Developers depend on platforms that also compete with them (e.g., iOS App Store)
- Supply Chains: Suppliers share information for efficiency while competing for contracts
The Coopetition Paradox: The same relationship exhibits both cooperative and competitive dynamics simultaneously, not sequentially or in separate domains. This creates strategic tension that standard game theory struggles to capture.
Interdependence & Structural Coupling (TR-1)
Interdependence captures why actors must consider partner outcomes even while competing. When Actor A depends on Actor B for critical resources, A’s success structurally requires B’s success, creating instrumental concern for B’s welfare distinct from altruism.
The Interdependence Matrix quantifies structural dependencies:
\[\Large D_{ij} = \frac{\sum_{d \in \mathcal{D}_i} w_d \cdot \text{Dep}(i,j,d) \cdot \text{crit}(i,j,d)}{\sum_{d \in \mathcal{D}_i} w_d}\]
| Component | Meaning | Example |
|---|---|---|
| $w_d$ | Importance weight of goal d | Revenue goal: 0.8, Brand goal: 0.2 |
| $\text{Dep}(i,j,d)$ | Does i depend on j for d? | Developer depends on platform for distribution |
| $\text{crit}(i,j,d)$ | Criticality (1 = sole provider) | API provider with no alternatives: 1.0 |
Key Insight: $D_{ij} \neq D_{ji}$ in general. Asymmetric dependencies create power imbalances: a startup may critically depend on a platform ($D_{\text{startup,platform}} \approx 0.8$) while the platform barely notices any single startup ($D_{\text{platform,startup}} \approx 0.01$).
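The matrix entry $D_{ij}$ above is a criticality-weighted average over actor $i$'s goals. A minimal sketch of that computation (the function name and the example numbers are illustrative, not from the library API):

```python
import numpy as np

def interdependence(weights, dep, crit):
    """Weighted dependency of actor i on actor j across i's goals.

    weights: importance w_d of each goal d
    dep:     1.0 if i depends on j for goal d, else 0.0
    crit:    criticality of j for goal d (1.0 = sole provider)
    """
    weights = np.asarray(weights, dtype=float)
    dep = np.asarray(dep, dtype=float)
    crit = np.asarray(crit, dtype=float)
    return float(np.sum(weights * dep * crit) / np.sum(weights))

# A startup depending on a platform for revenue (near-critical) and brand (partial):
d_startup_platform = interdependence(
    weights=[0.8, 0.2],  # revenue goal, brand goal
    dep=[1.0, 1.0],      # depends on the platform for both
    crit=[0.9, 0.4],     # few alternatives for revenue, many for brand
)
# (0.8*0.9 + 0.2*0.4) / 1.0 = 0.80
```

Because the goal sets $\mathcal{D}_i$ and $\mathcal{D}_j$ differ, running this for each direction naturally yields the asymmetric $D_{ij} \neq D_{ji}$ noted above.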
Integrated Utility Function (TR-1)
Agents maximize integrated utility that accounts for partner outcomes through structural coupling:
\[\Large U_i(\mathbf{a}) = \pi_i(\mathbf{a}) + \sum_{j \neq i} D_{ij} \cdot \pi_j(\mathbf{a})\]
Components Explained:
| Term | Formula | Intuition |
|---|---|---|
| Private Payoff | $\pi_i = e_i - a_i + f(a_i) + \alpha_i \cdot \text{Synergy}$ | What I keep + what I create + my share of joint value |
| Interdependence Term | $\sum_{j} D_{ij} \cdot \pi_j$ | Partner success weighted by my dependency on them |
Why This Matters: Classical Nash Equilibrium assumes purely self-interested payoffs. The Coopetitive Equilibrium extends Nash by incorporating dependency-weighted concern for partner outcomes, capturing why dependent actors rationally care about partner success.
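The dependency-weighted utility above reduces to a one-line computation. The sketch below (hypothetical helper, not a library function) shows how the startup/platform asymmetry from the previous section flows through to utilities:

```python
import numpy as np

def integrated_utility(payoffs, D, i):
    """U_i = pi_i + sum_{j != i} D[i, j] * pi_j  (TR-1 integrated utility)."""
    payoffs = np.asarray(payoffs, dtype=float)
    others = np.arange(len(payoffs)) != i
    return float(payoffs[i] + np.sum(D[i, others] * payoffs[others]))

# Two agents: a dependent startup (D=0.8) and a platform (D=0.01)
D = np.array([[0.0, 0.8],
              [0.01, 0.0]])
pi = np.array([10.0, 100.0])  # private payoffs (illustrative numbers)
u_startup = integrated_utility(pi, D, 0)   # 10 + 0.8 * 100  = 90.0
u_platform = integrated_utility(pi, D, 1)  # 100 + 0.01 * 10 = 100.1
```

The startup's utility is dominated by the platform's payoff, which is exactly the "instrumental concern for partner welfare" that distinguishes the Coopetitive Equilibrium from classical Nash.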
Value Creation & Complementarity (TR-1)
Complementarity creates the cooperative incentive: joint action produces superadditive value exceeding independent contributions.
\[\Large V(\mathbf{a} \mid \gamma) = \sum_{i=1}^{N} f_i(a_i) + \gamma \cdot g(a_1, \ldots, a_N)\]
Two Validated Specifications:
| Specification | Individual Value $f(a)$ | Synergy $g(a)$ | Best For |
|---|---|---|---|
| Logarithmic (default) | $\theta \cdot \ln(1 + a_i)$, $\theta=20$ | Geometric mean | Manufacturing JVs (58/60 validation) |
| Power | $a_i^{\beta}$, $\beta=0.75$ | Geometric mean | General scenarios (46/60 validation) |
Key Parameters (validated across 22,000+ trials):
- $\theta = 20.0$: Logarithmic scale producing realistic cooperation magnitudes
- $\beta = 0.75$: Diminishing returns reflecting investment economics
- $\gamma = 0.65$: Complementarity strength balancing individual and joint value
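Putting the logarithmic specification and the validated defaults together, the joint value function can be sketched as follows. Note the assumption, flagged in the comment, that the synergy term $g$ is the geometric mean of the cooperation levels themselves:

```python
import numpy as np

def joint_value(actions, theta=20.0, gamma=0.65):
    """V(a | gamma) with the logarithmic specification.

    Individual value: theta * ln(1 + a_i) with theta = 20 (validated default).
    Synergy g: taken here as the geometric mean of cooperation levels --
    an illustrative assumption about what the geometric mean is applied to.
    """
    a = np.asarray(actions, dtype=float)
    individual = np.sum(theta * np.log1p(a))
    synergy = np.exp(np.mean(np.log(a))) if np.all(a > 0) else 0.0
    return float(individual + gamma * synergy)

# Symmetric 50-unit cooperation (as in the Basic Usage example):
v = joint_value([50.0, 50.0])  # 2*20*ln(51) + 0.65*50, roughly 189.8
```

The `log1p` form keeps individual value finite at zero cooperation, while the $\gamma$-weighted synergy only pays off when all parties contribute, which is the superadditivity the section describes.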
Trust Dynamics (TR-2)

Trust evolves through a two-layer architecture capturing both immediate behavioral responses and long-term memory:
| Layer | Symbol | Updates | Captures |
|---|---|---|---|
| Immediate Trust | $T_{ij} \in [0,1]$ | Every interaction | Current confidence in partner |
| Reputation Damage | $R_{ij} \in [0,1]$ | On violations | Historical memory of betrayals |
Asymmetric Evolution with Negativity Bias:
\[\Delta T = \begin{cases} \lambda^+ \cdot s \cdot (\Theta - T) & \text{if } s > 0 \; [\lambda^+ = 0.10] \\ -\lambda^- \cdot |s| \cdot T \cdot (1 + \xi D) & \text{if } s \leq 0 \; [\lambda^- = 0.30] \end{cases}\]The 3:1 Ratio: Trust erodes approximately 3× faster than it builds ($\lambda^-/\lambda^+ \approx 3.0$). This negativity bias, validated against behavioral economics research, explains why:
- A single major violation can destroy months of trust-building
- Consistent cooperation is essential for sustainable partnerships
- Recovery from betrayal requires sustained effort over extended periods
Trust Ceiling Mechanism:
\[\Large \Theta = 1 - R \quad \text{(reputation damage limits maximum achievable trust)}\]
Even with perfect cooperation, damaged reputation prevents trust from fully recovering, creating permanent relationship constraints (hysteresis).
Interdependence Amplification: High-dependency relationships experience 27% faster trust erosion for equivalent violations:
\[\Large \text{Erosion factor} = (1 + \xi \cdot D_{ij}) \quad \text{where } \xi = 0.50\]
When you depend heavily on a partner, their betrayal hurts more.
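The asymmetric update, ceiling, and amplification mechanics above fit in one small function. This is a sketch using the stated defaults ($\lambda^+ = 0.10$, $\lambda^- = 0.30$, $\xi = 0.50$); the function name and the clamping to $[0, \Theta]$ are illustrative assumptions:

```python
def trust_update(T, s, R, D, lam_pos=0.10, lam_neg=0.30, xi=0.50):
    """One asymmetric trust step (TR-2 sketch).

    T: current trust in [0, 1]
    s: cooperation signal (positive = cooperative, negative = violation)
    R: reputation damage, giving the ceiling Theta = 1 - R
    D: dependency D_ij, amplifying erosion via (1 + xi * D)
    """
    theta = 1.0 - R                       # trust ceiling from reputation damage
    if s > 0:
        dT = lam_pos * s * (theta - T)    # slow building toward the ceiling
    else:
        dT = -lam_neg * abs(s) * T * (1.0 + xi * D)  # fast, amplified erosion
    return min(max(T + dT, 0.0), theta)   # assumed clamp to [0, Theta]

# The 3:1 ratio in action -- a unit violation erodes three times what
# a unit of cooperation builds, starting from T = 0.5:
built = trust_update(0.5, +1.0, R=0.0, D=0.0) - 0.5    # +0.05
eroded = 0.5 - trust_update(0.5, -1.0, R=0.0, D=0.0)   # +0.15
```

Setting `R > 0` in the same call demonstrates hysteresis: no sequence of positive signals can push `T` above `1 - R`.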
Reciprocity Dynamics (TR-4)
Reciprocity captures how agents condition current behavior on observed partner actions over a bounded memory window. Unlike slow-moving trust (TR-2), reciprocity enables fast behavioral responses within 1-10 steps.
Cooperation Signal (Equation 19):
\[s_{ij} = a_j - \bar{a}_j \quad \text{(deviation from recent average)}\]
Bounded Response (Equation 21):
\[\varphi(x) = \tanh(\kappa \cdot x) \quad \text{where } \kappa \text{ controls sensitivity}\]
Reciprocity Modifier (Equation 44):
\[U_{\text{recip},i} = \lambda_R \sum_{j \neq i} T_{ij} \cdot (1 + \omega D_{ij}) \cdot \rho_{ij} \cdot \varphi(s_{ij})\]
Key Property: Dependency-Scaled Reciprocity
\[\rho_{ij} = \rho_0 \cdot D_{ij}^{\eta} \quad \text{(higher dependency → stronger reciprocal response)}\]
Agents who depend more on a partner reciprocate more strongly, capturing why workers respond to wage changes more than employers respond to effort changes.
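Chaining Equations 19, 21, and 44 for a single partner gives the sketch below, with parameter values drawn from the typical ranges in the Common Parameters section (the function itself is illustrative, not part of the library API):

```python
import math

def reciprocity_term(a_j, recent_a_j, T, D,
                     rho0=1.0, eta=1.2, kappa=1.0, lam_R=1.0, omega=0.5):
    """Per-partner reciprocity contribution (TR-4 sketch, Eqs. 19, 21, 44).

    a_j:        partner's current cooperation level
    recent_a_j: partner's actions over the bounded memory window (k steps)
    """
    s = a_j - sum(recent_a_j) / len(recent_a_j)  # Eq. 19: cooperation signal
    phi = math.tanh(kappa * s)                   # Eq. 21: bounded response
    rho = rho0 * D ** eta                        # dependency-scaled reciprocity
    return lam_R * T * (1.0 + omega * D) * rho * phi

# Worker (high dependency) vs. employer (low dependency) reacting to the
# same +0.5 deviation from the partner's recent average:
worker = reciprocity_term(1.5, [1.0, 1.0, 1.0], T=0.8, D=0.8)
employer = reciprocity_term(1.5, [1.0, 1.0, 1.0], T=0.8, D=0.1)
```

With identical signals and trust, the high-dependency agent's response is an order of magnitude larger, which is the gift-exchange asymmetry the paragraph above describes.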
Empirical Validation
The mathematical framework has been validated against real business partnerships, open source projects, and platform ecosystems:
| Case Study | Validation Score | Key Dynamics Captured |
|---|---|---|
| Samsung-Sony S-LCD (2004-2011) | 58/60 (96.7%) | Interdependence, complementarity, cooperation levels |
| Renault-Nissan Alliance (1999-2025) | 49/60 (81.7%) | Trust evolution, crisis, recovery across 5 phases |
| Apache HTTP Server (1995-2023) | 52/60 (86.7%) | Loyalty dynamics, phase transitions, contributor effort |
| Apple iOS App Store (2008-2024) | 48/55 (87.3%) | Reciprocity dynamics, platform power, phase transitions |
These validations ensure the environments produce realistic coopetitive dynamics rather than artificial constructs.
Learn More: See Theoretical Foundations for complete mathematical derivations, Parameter Reference for validated values, and Benchmark Results for algorithm performance analysis.
Observation and Action Spaces
Observation Space
All environments provide observations containing:
| Component | Shape | Description |
|---|---|---|
| Actions | (N,) | All agents' cooperation levels |
| Trust Matrix | (N, N) | Pairwise trust levels |
| Reputation Matrix | (N, N) | Pairwise reputation damage |
| Interdependence | (N, N) | Structural dependencies |
| Step Count | (1,) | Normalized timestep |
Action Space
Continuous actions representing cooperation level:
Box(low=0.0, high=endowment_i, shape=(1,), dtype=float32)
Higher actions = more cooperation/investment.
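To make the table concrete, the components can be pictured as one flat vector. The packing order below is an illustrative assumption (the library's actual layout may differ); the shapes come straight from the table:

```python
import numpy as np

def flat_observation(actions, trust, reputation, interdep, step, max_steps):
    """Concatenate the tabled components into one flat float32 vector.

    Illustrative layout only -- shapes match the observation table:
    (N,) + (N,N) + (N,N) + (N,N) + (1,).
    """
    return np.concatenate([
        np.asarray(actions, dtype=np.float32),             # (N,) actions
        np.asarray(trust, dtype=np.float32).ravel(),       # (N, N) trust
        np.asarray(reputation, dtype=np.float32).ravel(),  # (N, N) reputation
        np.asarray(interdep, dtype=np.float32).ravel(),    # (N, N) dependencies
        np.array([step / max_steps], dtype=np.float32),    # (1,) normalized step
    ])

# N = 2 agents: 2 + 4 + 4 + 4 + 1 = 15 components
obs = flat_observation([50.0, 50.0], np.eye(2), np.zeros((2, 2)),
                       np.full((2, 2), 0.5), step=10, max_steps=100)
```

This also explains the `obs.shape` printed in the Verification snippet: the flat dimension grows quadratically with the number of agents because of the three $(N, N)$ matrices.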
Common Parameters
Trust Parameters
| Parameter | Symbol | Typical Range | Description |
|---|---|---|---|
| Trust Building Rate | $\lambda^+$ | 0.08 - 0.15 | Speed of trust increase |
| Trust Erosion Rate | $\lambda^-$ | 0.25 - 0.45 | Speed of trust decrease |
| Reputation Damage | $\mu_R$ | 0.45 - 0.70 | Damage from violations |
| Reputation Decay | $\delta_R$ | 0.01 - 0.03 | Forgetting rate |
| Interdependence Amp. | $\xi$ | 0.40 - 0.70 | Dependency amplification |
| Signal Sensitivity | $\kappa$ | 1.0 - 1.5 | Action sensitivity |
Value Function Parameters
| Parameter | Symbol | Typical Range | Description |
|---|---|---|---|
| Logarithmic Scale | θ | 18 - 25 | Value magnitude |
| Complementarity | γ | 0.50 - 0.75 | Synergy from cooperation |
| Power Exponent | β | 0.70 - 0.80 | Diminishing returns |
Reciprocity Parameters (TR-4)
| Parameter | Symbol | Typical Range | Description |
|---|---|---|---|
| Base Reciprocity | $\rho_0$ | 0.6 - 1.2 | Reciprocity strength |
| Dependency Elasticity | $\eta$ | 1.0 - 1.5 | How dependency scales reciprocity |
| Response Sensitivity | $\kappa$ | 0.8 - 1.0 | Bounded response steepness |
| Memory Window | $k$ | 3 - 10 | Steps of recent history |
| Reciprocity Weight | $\lambda_R$ | 1.0 - 1.8 | Overall reciprocity scaling |
| Dependency Amplification | $\omega$ | 0.5 - 1.0 | Dependency boost in trust gating |
API Reference
Factory Functions
coopetition_gym.make(env_id, **kwargs)
# Returns: Gymnasium-compatible environment
coopetition_gym.make_parallel(env_id, **kwargs)
# Returns: PettingZoo ParallelEnv
coopetition_gym.make_aec(env_id, **kwargs)
# Returns: PettingZoo AECEnv
coopetition_gym.list_environments()
# Returns: List of available environment IDs
Common Methods
env.reset(seed=None, options=None)
# Returns: (observation, info)
env.step(action)
# Returns: (observation, reward, terminated, truncated, info)
env.render()
# Returns: Rendered output (if render_mode set)
env.close()
# Cleanup resources
Research Applications
Coopetition-Gym supports research in:
- Multi-Agent Reinforcement Learning: Test MARL algorithms on strategic interaction problems
- Game Theory: Study equilibria in repeated games with trust dynamics
- Mechanism Design: Evaluate incentive structures for cooperation
- Organizational Behavior: Model partnership dynamics and alliance management
- AI Safety: Understand cooperation emergence and breakdown
Citation
If you use Coopetition-Gym in your research, please cite:
@software{coopetition_gym,
title = {Coopetition-Gym: Multi-Agent RL Environments for Strategic Coopetition},
author = {Pant, Vik and Yu, Eric},
year = {2025},
institution = {Faculty of Information and Department of Computer Science, University of Toronto},
url = {https://github.com/your-org/strategic-coopetition}
}
@article{pant2025tr1,
title = {Computational Foundations for Strategic Coopetition: Formalizing Interdependence and Complementarity},
author = {Pant, Vik and Yu, Eric},
journal = {arXiv preprint arXiv:2510.18802},
year = {2025}
}
@article{pant2025tr2,
title = {Computational Foundations for Strategic Coopetition: Formalizing Trust and Reputation Dynamics},
author = {Pant, Vik and Yu, Eric},
journal = {arXiv preprint arXiv:2510.24909},
year = {2025}
}
@article{pant2026tr3,
title = {Computational Foundations for Strategic Coopetition: Formalizing Collective Action and Loyalty},
author = {Pant, Vik and Yu, Eric},
journal = {arXiv preprint arXiv:2601.16237},
year = {2026}
}
@article{pant2026tr4,
title = {Computational Foundations for Strategic Coopetition: Formalizing Sequential Interaction and Reciprocity},
author = {Pant, Vik and Yu, Eric},
journal = {arXiv preprint arXiv:2604.01240},
year = {2026}
}
License
Coopetition-Gym is released under the MIT License.
Navigation
Getting Started
Reference
- Environment Finder - Interactive tool to match research questions to environments
- Environment Reference
- API Documentation
- Parameter Reference
Theory & Research
- Theoretical Foundations
- Benchmark Results
- Implementation Roadmap
- Scope and Strategic Roadmap - Modeling philosophy and future extensions
Development
Benchmark Highlights
We have evaluated 20 MARL algorithms across the 5 TR-1 environments and 5 TR-2 environments with 760 experiments totaling 76,000 evaluation episodes. Benchmarks for the 5 TR-3 collective action environments and 5 TR-4 reciprocity environments are forthcoming. Key findings:
| Finding | Implication |
|---|---|
| Simple heuristics (Constant_050) outperform all learning algorithms | Predictable cooperation builds trust |
| Trust-Return correlation: r = 0.552 | Trust strongly predicts performance |
| Population methods (Self-Play, FCP) fail catastrophically | Nash equilibria are Pareto-suboptimal |
| CTDE methods cluster together | Centralized critic dominates actor architecture |
See Benchmark Results for comprehensive analysis.