
Quick Start Tutorial

This tutorial introduces Coopetition-Gym through hands-on examples. You’ll learn to create environments, interact with them, and understand the core concepts.

Time: 15 minutes
Prerequisites: Python basics, NumPy familiarity
Before starting: Complete installation


Understanding the Action Model

Coopetition-Gym v1.x environments use the uniaxial treatment of coopetition: each agent chooses a single value representing its cooperation/investment level, from 0 (invest nothing) up to its full endowment.

This follows established game-theoretic traditions for modeling social dilemmas. Future versions will introduce biaxial action spaces where cooperation and competition are independent dimensions.
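Concretely, each agent's action is one scalar. For a two-agent environment such as TrustDilemma-v0 (introduced below), the joint action simply stacks the per-agent scalars; here is a minimal sketch, assuming the [0, 100] bounds shown later in this tutorial:

import numpy as np

# Each agent picks ONE number: its cooperation/investment level.
# 0.0 = invest nothing (pure defection/competition)
# 100.0 = invest the full endowment (pure cooperation)
cooperation_agent_0 = 60.0
cooperation_agent_1 = 55.0

# The joint action passed to env.step() stacks the per-agent scalars
joint_action = np.array([cooperation_agent_0, cooperation_agent_1])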


Your First Environment

Let’s create and interact with the TrustDilemma-v0 environment:

import coopetition_gym
import numpy as np

# Create the environment
env = coopetition_gym.make("TrustDilemma-v0")

# Reset with a seed for reproducibility
obs, info = env.reset(seed=42)

print(f"Observation shape: {obs.shape}")
print(f"Action space: {env.action_space}")
print(f"Number of agents: {env.n_agents}")

Expected output:

Observation shape: (17,)
Action space: Box(0.0, 100.0, (2,), float32)
Number of agents: 2
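The Box space means the joint action holds one float per agent, each bounded in [0.0, 100.0]. Using standard Gymnasium space methods, you can sample a random joint action or validate one you built by hand:

# Draw a random joint action from the space (one float per agent)
random_actions = env.action_space.sample()
print(random_actions)  # e.g. [37.4 81.2] (values vary)

# Check that a hand-built action lies inside the declared bounds
print(env.action_space.contains(np.array([60.0, 55.0], dtype=np.float32)))  # True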

Understanding the Observation

The observation contains the current state of the environment:

# Reset and examine observation
obs, info = env.reset(seed=42)

# The observation is a flat array with structure:
# [actions(2), trust_matrix(4), reputation_matrix(4), interdependence(4), metadata(3)]
print(f"Previous actions: {obs[0:2]}")
print(f"Trust matrix (flattened): {obs[2:6]}")
print(f"Reputation damage: {obs[6:10]}")
print(f"Interdependence: {obs[10:14]}")
print(f"Metadata: {obs[14:17]}")

Key components:

  - Previous actions (obs[0:2]): each agent's action from the last step
  - Trust matrix (obs[2:6]): pairwise trust levels, a flattened 2x2 matrix
  - Reputation damage (obs[6:10]): accumulated reputation harm, a flattened 2x2 matrix
  - Interdependence (obs[10:14]): mutual dependence between the agents, a flattened 2x2 matrix
  - Metadata (obs[14:17]): three auxiliary state values

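If you prefer named fields over raw slicing, a small helper can unpack the flat observation. This is a sketch based on the layout above (the 4-element blocks reshape to 2x2 matrices), not a library utility:

def unpack_obs(obs):
    """Split the flat 17-element observation into named components."""
    return {
        "prev_actions": obs[0:2],
        "trust": obs[2:6].reshape(2, 2),
        "reputation_damage": obs[6:10].reshape(2, 2),
        "interdependence": obs[10:14].reshape(2, 2),
        "metadata": obs[14:17],
    }

parts = unpack_obs(obs)
print(parts["trust"])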

Taking Actions

Actions represent cooperation levels - how much each agent invests in the partnership:

# Both agents choose cooperation levels
# Agent 0: 60 out of 100 endowment (60% cooperation)
# Agent 1: 55 out of 100 endowment (55% cooperation)
actions = np.array([60.0, 55.0])

# Take a step
obs, rewards, terminated, truncated, info = env.step(actions)

print(f"Rewards: Agent 0 = {rewards[0]:.2f}, Agent 1 = {rewards[1]:.2f}")
print(f"Current trust: {info['mean_trust']:.3f}")
print(f"Terminated: {terminated}, Truncated: {truncated}")

Action interpretation:

  - 0.0: full defection - invest nothing in the partnership
  - 100.0: full cooperation - invest the entire endowment
  - Intermediate values: partial cooperation (e.g., 60.0 is 60% of the endowment)

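Because actions are continuous, it is often convenient to think in fractions of the endowment. The helper below is purely illustrative (not part of the library) and clips to the valid range:

ENDOWMENT = 100.0

def cooperation_action(fraction):
    """Map a cooperation fraction in [0, 1] to a valid action value (illustrative)."""
    return float(np.clip(fraction * ENDOWMENT, 0.0, ENDOWMENT))

# 60% and 55% cooperation -> the same actions used above
actions = np.array([cooperation_action(0.60), cooperation_action(0.55)])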

Running a Complete Episode

Let’s run a full episode with a simple strategy:

env = coopetition_gym.make("TrustDilemma-v0")
obs, info = env.reset(seed=42)

total_rewards = np.zeros(2)
trust_history = []
step_count = 0

while True:
    # Simple strategy: both agents cooperate moderately
    actions = np.array([55.0, 55.0])

    obs, rewards, terminated, truncated, info = env.step(actions)
    total_rewards += rewards
    trust_history.append(info['mean_trust'])
    step_count += 1

    if terminated or truncated: break

print(f"Episode finished after {step_count} steps")
print(f"Total rewards: Agent 0 = {total_rewards[0]:.1f}, Agent 1 = {total_rewards[1]:.1f}")
print(f"Final trust: {trust_history[-1]:.3f}")
print(f"Trust range: [{min(trust_history):.3f}, {max(trust_history):.3f}]")

Understanding Rewards

Rewards come from integrated utility - a combination of individual value and partner outcomes:

env = coopetition_gym.make("TrustDilemma-v0")
obs, info = env.reset(seed=42)

# Compare different action profiles
scenarios = [
    ("Mutual cooperation", np.array([70.0, 70.0])),
    ("Mutual defection", np.array([20.0, 20.0])),
    ("Agent 0 defects", np.array([20.0, 70.0])),
    ("Agent 1 defects", np.array([70.0, 20.0])),
]

for name, actions in scenarios:
    obs, info = env.reset(seed=42)  # Reset to the same starting state
    obs, rewards, _, _, info = env.step(actions)
    print(f"{name}: R0={rewards[0]:.1f}, R1={rewards[1]:.1f}, Trust={info['mean_trust']:.3f}")

Key insight: High mutual cooperation yields better total rewards, but defecting while the partner cooperates can be individually tempting (the dilemma!).
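As a rough mental model, integrated utility blends an agent's own payoff with its partner's. The function and weight below are a hypothetical illustration; the actual weighting is environment-specific and not documented here:

def integrated_utility(own_value, partner_value, w=0.3):
    """Illustrative blend of own and partner payoffs (w is an assumed weight)."""
    return (1 - w) * own_value + w * partner_value

# With any w > 0, mutual cooperation lifts both rewards,
# but a defector still pockets most of its own short-term gain.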


Trust Dynamics

Trust is the key state variable that evolves based on actions:

env = coopetition_gym.make("TrustDilemma-v0")
obs, info = env.reset(seed=42)

print("Demonstrating trust dynamics:\n")

# Phase 1: Build trust through cooperation
print("Phase 1: Building trust (cooperation)")
for _ in range(10):
    obs, rewards, _, _, info = env.step(np.array([70.0, 70.0]))
print(f"  Trust after cooperation: {info['mean_trust']:.3f}")

# Phase 2: Erode trust through defection
print("\nPhase 2: Eroding trust (defection)")
for _ in range(5):
    obs, rewards, _, _, info = env.step(np.array([20.0, 20.0]))
print(f"  Trust after defection: {info['mean_trust']:.3f}")

# Phase 3: Attempt recovery
print("\nPhase 3: Attempting recovery")
for _ in range(10):
    obs, rewards, _, _, info = env.step(np.array([70.0, 70.0]))
print(f"  Trust after recovery attempt: {info['mean_trust']:.3f}")

Key properties:

  - Trust builds gradually under sustained cooperation
  - Trust erodes much faster than it builds (the 3:1 negativity bias)
  - Recovery after defection is possible, but slower than the initial build-up

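The 3:1 negativity bias can be pictured with a toy update rule. This is an assumption for building intuition, not the library's actual dynamics:

def toy_trust_update(trust, cooperated, lr=0.02, bias=3.0):
    """Toy asymmetric update: defection hurts ~3x more than cooperation helps."""
    if cooperated:
        return min(1.0, trust + lr)
    return max(0.0, trust - bias * lr)

trust = 0.5
for _ in range(10):
    trust = toy_trust_update(trust, cooperated=True)
print(f"After 10 cooperative steps: {trust:.2f}")  # climbs slowly

trust = toy_trust_update(trust, cooperated=False)
print(f"After 1 defection: {trust:.2f}")           # drops ~3x as fast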

Using Different APIs

Coopetition-Gym supports three APIs:

Gymnasium API (Default)

# Standard Gymnasium interface
env = coopetition_gym.make("TrustDilemma-v0")
obs, info = env.reset(seed=42)
actions = np.array([50.0, 50.0])  # Joint action array
obs, rewards, terminated, truncated, info = env.step(actions)

PettingZoo Parallel API

# For simultaneous-move multi-agent settings
env = coopetition_gym.make_parallel("TrustDilemma-v0")
observations, infos = env.reset(seed=42)

# Actions are dictionaries keyed by agent name
actions = {agent: env.action_space(agent).sample() for agent in env.agents}
observations, rewards, terminations, truncations, infos = env.step(actions)

print(f"Agents: {env.agents}")
print(f"Rewards: {rewards}")

PettingZoo AEC API

# For turn-based or sequential settings
env = coopetition_gym.make_aec("TrustDilemma-v0")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()

    env.step(action)

Exploring Available Environments

Coopetition-Gym provides 20 environments:

# List all available environments
envs = coopetition_gym.list_environments()
print("Available environments:")
for env_id in envs:
    print(f"  - {env_id}")

Environment Categories

Dyadic (2-agent partnerships): TrustDilemma-v0, PartnerHoldUp-v0
Ecosystem (N-agent markets): PlatformEcosystem-v0, DynamicPartnerSelection-v0
Benchmark (algorithm evaluation): RecoveryRace-v0, SynergySearch-v0
Case Study (validated real scenarios): SLCD-v0, RenaultNissan-v0
Extended (advanced mechanics): CooperativeNegotiation-v0, ReputationMarket-v0
Collective Action, TR-3 (team production with loyalty): TeamProduction-v0, LoyaltyTeam-v0, CoalitionFormation-v0, ApacheProject-v0, PublicGoods-v0
Reciprocity, TR-4 (memory-bounded reciprocity): ReciprocalDilemma-v0, GiftExchange-v0, IndirectReciprocity-v0, GraduatedSanction-v0, AppleAppStore-v0

Trying Different Environments

# Try the SLCD (Samsung-Sony) environment
env = coopetition_gym.make("SLCD-v0")
obs, info = env.reset(seed=42)
print(f"SLCD observation shape: {obs.shape}")

# Try an N-agent environment
env = coopetition_gym.make("PlatformEcosystem-v0", n_developers=4)
obs, info = env.reset(seed=42)
print(f"PlatformEcosystem observation shape: {obs.shape}")
print(f"Number of agents: {env.n_agents}")

Basic Strategy Implementation

Let’s implement a simple tit-for-tat strategy:

def tit_for_tat_episode(env, initial_action=60.0, num_steps=100):
    """Run episode with tit-for-tat strategy."""
    obs, info = env.reset(seed=42)

    # Start with the initial cooperation level for agent 0
    my_action = initial_action

    total_rewards = np.zeros(2)

    for step in range(num_steps):
        # Agent 0: Tit-for-tat (copy partner's last action)
        # Agent 1: Fixed moderate cooperation
        actions = np.array([my_action, 55.0])

        obs, rewards, terminated, truncated, info = env.step(actions)
        total_rewards += rewards

        # Update my_action to match partner's last action
        my_action = obs[1]  # Partner's last action

        if terminated or truncated: break

    return total_rewards, info

env = coopetition_gym.make("TrustDilemma-v0")
rewards, final_info = tit_for_tat_episode(env)
print(f"Tit-for-Tat results:")
print(f"  Total rewards: {rewards}")
print(f"  Final trust: {final_info['mean_trust']:.3f}")

Experiment: Cooperation vs. Defection

Run a comparison experiment:

def run_strategy(env_name, strategy_fn, episodes=5, seed=42):
    """Run multiple episodes with a strategy and return average rewards."""
    all_rewards = []

    for ep in range(episodes):
        env = coopetition_gym.make(env_name)
        obs, info = env.reset(seed=seed + ep)

        ep_rewards = np.zeros(2)
        for _ in range(100):
            actions = strategy_fn(obs, info)
            obs, rewards, terminated, truncated, info = env.step(actions)
            ep_rewards += rewards
            if terminated or truncated:
                break

        all_rewards.append(ep_rewards)
        env.close()

    return np.mean(all_rewards, axis=0)

# Define strategies
def always_cooperate(obs, info):
    return np.array([80.0, 80.0])

def always_defect(obs, info):
    return np.array([20.0, 20.0])

def mixed_strategy(obs, info):
    return np.array([50.0, 50.0])

# Compare strategies
print("Strategy Comparison on TrustDilemma-v0:")
print("-" * 45)

for name, fn in [("Always Cooperate", always_cooperate),
                 ("Always Defect", always_defect),
                 ("Mixed (50/50)", mixed_strategy)]:
    rewards = run_strategy("TrustDilemma-v0", fn)
    print(f"{name:20s}: Agent0={rewards[0]:7.1f}, Agent1={rewards[1]:7.1f}")

Summary

You’ve learned:

  1. Creating environments with coopetition_gym.make()
  2. Understanding observations - trust, reputation, interdependence
  3. Taking actions - cooperation levels from 0 to endowment
  4. Understanding rewards - integrated utility framework
  5. Trust dynamics - the 3:1 negativity bias
  6. Using different APIs - Gymnasium, PettingZoo Parallel, AEC

Next Steps

Explore the rest of the 20 environments listed above, implement your own strategies, and see the Technical Reports (TR-3, TR-4) behind the Collective Action and Reciprocity suites.


Troubleshooting

ImportError: No module named ‘coopetition_gym’

The package is not installed in the active Python environment. Revisit the installation instructions linked at the top of this tutorial and confirm you are running the same interpreter (or virtual environment) you installed into.

Observation shape doesn’t match expected

Observation shapes differ across environments and configurations; for example, PlatformEcosystem-v0 with n_developers=4 returns a larger observation than TrustDilemma-v0's (17,). Check which environment ID and keyword arguments you passed to make().
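A quick diagnostic, using standard Gymnasium attributes, is to compare the declared space with what reset() actually returns:

import coopetition_gym

env = coopetition_gym.make("TrustDilemma-v0")
obs, info = env.reset(seed=0)
print(env.observation_space)  # declared observation space
print(obs.shape)              # actual shape returned by reset()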

Episode terminates early

Episodes can end in two ways: terminated signals an environment-specific end condition, while truncated signals that the step limit was reached. Inspect both flags (and the info dict) to see which one fired.
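Continuing from any of the step loops above, you can check which flag ended the episode:

obs, rewards, terminated, truncated, info = env.step(actions)
if terminated:
    print("Environment-specific end condition reached; info:", info)
elif truncated:
    print("Step limit reached (time-limit truncation)")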
