Troubleshooting

Common issues and solutions when working with Coopetition-Gym.

Installation Issues

ModuleNotFoundError: No module named ‘coopetition_gym’

Cause: Package not installed or installed in different environment.

Solution:

# Verify installation
pip show coopetition_gym

# If not found, install
cd strategic-coopetition/coopetition_gym
pip install -e .

# Verify
python -c "import coopetition_gym; print(coopetition_gym.list_environments())"

Version Conflicts with Gymnasium/PettingZoo

Cause: Incompatible package versions.

Solution:

# Check versions
pip show gymnasium pettingzoo

# Upgrade to compatible versions
pip install "gymnasium>=0.29" "pettingzoo>=1.24"

# Reinstall coopetition_gym
pip install -e .

Required versions:

Python: 3.9+
Gymnasium: 0.29+
PettingZoo: 1.24+
NumPy: 1.21+

Import Errors with NumPy

Error: AttributeError: module 'numpy' has no attribute 'bool'

Cause: NumPy 2.0 removed deprecated aliases.

Solution:

# Downgrade NumPy (if needed)
pip install "numpy<2.0"

# Or upgrade coopetition_gym to latest version
pip install -e . --upgrade

Environment Creation Issues

ValueError: Unknown environment ID

Error: ValueError: Environment 'TrustDilema-v0' not found

Cause: Typo in environment name or environment not registered.

Solution:

import coopetition_gym

# List all available environments
print(coopetition_gym.list_environments())
# ['TrustDilemma-v0', 'PartnerHoldUp-v0', ...]

# Correct spelling
env = coopetition_gym.make("TrustDilemma-v0")  # Note: 'Dilemma' not 'Dilema'

TypeError: init() got an unexpected keyword argument

Cause: Using invalid parameter for environment.

Solution:

# Check valid parameters in documentation
# Example: TrustDilemma-v0 parameters
env = coopetition_gym.make(
    "TrustDilemma-v0",
    max_steps=100,           # Valid
    lambda_plus=0.10,        # Valid
    lambda_minus=0.30,       # Valid
    # invalid_param=1.0,     # Would cause error
)

Training Issues

NaN in Observations or Rewards

Cause: Numerical instability from extreme actions or parameter values.

Solution:

import numpy as np

# Clip actions to valid range
action = np.clip(raw_action, env.action_space.low, env.action_space.high)

# Check for NaN before stepping
if np.isnan(action).any(): action = env.action_space.sample()  # Fallback to random

obs, reward, terminated, truncated, info = env.step(action)

# Verify output
if np.isnan(obs).any() or np.isnan(reward).any(): print("Warning: NaN detected in environment output")
    obs, info = env.reset()  # Reset if corrupted

Trust Collapses Too Quickly

Cause: High erosion rate or inconsistent actions.

Symptoms:

Trust drops to 0 within first 10-20 steps
Episodes terminate early due to trust collapse

Solution:

# Use more conservative trust parameters
env = coopetition_gym.make(
    "TrustDilemma-v0",
    lambda_plus=0.12,   # Faster building
    lambda_minus=0.25,  # Slower erosion (still 2:1 ratio)
    initial_trust=0.60, # Start higher
)

# Ensure consistent cooperation during training
# Avoid large swings in action values

Agent Learns to Always Defect

Cause: Short-term reward dominates; agent doesn’t discover cooperation benefits.

Symptoms:

Actions converge to minimum (0 or near-0)
Episode returns plateau at low value
Trust always near 0

Solution:

# 1. Increase exploration
from stable_baselines3 import PPO

model = PPO(
    "MlpPolicy",
    env,
    ent_coef=0.05,  # Higher entropy for more exploration
)

# 2. Use longer horizon
env = coopetition_gym.make("TrustDilemma-v0", max_steps=200)

# 3. Reward shaping (add cooperation bonus during training)
# Note: Only for training, remove for evaluation
class CooperationBonus(gym.Wrapper): def step(self, action): obs, reward, term, trunc, info = self.env.step(action)
        coop_bonus = 0.1 * np.mean(action)  # Small bonus for cooperating
        return obs, reward + coop_bonus, term, trunc, info

Agents Don’t Coordinate

Cause: Independent learning without communication mechanism.

Symptoms:

Agents oscillate between cooperation and defection
Trust never stabilizes
Returns have high variance

Solution:

# 1. Try parameter sharing
# Use same policy network for both agents

# 2. Use centralized training (MAPPO, QMIX)
# Agents share information during training

# 3. Increase training time
# Coordination often emerges later in training
model.learn(total_timesteps=2_000_000)  # More than default

Memory and Performance Issues

CUDA Out of Memory

Error: RuntimeError: CUDA out of memory

Cause: Batch size or network too large for GPU.

Solution:

# Reduce batch size
model = PPO("MlpPolicy", env, batch_size=32)  # Down from 64

# Smaller network
model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[64, 64])  # Down from [128, 128]
)

# Force CPU training
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # Disable GPU

Training Too Slow

Cause: Inefficient environment stepping or excessive logging.

Solution:

# 1. Use vectorized environments
from stable_baselines3.common.vec_env import SubprocVecEnv

def make_env(seed): def _init(): env = coopetition_gym.make("TrustDilemma-v0")
        env.reset(seed=seed)
        return env
    return _init

env = SubprocVecEnv([make_env(i) for i in range(4)])  # 4 parallel envs

# 2. Reduce logging frequency
model.learn(total_timesteps=1_000_000, log_interval=100)  # Less frequent

# 3. Disable rendering during training
env = coopetition_gym.make("TrustDilemma-v0", render_mode=None)

High Memory Usage

Cause: Large replay buffer or storing unnecessary data.

Solution:

# Limit replay buffer (for off-policy algorithms)
from stable_baselines3 import SAC

model = SAC(
    "MlpPolicy",
    env,
    buffer_size=50_000,  # Down from default 1M
)

# Clear memory periodically
import gc
gc.collect()

# Monitor memory
import psutil
print(f"Memory: {psutil.Process().memory_info().rss / 1e9:.2f} GB")

API Issues

PettingZoo API Mismatch

Error: AttributeError: 'ParallelEnv' object has no attribute 'observation_spaces'

Cause: Using old PettingZoo API (pre-1.24).

Solution:

# Upgrade PettingZoo
pip install "pettingzoo>=1.24"

# New API uses methods, not properties
env = coopetition_gym.make_parallel("TrustDilemma-v0")

# Correct usage
obs_space = env.observation_space("agent_0")  # Method call
act_space = env.action_space("agent_0")       # Method call

# Not: env.observation_spaces["agent_0"]  # Old API

Action Space Mismatch

Error: AssertionError: Action outside of bounds

Cause: Action doesn’t match expected shape or range.

Solution:

env = coopetition_gym.make("TrustDilemma-v0")

# Check action space
print(f"Shape: {env.action_space.shape}")    # (2,) for dyadic
print(f"Low: {env.action_space.low}")        # [0., 0.]
print(f"High: {env.action_space.high}")      # [100., 100.]

# Ensure correct action format
action = np.array([50.0, 50.0], dtype=np.float32)

# Clip to valid range
action = np.clip(action, env.action_space.low, env.action_space.high)

AEC vs Parallel API Confusion

Error: Actions not accepted or wrong observation returned.

Cause: Mixing up AEC (sequential) and Parallel (simultaneous) APIs.

Solution:

# Parallel API: All agents act simultaneously
env = coopetition_gym.make_parallel("TrustDilemma-v0")
observations, infos = env.reset()
actions = {agent: env.action_space(agent).sample() for agent in env.agents}
observations, rewards, terminations, truncations, infos = env.step(actions)

# AEC API: Agents act sequentially
env = coopetition_gym.make_aec("TrustDilemma-v0")
env.reset()
for agent in env.agent_iter(): obs, reward, term, trunc, info = env.last()
    action = None if term or trunc else env.action_space(agent).sample()
    env.step(action)  # Single action, not dict

Reproducibility Issues

Different Results with Same Seed

Cause: Environment or algorithm has additional randomness sources.

Solution:

import numpy as np
import random
import torch

def set_all_seeds(seed):
    """Set all random seeds for reproducibility."""
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available(): torch.cuda.manual_seed_all(seed)

# Set seeds before creating environment
set_all_seeds(42)
env = coopetition_gym.make("TrustDilemma-v0")
obs, info = env.reset(seed=42)

# For Stable-Baselines3
from stable_baselines3.common.utils import set_random_seed
set_random_seed(42)

Results Vary Across Runs

Cause: GPU non-determinism or multi-threading.

Solution:

# Force deterministic operations (slower)
import torch
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

# Or use CPU only for exact reproducibility
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""

Debugging Tips

Visualize Trust Evolution

import matplotlib.pyplot as plt

env = coopetition_gym.make("TrustDilemma-v0")
obs, info = env.reset(seed=42)

trust_history = [info['mean_trust']]
action_history = []

for _ in range(100): action = np.array([50.0, 50.0])
    obs, reward, term, trunc, info = env.step(action)
    trust_history.append(info['mean_trust'])
    action_history.append(action.mean())

plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(trust_history)
plt.xlabel('Step')
plt.ylabel('Mean Trust')
plt.title('Trust Evolution')

plt.subplot(1, 2, 2)
plt.plot(action_history)
plt.xlabel('Step')
plt.ylabel('Mean Action')
plt.title('Cooperation Level')

plt.tight_layout()
plt.savefig('debug_trust.png')

Inspect Observation Structure

env = coopetition_gym.make("TrustDilemma-v0")
obs, info = env.reset(seed=42)

print(f"Observation shape: {obs.shape}")
print(f"Observation: {obs}")
print(f"Info keys: {info.keys()}")

# Decode observation components
print(f"Actions (0-1): {obs[0:2]}")
print(f"Trust matrix (2-5): {obs[2:6].reshape(2,2)}")
print(f"Reputation matrix (6-9): {obs[6:10].reshape(2,2)}")

Check Reward Scaling

# Run episode and check reward distribution
env = coopetition_gym.make("TrustDilemma-v0")
obs, info = env.reset(seed=42)

rewards = []
for _ in range(100): action = np.array([50.0, 50.0])
    obs, reward, term, trunc, info = env.step(action)
    rewards.append(reward)

rewards = np.array(rewards)
print(f"Reward mean: {rewards.mean():.2f}")
print(f"Reward std: {rewards.std():.2f}")
print(f"Reward min: {rewards.min():.2f}")
print(f"Reward max: {rewards.max():.2f}")

Getting Help

If issues persist: 1. Check documentation: Environment Reference, API Documentation

Search issues: GitHub Issues
Open new issue, Include: - Python version
- Package versions (pip freeze)
- Minimal reproducing code
- Full error traceback
- Expected vs. actual behavior

Coopetition-Gym

Multi-agent reinforcement learning environments for studying mixed-motive coopetitive dynamics. Twenty environments organised into four mechanism classes, with reward-type ablation methodology and four validated case studies.

Troubleshooting

Installation Issues

ModuleNotFoundError: No module named ‘coopetition_gym’

Version Conflicts with Gymnasium/PettingZoo

Import Errors with NumPy

Environment Creation Issues

ValueError: Unknown environment ID

TypeError: init() got an unexpected keyword argument

Training Issues

NaN in Observations or Rewards

Trust Collapses Too Quickly

Agent Learns to Always Defect

Agents Don’t Coordinate

Memory and Performance Issues

CUDA Out of Memory

Training Too Slow

High Memory Usage

API Issues

PettingZoo API Mismatch

Action Space Mismatch

AEC vs Parallel API Confusion

Reproducibility Issues

Different Results with Same Seed

Results Vary Across Runs

Debugging Tips

Visualize Trust Evolution

Inspect Observation Structure

Check Reward Scaling

Getting Help

See Also