Skip to the content.

Troubleshooting

Common issues and solutions when working with Coopetition-Gym.


Installation Issues

ModuleNotFoundError: No module named ‘coopetition_gym’

Cause: Package not installed or installed in different environment.

Solution:

# Verify installation
pip show coopetition_gym

# If not found, install
cd strategic-coopetition/coopetition_gym
pip install -e .

# Verify
python -c "import coopetition_gym; print(coopetition_gym.list_environments())"

Version Conflicts with Gymnasium/PettingZoo

Cause: Incompatible package versions.

Solution:

# Check versions
pip show gymnasium pettingzoo

# Upgrade to compatible versions
pip install "gymnasium>=0.29" "pettingzoo>=1.24"

# Reinstall coopetition_gym
pip install -e .

Required versions:

Import Errors with NumPy

Error: AttributeError: module 'numpy' has no attribute 'bool'

Cause: NumPy 2.0 removed deprecated aliases.

Solution:

# Downgrade NumPy (if needed)
pip install "numpy<2.0"

# Or upgrade coopetition_gym to latest version
pip install -e . --upgrade

Environment Creation Issues

ValueError: Unknown environment ID

Error: ValueError: Environment 'TrustDilema-v0' not found

Cause: Typo in environment name or environment not registered.

Solution:

import coopetition_gym

# List all available environments
print(coopetition_gym.list_environments())
# ['TrustDilemma-v0', 'PartnerHoldUp-v0', ...]

# Correct spelling
env = coopetition_gym.make("TrustDilemma-v0")  # Note: 'Dilemma' not 'Dilema'

TypeError: init() got an unexpected keyword argument

Cause: Using invalid parameter for environment.

Solution:

# Check valid parameters in documentation
# Example: TrustDilemma-v0 parameters
env = coopetition_gym.make(
    "TrustDilemma-v0",
    max_steps=100,           # Valid
    lambda_plus=0.10,        # Valid
    lambda_minus=0.30,       # Valid
    # invalid_param=1.0,     # Would cause error
)

Training Issues

NaN in Observations or Rewards

Cause: Numerical instability from extreme actions or parameter values.

Solution:

import numpy as np

# Clip actions to valid range
action = np.clip(raw_action, env.action_space.low, env.action_space.high)

# Check for NaN before stepping
if np.isnan(action).any(): action = env.action_space.sample()  # Fallback to random

obs, reward, terminated, truncated, info = env.step(action)

# Verify output
if np.isnan(obs).any() or np.isnan(reward).any(): print("Warning: NaN detected in environment output")
    obs, info = env.reset()  # Reset if corrupted

Trust Collapses Too Quickly

Cause: High erosion rate or inconsistent actions.

Symptoms:

Solution:

# Use more conservative trust parameters
env = coopetition_gym.make(
    "TrustDilemma-v0",
    lambda_plus=0.12,   # Faster building
    lambda_minus=0.25,  # Slower erosion (still 2:1 ratio)
    initial_trust=0.60, # Start higher
)

# Ensure consistent cooperation during training
# Avoid large swings in action values

Agent Learns to Always Defect

Cause: Short-term reward dominates; agent doesn’t discover cooperation benefits.

Symptoms:

Solution:

# 1. Increase exploration
from stable_baselines3 import PPO

model = PPO(
    "MlpPolicy",
    env,
    ent_coef=0.05,  # Higher entropy for more exploration
)

# 2. Use longer horizon
env = coopetition_gym.make("TrustDilemma-v0", max_steps=200)

# 3. Reward shaping (add cooperation bonus during training)
# Note: Only for training, remove for evaluation
class CooperationBonus(gym.Wrapper): def step(self, action): obs, reward, term, trunc, info = self.env.step(action)
        coop_bonus = 0.1 * np.mean(action)  # Small bonus for cooperating
        return obs, reward + coop_bonus, term, trunc, info

Agents Don’t Coordinate

Cause: Independent learning without communication mechanism.

Symptoms:

Solution:

# 1. Try parameter sharing
# Use same policy network for both agents

# 2. Use centralized training (MAPPO, QMIX)
# Agents share information during training

# 3. Increase training time
# Coordination often emerges later in training
model.learn(total_timesteps=2_000_000)  # More than default

Memory and Performance Issues

CUDA Out of Memory

Error: RuntimeError: CUDA out of memory

Cause: Batch size or network too large for GPU.

Solution:

# Reduce batch size
model = PPO("MlpPolicy", env, batch_size=32)  # Down from 64

# Smaller network
model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[64, 64])  # Down from [128, 128]
)

# Force CPU training
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # Disable GPU

Training Too Slow

Cause: Inefficient environment stepping or excessive logging.

Solution:

# 1. Use vectorized environments
from stable_baselines3.common.vec_env import SubprocVecEnv

def make_env(seed): def _init(): env = coopetition_gym.make("TrustDilemma-v0")
        env.reset(seed=seed)
        return env
    return _init

env = SubprocVecEnv([make_env(i) for i in range(4)])  # 4 parallel envs

# 2. Reduce logging frequency
model.learn(total_timesteps=1_000_000, log_interval=100)  # Less frequent

# 3. Disable rendering during training
env = coopetition_gym.make("TrustDilemma-v0", render_mode=None)

High Memory Usage

Cause: Large replay buffer or storing unnecessary data.

Solution:

# Limit replay buffer (for off-policy algorithms)
from stable_baselines3 import SAC

model = SAC(
    "MlpPolicy",
    env,
    buffer_size=50_000,  # Down from default 1M
)

# Clear memory periodically
import gc
gc.collect()

# Monitor memory
import psutil
print(f"Memory: {psutil.Process().memory_info().rss / 1e9:.2f} GB")

API Issues

PettingZoo API Mismatch

Error: AttributeError: 'ParallelEnv' object has no attribute 'observation_spaces'

Cause: Using old PettingZoo API (pre-1.24).

Solution:

# Upgrade PettingZoo
pip install "pettingzoo>=1.24"

# New API uses methods, not properties
env = coopetition_gym.make_parallel("TrustDilemma-v0")

# Correct usage
obs_space = env.observation_space("agent_0")  # Method call
act_space = env.action_space("agent_0")       # Method call

# Not: env.observation_spaces["agent_0"]  # Old API

Action Space Mismatch

Error: AssertionError: Action outside of bounds

Cause: Action doesn’t match expected shape or range.

Solution:

env = coopetition_gym.make("TrustDilemma-v0")

# Check action space
print(f"Shape: {env.action_space.shape}")    # (2,) for dyadic
print(f"Low: {env.action_space.low}")        # [0., 0.]
print(f"High: {env.action_space.high}")      # [100., 100.]

# Ensure correct action format
action = np.array([50.0, 50.0], dtype=np.float32)

# Clip to valid range
action = np.clip(action, env.action_space.low, env.action_space.high)

AEC vs Parallel API Confusion

Error: Actions not accepted or wrong observation returned.

Cause: Mixing up AEC (sequential) and Parallel (simultaneous) APIs.

Solution:

# Parallel API: All agents act simultaneously
env = coopetition_gym.make_parallel("TrustDilemma-v0")
observations, infos = env.reset()
actions = {agent: env.action_space(agent).sample() for agent in env.agents}
observations, rewards, terminations, truncations, infos = env.step(actions)

# AEC API: Agents act sequentially
env = coopetition_gym.make_aec("TrustDilemma-v0")
env.reset()
for agent in env.agent_iter(): obs, reward, term, trunc, info = env.last()
    action = None if term or trunc else env.action_space(agent).sample()
    env.step(action)  # Single action, not dict

Reproducibility Issues

Different Results with Same Seed

Cause: Environment or algorithm has additional randomness sources.

Solution:

import numpy as np
import random
import torch

def set_all_seeds(seed):
    """Set all random seeds for reproducibility."""
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available(): torch.cuda.manual_seed_all(seed)

# Set seeds before creating environment
set_all_seeds(42)
env = coopetition_gym.make("TrustDilemma-v0")
obs, info = env.reset(seed=42)

# For Stable-Baselines3
from stable_baselines3.common.utils import set_random_seed
set_random_seed(42)

Results Vary Across Runs

Cause: GPU non-determinism or multi-threading.

Solution:

# Force deterministic operations (slower)
import torch
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

# Or use CPU only for exact reproducibility
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""

Debugging Tips

Visualize Trust Evolution

import matplotlib.pyplot as plt

env = coopetition_gym.make("TrustDilemma-v0")
obs, info = env.reset(seed=42)

trust_history = [info['mean_trust']]
action_history = []

for _ in range(100): action = np.array([50.0, 50.0])
    obs, reward, term, trunc, info = env.step(action)
    trust_history.append(info['mean_trust'])
    action_history.append(action.mean())

plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(trust_history)
plt.xlabel('Step')
plt.ylabel('Mean Trust')
plt.title('Trust Evolution')

plt.subplot(1, 2, 2)
plt.plot(action_history)
plt.xlabel('Step')
plt.ylabel('Mean Action')
plt.title('Cooperation Level')

plt.tight_layout()
plt.savefig('debug_trust.png')

Inspect Observation Structure

env = coopetition_gym.make("TrustDilemma-v0")
obs, info = env.reset(seed=42)

print(f"Observation shape: {obs.shape}")
print(f"Observation: {obs}")
print(f"Info keys: {info.keys()}")

# Decode observation components
print(f"Actions (0-1): {obs[0:2]}")
print(f"Trust matrix (2-5): {obs[2:6].reshape(2,2)}")
print(f"Reputation matrix (6-9): {obs[6:10].reshape(2,2)}")

Check Reward Scaling

# Run episode and check reward distribution
env = coopetition_gym.make("TrustDilemma-v0")
obs, info = env.reset(seed=42)

rewards = []
for _ in range(100): action = np.array([50.0, 50.0])
    obs, reward, term, trunc, info = env.step(action)
    rewards.append(reward)

rewards = np.array(rewards)
print(f"Reward mean: {rewards.mean():.2f}")
print(f"Reward std: {rewards.std():.2f}")
print(f"Reward min: {rewards.min():.2f}")
print(f"Reward max: {rewards.max():.2f}")

Getting Help

If issues persist: 1. Check documentation: Environment Reference, API Documentation

  1. Search issues: GitHub Issues
  2. Open new issue, Include: - Python version
    • Package versions (pip freeze)
    • Minimal reproducing code
    • Full error traceback
    • Expected vs. actual behavior

See Also