
LoyaltyTeam-v0

Category: Collective Action Environment (TR-3)
Agents: 4 (configurable)
Difficulty: Intermediate-Advanced
Source: coopetition_gym/envs/collective_action_envs.py


Overview

LoyaltyTeam-v0 implements team production with full TR-3 loyalty mechanisms. Unlike the baseline TeamProduction-v0, agents with high loyalty receive bonuses proportional to teammate welfare, creating positive-sum dynamics that can sustain cooperation above the Nash equilibrium.

This environment tests whether loyalty dynamics can overcome the free-rider problem and sustain cooperation in team settings.


MARL Classification

| Property | Value |
|----------|-------|
| Game Type | N-player Markov Game (general-sum) |
| Cooperation Structure | Mixed-Motive with loyalty amplification |
| Observability | Full |
| Communication | Implicit (through actions) |
| Agent Symmetry | Symmetric (capabilities), asymmetric (loyalty states) |
| Reward Structure | Team share + loyalty modifier |
| Action Space | Continuous: $A_i = [0, 50]$ |
| State Dynamics | Deterministic with loyalty evolution |
| Horizon | Finite, T = 100 steps |
| Canonical Comparison | Behavioral team theory; TR-3 loyalty model |

Formal Specification

Mathematical Framework (TR-3)

Team Production Function: \(Q(\mathbf{a}) = \omega \cdot \left(\sum_{i=1}^{n} a_i\right)^\beta\)

Loyalty Modifier: \(L_i = \theta_i \cdot \left[\phi_B \cdot \bar{\pi}_{-i} + \phi_C \cdot c \cdot a_i\right]\)

Where:

  - $\theta_i \in [0, 1]$ is agent $i$'s loyalty score,
  - $\phi_B$ is the loyalty benefit strength (weight on teammate welfare),
  - $\phi_C$ is the cost tolerance strength (weight on own costly effort),
  - $\bar{\pi}_{-i}$ is the mean payoff of agent $i$'s teammates,
  - $c$ is the effort cost coefficient and $a_i$ is agent $i$'s effort.

Loyalty-Augmented Utility: \(U_i = \pi_i^{team} + L_i\)
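As a worked example, the three formulas above can be evaluated directly with the default parameters from the Parameters table below. The symmetric effort profile and the equal-share payoff rule are illustrative assumptions for this sketch, not the environment's documented sharing rule (which lives in collective_action_envs.py).

```python
# Default parameters (see the Parameters table): omega, beta, c, phi_B, phi_C
omega, beta, c = 25.0, 0.7, 1.0
phi_B, phi_C = 0.8, 0.3
n = 4

# Hypothetical symmetric profile: every agent exerts effort 40, full loyalty
a = [40.0] * n
theta_i = 1.0

# Team production: Q(a) = omega * (sum of efforts)^beta
Q = omega * sum(a) ** beta

# ASSUMPTION: equal sharing of team output across the n agents
pi_team = Q / n
pi_bar_others = pi_team  # symmetric profile, so teammates' mean payoff matches

# Loyalty modifier: L_i = theta_i * (phi_B * mean teammate payoff + phi_C * c * a_i)
L_i = theta_i * (phi_B * pi_bar_others + phi_C * c * a[0])

# Loyalty-augmented utility: U_i = pi_i^team + L_i
U_i = pi_team + L_i
```

Under these assumptions the loyalty bonus $L_i$ is a large fraction of the team share itself, which is what drives the above-Nash dynamics discussed below.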

Loyalty Evolution

Loyalty scores evolve based on cooperation:

if cooperation_rate >= 0.5:
    loyalty += 0.02 * cooperation_rate           # Build slowly
else:
    loyalty -= 0.05 * (0.5 - cooperation_rate)   # Erode faster
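To see how this rule behaves over a full horizon, it can be iterated directly. The clamp to [0, 1] reflects the documented range of loyalty scores, and the neutral starting value of 0.5 matches the early-phase description; the cooperation rates chosen here are illustrative.

```python
def update_loyalty(loyalty, cooperation_rate):
    """One step of the documented loyalty update, clamped to [0, 1]."""
    if cooperation_rate >= 0.5:
        loyalty += 0.02 * cooperation_rate           # build slowly
    else:
        loyalty -= 0.05 * (0.5 - cooperation_rate)   # erode faster
    return min(1.0, max(0.0, loyalty))

loyal, disloyal = 0.5, 0.5  # neutral starting scores
for _ in range(100):
    loyal = update_loyalty(loyal, 0.9)       # sustained high cooperation
    disloyal = update_loyalty(disloyal, 0.1)  # sustained low cooperation
```

Sustained high cooperation saturates loyalty at 1.0 well before the 100-step horizon ends, while sustained low cooperation drives it to 0.0 even faster, consistent with the asymmetric build/erode rates.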

Key Insight

High-loyalty agents ($\theta_i \approx 1$) receive substantial bonuses when:

  1. Teammates are doing well ($\phi_B \cdot \bar{\pi}_{-i}$)
  2. They contribute high effort themselves ($\phi_C \cdot c \cdot a_i$)

This creates intrinsic motivation for sustained cooperation.


Environment Specification

Basic Usage

import coopetition_gym
import numpy as np

# Create environment with default loyalty parameters
env = coopetition_gym.make("LoyaltyTeam-v0")

# Or customize loyalty strengths
env = coopetition_gym.make(
    "LoyaltyTeam-v0",
    phi_B=0.9,  # Strong teammate-welfare bonus
    phi_C=0.4,  # Moderate cost tolerance
)

obs, info = env.reset(seed=42)

# Sustained cooperation builds loyalty
for step in range(50):
    # High cooperation
    actions = np.array([40.0, 40.0, 40.0, 40.0])
    obs, rewards, terminated, truncated, info = env.step(actions)

print(f"Mean loyalty after 50 steps: {info['mean_loyalty']:.2f}")
print(f"Loyalty lift over Nash: {info['loyalty_lift']:.2f}x")

Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| n_agents | 4 | Number of team members |
| omega | 25.0 | Productivity factor |
| beta | 0.7 | Returns to scale |
| c | 1.0 | Effort cost coefficient |
| phi_B | 0.8 | Loyalty benefit strength |
| phi_C | 0.3 | Cost tolerance strength |
| max_steps | 100 | Maximum timesteps |
| render_mode | None | Rendering mode |

Spaces

Observation Space

Type: Box
Dtype: float32

Includes actions, trust matrix, reputation, interdependence, loyalty scores, and step info.

Action Space

Type: Box
Shape: (n_agents,)
Dtype: float32
Range: [0.0, 50.0] for each agent

Uniaxial Treatment: This environment uses the single-dimension action space characteristic of Coopetition-Gym v1.x. Competition emerges through the free-rider dynamic modulated by loyalty mechanisms rather than explicit competitive actions.


Metrics and Info

The info dictionary contains:

| Key | Type | Description |
|-----|------|-------------|
| step | int | Current timestep |
| team_output | float | Q(a) = ω·(Σaᵢ)^β |
| nash_equilibrium | float | Theoretical Nash effort |
| social_optimum | float | Theoretical optimal effort |
| mean_loyalty | float | Average loyalty score |
| loyalty_scores | list | Per-agent loyalty in [0, 1] |
| loyalty_lift | float | Output ratio vs. Nash baseline |
| team_cohesion | float | Weighted loyalty metric |
| free_rider_count | int | Agents below threshold |
| efficiency_ratio | float | Actual/optimal effort ratio |

Key Dynamics

Loyalty Building

Sustained cooperation builds loyalty over time:

| Behavior | Loyalty Change |
|----------|----------------|
| High cooperation (>50%) | +0.02 × rate |
| Low cooperation (<50%) | -0.05 × deficit |
| Sustained high | Approaches 1.0 |
| Sustained low | Approaches 0.0 |

Loyalty Lift

The key metric measuring TR-3 effectiveness:

\[\text{Loyalty Lift} = \frac{\text{Actual Team Output}}{\text{Nash Equilibrium Output}}\]

With effective loyalty mechanisms, teams can achieve lift > 2.0.
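Because the numerator and denominator use the same production function, ω cancels and the lift reduces to the ratio of total efforts raised to β. A quick sketch, using an illustrative Nash effort level (the environment's actual value is reported in info['nash_equilibrium']):

```python
# Loyalty lift under Q(a) = omega * (sum a)^beta; omega cancels in the ratio.
omega, beta, n = 25.0, 0.7, 4

def team_output(effort_each):
    """Team output for a symmetric effort profile."""
    return omega * (n * effort_each) ** beta

nash_effort = 10.0  # ILLUSTRATIVE value, not the env's computed equilibrium
coop_effort = 40.0  # sustained cooperative effort, as in the usage example

lift = team_output(coop_effort) / team_output(nash_effort)
# lift = (coop_effort / nash_effort) ** beta = 4 ** 0.7, roughly 2.64
```

Note that with β < 1 the lift grows sublinearly in the effort ratio, so clearing the 2.0 threshold requires a substantial gap between cooperative and Nash effort.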

Phase Dynamics

  1. Early phase (steps 1-20): Loyalty neutral (~0.5), behavior determines trajectory
  2. Middle phase (steps 20-60): Loyalty diverges based on cooperation patterns
  3. Late phase (steps 60-100): Loyalty stable, determines final outcomes

Comparison with TeamProduction-v0

| Aspect | TeamProduction-v0 | LoyaltyTeam-v0 |
|--------|-------------------|----------------|
| Loyalty mechanisms | None | Full TR-3 |
| Expected equilibrium | Nash (low) | Above Nash possible |
| Free-rider penalty | Light | Moderate + loyalty erosion |
| Cooperation incentive | External only | Intrinsic (loyalty bonus) |
| Typical output | ~baseline | 1.5-2.5× baseline |

Research Applications

LoyaltyTeam-v0 is suitable for studying:

  - Loyalty-driven cooperation above the Nash equilibrium
  - Free-rider mitigation through intrinsic (loyalty-based) incentives
  - Long-horizon loyalty building and erosion dynamics
  - Ablations against the loyalty-free TeamProduction-v0 baseline


Example: Loyalty Evolution Tracking

import coopetition_gym
import numpy as np

env = coopetition_gym.make("LoyaltyTeam-v0")
obs, info = env.reset(seed=42)

loyalty_history = []

for step in range(100):
    # Cooperative strategy
    actions = np.array([35.0, 35.0, 35.0, 35.0])
    obs, rewards, terminated, truncated, info = env.step(actions)

    loyalty_history.append(info['mean_loyalty'])

    if terminated or truncated: break

# Loyalty should increase over time with cooperation
print(f"Initial loyalty: {loyalty_history[0]:.2f}")
print(f"Final loyalty: {loyalty_history[-1]:.2f}")
print(f"Loyalty lift: {info['loyalty_lift']:.2f}x")


References

  1. Pant, V. & Yu, E. (2026). Computational Foundations for Strategic Coopetition: Formalizing Collective Action and Loyalty. arXiv:2601.16237
  2. Akerlof, G. & Kranton, R. (2010). Identity Economics. Princeton University Press.
  3. Kandel, E. & Lazear, E. (1992). Peer Pressure and Partnerships. Journal of Political Economy, 100(4), 801-817.