ReputationMarket-v0
Category: Extended Environment
Agents: N (configurable)
Difficulty: Advanced
Source: coopetition_gym/envs/extended_envs.py
Overview
ReputationMarket-v0 models an N-agent competitive market where public reputation scores directly affect agent rewards through a tiered bonus system. High-reputation agents receive premium bonuses while low-reputation agents face penalties.
This environment tests:
- Reputation as strategic asset: Long-term investment in standing
- Market tier dynamics: Stratification effects
- Reputation competition: Positional goods
- Equilibrium in reputation games: Stable market states
Figure: (left) four-tier reward structure showing the Premium (1.30×), Standard (1.00×), Probation (0.70×), and Excluded (0.40×) multipliers; (right) agent trajectories as reputations evolve toward different tiers.
MARL Classification
| Property | Value |
|---|---|
| Game Type | Markov Game (N-player, general-sum) with tiered reward modifiers |
| Cooperation Structure | Competitive-Cooperative (reputation competition, cooperation builds reputation) |
| Observability | Full (including public reputation scores and tier assignments) |
| Communication | Implicit (actions + public reputation signals) |
| Agent Symmetry | Symmetric (homogeneous endowments and capabilities) |
| Reward Structure | Mixed + tier-based multipliers (0.40× to 1.30×) |
| Action Space | Continuous: A_i = [0, 100] |
| State Dynamics | Deterministic with discrete tier transitions |
| Horizon | Finite, T = 100 |
| Canonical Comparison | Tiered reputation markets; cf. Shapiro (1983), Tadelis (1999) reputation economics |
Formal Specification
This environment is formalized as an N-player Markov Game with tiered reputation bonuses.
Agents
Agents are indexed i ∈ {1, …, N}, where N = n_agents (default 5); all agents are symmetric:
| Property | Value |
|---|---|
| Endowment | 100.0 |
| Baseline | 35.0 |
| Bargaining α | 1/N |
State Space
S ⊆ ℝ^d, where d = (N + 3N² + 1) + N = 3N² + 2N + 1 (standard components plus the public reputation vector)
| Component | Dimension | Description |
|---|---|---|
| Actions | N | Previous cooperation levels |
| Trust Matrix | N² | Pairwise trust |
| Reputation Damage | N² | Pairwise damage |
| Interdependence | N² | Uniform D = 0.35 |
| Timestep | 1 | Normalized t/T |
| Public Reputations | N | ρ_i ∈ [0, 1] |
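As a quick check of the dimension formula, the following sketch adds up the components listed in the table. The function name `state_dim` is hypothetical and is not part of the coopetition_gym API.

```python
def state_dim(n_agents: int) -> int:
    """State dimension from the component table: d = N + 3N^2 + 1 + N."""
    n = n_agents
    actions = n                # previous cooperation levels
    matrices = 3 * n * n       # trust, reputation damage, interdependence
    timestep = 1               # normalized t/T
    public_reputations = n     # one public score per agent
    return actions + matrices + timestep + public_reputations


for n in (5, 10, 20):
    print(n, state_dim(n))
```

The 3N² term dominates, so the state grows roughly quadratically with the number of agents.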
Action Space
A_i = [0, 100] ⊂ ℝ for each agent
Uniaxial Treatment: This environment uses the single-dimension action space characteristic of Coopetition-Gym v1.x. Market competition emerges through reputation tier dynamics rather than explicit competitive actions.
Interdependence Matrix
D_ij = 0.35 for all i ≠ j
D_ii = 0.00
Moderate, uniform interdependence across the market.
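A minimal NumPy sketch of this matrix, for readers who want to reproduce it outside the environment. The helper name `uniform_interdependence` is an assumption made for illustration; the environment may build its matrix differently.

```python
import numpy as np


def uniform_interdependence(n_agents: int, strength: float = 0.35) -> np.ndarray:
    """Return an N x N matrix with `strength` off the diagonal and 0 on it."""
    d = np.full((n_agents, n_agents), strength)
    np.fill_diagonal(d, 0.0)   # no self-interdependence: D_ii = 0
    return d


D = uniform_interdependence(5)
print(D)
```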
Reputation Tier System
| Tier | Reputation Threshold | Reward Multiplier | Interpretation |
|---|---|---|---|
| Premium | ρ ≥ 0.80 | 1.30× | Elite status, 30% bonus |
| Standard | 0.50 ≤ ρ < 0.80 | 1.00× | Normal market access |
| Probation | 0.25 ≤ ρ < 0.50 | 0.70× | Restricted, 30% penalty |
| Excluded | ρ < 0.25 | 0.40× | Severely limited, 60% penalty |
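The tier table can be read as a lookup on inclusive lower bounds. The sketch below shows one way to express it with the standard-library `bisect` module. The names `get_tier` and `tier_multiplier` and the constants are illustrative assumptions, not the environment's actual implementation.

```python
from bisect import bisect_right

# Inclusive lower bounds of the Probation, Standard and Premium tiers.
THRESHOLDS = [0.25, 0.50, 0.80]
TIERS = ["Excluded", "Probation", "Standard", "Premium"]
MULTIPLIERS = {"Excluded": 0.40, "Probation": 0.70, "Standard": 1.00, "Premium": 1.30}


def get_tier(reputation: float) -> str:
    """Map a reputation in [0, 1] to its tier name."""
    # bisect_right counts how many thresholds are <= reputation.
    return TIERS[bisect_right(THRESHOLDS, reputation)]


def tier_multiplier(reputation: float) -> float:
    """Reward multiplier for the tier that `reputation` falls into."""
    return MULTIPLIERS[get_tier(reputation)]


for rho in (0.10, 0.25, 0.50, 0.79, 0.80):
    print(rho, get_tier(rho), tier_multiplier(rho))
```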
Reputation Dynamics
ρ_i(t+1) = clip(ρ_i(t) + 0.1 · (a_i/e_i - 0.5), 0, 1)
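Here a_i is agent i's action and e_i its endowment, so a_i/e_i is the share of the endowment contributed. Contributing more than 50% builds reputation and less than 50% erodes it; a single step changes ρ_i by at most ±0.05.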
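To get a feel for how quickly reputation moves, the update rule above can be iterated with a constant action. This is a sketch under the stated dynamics only; `update_reputation` and `steps_to_premium` are hypothetical helper names, and the endowment of 100.0 is taken from the agent table.

```python
from typing import Optional


def update_reputation(reputation: float, action: float, endowment: float = 100.0) -> float:
    """One step of the rule: rho + 0.1 * (a/e - 0.5), clipped to [0, 1]."""
    new_rep = reputation + 0.1 * (action / endowment - 0.5)
    return min(1.0, max(0.0, new_rep))


def steps_to_premium(action: float, start: float = 0.50, max_steps: int = 100) -> Optional[int]:
    """Steps of constant `action` until reputation reaches 0.80, or None if it never does."""
    rep = start
    for step in range(1, max_steps + 1):
        rep = update_reputation(rep, action)
        if rep >= 0.80:
            return step
    return None


print(steps_to_premium(90.0))   # each step adds 0.04
print(steps_to_premium(50.0))   # reputation never moves at exactly 50%
```

With a constant action of 90, reputation rises by 0.04 per step and crosses the Premium threshold on the eighth step; at an action of exactly 50 it stays at its starting value for the whole episode.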
Reward Function
Rewards are tier-multiplied:
r_i = multiplier(ρ_i) × [π_i + 0.35 · Σ_{j≠i} π_j]
where multiplier is determined by tier assignment.
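The reward formula can be evaluated for all agents at once. The sketch below is a vectorized reading of it, assuming the base payoffs π are already available as an array. The payoff values are made up for illustration, and the function names are not taken from the library.

```python
import numpy as np

INTERDEPENDENCE = 0.35  # uniform D_ij from the specification


def tier_multipliers(reputations: np.ndarray) -> np.ndarray:
    """Vectorized multiplier lookup; the first matching condition wins."""
    conditions = [reputations >= 0.80, reputations >= 0.50, reputations >= 0.25]
    return np.select(conditions, [1.30, 1.00, 0.70], default=0.40)


def tiered_rewards(payoffs: np.ndarray, reputations: np.ndarray) -> np.ndarray:
    """r_i = multiplier(rho_i) * (pi_i + 0.35 * sum_{j != i} pi_j)."""
    others = payoffs.sum() - payoffs   # sum of every other agent's payoff
    return tier_multipliers(reputations) * (payoffs + INTERDEPENDENCE * others)


payoffs = np.array([10.0, 10.0, 10.0])       # illustrative base payoffs (assumed)
reputations = np.array([0.90, 0.60, 0.10])   # Premium, Standard, Excluded
print(tiered_rewards(payoffs, reputations))
```

With identical base payoffs, every agent has the same pre-multiplier value of 10 + 0.35 × 20 = 17, so the differences in the output come entirely from the tier multipliers.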
Trust Parameters
| Parameter | Symbol | Value |
|---|---|---|
| Trust Building | λ⁺ | 0.10 |
| Trust Erosion | λ⁻ | 0.30 |
| Reputation Damage | $\mu_R$ | 0.55 |
| Reputation Decay | $\delta_R$ | 0.015 |
Episode Structure
- Horizon: T = 100 steps
- Truncation: t ≥ T
- Termination: None (markets don’t collapse)
- Discount: γ = 1.0
Initial State
- τ_ij(0) = 0.50
- R_ij(0) = 0.00
- ρ_i(0) = 0.50 (all agents start in Standard tier)
Scaling
| n_agents | State Dim | Tier Competition |
|---|---|---|
| 5 | 86 | Moderate |
| 10 | 321 | High |
| 20 | 1241 | Very High |
Game-Theoretic Background
Reputation Markets
Real-world examples:
- Freelance platforms: Star ratings affect job access
- Credit markets: Credit scores determine rates
- Professional services: Reputation affects pricing power
- Academic markets: Citations affect opportunities
The Tier System
Markets often feature discrete tiers:
- Premium tier: Best opportunities, highest margins
- Standard tier: Normal market conditions
- Probation tier: Limited access, lower returns
- Excluded tier: Severely restricted
Strategic Implications
Agents must balance:
- Short-term returns: Defection yields immediate gains
- Reputation investment: Cooperation builds standing
- Tier thresholds: Incentives concentrate near boundaries
Environment Specification
Basic Usage
```python
import coopetition_gym
import numpy as np

# Create environment with 5 agents
env = coopetition_gym.make("ReputationMarket-v0", n_agents=5)
obs, info = env.reset(seed=42)

for step in range(100):
    # All agents choose cooperation levels
    actions = np.random.uniform(40, 70, size=5)
    obs, rewards, terminated, truncated, info = env.step(actions)

# Inspect the market state after the final step
print(f"Reputation ranking: {info['reputation_ranking']}")
print(f"Agent tiers: {info['agent_tiers']}")
```
Parameters
| Parameter | Default | Description |
|---|---|---|
| n_agents | 5 | Number of market participants |
| max_steps | 100 | Maximum timesteps |
| reputation_visibility | 1.0 | Reputation observability (1.0 = perfect; lower values add observation noise) |
| tier_enabled | True | Whether tiers affect rewards |
| render_mode | None | Rendering mode |
Reputation Tier System
Tier Definitions
| Tier | Reputation Threshold | Reward Multiplier |
|---|---|---|
| Premium | ≥ 0.80 | 1.30× (30% bonus) |
| Standard | ≥ 0.50 and < 0.80 | 1.00× (no change) |
| Probation | ≥ 0.25 and < 0.50 | 0.70× (30% penalty) |
| Excluded | < 0.25 | 0.40× (60% penalty) |
Tier Transitions
```python
# Reputation update (per agent, per step)
coop_score = action / endowment  # cooperation share in [0, 1]
reputation = reputation + 0.1 * (coop_score - 0.5)
reputation = np.clip(reputation, 0, 1)
```
Moving up/down tiers:
- Sustained cooperation → reputation rises → tier promotion
- Sustained defection → reputation falls → tier demotion
Tier Bonuses Applied
```python
base_reward = compute_integrated_utility(...)
tier = get_tier(reputation)
multiplier = tier_multipliers[tier]
final_reward = base_reward * multiplier
```
Observation Space
Extended Observation
| Component | Shape | Description |
|---|---|---|
| Standard | N + 3N² + 1 | Actions, trust, reputation damage, interdependence, timestep |
| Public Reputations | N | Visible reputation scores |
Total dimension: (N + 3N² + 1) + N = 3N² + 2N + 1
Observation Noise
If reputation_visibility < 1.0:
```python
noise_std = 1 - reputation_visibility
observed_rep = true_rep + np.random.normal(0, noise_std)
```
This models imperfect reputation observation.
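A self-contained version of the noise model, written as a sketch rather than the environment's actual code. The function name `observe_reputations` and the use of a seeded `numpy.random.Generator` are assumptions. The source does not say whether noisy observations are clipped back to [0, 1], so this sketch leaves them unclipped.

```python
import numpy as np


def observe_reputations(true_reps, reputation_visibility=1.0, rng=None):
    """Return reputations perturbed by Gaussian noise with std = 1 - visibility."""
    rng = np.random.default_rng() if rng is None else rng
    true_reps = np.asarray(true_reps, dtype=float)
    noise_std = 1.0 - reputation_visibility
    if noise_std <= 0.0:
        return true_reps.copy()   # perfect visibility: no noise added
    return true_reps + rng.normal(0.0, noise_std, size=true_reps.shape)


rng = np.random.default_rng(seed=0)
true_reps = np.array([0.50, 0.80, 0.25])
print(observe_reputations(true_reps, 1.0, rng))   # identical to true_reps
print(observe_reputations(true_reps, 0.9, rng))   # perturbed with std 0.1
```

Because the tier thresholds sit only 0.25 to 0.30 apart, even a visibility of 0.9 (noise std 0.1) can make an agent appear to be in a neighbouring tier.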
Agent Configuration
Endowments
All agents have the same endowment of 100.0.
Interdependence
Fully connected market:
D[i,j] = 0.35 for all i ≠ j
D[i,i] = 0.00
Trust Dynamics
Parameters
| Parameter | Symbol | Value | Description |
|---|---|---|---|
| Trust Building Rate | λ⁺ | 0.10 | Standard building |
| Trust Erosion Rate | λ⁻ | 0.30 | Standard erosion |
| Reputation Damage | $\mu_R$ | 0.55 | Moderate damage |
| Reputation Decay | $\delta_R$ | 0.015 | Slow forgetting |
| Interdependence Amp. | ξ | 0.45 | Moderate amplification |
| Signal Sensitivity | κ | 1.0 | Standard sensitivity |
| Initial Trust | τ₀ | 0.50 | Neutral start |
Value Function
Parameters
| Parameter | Value | Description |
|---|---|---|
| θ | 18.0 | Moderate logarithmic scale |
| γ | 0.50 | Moderate complementarity (distinct from the discount factor γ = 1.0 in the episode structure) |
Metrics and Info
The info dictionary includes:
| Key | Type | Description |
|---|---|---|
| step | int | Current timestep |
| public_reputations | ndarray | All agents’ reputations |
| reputation_ranking | list | Agents sorted by reputation |
| agent_tiers | dict | Tier assignment per agent |
| mean_reputation | float | Market average |
| reputation_inequality | float | Standard deviation |
| tier_distribution | dict | Count per tier |
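The reputation-related entries can all be derived from the reputation vector. The sketch below reconstructs them under several assumptions that the source does not confirm: the ranking is highest-first, the inequality measure is the population standard deviation, and `agent_tiers` is keyed by integer agent index. The helper names are hypothetical.

```python
from collections import Counter

import numpy as np


def tier_of(rep: float) -> str:
    """Tier name for a reputation, using the documented thresholds."""
    if rep >= 0.80:
        return "Premium"
    if rep >= 0.50:
        return "Standard"
    if rep >= 0.25:
        return "Probation"
    return "Excluded"


def summarize_reputations(public_reputations):
    """Rebuild the reputation-related info entries from a reputation vector."""
    reps = np.asarray(public_reputations, dtype=float)
    agent_tiers = {i: tier_of(r) for i, r in enumerate(reps)}
    return {
        # Assumed ordering: agent indices from highest to lowest reputation.
        "reputation_ranking": [int(i) for i in np.argsort(-reps, kind="stable")],
        "agent_tiers": agent_tiers,
        "mean_reputation": float(reps.mean()),
        "reputation_inequality": float(reps.std()),   # population std (assumed)
        "tier_distribution": dict(Counter(agent_tiers.values())),
    }


summary = summarize_reputations([0.85, 0.40, 0.60, 0.10, 0.55])
print(summary["reputation_ranking"])
print(summary["tier_distribution"])
```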
Strategic Analysis
Tier Threshold Effects
Agents near tier boundaries face strong incentives:
Near Premium threshold (0.80):
- Small reputation gain → 30% bonus
- Worth investing heavily in cooperation
Near Probation threshold (0.25):
- Small reputation loss → multiplier drops from 0.70× to 0.40×
- Strong incentive to avoid this boundary
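The size of these incentives can be made concrete by computing the relative change in the multiplier at each boundary. This is plain arithmetic on the documented multipliers; the helper name `relative_changes` is introduced only for this illustration.

```python
# (threshold, multiplier just below, multiplier at or above)
BOUNDARIES = [
    (0.25, 0.40, 0.70),   # Excluded  <-> Probation
    (0.50, 0.70, 1.00),   # Probation <-> Standard
    (0.80, 1.00, 1.30),   # Standard  <-> Premium
]


def relative_changes(below: float, above: float):
    """Relative reward change when crossing a boundary upward and downward."""
    gain = above / below - 1.0
    loss = below / above - 1.0
    return gain, loss


for threshold, below, above in BOUNDARIES:
    gain, loss = relative_changes(below, above)
    print(f"rho = {threshold:.2f}: crossing up {gain:+.1%}, crossing down {loss:+.1%}")
```

In relative terms the 0.25 boundary is the steepest: falling into Excluded cuts rewards by about 43%, and climbing out raises them by 75%, compared with a 30% gain for entering Premium.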
Equilibrium Dynamics
High-reputation equilibrium:
- All agents cooperate highly
- All stay in Premium tier
- Stable if no one defects
Low-reputation equilibrium:
- All agents defect
- All in Excluded tier
- Stable but suboptimal
Stratified equilibrium:
- Some Premium, some Standard, some lower
- Competition for limited Premium slots
- Persistent inequality
Competition Effects
With limited Premium slots (reputation is relative):
- Agent A improving → may push Agent B down
- Zero-sum dynamics near tier boundaries
- Positional competition
Example: Tier-Aware Strategy
```python
import coopetition_gym
import numpy as np

env = coopetition_gym.make("ReputationMarket-v0", n_agents=5)
obs, info = env.reset(seed=42)

# I am agent 0
my_reputation_history = []
my_tier_history = []

for step in range(100):
    my_rep = info['public_reputations'][0]
    my_tier = info['agent_tiers'].get(0, 'Standard')

    # Tier-aware strategy
    if my_rep < 0.30:
        # Near Excluded: desperate cooperation
        my_action = 90.0
    elif my_rep < 0.55:
        # Near Probation/Standard boundary
        my_action = 70.0
    elif my_rep < 0.82:
        # Near Standard/Premium boundary
        my_action = 75.0
    else:
        # In Premium: maintain with moderate cooperation
        my_action = 60.0

    # Other agents: random
    other_actions = np.random.uniform(40, 60, size=4)
    actions = np.concatenate([[my_action], other_actions])

    obs, rewards, terminated, truncated, info = env.step(actions)
    my_reputation_history.append(info['public_reputations'][0])
    my_tier_history.append(info['agent_tiers'].get(0, 'Standard'))

# Summary
print(f"Final reputation: {my_reputation_history[-1]:.3f}")
print(f"Final tier: {my_tier_history[-1]}")
print(f"Time in Premium: {my_tier_history.count('Premium')} steps")
```
Research Applications
ReputationMarket-v0 is suitable for studying:
- Reputation Systems: Design and incentive effects
- Market Design: Tier structures and stratification
- Positional Competition: Relative standing games
- Credit Markets: Rating-based dynamics
- Multi-Agent RL: Learning in competitive environments
Scaling Considerations
Agent Count
| n_agents | Observation Dim | Tier Competition |
|---|---|---|
| 5 | 86 | Moderate |
| 10 | 321 | High |
| 20 | 1241 | Very High |
Tier Dynamics with Scale
With more agents:
- More competition for Premium tier
- Steeper reputation gradients
- More stratification
Related Environments
- DynamicPartnerSelection-v0: Reputation without tiers
- PlatformEcosystem-v0: Platform-mediated market
- CooperativeNegotiation-v0: Explicit contracts
References
- Shapiro, C. (1983). Premiums for High Quality Products as Returns to Reputations. Quarterly Journal of Economics.
- Tadelis, S. (1999). What’s in a Name? Reputation as a Tradeable Asset. American Economic Review.
- Pant, V. & Yu, E. (2025). Computational Foundations for Strategic Coopetition: Formalizing Trust and Reputation Dynamics. arXiv:2510.24909