DynamicPartnerSelection-v0

Category: Ecosystem Environment | Agents: N (configurable) | Difficulty: Advanced | Source: coopetition_gym/envs/ecosystem_envs.py


Overview

DynamicPartnerSelection-v0 models a marketplace where N agents observe public reputation scores and form partnerships dynamically. This environment captures scenarios like freelance markets, academic collaborations, and business-to-business matching.

The key challenge is learning to:

  1. Interpret reputation signals as indicators of partner quality
  2. Maintain own reputation to attract high-quality partners
  3. Navigate social learning dynamics in a multi-agent setting

Figure: Dynamic Partner Network. Six-agent marketplace with evolving public reputations. Node size indicates reputation score; edge color shows pairwise trust levels (green = high, red = low). Cooperation builds reputation and attracts quality partners.


MARL Classification

| Property | Value |
|---|---|
| Game Type | Markov Game (N-player, general-sum); Mean-Field Game approximation for large N |
| Cooperation Structure | Mixed-Motive with reputation externalities (cooperation builds public reputation) |
| Observability | Full state + public reputation scores (extended observation) |
| Communication | Implicit (actions + public reputation signals) |
| Agent Symmetry | Symmetric (homogeneous agents with equal endowments and dependencies) |
| Reward Structure | Mixed with uniform interdependence (D_ij = 0.40 for all i≠j) |
| Action Space | Continuous: A_i = [0, 100] for all agents |
| State Dynamics | Deterministic |
| Horizon | Finite, T = 50 steps |
| Canonical Comparison | Reputation games; cf. Resnick & Zeckhauser (2002), rating systems in matching markets |

Formal Specification

This environment is formalized as an N-player symmetric Markov Game with public reputation signals.

Agents

N = {1, …, n} where n = n_agents (default 6), all symmetric:

| Property | Value | Description |
|---|---|---|
| Endowment | 100.0 | Equal for all agents |
| Baseline | 35.0 | 35% cooperation threshold |
| Bargaining α | 1/N | Equal surplus sharing |

State Space

S ⊆ ℝ^d where d = N + 3N² + 1 + N (standard state + reputation vector)

| Component | Dimension | Description |
|---|---|---|
| Actions | N | Previous cooperation levels |
| Trust Matrix | N² | Pairwise trust τ_ij |
| Reputation Damage | N² | Reputation damage R_ij |
| Interdependence | N² | Uniform dependencies D_ij = 0.40 |
| Timestep | 1 | Normalized t/T |
| Public Reputations | N | Global reputation scores ρ_i ∈ [0,1] |

Total dimension: d = N + 3N² + 1 + N = 3N² + 2N + 1
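As a quick check on the dimension count, the component sizes in the table can be summed directly. The helper name `observation_dim` is hypothetical (it is not part of coopetition_gym); it only encodes the arithmetic from the table.

```python
def observation_dim(n_agents: int) -> int:
    """Sum the component sizes from the state-space table (illustrative helper)."""
    actions = n_agents                   # previous cooperation levels
    pairwise = 3 * n_agents * n_agents   # trust, reputation damage, interdependence
    timestep = 1                         # normalized t/T
    reputations = n_agents               # public reputation scores
    return actions + pairwise + timestep + reputations


default_dim = observation_dim(6)  # default marketplace size
print(default_dim)
```

For the default N = 6 this is 6 + 108 + 1 + 6 = 121.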

Action Space

For each agent i ∈ {1, …, N}:

A_i = [0, 100] ⊂ ℝ

Uniaxial Treatment: This environment uses the single-dimension action space characteristic of Coopetition-Gym v1.x. Partner competition emerges through reputation-based selection rather than explicit competitive actions.

Interdependence Matrix (Fully Connected)

D_ij = 0.40 for all i ≠ j
D_ii = 0.00

All agents value each other’s outcomes equally; there are no preferential partnerships.

Reputation Dynamics

Public Reputation Update (exponential moving average):

ρ_i(t+1) = 0.9 · ρ_i(t) + 0.1 · (a_i(t) / e_i)

Properties:

Reward Function

Standard integrated utility with symmetric weights:

r_i = π_i + 0.40 · Σ_{j≠i} π_j

Value function uses θ = 18.0, γ = 0.55 (moderate complementarity).
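The reward formula can be illustrated with a small sketch. It treats the private payoffs π_i as given inputs, because their exact form (the value function with θ and γ) is not spelled out in this section; the function name `integrated_rewards` is hypothetical.

```python
def integrated_rewards(private_payoffs, weight=0.40):
    """r_i = pi_i + weight * sum of the other agents' payoffs (illustrative sketch)."""
    total = sum(private_payoffs)
    # total - p is the sum over all j != i
    return [p + weight * (total - p) for p in private_payoffs]


# Made-up private payoffs for three agents, purely for illustration
example = integrated_rewards([1.0, 2.0, 3.0])
print([round(r, 2) for r in example])
```

Because the weight is uniform, the agent with the lowest private payoff gains the most from the others: here agent 0 receives 1.0 + 0.40 · 5.0 = 3.0.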

Episode Structure

Initial State


Game-Theoretic Background

Reputation-Based Matching

In many real markets:

The Reputation Investment Problem

Short-term incentive: Free-ride on current reputation by defecting

Long-term incentive: Maintain or build reputation for:

Social Learning Dynamics

Agents learn from observing:


Environment Specification

Basic Usage

import coopetition_gym
import numpy as np

# Create environment with 6 agents (default)
env = coopetition_gym.make("DynamicPartnerSelection-v0")

# Or customize
env = coopetition_gym.make("DynamicPartnerSelection-v0", n_agents=8)

obs, info = env.reset(seed=42)

# Run episode
for step in range(50):
    # All agents choose cooperation levels
    actions = np.random.uniform(40, 70, size=env.n_agents)
    obs, rewards, terminated, truncated, info = env.step(actions)

print(f"Reputation ranking: {info['reputation_ranking']}")

Parameters

| Parameter | Default | Description |
|---|---|---|
| n_agents | 6 | Number of agents in marketplace |
| max_steps | 50 | Maximum timesteps per episode |
| reputation_weight | 0.5 | How much reputation affects compatibility |
| render_mode | None | Rendering mode |

Agent Configuration

Endowments

All agents have equal endowment: e_i = 100.0 for every agent i.

Bargaining Shares

Equal bargaining power: α_i = 1/N for every agent i.

Interdependence

Fully connected graph with uniform dependencies:

D[i,j] = 0.40 for all i ≠ j
D[i,i] = 0.00

All agents value each other’s outcomes equally.
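A minimal sketch of the uniform interdependence matrix as plain nested lists. The helper `uniform_interdependence` is a hypothetical name, and the environment's internal representation may differ.

```python
def uniform_interdependence(n_agents, weight=0.40):
    """Build D with D[i][j] = weight for i != j and D[i][i] = 0 (illustrative)."""
    return [
        [0.0 if i == j else weight for j in range(n_agents)]
        for i in range(n_agents)
    ]


D = uniform_interdependence(6)
row_sum = sum(D[0])  # total weight agent 0 places on the other five agents
print(round(row_sum, 2))
```

With six agents, each row sums to 5 · 0.40 = 2.0, so the combined weight an agent places on others' outcomes is twice the unit weight on its own payoff.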


Reputation System

Global Reputation Scores

Each agent maintains a public reputation score in [0, 1]:

# Initial reputation
reputations = [0.5] * n_agents  # All start at 0.5

# Update after each step
for i in range(n_agents):
    coop_score = actions[i] / endowments[i]  # Fraction of endowment, in [0, 1]
    reputations[i] = 0.9 * reputations[i] + 0.1 * coop_score

Reputation Properties

  1. Persistence: Exponential moving average with smoothing factor 0.1 (each step keeps 90% of the previous score)
  2. Observable: Included in extended observation space
  3. Bounded: Clamped to [0, 1]
  4. Episode persistence: Can carry across episodes (optional)
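The persistence property can be made concrete with a standalone sketch that does not touch the environment. Under a constant cooperation fraction c, the update ρ(t+1) = 0.9 · ρ(t) + 0.1 · c has the closed form ρ(t) = c + (ρ(0) − c) · 0.9^t, so the gap to c halves roughly every 6.6 steps. The function names are hypothetical.

```python
import math


def simulate_reputation(rho0, coop_fraction, steps, decay=0.9):
    """Iterate the exponential moving average for a constant cooperation fraction."""
    rho = rho0
    history = [rho]
    for _ in range(steps):
        rho = decay * rho + (1.0 - decay) * coop_fraction
        history.append(rho)
    return history


def closed_form(rho0, coop_fraction, t, decay=0.9):
    """Closed-form solution of the same recursion after t steps."""
    return coop_fraction + (rho0 - coop_fraction) * decay ** t


history = simulate_reputation(rho0=0.5, coop_fraction=0.8, steps=50)
half_life = math.log(0.5) / math.log(0.9)  # steps needed to halve the gap to c

print(f"after 50 steps: {history[-1]:.4f}")
print(f"half-life: {half_life:.2f} steps")
```

One consequence is that a full 50-step episode is long enough for reputation to converge almost completely to the agent's average cooperation fraction, while a handful of defections moves it only slightly.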

Resetting Reputation

# Reset reputation at episode start
obs, info = env.reset(options={"reset_reputation": True})

# Keep reputation from previous episode
obs, info = env.reset(options={"reset_reputation": False})

Observation Space

Extended Observation

The observation includes standard components plus reputation:

| Component | Shape | Description |
|---|---|---|
| Standard | N + 3N² + 1 | Actions, trust, reputation damage, interdependence, step |
| Reputation | N | Public reputation scores |

Total dimension: (N + 3N² + 1) + N = 3N² + 2N + 1

Reputation Observation

Agents can see all other agents’ reputations:

obs[-n_agents:]  # Reputation scores for all agents
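The slicing above generalizes to the other components. The sketch below assumes the observation is a single flat vector laid out in the order the components are listed in the state-space description (actions, trust, reputation damage, interdependence, timestep, reputations); that ordering is an assumption and should be checked against the environment source. `split_observation` is a hypothetical helper, not a library function.

```python
def split_observation(obs, n_agents):
    """Split a flat observation into named slices (assumed component order)."""
    n, n2 = n_agents, n_agents * n_agents
    sizes = [
        ("actions", n),
        ("trust", n2),
        ("reputation_damage", n2),
        ("interdependence", n2),
        ("timestep", 1),
        ("public_reputations", n),
    ]
    expected = sum(size for _, size in sizes)
    if len(obs) != expected:
        raise ValueError(f"expected length {expected}, got {len(obs)}")

    parts, start = {}, 0
    for name, size in sizes:
        parts[name] = obs[start:start + size]
        start += size
    return parts


# Dummy vector standing in for a real observation with N = 6
flat = list(range(121))
parts = split_observation(flat, n_agents=6)
print(parts["public_reputations"])
```

The last slice matches `obs[-n_agents:]`, so the reputation entries are recovered either way.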

Trust Dynamics

Parameters

| Parameter | Symbol | Value | Description |
|---|---|---|---|
| Trust Building Rate | λ⁺ | 0.12 | Moderate building |
| Trust Erosion Rate | λ⁻ | 0.35 | Standard erosion |
| Reputation Damage | μ_R | 0.60 | Strong reputation effects |
| Reputation Decay | δ_R | 0.015 | Very slow forgetting |
| Interdependence Amp. | ξ | 0.45 | Moderate amplification |
| Signal Sensitivity | κ | 1.2 | Moderate sensitivity |
| Initial Trust | τ₀ | 0.50 | Neutral start |

Reputation-Trust Interaction

High reputation leads to:


Value Function

Parameters

| Parameter | Value | Description |
|---|---|---|
| θ | 18.0 | Moderate logarithmic scale |
| γ | 0.55 | Moderate complementarity |

Metrics and Info

The info dictionary includes:

| Key | Type | Description |
|---|---|---|
| step | int | Current timestep |
| public_reputations | ndarray | All agents’ reputations |
| reputation_ranking | list | Agents sorted by reputation (descending) |
| mean_reputation | float | Market average reputation |
| reputation_std | float | Reputation inequality measure |
| mean_cooperation | float | Average cooperation level |
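The reputation-derived entries can be reproduced from `public_reputations` alone, which is useful when logging only the raw scores. This is a sketch of how such metrics might be computed, not the environment's own code. In particular, the use of the population standard deviation and the tie-breaking order in the ranking are assumptions.

```python
from statistics import fmean, pstdev


def reputation_summary(public_reputations):
    """Recompute ranking, mean, and spread from raw reputation scores."""
    reps = [float(r) for r in public_reputations]
    # Agent indices sorted by reputation, highest first (ties keep index order)
    ranking = sorted(range(len(reps)), key=lambda i: reps[i], reverse=True)
    return {
        "reputation_ranking": ranking,
        "mean_reputation": fmean(reps),
        "reputation_std": pstdev(reps),  # population std: an assumption
    }


# Made-up scores for six agents, purely for illustration
summary = reputation_summary([0.50, 0.72, 0.41, 0.65, 0.58, 0.49])
print(summary["reputation_ranking"])
```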

Strategic Analysis

Reputation Investment Strategies

Build Reputation Early:

Maintain Reputation:

Exploit Reputation:

Partner Selection Logic

Agents implicitly “select” partners through action levels:

Equilibrium Dynamics

In equilibrium:


Example: Reputation-Based Strategy

import coopetition_gym
import numpy as np

env = coopetition_gym.make("DynamicPartnerSelection-v0", n_agents=6)
obs, info = env.reset(seed=42, options={"reset_reputation": True})

# Track my reputation (agent 0)
my_reputation_history = []

for step in range(50):
    # Strategy: Cooperate proportionally to own reputation
    # High reputation -> maintain it; Low -> build it
    my_rep = info['public_reputations'][0]

    if my_rep < 0.6:
        # Build reputation
        my_action = 80.0  # 80% cooperation
    else:
        # Maintain with moderate cooperation
        my_action = 60.0  # 60% cooperation

    # Other agents: Random strategies
    other_actions = np.random.uniform(30, 70, size=5)

    actions = np.concatenate([[my_action], other_actions])
    obs, rewards, terminated, truncated, info = env.step(actions)

    my_reputation_history.append(info['public_reputations'][0])

print(f"Final reputation: {my_reputation_history[-1]:.3f}")
print(f"My rank: {info['reputation_ranking'].index(0) + 1} of 6")
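The threshold strategy can also be analyzed offline. The sketch below assumes agent 0's reputation follows the documented update ρ ← 0.9 · ρ + 0.1 · (a / 100) and starts at 0.5; it does not call the environment. Under those assumptions, reputation climbs while the agent plays 80, crosses 0.6 on the fourth step, and then decays toward 0.6 from above while the agent plays 60. The function names are hypothetical.

```python
def threshold_policy(reputation, threshold=0.6, build=80.0, maintain=60.0):
    """Cooperate more while reputation is below the threshold."""
    return build if reputation < threshold else maintain


def simulate_own_reputation(steps=50, rho0=0.5, endowment=100.0):
    """Apply the documented EMA update to one agent following the policy."""
    rho = rho0
    history = []
    for _ in range(steps):
        action = threshold_policy(rho)
        rho = 0.9 * rho + 0.1 * (action / endowment)
        history.append(rho)
    return history


history = simulate_own_reputation()
print(f"Reputation after 50 steps: {history[-1]:.3f}")
```

The policy therefore pins the agent's reputation just above the 0.6 threshold rather than building it further. Raising the maintenance action above 60 would be needed to hold a higher reputation.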

Research Applications

DynamicPartnerSelection-v0 is suitable for studying:



References

  1. Resnick, P. & Zeckhauser, R. (2002). Trust Among Strangers in Internet Transactions. Advances in Applied Microeconomics.
  2. Dellarocas, C. (2003). The Digitization of Word of Mouth. Management Science.
  3. Pant, V. & Yu, E. (2025). Computational Foundations for Strategic Coopetition: Formalizing Trust and Reputation Dynamics. arXiv:2510.24909