PlatformEcosystem-v0

Category: Ecosystem Environment
Agents: 1 + N (Platform + Developers)
Difficulty: Advanced
Source: coopetition_gym/envs/ecosystem_envs.py

Overview

PlatformEcosystem-v0 models a platform economy with one central platform (Agent 0) and N developer agents (Agents 1-N). This environment captures the dynamics of app stores, online marketplaces, cloud platforms, and other multi-sided markets.

The platform must balance short-term revenue extraction against long-term ecosystem health. Developers must decide how much to invest in the platform given its policies and the behavior of other developers.

MARL Classification

Property	Value
Game Type	Markov Game (N+1 players, general-sum); Mean-Field approximation applicable for large N
Cooperation Structure	Mixed-Motive with hub-spoke topology (platform vs developers, no inter-developer competition)
Observability	Full (all agents observe complete state)
Communication	Implicit (through actions only)
Agent Symmetry	Heterogeneous (1 platform + N homogeneous developers)
Reward Structure	Mixed with hub-spoke interdependence (developers: D=0.75 on platform; platform: D=0.25 per developer)
Action Space	Continuous: A_platform=[0,150], A_dev=[0,80]
State Dynamics	Deterministic
Horizon	Finite, T=100 (early termination on ecosystem collapse)
Canonical Comparison	Multi-agent platform games; cf. Mogul (ICML 2020), Multi-Principal Multi-Agent problems

Formal Specification

This environment is formalized as an (N+1)-player Markov Game with hub-spoke structure.

Agents

𝒩 = {Platform} ∪ {Dev_1, …, Dev_N}, where N = n_developers (default 4) is the number of developers

Role	Count	Endowment	Baseline	Bargaining α
Platform	1	150.0	52.5 (35%)	0.30
Developer	N	80.0 each	28.0 (35%)	0.70/N each

State Space

S ⊆ ℝ^d where d = (N+1) + 3(N+1)² + 1

Component	Dimension	Description
Actions	N+1	Previous cooperation levels
Trust Matrix	(N+1)²	Pairwise trust τ_ij
Reputation Matrix	(N+1)²	Reputation damage R_ij
Interdependence	(N+1)²	Hub-spoke dependencies D_ij
Timestep	1	Normalized t/T

Dimension formula: d = (N+1) + 3(N+1)² + 1

n_developers	Total Agents	State Dim
4	5	81
8	9	253
16	17	885
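The dimension formula can be checked against the table with a few lines of Python. This is a minimal sketch; `state_dim` is a hypothetical helper written for illustration and is not part of the coopetition_gym API.

```python
def state_dim(n_developers: int) -> int:
    """Flattened state size: actions + three (N+1)x(N+1) matrices + timestep."""
    n_agents = n_developers + 1      # one platform plus N developers
    actions = n_agents               # previous cooperation levels
    matrices = 3 * n_agents ** 2     # trust, reputation, interdependence
    timestep = 1                     # normalized t/T
    return actions + matrices + timestep


for n in (4, 8, 16):
    print(n, n + 1, state_dim(n))
```

The quadratic term dominates quickly: at 16 developers, 867 of the 885 entries come from the three matrices.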

Action Space

Platform: A_0 = [0, 150] ⊂ ℝ
Each Developer: A_i = [0, 80] ⊂ ℝ for i ∈ {1, …, N}

Uniaxial Treatment: This environment uses the single-dimension action space characteristic of Coopetition-Gym v1.x. Platform-developer competition emerges through hub-spoke interdependence asymmetry rather than explicit competitive actions.

Interdependence Matrix (Hub-Spoke Topology)

D = | 0.00   0.25   0.25   ...  0.25  |   ← Platform row (depends equally on all devs)
    | 0.75   0.00   0.00   ...  0.00  |   ← Dev 1 (depends heavily on platform)
    | 0.75   0.00   0.00   ...  0.00  |   ← Dev 2
    |  ⋮      ⋮      ⋮     ⋱    ⋮    |
    | 0.75   0.00   0.00   ...  0.00  |   ← Dev N

Key properties:

Platform→Developers: D[0,j] = 0.25 for all j>0 (moderate, distributed dependency)
Developer→Platform: D[i,0] = 0.75 for all i>0 (high, concentrated dependency)
Developer→Developer: D[i,j] = 0.00 for i,j>0 (no direct dependencies)
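The three properties above fully determine the matrix for any N. The sketch below builds it with plain Python lists; `hub_spoke_matrix` is a hypothetical stand-in for illustration, and the library's own constructor may differ in name, signature, and return type.

```python
def hub_spoke_matrix(n_developers, platform_dep=0.25, developer_dep=0.75):
    """Build the (N+1)x(N+1) hub-spoke interdependence matrix.

    Row 0 is the platform; rows 1..N are developers.
    """
    n = n_developers + 1
    D = [[0.0] * n for _ in range(n)]
    for j in range(1, n):
        D[0][j] = platform_dep    # platform depends moderately on each developer
        D[j][0] = developer_dep   # each developer depends heavily on the platform
    return D                      # developer-developer entries stay 0.0


D = hub_spoke_matrix(4)
for row in D:
    print(row)
```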

Transition Dynamics

Trust dynamics follow TR-2 with ecosystem-specific parameters:

Trust Update:

τ_ij(t+1) = clip(τ_ij(t) + Δτ_ij, 0, Θ_ij)

Critical Ecosystem Metric:

avg_dev_trust = (1/N) · Σᵢ τ[i,0]   (developers' trust in platform)

If avg_dev_trust < 0.15, ecosystem collapses (termination).
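A minimal sketch of the clipped update and the collapse check follows. The increment Δτ_ij and the ceiling Θ_ij are not defined in this section, so the sketch takes both as inputs; the function names and the numbers in the example are hypothetical.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def update_trust(tau, delta, ceiling):
    """Apply tau_ij(t+1) = clip(tau_ij(t) + delta_ij, 0, ceiling_ij) elementwise."""
    n = len(tau)
    return [
        [clip(tau[i][j] + delta[i][j], 0.0, ceiling[i][j]) for j in range(n)]
        for i in range(n)
    ]


def avg_dev_trust(tau):
    """Average of tau[i][0] over developer rows i = 1..N."""
    dev_rows = tau[1:]
    return sum(row[0] for row in dev_rows) / len(dev_rows)


# Hypothetical numbers: 1 platform + 2 developers, all trust at 0.60,
# ceilings of 1.0, and a large drop in both developers' trust in the platform.
tau = [[0.60] * 3 for _ in range(3)]
ceiling = [[1.0] * 3 for _ in range(3)]
delta = [[0.0, 0.7, 0.0], [-0.5, 0.0, 0.0], [-0.5, -0.9, 0.0]]

tau_next = update_trust(tau, delta, ceiling)
collapsed = avg_dev_trust(tau_next) < 0.15
print(avg_dev_trust(tau_next), collapsed)
```

Only column 0 of the developer rows enters the collapse check, so the platform's trust in developers and any developer-to-developer entries do not affect termination.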

Reward Function

Platform reward:

r_platform = π_platform + 0.25 · Σⱼ π_dev_j

Developer i reward:

r_dev_i = π_dev_i + 0.75 · π_platform

where private payoffs use θ = 25.0 and γ = 0.75 (strong network effects).
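The two reward formulas can be applied to any vector of private payoffs. The sketch below assumes the private payoffs π are already computed, since their functional form is not given here; `integrated_rewards` and the example payoff values are hypothetical.

```python
def integrated_rewards(pi_platform, pi_devs,
                       platform_weight=0.25, developer_weight=0.75):
    """Combine private payoffs into rewards using the hub-spoke weights.

    r_platform = pi_platform + 0.25 * sum_j pi_dev_j
    r_dev_i    = pi_dev_i    + 0.75 * pi_platform
    """
    r_platform = pi_platform + platform_weight * sum(pi_devs)
    r_devs = [pi + developer_weight * pi_platform for pi in pi_devs]
    return r_platform, r_devs


# Hypothetical private payoffs for a platform and four developers.
r_platform, r_devs = integrated_rewards(100.0, [10.0, 20.0, 30.0, 40.0])
print(r_platform)  # 125.0
print(r_devs)      # [85.0, 95.0, 105.0, 115.0]
```

In this example every developer's reward is dominated by the platform term, which is the practical meaning of the 0.75 dependency.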

Episode Structure

Horizon: T = 100 steps
Truncation: t ≥ T
Termination: avg(τ[1:,0]) < 0.15 (ecosystem death)
Discount: γ = 1.0 (undiscounted; this discount factor is distinct from the complementarity parameter γ = 0.75 above)
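The episode rules above reduce to two boolean flags per step. This sketch assumes the flags are computed independently of each other, which the specification does not state explicitly; `episode_flags` is a hypothetical helper.

```python
def episode_flags(t, avg_dev_trust, horizon=100, collapse_threshold=0.15):
    """Return (terminated, truncated) for step count t and current average trust."""
    terminated = avg_dev_trust < collapse_threshold   # ecosystem death (strict inequality)
    truncated = t >= horizon                          # time limit reached
    return terminated, truncated


print(episode_flags(50, 0.10))   # collapse before the horizon
print(episode_flags(100, 0.50))  # healthy ecosystem, time limit reached
print(episode_flags(10, 0.15))   # exactly at the threshold: still alive
```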

Initial State

τ_ij(0) = 0.60 (baseline ecosystem trust)
R_ij(0) = 0.00
D fixed as hub-spoke matrix above

Game-Theoretic Background

Platform Economics

Multi-sided platforms create value by:

Network Effects: More developers attract more users, increasing value
Complementarity: Developer contributions complement the platform’s infrastructure
Ecosystem Health: Trust between platform and developers enables investment

The Platform’s Dilemma

Short-term incentive: Extract maximum value (high fees, restrictive policies)

Long-term incentive: Maintain developer trust and participation to:

Sustain network effects
Encourage quality investments
Prevent developer exodus

Developer Dynamics

Developers face:

Platform dependency: High switching costs once invested
Collective action problem: Individual defection may not trigger platform response
Trust fragility: Platform abuse can trigger coordinated exit

Theoretical Foundations

Relationship to Classical Game Theory

PlatformEcosystem-v0 extends the classical two-sided markets literature by incorporating:

Dynamic trust: Rather than static participation decisions, agents maintain evolving trust relationships
Hub-spoke topology: Explicit modeling of platform centrality in interdependence structure
Ecosystem collapse: Endogenous termination from collective trust breakdown
Continuous investment: Graduated participation rather than binary join/leave decisions

Key Theoretical Results

Stage-Game Analysis: In the single-shot version (ignoring trust dynamics):

Platform’s myopic optimum: a_P* ≈ 52.5 (baseline contribution)
- At this level, platform extracts maximum surplus from developers’ investments
Developers’ best response: a_D* ≈ 28-40 (defensive given platform extraction)
- Individual developer cannot profitably increase investment unilaterally
Nash equilibrium: (a_P, a_D) ≈ (55, 35) - Mutual low investment
Pareto-optimal outcome: (a_P, a_D) ≈ (120, 65) - High mutual investment

Multi-Agent Coordination: With N developers, additional coordination challenges emerge:

Free-rider problem: Individual developer’s defection has diluted effect
Collective punishment: Coordinated developer response required to discipline platform
Mean-field approximation: For large N, individual developer impact on platform → 0

Repeated Game Equilibria: With T = 100 repetitions and trust dynamics:

Platform exploitation equilibrium: Platform extracts until trust threshold approached
Cooperative equilibrium: High mutual investment sustained by trust
Trigger equilibrium: Developers coordinate punishment of platform defection

The critical threshold avg_dev_trust < 0.15 creates a collective action trigger that enables developer coordination.

Connections to Prior Work

Concept	PlatformEcosystem-v0	Classical Reference
Two-sided markets	Hub-spoke D matrix	Rochet & Tirole (2003)
Network effects	γ = 0.75 complementarity	Katz & Shapiro (1985)
Platform governance	Trust dynamics	Evans & Schmalensee (2016)
Ecosystem collapse	Trust threshold termination	Mean-field game literature
Multi-homing	Developer D = 0.75	Armstrong (2006)

Literature Connections

Rochet & Tirole (2003): Two-sided markets and platform competition. PlatformEcosystem-v0 operationalizes:

Platform intermediation → Hub-spoke interdependence
Cross-group externalities → Complementarity parameter γ = 0.75
Participation decisions → Continuous investment levels

Parker & Van Alstyne (2005): Two-sided network effects. The environment captures:

Same-side effects → Developer-developer dynamics (indirect via platform trust)
Cross-side effects → Platform-developer interdependence (D matrix)
Network scaling → State dimension grows as O(N²)

Mogul (ICML 2020): Multi-agent platform optimization. Similar structure with:

Central coordinating agent (platform)
Multiple peripheral agents (developers)
Asymmetric dependencies and information

Mean-Field Approximation

For large N, the environment admits a mean-field game approximation:

Developer anonymity: Individual developer impact on platform scales as 1/N → 0
Platform aggregates: Platform observes mean developer behavior
Symmetric equilibrium: All developers play identical mixed strategies
Tractable analysis: Reduces N+1 player game to 2-player structure

The mean-field limit:

lim_{N→∞} Platform sees: avg(a_dev) and avg(τ_dev→platform)

This makes the environment suitable for both exact (small N) and approximate (large N) analysis.
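The 1/N scaling behind developer anonymity is simple arithmetic on the mean developer action. The sketch below shows how far avg(a_dev) moves when a single developer changes its action by the full action range of 80; the function name is hypothetical.

```python
def mean_shift_from_one_deviation(n_developers, deviation):
    """Change in avg(a_dev) when exactly one developer shifts its action by `deviation`."""
    return deviation / n_developers


for n in (4, 16, 64, 256):
    print(n, mean_shift_from_one_deviation(n, 80.0))
```

With the default 4 developers a single deviation can move the mean by up to 20, so the mean-field view is a rough approximation there; at 256 developers the same deviation moves it by about 0.3.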

Environment Specification

Basic Usage

import coopetition_gym
import numpy as np

# Create environment with 4 developers (default)
env = coopetition_gym.make("PlatformEcosystem-v0")

# Or customize number of developers
env = coopetition_gym.make("PlatformEcosystem-v0", n_developers=6)

obs, info = env.reset(seed=42)

# Run episode
for step in range(100):
    # Platform invests 90 out of 150 (60%)
    platform_action = 90.0

    # Developers each invest 50 out of 80 (62.5%); one entry per developer
    developer_actions = [50.0] * (env.n_agents - 1)

    actions = np.array([platform_action] + developer_actions)
    obs, rewards, terminated, truncated, info = env.step(actions)

    if terminated:
        print(f"Ecosystem collapsed at step {step}")
    if terminated or truncated:
        break

# Rewards and info from the final step of the episode
print(f"Platform reward: {rewards[0]:.1f}")
print(f"Mean developer reward: {np.mean(rewards[1:]):.1f}")
print(f"Developer trust in platform: {info['developer_trust_in_platform']:.2f}")

Parameters

Parameter	Default	Description
`n_developers`	4	Number of developer agents
`max_steps`	100	Maximum timesteps per episode
`render_mode`	None	Rendering mode

Agent Configuration

Endowments

Agent	Role	Endowment	Description
0	Platform	150.0	Large infrastructure budget
1-N	Developers	80.0 each	Individual development capacity

Bargaining Shares (Alpha)

Agent	Alpha	Description
Platform	0.30	Platform captures 30% of ecosystem surplus
Each Developer	0.70/N	Remaining 70% split among developers

Example with 4 developers:

Platform: 30%
Each developer: 17.5% (70% / 4)
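The same split generalizes to any number of developers. A minimal sketch, with `bargaining_shares` as a hypothetical helper name:

```python
def bargaining_shares(n_developers, platform_share=0.30):
    """Return [alpha_platform, alpha_dev_1, ..., alpha_dev_N]; the shares sum to 1."""
    dev_share = (1.0 - platform_share) / n_developers
    return [platform_share] + [dev_share] * n_developers


for n in (4, 8, 16):
    shares = bargaining_shares(n)
    print(n, shares[0], round(shares[1], 5))
```

Each developer's share shrinks as N grows while the platform's 30% stays fixed, so adding developers shifts relative bargaining weight toward the platform.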

Interdependence Structure

Hub-Spoke Topology

The platform acts as a hub with developers as spokes:

              Developer 1
                   |
Developer 4 ---- Platform ---- Developer 2
                   |
              Developer 3

Dependency Matrix

Created using create_hub_spoke_interdependence():

D = [[ 0.00,  0.25,  0.25,  0.25,  0.25 ],   # Platform's row
     [ 0.75,  0.00,  0.00,  0.00,  0.00 ],   # Dev 1's row
     [ 0.75,  0.00,  0.00,  0.00,  0.00 ],   # Dev 2's row
     [ 0.75,  0.00,  0.00,  0.00,  0.00 ],   # Dev 3's row
     [ 0.75,  0.00,  0.00,  0.00,  0.00 ]]   # Dev 4's row

Interpretation:

D[0,j] = 0.25: Platform depends moderately on each developer
D[i,0] = 0.75: Developers depend heavily on platform
D[i,j] = 0.00 (i,j > 0): Developers don’t directly depend on each other

Trust Dynamics

Parameters

Parameter	Symbol	Value	Description
Trust Building Rate	λ⁺	0.08	Slower institutional trust building
Trust Erosion Rate	λ⁻	0.25	Moderate erosion
Reputation Damage	μ_R	0.45	Moderate damage
Reputation Decay	δ_R	0.02	Standard decay
Interdependence Amp.	ξ	0.40	Lower than dyadic (more actors)
Signal Sensitivity	κ	1.0	Standard sensitivity
Initial Trust	τ₀	0.60	Baseline platform trust

Critical Trust Metric

The most important trust metric is average developer trust in platform:

developer_trust = mean(trust_matrix[1:, 0])  # All developers' trust in platform

If this falls below 0.15, the ecosystem “dies” (episode terminates).

Termination Conditions

Normal Truncation

The episode is truncated at max_steps (default 100) if the ecosystem has not collapsed by then.

Ecosystem Death

Critical condition: If average developer trust in platform falls below 0.15:

if mean(trust_matrix[1:, 0]) < 0.15:
    # Ecosystem collapse - developers abandon platform
    terminated = True

This represents:

Mass developer exodus
Platform becoming unviable
Network effects reversing

Value Function

Parameters

Parameter	Value	Description
θ	25.0	Higher scale for platform (larger transactions)
γ	0.75	Strong complementarity (network effects)

Network Effects

The high complementarity (γ = 0.75) means:

Value grows superlinearly with total investment
Platform and developers benefit from mutual cooperation
Defection by any party reduces ecosystem value

Reward Structure

Platform Rewards

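Platform_payoff = kept_resources + f(platform_action) + 0.30 × synergy + Σ(0.25 × developer_payoffs)

Here developer_payoffs are the developers' private payoffs π_dev_j from the Reward Function section, so the first three terms together make up the platform's private payoff π_platform.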

The platform benefits from:

Its own investment returns
30% of ecosystem synergy
Moderate weight on developer success

Developer Rewards

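Developer_i_payoff = kept_resources + f(action_i) + (0.70/N) × synergy + 0.75 × platform_payoff

Here platform_payoff is the platform's private payoff π_platform from the Reward Function section (without its interdependence term), so the definition is not circular.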

Developers benefit from:

Their own investment returns
Share of ecosystem synergy
Strong weight on platform success (75%)

Metrics and Info

The info dictionary includes:

Key	Type	Description
`step`	int	Current timestep
`platform_investment`	float	Platform’s action
`mean_developer_investment`	float	Average developer action
`developer_investment_std`	float	Variation among developers
`developer_trust_in_platform`	float	Critical ecosystem health metric
`platform_trust_in_developers`	float	Platform’s view of developers
`total_ecosystem_value`	float	Total value created
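Most of these metrics are simple aggregates of the action vector and the trust matrix. The sketch below reconstructs five of them under stated assumptions: the standard deviation is taken as the population standard deviation, and platform_trust_in_developers is taken as the mean of the platform's row excluding the diagonal. Neither choice is confirmed by this page, and `ecosystem_metrics` is a hypothetical helper, not the environment's implementation.

```python
from statistics import mean, pstdev


def ecosystem_metrics(actions, trust):
    """Aggregate per-step metrics from actions [platform, dev_1..dev_N] and trust[i][j]."""
    dev_actions = list(actions[1:])
    return {
        "platform_investment": actions[0],
        "mean_developer_investment": mean(dev_actions),
        "developer_investment_std": pstdev(dev_actions),      # assumption: population std
        "developer_trust_in_platform": mean([row[0] for row in trust[1:]]),
        "platform_trust_in_developers": mean(trust[0][1:]),   # assumption: row 0, off-diagonal
    }


# Hypothetical step: platform invests 90, developers invest 40-60, trust at 0.60.
actions = [90.0, 40.0, 50.0, 60.0, 50.0]
trust = [[0.60] * 5 for _ in range(5)]
metrics = ecosystem_metrics(actions, trust)
for key, value in metrics.items():
    print(key, value)
```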

Strategic Analysis

Platform Strategies

Extractive Strategy:

Invest minimally in infrastructure
Capture maximum surplus
Risk: Developer exit if trust falls

Growth Strategy:

Invest heavily in infrastructure
Build developer trust
Sacrifice short-term for ecosystem growth

Balanced Strategy:

Moderate investment matching developers
Maintain sustainable trust levels
Optimize for long-term value

Developer Strategies

High Investment:

Commit heavily to platform
Benefit from network effects
Risk: Platform exploitation

Defensive Investment:

Invest minimally above threshold
Reduce platform dependency
Maintain exit option

Coordinated Response:

Mirror platform’s behavior
Punish exploitation collectively
Requires implicit coordination

Example: Ecosystem Dynamics

import coopetition_gym
import numpy as np

env = coopetition_gym.make("PlatformEcosystem-v0", n_developers=4)
obs, info = env.reset(seed=42)

# Track ecosystem health
trust_history = []
value_history = []

for step in range(100):
    # Platform: Responsive to developer trust (falls back to the initial 0.6)
    dev_trust = info.get('developer_trust_in_platform', 0.6)

    # High trust -> high investment (capped at 80% of endowment); Low trust -> defensive
    platform_action = 150.0 * min(0.8, dev_trust + 0.2)

    # Developers: Tit-for-tat with platform, matching its previous investment fraction
    platform_last = obs[0] if step > 0 else 75.0
    dev_action = 80.0 * (platform_last / 150.0)

    actions = np.array([platform_action] + [dev_action] * 4)
    obs, rewards, terminated, truncated, info = env.step(actions)

    trust_history.append(info['developer_trust_in_platform'])
    value_history.append(info['total_ecosystem_value'])

    if terminated:
        print(f"Ecosystem collapsed at step {step}")
    if terminated or truncated:
        break

print(f"Final developer trust: {trust_history[-1]:.3f}")
print(f"Average ecosystem value: {np.mean(value_history):.1f}")

Research Applications

PlatformEcosystem-v0 is suitable for studying:

Platform Economics: Multi-sided market dynamics
Mechanism Design: Incentive structures for ecosystems
Network Effects: Value creation in platform economies
Credit Assignment: MARL challenge with shared outcomes
Collective Action: Coordinated responses to platform behavior

Scaling Considerations

Agent Count

n_developers	Total Agents	Observation Dim	Complexity
4	5	81	Moderate
8	9	253	High
16	17	885	Very High

Recommendations

Algorithm testing: Use n_developers=4 (default)
Scalability studies: Increase progressively
Production: Consider attention-based policies for large N

Related Environments

DynamicPartnerSelection-v0: Peer-to-peer dynamics
ReputationMarket-v0: Market with reputation tiers
PartnerHoldUp-v0: Dyadic asymmetric power

References

Rochet, J.C. & Tirole, J. (2003). Platform Competition in Two-Sided Markets. JEEA.
Parker, G.G. & Van Alstyne, M.W. (2005). Two-Sided Network Effects. Management Science.
Pant, V. & Yu, E. (2025). Computational Foundations for Strategic Coopetition: Formalizing Interdependence and Complementarity. arXiv:2510.18802
