CooperativeNegotiation-v0

Category: Extended Environment Agents: 2 Difficulty: Advanced Source: coopetition_gym/envs/extended_envs.py

Overview

CooperativeNegotiation-v0 implements a multi-round negotiation environment with explicit commitment mechanics. Agents submit proposals that, when aligned, form binding agreements. Violating these agreements incurs significant penalties.

This environment tests:

Communication through actions: Signaling intentions via proposals
Trust building: Establishing credibility through consistency
Commitment and enforcement: Understanding contract dynamics
Breach penalties: Learning consequences of defection

Negotiation Flow Dynamics Left: Proposal convergence as agents adjust positions toward agreement. Right: Agreement status timeline showing negotiation (blue), agreement (green), and breach (red) phases with 3× penalty multiplier.

MARL Classification

Property	Value
Game Type	Markov Game with endogenous commitment (contract formation)
Cooperation Structure	Mixed-Motive with enforceable agreements
Observability	Full (including agreement status and terms)
Communication	Implicit proposals (actions encode offers) + explicit agreements
Agent Symmetry	Symmetric
Reward Structure	Mixed + breach penalties (3× multiplier on violations)
Action Space	Continuous: A_i = [0, 100] (proposals/commitments)
State Dynamics	Deterministic with discrete agreement transitions
Horizon	Finite, T = 100
Canonical Comparison	Contract games with commitment; cf. Crawford & Sobel (1982), Raiffa (1982) negotiation theory

Formal Specification

This environment is formalized as a 2-player Markov Game with endogenous agreement formation.

Agents

N = {1, 2} (symmetric negotiators)

Property	Value
Endowment	100.0
Baseline	35.0
Bargaining α	0.50

State Space

S ⊆ ℝ¹⁸ (extended with agreement info)

Component	Dimension	Description
Standard State	15	Actions, trust, reputation, interdependence, time
Agreement Levels	2	Agreed cooperation levels (or -1 if no agreement)
Has Agreement	1	Boolean flag (0 or 1)

Action Space

A_i = [0, 100] ⊂ ℝ representing either:

Pre-agreement: Proposal for cooperation level
Post-agreement: Actual cooperation level (subject to breach detection)

Uniaxial Treatment: This environment uses the single-dimension action space characteristic of Coopetition-Gym v1.x. Competition emerges through commitment enforcement and breach penalties rather than explicit competitive actions.

Agreement Formation

Agreement condition: Proposals converge within threshold

|a_1 - a_2| ≤ agreement_threshold (default: 10.0)

When agreement forms:

agreed_level = (a_1 + a_2) / 2

Breach Detection

Post-agreement, deviations trigger penalties:

if |a_i - agreed_level_i| > agreement_threshold: breach_penalty = breach_multiplier × |a_i - agreed_level_i|
    r_i = r_i - breach_penalty

where breach_multiplier = 3.0 (default).

Trust Parameters

Parameter	Symbol	Value	Note
Trust Building	λ⁺	0.12	Faster (agreement builds trust)
Trust Erosion	λ⁻	0.40	High (breach penalty)
Reputation Damage	$\mu_R$	0.65	Strong (breach = reputation damage)
Reputation Decay	$\delta_R$	0.02	Standard

Reward Function

r_i = π_i + 0.55 · π_j - breach_penalty_i

Breach penalty makes defection costly post-agreement.

Episode Structure

Horizon: T = 100 steps
Truncation: t ≥ T
Termination: mean(τ) < 0.05
Discount: γ = 1.0

Initial State

τ_ij(0) = 0.50 (neutral)
R_ij(0) = 0.00
has_agreement = False
agreed_levels = None

Game-Theoretic Background

Negotiation Theory

Real-world negotiations involve:

Proposal exchange: Parties state positions
Convergence: Moving toward mutual agreement
Commitment: Agreeing to specific terms
Enforcement: Penalties for violation

The Commitment Problem

Without commitment mechanisms:

Agreements are cheap talk
Defection is tempting post-agreement
Trust is hard to establish

With commitment mechanisms:

Agreements become credible
Breach penalties align incentives
Trust can be built through consistency

Environment Specification

Basic Usage

import coopetition_gym
import numpy as np

env = coopetition_gym.make("CooperativeNegotiation-v0")
obs, info = env.reset(seed=42)

for step in range(100):
    # Actions are proposals/commitments
    agent_0_proposal = 55.0  # Propose 55% cooperation
    agent_1_proposal = 52.0  # Propose 52% cooperation

    actions = np.array([agent_0_proposal, agent_1_proposal])
    obs, rewards, terminated, truncated, info = env.step(actions)

    if info['agreement_reached']: print(f"Agreement reached at step {step}!")
        print(f"Agreed levels: {info['current_agreement']}")

Parameters

Parameter	Default	Description
`max_steps`	100	Maximum timesteps
`agreement_threshold`	10.0	Max proposal difference for agreement
`breach_penalty_multiplier`	3.0	Penalty for violating agreement
`negotiation_rounds`	5	Rounds before commitment (unused in current)
`render_mode`	None	Rendering mode

Agreement Mechanics

Reaching Agreement

An agreement is reached when proposals are close:

proposal_diff = abs(action_0 - action_1)
if proposal_diff <= agreement_threshold: agreement_reached = True
    agreed_level = (action_0 + action_1) / 2  # Average

Agreement State

Once an agreement is reached:

Agreed levels are recorded
Subsequent actions are compared to agreement
Deviations trigger breach penalties

Breach Detection

if has_agreement: for agent in [0, 1]: deviation = abs(action[agent] - agreement[agent])
        if deviation > agreement_threshold: breach_occurred = True
            penalty = breach_penalty_multiplier * deviation
            rewards[agent] -= penalty

Observation Space

Extended Observation

Component	Shape	Description
Standard	15	Actions, trust, rep, interdep, step
Agreement Levels	2	Agreed cooperation levels (or -1)
Has Agreement	1	Boolean flag

Total dimension: 18

Agreement Information in Observation

# Check agreement status from observation
has_agreement = obs[-1] > 0.5  # Last element is flag
if has_agreement: agreed_levels = obs[-3:-1]  # Second-to-last elements

Reward Structure

Base Rewards

Standard integrated utility (see TrustDilemma-v0).

Breach Penalty

When an agent deviates from agreement:

penalty = breach_penalty_multiplier × |action - agreed_level|
reward = base_reward - penalty

With multiplier = 3.0:

Deviation of 10 units → penalty of 30
Deviation of 20 units → penalty of 60

This makes breach very costly.

Trust Dynamics

Parameters

Parameter	Symbol	Value	Description
Trust Building Rate	λ⁺	0.12	Moderate building
Trust Erosion Rate	λ⁻	0.40	High erosion (breach penalty)
Reputation Damage	$\mu_R$	0.65	Strong reputation effects
Reputation Decay	$\delta_R$	0.02	Standard decay
Interdependence Amp.	ξ	0.55	Moderate amplification
Signal Sensitivity	κ	1.2	Enhanced sensitivity
Initial Trust	τ₀	0.50	Neutral start

Breach Impact on Trust

Breaching an agreement:

Triggers trust erosion (high λ⁻)
Adds reputation damage (high $\mu_R$)
Affects future interactions

Metrics and Info

The info dictionary includes:

Key	Type	Description
`step`	int	Current timestep
`agreement_reached`	bool	Whether agreement exists
`current_agreement`	tuple	Agreed levels (or None)
`breach_occurred`	bool	Whether breach this step
`total_agreements`	int	Cumulative agreements
`total_breaches`	int	Cumulative breaches
`proposal_convergence`	float	1 - (diff / 100)

Strategic Analysis

Negotiation Strategies

Gradual Convergence:

Start with moderate proposals
Adjust based on partner’s response
Converge to mutually acceptable level

Aggressive Opening:

Start with favorable proposal
See if partner concedes
Risk: may delay agreement

Matching Strategy:

Mirror partner’s proposal
Signal willingness to agree
Quick convergence but may not optimize

Post-Agreement Strategies

Honor Commitment:

Match agreed level exactly
Build trust for future negotiations
Avoid all penalties

Minor Deviation:

Small deviation under threshold
No penalty but trust impact
Short-term gain possible

Major Breach:

Large deviation exceeding threshold
Severe penalty
Trust collapses

Example: Negotiation Protocol

import coopetition_gym
import numpy as np

env = coopetition_gym.make("CooperativeNegotiation-v0")
obs, info = env.reset(seed=42)

# Negotiation state
my_proposal = 60.0
partner_last = 50.0

for step in range(100):
    # Simple convergence: move toward partner
    if not info.get('agreement_reached', False):
        # Negotiation phase: move toward partner
        my_proposal = 0.7 * my_proposal + 0.3 * partner_last
    else:
        # Agreement phase: honor commitment
        my_proposal = info['current_agreement'][0]

    # Partner uses similar logic (simulated)
    partner_proposal = 0.6 * partner_last + 0.4 * my_proposal

    actions = np.array([my_proposal, partner_proposal])
    obs, rewards, terminated, truncated, info = env.step(actions)

    partner_last = obs[1]  # Partner's last action

    if step == 20: print(f"Step 20: Agreement={info['agreement_reached']}, "
              f"Proposals=({my_proposal:.1f}, {partner_proposal:.1f})")

print(f"\nFinal: {info['total_agreements']} agreements, "
      f"{info['total_breaches']} breaches")

Research Applications

CooperativeNegotiation-v0 is suitable for studying:

Contract Theory: Commitment and enforcement
Negotiation: Multi-round bargaining dynamics
Communication: Signaling through actions
Institutional Design: Agreement mechanisms
MARL with Commitments: Learning with binding agreements

TrustDilemma-v0: Without explicit contracts
PartnerHoldUp-v0: Power-asymmetric negotiation
ReputationMarket-v0: Market-level agreements

References

Crawford, V.P. & Sobel, J. (1982). Strategic Information Transmission. Econometrica.
Raiffa, H. (1982). The Art and Science of Negotiation. Harvard University Press.
Pant, V. & Yu, E. (2025). Computational Foundations for Strategic Coopetition: Formalizing Trust and Reputation Dynamics. arXiv:2510.24909

CooperativeNegotiation-v0

Overview

MARL Classification

Formal Specification

Agents

State Space

Action Space

Agreement Formation

Breach Detection

Trust Parameters

Reward Function

Episode Structure

Initial State

Game-Theoretic Background

Negotiation Theory

The Commitment Problem

Environment Specification

Basic Usage

Parameters

Agreement Mechanics

Reaching Agreement

Agreement State

Breach Detection

Observation Space

Extended Observation

Agreement Information in Observation

Reward Structure

Base Rewards

Breach Penalty

Trust Dynamics

Parameters

Breach Impact on Trust

Metrics and Info

Strategic Analysis

Negotiation Strategies

Post-Agreement Strategies

Example: Negotiation Protocol

Research Applications

Related Environments

References