ADR-0003: DashMap for per-request phase state
- Status: Accepted
- Date: 2026-05-20
- Authors: angelnicolasc
- Reviewers: sole-maintainer decision record
Context
The PhaseRouter mutates per-request state (ThinkPhase) on every decoded
token. A single decode worker processes a continuous-batch step that can
touch dozens of requests; in a tensor-parallel deployment, multiple worker
threads may invoke on_token for different requests in the same wall-clock
window. We need:
- O(1) lookup keyed by req_id.
- Concurrent mutation of different keys without serialising the whole map.
- Cheap clone / sharing — the router is shared across the scheduler, the block manager touch path, and the vLLM plugin.
- No GC, no allocations on the hot path.
Candidates evaluated:
- parking_lot::RwLock<HashMap<u64, ThinkPhase>>
- DashMap<u64, ThinkPhase> (sharded HashMap behind per-shard RwLock)
- papaya::HashMap<u64, ThinkPhase> (lock-free, June 2025)
- scc::HashMap<u64, ThinkPhase> (lock-free, sharded)
Decision
DashMap 6.x. It is the boring, well-audited choice that hits the
performance floor we need without introducing a less-battle-tested
dependency on the hot path.
Consequences
Positive
- Sharded locking. Operations on req_ids that land in different shards do not contend.
- Mature API. get_mut, entry, remove cover every access pattern in PhaseRouter. No need to invent abstractions on top.
- No unsafe in our own code. DashMap is internally backed by parking_lot; we keep the #![forbid(unsafe_code)] invariant in meridian-core intact (the attribute covers our crate, not its dependencies).
- Crates.io top-100. Wide deployment, frequent security audits, stable semver.
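The sharded-locking point can be illustrated with a std-only sketch. This is not DashMap's implementation: the ShardedPhaseMap type, the modulo shard selection, and the single-field ThinkPhase are hypothetical stand-ins chosen to keep the example small. It only shows why a token update for one req_id need not lock state belonging to requests in other shards.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the real per-request state.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ThinkPhase {
    pub tokens_so_far: u64,
}

/// Illustrative sharded map: one RwLock<HashMap> per shard.
pub struct ShardedPhaseMap {
    shards: Vec<RwLock<HashMap<u64, ThinkPhase>>>,
}

impl ShardedPhaseMap {
    pub fn new(n_shards: usize) -> Self {
        assert!(n_shards > 0, "need at least one shard");
        let shards = (0..n_shards)
            .map(|_| RwLock::new(HashMap::new()))
            .collect();
        Self { shards }
    }

    /// Assumption: plain modulo on the key. A real sharded map hashes first.
    fn shard_index(&self, req_id: u64) -> usize {
        (req_id % self.shards.len() as u64) as usize
    }

    /// Bump the token counter for one request, locking only its shard.
    /// Returns the updated count. Panics if the shard lock is poisoned.
    pub fn on_token(&self, req_id: u64) -> u64 {
        let mut shard = self.shards[self.shard_index(req_id)].write().unwrap();
        let phase = shard.entry(req_id).or_default();
        phase.tokens_so_far += 1;
        phase.tokens_so_far
    }

    /// Drop a finished request's state, again locking only its shard.
    pub fn remove(&self, req_id: u64) -> Option<ThinkPhase> {
        self.shards[self.shard_index(req_id)]
            .write()
            .unwrap()
            .remove(&req_id)
    }

    /// Total tracked requests; takes a read lock on each shard in turn.
    pub fn len(&self) -> usize {
        self.shards.iter().map(|s| s.read().unwrap().len()).sum()
    }
}
```

Because every method takes &self, a value like this can be wrapped in an Arc and handed to several worker threads. Two requests that map to the same shard still serialise on that shard's lock, which is the contention case discussed under the risks.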
Negative / risks
- Sharded lock is not truly lock-free. Under extreme contention (a single shard absorbing many requests because of hash skew), throughput degrades. Mitigation: monitor meridian.phase_router.tracked_requests; if the gauge approaches n_shards × shard_capacity (default ~512 per shard), revisit with papaya or shard-aware partitioning. In our workload (~1k concurrent requests max per worker) we are far from this ceiling.
- Default hasher. DashMap 6.x defaults to the standard library's RandomState (SipHash), not AHasher. This is adequate for our integer keys and not a downside in itself, but we do not get non-cryptographic hashing for free; if hashing shows up in profiles, a faster hasher can be supplied via DashMap::with_hasher.
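As a rough way to reason about hash skew, the sketch below counts how many req_ids fall into each shard and reports the ratio of the fullest shard to the mean. It is an approximation under stated assumptions: it uses the standard library's RandomState as a stand-in hasher and a power-of-two mask to pick the shard, and DashMap's internal shard selection may differ. The function names shard_occupancy and skew_ratio are hypothetical.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// Count how many keys land in each of `n_shards` buckets.
/// Assumption: shard = low bits of the hash; `n_shards` must be a power of two.
pub fn shard_occupancy(req_ids: &[u64], n_shards: usize) -> Vec<usize> {
    assert!(n_shards.is_power_of_two(), "n_shards must be a power of two");
    let state = RandomState::new();
    let mut counts = vec![0usize; n_shards];
    for id in req_ids {
        let mut hasher = state.build_hasher();
        id.hash(&mut hasher);
        let idx = (hasher.finish() as usize) & (n_shards - 1);
        counts[idx] += 1;
    }
    counts
}

/// Fullest shard divided by the mean shard load.
/// 1.0 means perfectly even; larger values mean more skew.
pub fn skew_ratio(counts: &[usize]) -> f64 {
    let total: usize = counts.iter().sum();
    if total == 0 {
        return 1.0; // nothing tracked (or no shards), so nothing is skewed
    }
    let mean = total as f64 / counts.len() as f64;
    let max = *counts.iter().max().unwrap() as f64;
    max / mean
}
```

A ratio that stays close to 1.0 for realistic req_id patterns suggests the single-hot-shard scenario is unlikely; a ratio that grows with load would be a signal to revisit the decision.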
Neutral
- Memory per entry is slightly higher than a flat HashMap because of the shard metadata. Negligible (~64 bytes overhead total).
Alternatives considered
RwLock<HashMap<...>>
Rejected: every on_token call needs a write lock to bump
tokens_so_far, so the RwLock collapses to a Mutex in practice.
Single-shard serialisation across all requests is unacceptable.
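A minimal sketch of the rejected design, using std::sync::RwLock rather than parking_lot so it compiles without dependencies. SingleLockRouter, the single-field ThinkPhase, and is_blocked_during_update are hypothetical names for illustration. The point is that on_token must take the write lock over the whole map, so an update for one request excludes every other request.

```rust
use std::collections::HashMap;
use std::sync::{RwLock, TryLockError};

/// Hypothetical stand-in for the real per-request state.
#[derive(Debug, Default)]
pub struct ThinkPhase {
    pub tokens_so_far: u64,
}

/// One lock guarding the state of every request.
pub struct SingleLockRouter {
    state: RwLock<HashMap<u64, ThinkPhase>>,
}

impl SingleLockRouter {
    pub fn new() -> Self {
        Self { state: RwLock::new(HashMap::new()) }
    }

    /// Every token needs `write()`: the counter is mutated, so a read lock
    /// is never enough and the RwLock behaves like a Mutex.
    /// Panics if the lock is poisoned.
    pub fn on_token(&self, req_id: u64) -> u64 {
        let mut map = self.state.write().unwrap();
        let phase = map.entry(req_id).or_default();
        phase.tokens_so_far += 1;
        phase.tokens_so_far
    }

    /// Demonstration helper: while one update holds the write guard, a
    /// second non-blocking attempt is refused, whichever key it targets.
    pub fn is_blocked_during_update(&self) -> bool {
        let _writer = self.state.write().unwrap();
        matches!(self.state.try_write(), Err(TryLockError::WouldBlock))
    }
}
```

With per-shard locks the same second attempt would only be refused if it targeted the shard currently being written.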
papaya::HashMap
Watched, not adopted. Genuinely lock-free, with better tail latency under
contention than DashMap. Adoption deferred until: (a) it stabilises a
1.0 API, and (b) we can demonstrate a workload where DashMap is the
bottleneck. Tracked as future work in the DEVLOG.
scc::HashMap
Comparable to papaya but more API surface; same deferral rationale.
References
- DashMap source: https://github.com/xacrimon/dashmap.
- The phase_router bench measures the steady-state cost of on_token under no contention.