Glossary
Terms used throughout the Meridian documentation and codebase.
TTFT — Time to First Token. Wall-clock time from the moment a request is submitted to the moment the first token is returned to the client. Dominated by prefill latency for long prompts.
TTOT — Time to Output Token. Wall-clock time from the emission of the
</think> boundary token to the first user-visible output token. The metric
Meridian is specifically designed to protect.
TPOT — Time Per Output Token (also ITL: Inter-Token Latency). Time between consecutive output tokens during streaming. The user-perceived "speed" of the stream.
ITL — Inter-Token Latency. See TPOT.
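The three latency metrics above can be derived from per-request timestamps. The sketch below is illustrative only: the `StreamTimestamps` record and its field names are hypothetical and are not part of Meridian's API.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class StreamTimestamps:
    """Hypothetical per-request timestamps in seconds (e.g. time.monotonic())."""
    submitted: float            # request submitted
    first_token: float          # first token returned to the client
    think_end: float            # </think> boundary token emitted
    output_tokens: List[float]  # emission time of each user-visible output token


def ttft(ts: StreamTimestamps) -> float:
    """Time to First Token: submission to first returned token."""
    return ts.first_token - ts.submitted


def ttot(ts: StreamTimestamps) -> float:
    """Time to Output Token: </think> to first user-visible output token."""
    return ts.output_tokens[0] - ts.think_end


def tpot_gaps(ts: StreamTimestamps) -> List[float]:
    """Gaps between consecutive output tokens (one TPOT/ITL sample per gap)."""
    return [b - a for a, b in zip(ts.output_tokens, ts.output_tokens[1:])]


def mean_tpot(ts: StreamTimestamps) -> float:
    """Mean TPOT over the stream; needs at least two output tokens."""
    gaps = tpot_gaps(ts)
    if not gaps:
        raise ValueError("need at least two output tokens to compute TPOT")
    return mean(gaps)
```

Reporting the individual gaps alongside the mean is often useful, since a single long stall can hide behind an acceptable average.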
EAT — Entropy-Aware Termination. A budget-forcing signal based on the variance of the EMA of per-token Shannon entropy over the reasoning chain. When this variance drops below a threshold, the model is inferred to have converged on an answer. Defined in arXiv:2509.26522.
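One possible reading of the EAT description, as a minimal sketch: smooth the per-token entropies with an EMA, then take the variance of the smoothed values over a sliding window. The window length, the threshold, the default `ema_alpha`, and the function name are all assumptions made for illustration. This is not the reference algorithm from the cited paper or Meridian's implementation.

```python
from collections import deque
from statistics import pvariance
from typing import Deque, Iterable, Optional


def eat_stop_index(
    entropies: Iterable[float],
    ema_alpha: float = 0.2,   # assumed default, not taken from Meridian
    window: int = 8,          # assumed sliding-window length
    threshold: float = 1e-3,  # assumed convergence threshold
) -> Optional[int]:
    """Return the index of the first token at which the variance of the
    entropy EMA over the last `window` steps falls below `threshold`,
    or None if that never happens."""
    ema: Optional[float] = None
    recent: Deque[float] = deque(maxlen=window)
    for i, h in enumerate(entropies):
        # Standard EMA update, seeded with the first entropy value.
        ema = h if ema is None else ema_alpha * h + (1.0 - ema_alpha) * ema
        recent.append(ema)
        # Only test once the window is full, to avoid stopping on too few samples.
        if len(recent) == window and pvariance(recent) < threshold:
            return i
    return None
```

With a perfectly flat entropy trace the signal fires as soon as the window fills. A trace that keeps oscillating between low and high entropy never triggers it.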
RPDI — Reasoning Phase Divergence Index. A signal based on the ratio of local transition-token frequency to global transition-token frequency. A high ratio indicates the model is cycling through redundant reasoning steps ("overthinking"). Defined in arXiv:2603.14251.
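The ratio described for RPDI can be sketched as follows. The marker set, the window length, and the choice of "the last `window` tokens" as the local region are hypothetical. The glossary does not specify them, and the cited paper may define them differently.

```python
from typing import FrozenSet, Sequence

# Hypothetical transition markers; the real vocabulary is not given here.
TRANSITION_TOKENS: FrozenSet[str] = frozenset({"wait", "but", "alternatively", "hmm"})


def transition_frequency(tokens: Sequence[str], markers: FrozenSet[str]) -> float:
    """Fraction of tokens that are transition markers (0.0 for an empty sequence)."""
    if not tokens:
        return 0.0
    return sum(t.lower() in markers for t in tokens) / len(tokens)


def rpdi(
    tokens: Sequence[str],
    window: int = 64,  # assumed size of the local region
    markers: FrozenSet[str] = TRANSITION_TOKENS,
) -> float:
    """Ratio of transition-token frequency in the last `window` tokens to the
    frequency over the whole reasoning chain. Returns 0.0 when the chain
    contains no transition tokens at all."""
    global_freq = transition_frequency(tokens, markers)
    if global_freq == 0.0:
        return 0.0
    local_freq = transition_frequency(tokens[-window:], markers)
    return local_freq / global_freq
```

A value near 1.0 means recent tokens contain transitions at the same rate as the chain overall. Values well above 1.0 mean transitions are clustering in the most recent tokens.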
Entropy (Shannon) — Measure of uncertainty in the next-token distribution.
Computed from the logit vector after softmax as -Σ p_i · log(p_i), in nats.
High entropy = uncertain prediction; low entropy = confident prediction.
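The entropy formula can be evaluated directly from a logit vector. This sketch uses a numerically stable log-softmax (subtracting the maximum logit first). The function name is illustrative and not a Meridian identifier.

```python
import math
from typing import Sequence


def shannon_entropy_nats(logits: Sequence[float]) -> float:
    """Entropy of softmax(logits) in nats: -sum(p_i * ln(p_i))."""
    m = max(logits)
    # Subtract the max before exponentiating to avoid overflow.
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    log_z = math.log(z)
    # ln(p_i) = (x_i - m) - ln(z); terms with p_i == 0 contribute nothing.
    return -sum((e / z) * ((x - m) - log_z) for e, x in zip(exps, logits) if e > 0.0)
```

A uniform distribution over n tokens gives the maximum value ln(n). A distribution concentrated on one token gives a value close to zero.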
EMA — Exponential Moving Average. A smoothed average where recent values
are weighted more heavily. Controlled by ema_alpha (smaller = longer memory).
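A minimal sketch of the EMA recurrence, assuming the common form in which `ema_alpha` weights the newest sample and the average is seeded with the first value. The glossary does not state the exact recurrence Meridian uses, so both choices are assumptions.

```python
from typing import Iterable, List


def ema(values: Iterable[float], ema_alpha: float) -> List[float]:
    """y_t = ema_alpha * x_t + (1 - ema_alpha) * y_(t-1), with y_0 = x_0."""
    if not 0.0 < ema_alpha <= 1.0:
        raise ValueError("ema_alpha must be in (0, 1]")
    out: List[float] = []
    for x in values:
        prev = out[-1] if out else x  # seed with the first sample
        out.append(ema_alpha * x + (1.0 - ema_alpha) * prev)
    return out
```

With this form, a smaller `ema_alpha` gives old samples more weight, so the average reacts more slowly to a change in the input. This matches "smaller = longer memory".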
KV — Key-Value cache. The GPU memory store holding the attention keys and values for each token in active requests. KV memory is the primary capacity constraint in serving systems.
KV block — The unit of KV cache allocation. A fixed-size chunk (default 16 KiB) covering a fixed number of tokens (default 16). Blocks are allocated at the request level and freed on eviction or request completion.
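As a worked example of block-granular allocation, the sketch below computes how many blocks a request occupies and the memory that implies, using the defaults quoted above. The function and constant names are illustrative.

```python
BLOCK_TOKENS = 16          # tokens per block (default quoted in this glossary)
BLOCK_BYTES = 16 * 1024    # bytes per block (16 KiB, default quoted in this glossary)


def blocks_needed(n_tokens: int, block_tokens: int = BLOCK_TOKENS) -> int:
    """Number of fixed-size blocks needed to hold n_tokens (ceiling division)."""
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    return (n_tokens + block_tokens - 1) // block_tokens


def kv_bytes(n_tokens: int) -> int:
    """Memory reserved for n_tokens, rounded up to whole blocks."""
    return blocks_needed(n_tokens) * BLOCK_BYTES
```

Because allocation is rounded up to whole blocks, a request with 17 tokens occupies two blocks, the same as one with 32.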
ThinkComplete — Block tier for KV blocks from a request's reasoning phase
once </think> has been emitted. Lowest eviction priority (least protected):
these blocks are freed first under memory pressure.
ThinkActive — Block tier for KV blocks belonging to a request currently in the reasoning phase.
OutputCritical — Block tier for KV blocks belonging to a request in the
output phase. Highest eviction priority (most protected, evicted last):
evicting these causes user-visible stream disruption. Any eviction at this
tier fires meridian.output_critical_eviction.
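The tier ordering can be sketched as a sort key for choosing eviction victims. The type and function names are hypothetical. Placing ThinkActive between the other two tiers is an assumption, since the glossary only states that ThinkComplete is freed first and OutputCritical is the most protected.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class BlockTier(IntEnum):
    # Lower value = evicted earlier.
    THINK_COMPLETE = 0
    THINK_ACTIVE = 1      # assumed to sit between the other two tiers
    OUTPUT_CRITICAL = 2


@dataclass(frozen=True)
class KVBlock:
    block_id: int
    tier: BlockTier


def pick_victims(blocks: List[KVBlock], needed: int) -> List[KVBlock]:
    """Choose `needed` blocks to evict, least-protected tier first.
    sorted() is stable, so blocks within a tier keep their input order."""
    return sorted(blocks, key=lambda b: b.tier)[:needed]


def touches_output_critical(victims: List[KVBlock]) -> bool:
    """True if the selection would evict any OutputCritical block, i.e. the
    case in which meridian.output_critical_eviction would fire."""
    return any(b.tier == BlockTier.OUTPUT_CRITICAL for b in victims)
```

A real policy would likely add a secondary key within each tier, such as recency. This sketch leaves that out.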
Disagg / disaggregated serving — Prefill-decode disaggregation: the model prefill (prompt processing) and decode (token generation) steps are executed on separate hardware. Disagg reduces head-of-line blocking by separating the two workloads, which have very different GPU utilisation profiles.
NIXL — NVIDIA Inference eXchange Layer. NVIDIA's reference fabric for transferring KV blocks between prefill and decode nodes in a disaggregated serving topology.
Mooncake — An open-source disaggregated serving framework with a KV-transfer protocol. Meridian's disagg surface is documented as Mooncake-compatible in ADR-0006.
vLLM — An open-source LLM serving framework. Meridian's primary integration target. Meridian wraps vLLM's scheduler without forking the codebase.
DashMap — A Rust crate providing a concurrent hash map, used for the
PhaseRouter's per-request state. Provides average-case O(1) reads and writes
with sharded locking. See ADR-0003.
EOS — End of Sequence. The special token that signals request completion. vLLM emits an EOS event that the Meridian plugin uses to trigger request teardown and router state reaping.
Conventional Commits — A commit message standard used throughout this
repository. Format: <type>(<scope>): <summary>. See
conventionalcommits.org.
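The header format can be checked with a small regular expression. This is a simplified sketch, not the project's actual lint rule: it assumes lowercase types, treats the scope as optional, and accepts the optional `!` breaking-change marker from the Conventional Commits specification.

```python
import re
from typing import Dict, Optional

# <type>(<scope>)!: <summary>  (scope and "!" are optional)
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<summary>\S.*)$"
)


def parse_commit_header(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the header's parts, or None if the line does not match."""
    m = HEADER_RE.match(line)
    return m.groupdict() if m else None
```

For example, `fix(router): reap state on EOS` parses with type `fix` and scope `router`, while `Fixed stuff` is rejected.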
DCO — Developer Certificate of Origin. A sign-off mechanism (git commit -s)
that certifies the contributor has the right to submit the code under the
project's license. Required for all contributions — see CONTRIBUTING.md.
SLSA — Supply-chain Levels for Software Artifacts. A framework for
supply-chain security. Meridian attests Level 2 provenance on every tagged
release via slsa-github-generator. See ADR-0007.
SBOM — Software Bill of Materials. A machine-readable inventory of software components and their licenses. Meridian generates a CycloneDX SBOM for each release, attached as a GitHub release asset.