# Leaderboard

We present evaluation results from our own devices for reference. All models were evaluated on Spec-Bench using the same device and testing environment. We report the mean speedup over 3 runs and the mean number of accepted tokens per decoding step (#Mean Accepted Tokens, which is 1.00 for vanilla autoregressive decoding).
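For clarity, here is a minimal sketch of how these two metrics can be computed. The variable names (`baseline_times`, `sd_times`, `new_tokens`, `decoding_steps`) are illustrative placeholders, not Spec-Bench internals:

```python
# Sketch of the two reported metrics; names are hypothetical placeholders.

def speedup(baseline_times, sd_times):
    """Wall-clock speedup of a speculative method over vanilla decoding."""
    return sum(baseline_times) / sum(sd_times)

def mean_accepted_tokens(new_tokens, decoding_steps):
    """Tokens produced per decoding step; equals 1.00 for vanilla
    autoregressive decoding, which emits exactly one token per step."""
    return sum(new_tokens) / sum(decoding_steps)

# Example: speedups from 3 runs are averaged, as in the tables below.
run_speedups = [speedup([12.0], [5.1]), speedup([11.8], [5.0]), speedup([12.1], [5.2])]
print(f"mean speedup over 3 runs: {sum(run_speedups) / len(run_speedups):.2f}x")
```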

❗️It is important to note that speedup rates may differ across devices. For more precise numbers, we recommend evaluating the specific models on your intended devices.

🤔 This is a gentle reminder that while speedup is the primary metric for assessing Speculative Decoding methods, other benefits are worth considering. For example, PLD and Lookahead require no extra parameters, which makes them easier to integrate with a wider range of models.

## Leaderboard on 3090

- Device: a single NVIDIA GeForce RTX 3090 GPU (24GB) with 12 CPU cores
- Testing environment: PyTorch 2.0.1 under CUDA 11.8
- Experimental Settings: Vicuna-7B-v1.3, greedy decoding, FP16 precision, batch size = 1 (a minimal sketch of this setup follows the list)
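
As a point of reference, the following sketch reproduces these settings for the vanilla autoregressive baseline (not any speculative method), assuming Hugging Face `transformers` and the public `lmsys/vicuna-7b-v1.3` checkpoint:

```python
# Minimal sketch of the evaluation settings above: greedy decoding,
# FP16 precision, batch size = 1. Vanilla autoregressive baseline only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.3"  # public checkpoint on the HF Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # FP16 precision
    device_map="auto",
)

# batch size = 1: a single prompt per generate() call
inputs = tokenizer("What is speculative decoding?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=256)  # greedy decoding
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
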
| Models | Multi-turn Conversation | Translation | Summarization | Question Answering | Mathematical Reasoning | Retrieval-aug. Generation | #Mean Accepted Tokens | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EAGLE🏅 | 2.44x | 1.81x | 2.13x | 2.11x | 2.54x | 1.82x | 3.57 | 2.16x |
| SpS🥈 | 1.98x | 1.37x | 2.00x | 1.95x | 1.89x | 1.76x | 2.29 | 1.83x |
| Hydra🥉 | 2.04x | 1.67x | 1.56x | 1.81x | 2.16x | 1.48x | 3.26 | 1.80x |
| PLD | 1.57x | 1.07x | 2.31x | 1.25x | 1.62x | 1.56x | 1.74 | 1.55x |
| Medusa | 1.60x | 1.38x | 1.28x | 1.46x | 1.64x | 1.22x | 2.32 | 1.44x |
| REST | 1.49x | 1.18x | 1.21x | 1.46x | 1.35x | 1.27x | 1.63 | 1.32x |
| Lookahead | 1.13x | 0.97x | 1.05x | 1.07x | 1.29x | 0.98x | 1.65 | 1.08x |

## Leaderboard on A100

- Device: a single NVIDIA A100 GPU (80GB) with 64 CPU cores
- Testing environment: PyTorch 2.0.1 under CUDA 11.4
- Experimental Settings: greedy decoding, FP16 precision, batch size = 1

### Vicuna-7B-v1.3

| Models | Multi-turn Conversation | Translation | Summarization | Question Answering | Mathematical Reasoning | Retrieval-aug. Generation | #Mean Accepted Tokens | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EAGLE🏅 | 2.67x | 1.99x | 2.23x | 2.12x | 2.67x | 2.04x | 3.61 | 2.29x |
| Hydra🥈 | 2.45x | 1.94x | 1.79x | 2.03x | 2.49x | 1.77x | 3.24 | 2.09x |
| Medusa🥉 | 2.05x | 1.73x | 1.57x | 1.75x | 2.05x | 1.51x | 2.32 | 1.78x |
| PLD | 1.64x | 1.04x | 2.43x | 1.14x | 1.61x | 1.71x | 1.73 | 1.59x |
| SpS | 1.66x | 1.13x | 1.62x | 1.49x | 1.47x | 1.55x | 2.28 | 1.49x |
| REST | 1.63x | 1.31x | 1.36x | 1.66x | 1.21x | 1.73x | 1.82 | 1.48x |
| Lookahead | 1.40x | 1.14x | 1.19x | 1.24x | 1.55x | 1.09x | 1.66 | 1.27x |

### Vicuna-13B-v1.3

| Models | Multi-turn Conversation | Translation | Summarization | Question Answering | Mathematical Reasoning | Retrieval-aug. Generation | #Mean Accepted Tokens | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EAGLE🏅 | 2.68x | 1.96x | 2.44x | 2.04x | 2.70x | 2.23x | 3.64 | 2.34x |
| Hydra🥈 | 2.46x | 1.90x | 1.93x | 1.96x | 2.48x | 1.92x | 3.35 | 2.12x |
| Medusa🥉 | 1.96x | 1.66x | 1.63x | 1.63x | 2.00x | 1.58x | 2.39 | 1.75x |
| SpS | 1.60x | 1.13x | 1.68x | 1.39x | 1.53x | 1.67x | 2.18 | 1.49x |
| PLD | 1.47x | 1.02x | 2.19x | 1.03x | 1.57x | 1.71x | 1.68 | 1.48x |
| REST | 1.52x | 1.17x | 1.37x | 1.53x | 1.19x | 1.55x | 1.82 | 1.38x |
| Lookahead | 1.30x | 1.06x | 1.20x | 1.12x | 1.48x | 1.12x | 1.63 | 1.22x |

### Vicuna-33B-v1.3

| Models | Multi-turn Conversation | Translation | Summarization | Question Answering | Mathematical Reasoning | Retrieval-aug. Generation | #Mean Accepted Tokens | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EAGLE🏅 | 2.79x | 2.05x | 2.51x | 2.17x | 2.99x | 2.27x | 3.39 | 2.47x |
| Hydra🥈 | 2.59x | 2.01x | 2.04x | 2.11x | 2.71x | 2.06x | 3.24 | 2.26x |
| Medusa🥉 | 1.98x | 1.73x | 1.64x | 1.66x | 2.07x | 1.62x | 2.33 | 1.79x |
| SpS | 1.75x | 1.28x | 1.76x | 1.53x | 1.69x | 1.68x | 2.01 | 1.61x |
| REST | 1.63x | 1.27x | 1.45x | 1.61x | 1.30x | 1.61x | 1.80 | 1.48x |
| PLD | 1.44x | 1.06x | 2.00x | 1.07x | 1.55x | 1.45x | 1.55 | 1.42x |
| Lookahead | 1.32x | 1.08x | 1.20x | 1.16x | 1.54x | 1.15x | 1.61 | 1.24x |