Multi-Token Prediction Performance on GSM8K Mathematical Reasoning

10 Jun 2025

This figure shows the nuanced impact of multi-token prediction on GSM8K accuracy, with n=2 models holding an advantage at lower data scales

Multi-Token Prediction for Abstractive Text Summarization: ROUGE Metrics

10 Jun 2025

Discover how multi-token prediction significantly improves ROUGE-N and ROUGE-L scores for 7B-parameter LLMs on various abstractive text summarization benchmarks
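
For readers unfamiliar with the metrics in this comparison, ROUGE scores like these can be computed with Google's `rouge-score` package; the reference and prediction strings below are placeholder examples, not data from any of the benchmarks above.

```python
# Requires: pip install rouge-score
from rouge_score import rouge_scorer

# ROUGE-1/2 measure n-gram overlap; ROUGE-L rewards the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "the cat sat on the mat"          # placeholder gold summary
prediction = "a cat was sitting on the mat"   # placeholder model output
scores = scorer.score(reference, prediction)
for name, s in scores.items():
    print(f"{name}: precision={s.precision:.3f} recall={s.recall:.3f} f1={s.fmeasure:.3f}")
```

ROUGE-N and ROUGE-L can diverge on heavily paraphrased summaries, which is why both families are usually reported side by side.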

Limited Gains: Multi-Token Training on Natural Language Choice Tasks

10 Jun 2025

This figure indicates that multi-token prediction with 7B models yields limited or no accuracy improvement on standard multiple-choice NLP benchmarks

CodeContests Finetuning: Details for Multi-Token LLMs

10 Jun 2025

Explore the detailed methodology for finetuning multi-token pretrained LLMs on the challenging CodeContests dataset

LLM Performance Scaling: Multi-Token Prediction Across Model Sizes

10 Jun 2025

This table provides a detailed comparison of multi-token and next-token prediction performance on HumanEval and MBPP across a wide range of LLM sizes.

Llama 2 Finetuning Results: Multi-Token Prediction on Coding Benchmarks

10 Jun 2025

This table evaluates the impact of multi-token prediction on Llama 2 finetuning, suggesting that it does not significantly improve performance on the evaluated benchmarks

Training Time Comparison: Multi-Token vs. Next-Token Prediction

8 Jun 2025

This table (S5) quantifies the training time overhead of multi-token prediction relative to next-token prediction

Alternative Architectures for Multi-Token Prediction in LLMs

6 Jun 2025

Explore and compare alternative architectural designs for implementing multi-token prediction in large language models
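
As a rough illustration of the design space being compared, the sketch below shows one common multi-token layout: a shared trunk feeding n parallel heads that reuse a single unembedding matrix. The module name, the simple feed-forward heads, and the dimensions are illustrative assumptions, not the specific architectures from the figure.

```python
import torch
import torch.nn as nn

class MultiTokenHeads(nn.Module):
    """Hypothetical multi-token output: n parallel heads on a shared trunk.

    Head i predicts the token at offset i+1; all heads share one unembedding.
    The feed-forward heads are placeholders for whatever per-head module a
    given architecture uses.
    """

    def __init__(self, d_model: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model))
            for _ in range(n_future)
        ])
        self.unembed = nn.Linear(d_model, vocab_size, bias=False)  # shared unembedding

    def forward(self, trunk_hidden: torch.Tensor) -> torch.Tensor:
        # trunk_hidden: (batch, seq, d_model) from the shared transformer trunk.
        # Returns logits of shape (batch, seq, n_future, vocab_size).
        return torch.stack([self.unembed(h(trunk_hidden)) for h in self.heads], dim=2)
```

Training such a layout typically sums the cross-entropy of head i against targets shifted by i+1 positions, so a single backward pass updates the shared trunk with all n losses.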

Self-Speculative Decoding Speeds for Multi-Token LLMs

6 Jun 2025

Figure S10 illustrates the relative throughput and latency improvements of self-speculative decoding with k heads for a 4-token prediction code model
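
To make the speed numbers concrete, here is a minimal greedy sketch of the decoding loop being measured: the extra heads draft a few tokens ahead and the ordinary next-token head verifies them. The `model(ids)` interface returning per-head logits is a hypothetical stand-in, and the sketch runs a separate verification pass per round for clarity, whereas a practical implementation fuses drafting and verification.

```python
import torch

@torch.no_grad()
def self_speculative_decode(model, input_ids, max_new_tokens, k=3):
    """Greedy self-speculative decoding sketch (assumes batch size 1).

    Assumes a hypothetical `model(ids)` returning logits of shape
    (batch, seq, n_heads, vocab): head 0 is the next-token head,
    heads 1..k draft tokens further ahead.
    """
    ids = input_ids
    while ids.shape[1] - input_ids.shape[1] < max_new_tokens:
        last = model(ids)[:, -1]                 # all heads at the final position
        draft = last.argmax(-1)[0, : k + 1]      # next token plus k lookahead drafts

        # Verify: append the drafts, rerun the model, and keep the longest prefix
        # on which the next-token head agrees with each drafted token.
        ext = torch.cat([ids, draft.unsqueeze(0)], dim=1)
        verify = model(ext)[:, :, 0].argmax(-1)  # next-token head at every position
        n_accept = 1                             # draft[0] comes from head 0 itself
        for i in range(1, k + 1):
            if verify[0, ids.shape[1] - 1 + i] == draft[i]:
                n_accept += 1
            else:
                break
        ids = torch.cat([ids, draft[:n_accept].unsqueeze(0)], dim=1)
    return ids
```

Because at least one token per round comes from the next-token head itself, greedy output is unchanged; throughput improves whenever additional drafts are accepted, which is what the relative speedups in the figure measure.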