
Top-n𝜎: Eliminating Noise in Logit Space for Robust Token Sampling of LLM

Chenxia Tang

ACL 2025

Abstract

Large language models (LLMs) rely heavily on sampling methods to generate diverse and high-quality text. While existing sampling methods like top-p and min-p have identified the detrimental effects of low-probability tails in LLMs’ outputs, they still fail to effectively distinguish between diversity and noise. This limitation stems from their reliance on probability-based metrics that are inherently sensitive to temperature scaling. Through empirical and theoretical analysis, we make two key discoveries: (1) the pre-softmax logits exhibit a clear statistical separation between informative tokens and noise, and (2) we prove the mathematical equivalence of min-p and top-(1-p) under a uniform distribution over logits. These findings motivate the design of top-n𝜎, a novel sampling method that identifies informative tokens by eliminating noise directly in logit space. Unlike existing methods that become unstable at high temperatures, top-n𝜎 achieves temperature-invariant token selection while preserving output diversity. Extensive experiments across reasoning and creative writing tasks demonstrate that our method consistently outperforms existing approaches, with particularly significant improvements in high-temperature settings.
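
The abstract describes token selection in raw logit space rather than on temperature-scaled probabilities. Below is a minimal sketch of what such a sampler could look like, assuming (based on the method's name and the abstract, not on the paper's reference implementation) that a token is kept when its logit lies within n standard deviations of the maximum logit; the function name top_n_sigma_sample and the example values are illustrative only.

# Hedged sketch of a top-n-sigma style sampler (not the authors' reference code).
# Assumption: "informative" tokens are those whose raw logits lie within
# n standard deviations of the maximum logit; everything below is treated as noise.

import numpy as np

def top_n_sigma_sample(logits, n=1.0, temperature=1.0, rng=None):
    """Sample one token id from `logits` after discarding noise tokens.

    The selection threshold (max_logit - n * std_of_logits) is computed on the
    pre-softmax logits, so the kept set does not change with temperature.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Temperature-invariant selection: threshold in raw logit space.
    threshold = logits.max() - n * logits.std()
    keep = logits >= threshold

    # Apply temperature only to the surviving tokens, then renormalize.
    scaled = logits[keep] / temperature
    scaled -= scaled.max()                      # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()

    kept_ids = np.flatnonzero(keep)
    return int(rng.choice(kept_ids, p=probs))

# Example: even at a high temperature, the candidate set stays the same,
# only the relative probabilities among the kept tokens flatten.
example_logits = np.array([8.0, 7.5, 6.9, -2.0, -3.5, -4.0])
print(top_n_sigma_sample(example_logits, n=1.0, temperature=2.0))

Because the threshold is taken before any temperature scaling, raising the temperature only flattens the distribution over the already-selected tokens, which is consistent with the temperature-invariant selection claimed in the abstract.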

Relevance Assessment

Research Gap

Top-n𝜎 sampling hasn't yet been applied to machine translation.

Notes


Tags

evaluation › automatic metrics

Search Queries

Paper ID: 4473f03e-19d6-4e68-ade3-47b1ffde80fb
Added: 9/21/2025