NeurIPS 2025 — Best Paper Awards
Snapshot of the four Best Papers and three Runners-Up awarded at NeurIPS 2025, spanning LLMs, RL, diffusion models, theory and benchmarks.
Best Papers
Four Best Papers (incl. one in Datasets & Benchmarks)
Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
Infinity-Chat benchmark for open-ended prompts + large-scale analysis of diversity and "artificial hivemind" effects in LLM generations.
Introduces Infinity-Chat (26k open-ended queries + dense human annotations) and shows strong intra-model repetition and inter-model homogeneity, raising concerns about long-term creativity and value pluralism.
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Simple head-specific sigmoid gating applied after scaled dot-product attention (SDPA) that stabilizes training, reduces attention sink, and improves long-context performance in large-scale LLMs.
Compares dozens of gated-attention variants on 15B MoE and 1.7B dense models and finds a consistently strong win for a single, easy-to-implement gating design now used in Qwen3-Next models.
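The core mechanism is easy to sketch. Below is a minimal, single-head numpy illustration of the idea: an elementwise sigmoid gate, computed from the layer input via a per-head weight matrix, multiplies the SDPA output. The function names, shapes, and the choice to condition the gate on the hidden state `x` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sdpa(q, k, v):
    # Standard scaled dot-product attention for a single head.
    # q, k, v: (seq_len, d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def gated_attention(q, k, v, w_gate, x):
    # Hypothetical head-specific gating after SDPA:
    # gate = sigmoid(x @ w_gate) is in (0, 1) elementwise, so it can
    # suppress the attention output per position and per channel.
    # x: (seq_len, d_model), w_gate: (d_model, d_head)
    out = sdpa(q, k, v)
    gate = 1.0 / (1.0 + np.exp(-(x @ w_gate)))
    return gate * out
```

Because the gate lies strictly in (0, 1), positions that would otherwise act as attention sinks can have their outputs damped to near zero, which is one intuition for why gating mitigates the sink phenomenon.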
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
Demonstrates that very deep (up to 1024-layer) self-supervised RL agents can achieve strong goal-reaching performance without explicit rewards.
Challenges the assumption that RL is incompatible with very deep nets. Using contrastive, goal-conditioned self-supervision, depth scaling yields higher success rates and richer emergent behaviours in simulated tasks.
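The contrastive, goal-conditioned objective behind this line of work can be sketched compactly. The InfoNCE-style loss below scores each state-action embedding against every goal embedding in a batch and rewards matching its own goal; the encoder that produces `phi` and `psi` (here just raw arrays) is a hypothetical stand-in, and this is a sketch of the general objective rather than the paper's exact recipe.

```python
import numpy as np

def infonce_loss(phi, psi):
    # Contrastive goal-conditioned objective (InfoNCE-style):
    # phi: (B, d) state-action embeddings, psi: (B, d) goal embeddings.
    # Each phi[i] should score highest against its own goal psi[i],
    # treating the other B-1 goals in the batch as negatives.
    logits = phi @ psi.T                              # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # cross-entropy on diagonal
</```

No explicit reward appears anywhere: the learning signal comes entirely from whether trajectories reach their own goals, which is what lets the depth-scaling study run without reward engineering.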
Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training
Explains how training dynamics create a window where diffusion models generalize well before memorization sets in, even in over-parameterized regimes.
Identifies two characteristic timescales for diffusion training and shows that the generalization window grows with dataset size, tying practical success to provable dynamical regularization.
Runner-Up Papers
Three additional papers recognized for outstanding contributions
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Systematic probe of RL with verifiable rewards (RLVR) shows improved sampling efficiency but no fundamentally new reasoning patterns beyond the base model.
Across models, algorithms and benchmarks, RLVR narrows exploration and amplifies rewarded trajectories without expanding the underlying reasoning frontier; distillation is found to add truly new reasoning patterns.
Optimal Mistake Bounds for Transductive Online Learning
Resolves a 30-year-old open problem on the value of unlabeled data in online learning with tight Ω(√d) / O(√d) bounds.
Shows a quadratic gap between transductive and standard online learning and clarifies when advanced access to unlabeled instances yields exponential improvements over prior lower bounds.
Superposition Yields Robust Neural Scaling
Argues that representation superposition is a key mechanism behind neural scaling laws in large models.
Uses a controlled toy model plus empirical analysis of open LLMs to show that strong superposition naturally produces inverse-dimension scaling of loss, explaining when scaling laws hold or break.
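The inverse-dimension intuition can be demonstrated with a toy computation in the spirit of such controlled models (this specific setup is an illustrative assumption, not the paper's experiment): embed many features as random unit directions in a lower-dimensional space and measure the interference between them, which shrinks roughly like 1/d as the embedding dimension grows.

```python
import numpy as np

def superposition_interference(n_features, d_model, seed=0):
    # Toy superposition model: n_features random unit directions packed
    # into a d_model-dimensional space (n_features > d_model). The
    # off-diagonal overlaps of the Gram matrix measure crosstalk between
    # features; for random directions their mean square is roughly 1/d_model,
    # giving an inverse-dimension scaling of the interference "loss".
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_features, d_model))
    W /= np.linalg.norm(W, axis=1, keepdims=True)   # unit feature directions
    gram = W @ W.T
    off_diag = gram - np.eye(n_features)
    return np.mean(off_diag ** 2)                    # mean squared interference
```

Doubling the embedding dimension roughly halves the interference, mirroring the smooth power-law behaviour that the paper attributes to strong superposition.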