456 Episodes

  1. AI-Powered Bayesian Inference

    Published: 5/10/2025
  2. Can Unconfident LLM Annotations Be Used for Confident Conclusions?

    Published: 5/9/2025
  3. Predictions as Surrogates: Revisiting Surrogate Outcomes in the Age of AI

    Published: 5/9/2025
  4. Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

    Published: 5/9/2025
  5. How to Evaluate Reward Models for RLHF

    Published: 5/9/2025
  6. LLMs as Judges: Survey of Evaluation Methods

    Published: 5/9/2025
  7. The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs

    Published: 5/9/2025
  8. Limits to scalable evaluation at the frontier: LLM as Judge won’t beat twice the data

    Published: 5/9/2025
  9. Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation

    Published: 5/9/2025
  10. Accelerating Unbiased LLM Evaluation via Synthetic Feedback

    Published: 5/9/2025
  11. Prediction-Powered Statistical Inference Framework

    Published: 5/9/2025
  12. Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

    Published: 5/9/2025
  13. RM-R1: Reward Modeling as Reasoning

    Published: 5/9/2025
  14. Reexamining the Aleatoric and Epistemic Uncertainty Dichotomy

    Published: 5/8/2025
  15. Decoding Claude Code: Terminal Agent for Developers

    Published: 5/7/2025
  16. Emergent Strategic AI Equilibrium from Pre-trained Reasoning

    Published: 5/7/2025
  17. Benefiting from Proprietary Data with Siloed Training

    Published: 5/6/2025
  18. Advantage Alignment Algorithms

    Published: 5/6/2025
  19. Asymptotic Safety Guarantees Based On Scalable Oversight

    Published: 5/6/2025
  20. What Makes a Reward Model a Good Teacher? An Optimization Perspective

    Published: 5/6/2025

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.