456 Episodes

  1. Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning

    Published: 4/24/2025
  2. γ-Bench: Evaluating LLMs in Multi-Agent Games

    Published: 4/24/2025
  3. DRAFT: Self-Driven LLM Tool Mastery via Documentation Refinement

    Published: 4/24/2025
  4. Optimal Prediction Sets for Enhanced Human-AI Accuracy

    Published: 4/24/2025
  5. Self-Correction via Reinforcement Learning for Language Models

    Published: 4/24/2025
  6. Tractable Multi-Agent Reinforcement Learning through Behavioral Economics

    Published: 4/24/2025
  7. Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

    Published: 4/24/2025
  8. Iterative Nash Policy Optimization for Language Model Alignment

    Published: 4/24/2025
  9. SycEval: Benchmarking LLM Sycophancy in Mathematics and Medicine

    Published: 4/23/2025
  10. Stack AI: Democratizing Enterprise AI Development

    Published: 4/22/2025
  11. Evaluating Modern Recommender Systems: Challenges and Future Directions

    Published: 4/22/2025
  12. AI in the Enterprise: Seven Lessons from Frontier Companies by OpenAI

    Published: 4/22/2025
  13. Discussion: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

    Published: 4/21/2025
  14. AI Agent Protocols and Human Preference

    Published: 4/21/2025
  15. Cross-Environment Cooperation for Zero-Shot Multi-Agent Coordination

    Published: 4/20/2025
  16. Sutton and Silver: The Era of Experience: Learning Beyond Human Data

    Published: 4/19/2025
  17. Sample, Don't Search: Rethinking Test-Time Alignment for Language Models

    Published: 4/19/2025
  18. AI Agents: Echoes of Past Technology Pivots?

    Published: 4/19/2025
  19. Minimalist LLM Reasoning: Rejection Sampling to Reinforcement

    Published: 4/19/2025
  20. Securing the Model Context Protocol in Enterprise Environments

    Published: 4/19/2025

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.