Episodes (Page 8)
✨
NS-NAC algorithm for non-stationary environments.
✨
Offline RL for personalized policies from diverse data.
✨
SeRA mitigates spurious correlations in RLHF.
✨
AXIOM uses object-centric models and active inference.
✨
Policy entropy declines rapidly in RL for LLMs, limiting exploration.
✨
FLEX enables robot-agnostic, force-based manipulation learning.
✨
ZeroTIR trains LLMs to use Python for math via RL.
✨
RLVR may not fundamentally improve LLM reasoning beyond base models.
✨
Reward model quality, not just accuracy, impacts RLHF efficiency.
✨
Graph RL optimizes power grid control with masked actions.
✨
LGTC-IPPO uses dynamic cluster agreements for decentralized resource allocation.
✨
RL enables humanoid robots for dexterous manipulation using vision.
✨
µCODE generates code iteratively using single-step execution rewards.
✨
CRPO improves machine translation data selection for LLMs.
✨
MiCRo framework learns diverse human preferences for LLMs.
✨
ProRL enhances LLM reasoning with KL divergence control.
✨
Open CaptchaWorld benchmarks multimodal AI agents.
✨
ProxyThinker guides large models with small reasoners.
✨
DexMachina enables functional bimanual manipulation.
✨
3DMEM-BENCH advances embodied AI with a dual-memory system.