This demo showcases MedGRPO, fine-tuned on MedVidBench, for medical video question answering across 8 tasks spanning temporal reasoning, spatial grounding, captioning, and clinical assessment.
📄 Paper 🌐 Project Page 💾 Dataset 🤖 Model 💻 GitHub 📊 Leaderboard
Browse pre-computed predictions from the test set (no GPU needed).
Identify when specific surgical actions occur in the video (start–end times).
Upload a medical video (or its frames) and ask a question, or try a pre-loaded example. The model runs on ZeroGPU, so the first request may take 30–60 s while the model loads.
Try a pre-loaded example (click a card below):