MedGRPO Demo — Medical Video Understanding

This demo runs uAI-NEXUS-MedVLM-1.0b-4B-RL (base: Qwen3-VL-4B), part of the uAI-NEXUS-MedVLM 1.0 family trained with SFT + MedGRPO on MedVidBench. It answers questions about medical video across 8 tasks spanning temporal reasoning, spatial grounding, captioning, and clinical assessment. Sibling release: uAI-NEXUS-MedVLM-1.0a-7B-RL (base: Qwen2.5-VL-7B).

Browse pre-computed predictions from the test set (no GPU needed).

Select Task

⏱️ Temporal Action Localization (TAL)

Identify when specific surgical actions occur in the video (start–end times).
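To make "start–end times" concrete: one common way to compare a predicted segment against a ground-truth segment is temporal intersection-over-union (IoU). The sketch below is illustrative only. Whether this demo or MedVidBench scores TAL this way is an assumption, and the names `Segment` and `temporal_iou` are hypothetical.

```python
from typing import Tuple

# Hypothetical representation: a segment is (start_sec, end_sec).
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Return the intersection-over-union of two time intervals, in [0, 1]."""
    p_start, p_end = pred
    g_start, g_end = gt
    if p_end < p_start or g_end < g_start:
        raise ValueError("segment end must not precede its start")

    # Overlap is zero when the intervals are disjoint.
    intersection = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - intersection

    # Two zero-length segments have no meaningful overlap; report 0.0.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Made-up example values, not taken from the dataset.
    print(temporal_iou((12.0, 20.0), (10.0, 18.0)))
```

With the made-up segments above, the overlap is 6 seconds and the union is 10 seconds, so the score is 0.6. A prediction that does not overlap the ground truth at all scores 0.0.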

Choose an example to view its video frames, question, ground truth, and model prediction.