Video demo is here!
LiveGuide
LiveGuide is a real-time scene assistance system built with YOLO (OpenImages) + VLM (Qwen/Gemini) + TTS (ElevenLabs). It supports both live webcam streaming and local-video debugging, and is designed for low-latency, observable navigation guidance.
Features
● Real-time Perception: YOLOv8n-oiv7 (601 OpenImages classes) combined with MiDaS depth estimation for immediate object awareness.
● Adaptive AI Backends: Hot-switching between qwen_local and gemini_api, with a VLM triggering policy that combines interval and similarity gating to avoid redundant calls.
● Concurrency Control: Stable multi-user support in Gradio via per-session pipeline isolation.
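The interval + similarity gating mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name, parameter names, and the cosine-similarity choice are all assumptions.

```python
import math
import time

class VLMGate:
    """Hypothetical sketch of interval + similarity gating: forward a frame
    to the VLM only when enough time has passed AND the scene has changed."""

    def __init__(self, min_interval_s=2.0, sim_threshold=0.9):
        self.min_interval_s = min_interval_s  # interval gate
        self.sim_threshold = sim_threshold    # similarity gate
        self._last_time = float("-inf")
        self._last_feat = None

    @staticmethod
    def _cosine(a, b):
        # cosine similarity between two flat feature vectors
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb + 1e-8)

    def should_trigger(self, feat, now=None):
        now = time.monotonic() if now is None else now
        if now - self._last_time < self.min_interval_s:
            return False  # too soon since the last VLM call
        if self._last_feat is not None and \
                self._cosine(feat, self._last_feat) > self.sim_threshold:
            return False  # scene barely changed; skip the call
        self._last_time = now
        self._last_feat = feat
        return True
```

Both gates must pass before a frame is sent to the VLM, which keeps the expensive backend call rate bounded regardless of camera FPS.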
Architecture
Camera / Video
-> preprocess (resize/jpeg)
-> YOLO detect (+ MiDaS depth)
-> gating (interval + similarity)
-> LLM queue (thread/process worker)
-> short actionable sentence
-> TTS
-> Gradio UI + JSONL logs
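The "LLM queue (thread/process worker)" stage can be sketched as a single-slot queue that always keeps only the newest gated frame, so a slow backend never builds up a backlog. Names here (LLMWorker, describe_fn) are hypothetical, not the project's actual API.

```python
import queue
import threading

class LLMWorker:
    """Sketch of an LLM worker thread: a size-1 input queue where a new
    submission replaces any stale pending frame, bounding latency."""

    def __init__(self, describe_fn):
        self.q = queue.Queue(maxsize=1)   # holds at most the latest frame
        self.describe = describe_fn       # e.g. a VLM call returning a sentence
        self.results = queue.Queue()      # short actionable sentences for TTS
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, frame_info):
        # drop the stale pending frame, if any, then enqueue the new one
        try:
            self.q.get_nowait()
        except queue.Empty:
            pass
        self.q.put(frame_info)

    def _run(self):
        while True:
            item = self.q.get()
            if item is None:
                break  # sentinel: shut down
            self.results.put(self.describe(item))

    def stop(self):
        self.q.put(None)
```

Keeping only the latest frame means guidance always reflects the current scene rather than a queue of outdated ones.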
Quick Start
1) Install
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
pip install opencv-python
2) Configure API Keys
Review config/runtime_config.yaml and set the providers:
llm.provider: gemini_api | qwen_local
tts.provider: elevenlabs
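Written out as YAML, the two settings above might look like this. Only llm.provider and tts.provider are confirmed keys; the exact file layout is an assumption.

```yaml
# config/runtime_config.yaml (hypothetical layout)
llm:
  provider: gemini_api   # or: qwen_local
tts:
  provider: elevenlabs
```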
3) Run Web App
python gradio_app.py