Smart Assistant // AI Research Project
AMADEUS
SYSTEM
A modular, open-source multimodal agent platform — featuring real-time speech processing, camera-based visual understanding, and multi-provider LLM integration. Fully customizable: define your agent's persona and prompts, swap Live2D models, and deploy via a unified management dashboard.
DIVERGENCE METER 1.048596%
SYSTEM IN ACTION
Multimodal conversation with the Amadeus agent — voice input, camera vision, LLM reasoning, and TTS output in a unified pipeline.
SYSTEM ARCHITECTURE
Four specialized modules engineered to work in concert, from raw audio to rendered avatar.
Core Python library powering the full multimodal pipeline — real-time Voice Activity Detection, streaming ASR transcription, neural TTS synthesis, camera snapshot capture for visual context, and a unified LLM API abstraction layer with support for multiple providers.
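One way a "unified LLM API abstraction layer" like the one described above is commonly structured is a provider interface plus a registry that routes calls by name. This is only a sketch of that pattern; the class names (`LLMProvider`, `LLMClient`, `EchoProvider`) are illustrative assumptions, not the library's actual API:

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Common interface every backend (cloud API, local model) implements."""

    @abstractmethod
    def chat(self, messages: list[dict]) -> str: ...


class EchoProvider(LLMProvider):
    """Stand-in backend so the sketch runs without API keys or network."""

    def chat(self, messages: list[dict]) -> str:
        return f"echo: {messages[-1]['content']}"


class LLMClient:
    """Unified entry point: register providers by name, dispatch calls to one."""

    def __init__(self) -> None:
        self._providers: dict[str, LLMProvider] = {}

    def register(self, name: str, provider: LLMProvider) -> None:
        self._providers[name] = provider

    def chat(self, provider: str, messages: list[dict]) -> str:
        return self._providers[provider].chat(messages)


client = LLMClient()
client.register("echo", EchoProvider())
reply = client.chat("echo", [{"role": "user", "content": "hello"}])
```

Swapping providers then becomes a configuration change rather than a code change, which is what lets the rest of the pipeline stay provider-agnostic.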
Central orchestration layer for the agent. Manages conversation state and context windows, routes intents through the skill pipeline, interfaces with MCP tools, triggers scheduled tasks, and maintains persistent cross-session memory.
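Intent routing through a skill pipeline, as described above, can be sketched as a priority-ordered keyword dispatcher with an LLM fallback. All names here (`SkillRouter`, the weather skill) are hypothetical examples, not the project's actual interfaces:

```python
class SkillRouter:
    """Routes an utterance to the first skill whose trigger keyword matches."""

    def __init__(self) -> None:
        # (keyword, handler) pairs, checked in registration (priority) order
        self._skills: list[tuple[str, callable]] = []

    def register(self, keyword: str, handler) -> None:
        self._skills.append((keyword, handler))

    def route(self, utterance: str) -> str:
        for keyword, handler in self._skills:
            if keyword in utterance.lower():
                return handler(utterance)
        # No skill matched: hand the utterance to the LLM unchanged
        return "fallback: forward to LLM"


router = SkillRouter()
router.register("weather", lambda u: "calling weather skill")
answer = router.route("What's the weather today?")
```

A real orchestrator would likely classify intents with the LLM itself rather than keywords, but the dispatch shape stays the same.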
Full-featured admin dashboard for deploying and configuring agents at runtime. Manage Live2D models, persona prompts, MCP integrations, skill modules, scheduled autonomous tasks, and multi-provider API credentials.
The user-facing frontend delivering an immersive multimodal interaction experience. Live2D avatar rendering, real-time lip-sync, and a clean conversational interface connected via WebSocket to the agent backend.
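A WebSocket link like the one connecting this frontend to the agent backend typically exchanges typed JSON envelopes, so text, audio chunks, and lip-sync parameters can share one channel. A minimal sketch of such an envelope, with field names that are assumptions rather than the project's real protocol:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AgentEvent:
    """Hypothetical message envelope sent over the WebSocket connection."""

    type: str      # e.g. "text", "audio_chunk", "lipsync"
    payload: dict  # type-specific body, e.g. mouth-open value for lip-sync

    def to_json(self) -> str:
        """Serialize for transmission to the browser client."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "AgentEvent":
        """Rebuild an event from a received frame."""
        return cls(**json.loads(raw))


event = AgentEvent(type="lipsync", payload={"mouth_open": 0.8})
restored = AgentEvent.from_json(event.to_json())
```

Tagging every frame with a `type` lets the client dispatch avatar animation, audio playback, and chat rendering from a single socket.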