Resonance

Unified Speech-to-Text (STT) and Text-to-Speech (TTS) API Server.

Features

  • STT: GigaAM-v3 model for Russian speech recognition
  • TTS: Russian Silero v5 voices and English Kokoro voices
  • i18n: Interface available in English, Russian, and Chinese

Demo

Quick Start

cp .env.example .env
just build
just run

Open http://localhost:8000

GPU: set DEVICE=cuda in .env before building.

Models are loaded lazily on first real STT/TTS use. Container startup does not pre-download or pre-load model weights, so the first request to a specific backend may take noticeably longer.
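The lazy-loading behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual loader: the `LazyModel` class and the loader callable are hypothetical names introduced here, and the real backends may be organised differently.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Hypothetical wrapper: defer an expensive load until the first get()."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader          # called at most once, on first use
        self._model: Optional[T] = None
        self._loaded = False
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        """True once the loader has run; nothing is loaded at construction."""
        return self._loaded

    def get(self) -> T:
        # Double-checked locking: concurrent first requests trigger one load.
        if not self._loaded:
            with self._lock:
                if not self._loaded:
                    self._model = self._loader()
                    self._loaded = True
        return self._model  # type: ignore[return-value]
```

Under this pattern, constructing the wrapper at startup costs nothing; the first `get()` pays for the download and load, and every later call returns the cached object, which matches the slower first request noted above.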

Configuration

See .env.example for available options.

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /api/health | GET | Health check |
| /api/config | GET | Public configuration including TTS language -> voice catalog |
| /api/models | GET | List backend/model status plus TTS catalog |
| /api/jobs | GET | List current session jobs (compact DTO); query limit (default 60), offset; JSON includes has_more, next_offset |
| /api/jobs/stt | POST | Start STT job, returns job_id |
| /api/jobs/tts | POST | Start TTS job with text, language, voice_id; returns job_id |
| /api/jobs/{job_id} | GET | Get job status/result (session-scoped) |
| /api/jobs/{job_id}/events | GET | Stream job events (SSE, session-scoped) |
| /api/jobs/{job_id}/cancel | POST | Cancel active job (session-scoped) |
| /api/stream/download | GET | Download TTS audio |
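As an illustration of the pagination on GET /api/jobs, the sketch below walks every page using the limit, offset, has_more, and next_offset fields listed above. Two things are assumptions rather than documented behaviour: the response key holding the job list (called `"jobs"` here) and the `fetch_json` callable, a hypothetical stand-in for whatever HTTP client is used. A real fetcher would also need to send the session cookie, since the list is session-scoped.

```python
from typing import Any, Callable, Dict, Iterator
from urllib.parse import urlencode


def jobs_url(base_url: str, limit: int = 60, offset: int = 0) -> str:
    """Build the GET /api/jobs URL for one page."""
    query = urlencode({"limit": limit, "offset": offset})
    return f"{base_url.rstrip('/')}/api/jobs?{query}"


def iter_jobs(
    fetch_json: Callable[[str], Dict[str, Any]],
    base_url: str = "http://localhost:8000",
    limit: int = 60,
) -> Iterator[Dict[str, Any]]:
    """Yield jobs page by page until the server reports has_more == False."""
    offset = 0
    while True:
        page = fetch_json(jobs_url(base_url, limit, offset))
        # "jobs" is an assumed key name for the list of job DTOs.
        yield from page.get("jobs", [])
        if not page.get("has_more"):
            break
        offset = page["next_offset"]
```

Passing the fetcher in as a parameter keeps the paging logic independent of the HTTP library, so it can be exercised with a plain dictionary lookup in place of a running server.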

F5 Recovery Model

  • The frontend stores only active job IDs in localStorage:
    • resonance_stt_active_job_id
    • resonance_tts_active_job_id
  • The drawer's job list is loaded from GET /api/jobs (paginated with offset / has_more) and contains only jobs for the current browser session.
  • The backend assigns a resonance_session_id cookie and enforces ownership on the status, events, and cancel endpoints.
  • After a page reload, the UI restores state via GET /api/jobs/{job_id} and resumes progress updates via /api/jobs/{job_id}/events.
  • Job data is held in memory on the server (JobRegistry). After a server restart, a stored job_id is no longer known to the server, so the client clears it and the UI resets to a neutral state.
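The recovery rules above can be modelled with a small sketch. It is not the project's `JobRegistry`: `InMemoryJobRegistry`, `Job`, and `restore_active_jobs` are hypothetical names, and the real client-side logic runs in the browser rather than in Python. The sketch only shows the two rules that matter: lookups are scoped to the owning session, and stored IDs the server no longer knows are dropped.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    job_id: str
    session_id: str   # value of the resonance_session_id cookie
    kind: str         # "stt" or "tts"
    status: str = "queued"


class InMemoryJobRegistry:
    """Hypothetical registry: jobs live in a dict and vanish on restart."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def create(self, session_id: str, kind: str) -> Job:
        job = Job(job_id=uuid.uuid4().hex, session_id=session_id, kind=kind)
        self._jobs[job.job_id] = job
        return job

    def get(self, session_id: str, job_id: str) -> Optional[Job]:
        # Ownership check: unknown and foreign jobs look identical to callers.
        job = self._jobs.get(job_id)
        if job is None or job.session_id != session_id:
            return None
        return job


def restore_active_jobs(
    stored: Dict[str, str], registry: InMemoryJobRegistry, session_id: str
) -> Dict[str, str]:
    """Keep only stored job IDs that the server still recognises."""
    return {
        key: job_id
        for key, job_id in stored.items()
        if registry.get(session_id, job_id) is not None
    }
```

For example, a `stored` mapping of `{"resonance_tts_active_job_id": "<id>"}` survives a page reload while the registry still holds the job, and comes back empty once a fresh registry (a restarted server) is consulted.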