
HTML 56.3%
Python 33.3%
JavaScript 8.7%
Just 0.7%
Dockerfile 0.7%
Other 0.3%


y9938 178bcd35ad build(env): separate pytorch defaults for local dev and docker containers		2026-05-06 09:09:49 +03:00
docs	feat(stt): add long-form jobs and modularize speech pipelines	2026-04-20 05:10:29 +03:00
public	feat(stt): add long-form jobs and modularize speech pipelines	2026-04-20 05:10:29 +03:00
stt	feat(stt): add long-form jobs and modularize speech pipelines	2026-04-20 05:10:29 +03:00
tests	feat(stt): add long-form jobs and modularize speech pipelines	2026-04-20 05:10:29 +03:00
tts	feat(stt): add long-form jobs and modularize speech pipelines	2026-04-20 05:10:29 +03:00
.dockerignore	feat(stt): add long-form jobs and modularize speech pipelines	2026-04-20 05:10:29 +03:00
.env.example	build(env): separate pytorch defaults for local dev and docker containers	2026-05-06 09:09:49 +03:00
.gitignore	feat(stt): add long-form jobs and modularize speech pipelines	2026-04-20 05:10:29 +03:00
demo-gif.sh	Initial commit	2026-02-27 20:30:34 +03:00
Dockerfile	build(env): separate pytorch defaults for local dev and docker containers	2026-05-06 09:09:49 +03:00
justfile	build(just): allow passing extra docker build arguments	2026-05-05 21:35:48 +03:00
LICENSE	Initial commit	2026-02-27 20:30:34 +03:00
pyproject.toml	feat(tts): add kokoro english backend, bilingual voice catalog and lazy model loading	2026-04-16 22:38:08 +03:00
README.md	feat(tts): add kokoro english backend, bilingual voice catalog and lazy model loading	2026-04-16 22:38:08 +03:00
server.py	feat(stt): add long-form jobs and modularize speech pipelines	2026-04-20 05:10:29 +03:00

README.md

Resonance

Unified Speech-to-Text (STT) and Text-to-Speech (TTS) API Server.

Features

STT: GigaAM-v3 model for Russian speech recognition
TTS: Russian Silero v5 voices and English Kokoro voices
i18n: Interface available in English, Russian, and Chinese

Demo

Quick Start

cp .env.example .env
just build
just run

Open http://localhost:8000

GPU: set DEVICE=cuda in .env before building.

Models are loaded lazily on the first actual STT/TTS request. Container startup neither pre-downloads nor pre-loads model weights, so the first request to a given backend may take noticeably longer.
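
The lazy-loading behavior can be pictured with a small sketch. This is an illustration only, not the project's actual code: `LazyModelCache`, its method names, and the loader callables are hypothetical.

```python
import threading
from typing import Any, Callable, Dict


class LazyModelCache:
    """Hypothetical helper: load each backend's model on first use, then reuse it."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]) -> None:
        self._loaders = loaders            # backend name -> zero-argument loader
        self._models: Dict[str, Any] = {}  # models that have already been loaded
        self._lock = threading.Lock()      # serialize loads across request threads

    def get(self, backend: str) -> Any:
        with self._lock:
            if backend not in self._models:
                # The first request for this backend pays the download/load cost.
                self._models[backend] = self._loaders[backend]()
            return self._models[backend]

    def is_loaded(self, backend: str) -> bool:
        with self._lock:
            return backend in self._models
```

With this shape, nothing is loaded at startup, the first `get("stt")` call is slow, and later calls return the cached object. A single lock keeps the sketch simple at the cost of blocking other backends while one model loads.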

Configuration

See .env.example for available options.

API Endpoints

Endpoint	Method	Description
`/api/health`	GET	Health check
`/api/config`	GET	Public configuration including TTS `language -> voice` catalog
`/api/models`	GET	List backend/model status plus TTS catalog
`/api/jobs`	GET	List current session jobs (compact DTO); query `limit` (default 60), `offset`; JSON includes `has_more`, `next_offset`
`/api/jobs/stt`	POST	Start STT job, returns `job_id`
`/api/jobs/tts`	POST	Start TTS job with `text`, `language`, `voice_id`; returns `job_id`
`/api/jobs/{job_id}`	GET	Get job status/result (session-scoped)
`/api/jobs/{job_id}/events`	GET	Stream job events (SSE, session-scoped)
`/api/jobs/{job_id}/cancel`	POST	Cancel active job (session-scoped)
`/api/stream/download`	GET	Download TTS audio
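
As a rough client-side sketch, the `limit` / `offset` / `has_more` / `next_offset` fields of `GET /api/jobs` could be consumed as shown below. The `jobs` key for the list of items is an assumption (the response shape is not documented here), and `iter_jobs` and `make_http_fetcher` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_jobs(fetch_page: Callable[[int, int], Page], limit: int = 60) -> Iterator[Dict[str, Any]]:
    """Yield every job by following has_more / next_offset until the last page."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        # Assumption: the job list lives under a "jobs" key.
        yield from page.get("jobs", [])
        if not page.get("has_more"):
            break
        offset = page["next_offset"]


def make_http_fetcher(opener: urllib.request.OpenerDirector,
                      base_url: str = "http://localhost:8000") -> Callable[[int, int], Page]:
    """Build a fetch_page callable that queries GET /api/jobs through the given opener."""
    def fetch_page(limit: int, offset: int) -> Page:
        query = urllib.parse.urlencode({"limit": limit, "offset": offset})
        with opener.open(f"{base_url}/api/jobs?{query}") as resp:
            return json.load(resp)
    return fetch_page
```

Because the job list is session-scoped, the opener would need to carry the session cookie, for example one built with `urllib.request.HTTPCookieProcessor` and the same cookie jar that was used when the jobs were created.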

F5 Recovery Model

The frontend stores only active job IDs in localStorage:
- resonance_stt_active_job_id
- resonance_tts_active_job_id
The drawer's job list is loaded from GET /api/jobs (paginated with offset / has_more) and contains only jobs for the current browser session.
The backend assigns a resonance_session_id cookie and enforces ownership on the status, events, and cancel endpoints.
After a page reload, the UI restores state via GET /api/jobs/{job_id} and resumes progress updates via /api/jobs/{job_id}/events.
Job data is held in memory on the server (JobRegistry), so after a server restart the client clears any job_id the server no longer recognizes and the UI resets to a neutral state.
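
The recovery rules above can be summarized in a minimal sketch. It is not the real `JobRegistry`: `InMemoryJobRegistry`, `Job`, and `recover` are hypothetical names, and the actual implementation may differ in every detail.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    job_id: str
    session_id: str          # value of the session cookie that created the job
    status: str = "queued"


class InMemoryJobRegistry:
    """Hypothetical stand-in for a session-scoped, in-memory job store."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}   # lost whenever the process restarts

    def create(self, session_id: str) -> Job:
        job = Job(job_id=uuid.uuid4().hex, session_id=session_id)
        self._jobs[job.job_id] = job
        return job

    def get_for_session(self, job_id: str, session_id: str) -> Optional[Job]:
        job = self._jobs.get(job_id)
        # Unknown jobs and jobs owned by another session look the same to the caller.
        if job is None or job.session_id != session_id:
            return None
        return job


def recover(registry: InMemoryJobRegistry, stored_job_id: Optional[str], session_id: str) -> str:
    """Decide what a reloaded page should do: 'resume' the job or 'reset' the UI."""
    if stored_job_id is None:
        return "reset"
    job = registry.get_for_session(stored_job_id, session_id)
    return "resume" if job is not None else "reset"
```

A fresh registry models a server restart: the stored job ID is no longer known, so the client clears it and returns to the neutral state.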