Pick any AI model — Chinese, Japanese, Whisper — behind one clean interface. If a model goes down or runs out of quota, we auto-switch. No shared API key, no lock-in.
A public, free voice-to-text app for web + mobile, powered by multiple open-source AI models — not just one. Users switch models freely, and the app keeps working even when an individual model fails or hits its quota. Next: React web + Flutter mobile on a Python backend.
Speak in-app or upload audio. We normalize to 16 kHz mono, chunk long files, and transcribe.
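As a rough sketch of the chunking step, the snippet below computes chunk boundaries over 16 kHz mono samples. The 30-second chunk length, the 1-second overlap, and the name `chunk_bounds` are assumptions for illustration, not settings taken from the app.

```python
SAMPLE_RATE = 16_000  # 16 kHz mono, as stated above


def chunk_bounds(total_samples, chunk_s=30.0, overlap_s=1.0, sample_rate=SAMPLE_RATE):
    """Return (start, end) sample indices that cover the whole clip.

    chunk_s and overlap_s are assumed values, not settings from the app.
    """
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    size = int(chunk_s * sample_rate)
    step = size - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("chunk is too short for this sample rate and overlap")
    bounds = []
    start = 0
    while start < total_samples:
        end = min(start + size, total_samples)
        bounds.append((start, end))
        if end == total_samples:
            break  # the last chunk reached the end of the clip
        start += step
    return bounds


# A 70-second clip becomes three overlapping chunks.
print(chunk_bounds(70 * SAMPLE_RATE))
# [(0, 480000), (464000, 944000), (928000, 1120000)]
```

The small overlap is one way to avoid cutting a word in half at a chunk boundary; the transcripts of neighbouring chunks would then need to be merged.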
A clean picker of models with languages, speed and "offline?" tags. Choose what fits your language.
If your model is down or over quota, the orchestrator falls back to the next best one — transparently.
These are two different things people constantly confuse. Keep them separate; they compose beautifully.
A Model Picker + Engine Registry. The user chooses a Chinese model, a Japanese model, or any model that supports the language they need. ✓ Easy once every engine sits behind one interface.
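One way to picture "one interface" is a small protocol plus a registry keyed by engine name. This is a sketch under assumptions: `Engine`, `StubEngine`, `EngineRegistry`, and the engine names are hypothetical, and the stub returns a placeholder string instead of calling a real model.

```python
from dataclasses import dataclass
from typing import Protocol


class Engine(Protocol):
    """The single interface every engine is assumed to implement."""

    name: str
    languages: frozenset[str]

    def transcribe(self, audio: bytes, language: str) -> str: ...


@dataclass(frozen=True)
class StubEngine:
    """Placeholder engine; a real one would wrap an actual model."""

    name: str
    languages: frozenset[str]

    def transcribe(self, audio: bytes, language: str) -> str:
        return f"[{self.name}:{language}] {len(audio)} bytes"


class EngineRegistry:
    def __init__(self) -> None:
        self._engines: dict[str, Engine] = {}

    def register(self, engine: Engine) -> None:
        self._engines[engine.name] = engine

    def get(self, name: str) -> Engine:
        return self._engines[name]

    def for_language(self, language: str) -> list[str]:
        """Names of engines that declare support for the language code."""
        return [n for n, e in self._engines.items() if language in e.languages]


registry = EngineRegistry()
registry.register(StubEngine("cn-model", frozenset({"zh"})))
registry.register(StubEngine("jp-model", frozenset({"ja"})))
registry.register(StubEngine("multi-model", frozenset({"zh", "ja", "en"})))

print(registry.for_language("ja"))  # ['jp-model', 'multi-model']
```

The picker only needs `for_language` to build its list, and the rest of the backend only needs `get(name).transcribe(...)`, so adding an engine does not touch either caller.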
"One model stops / its daily quota ends → auto-switch." An AI Gateway / Router with health checks, circuit breakers, and fallback chains. ✓ Exactly what LiteLLM & Portkey (open-source) do.
What actually goes wrong for the person using the app — and what each problem really means.
Definition: a free model allows only N requests/day. Problem: at request N+1 everyone is blocked and the transcript just fails.
Definition: a provider has an outage. Problem: the whole app feels broken though only one model failed.
Definition: a model covers only some languages. Problem: a Cantonese or Japanese user gets garbage from an English-only model.
Definition: big models on weak hardware. Problem: users wait 30s for a 10s clip and give up.
Definition: cloud models send audio off-device. Problem: sensitive voice notes leave the device — a privacy issue.
Definition: free credits run out. Problem: a free app starts demanding a card, breaking trust.
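For the outage problem above, a circuit breaker is one common way to stop sending traffic to a failing model so that only that model, not the whole app, is affected. The sketch below is an assumption-laden illustration: the class name, the threshold of 3 failures, and the 60-second cooldown are hypothetical values.

```python
import time


class CircuitBreaker:
    """Skip an engine after repeated failures, then retry after a cooldown."""

    def __init__(self, max_failures=3, cooldown_s=60.0, clock=time.monotonic):
        self.max_failures = max_failures  # assumed threshold
        self.cooldown_s = cooldown_s      # assumed cooldown
        self._clock = clock               # injectable so tests can fake time
        self._failures = 0
        self._opened_at = None            # None means the circuit is closed

    def allow(self) -> bool:
        """True if a request may be sent to the engine right now."""
        if self._opened_at is None:
            return True
        # Half-open: once the cooldown has passed, let a trial request through.
        return self._clock() - self._opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.max_failures:
            self._opened_at = self._clock()  # open (or re-open) the circuit
```

An orchestrator would call `allow()` before each request and skip to the next engine in the chain when it returns `False`.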
Four viable architectures trading off cost, speed, privacy, and complexity. Each shows its limitations, best use case, and a monthly cost band. Vote for your favourite below ↓
Orchestrate free cloud APIs (Groq, HF, Cloudflare). Heavy users bring their own key.
faster-whisper INT8 / Vosk / SenseVoice on a CPU VPS. No keys, no quotas.
Self-hosted default; bursts to cloud / serverless GPU for spikes. BYOK supported.
Dedicated/serverless GPU running Whisper large-v3 / Qwen3-ASR at real-time.
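Orchestrating free cloud APIs, as in the first option above, means knowing how much of each provider's daily allowance is left before a request fails. A minimal sketch, assuming a simple per-day request count; `DailyQuota` and its limit are hypothetical and not any provider's actual quota rules.

```python
from datetime import date


class DailyQuota:
    """Count requests against a per-day limit that resets when the date changes."""

    def __init__(self, limit, today=date.today):
        self.limit = limit
        self._today = today   # injectable so tests can fake the date
        self._day = today()
        self._used = 0

    def _roll_over(self) -> None:
        current = self._today()
        if current != self._day:
            self._day, self._used = current, 0  # new day, fresh allowance

    def remaining(self) -> int:
        self._roll_over()
        return self.limit - self._used

    def try_consume(self) -> bool:
        """Reserve one request; False means the caller should switch engines."""
        self._roll_over()
        if self._used >= self.limit:
            return False
        self._used += 1
        return True
```

Checking `try_consume()` before calling a provider lets the router move to the next engine at request N+1 instead of surfacing a failed transcript.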
Scores are indicative (1–10 where noted) to aid reasoning — not hard benchmarks. The user-likes chart updates live as you vote.
[Charts: cost in USD per month (typical); relative score, 1–10, across speed, accuracy, privacy, scale, ease, and cost; user likes (live, vote below); overall score out of 100.]
Mockups of the planned mobile & desktop experience — 36 screens each, the full journey end-to-end. Slide to explore. (Concept designs; the real apps come next on a Python backend.)
What the app does, what it means, how it's built, and when it lands.
| Feature | What it means | How we achieve it | Stage |
|---|---|---|---|
| Multi-language | Transcribe 99+ languages, CN/JP first-class | Whisper large-v3 default + SenseVoice (CN) / Kotoba (JP) routing | MVP |
| Switch models | User chooses any model, any time | Model registry behind one engine interface (Concept A) | MVP |
| Auto-failover | Keeps working when a model dies / hits quota | Orchestrator: health checks + fallback chains (LiteLLM/Portkey) | v1 |
| No shared key | Public-safe, no quota burn or ToS ban | Self-host models + per-user limits + optional BYOK | MVP |
| Offline / on-device | Works with no network; audio never leaves device | Vosk / Moonshine on mobile (Flutter/RN) | Later |
| Privacy | Sensitive audio stays on your infra | Self-hosted inference; consent + retention controls | v1 |
| Export | Save transcripts in standard formats | .txt / .srt / .vtt with word-level timestamps | v1 |
| Web + Mobile | One backend, two clients | React web + Flutter mobile + Python backend | Later |
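To illustrate the Export row, here is a small sketch that renders timed segments as SubRip (.srt) text. The `(start, end, text)` tuple shape and the function names are assumptions; the app's real transcript structure is not specified here.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start_s, end_s, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(cues)  # a blank line separates consecutive cues


print(to_srt([(0.0, 1.25, "Hello"), (1.25, 3.0, "world")]))
```

WebVTT output would be similar, with a `WEBVTT` header line and a dot instead of a comma before the milliseconds.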
Tap to vote for the solution that's cheaper, faster, or best fits your needs. The user-likes chart above updates live.
In demo mode, submissions are stored locally. Once the waitingList-service (Supabase) endpoint is configured, they are posted to the backend instead.
Tell us which solution you'd pay for and what result you need. We'll email you when the Voice2Text app is live.
No spam. One confirmation email. Leave any time.