The vision

What we're building

A public, free voice-to-text app for web and mobile, powered by multiple open-source AI models rather than just one. Users switch models freely, and the app keeps working even when an individual model fails or hits its quota. Planned stack: React web + Flutter mobile on a Python backend.

🎙️

Record or upload

Speak in-app or upload audio. We normalize to 16 kHz mono, chunk long files, and transcribe.

🔀

Pick any model

A clean picker of models with languages, speed and "offline?" tags. Choose what fits your language.

🛟

Never breaks

If your model is down or over quota, the orchestrator falls back to the next best one — transparently.
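
The "Record or upload" step (normalize to 16 kHz mono, then chunk long files) could look roughly like the sketch below. It assumes ffmpeg does the resampling, and the 30-second chunk length and 1-second overlap are illustrative values, not settings taken from this page. The function names are hypothetical.

```python
from typing import List, Tuple

SAMPLE_RATE = 16_000  # 16 kHz mono, the target format named above


def ffmpeg_normalize_cmd(src: str, dst: str) -> List[str]:
    """Build (but do not run) an ffmpeg command that resamples to 16 kHz mono."""
    return ["ffmpeg", "-y", "-i", src, "-ar", str(SAMPLE_RATE), "-ac", "1", dst]


def chunk_bounds(
    total_samples: int,
    chunk_s: float = 30.0,    # assumed chunk length
    overlap_s: float = 1.0,   # assumed overlap so words at a boundary are not cut
) -> List[Tuple[int, int]]:
    """Return (start, end) sample offsets that cover the audio with overlapping chunks."""
    chunk = int(chunk_s * SAMPLE_RATE)
    step = chunk - int(overlap_s * SAMPLE_RATE)
    if chunk <= 0 or step <= 0:
        raise ValueError("chunk_s must be positive and larger than overlap_s")

    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < total_samples:
        end = min(start + chunk, total_samples)
        bounds.append((start, end))
        if end == total_samples:
            break  # the last chunk reached the end of the audio
        start += step
    return bounds
```

Each (start, end) pair can then be sliced out of the normalized audio and sent to the selected engine; stitching the overlapping transcripts back together is left out of this sketch.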

Definitions — get these right first

Two concepts, correctly separated

These are two different things people constantly confuse. Keep them separate; they compose beautifully.

Concept A

User picks the model

A Model Picker + Engine Registry. The user chooses a Chinese model, a Japanese model, or any model that supports the language they need. ✓ Easy once every engine sits behind one interface.

Concept B

AI Orchestration

"One model stops / its daily quota ends → auto-switch." An AI Gateway / Router with health checks, circuit breakers, and fallback chains. ✓ The same fallback pattern that open-source gateways such as LiteLLM and Portkey implement.

🧩 They compose: the user sets a preference (A); the orchestrator honors it but auto-falls-back (B) if that model is unavailable — e.g. "switched to Whisper because SenseVoice was over quota."
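
A minimal sketch of how the two concepts might compose in the Python backend: a registry of engines behind one interface (Concept A) and a fallback loop that honors the user's preference first (Concept B). The class names, the `EngineUnavailable` exception, and the notice wording are all hypothetical. Health checks and circuit breakers are omitted to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


class EngineUnavailable(Exception):
    """Raised by an engine that is down or over quota (hypothetical convention)."""


@dataclass
class Engine:
    name: str
    languages: Set[str]
    transcribe: Callable[[bytes, str], str]  # (audio, language) -> text


@dataclass
class Orchestrator:
    registry: Dict[str, Engine] = field(default_factory=dict)
    fallback_order: List[str] = field(default_factory=list)

    def register(self, engine: Engine) -> None:
        """Concept A: every engine sits behind the same interface."""
        self.registry[engine.name] = engine
        self.fallback_order.append(engine.name)

    def transcribe(
        self, audio: bytes, language: str, preferred: str
    ) -> Tuple[str, str, Optional[str]]:
        """Concept B: try the preferred engine, then fall back in registration order.

        Returns (text, engine_used, notice); notice is None when no fallback happened.
        """
        candidates = [preferred] + [n for n in self.fallback_order if n != preferred]
        reason: Optional[str] = None
        for name in candidates:
            engine = self.registry.get(name)
            if engine is None or language not in engine.languages:
                continue  # unknown engine, or it does not cover this language
            try:
                text = engine.transcribe(audio, language)
            except EngineUnavailable as exc:
                if name == preferred:
                    reason = str(exc)  # remember why the user's choice failed
                continue
            notice = None
            if name != preferred:
                notice = f"switched to {name} because {preferred} was {reason or 'unavailable'}"
            return text, name, notice
        raise EngineUnavailable(f"no available engine for language {language!r}")
```

Returning the notice alongside the transcript lets the client show the user which model actually ran, which keeps the fallback transparent.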

Why this is hard — in plain language

Real-world problems for end users

What actually goes wrong for the person using the app — and what each problem really means.

🚫

"Quota exceeded"

Definition: a free model allows only N requests/day. Problem: at request N+1 everyone is blocked and the transcript just fails.

💥

"Model is down"

Definition: a provider has an outage. Problem: the whole app feels broken though only one model failed.

🌐

"My language isn't supported"

Definition: a model covers only some languages. Problem: a Cantonese or Japanese user gets garbage from an English-only model.

🐢

"It's so slow"

Definition: big models on weak hardware. Problem: users wait 30s for a 10s clip and give up.

🔓

"Where did my audio go?"

Definition: cloud models send audio off-device. Problem: sensitive voice notes leave the device — a privacy issue.

💸

"Why is it suddenly paid?"

Definition: free credits run out. Problem: a free app starts demanding a card, breaking trust.

The single most important risk

The shared-API-key trap

⚠️ One API key in `.env` for a public app is a trap.

A single shared key fails in four ways: one free quota burns out for everyone within minutes; a single abuser can take the whole app down; the key will leak if it ever touches client code; and most free tiers' Terms of Service forbid proxying one account to many users, which risks an account ban.

The clean fix for a "fully open-source + public" app — combine these:

Layer	What	Why it solves your problem
⭐ Self-host the open-source models (Whisper, SenseVoice, Vosk, Qwen3-ASR…)	Run them on your server / GPU	No API key and no per-day quota at all: you trade per-request quota for compute. Fits "fully open-source, multiple models, public."
Per-user rate limit (Redis)	Cap audio-minutes per user / IP / day	Protects your compute from abuse even with no keys.
Orchestrator	Honor the user's model choice + auto-failover	Delivers Concept A + Concept B together.
Optional: BYOK	Users paste their own Groq/HF key (encrypted)	Their quota, not yours — so there's no shared key, ever.

✅ So: self-host = no shared key + no per-day quota (throughput is bounded only by your own compute) + per-user caps, with BYOK as an opt-in for cloud models. Never ship one shared key.
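
The per-user cap in the table above can be sketched as a daily audio-seconds budget. To stay self-contained, this example keeps counters in an in-memory dict. A production version would presumably store the same counters in Redis, for instance with `INCRBYFLOAT` plus `EXPIRE`, which is an assumption about the eventual design. The class name and the 600-second limit in the usage are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

SECONDS_PER_DAY = 86_400


class DailyAudioQuota:
    """Fixed-window daily cap on audio seconds per user or IP (in-memory sketch)."""

    def __init__(self, max_seconds_per_day: float, clock: Callable[[], float] = time.time):
        self.max_seconds = max_seconds_per_day
        self.clock = clock  # injectable so the window can be tested
        # Old days are never purged here; Redis key expiry would handle that.
        self._used: Dict[Tuple[str, int], float] = {}

    def _key(self, user_id: str) -> Tuple[str, int]:
        day = int(self.clock() // SECONDS_PER_DAY)  # UTC day number
        return (user_id, day)

    def remaining(self, user_id: str) -> float:
        return self.max_seconds - self._used.get(self._key(user_id), 0.0)

    def try_consume(self, user_id: str, audio_seconds: float) -> bool:
        """Reserve audio_seconds for today; return False if it would exceed the cap."""
        key = self._key(user_id)
        used = self._used.get(key, 0.0)
        if used + audio_seconds > self.max_seconds:
            return False
        self._used[key] = used + audio_seconds
        return True
```

For example, `DailyAudioQuota(600).try_consume("user-42", 90.0)` reserves 90 seconds of a 10-minute daily budget and returns `False` once a request would push the user past the cap.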

How many solutions really exist?

4 feasible solutions — pick what fits

Four viable architectures trading off cost, speed, privacy, and complexity. Each shows its limitations, best use case, and a monthly cost band. Vote for your favourite below ↓

Cheapest

① Free-Cloud Orchestrator

$3 /mo

$0–$5 · free tiers + BYOK

Orchestrate free cloud APIs (Groq, HF, Cloudflare). Heavy users bring their own key.

✓ Almost free; zero infra
✓ Very fast (Groq LPU)
✗ Depends on 3rd-party quotas
✗ Audio leaves device

Best for: launching the MVP fast on a tiny budget.

Budget · Private

② CPU Self-Host

$12 /mo

$5–$20 · one small VPS

faster-whisper INT8 / Vosk / SenseVoice on a CPU VPS. No keys, no quotas.

✓ Fully open-source; no shared key
✓ Audio stays private
✓ Predictable flat cost
✗ Slower (CPU)

Best for: privacy-first, moderate traffic.

BEST FIT

Intermediate

③ Hybrid Smart-Router

$25 /mo

$15–$40 · self-host + cloud burst

Self-hosted default; bursts to cloud / serverless GPU for spikes. BYOK supported.

✓ Best balance of cost / speed
✓ Graceful scaling, strong uptime
✓ No shared key; private default
✗ More complexity

Best for: the recommended production path.

Fastest · Scale

④ GPU Powerhouse

$120 /mo

$50–$300 · or serverless

Dedicated/serverless GPU running Whisper large-v3 / Qwen3-ASR in real time.

✓ Fastest + most accurate
✓ Real-time streaming, high concurrency
✗ Highest cost and ops burden
✗ Idle GPU cost

Best for: scale & real-time apps.

Comparative analysis

Compare: pricing, performance & ranking

Scores are indicative (1–10 where noted) to aid reasoning — not hard benchmarks. The user-likes chart updates live as you vote.

[Chart: Monthly cost, USD per month (typical)]

[Chart: Performance & accuracy, relative score 1–10]

[Chart: Multi-factor ranking across speed, accuracy, privacy, scale, ease, and cost]

[Chart: User likes, updated live from the vote below]

[Chart: Overall MVP-fit score, out of 100]

At a glance

Features & how we achieve them

What the app does, what it means, how it's built, and when it lands.

Feature	What it means	How we achieve it	Stage
Multi-language	Transcribe 99+ languages, CN/JP first-class	Whisper large-v3 default + SenseVoice (CN) / Kotoba (JP) routing	MVP
Switch models	User chooses any model, any time	Model registry behind one engine interface (Concept A)	MVP
Auto-failover	Keeps working when a model dies / hits quota	Orchestrator: health checks + fallback chains (LiteLLM/Portkey)	v1
No shared key	Public-safe, no quota burn or ToS ban	Self-host models + per-user limits + optional BYOK	MVP
Offline / on-device	Works with no network; audio never leaves device	Vosk / Moonshine on mobile (Flutter/RN)	Later
Privacy	Sensitive audio stays on your infra	Self-hosted inference; consent + retention controls	v1
Export	Save transcripts in standard formats	.txt / .srt / .vtt with word-level timestamps	v1
Web + Mobile	One backend, two clients	React web + Flutter mobile + Python backend	Later
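
As an illustration of the Export row, the sketch below turns transcript segments into SubRip (.srt) text. It works at segment level rather than word level, and the `(start, end, text)` tuple shape is an assumed intermediate format, not one defined on this page.

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text), assumed shape


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp style used by .srt files."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render numbered subtitle blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```

A .vtt exporter would be nearly identical: WebVTT starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.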

Help us decide

Which solution should we build first?

Tap to vote for the solution that's cheaper, faster, or best fits your needs. The user-likes chart above updates live.

① Free-Cloud

② CPU Self-Host

③ Hybrid ⭐

④ GPU

What users are saying

Leave a rating & comment

Your name (optional)

Which solution fits you best?

Your rating: 5/5

★★★★★

Comment — what's better / cheaper / what you need *

In demo mode, submissions are stored locally. Once the waitingList-service (Supabase) endpoint is configured, they are posted to the backend instead.

🎙️ Voice2Text · be first in line

Be first

Join the waitlist

Tell us which solution you'd pay for and what result you need. We'll email you when the Voice2Text app is live.

Email *

Preferred solution

Main language you need

What result matters most?

No spam. One confirmation email. Leave any time.

Speech to text,
in any language.
