AudioWeb

AudioWeb runs the full Demucs AI stem-separation pipeline client-side: your audio is never uploaded for processing, no server runs inference, and no GPU is rented. You paste a YouTube link, the backend downloads the audio and sends the raw file to your browser, and from that point on everything (inference, playback, encoding, storage) happens locally on your machine.

Why fully client-side matters

Most stem-separation services stream your audio to a GPU server, charge per minute, and store your files on their infrastructure. AudioWeb eliminates all of that: the model runs in your browser tab, stems play immediately after inference, and files live only in your IndexedDB until you choose to sync them to the cloud library.

AI & Engineering Highlights

In-browser AI inference — all client-side — Demucs htdemucs_6s (6-stem: vocals, drums, bass, guitar, piano, other) runs entirely inside your browser via onnxruntime-web. The ~300MB ONNX model is downloaded once and cached; every subsequent separation runs offline with zero network calls. WebGPU acceleration is attempted first; WASM SIMD threads (ort-wasm-simd-threaded.jsep.wasm) serve as a universal fallback.
AI ported from Python to the browser — Demucs is a Python/PyTorch research project. Getting it into a browser required exporting to ONNX, fixing tensor shape mismatches across model versions, and wiring up onnxruntime-web's thread pool correctly — none of which is documented for this model.
In-browser MP3 encoding — client-side — After inference, six stereo WAV stems at 44.1 kHz/32-bit float total roughly 510 MB for a 4-minute song (44,100 samples/s × 2 channels × 4 bytes × 240 s × 6 stems). They are compressed to MP3 at 192 kbps entirely in the browser using lamejs, which cuts the sync payload by more than 90% (to ~30–40 MB). The encoder runs in a yielding loop (setTimeout 0 every 50 frames) so the main thread stays responsive without the complexity of a Web Worker.
Local-first storage — Stems are written to IndexedDB immediately after inference and are playable with zero latency — no waiting for upload.
Optional cloud sync with chunked upload protocol — A background IndexedDB queue uploads stems to Railway in a three-phase handshake (init → per-stem × 6 → finalize). Each per-stem request is 5–10 MB and goes directly to Railway, which sidesteps Vercel's 4.5 MB proxy limit; splitting the upload per stem also keeps each request under the Railway ingress limit that blocked the original single-multipart approach.
Resilient sync with exponential backoff — The queue persists across browser restarts and retries with exponential backoff (capped at 60 s, max 10 attempts). Upload is fire-and-forget; stems are playable the instant inference finishes.
Privacy-first cross-device library — No accounts or passwords. A library key (xxxx-xxxx-xxxx) lives in localStorage and is sent as an X-Library-Key header. The backend hashes it with SHA-256 before persisting, so the raw key is never stored on the server, while identical keys always resolve to the same data scope.
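
The yielding loop mentioned under MP3 encoding can be illustrated with a minimal sketch. It is not AudioWeb's actual encoder: the function name `encodeInYieldingLoop` and the `encodeFrame` callback are hypothetical, and the real pipeline would presumably wrap a lamejs encoder call inside that callback. The sketch only shows the scheduling idea: do 50 frames of work, then hand control back to the event loop with a zero-delay timer so the page can repaint and handle input.

```typescript
// Hypothetical helper (not from the AudioWeb source): runs `encodeFrame` for
// every frame index and yields to the event loop every `yieldEvery` frames.
async function encodeInYieldingLoop(
  totalFrames: number,
  encodeFrame: (frameIndex: number) => void,
  yieldEvery: number = 50,
): Promise<number> {
  let yields = 0;
  for (let i = 0; i < totalFrames; i++) {
    // In a real encoder this is where one block of PCM samples would be
    // handed to the MP3 encoder and the resulting bytes collected.
    encodeFrame(i);

    const finishedBatch = (i + 1) % yieldEvery === 0;
    const moreWorkLeft = i + 1 < totalFrames;
    if (finishedBatch && moreWorkLeft) {
      // setTimeout(…, 0) queues the continuation as a new task, letting
      // rendering and input events run in between batches.
      await new Promise<void>((resolve) => setTimeout(resolve, 0));
      yields++;
    }
  }
  // Returned only so callers (and tests) can see how often the loop yielded.
  return yields;
}
```

The trade-off against a Web Worker is that encoding still competes with the UI for the main thread, so total encode time is longer, but no message passing or buffer transfer is needed.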
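
The sync queue's request order and retry schedule can be sketched in the same spirit. The three phases, the six stem names, the 60 s cap, and the 10-attempt limit come from the description above; the 1 s base delay, the doubling factor, and the names `planUpload` and `backoffDelayMs` are assumptions made for illustration, since the actual values and code are not given.

```typescript
// One step of the three-phase handshake: init, then one request per stem,
// then finalize.
type SyncStep =
  | { phase: "init" }
  | { phase: "stem"; stem: string }
  | { phase: "finalize" };

const STEMS: readonly string[] = ["vocals", "drums", "bass", "guitar", "piano", "other"];

// Hypothetical helper: lists the requests for one song in upload order.
function planUpload(stems: readonly string[] = STEMS): SyncStep[] {
  const steps: SyncStep[] = [{ phase: "init" }];
  for (const stem of stems) {
    steps.push({ phase: "stem", stem });
  }
  steps.push({ phase: "finalize" });
  return steps;
}

const MAX_ATTEMPTS = 10;   // from the description
const CAP_MS = 60000;      // from the description (60 s cap)
const BASE_MS = 1000;      // assumption: the real base delay is not stated

// Hypothetical helper: given how many attempts have already failed (1-based),
// returns the wait before the next try, or null once the limit is reached.
function backoffDelayMs(failedAttempts: number): number | null {
  if (failedAttempts >= MAX_ATTEMPTS) {
    return null; // give up; the item stays in the queue as failed
  }
  const uncapped = BASE_MS * Math.pow(2, failedAttempts - 1);
  return Math.min(CAP_MS, uncapped);
}
```

Under these assumptions the waits run 1 s, 2 s, 4 s, 8 s, 16 s, 32 s, and then stay at 60 s until the tenth failure ends the retries. Because the queue lives in IndexedDB, the failure count would need to be stored with each queued item for the schedule to survive a browser restart.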

Deployment

Frontend on Vercel (React 19 + Vite 7). Backend on Railway (FastAPI + Python, yt-dlp for YouTube audio extraction). Production frontend calls Railway directly to bypass Vercel's body-size proxy limit.

Platform

Web only — no native mobile or desktop app. The UI is responsive and works on mobile browsers, but browser AI inference is CPU-intensive and performs best on desktop.
