API Reference

Endpoints for text-to-speech, speech retrieval, speech-to-text, and subtitles, plus account management.

Authentication

Most endpoints expect your Speech7 API key. Send it as apiKey in the JSON request body. For requests that cannot carry a JSON body (GET requests, multipart uploads), send it in the x-api-key header or as the ?apiKey=... query parameter.

Admin endpoints require an admin login in the web UI or an admin API key (set isAdmin=true when creating the account).

Capabilities depend on provider keys saved on the account. Use GET https://app.speech7.com/account/capabilities with your API key to see if TTS and STT are configured.
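The three ways of passing the key can be sketched in Python with only the standard library. The helper names below (with_header, with_query, with_body) are illustrative and not part of the Speech7 API; the functions only build requests and do not send them.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://app.speech7.com"


def with_header(path: str, api_key: str) -> urllib.request.Request:
    """GET request carrying the key in the x-api-key header."""
    return urllib.request.Request(BASE_URL + path, headers={"x-api-key": api_key})


def with_query(path: str, api_key: str) -> str:
    """URL carrying the key as the ?apiKey=... query parameter."""
    return f"{BASE_URL}{path}?{urlencode({'apiKey': api_key})}"


def with_body(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """POST request carrying the key as apiKey inside the JSON body."""
    body = json.dumps({"apiKey": api_key, **payload}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, with_header("/account/capabilities", "YOUR_KEY") builds the capabilities check described above; pass the result to urllib.request.urlopen to send it.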

POST /tts

Generate speech from text. Streams an MP3 response. Requires a TTS provider key saved on the account.

POST https://app.speech7.com/tts
Content-Type: application/json
{
  "apiKey": "your_api_key",
  "text": "Hello from Speech7"
}

Limits: min 500 chars / 20 words, max 500 words.

Add "json": true to return JSON metadata instead of streaming audio (fields: audioPath and id).

Quick test: curl -o out.mp3 -X POST https://app.speech7.com/tts -H "Content-Type: application/json" -d '{"apiKey":"YOUR_KEY","text":"Hello world from the API."}'
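A minimal Python sketch of the same call, using only the standard library. The function names (build_tts_request, save_speech) are illustrative, and the sketch assumes the server streams the MP3 bytes directly in the response body as described above.

```python
import json
import shutil
import urllib.request

TTS_URL = "https://app.speech7.com/tts"


def build_tts_request(api_key: str, text: str, as_json: bool = False) -> urllib.request.Request:
    """Build the POST /tts request; as_json=True adds "json": true to the body."""
    payload = {"apiKey": api_key, "text": text}
    if as_json:
        payload["json"] = True
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(api_key: str, text: str, out_path: str = "out.mp3") -> None:
    """Send the request and copy the streamed MP3 to disk chunk by chunk."""
    request = build_tts_request(api_key, text)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as handle:
        shutil.copyfileobj(response, handle)
```

Calling save_speech("YOUR_KEY", "Hello world from the API.") would write out.mp3, mirroring the curl quick test.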

GET /speeches

List recent speeches for your API key (newest first).

GET https://app.speech7.com/speeches?limit=50
Headers: x-api-key: your_api_key

Returns a speeches array; each entry has an id, timestamps, usage stats, and an audioPath. Pass an id to GET /speeches/:speechId.

Quick test: curl -H "x-api-key: YOUR_KEY" https://app.speech7.com/speeches
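As a rough Python equivalent, the sketch below builds the list request and pulls the IDs out of a decoded response. It assumes the response is a JSON object with a top-level "speeches" key, which the description above suggests but does not spell out; the function names are illustrative.

```python
import json
import urllib.request
from urllib.parse import urlencode

SPEECHES_URL = "https://app.speech7.com/speeches"


def build_list_request(api_key: str, limit: int = 50) -> urllib.request.Request:
    """Build GET /speeches?limit=N with the key in the x-api-key header."""
    url = f"{SPEECHES_URL}?{urlencode({'limit': limit})}"
    return urllib.request.Request(url, headers={"x-api-key": api_key})


def speech_ids(response_body: bytes) -> list[str]:
    """Extract the id of every entry (assumes a top-level "speeches" array)."""
    data = json.loads(response_body)
    return [entry["id"] for entry in data.get("speeches", [])]
```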

GET /speeches/:speechId

Fetch details for a specific speech. Add ?download=1 to stream the stored MP3.

GET https://app.speech7.com/speeches/file001-123456?apiKey=your_api_key

Quick test: take an ID from /speeches, then run curl -L "https://app.speech7.com/speeches/ID?download=1&apiKey=YOUR_KEY" -o speech.mp3
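If you build these URLs in code, it is safer to escape the ID and query values than to concatenate strings. A small sketch (speech_url is a hypothetical helper name):

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://app.speech7.com"


def speech_url(speech_id: str, api_key: str, download: bool = False) -> str:
    """URL for GET /speeches/:speechId, optionally with ?download=1."""
    params = {}
    if download:
        params["download"] = "1"
    params["apiKey"] = api_key
    # quote() escapes any character in the ID that is not URL-safe.
    return f"{BASE_URL}/speeches/{quote(speech_id, safe='')}?{urlencode(params)}"
```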

POST /stt

Transcribe audio or video to text using the configured speech-to-text provider. Requires an STT provider key saved on the account. STT usage consumes minutes from the same pool as TTS (quantized to 0.5-minute increments). Responses include text, speakers, and transcript entries.
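To estimate how many minutes an upload will consume, the 0.5-minute quantization can be sketched as below. The rounding direction is an assumption: this sketch rounds up to the next half minute, which the text above does not confirm.

```python
import math


def billed_minutes(duration_ms: int, step_minutes: float = 0.5) -> float:
    """Quantize a duration to 0.5-minute steps (ASSUMPTION: rounds up)."""
    if duration_ms <= 0:
        return 0.0
    step_ms = step_minutes * 60_000
    return math.ceil(duration_ms / step_ms) * step_minutes
```

Under that assumption a 61-second clip (61000 ms) counts as 1.5 minutes and a 30-second clip as 0.5 minutes.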

Speaker diarization is disabled by default; enable it by sending enableSpeakerDiarization=true. You can also override speaker labels by passing speaker1Label, speaker2Label, speaker3Label, etc. in the form data (e.g., speaker1Label=John). These labels are used in both the diarized text and the speakers array in the response.
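The diarization fields can be generated from a plain list of names. This is a sketch only: speaker_fields is a hypothetical helper, and the resulting dict is meant to be passed as the non-file form fields of a multipart request by whatever HTTP client you use.

```python
def speaker_fields(labels: list[str]) -> dict[str, str]:
    """Form fields that enable diarization and set speaker1Label, speaker2Label, ..."""
    fields = {"enableSpeakerDiarization": "true"}
    for index, label in enumerate(labels, start=1):
        fields[f"speaker{index}Label"] = label
    return fields
```

For example, speaker_fields(["John", "Ana"]) yields enableSpeakerDiarization=true, speaker1Label=John, and speaker2Label=Ana.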

Custom dictionaries are supported via transcriptionTerms/customDictionary (merged into context.terms) and translationTerms/translationDictionary (merged into context.translation_terms). Provide JSON or delimited text.

Also accepted for transcription dictionaries: terms and customTerms. If you already send context, dictionary fields are merged into it, not replaced.
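When the dictionaries come from application data, serializing them with a JSON encoder avoids the manual quote escaping that shell commands need. A minimal sketch (dictionary_fields is a hypothetical helper; it uses the JSON form, not the delimited-text form):

```python
import json
from typing import Optional


def dictionary_fields(
    transcription_terms: Optional[list[str]] = None,
    translation_terms: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Serialize dictionaries as JSON strings for the multipart form."""
    fields: dict[str, str] = {}
    if transcription_terms:
        fields["transcriptionTerms"] = json.dumps(transcription_terms, ensure_ascii=False)
    if translation_terms:
        # ensure_ascii=False keeps characters such as "č" readable in the payload.
        fields["translationTerms"] = json.dumps(translation_terms, ensure_ascii=False)
    return fields
```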

Max upload size: 200 MB by default. Exceeding this returns HTTP 413 File too large. Supported types: audio/*, video/*.

Add json=true to get the structured transcript array: [{ transcript, text, durationMs }].

curl -X POST https://app.speech7.com/stt \
  -H "x-api-key: YOUR_API_KEY" \
  -F "audio=@sample.wav" \
  -F "transcriptionTerms=[\"Speech7\",\"Soniox\"]" \
  -F "translationTerms={\"račun\":\"invoice\"}"

Quick test (text only): curl -X POST https://app.speech7.com/stt -H "x-api-key: YOUR_API_KEY" -F "audio=@sample.mp3" | jq .text

POST /subtitle

Generate SubRip subtitles (.srt) for an audio or video file. Uses the same STT provider as /stt and charges minutes from your pool.

Upload the media as audio (multipart/form-data). You can optionally include languageHints and speaker labels (speaker1Label, speaker2Label, …). The response is an SRT file download.

Subtitle endpoints accept the same dictionary fields as /stt (transcriptionTerms, translationTerms, etc.) and pass them to the STT provider context.

Optional formatting: maxCharsPerLine (default 70, min 30, max 120) and maxLines (default 2, min 1, max 4) to control line wrapping.
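Checking the formatting options on the client before uploading avoids spending an upload on a request that is out of range. The sketch below applies the documented bounds; how the server itself treats out-of-range values is not stated here, so rejecting them locally is a design choice of this example, and subtitle_format_fields is a hypothetical helper name.

```python
def subtitle_format_fields(max_chars_per_line: int = 70, max_lines: int = 2) -> dict[str, str]:
    """Validate the documented ranges and return the form fields."""
    if not 30 <= max_chars_per_line <= 120:
        raise ValueError("maxCharsPerLine must be between 30 and 120")
    if not 1 <= max_lines <= 4:
        raise ValueError("maxLines must be between 1 and 4")
    return {
        "maxCharsPerLine": str(max_chars_per_line),
        "maxLines": str(max_lines),
    }
```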

Max upload size: 25 MB by default. Supported: audio/*, video/*.

curl -X POST https://app.speech7.com/subtitle \
  -H "x-api-key: YOUR_API_KEY" \
  -F "audio=@sample.mp4" \
  -o subtitles.srt

Async subtitles (large files)

For long uploads or when your client may time out, start a job and fetch the SRT later. Jobs are kept for 24 hours.

# 1) Start a job (multipart upload)
curl -X POST https://app.speech7.com/subtitle/jobs \
  -H "x-api-key: YOUR_API_KEY" \
  -F "audio=@movie.mp4"

# Response: {"id":"sub_ab12...","status":"queued","statusUrl":"/subtitle/jobs/sub_ab12..."}

# 2) Poll status (repeat until status=completed)
curl -H "x-api-key: YOUR_API_KEY" \
  https://app.speech7.com/subtitle/jobs/sub_ab12...

# 3) Download SRT when ready
curl -L -H "x-api-key: YOUR_API_KEY" \
  https://app.speech7.com/subtitle/jobs/sub_ab12.../file \
  -o subtitles.srt

Status responses return minutesUsed and durationMs, plus a downloadUrl once the job is ready.
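The polling step can be sketched in Python as a loop with a deadline. The function names, the 5-second interval, and the one-hour timeout are illustrative choices, not documented values. Only the "queued" and "completed" statuses appear above, so the loop treats anything other than "completed" as still pending and relies on the timeout to stop.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://app.speech7.com"


def fetch_job_status(job_id: str, api_key: str) -> dict:
    """GET /subtitle/jobs/:id and decode the JSON status document."""
    request = urllib.request.Request(
        f"{BASE_URL}/subtitle/jobs/{job_id}", headers={"x-api-key": api_key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_subtitles(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until status == "completed"; raise TimeoutError after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status.get("status") == "completed":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("subtitle job did not complete in time")
        sleep(interval_s)
```

Usage: wait_for_subtitles(lambda: fetch_job_status(job_id, "YOUR_API_KEY")) returns the final status document, from which downloadUrl can be read.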