Speech7 API Docs

Authentication

Most endpoints expect your Speech7 API key. Send it as apiKey in the JSON request body, or, for requests that cannot carry a JSON body (such as GET requests or multipart uploads), use the x-api-key header or the ?apiKey=... query parameter.

Admin endpoints require an admin login in the web UI or an admin API key (set isAdmin=true when creating the account).

Capabilities depend on provider keys saved on the account. Use GET https://app.speech7.com/account/capabilities with your API key to see if TTS and STT are configured.
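As a minimal sketch of the header-based option, the following Python builds (but does not send) a capabilities request. The helper name capabilities_request is hypothetical, and the response shape is not documented here, so the sketch stops at constructing the request.

```python
import urllib.request

BASE_URL = "https://app.speech7.com"


def capabilities_request(api_key: str) -> urllib.request.Request:
    """Build a GET /account/capabilities request that carries the key in x-api-key."""
    return urllib.request.Request(
        f"{BASE_URL}/account/capabilities",
        headers={"x-api-key": api_key},
        method="GET",
    )


req = capabilities_request("YOUR_KEY")
# To actually send it: urllib.request.urlopen(req).read()
```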


POST /tts

Generate speech from text. Streams an MP3 response. Requires a TTS provider key saved on the account.

POST https://app.speech7.com/tts
Content-Type: application/json
{
  "apiKey": "your_api_key",
  "text": "Hello from Speech7"
}

Limits: min 500 chars / 20 words, max 500 words.

Add "json": true to return JSON metadata instead of streaming audio (fields: audioPath and id).

Quick test: curl -o out.mp3 -X POST https://app.speech7.com/tts -H "Content-Type: application/json" -d '{"apiKey":"YOUR_KEY","text":"Hello world from the API."}'
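The same call can be sketched in Python. The helper tts_request is a hypothetical name; it only assembles the request described above and leaves sending to the caller. The want_json flag maps to the "json": true option.

```python
import json
import urllib.request


def tts_request(api_key: str, text: str, want_json: bool = False) -> urllib.request.Request:
    """Build a POST /tts request; want_json asks for JSON metadata instead of audio."""
    body = {"apiKey": api_key, "text": text}
    if want_json:
        body["json"] = True
    return urllib.request.Request(
        "https://app.speech7.com/tts",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = tts_request("YOUR_KEY", "Hello from Speech7", want_json=True)
# Sending it would look like:
#   with urllib.request.urlopen(req) as resp:
#       meta = json.load(resp)  # documented fields: audioPath and id
```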

GET /speeches

List recent speeches for your API key (newest first).

GET https://app.speech7.com/speeches?limit=50
Headers: x-api-key: your_api_key

Returns a speeches array; each entry has an id, timestamps, usage stats, and an audioPath. Use the IDs with GET /speeches/:speechId.

Quick test: curl -H "x-api-key: YOUR_KEY" https://app.speech7.com/speeches

GET /speeches/:speechId

Fetch details for a specific speech. Add ?download=1 to stream the stored MP3.

GET https://app.speech7.com/speeches/file001-123456?apiKey=your_api_key

Quick test: use an ID from /speeches then curl -L "https://app.speech7.com/speeches/ID?download=1&apiKey=YOUR_KEY" -o speech.mp3
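A small sketch of how the detail and download URLs could be assembled in Python. The function name speech_url is hypothetical; the query parameters are the ones described above.

```python
from urllib.parse import quote, urlencode


def speech_url(speech_id: str, api_key: str, download: bool = False) -> str:
    """Return the /speeches/:speechId URL, optionally with ?download=1."""
    params = {}
    if download:
        params["download"] = "1"
    params["apiKey"] = api_key
    # Percent-encode the ID so unusual characters cannot break the path.
    return f"https://app.speech7.com/speeches/{quote(speech_id, safe='')}?{urlencode(params)}"


url = speech_url("file001-123456", "YOUR_KEY", download=True)
```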

POST /stt

Transcribe audio or video to text using the configured speech-to-text provider. Requires an STT provider key saved on the account. STT usage consumes minutes from the same pool as TTS (quantized to 0.5-minute increments). Responses include text, speakers, and transcript entries.
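To illustrate the 0.5-minute quantization, here is a rough estimate of billed minutes. The rounding direction is not stated in these docs; this sketch assumes durations are rounded up to the next half minute, and billed_minutes is a hypothetical helper, not part of the API.

```python
def billed_minutes(duration_ms: int) -> float:
    """Estimate billed minutes in 0.5-minute steps (assumes rounding up)."""
    if duration_ms < 0:
        raise ValueError("duration_ms must be non-negative")
    half_minute_ms = 30_000
    steps = -(-duration_ms // half_minute_ms)  # ceiling division
    return steps * 0.5


estimate = billed_minutes(61_000)  # 61 seconds of audio
```

Under that assumption, a 61-second file would count as 1.5 minutes and a 30-second file as 0.5 minutes.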

Speaker diarization is disabled by default; enable it by sending enableSpeakerDiarization=true. You can also override speaker labels by passing speaker1Label, speaker2Label, speaker3Label, etc. in the form data (e.g., speaker1Label=John). These labels are used in both the diarized text and the speakers array in the response.
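The diarization fields can be generated from a list of names. This is a sketch with a hypothetical helper (speaker_fields); it returns plain form fields that any multipart client could send alongside the audio.

```python
def speaker_fields(labels: list[str]) -> dict[str, str]:
    """Build form fields that enable diarization and label speakers in order."""
    fields = {"enableSpeakerDiarization": "true"}
    for index, label in enumerate(labels, start=1):
        fields[f"speaker{index}Label"] = label
    return fields


fields = speaker_fields(["John", "Maria"])
```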

Custom dictionaries are supported via transcriptionTerms/customDictionary (merged into context.terms) and translationTerms/translationDictionary (merged into context.translation_terms). Provide JSON or delimited text.

Also accepted for transcription dictionaries: terms and customTerms. If you already send context, dictionary fields are merged into it, not replaced.
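One way to prepare the dictionary fields is to serialize them as JSON strings before attaching them to the form. The helper dictionary_fields is hypothetical; it uses the transcriptionTerms and translationTerms names from above.

```python
import json


def dictionary_fields(terms=None, translations=None) -> dict[str, str]:
    """Serialize custom dictionaries as JSON strings for multipart form fields."""
    fields = {}
    if terms:
        fields["transcriptionTerms"] = json.dumps(list(terms), ensure_ascii=False)
    if translations:
        fields["translationTerms"] = json.dumps(dict(translations), ensure_ascii=False)
    return fields


fields = dictionary_fields(["Speech7", "Soniox"], {"račun": "invoice"})
```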

Max upload size: 200 MB by default. Exceeding this returns HTTP 413 File too large. Supported types: audio/*, video/*.

Add json=true to get the structured transcript array: [{ transcript, text, durationMs }].

curl -X POST https://app.speech7.com/stt \
  -H "x-api-key: YOUR_API_KEY" \
  -F "audio=@sample.wav" \
  -F "transcriptionTerms=[\"Speech7\",\"Soniox\"]" \
  -F "translationTerms={\"račun\":\"invoice\"}"

Quick test (text only): curl -s -X POST https://app.speech7.com/stt -H "x-api-key: YOUR_KEY" -F "audio=@sample.mp3" | jq .text

POST /subtitle

Generate SubRip subtitles (.srt) for an audio or video file. Uses the same STT provider as /stt and charges minutes from your pool.

Upload the media in the audio form field (multipart/form-data). You can optionally include languageHints and speaker labels (speaker1Label, speaker2Label, …). The response is an SRT file download.

Subtitle endpoints accept the same dictionary fields as /stt (transcriptionTerms, translationTerms, etc.) and pass them to the STT provider context.

Optional formatting: maxCharsPerLine (default 70, min 30, max 120) and maxLines (default 2, min 1, max 4) to control line wrapping.
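A client can check the formatting options against the documented ranges before uploading. How the server treats out-of-range values is not stated here, so this sketch simply rejects them locally; subtitle_format_fields is a hypothetical helper.

```python
def subtitle_format_fields(max_chars_per_line: int = 70, max_lines: int = 2) -> dict[str, str]:
    """Validate line-wrapping options against the documented ranges."""
    if not 30 <= max_chars_per_line <= 120:
        raise ValueError("maxCharsPerLine must be between 30 and 120")
    if not 1 <= max_lines <= 4:
        raise ValueError("maxLines must be between 1 and 4")
    return {"maxCharsPerLine": str(max_chars_per_line), "maxLines": str(max_lines)}


fields = subtitle_format_fields(max_chars_per_line=42, max_lines=1)
```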

Max upload size: 25 MB by default. Supported: audio/*, video/*.

curl -X POST https://app.speech7.com/subtitle \
  -H "x-api-key: YOUR_API_KEY" \
  -F "audio=@sample.mp4" \
  -o subtitles.srt

Async subtitles (large files)

For long uploads or when your client may time out, start a job and fetch the SRT later. Jobs are kept for 24 hours.

# 1) Start a job (multipart upload)
curl -X POST https://app.speech7.com/subtitle/jobs \
  -H "x-api-key: YOUR_API_KEY" \
  -F "audio=@movie.mp4"

# Response: {"id":"sub_ab12...","status":"queued","statusUrl":"/subtitle/jobs/sub_ab12..."}

# 2) Poll status (repeat until status=completed)
curl -H "x-api-key: YOUR_API_KEY" \
  https://app.speech7.com/subtitle/jobs/sub_ab12...

# 3) Download SRT when ready
curl -L -H "x-api-key: YOUR_API_KEY" \
  https://app.speech7.com/subtitle/jobs/sub_ab12.../file \
  -o subtitles.srt

Status responses return minutesUsed, durationMs, and a downloadUrl when ready.
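The polling step can be sketched as a small loop. The helper wait_for_job is hypothetical and takes a fetch_status callable (for example, a function that GETs the statusUrl and parses the JSON), so the loop stays independent of any HTTP client. Only the queued and completed statuses appear in these docs; any failure status is unknown here, so the sketch falls back to a timeout.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reports status == "completed" or attempts run out."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status.get("status") == "completed":
            return status
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError("subtitle job did not complete in time")
```

Passing sleep as a parameter keeps the loop easy to test; in normal use the default time.sleep is fine.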

Realtime STT (WebSocket)

Stream microphone audio in real time and receive partial/final transcripts as JSON. Requires an STT provider key on the account.

The frontend includes a mic demo card on the home page. In the demo, choose Language A and Language B to enable two-way translation; leave both as “None” for transcription only.

You can also connect directly to the WebSocket endpoint shown below.

Realtime config also accepts dictionary fields transcriptionTerms/customDictionary and translationTerms/translationDictionary. They are merged into context.terms and context.translation_terms.

You can also use terms or customTerms for transcription dictionaries in realtime config payloads.

wss://app.speech7.com/stt/realtime?apiKey=YOUR_KEY
// First message must be JSON config (no api_key needed)
{
  "audio_format": "auto",
  "language_hints": ["sl"],
  "enable_speaker_diarization": true,
  "enable_language_identification": true,
  "enable_endpoint_detection": true,
  "transcriptionTerms": ["Speech7", "Soniox"],
  "translationTerms": { "račun": "invoice" },
  // Optional: turn on two-way translation by sending both languageA and languageB
  "languageA": "en",
  "languageB": "sl"
}

// Then send audio frames as binary (e.g., Opus WebM chunks or PCM)
// Send an empty frame ("") to finalize and close.
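The first config message can be assembled programmatically. This sketch uses a hypothetical helper (realtime_config) and only adds languageA and languageB when both are provided, matching the two-way translation rule above. The resulting string would be sent as the first text frame by whichever WebSocket client you use.

```python
import json


def realtime_config(language_hints, language_a=None, language_b=None, terms=None) -> str:
    """Build the JSON config sent as the first WebSocket message."""
    config = {
        "audio_format": "auto",
        "language_hints": list(language_hints),
        "enable_endpoint_detection": True,
    }
    if terms:
        config["transcriptionTerms"] = list(terms)
    # Two-way translation is only requested when both languages are set.
    if language_a and language_b:
        config["languageA"] = language_a
        config["languageB"] = language_b
    return json.dumps(config, ensure_ascii=False)


first_message = realtime_config(["sl"], language_a="en", language_b="sl", terms=["Speech7"])
```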

When translation is enabled, each message includes a translation summary:

{
  "languageA": { "code": "sr", "finalText": "Dobar dan. Kako smo?", "nonFinalText": "" },
  "languageB": { "code": "de", "finalText": "Guten Tag. Wie geht es uns?", "nonFinalText": "" },
  "type": "two_way_transcription"
}

Final tokens include is_final. A finished: true message is sent before the server closes the connection.
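A sketch of how a client might read incoming messages. It assumes the translation summary shown above is the top-level message object and that the finished flag is a top-level boolean; both are assumptions, and read_message is a hypothetical helper.

```python
import json


def read_message(raw: str) -> dict:
    """Extract the finished flag and any two-way translation text from a message."""
    message = json.loads(raw)
    summary = {"finished": message.get("finished") is True}
    if message.get("type") == "two_way_transcription":
        for side in ("languageA", "languageB"):
            entry = message.get(side, {})
            summary[side] = (entry.get("code"), entry.get("finalText", ""))
    return summary


sample = (
    '{"languageA": {"code": "sr", "finalText": "Dobar dan. Kako smo?", "nonFinalText": ""},'
    ' "languageB": {"code": "de", "finalText": "Guten Tag. Wie geht es uns?", "nonFinalText": ""},'
    ' "type": "two_way_transcription"}'
)
result = read_message(sample)
```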

Quick browser test: open the home page, enter your API key, and use the "WS /stt/realtime" card to start the mic.

Programmatic account creation

If you have an API key flagged as admin (isAdmin=true), call:

POST /accounts
{
  "apiKey": "admin_api_key",
  "startingMinutes": 60,
  "minutes": 60,
  "isAdmin": true,
  "ttsApiKey": "your-tts-provider-key",
  "sttApiKey": "your-stt-provider-key",
  "companyId": "cmp_demo_123",
  "companyName": "Acme",
  "contactName": "Jane Doe",
  "contactPhone": "+1 555 555 5555",
  "companyAddress": "123 Example St",
  "companyVATId": "EU123456789",
  "companyRegistrationNumber": "HRB 98765",
  "dateAddedCompany": "2025-11-21",
  "timeAddedCompany": "09:00:00"
}

Quick test: curl -X POST https://app.speech7.com/accounts -H "Content-Type: application/json" -d '{"apiKey":"ADMIN_KEY","startingMinutes":30,"companyName":"Test Co","contactName":"Jane"}'
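The same request can be built in Python. The helper create_account_request is hypothetical; it forwards any extra keyword arguments as body fields, so the field names should match the ones listed above. Which fields are required is not stated in these docs.

```python
import json
import urllib.request


def create_account_request(admin_key: str, starting_minutes: int, **details) -> urllib.request.Request:
    """Build a POST /accounts request authenticated with an admin API key."""
    body = {"apiKey": admin_key, "startingMinutes": starting_minutes, **details}
    return urllib.request.Request(
        "https://app.speech7.com/accounts",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = create_account_request("ADMIN_KEY", 30, companyName="Test Co", contactName="Jane")
# To send: urllib.request.urlopen(req)
```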