XPRO API Reference
Endpoints for text-to-speech, speech retrieval, speech-to-text, and subtitles, plus account management.
Authentication
Most endpoints expect your Speech7 API key. Send it as
apiKey in the request body, or use the
x-api-key header (or an ?apiKey=... query parameter)
for requests that do not carry a JSON body, such as GET requests and multipart uploads.
Admin endpoints require an admin login in the web UI or an admin API
key (set isAdmin=true when creating the account).
Capabilities depend on provider keys saved on the account. Use
GET https://app.speech7.com/account/capabilities with
your API key to see if TTS and STT are configured.
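As a rough illustration of header-based authentication, the following Python sketch builds the capabilities request with the x-api-key header using only the standard library. The helper names (build_capabilities_request, fetch_capabilities) are hypothetical, and the shape of the JSON response is not documented here, so the result is returned as a plain dict without assuming any field names.

```python
import json
import urllib.request

BASE_URL = "https://app.speech7.com"


def build_capabilities_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for /account/capabilities."""
    return urllib.request.Request(
        f"{BASE_URL}/account/capabilities",
        headers={"x-api-key": api_key},  # header auth, since GET has no JSON body
        method="GET",
    )


def fetch_capabilities(api_key: str) -> dict:
    """Send the request and decode the JSON response (field names not assumed)."""
    request = build_capabilities_request(api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling fetch_capabilities("YOUR_KEY") performs the network request; the builder is kept separate so the request can be inspected without sending it.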
POST /tts
Generate speech from text. Streams an MP3 response. Requires a TTS provider key saved on the account.
POST https://app.speech7.com/tts
Content-Type: application/json
{
"apiKey": "your_api_key",
"text": "Hello from Speech7"
}
Limits: min 500 chars / 20 words, max 500 words.
Add "json": true to return JSON metadata instead of
streaming audio (fields: audioPath and
id).
Quick test:
curl -o out.mp3 -X POST https://app.speech7.com/tts \
-H "Content-Type: application/json" \
-d '{"apiKey":"YOUR_KEY","text":"Hello world from the API."}'
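The same request can be sketched in Python. This is a minimal example using only the standard library; build_tts_request and save_speech are hypothetical helper names, and the sketch does not enforce the length limits listed above.

```python
import json
import shutil
import urllib.request

TTS_URL = "https://app.speech7.com/tts"


def build_tts_request(api_key: str, text: str, want_json: bool = False) -> urllib.request.Request:
    """Build a POST /tts request; set want_json to ask for JSON metadata."""
    body = {"apiKey": api_key, "text": text}
    if want_json:
        body["json"] = True  # returns metadata (audioPath, id) instead of audio
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(api_key: str, text: str, out_path: str) -> None:
    """Send the request and copy the streamed MP3 response to disk."""
    request = build_tts_request(api_key, text)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

save_speech("YOUR_KEY", "Hello world from the API.", "out.mp3") would mirror the curl command above.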
GET /speeches
List recent speeches for your API key (newest first).
GET https://app.speech7.com/speeches?limit=50
Headers: x-api-key: your_api_key
Returns a speeches array; each entry includes id,
timestamps, usage stats, and audioPath. Use the IDs
with GET /speeches/:speechId.
Quick test:
curl -H "x-api-key: YOUR_KEY" \
https://app.speech7.com/speeches
GET /speeches/:speechId
Fetch details for a specific speech. Add ?download=1 to
stream the stored MP3.
GET https://app.speech7.com/speeches/file001-123456?apiKey=your_api_key
Quick test: use an ID from /speeches, then run
curl -L \
"https://app.speech7.com/speeches/ID?download=1&apiKey=YOUR_KEY" \
-o speech.mp3
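When building these URLs in code, it helps to let a library handle the escaping. The sketch below uses Python's urllib.parse; speech_url is a hypothetical helper, and it assumes the speech ID is a single path segment.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://app.speech7.com"


def speech_url(speech_id: str, api_key: str, download: bool = False) -> str:
    """Return the URL for a speech's details, or its MP3 when download is True."""
    params = {"apiKey": api_key}
    if download:
        params["download"] = "1"  # ask the server to stream the stored MP3
    # quote() escapes characters in the ID that are not safe in a URL path.
    return f"{BASE_URL}/speeches/{quote(speech_id, safe='')}?{urlencode(params)}"
```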
POST /stt
Transcribe audio or video to text using the configured
speech-to-text provider. Requires an STT provider key saved on the
account. STT usage consumes minutes from the same pool as TTS
(quantized to 0.5-minute increments). Responses include
text, speakers, and
transcript entries.
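To make the 0.5-minute quantization concrete, here is a small worked sketch. It assumes usage is rounded up to the next half minute; the rounding direction is not stated here, so treat billed_minutes (a hypothetical name) as an estimate rather than the service's billing logic.

```python
def billed_minutes(duration_ms: int) -> float:
    """Round a media duration up to the next 0.5-minute step.

    ASSUMPTION: rounding is upward; the service may round differently.
    """
    if duration_ms < 0:
        raise ValueError("duration_ms must be non-negative")
    half_minute_ms = 30_000
    # Integer ceiling division avoids floating-point edge cases.
    steps = -(-duration_ms // half_minute_ms)
    return steps * 0.5
```

Under this assumption, a 61-second file (61000 ms) would count as 1.5 minutes and a 30-second file as 0.5 minutes.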
Speaker diarization is disabled by default; enable it by sending
enableSpeakerDiarization=true. You can also override
speaker labels by passing speaker1Label,
speaker2Label, speaker3Label, etc. in the
form data (e.g., speaker1Label=John). These labels are
used in both the diarized text and the speakers array
in the response.
Custom dictionaries are supported via
transcriptionTerms/customDictionary
(merged into context.terms) and
translationTerms/translationDictionary
(merged into context.translation_terms). Provide JSON
or delimited text.
Also accepted for transcription dictionaries:
terms and customTerms. If you already send
context, dictionary fields are merged into it, not
replaced.
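The merge behaviour for transcription dictionaries can be pictured with the following sketch. It is an illustration only: the accepted delimiters (comma, semicolon, newline), the de-duplication, and the function names parse_terms and merge_terms are assumptions, and translation dictionaries are left out for brevity.

```python
import json
import re

# Field names accepted for transcription dictionaries, per the text above.
TERM_FIELDS = ("transcriptionTerms", "customDictionary", "terms", "customTerms")


def parse_terms(value) -> list:
    """Accept a list, a JSON array string, or delimited text; return a list of terms."""
    if isinstance(value, list):
        return [str(item).strip() for item in value if str(item).strip()]
    text = str(value).strip()
    try:
        parsed = json.loads(text)
        if isinstance(parsed, list):
            return [str(item).strip() for item in parsed if str(item).strip()]
    except json.JSONDecodeError:
        pass  # not JSON, fall back to delimited text
    # ASSUMPTION: delimiters are comma, semicolon, or newline.
    return [part.strip() for part in re.split(r"[,;\n]", text) if part.strip()]


def merge_terms(context: dict, fields: dict) -> dict:
    """Merge dictionary fields into context['terms'] without replacing existing entries."""
    merged = dict(context or {})
    terms = list(merged.get("terms", []))
    for name in TERM_FIELDS:
        if name in fields:
            for term in parse_terms(fields[name]):
                if term not in terms:  # de-duplication is an assumption
                    terms.append(term)
    if terms:
        merged["terms"] = terms
    return merged
```

Other keys already present in context are copied through unchanged, which mirrors the "merged, not replaced" behaviour described above.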
Max upload size: 200 MB by default. Exceeding this returns HTTP 413
File too large. Supported types: audio/*, video/*.
Add json=true to get the structured transcript array:
[{ transcript, text, durationMs }].
curl -X POST https://app.speech7.com/stt \
-H "x-api-key: YOUR_API_KEY" \
-F "audio=@sample.wav" \
-F "transcriptionTerms=[\"Speech7\",\"Soniox\"]" \
-F "translationTerms={\"račun\":\"invoice\"}"
Quick test (text only):
curl -X POST https://app.speech7.com/stt \
-H "x-api-key: YOUR_API_KEY" \
-F "audio=@sample.mp3" | jq .text
POST /subtitle
Generate SubRip subtitles (.srt) for an audio or video file. Uses
the same STT provider as /stt and charges minutes from
your pool.
Upload the media as audio (multipart/form-data). You
can optionally include languageHints and speaker labels
(speaker1Label, speaker2Label, …). The
response is an SRT file download.
Subtitle endpoints accept the same dictionary fields as
/stt (transcriptionTerms,
translationTerms, etc.) and pass them to the STT
provider context.
Optional formatting: maxCharsPerLine (default 70, min
30, max 120) and maxLines (default 2, min 1, max 4) to
control line wrapping.
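The effect of these two options can be approximated locally. The sketch below assumes out-of-range values are clamped to the documented bounds (the server might reject them instead) and that text exceeding maxLines is split into further cues; clamp_option and wrap_cue are hypothetical names, and the wrapping uses Python's textwrap rather than whatever the service uses.

```python
import textwrap
from typing import List, Optional


def clamp_option(value: Optional[int], default: int, lo: int, hi: int) -> int:
    """Return the default when unset, otherwise force the value into [lo, hi]."""
    if value is None:
        return default
    return max(lo, min(hi, int(value)))


def wrap_cue(text: str,
             max_chars_per_line: Optional[int] = None,
             max_lines: Optional[int] = None) -> List[List[str]]:
    """Wrap text into cues, each holding at most max_lines lines."""
    width = clamp_option(max_chars_per_line, default=70, lo=30, hi=120)
    lines_per_cue = clamp_option(max_lines, default=2, lo=1, hi=4)
    lines = textwrap.wrap(text, width=width)
    # Group consecutive wrapped lines into cues of lines_per_cue lines each.
    return [lines[i:i + lines_per_cue] for i in range(0, len(lines), lines_per_cue)]
```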
Max upload size: 25 MB by default. Supported: audio/*, video/*.
curl -X POST https://app.speech7.com/subtitle \
-H "x-api-key: YOUR_API_KEY" \
-F "audio=@sample.mp4" \
-o subtitles.srt
Async subtitles (large files)
For long uploads or when your client may time out, start a job and fetch the SRT later. Jobs are kept for 24 hours.
# 1) Start a job (multipart upload)
curl -X POST https://app.speech7.com/subtitle/jobs \
-H "x-api-key: YOUR_API_KEY" \
-F "audio=@movie.mp4"
# Response: {"id":"sub_ab12...","status":"queued","statusUrl":"/subtitle/jobs/sub_ab12..."}
# 2) Poll status (repeat until status=completed)
curl -H "x-api-key: YOUR_API_KEY" \
https://app.speech7.com/subtitle/jobs/sub_ab12...
# 3) Download SRT when ready
curl -L -H "x-api-key: YOUR_API_KEY" \
https://app.speech7.com/subtitle/jobs/sub_ab12.../file \
-o subtitles.srt
Status responses return minutesUsed,
durationMs, and a downloadUrl when ready.
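The poll step above can be wrapped in a small loop. This sketch keeps the HTTP call out of the loop by taking a fetch_status callable, so it works with any client; wait_for_job is a hypothetical helper, and the "failed" and "error" status values are assumptions, since only "queued" and "completed" appear in this reference.

```python
import time
from typing import Callable


def wait_for_job(fetch_status: Callable[[], dict],
                 poll_seconds: float = 5.0,
                 timeout_seconds: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll until the job reports status 'completed', then return that status dict."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "completed":
            return status  # expected to carry downloadUrl when ready
        if state in ("failed", "error"):  # ASSUMPTION: failure status names
            raise RuntimeError(f"subtitle job ended with status {state!r}")
        if clock() >= deadline:
            raise TimeoutError("subtitle job did not complete in time")
        sleep(poll_seconds)
```

In practice fetch_status would issue the GET request from step 2 and return the decoded JSON body.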
Realtime STT (WebSocket)
Stream microphone audio in real time and receive partial/final transcripts as JSON. Requires an STT provider key on the account.
The frontend includes a mic demo card on the home page. In the demo, choose Language A and Language B to enable two-way translation; leave both as “None” for transcription only.
You can also connect directly to the WebSocket endpoint shown in the example below.
Realtime config also accepts dictionary fields
transcriptionTerms/customDictionary and
translationTerms/translationDictionary.
They are merged into context.terms and
context.translation_terms.
You can also use terms or customTerms for
transcription dictionaries in realtime config payloads.
wss://app.speech7.com/stt/realtime?apiKey=YOUR_KEY
// First message must be JSON config (no api_key needed)
{
"audio_format": "auto",
"language_hints": ["sl"],
"enable_speaker_diarization": true,
"enable_language_identification": true,
"enable_endpoint_detection": true,
"transcriptionTerms": ["Speech7", "Soniox"],
"translationTerms": { "račun": "invoice" },
// Optional: turn on two-way translation by sending both languageA and languageB
"languageA": "en",
"languageB": "sl"
}
// Then send audio frames as binary (e.g., Opus WebM chunks or PCM)
// Send an empty frame ("") to finalize and close.
When translation is enabled, each message includes a
translation summary:
{
"languageA": { "code": "sr", "finalText": "Dobar dan. Kako smo?", "nonFinalText": "" },
"languageB": { "code": "de", "finalText": "Guten Tag. Wie geht es uns?", "nonFinalText": "" },
"type": "two_way_transcription"
}
Final tokens include is_final. A
finished: true message is sent before the server closes
the connection.
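The first WebSocket message can be assembled programmatically. The following sketch produces the JSON config string using the field names from the example above; build_realtime_config is a hypothetical helper, and raising an error when only one language is given is a client-side design choice, not documented server behaviour.

```python
import json
from typing import Iterable, Optional


def build_realtime_config(language_hints: Optional[Iterable[str]] = None,
                          language_a: Optional[str] = None,
                          language_b: Optional[str] = None,
                          diarization: bool = False,
                          terms: Optional[Iterable[str]] = None) -> str:
    """Return the JSON config to send as the first WebSocket message."""
    config = {"audio_format": "auto"}
    if language_hints:
        config["language_hints"] = list(language_hints)
    if diarization:
        config["enable_speaker_diarization"] = True
    if terms:
        config["transcriptionTerms"] = list(terms)
    if language_a and language_b:
        # Two-way translation is enabled only when both languages are sent.
        config["languageA"] = language_a
        config["languageB"] = language_b
    elif language_a or language_b:
        raise ValueError("two-way translation needs both language_a and language_b")
    return json.dumps(config, ensure_ascii=False)
```

The returned string is sent as a text frame before any binary audio frames; the API key travels in the connection URL, not in this payload.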
Quick browser test: open the home page, enter your API key, and use the "WS /stt/realtime" card to start the mic.
Programmatic account creation
If you have an API key flagged as admin (isAdmin=true), call:
POST /accounts
{
"apiKey": "admin_api_key",
"startingMinutes": 60,
"minutes": 60,
"isAdmin": true,
"ttsApiKey": "your-tts-provider-key",
"sttApiKey": "your-stt-provider-key",
"companyId": "cmp_demo_123",
"companyName": "Acme",
"contactName": "Jane Doe",
"contactPhone": "+1 555 555 5555",
"companyAddress": "123 Example St",
"companyVATId": "EU123456789",
"companyRegistrationNumber": "HRB 98765",
"dateAddedCompany": "2025-11-21",
"timeAddedCompany": "09:00:00"
}
Quick test:
curl -X POST https://app.speech7.com/accounts \
-H "Content-Type: application/json" \
-d '{"apiKey":"ADMIN_KEY","startingMinutes":30,"companyName":"Test Co","contactName":"Jane"}'