Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.
6.2.1. Speech-to-Text (STT)
🔧 Implementation Reference: Speech-to-Text
| Item | Value |
|---|---|
| Package | azure-cognitiveservices-speech |
| Classes | SpeechConfig, SpeechRecognizer |
| Endpoint | POST https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1 |
Testable Pattern:
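import azure.cognitiveservices.speech as speechsdk
# key and region are placeholders for your Speech resource key and region
speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
# Read input from a WAV file (the file name is a placeholder)
audio_config = speechsdk.audio.AudioConfig(filename="audio.wav")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
# recognize_once() returns after a single utterance has been processed
result = recognizer.recognize_once()
text = result.text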
Error Handling Pattern:
import logging
import azure.cognitiveservices.speech as speechsdk

def transcribe_with_error_handling(audio_file: str) -> str:
    """Speech-to-text with comprehensive error handling."""
    # key and region are assumed to be defined elsewhere (e.g., loaded from configuration)
    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=audio_file)
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    elif result.reason == speechsdk.ResultReason.NoMatch:
        # Audio was processed but no speech was recognized
        no_match_detail = result.no_match_details
        logging.warning(f"No speech recognized: {no_match_detail.reason}")
        return ""
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation = result.cancellation_details
        if cancellation.reason == speechsdk.CancellationReason.Error:
            logging.error(f"Speech error: {cancellation.error_code} - {cancellation.error_details}")
            if cancellation.error_code == speechsdk.CancellationErrorCode.ConnectionFailure:
                raise ConnectionError("Speech service connection failed")
            elif cancellation.error_code == speechsdk.CancellationErrorCode.AuthenticationFailure:
                raise PermissionError("Invalid speech subscription key or region")
    # Any other outcome (e.g., cancellation without an error) yields no text
    return ""
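Because the function above raises ConnectionError for connection failures and PermissionError for authentication failures, a caller can retry the former while letting the latter surface immediately. The following is a minimal sketch of that idea, not part of the Speech SDK: `call_with_retry`, its parameters, and the backoff values are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying only on ConnectionError with exponential backoff.

    Hypothetical helper for illustration; not part of the Speech SDK.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            # Out of retries: re-raise so the caller sees the failure.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, then 4x ... between attempts.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

A caller might wrap the transcription function as `call_with_retry(lambda: transcribe_with_error_handling("audio.wav"))`. A PermissionError is not caught by the helper, since retrying with an invalid key or region would not be expected to succeed.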
CLI Equivalent (REST):
curl -X POST "https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US" \
-H "Ocp-Apim-Subscription-Key: {key}" \
-H "Content-Type: audio/wav" \
--data-binary @audio.wav
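The same REST call can be sketched in Python using only the standard library. This is an illustrative sketch rather than an official client: `build_stt_request` and `extract_text` are hypothetical helper names, and the response parsing assumes the service's simple JSON format with `RecognitionStatus` and `DisplayText` fields.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_stt_request(region: str, key: str, audio: bytes,
                      language: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl command."""
    query = urlencode({"language": language})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        "/speech/recognition/conversation/cognitiveservices/v1"
        f"?{query}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "audio/wav",
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


def extract_text(response_body: bytes) -> str:
    """Return DisplayText when RecognitionStatus is Success, else an empty string.

    Assumes the simple JSON response format.
    """
    payload = json.loads(response_body)
    if payload.get("RecognitionStatus") == "Success":
        return payload.get("DisplayText", "")
    return ""


# Example usage (requires a real key, region, and WAV file):
# with open("audio.wav", "rb") as f:
#     request = build_stt_request("westus", "<your-key>", f.read())
# with urllib.request.urlopen(request) as response:
#     print(extract_text(response.read()))
```

Separating request construction from sending keeps the URL and headers easy to inspect in a unit test without calling the service.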
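⚠️ Exam Trap: Always check result.reason after a speech operation. NoMatch means the audio was processed but no speech was recognized. Canceled means recognition was stopped; when cancellation_details.reason is CancellationReason.Error, cancellation_details.error_code identifies the failure (for example, ConnectionFailure or AuthenticationFailure).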
Written by Alvin Varughese
Founder • 15 professional certifications