I'm using Google Cloud Speech-to-Text V2 streaming recognition with the chirp_3 model over gRPC.
I enabled interim results, but I only ever receive final results (isFinal=true). I never receive any interim results (isFinal=false).
My streaming config is essentially:
    StreamingRecognitionConfig streamingConfig =
        StreamingRecognitionConfig.newBuilder()
            .setConfig(
                RecognitionConfig.newBuilder()
                    .addLanguageCodes(languageCode)
                    .setModel("chirp_3")
                    .setAutoDecodingConfig(AutoDetectDecodingConfig.newBuilder().build())
                    .setFeatures(
                        RecognitionFeatures.newBuilder()
                            .setEnableAutomaticPunctuation(true)
                            .build())
                    .build())
            .setStreamingFeatures(
                StreamingRecognitionFeatures.newBuilder()
                    .setInterimResults(true)
                    .build())
            .build();
I tested with the following language codes and audio encodings:
- Language codes: cmn-Hans-CN, en-US
- Audio encodings: WEBM_OPUS, LINEAR16
This is the code that processes the returned results:
    private void consumeV2Responses(RuntimeSession runtimeSession) {
        // Iterate over every response on the V2 stream, then over every result in it.
        for (com.google.cloud.speech.v2.StreamingRecognizeResponse response : runtimeSession.getV2Stream()) {
            for (com.google.cloud.speech.v2.StreamingRecognitionResult result : response.getResultsList()) {
                // Results without alternatives are skipped before anything is logged.
                if (result.getAlternativesCount() == 0) {
                    continue;
                }
                com.google.cloud.speech.v2.SpeechRecognitionAlternative alternative = result.getAlternatives(0);
                // Log every remaining result, interim or final.
                logRecognizeResult(runtimeSession, result.getIsFinal(), alternative.getTranscript(),
                        resolveConfidence(alternative.getConfidence(), result.getIsFinal()));
                // Empty transcripts are logged above but not forwarded.
                if (!StringUtils.hasText(alternative.getTranscript())) {
                    continue;
                }
                sendTranscriptResult(runtimeSession, alternative.getTranscript(),
                        resolveConfidence(alternative.getConfidence(), result.getIsFinal()),
                        result.getIsFinal());
            }
        }
    }

    private void logRecognizeResult(RuntimeSession runtimeSession, boolean finalResult, String transcript, Float confidence) {
        log.info("[recognizeResult][sessionId({}) meetingId({}) provider({}) final({}) transcriptLength({}) confidence({}) transcript({})]",
                runtimeSession.getSessionId(), runtimeSession.getMeetingId(), runtimeSession.getProvider(),
                finalResult, transcript != null ? transcript.length() : 0, confidence, abbreviateTranscript(transcript));
    }
In all cases, my server logs only show final results. For example:
[recognizeResult][sessionId(...)] provider(v2_chirp3) final(true) transcript(Hello everyone, my name is Nick.)
I never see any final(false) / isFinal=false results.
I also verified that:
- I am definitely using V2 StreamingRecognize
- interimResults=true is actually being sent
- my server is not filtering partial results before logging them
- I tested both WEBM_OPUS and raw LINEAR16
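To double-check the filtering point independently of my logging, a tally along these lines could be used to count interim and final results per stream. This is a minimal sketch, not my production code: `ResultTally` and its method names are hypothetical and are not part of the Speech-to-Text client library. It only consumes the boolean returned by `result.getIsFinal()`.

```java
// Hypothetical helper (an assumption for illustration, not a client-library class):
// counts how many results arrived with isFinal=false versus isFinal=true.
final class ResultTally {
    private long interimCount;
    private long finalCount;

    // Call once per StreamingRecognitionResult, passing result.getIsFinal().
    synchronized void record(boolean isFinal) {
        if (isFinal) {
            finalCount++;
        } else {
            interimCount++;
        }
    }

    synchronized long interimCount() {
        return interimCount;
    }

    synchronized long finalCount() {
        return finalCount;
    }

    // True only when at least one result was seen and none of them was interim.
    synchronized boolean finalOnly() {
        return finalCount > 0 && interimCount == 0;
    }

    @Override
    public synchronized String toString() {
        return "interim=" + interimCount + " final=" + finalCount;
    }
}
```

The intended use would be to call `tally.record(result.getIsFinal())` as the first statement of the inner loop, before the `getAlternativesCount() == 0` check. My loop skips results with no alternatives before logging them, so counting first would also catch any interim result that arrives without alternatives. I have not confirmed whether the API ever sends such results.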
So my questions are:
- Does chirp_3 actually emit interim results in practice?
- Or is it expected that V2 chirp_3 may return final-only results even when interim_results=true?
If anyone has a confirmed example of V2 chirp_3 returning isFinal=false, that would be very helpful.