
# useSpeechRecognition

Hook for working with the Web Speech API's `SpeechRecognition` interface.

See: Web Speech API.

## Demo

> **INFO**
>
> This demo uses the `useSpeechRecognition` hook to build a speech-driven color changer: after clicking **Start**, say an HTML color keyword and the bordered div's background changes to that color.

```tsx
import { useCallback, useMemo, useRef, useState } from "react";
import { SpeechGrammarList, SpeechRecognitionErrorEvent, usePerformAction, useSpeechRecognition } from "../../../..";

// Grammar-list constructor: standard where available, webkit-prefixed otherwise.
const SpeechGrammarListP = ((window as any).SpeechGrammarList || (window as any).webkitSpeechGrammarList);

export const UseSpeechRecognition = () => {
	const colors = useRef(['aqua', 'azure', 'beige', 'bisque', 'black', 'blue', 'brown', 'chocolate', 'coral', 'crimson', 'cyan', 'fuchsia', 'ghostwhite', 'gold', 'goldenrod', 'gray', 'green', 'indigo', 'ivory', 'khaki', 'lavender', 'lime', 'linen', 'magenta', 'maroon', 'moccasin', 'navy', 'olive', 'orange', 'orchid', 'peru', 'pink', 'plum', 'purple', 'red', 'salmon', 'sienna', 'silver', 'snow', 'tan', 'teal', 'thistle', 'tomato', 'turquoise', 'violet', 'white', 'yellow', 'transparent']);
	// JSGF grammar accepting any of the color keywords above.
	const grammar = `#JSGF V1.0; grammar colors; public <color> = ${colors.current.join(' | ')} ;`;

	const btnRef = useRef<HTMLButtonElement>(null);
	const perform = usePerformAction(() => btnRef.current?.focus());

	const [message, setMessage] = useState("Ready");

	const [state, start] = useSpeechRecognition({
		onStart: useCallback(() => {
			setMessage("Listening...");
		}, []),
		onEnd: useCallback((ev, { stop }) => {
			stop();
			setMessage("Finished");
			perform();
		}, [perform]),
		onNoMatch: useCallback(() => {
			setMessage("Color not recognized.");
		}, []),
		onError: useCallback((ev: SpeechRecognitionErrorEvent) => {
			setMessage(`Error occurred in recognition: ${ev.message ? ev.message : ev.error}`);
		}, []),
	});

	const onStart = () => {
		const grammars = new SpeechGrammarListP() as SpeechGrammarList;
		grammars.addFromString(grammar, 1);
		start({
			lang: "en-US",
			continuous: false,
			interimResults: false,
			maxAlternatives: 1,
			grammars
		});
	};

	// Resolve the recognised transcript against the allow-list.
	const color = useMemo(() => {
		if (state.result.results) {
			const transcript = state.result.results[0][0].transcript;
			if (colors.current.includes(transcript)) {
				return transcript;
			}
		}
		return "transparent";
	}, [state.result.results]);

	return <div style={{ display: "flex", flexDirection: "column", justifyContent: "center", gap: 10 }}>
		{
			state.isSupported
				? <>
					<div style={{ display: 'flex', flexDirection: 'column' }}>
						<p>Click Start, then say a color to change the background of the bordered div. Try:</p>
						<div style={{ display: 'flex', flexWrap: "wrap", gap: 10 }}>
							{
								colors.current.map(el => <span key={el} style={{ color: el }}>{el}</span>)
							}
						</div>
					</div>
					<p>{message}</p>
					<div style={{ border: "1px solid lightgray", width: 300, height: 150, backgroundColor: color, margin: '0 auto' }}>
						{
							state.result.results && <p>Color is: {color}</p>
						}
					</div>
					<div style={{ display: 'flex', justifyContent: "center", gap: 10 }}>
						<button ref={btnRef} onClick={onStart} disabled={state.isListening}>Start</button>
					</div>
				</>
				: <p>Speech Recognition not supported</p>
		}
	</div>;
};
```

## Types

### SpeechRecognitionControls

Imperative controls for the speech recognition instance, injected as the second argument of every event callback in {@link UseSpeechRecognitionProps}. Using these controls inside a callback is always safe: they delegate to the latest internal implementation via stable refs, so no circular-dependency or stale-closure issues arise.

| Property | Type | Description |
| --- | --- | --- |
| `start` | `(config?: SpeechRecognitionConfig) => void` | Starts a new recognition session, optionally overriding the `defaultConfig` passed to the hook. |
| `stop` | `() => void` | Stops the current recognition session and attempts to return results for the audio captured so far. |
| `reset` | `(resultAlso?: boolean) => void` | Aborts the current recognition session. Pass `true` to also clear the last result from state. |
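Used inside a callback, the controls make it easy to end a session as soon as a usable result arrives. A minimal sketch; the `hasFinalResult` helper and its structural result type are illustrative, not part of the library:

```typescript
// Minimal structural stand-in for SpeechRecognitionResultList,
// so the helper can be exercised outside a browser.
interface ResultLike { isFinal: boolean }

// Pure helper: true once any result in the list is final,
// a reasonable point at which to call controls.stop().
function hasFinalResult(results: ArrayLike<ResultLike>): boolean {
	return Array.from(results).some(r => r.isFinal);
}

// Inside a component (sketch; the callback signature follows UseSpeechRecognitionProps):
// const [state, start] = useSpeechRecognition({
//   onResult: (ev, controls) => {
//     if (hasFinalResult(ev.results)) controls.stop();
//   },
// });

console.log(hasFinalResult([{ isFinal: false }, { isFinal: true }])); // true
```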

### UseSpeechRecognitionProps

Configuration object accepted by useSpeechRecognition.

All callback properties mirror the corresponding event handlers on the native SpeechRecognition interface.

| Property | Type | Description |
| --- | --- | --- |
| `alreadyStarted` | `boolean` | When `true`, recognition starts automatically on mount (equivalent to calling `start()` immediately). Useful for "always listening" scenarios. |
| `defaultConfig` | `SpeechRecognitionConfig` | Default configuration applied to the `SpeechRecognition` instance. See {@link SpeechRecognitionConfig} for the available options. |
| `onAudioStart` | `(this: SpeechRecognition, ev: Event, controls: SpeechRecognitionControls) => void` | Fired when the user agent has started capturing audio. |
| `onAudioEnd` | `(this: SpeechRecognition, ev: Event, controls: SpeechRecognitionControls) => void` | Fired when the user agent has finished capturing audio. |
| `onEnd` | `(this: SpeechRecognition, ev: Event, controls: SpeechRecognitionControls) => void` | Fired when the speech recognition service disconnects. |
| `onError` | `(this: SpeechRecognition, ev: SpeechRecognitionErrorEvent, controls: SpeechRecognitionControls) => void` | Fired when a speech recognition error occurs. |
| `onNoMatch` | `(this: SpeechRecognition, ev: SpeechRecognitionEvent, controls: SpeechRecognitionControls) => void` | Fired when no significant recognition was returned. |
| `onResult` | `(this: SpeechRecognition, ev: SpeechRecognitionEvent, controls: SpeechRecognitionControls) => void` | Fired when a word or phrase has been positively recognised. |
| `onSoundStart` | `(this: SpeechRecognition, ev: Event, controls: SpeechRecognitionControls) => void` | Fired when any sound (recognisable or not) has been detected. |
| `onSoundEnd` | `(this: SpeechRecognition, ev: Event, controls: SpeechRecognitionControls) => void` | Fired when any sound has stopped being detected. |
| `onSpeechStart` | `(this: SpeechRecognition, ev: Event, controls: SpeechRecognitionControls) => void` | Fired when speech recognised by the service has been detected. |
| `onSpeechEnd` | `(this: SpeechRecognition, ev: Event, controls: SpeechRecognitionControls) => void` | Fired when speech recognised by the service has stopped being detected. |
| `onStart` | `(this: SpeechRecognition, ev: Event, controls: SpeechRecognitionControls) => void` | Fired when the speech recognition service has begun listening. |

### SpeechRecognitionConfig

Initial configuration options for the SpeechRecognition instance.

| Property | Type | Description |
| --- | --- | --- |
| `grammars` | `SpeechGrammarList` | Collection of `SpeechGrammar` objects representing the grammars understood by the current `SpeechRecognition`. |
| `lang` | `LanguageBCP47Tags` | BCP 47 language tag for recognition (e.g. `"en-US"`, `"it-IT"`). Falls back to the `<html lang>` attribute or the user agent's language setting. |
| `continuous` | `boolean` | When `true`, continuous results are returned for each recognition phrase. When `false` (default), recognition stops after the first result. |
| `interimResults` | `boolean` | When `true`, interim (non-final) results are also returned. When `false`, only final results are delivered. |
| `maxAlternatives` | `number` | Maximum number of `SpeechRecognitionAlternative` objects per result. |
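For instance, a dictation-style setup would keep listening and stream interim transcripts; the specific values below are illustrative:

```typescript
// Hypothetical dictation-style configuration: keep listening, surface
// interim transcripts, and request up to three alternatives per phrase.
const dictationConfig = {
	lang: 'en-US',
	continuous: true,
	interimResults: true,
	maxAlternatives: 3,
} as const;

console.log(dictationConfig.maxAlternatives); // 3
```

Such an object can be passed either as `defaultConfig` to the hook or per-session to `start()`, which overrides the default.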

### SpeechRecognitionErrorCode

```ts
export type SpeechRecognitionErrorCode =
	| 'aborted'
	| 'audio-capture'
	| 'bad-grammar'
	| 'language-not-supported'
	| 'network'
	| 'no-speech'
	| 'not-allowed'
	| 'service-not-allowed'
```
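An `onError` callback typically switches on this code. A sketch mapping each code to a user-facing message; the wording is illustrative, and the union is repeated locally so the snippet is self-contained:

```typescript
type SpeechRecognitionErrorCode = 'aborted' | 'audio-capture' | 'bad-grammar' | 'language-not-supported' | 'network' | 'no-speech' | 'not-allowed' | 'service-not-allowed';

// Hypothetical helper: translate an error code into a message for the UI.
function describeError(code: SpeechRecognitionErrorCode): string {
	switch (code) {
		case 'no-speech': return 'No speech was detected. Try again.';
		case 'audio-capture': return 'No microphone was found.';
		case 'not-allowed':
		case 'service-not-allowed': return 'Microphone permission was denied.';
		case 'network': return 'A network error interrupted recognition.';
		case 'bad-grammar': return 'The supplied grammar could not be parsed.';
		case 'language-not-supported': return 'The requested language is not supported.';
		case 'aborted': return 'Recognition was aborted.';
		default: return 'An unknown recognition error occurred.';
	}
}

console.log(describeError('no-speech')); // No speech was detected. Try again.
```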

### SpeechRecognitionErrorEvent

| Property | Type | Description |
| --- | --- | --- |
| `error` | `SpeechRecognitionErrorCode` | Code identifying what went wrong. |
| `message` | `string` | Browser-supplied message with further details; may be empty. |

### SpeechRecognitionEvent

| Property | Type | Description |
| --- | --- | --- |
| `resultIndex` | `number` | Returns the lowest index value result in the `SpeechRecognitionResultList` "array" that has actually changed. |
| `results` | `SpeechRecognitionResultList` | Returns a `SpeechRecognitionResultList` object representing all the speech recognition results for the current session. |
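Reading the newest transcript combines the two fields: index into `results` at `resultIndex` and take the first alternative. A sketch with minimal structural types so it runs outside a browser; the helper itself is illustrative:

```typescript
// Structural stand-ins for SpeechRecognitionAlternative / SpeechRecognitionResultList.
interface AlternativeLike { transcript: string; confidence: number }
type ResultListLike = ArrayLike<ArrayLike<AlternativeLike>>;

// Hypothetical helper: top alternative of the result that just changed.
function bestTranscript(results: ResultListLike, resultIndex: number): string {
	return results[resultIndex][0].transcript;
}

console.log(bestTranscript([[{ transcript: 'navy', confidence: 0.93 }]], 0)); // navy
```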

### SpeechGrammar

The SpeechGrammar interface of the Web Speech API represents a set of words or patterns of words that we want the recognition service to recognize.

| Property | Type | Description |
| --- | --- | --- |
| `src` | `string` | Sets and returns a string containing the grammar from within the `SpeechGrammar` object instance. |
| `weight` | `number` | Sets and returns the weight of the `SpeechGrammar` object. |

### SpeechGrammarList

The SpeechGrammarList interface of the Web Speech API represents a list of SpeechGrammar objects containing words or patterns of words that we want the recognition service to recognize.

| Property | Type | Description |
| --- | --- | --- |
| `length` | `number` | Returns the number of `SpeechGrammar` objects contained in the `SpeechGrammarList`. |
| `item` | `(index: number) => SpeechGrammar` | Standard getter; allows individual `SpeechGrammar` objects to be retrieved from the `SpeechGrammarList` using array syntax. |
| `addFromURI` | `(src: string, weight?: number) => undefined` | Takes a grammar present at a specific URI and adds it to the `SpeechGrammarList` as a new `SpeechGrammar` object. |
| `addFromString` | `(src: string, weight?: number) => undefined` | Adds a grammar in a string to the `SpeechGrammarList` as a new `SpeechGrammar` object. |
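The demo above feeds `addFromString` a JSGF string. A small builder for flat keyword grammars; the helper is illustrative, but the JSGF shape matches the demo:

```typescript
// Hypothetical helper: build a JSGF grammar string for a flat keyword list,
// suitable for SpeechGrammarList.addFromString.
function jsgfKeywords(name: string, rule: string, words: string[]): string {
	return `#JSGF V1.0; grammar ${name}; public <${rule}> = ${words.join(' | ')} ;`;
}

console.log(jsgfKeywords('colors', 'color', ['red', 'green']));
// #JSGF V1.0; grammar colors; public <color> = red | green ;
```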

### SpeechRecognitionState

Reactive state snapshot returned by useSpeechRecognition.

| Property | Type | Description |
| --- | --- | --- |
| `isSupported` | `boolean` | `true` when the Web Speech API (`SpeechRecognition` or `webkitSpeechRecognition`) is available in the current browser. |
| `isListening` | `boolean` | `true` while the speech recognition service is actively listening for speech. Becomes `false` after `stop()` is called or recognition ends automatically. |
| `result` | `{ results: SpeechRecognitionEvent["results"] \| null; resultIndex: SpeechRecognitionEvent["resultIndex"] \| null; }` | The most recent recognition result: `results` is a `SpeechRecognitionResultList` containing all result alternatives; `resultIndex` is the index of the most recent result in the list. Both are `null` before the first result arrives. |
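Because `result.results` is `null` until the first result arrives, deriving UI state from it needs a guard, as in the demo's `useMemo`. A pure sketch of that pattern; the helper and its structural types are illustrative:

```typescript
// Structural stand-ins so the helper runs outside a browser.
interface AltLike { transcript: string }
interface ResultState { results: ArrayLike<ArrayLike<AltLike>> | null }

// Hypothetical helper: resolve the recognised word against an allow-list,
// falling back to a default while no (or no matching) result exists.
function resolveColor(result: ResultState, allowed: string[], fallback = 'transparent'): string {
	if (!result.results) return fallback;
	const word = result.results[0][0].transcript;
	return allowed.includes(word) ? word : fallback;
}

console.log(resolveColor({ results: null }, ['navy'])); // transparent
console.log(resolveColor({ results: [[{ transcript: 'navy' }]] }, ['navy'])); // navy
```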

### SpeechRecognition

The SpeechRecognition Web API interface.

| Property | Type | Description |
| --- | --- | --- |
| `grammars` | `SpeechGrammarList` | Grammar list used by this recognition instance. |
| `lang` | `LanguageBCP47Tags` | BCP 47 language tag. Falls back to the document's `lang` attribute or the user agent's default language. |
| `continuous` | `boolean` | When `true`, recognition continues returning results until `stop()` is called. |
| `interimResults` | `boolean` | When `true`, interim (non-final) results are delivered via `onresult`. |
| `maxAlternatives` | `number` | Maximum number of recognition alternatives per result. |
| `onaudiostart` | `((this: SpeechRecognition, ev: Event) => void) \| null` | Called when audio capture starts. |
| `onaudioend` | `((this: SpeechRecognition, ev: Event) => void) \| null` | Called when audio capture ends. |
| `onend` | `((this: SpeechRecognition, ev: Event) => void) \| null` | Called when the service disconnects. |
| `onerror` | `((this: SpeechRecognition, ev: SpeechRecognitionErrorEvent) => void) \| null` | Called when a recognition error occurs. |
| `onnomatch` | `((this: SpeechRecognition, ev: SpeechRecognitionEvent) => void) \| null` | Called when no significant recognition was found. |
| `onresult` | `((this: SpeechRecognition, ev: SpeechRecognitionEvent) => void) \| null` | Called when a recognition result is returned. |
| `onsoundstart` | `((this: SpeechRecognition, ev: Event) => void) \| null` | Called when any detectable sound starts. |
| `onsoundend` | `((this: SpeechRecognition, ev: Event) => void) \| null` | Called when detectable sound stops. |
| `onspeechstart` | `((this: SpeechRecognition, ev: Event) => void) \| null` | Called when recognised speech starts. |
| `onspeechend` | `((this: SpeechRecognition, ev: Event) => void) \| null` | Called when recognised speech ends. |
| `onstart` | `((this: SpeechRecognition, ev: Event) => void) \| null` | Called when the recognition service starts listening. |
| `start` | `() => void` | Starts listening for speech. |
| `stop` | `() => void` | Stops listening; returns results for speech recognised so far. |
| `abort` | `() => void` | Stops listening without returning results. |
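The hook's `isSupported` flag reflects the usual feature-detection step, since some browsers still expose only the `webkit`-prefixed constructor. A sketch of that check; the helper is illustrative:

```typescript
// Hypothetical detection helper: prefer the unprefixed constructor,
// fall back to webkitSpeechRecognition, otherwise report unsupported.
function getSpeechRecognitionCtor(scope: Record<string, unknown>): unknown {
	return scope['SpeechRecognition'] ?? scope['webkitSpeechRecognition'] ?? null;
}

// In a browser you would pass `window`; an empty scope means unsupported.
console.log(getSpeechRecognitionCtor({}) === null); // true
```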

Released under the MIT License.