The way that we interact with computers has changed dramatically over the past decade. Touch-screen devices and laptop trackpads have enabled a much more intuitive form of interaction than is achievable using a traditional mouse. These changes haven’t been limited to just hardware. Gestures, predictive text, and speech recognition are all examples of software innovations that have improved the way in which we interact with our devices.
Speech recognition has somewhat eluded innovators for decades. Many organizations have tried (with varying levels of success) to create reliable speech recognition technologies. There is one company, however, that looks to have cracked the problem: Google.
In this post you’ll learn how to take advantage of Google’s speech recognition technologies to enhance your web forms: how to give Chrome users the ability to fill in text fields using speech, and how to detect support for this voice input capability in browsers.
Let’s get started.
How to Add Voice Input in HTML
Enabling support for speech input is as simple as adding an attribute to your <input> elements. The x-webkit-speech attribute will indicate to the browser that the user should be given the option to complete this form field using speech input.
<input type="text" x-webkit-speech>
When speech input is enabled the element will have a small microphone icon displayed on the right of the input. Clicking on this icon will launch a small tooltip to show that your voice is now being recorded. You can also start speech input by focussing the element and pressing Ctrl + Shift + . on Windows, or Command + Shift + . on Mac.
In JavaScript, you can test whether an element has speech input enabled by examining its webkitSpeech property. This is a boolean property, so it will be set to either true or false. You can also set this property yourself to enable or disable speech input on an element.
// Enable
element.webkitSpeech = true;
// Disable
element.webkitSpeech = false;
A Caveat About Input Types
Speech input is not available for all the different HTML5 input types. In my testing I found that the text, number, and tel types do support speech input whereas the email, url, date, and month input types don’t.
If you apply the x-webkit-speech attribute to an <input> element with an unsupported input type, the webkitSpeech property on that element will still be set to true. You therefore cannot rely on this property to tell if the browser is displaying the speech input controls, only that the browser supports speech input in general.
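One way to work around this caveat is to combine the property with the list of input types that worked in the testing described above. The following is only a sketch: the helper name speechControlLikelyShown and the SPEECH_FRIENDLY_TYPES list are made up for this example and are not part of any browser API, and the list reflects nothing more than the types mentioned in this post.

```javascript
// Input types reported in this post as working with speech input.
// Assumption: this list comes from manual testing, not from a browser API.
var SPEECH_FRIENDLY_TYPES = ['text', 'number', 'tel'];

// Hypothetical helper: guesses whether the microphone control is shown.
// `input` is an <input> element (or any object with `type` and
// `webkitSpeech` properties).
function speechControlLikelyShown(input) {
  var type = String(input.type).toLowerCase();
  var typeIsFriendly = SPEECH_FRIENDLY_TYPES.indexOf(type) !== -1;

  // webkitSpeech alone is not enough, because it can be true even for
  // input types that never display the speech controls.
  return input.webkitSpeech === true && typeIsFriendly;
}
```

Treat the result as a best guess rather than a guarantee, since the set of supported types could differ between Chrome versions.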
Detecting Browser Support for Speech Recognition
A simple way of checking if the user’s browser supports speech input is to look for the webkitSpeech
property on an <input>
element. An example of how to do this is shown below.
if (document.createElement('input').webkitSpeech === undefined) {
  // Not supported
} else {
  // Supported!
}
Google Chrome is the only browser that currently supports speech input. We’ll examine the reasons for this in the next section.
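The same check can be used for progressive enhancement, so that speech input is only switched on where the browser understands it. Here is a minimal sketch of that idea. The function name enableSpeechWhereSupported and its optional doc parameter are assumptions made for this example (the parameter exists only so the function can be exercised outside a browser).

```javascript
// Hypothetical helper: enables speech input on each element in `inputs`,
// but only when the browser exposes the webkitSpeech property.
// Returns the number of elements that were changed.
function enableSpeechWhereSupported(inputs, doc) {
  doc = doc || document; // fall back to the page's own document
  var supported = doc.createElement('input').webkitSpeech !== undefined;
  var enabled = 0;
  var i;

  if (!supported) {
    return enabled; // leave the fields as ordinary inputs
  }

  for (i = 0; i < inputs.length; i++) {
    inputs[i].webkitSpeech = true;
    enabled++;
  }
  return enabled;
}
```

In a page you might call it as enableSpeechWhereSupported(document.getElementsByTagName('input')). Browsers without support simply keep the plain text fields.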
How Speech Recognition Works
The browser relies on an external service to handle speech-to-text conversion. The recording of your voice is sent to this service which then analyses the audio and constructs a textual representation. The text is then sent back to the browser which populates the <input> element to complete the process. Many speech-to-text services incorporate machine-learning algorithms that allow them to get more accurate over time.
Note: A side effect of using an external service to handle speech-to-text is that you will need an internet connection for speech input to work. This is something to keep in mind if you plan for your web application to work offline.
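If your application needs to behave sensibly offline, one option is to switch speech input off whenever the browser reports that it has no connection. The sketch below uses the standard navigator.onLine property. The helper name syncSpeechWithConnection and its optional nav parameter are assumptions made for this example.

```javascript
// Hypothetical helper: enables speech input on `input` only while the
// browser reports that it is online. Returns the value that was applied.
function syncSpeechWithConnection(input, nav) {
  nav = nav || navigator; // fall back to the browser's navigator object

  // navigator.onLine is only a hint. A value of true does not guarantee
  // that the speech service is reachable, but false means the request
  // could not be sent at all.
  input.webkitSpeech = nav.onLine !== false;
  return input.webkitSpeech;
}
```

You could call this once on page load and again from listeners for the window's online and offline events, so the microphone control follows the connection state.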
The Chrome browser relies on Google’s proprietary speech recognition technology to provide the functionality behind x-webkit-speech. Google has had a team working on speech recognition and natural language processing for a long time. It’s this team that’s been responsible for developing the complex systems needed to provide a reliable speech-to-text service for products like Google Translate and Voice Search.
Note: If you’re interested in learning more about how speech-to-text works check out the research papers published by Google engineers.
Developing speech-to-text services is incredibly difficult and requires a significant amount of investment. This is probably the main reason why no other browser vendor has implemented speech recognition yet. However, now that Apple has acquired Siri, I’m interested to see if speech recognition will make its way into Safari some time soon.
Summary
In this post you’ve learned about the x-webkit-speech attribute and how it can be used to add speech input capabilities to your web forms. There is also a more advanced Web Speech API that we haven’t covered in this post. This API allows developers to add speech recognition functionality to more aspects of their applications, and even synthesize speech from text.
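To give a rough feel for the Web Speech API, here is a small sketch that listens for a single phrase and writes it into a form field. It relies on the SpeechRecognition interface (exposed as webkitSpeechRecognition in Chrome). The helper name dictateInto, the optional win parameter, and the 'en-US' language setting are assumptions made for this example, not recommendations.

```javascript
// Hypothetical helper: listens for one phrase and writes it into `input`.
// Returns the recognition object, or null when the API is unavailable.
function dictateInto(input, win) {
  win = win || window; // fall back to the browser's window object
  var Recognition = win.SpeechRecognition || win.webkitSpeechRecognition;
  var recognition;

  if (!Recognition) {
    return null; // speech recognition is not available in this browser
  }

  recognition = new Recognition();
  recognition.lang = 'en-US';         // assumed language for this example
  recognition.interimResults = false; // only deliver final results
  recognition.onresult = function (event) {
    // The first alternative of the first result holds the recognised text.
    input.value = event.results[0][0].transcript;
  };
  recognition.start();
  return recognition;
}
```

Expect the browser to ask the user for microphone permission when recognition starts, and consider adding an onerror handler before relying on this in a real form.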
Whether it’s in the computer on your desk, or the phone in your pocket, software innovations like Google Voice Search and Siri are paving the way for a revolution in how we interact with computers. Welcome to the future, my friends. Now if only someone could figure out the whole teleportation thing.
What’s up colleagues, good paragraph and nice urging commented here, I am genuinely enjoying these.
Is there any way to send audio files as input to the Google Translate website from a Java program for translation?
What a data of un-ambiguity and preserveness of precious know-how regarding unpredicted feelings.
Is this demo still working? I tried it in Chrome v55 on Windows, and also in Chrome on Android. It doesn’t work.
x-webkit-speech is now deprecated.
None of the older webkit speech demos I’ve found works in Chrome any more.
Hi Hagge, I just tried the same as above and Chrome is not working anymore. I’m using Chrome version 46.2 on Mac. Did you find anything?
Thanks Matt for the article. I would like to create my next app around speech recognition technology.
Firefox…
I have a project that I want to implement this on. Thanks Matt!
Speech input is certainly a big leap forward! The article is certainly replete with details on this concept! Thanks Matt!!
The demo doesn’t work on my Nexus 5 though 🙁 Not sure why.
I’m not sure if `x-webkit-speech` is supported in Chrome for Android yet. Unfortunately I don’t have a Nexus 5 to test :/
I know that this doesn’t work on Chrome for iOS at the moment though.
Many mobile operating systems include their own speech input technologies that are more part of the keyboard than the browser. Perhaps this is why Google haven’t seen the need to implement this feature on mobile.
Speech is going to be so important in the future. Using it on form inputs, as in your example, can really open up and improve the mobile experience.
I got really excited a few months back when I learned about the JavaScript Web Speech API. Ian Devlin has already shown real-world examples of controlling video through speech. Can you imagine how accessible the web is going to become when users start to navigate around just by talking? Exciting times ahead.
There certainly are. One step closer to a real world J.A.R.V.I.S 🙂
Thanks Matt! I didn’t know this was available yet. I wonder how long before the other browsers catch up?
I’m not sure. I think it’s mainly down to when other browser vendors can develop/license the speech recognition technology. Safari seems like the next contender for this feature after Apple acquired Siri. That said, Microsoft has done a lot of work on speech recognition too.
I guess we’ll just have to wait and see 🙂
Amazing!
I’ll start to use it in my projects!
I loved your website. I came across it by chance and I liked it quite a lot.