Accepting Voice Input in HTML5 Forms

Matt West

11 years ago

The way that we interact with computers has changed dramatically over the past decade. Touch-screen devices and laptop trackpads have enabled a much more intuitive form of interaction than is achievable using a traditional mouse. These changes haven’t been limited to just hardware. Gestures, predictive text, and speech recognition are all examples of software innovations that have improved the way in which we interact with our devices.

Speech recognition has eluded innovators for decades. Many organizations have tried, with varying levels of success, to create reliable speech recognition technologies. There is one company, however, that looks to have cracked the problem: Google.

In this post you’ll learn how to take advantage of Google’s speech recognition technologies to enhance your web forms. You’ll learn how to give Chrome users the ability to fill in text fields using speech, and how to detect support for this new voice input capability in browsers.

Let’s get started.

Contents

1 How to Add Voice Input in HTML
- 1.1 A Caveat About Input Types
2 Detecting Browser Support for Speech Recognition
3 How Speech Recognition Works
4 Summary
5 Further Reading & Links

How to Add Voice Input in HTML

Enabling support for speech input is as simple as adding an attribute to your <input> elements. The x-webkit-speech attribute will indicate to the browser that the user should be given the option to complete this form field using speech input.

<input type="text" x-webkit-speech>

When speech input is enabled the element will have a small microphone icon displayed on the right of the input. Clicking on this icon will launch a small tooltip to show that your voice is now being recorded. You can also start speech input by focussing the element and pressing Ctrl + Shift + . on Windows, or Command + Shift + . on Mac.

Speech Input in Chrome

In JavaScript, you can test whether an element has speech input enabled by examining its webkitSpeech property. This is a boolean property and will therefore be set to true or false. You can also set this property yourself to enable or disable speech input on an element.

// Enable
element.webkitSpeech = true;

// Disable
element.webkitSpeech = false;
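
If you want several fields to gain or lose speech input together, you can toggle the property in a loop. The snippet below is a minimal sketch rather than a definitive implementation: setSpeechInput is a hypothetical helper name (it isn't part of any browser API), and it assumes you hand it an array-like collection of <input> elements.

```javascript
// Hypothetical helper: turn speech input on or off for a collection of inputs.
// Returns the number of elements whose webkitSpeech property was changed.
function setSpeechInput(inputs, enabled) {
    var changed = 0;
    for (var i = 0; i < inputs.length; i++) {
        var input = inputs[i];
        // Skip elements that don't expose the property at all
        if (input.webkitSpeech === undefined) {
            continue;
        }
        if (input.webkitSpeech !== enabled) {
            input.webkitSpeech = enabled;
            changed++;
        }
    }
    return changed;
}

// Usage in a page (only runs where a DOM is available)
if (typeof document !== 'undefined') {
    setSpeechInput(document.querySelectorAll('input[type="text"]'), true);
}
```

Skipping elements where the property is undefined keeps the helper from adding a meaningless property in browsers that don't support speech input.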

A Caveat About Input Types

Speech input is not available for all the different HTML5 input types. In my testing I found that the text, number, and tel types do support speech input whereas the email, url, date, and month input types don’t.

If you apply the x-webkit-speech attribute to an <input> element with an unsupported input type, the webkitSpeech property on that element will still be set to true. You therefore cannot rely on this property to tell if the browser is displaying the speech input controls, only that the browser supports speech input in general.
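
One way to work around this is to combine the webkitSpeech property with a check on the input type. The sketch below assumes the three types that worked in my testing (text, number, and tel); that list is an observation, not an official one, so treat it as something to adjust. The names SPEECH_FRIENDLY_TYPES and speechControlsLikelyShown are made up for this example.

```javascript
// Input types that accepted speech in the testing described above.
// This is an assumption based on observation, not an official list.
var SPEECH_FRIENDLY_TYPES = ['text', 'number', 'tel'];

// Hypothetical helper: guess whether the microphone icon is actually shown.
function speechControlsLikelyShown(input) {
    // The property alone only tells us speech input was requested
    // in a browser that knows about it.
    if (input.webkitSpeech !== true) {
        return false;
    }
    // Combine it with the input type to rule out unsupported types.
    return SPEECH_FRIENDLY_TYPES.indexOf(input.type) !== -1;
}

// Usage in a page (only runs where a DOM is available)
if (typeof document !== 'undefined') {
    var field = document.createElement('input');
    field.type = 'email';
    field.webkitSpeech = true;
    console.log(speechControlsLikelyShown(field));
}
```

This is still only a best guess. The browser doesn't expose whether the icon is really being rendered, so the result is only as good as the list of types.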

Detecting Browser Support for Speech Recognition

A simple way of checking if the user’s browser supports speech input is to look for the webkitSpeech property on an <input> element. An example of how to do this is shown below.

if (document.createElement('input').webkitSpeech === undefined) {
    // Not supported
} else {
    // Supported!
}

Google Chrome is the only browser that currently supports speech input. We’ll examine the reasons for this in the next section.
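
If you need this check in more than one place, it can be wrapped in a small function. This is just a sketch: supportsSpeechInput is a hypothetical name, and the optional doc parameter is my own addition so that the function can be exercised without a real page.

```javascript
// Hypothetical wrapper around the webkitSpeech check. `doc` defaults to the
// page's document but can be swapped out, for example in tests.
function supportsSpeechInput(doc) {
    doc = doc || document;
    return doc.createElement('input').webkitSpeech !== undefined;
}

// Usage in a page (only runs where a DOM is available)
if (typeof document !== 'undefined') {
    if (supportsSpeechInput()) {
        // Supported: for example, show a hint next to speech-enabled fields
        console.log('Speech input is available.');
    } else {
        // Not supported: the fields still work as ordinary text inputs
        console.log('Speech input is not available.');
    }
}
```

Because an unsupported browser simply ignores the x-webkit-speech attribute, the fallback branch usually doesn't need to do anything beyond leaving the field as it is.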

How Speech Recognition Works

Speech-to-Text with a Web Service

The browser relies on an external service to handle speech-to-text conversion. The recording of your voice is sent to this service, which then analyses the audio and constructs a textual representation. The text is then sent back to the browser, which populates the <input> element to complete the process. Many speech-to-text services incorporate machine-learning algorithms that allow them to get more accurate over time.

Note: A side effect of using an external service to handle speech-to-text is that you will need an internet connection for speech input to work. This is something to keep in mind if you plan for your web application to work offline.
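
If your application does need to work offline, one option is to switch speech input off whenever the browser reports that it has no connection. The sketch below is an assumption about how you might do that, not something Chrome does for you. It uses the standard navigator.onLine property and the online and offline events; updateSpeechForConnectivity is a hypothetical helper name.

```javascript
// Hypothetical helper: enable speech input only while the browser reports
// that it is online. `nav` is a navigator-like object.
function updateSpeechForConnectivity(input, nav) {
    // Treat anything other than an explicit `false` as "possibly online"
    var online = nav.onLine !== false;
    input.webkitSpeech = online;
    return online;
}

// Usage in a page (only runs where a DOM is available)
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
    var field = document.querySelector('input[x-webkit-speech]');
    if (field) {
        var refresh = function () {
            updateSpeechForConnectivity(field, navigator);
        };
        window.addEventListener('online', refresh);
        window.addEventListener('offline', refresh);
        refresh();
    }
}
```

Keep in mind that navigator.onLine only tells you whether the browser thinks it has a network connection. It can't confirm that the speech service itself is reachable.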

The Chrome browser relies on Google’s proprietary speech recognition technology to provide the functionality behind x-webkit-speech. Google has had a team working on speech recognition and natural language processing for a long time. It’s this team that’s been responsible for developing the complex systems needed to provide a reliable speech-to-text service for products like Google Translate and Voice Search.

Note: If you’re interested in learning more about how speech-to-text works, check out the research papers published by Google engineers.

Developing speech-to-text services is incredibly difficult and requires a significant amount of investment. This is probably the main reason why no other browser vendor has implemented speech recognition yet. However, now that Apple has acquired Siri, I’m interested to see if speech recognition will make its way into Safari some time soon.

Summary

In this post you’ve learned about the x-webkit-speech attribute and how it can be used to add speech input capabilities to your web forms. There is also a more advanced Web Speech API that we haven’t covered in this post. This API allows developers to add speech recognition functionality to more aspects of their applications, and even synthesize speech from text.

Whether it’s in the computer on your desk or the phone in your pocket, software innovations like Google Voice Search and Siri are paving the way for a revolution in how we interact with computers. Welcome to the future, my friends. Now if only someone could figure out the whole teleportation thing.
