Comcast has built a dedicated voice team within its digital home engineering organization as the company seeks out ways to accelerate its use of voice commands across multiple service domains, including its X1 TV service, Xfinity Home offering and beyond.
Update: The lead and headline have been adjusted to clarify that the new dedicated team referenced is within Comcast’s digital home engineering organization and is focused on extending voice commands to home functionality, rather than being a dedicated team for Comcast’s broader voice-related activities.
"The voice team works closely with the Xfinity Home apps team, ensuring that the new features and tools are always considered within the context of voice control,” Comcast’s Bryan Kissinger explained in a blog post that was co-authored by Shiv Dhondiyal.
The new voice team built Comcast’s Voice Action Processing Service (VAPS), which processes commands that come by way of the company’s home-grown Natural Language Processing agent and determines which other services and APIs to call. One example is launching an app for viewing a camera feed on the Xfinity Home service.
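The article doesn't describe VAPS internals, but the role it describes — taking an already-parsed intent and deciding which downstream service API to call — can be sketched as a simple dispatch table. The intent names and handlers below are hypothetical, not Comcast's actual API.

```python
# Minimal sketch of a VAPS-style dispatcher: the NLP agent has already
# parsed the utterance into an intent; this layer picks the downstream
# service to call. Intent names and handlers are invented for illustration.

def launch_camera_app(slots):
    return f"launching camera view: {slots.get('camera', 'front door')}"

def arm_security(slots):
    return "security system armed"

INTENT_HANDLERS = {
    "home.camera.view": launch_camera_app,
    "home.security.arm": arm_security,
}

def dispatch(intent, slots):
    handler = INTENT_HANDLERS.get(intent)
    if handler is None:
        raise ValueError(f"no service registered for intent {intent!r}")
    return handler(slots)

print(dispatch("home.camera.view", {"camera": "backyard"}))
# -> launching camera view: backyard
```

A table like this keeps the routing layer decoupled from the services themselves: adding a new voice-controlled domain means registering a new handler, not rewriting the dispatcher.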
More details about the technology underpinnings of Comcast’s voice commands platform come as the MSO expands those capabilities to additional domains such as home automation and security.
That work started with a “guard word” for Xfinity Home commands. For example, uttering “Xfinity Home, disarm” disarms the home security system; the guard word ensures that the voice control system doesn’t instead search for a movie or song called “disarm.”
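The guard-word mechanism amounts to a prefix check before routing: commands that begin with the guard phrase go to the home-automation domain, and everything else falls through to the normal media search path. A minimal sketch, with invented domain labels:

```python
# Illustrative guard-word routing. Utterances prefixed with
# "xfinity home" are treated as home-automation commands; all other
# utterances fall through to the media search path.

GUARD_WORD = "xfinity home"

def route(utterance):
    text = utterance.strip().lower()
    if text.startswith(GUARD_WORD):
        command = text[len(GUARD_WORD):].lstrip(" ,")
        return ("home", command)
    return ("search", text)

print(route("Xfinity Home, disarm"))  # routed to the home domain
print(route("Disarm"))                # searched as a movie/song title
```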
“Adding a guard word opened a lot of possibilities, but we still didn’t have a way to support more complex functions,” Kissinger explained.
Working with Comcast’s Applied Artificial Intelligence team, the company is now developing a way to support more natural commands that work across multiple service domains. Saying, “Xfinity Home, I’m hot,” for example, will trigger the system to adjust the home’s connected thermostat by a couple of degrees.
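The "I'm hot" example boils down to mapping a conversational phrase to a small setpoint change rather than a literal command. A hedged sketch of that mapping, with made-up phrases and deltas:

```python
# Hypothetical mapping of conversational comfort phrases to thermostat
# setpoint changes; the "couple of degrees" delta is an assumption
# based on the article's example, not a documented Comcast behavior.

COMFORT_PHRASES = {
    "i'm hot": -2,   # lower the setpoint a couple of degrees
    "i'm cold": +2,  # raise the setpoint a couple of degrees
}

def adjust_setpoint(current_temp_f, phrase):
    delta = COMFORT_PHRASES.get(phrase.strip().lower())
    if delta is None:
        return current_temp_f  # unrecognized phrase: leave unchanged
    return current_temp_f + delta

print(adjust_setpoint(72, "I'm hot"))  # -> 70
```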
Comcast’s expansion of connected home voice capabilities included adding direct commands for lights and thermostats using Xfinity Home’s ZigBee radio.
In a recent interview, Comcast execs discussed how the company is meeting the scale challenge using an in-house deep learning platform that matches up with an integrated metadata system.
Not only does the system need to know and understand an increasingly broader scope of specific and conversational-style commands, it also needs to grasp the intent of the voice command. Is it a search for a TV program or movie, or is the user telling the smart home system to turn off a light or adjust the thermostat?
Early on, Comcast used a more traditional, pattern-based algorithm that relied on manual tuning. It later realized that machine learning would be required to maintain a high level of accuracy as the scope and complexity of the system continued to expand.
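To see why hand-tuned patterns become untenable, consider a toy version of the pattern-based approach: every phrasing must be anticipated by hand, so each new way of saying the same thing demands yet another rule. This is illustrative only, not Comcast's actual implementation.

```python
import re

# Toy version of a hand-tuned, pattern-based intent classifier.
# Each new phrasing requires a new pattern, which is the scaling
# problem that motivates a learned model instead.

PATTERNS = [
    (re.compile(r"\b(watch|tune to|put on)\b", re.I), "tv.tune"),
    (re.compile(r"\b(turn off|switch off)\b.*light", re.I), "home.light.off"),
]

def classify(utterance):
    for pattern, intent in PATTERNS:
        if pattern.search(utterance):
            return intent
    return "unknown"

print(classify("put on NBC"))          # -> tv.tune
print(classify("turn off the light"))  # -> home.light.off
print(classify("make it cooler"))      # -> unknown; needs another pattern
```

A learned classifier, by contrast, can generalize from labeled examples to phrasings no one wrote a rule for, which is the "army of people" problem Palmatier describes below.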
“We saw that the machine could learn how to accurately find intent in a better way than it did prior to that,” Jeanine Heck, executive director of AI product at Comcast Cable, said, adding that Comcast’s dependence on deep learning to process natural language has only become more pronounced over time. “Our machines can learn how to adapt to those [new domains].”
Added Jonathan Palmatier, Comcast Cable’s vice president of product management, voice control: “You’d need an army of people that are trying to capture every possible way that you can construct a phrase, and [that’s where] it starts to get untenable.”
In addition to expanding voice support to multiple Xfinity services, Comcast has also been working on how voice commands can support customer support functions.
Comcast's Voice Platform Leans on Distributed, Cloud-Based Platform
Comcast launched the X1 voice remote in May 2015, and the numbers suggest that customers have embraced it.
Per Comcast’s figures for year-end 2017, the operator has deployed nearly 20 million voice remotes so far, and customers uttered more than 6 billion voice commands in 2017, roughing out to about 500 million per month.
Comcast backs its voice system with a distributed architecture (via a handful of data centers around the country) that aims to deliver low latency (the time it takes for a command to be uttered and executed) while also giving the system a level of redundancy.
If there’s an issue at one of those data centers, another can be used as a backup and balance the load. That holds true for other aspects of X1, as every button press is a “cloud call.”
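The failover behavior described here can be sketched as a preference-ordered health check: try the nearest data center first and fall back to the next healthy one. Region names and the health-check shape are assumptions for illustration.

```python
# Sketch of data-center failover: pick the first healthy region in
# preference order. Region names are invented; a real system would
# also weigh latency and current load.

REGIONS = ["us-east", "us-central", "us-west"]

def pick_region(healthy):
    """Return the first healthy region in preference order."""
    for region in REGIONS:
        if healthy.get(region, False):
            return region
    raise RuntimeError("no healthy region available")

# us-east is down, so traffic shifts to us-central.
print(pick_region({"us-east": False, "us-central": True, "us-west": True}))
```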
The lifecycle of a voice command starts with the remote, which is equipped with a microphone button for the push-to-talk mechanism. Once a command is uttered, it is streamed from the remote to the X1 set-top box using the RF4CE protocol.
It’s then packaged up and sent to a distributed server in Comcast’s cloud infrastructure. That cloud-based system then translates the audio file into text and sends it to Comcast’s natural language processing system, which determines what the user was intending with the original voice command.
Once that is broken down, the system selects the most probable result, executes it, and displays it on the TV screen.
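The lifecycle above can be sketched as a simple pipeline: audio in, transcription, intent parsing, execution, and a result for the screen. The stage implementations below are stubs; only the flow mirrors the article's description.

```python
# End-to-end sketch of the command lifecycle described above:
# remote -> set-top (RF4CE) -> cloud speech-to-text -> NLP -> execute.
# All stage bodies are stand-in stubs, not Comcast's actual services.

def speech_to_text(audio_bytes):
    return "turn off the light"  # stub for the cloud transcription step

def parse_intent(text):
    # stub NLP step: the real system weighs multiple interpretations
    return {"intent": "home.light.off", "confidence": 0.97}

def handle_voice_command(audio_bytes):
    text = speech_to_text(audio_bytes)
    intent = parse_intent(text)          # most probable interpretation
    result = f"executed {intent['intent']}"
    return result                        # rendered on the TV screen

print(handle_voice_command(b"...raw audio..."))
# -> executed home.light.off
```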