Voice recognition on the Xbox 360 has been done through Kinect's built-in microphones, and uses the system's audio processing to cancel out noise from games and applications. (Credit: Microsoft)
Most of these systems revolve around finding out what users are saying, then feeding that back into the cloud, though in some cases the commands are simple enough that they don't need to phone home. For instance, saying something like "play (song name)" or "call mom" can be processed locally, but anything that falls outside that short list of commands is sent to Microsoft for an answer.
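That split between local and cloud handling can be sketched in a few lines of Python. This is only an illustration of the idea, not Microsoft's implementation: the `LOCAL_PATTERNS` table, the `route` function, and the `send_to_cloud` stub are all hypothetical names, and no real service is contacted.

```python
import re

# Hypothetical short list of commands simple enough to handle on the device.
LOCAL_PATTERNS = {
    "play": re.compile(r"^play (?P<target>.+)$", re.IGNORECASE),
    "call": re.compile(r"^call (?P<target>.+)$", re.IGNORECASE),
}


def send_to_cloud(utterance):
    # Stand-in for a network request; it just reports where the request went.
    return {"handled_by": "cloud", "utterance": utterance}


def route(utterance):
    """Handle known commands locally and hand everything else to the cloud."""
    text = utterance.strip()
    for name, pattern in LOCAL_PATTERNS.items():
        match = pattern.match(text)
        if match:
            return {
                "handled_by": "local",
                "command": name,
                "target": match.group("target"),
            }
    return send_to_cloud(text)
```

Under these assumptions, `route("call mom")` stays on the device, while a request such as `route("find a quiet place for dinner")` falls through to the cloud stub.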
The idea behind CU is to take all this one big step further by hooking into buckets of data--be it third-party sites or private data feeds--to add context to user queries and figure out what the user is trying to do. To that end, it's not all just about search.
"[For] the application of conversational understanding, certainly search is one, but it's much, much broader," said Ilya Bukshteyn, Microsoft's senior director of marketing for TellMe, the voice company Microsoft bought in 2007, and later folded into its speech group. "Understanding intent on search is going to be key to actually helping you complete your task instead of just finding data," he said.
Bukshteyn detailed a system where Microsoft will be able to take something like helping plan dinner for two people, and break it down into a query that uses data from various places such as calendars, restaurant ratings, and location.
"All of that data is actually available in different places," Bukshteyn said. "So having an engine and a service that can look in all those places--looking around your calendar, your past history, places you have in common that you may have been to, and then can assist you by giving you a few places to choose from, and then finalize that reservation we think is going to be of tremendous value."
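As a rough sketch of the kind of lookup Bukshteyn describes, the Python below combines two calendars, restaurant ratings, and distance to produce a few dinner suggestions. Everything here is an assumption made for illustration: the `Restaurant` fields, the evening window of 18:00 to 21:00, the distance cutoff, and the ranking rule are not from Microsoft.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    name: str
    rating: float          # e.g. an average review score
    distance_km: float     # distance from the user's location
    open_slots: frozenset  # hours (24h clock) with a free table


def free_hours(calendar_a, calendar_b, evening=range(18, 22)):
    """Return the evening hours when neither person is busy."""
    busy = set(calendar_a) | set(calendar_b)
    return [hour for hour in evening if hour not in busy]


def suggest(restaurants, calendar_a, calendar_b, max_km=5.0, limit=3):
    """Pick nearby restaurants with a table at an hour both people are free."""
    hours = free_hours(calendar_a, calendar_b)
    candidates = []
    for restaurant in restaurants:
        if restaurant.distance_km > max_km:
            continue
        shared = [h for h in hours if h in restaurant.open_slots]
        if shared:
            candidates.append((restaurant, shared[0]))
    # Highest rating first; the closer restaurant wins a tie.
    candidates.sort(key=lambda pair: (-pair[0].rating, pair[0].distance_km))
    return [(r.name, hour) for r, hour in candidates[:limit]]
```

The final step in the quote, actually making the reservation, is left out; the sketch stops at offering "a few places to choose from."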
The secret, of course, is getting that process started by telling your phone you simply want to go out to dinner that evening. "This is effectively where Microsoft's speech tools are headed," Serafin said.
Echoing comments made last month by Yusuf Mehdi, Microsoft's senior vice president of Online Audience Business, about the company's goal of getting Bing to consolidate multistep tasks into one action, Serafin outlined a system that would make the number of apps users have installed on their phones, as well as the need to use them all, less critical.
"This area where you're actually able to complete tasks that may have taken you multiple keystrokes, may have taken you multiple apps...In this world of understanding, you actually get into an environment where you can assist the user in what they'd like to get done," he said.
As for when all this is coming, Serafin wouldn't say. "There's implementation that we're building on this basis, and you'll see more forthcoming on it," he said. "What we're highlighting is the strategy behind it, and how it actually makes use of what we've built up until this point."