
VoiceXML Forum Chairman Ken Rehor explains how speech-enabled technology has radically transformed the customer experience in financial institutions.
Speech-enabled technology has radically transformed the customer experience in financial institutions across the globe. FST was privileged to catch up with VoiceXML Forum Chairman Ken Rehor, a pioneer in the development of web-based telephony and one of the original authors of the VoiceXML specification, to find out more about the technology behind automated voice systems and what’s next in this field for FSIs.
Developed initially by the VoiceXML Forum but submitted to, and since developed by, the W3C, VoiceXML is the standard mark-up language for building voice-user interface applications. Rehor likens it to the way that HTML is a standard for building visual web applications, explaining: “VoiceXML enables a high level of voice applications to be built using standard web technology, meaning that users such as those in the financial services arena can build a phone interface to their applications using the same back-end infrastructure as they would do to build their online systems.”
As an open standards technology, VoiceXML overcomes some of the difficulties experienced in the past, when telephone applications were built using each vendor’s own proprietary technology. “There was no vendor interoperability and so the situation was very complex,” says Rehor. “As each vendor’s technology was characterized by all manner of obscure things, only those with specialized knowledge and training would know how to use them.”
Rehor therefore recommends that when specifying a VoiceXML solution decision-makers ensure that the platforms have been certified by the VoiceXML Forum. “VoiceXML is the industry standard for these kinds of applications and every major player has adopted it. As an open standards technology it is well understood and supports an ecosystem of platform vendors, application developers and pre-packaged application providers. The buyer knows they are not getting tied into a particular vendor’s proprietary technology, which gives them tremendous opportunity to find the very best application for their product or solution.”
VoiceXML itself was a technology based on years of research by various companies, including AT&T, Motorola, Lucent and IBM. The VoiceXML Forum was established in 1999 to come up with a standard language, which was called VoiceXML. “This language was developed as an industry proposal, published in 2000 and submitted to the W3C, which continues to develop both VoiceXML and related technology,” explains Rehor, adding that the Forum itself has since taken on a more educational role. “We carry out educational and marketing activities for the member companies, explaining how the technology can be used in different markets. We also have a couple of certification programs, one as part of an education program to help developers get certified in using VoiceXML. As part of our technical working group, we also have a platform certification program, which tests vendors’ VoiceXML platforms to ensure that they meet technical specifications.”
Speech-enabled technologies are currently being employed in voice-driven, self-service and customer service applications across a wide range of industries, not least financial services. The principal application for financial institutions is banking by phone, whereby customers can obtain an account balance, arrange transfers of money, get account information, etc., through an automated or semi-automated system. And, Rehor adds, the technology has also made an impact in online stock trading: “In fact, one of the first phone-based speech applications to be used in the financial services industry was Charles Schwab, which used it to enable automated trade transactions and orders. It has potential in any area where there is a desire to speed up the user’s experience and their interaction with the organization.”
He also points out that this doesn’t necessarily mean cutting out personal contact altogether. “What we’re seeing is that in some cases applications are becoming fully automated, while in others organizations simply want to collect information from the caller so they can route them to the most appropriate agent,” says Rehor. “For example, if a customer has a question about their checking account, once the system has taken their account details it can put them directly through to the right agent, avoiding potential multiple transfers within the telephone system.” According to Rehor, adoption is accelerating in both areas as companies aim to reduce costs and improve service.
VoiceXML and speech technology have been deployed extensively in the US and worldwide in the financial services industry for some time now, and it seems that many of the barriers to deployment that may once have existed have successfully been overcome. Rehor explains: “It is more complicated to build a speech application than it is to build a touchtone application but, over time, the technology has improved and any reluctance on the side of the customer to dealing with an automated system has subsided as they have found them easy to use and very effective.” He adds that a major barrier in the past to adoption by FSIs was the fear of being locked in to a vendor’s proprietary technologies, something that VoiceXML has successfully eliminated.
When it comes to VoiceXML itself, says Rehor, it’s really just a question of having the appropriate infrastructure. “I can’t imagine any financial institution not using web technologies as part of their core infrastructure today. The technology is there, it’s proven and, without meaning to make it sound overly simple, it’s really just a question of getting the core infrastructure in place. Once open standards have been adopted, there really are no barriers to adopting VoiceXML.”
Of course, while the technology is well established and having a huge impact on industries worldwide, there is always room for improvement. Speaking with Rehor, it seems that much is going on to improve both the quality and reliability of the technology. “When it comes to quality, one of the challenges we face is getting the technology to work well with a variety of different accents and languages. However, this has improved greatly and there are now systems available in multiple languages.” What’s more of a problem now, he explains, is tackling often unrealistic customer expectations. “It’s important to understand that these systems are not going to provide the futuristic full dialogue that some people may have expected based on the science fiction movies. While they work very well when asking for a particular piece of information (numbers, airport names, etc.), they are not so effective if the user speaks more randomly.” This again is something that is becoming less of a problem as more and more people get accustomed to using these systems and find them to work very well.
Security is another area that is receiving focus as a potentially important growth area for VoiceXML. “Something we are addressing right now at the VoiceXML Forum and that we expect to include in the next version of VoiceXML (version 3) is speaker biometrics capability – speaker verification and identification.” The idea behind this is that, once enrolled, the user can be identified by their voice rather than by simply entering a four- or five-digit PIN, which could potentially be accessed by a third party. “At the VoiceXML Forum, we see huge potential for use of biometrics, particularly in the financial services industry. With the threat of ID theft, people want reassurance that their information is safe.”
In addition to biometrics, there are a number of other exciting developments in the pipeline, first among them being the development of a ‘sister language’ to VoiceXML, called CCXML (Call Control eXtensible Markup Language). As Rehor explains, this allows customers to build more advanced routing applications and is particularly applicable to contact center applications. “CCXML has become the standard technology through which contact centers can collect information from the caller, route the call to the appropriate agent, then perhaps add on a supervisor or move to a different call center or back to an automated system. It is an important standard that complements VoiceXML.”
Another area in which Rehor predicts growing demand in the financial services arena is video, coinciding with the growing deployment of VoIP phones and wireless 3G. Rehor explains the possible advantage of what the Forum is informally calling ‘Video VoiceXML’: “With video, it would be possible for the customer to get a response to their request in the form of a video, perhaps including graphs and complex information that would be harder to grasp by audio alone.”
Related to this is the possibility for systems to become fully multimodal, combining manual, voice and video functions to provide the optimal customer experience. “With the growth of VoIP and increasing use of screen-based and wireless applications, I see multimodal applications as very important growth opportunities in the future,” agrees Rehor. “While VoiceXML technology is very well deployed all over the world, is stable and has a long history, it also flows well into these ‘futuristic’ functions such as multimodal.” With customer experience and self-service being such a core focus for financial institutions today, it is clear that VoiceXML and related technology represent an ever greater opportunity for banks to differentiate and compete in the future.
Ken Rehor
Chairman of the VoiceXML Forum and Chief Architect at Vocalocity, Ken Rehor was named one of the industry’s 20 most influential people by Speech Technology Magazine. Rehor was a principal founder of the VoiceXML Forum and one of the original authors of the VoiceXML specification. He currently serves as Chair of the VoiceXML Forum’s Conformance Committee, and is co-editor of the VoiceXML 2.0 standard, VoiceXML 2.1, and CCXML 1.0. Previously, Rehor was a member of the Bell Labs Research team at Lucent where he co-developed the first web-based telephony platform, PhoneWeb.