Abstract:
Enabling grammars in web page frames includes: receiving, in a multimodal application on a multimodal device, a frameset document, where the frameset document includes markup defining web page frames; obtaining, by the multimodal application, content documents for display in each of the web page frames, where the content documents include navigable markup elements; generating, by the multimodal application, for each navigable markup element in each content document, a segment of markup defining a speech recognition grammar, including inserting in each such grammar markup identifying content to be displayed when words in the grammar are matched and markup identifying a frame where the content is to be displayed; and enabling, by the multimodal application, all the generated grammars for speech recognition.
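The generation step can be illustrated with a minimal sketch. It assumes that the navigable markup elements are HTML anchors, that a grammar can be represented as a plain dictionary, and that the target frame defaults to the frame holding the link when the anchor has no `target` attribute. The names `LinkCollector`, `grammars_for_frame`, and `enable_all` are hypothetical and do not come from the abstract.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (link text, href, target) triples from anchor elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._attrs = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if attrs.get("href"):
                self._attrs = attrs
                self._text = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._attrs is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._attrs["href"], self._attrs.get("target")))
            self._attrs = None


def grammars_for_frame(frame_name, content_html):
    """Build one grammar record per navigable link in a content document."""
    collector = LinkCollector()
    collector.feed(content_html)
    collector.close()
    return [
        {
            "words": text.lower(),            # words to be matched
            "href": href,                     # content to display on a match
            "target": target or frame_name,   # frame where it is displayed (assumed default)
        }
        for text, href, target in collector.links
        if text
    ]


def enable_all(frames):
    """frames maps frame name to content HTML; returns every generated grammar."""
    enabled = []
    for name, html in frames.items():
        enabled.extend(grammars_for_frame(name, html))
    return enabled
```

Calling `enable_all` with one entry per frame yields a flat list of grammar records, which mirrors enabling all generated grammars at once rather than only those of the frame that currently has focus.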
Abstract:
A computer-implemented method and system are provided for filling a graphic-based form field in response to a speech utterance. The computer-implemented method includes generating a grammar corresponding to the form field, the grammar being based on a user profile and comprising a semantic interpretation string. The method further includes creating an auto-fill event based upon the grammar and responsive to the speech utterance, the auto-fill event causing the filling of the form field with data corresponding to the user profile. The system includes a grammar-generating module for generating the grammar and an event module for creating the auto-fill event.
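A rough sketch of the two modules follows. It assumes the user profile is a dictionary, the recognized utterance arrives as text, and the semantic interpretation string takes the form `profile.<key>`. The phrase templates and the function names `generate_grammar` and `auto_fill_event` are hypothetical.

```python
def generate_grammar(field_name, profile):
    """Return a grammar for one form field, or None if the profile has no value for it."""
    if field_name not in profile:
        return None
    return {
        # Phrases a user might say (illustrative templates, not from the abstract).
        "phrases": [f"my {field_name}", f"fill in my {field_name}"],
        # Semantic interpretation string naming the profile entry to use.
        "interpretation": f"profile.{field_name}",
    }


def auto_fill_event(utterance, grammar, profile):
    """Create an auto-fill event when the utterance matches the grammar."""
    if utterance.strip().lower() not in grammar["phrases"]:
        return None
    key = grammar["interpretation"].split(".", 1)[1]
    return {"type": "auto-fill", "field": key, "value": profile[key]}


def apply_event(form, event):
    """Fill the form field named by the event with the profile data."""
    if event is not None:
        form[event["field"]] = event["value"]
    return form
```

Keeping the interpretation string separate from the spoken phrases means the same event handler can serve every field: the handler only needs the key carried by the string, not the wording of the utterance.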
Abstract:
Speech-enabled content navigation and control of a distributed multimodal browser is disclosed. The browser provides an execution environment for a multimodal application and includes a graphical user agent (‘GUA’) operating on a multimodal device and a voice user agent (‘VUA’) operating on a voice server. Navigation and control includes: transmitting, by the GUA, a link message to the VUA, the link message specifying voice commands that control the browser and an event corresponding to each voice command; receiving, by the GUA, a voice utterance from a user, the voice utterance specifying a particular voice command; transmitting, by the GUA, the voice utterance to the VUA for speech recognition by the VUA; receiving, by the GUA, an event message from the VUA, the event message specifying a particular event corresponding to the particular voice command; and controlling, by the GUA, the browser in dependence upon the particular event.
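The message flow between the two agents can be sketched in a single process. This is a simplification under stated assumptions: the network transport is replaced by direct method calls, speech recognition is replaced by a text lookup, and the commands "next page" and "previous page" with events "forward" and "back" are invented for illustration. Both class names are hypothetical.

```python
class VoiceUserAgent:
    """Stands in for the VUA on the voice server."""

    def __init__(self):
        self.commands = {}

    def receive_link_message(self, link_message):
        # The link message maps each voice command to its event name.
        self.commands = {cmd.lower(): event for cmd, event in link_message.items()}

    def recognize(self, utterance):
        # Placeholder for speech recognition: the utterance is already text here.
        event = self.commands.get(utterance.strip().lower())
        return {"event": event} if event else None


class GraphicalUserAgent:
    """Stands in for the GUA on the multimodal device."""

    def __init__(self, vua, pages):
        self.vua = vua
        self.pages = pages
        self.index = 0
        # Step 1: transmit the link message to the VUA.
        vua.receive_link_message({"next page": "forward", "previous page": "back"})

    def on_utterance(self, utterance):
        # Steps 2 to 4: forward the utterance and receive an event message.
        message = self.vua.recognize(utterance)
        # Step 5: control the browser according to the event.
        if message is not None:
            if message["event"] == "forward":
                self.index = min(self.index + 1, len(self.pages) - 1)
            elif message["event"] == "back":
                self.index = max(self.index - 1, 0)
        return self.pages[self.index]
```

The GUA never interprets speech itself; it only reacts to event names it declared in the link message, which is what lets recognition live on a separate server.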
Abstract:
Ordering recognition results produced by an automatic speech recognition (‘ASR’) engine for a multimodal application implemented with a grammar of the multimodal application in the ASR engine, with the multimodal application operating in a multimodal browser on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application operatively coupled to the ASR engine through a VoiceXML interpreter, includes: receiving, in the VoiceXML interpreter from the multimodal application, a voice utterance; determining, by the VoiceXML interpreter using the ASR engine, a plurality of recognition results in dependence upon the voice utterance and the grammar; determining, by the VoiceXML interpreter according to semantic interpretation scripts of the grammar, a weight for each recognition result; and sorting, by the VoiceXML interpreter, the plurality of recognition results in dependence upon the weight for each recognition result.
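The weighting and sorting steps can be sketched as follows. The sketch assumes each recognition result is a dictionary that records which grammar rule it matched, and that a semantic interpretation script can be modeled as a Python callable returning a numeric weight. The field names `rule` and `confidence` and the function `order_results` are hypothetical.

```python
def order_results(results, weight_scripts):
    """Sort recognition results by the weight their grammar rule's script assigns.

    results: list of dicts, each with at least a "rule" key.
    weight_scripts: dict mapping rule name to a callable(result) -> float.
    Results whose rule has no script get weight 0.0 (an assumption of this sketch).
    """

    def weight(result):
        script = weight_scripts.get(result["rule"])
        return script(result) if script else 0.0

    # Highest weight first; sorted() is stable, so ties keep recognizer order.
    return sorted(results, key=weight, reverse=True)
```

Because the weight comes from the grammar's own scripts rather than from the recognizer's confidence alone, an application can prefer results that are meaningful to it even when the engine scored a different hypothesis higher.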
Abstract:
Methods, apparatus, and products are disclosed for altering behavior of a multimodal application based on location. The multimodal application operates on a multimodal device supporting multiple modes of user interaction with the multimodal application, including a voice mode and one or more non-voice modes. The voice mode of user interaction with the multimodal application is supported by a voice interpreter. Altering behavior of a multimodal application based on location includes: receiving a location change notification in the voice interpreter from a device location manager, the device location manager operatively coupled to a position detection component of the multimodal device, the location change notification specifying a current location of the multimodal device; updating, by the voice interpreter, location-based environment parameters for the voice interpreter in dependence upon the current location of the multimodal device; and interpreting, by the voice interpreter, the multimodal application in dependence upon the location-based environment parameters.
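A minimal sketch of the three steps follows, assuming the location change notification carries a region code and that the location-based environment parameters are a small dictionary such as a prompt language. The class name, the `region` key, and the parameter names are hypothetical.

```python
class VoiceInterpreter:
    """Toy interpreter whose behavior depends on location-based parameters."""

    def __init__(self, region_settings, default):
        self.region_settings = region_settings   # region code -> parameter overrides
        self.default = default                   # parameters used when no override applies
        self.env = dict(default)

    def on_location_change(self, notification):
        # Called by the device location manager with the current location.
        region = notification["region"]
        self.env = {**self.default, **self.region_settings.get(region, {})}

    def interpret_prompt(self, prompts):
        # Pick the prompt text for the current language, falling back to the default.
        language = self.env["language"]
        return prompts.get(language, prompts[self.default["language"]])
```

For example, an interpreter built with `{"FR": {"language": "fr"}}` as region settings and `{"language": "en"}` as the default would switch a greeting from the English text to the French text after receiving `{"region": "FR"}`, and revert to English for any region without an override.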
Abstract:
Methods, apparatus, and computer program products are described for invoking tapered prompts in a multimodal application that runs in a multimodal browser on a multimodal device supporting multiple modes of user interaction with the multimodal application, including a voice mode and one or more non-voice modes. Embodiments include identifying, by the multimodal browser, a prompt element in the multimodal application; identifying, by the multimodal browser, one or more attributes associated with the prompt element; and playing a speech prompt according to the one or more attributes associated with the prompt element.
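One way to picture attribute-driven tapering is the VoiceXML-style `count` attribute, where the prompt with the highest count not exceeding the current visit count is chosen. The abstract does not name the attribute, so treating it as `count` is an assumption, as is the helper name `select_prompt`. The sketch returns a single prompt text instead of playing audio.

```python
import xml.etree.ElementTree as ET


def select_prompt(field_xml, visit_count):
    """Return the text of the prompt whose count is the highest value <= visit_count."""
    field = ET.fromstring(field_xml)
    candidates = []
    for prompt in field.iter("prompt"):
        # A prompt without a count attribute is treated as count 1.
        count = int(prompt.get("count", "1"))
        if count <= visit_count:
            candidates.append((count, (prompt.text or "").strip()))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

With prompts at counts 1, 2, and 4, the third visit still uses the count 2 prompt, so the wording shortens in steps as the user becomes familiar with the dialog.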
Abstract:
A method for prompting user input for a multimodal interface includes the steps of: providing a multimodal interface to a user, where the interface includes a visual interface having a plurality of input regions, each having at least one input field; selecting an input region; processing a multi-token speech input provided by the user, where the processed speech input includes at least one value for at least one input field of the selected input region; and storing the at least one value in the at least one input field.
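The processing and storing steps can be sketched with regular expressions standing in for the speech grammar. The sketch assumes the speech input is already transcribed to text and that each field of a region is introduced by a keyword ("from", "to", "on"). The `flight` region, its fields, and the function `fill_region` are hypothetical examples.

```python
import re

# Each input region maps its fields to a pattern that extracts one value.
# The \b anchors keep "on" from matching inside a word such as "boston".
REGIONS = {
    "flight": {
        "origin": r"\bfrom (?P<origin>\w+)",
        "destination": r"\bto (?P<destination>\w+)",
        "day": r"\bon (?P<day>\w+)",
    }
}


def fill_region(form, region, utterance):
    """Store every value found in a multi-token utterance in the selected region's fields."""
    text = utterance.lower()
    for field, pattern in REGIONS[region].items():
        match = re.search(pattern, text)
        if match:
            form[region][field] = match.group(field)
    return form
```

A single utterance can fill several fields of the selected region at once, and a shorter utterance fills only the fields it mentions, leaving the others untouched.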