Improving speech capabilities of a multimodal application
    1.
    Granted patent
    Status: In force

    Publication No.: US08380513B2

    Publication date: 2013-02-19

    Application No.: US12468166

    Filing date: 2009-05-19

    IPC classification: G10L11/00

    Abstract: Improving speech capabilities of a multimodal application including receiving, by the multimodal browser, a media file having a metadata container; retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser; determining whether the speech artifact includes a grammar rule or a pronunciation rule; if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule.
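
As a rough illustration of the dispatch step this abstract describes, the sketch below routes a speech artifact to either a grammar or a lexicon. The `SpeechArtifact` and `SpeechEngine` classes, their field names, and the sample rules are hypothetical stand-ins chosen for the example; the patent does not define them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SpeechArtifact:
    """Hypothetical artifact read from a media file's metadata container."""
    grammar_rule: Optional[str] = None
    pronunciation_rule: Optional[Tuple[str, str]] = None  # (word, phonemes)


@dataclass
class SpeechEngine:
    """Hypothetical engine state: a list of grammar rules plus a lexicon."""
    grammar: List[str] = field(default_factory=list)
    lexicon: Dict[str, str] = field(default_factory=dict)


def apply_artifact(engine: SpeechEngine, artifact: SpeechArtifact) -> None:
    # A grammar rule is added to the engine's grammar.
    if artifact.grammar_rule is not None:
        engine.grammar.append(artifact.grammar_rule)
    # A pronunciation rule is added to the engine's lexicon.
    if artifact.pronunciation_rule is not None:
        word, phonemes = artifact.pronunciation_rule
        engine.lexicon[word] = phonemes


engine = SpeechEngine()
apply_artifact(engine, SpeechArtifact(grammar_rule="$artist = Sade;"))
apply_artifact(engine, SpeechArtifact(pronunciation_rule=("Sade", "SH AA D EY")))
```

After both calls the engine holds one grammar rule and one lexicon entry, so an artist name carried in a media file's metadata could be both recognized and pronounced.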

    Records Disambiguation In A Multimodal Application Operating On A Multimodal Device
    2.
    Patent application
    Status: In force

    Publication No.: US20090271199A1

    Publication date: 2009-10-29

    Application No.: US12109167

    Filing date: 2008-04-24

    IPC classification: G10L15/00 G10L11/00

    Abstract: Methods, apparatus, and products are disclosed for record disambiguation in a multimodal application operating on a multimodal device, the multimodal device supporting multiple modes of interaction including at least a voice mode and a visual mode, that include: prompting, by the multimodal application, a user to identify a particular record among a plurality of records; receiving, by the multimodal application in response to the prompt, a voice utterance from the user; determining, by the multimodal application, that the voice utterance ambiguously identifies more than one of the plurality of records; generating, by the multimodal application, a user interaction to disambiguate the records ambiguously identified by the voice utterance in dependence upon record attributes of the records ambiguously identified by the voice utterance; and selecting, by the multimodal application for further processing, one of the records ambiguously identified by the voice utterance in dependence upon the user interaction.
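
One way to picture the disambiguation flow in this abstract is the minimal sketch below. It assumes records are plain dictionaries, that an utterance matches by name, and that the follow-up question is built from the first attribute whose values differ across the matches; all function names and sample data are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def match_records(records: List[Record], utterance: str) -> List[Record]:
    """Return every record whose name contains the spoken text."""
    spoken = utterance.lower()
    return [r for r in records if spoken in r["name"].lower()]


def distinguishing_attribute(matches: List[Record]) -> str:
    """Pick the first non-name attribute whose values differ for every match."""
    for attr in matches[0]:
        values = {r[attr] for r in matches}
        if attr != "name" and len(values) == len(matches):
            return attr
    raise ValueError("no single attribute separates the matches")


def disambiguate(matches: List[Record], attr: str, answer: str) -> Record:
    """Select the record whose distinguishing attribute equals the answer."""
    return next(r for r in matches if r[attr].lower() == answer.lower())


records = [
    {"name": "John Smith", "city": "Austin", "dept": "Sales"},
    {"name": "John Smith", "city": "Boston", "dept": "Sales"},
    {"name": "Mary Jones", "city": "Austin", "dept": "Legal"},
]
matches = match_records(records, "john smith")  # two records match
attr = distinguishing_attribute(matches)         # attribute to ask about
chosen = disambiguate(matches, attr, "Boston")   # user's follow-up answer
```

Here the department is the same for both matches, so the follow-up interaction would ask about the city instead.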

    Dynamically Publishing Directory Information For A Plurality Of Interactive Voice Response Systems
    3.
    Patent application
    Status: In force

    Publication No.: US20090268883A1

    Publication date: 2009-10-29

    Application No.: US12109214

    Filing date: 2008-04-24

    IPC classification: H04M1/64

    CPC classification: H04M3/493

    Abstract: Methods, apparatus, and products are disclosed for dynamically publishing directory information for a plurality of interactive voice response (‘IVR’) systems through an IVR directory service that include: providing a description of a web services publication interface for the IVR directory service; receiving, on behalf of one or more IVR systems, web services publication requests through the publication interface; determining, in response to the web services publication requests, directory information for each IVR system requesting publication; adding the directory information for each IVR system to an IVR system directory; generating a voice mode user interface to reflect the directory information for each IVR system added to the IVR system directory; and interacting, using the voice mode user interface, with a caller to identify a particular IVR system in dependence upon the IVR system directory and query information provided by the caller and to connect the caller with the identified IVR system.
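
The publish-then-look-up cycle in this abstract can be sketched with an in-memory directory. The `IVRDirectory` class, its method names, and the sample entries are assumptions made for illustration; a real service would receive publication requests over a web services interface rather than through direct method calls.

```python
from typing import Dict, List, Optional


class IVRDirectory:
    """Hypothetical in-memory directory of IVR systems."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, object]] = {}

    def publish(self, name: str, phone: str, keywords: List[str]) -> None:
        # Stands in for handling one publication request.
        self._entries[name] = {
            "phone": phone,
            "keywords": [k.lower() for k in keywords],
        }

    def menu_prompt(self) -> str:
        # The voice prompt is rebuilt from the current directory contents.
        return "Say one of: " + ", ".join(sorted(self._entries))

    def lookup(self, query: str) -> Optional[str]:
        # Return the number of the first system matching the caller's query.
        q = query.lower()
        for name in sorted(self._entries):
            entry = self._entries[name]
            if q in name.lower() or q in entry["keywords"]:
                return str(entry["phone"])
        return None


directory = IVRDirectory()
directory.publish("Acme Bank", "555-0100", ["banking", "loans"])
directory.publish("City Pharmacy", "555-0199", ["prescriptions"])
prompt = directory.menu_prompt()
number = directory.lookup("loans")
```

Because the prompt is generated from the directory on demand, each newly published system appears in the voice menu without any change to the menu code.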

    Records disambiguation in a multimodal application operating on a multimodal device
    4.
    Granted patent
    Status: In force

    Publication No.: US09349367B2

    Publication date: 2016-05-24

    Application No.: US12109167

    Filing date: 2008-04-24

    Abstract: Methods, apparatus, and products are disclosed for record disambiguation in a multimodal application operating on a multimodal device, the multimodal device supporting multiple modes of interaction including at least a voice mode and a visual mode, that include: prompting, by the multimodal application, a user to identify a particular record among a plurality of records; receiving, by the multimodal application in response to the prompt, a voice utterance from the user; determining, by the multimodal application, that the voice utterance ambiguously identifies more than one of the plurality of records; generating, by the multimodal application, a user interaction to disambiguate the records ambiguously identified by the voice utterance in dependence upon record attributes of the records ambiguously identified by the voice utterance; and selecting, by the multimodal application for further processing, one of the records ambiguously identified by the voice utterance in dependence upon the user interaction.

    TESTING A GRAMMAR USED IN SPEECH RECOGNITION FOR RELIABILITY IN A PLURALITY OF OPERATING ENVIRONMENTS HAVING DIFFERENT BACKGROUND NOISE
    5.
    Patent application
    Status: In force

    Publication No.: US20120053934A1

    Publication date: 2012-03-01

    Application No.: US13289233

    Filing date: 2011-11-04

    IPC classification: G10L15/20

    CPC classification: G10L15/01

    Abstract: Methods, systems, and products for testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise that include: receiving recorded background noise for each of the plurality of operating environments; generating a test speech utterance for recognition by a speech recognition engine using a grammar; mixing the test speech utterance with each recorded background noise, resulting in a plurality of mixed test speech utterances, each mixed test speech utterance having different background noise; performing, for each of the mixed test speech utterances, speech recognition using the grammar and the mixed test speech utterance, resulting in speech recognition results for each of the mixed test speech utterances; and evaluating, for each recorded background noise, speech recognition reliability of the grammar in dependence upon the speech recognition results for the mixed test speech utterance having that recorded background noise.
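
The mix-then-recognize loop in this abstract can be sketched in a few lines. The example assumes signals are lists of float samples and replaces the speech recognition engine with a toy correlation-based recognizer (`make_toy_recognizer`), which is purely hypothetical and stands in for an engine loaded with the grammar under test.

```python
import math
from typing import Callable, Dict, List

Signal = List[float]


def mix(speech: Signal, noise: Signal) -> Signal:
    """Add noise to speech sample by sample, looping the noise if it is shorter."""
    return [s + noise[i % len(noise)] for i, s in enumerate(speech)]


def make_toy_recognizer(template: Signal, label: str,
                        threshold: float = 0.8) -> Callable[[Signal], str]:
    """Hypothetical recognizer: returns `label` when the input correlates
    strongly enough with a clean template, otherwise an empty string."""
    def recognize(signal: Signal) -> str:
        dot = sum(a * b for a, b in zip(signal, template))
        norm = math.sqrt(sum(a * a for a in signal) * sum(b * b for b in template))
        return label if norm > 0 and dot / norm >= threshold else ""
    return recognize


def evaluate(speech: Signal, expected: str, noises: Dict[str, Signal],
             recognize: Callable[[Signal], str]) -> Dict[str, bool]:
    """For each environment, report whether the mixed utterance was recognized."""
    return {env: recognize(mix(speech, noise)) == expected
            for env, noise in noises.items()}


# A synthetic "utterance": five cycles of a sine wave over 100 samples.
speech = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
noises = {
    "office": [0.1, -0.1],  # quiet environment
    "street": [1.5, -1.5],  # loud environment
}
recognize = make_toy_recognizer(speech, "call home")
results = evaluate(speech, "call home", noises, recognize)
```

With these sample values the quiet environment passes and the loud one fails, which is the kind of per-environment reliability report the abstract describes.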

    Testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise
    6.
    Granted patent
    Status: In force

    Publication No.: US08082148B2

    Publication date: 2011-12-20

    Application No.: US12109204

    Filing date: 2008-04-24

    IPC classification: G10L15/20

    CPC classification: G10L15/01

    Abstract: Methods, systems, and products for testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise that include: receiving recorded background noise for each of the plurality of operating environments; generating a test speech utterance for recognition by a speech recognition engine using a grammar; mixing the test speech utterance with each recorded background noise, resulting in a plurality of mixed test speech utterances, each mixed test speech utterance having different background noise; performing, for each of the mixed test speech utterances, speech recognition using the grammar and the mixed test speech utterance, resulting in speech recognition results for each of the mixed test speech utterances; and evaluating, for each recorded background noise, speech recognition reliability of the grammar in dependence upon the speech recognition results for the mixed test speech utterance having that recorded background noise.

    Testing A Grammar Used In Speech Recognition For Reliability In A Plurality Of Operating Environments Having Different Background Noise
    7.
    Patent application
    Status: In force

    Publication No.: US20090271189A1

    Publication date: 2009-10-29

    Application No.: US12109204

    Filing date: 2008-04-24

    IPC classification: G10L15/00

    CPC classification: G10L15/01

    Abstract: Methods, systems, and products for testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise that include: receiving recorded background noise for each of the plurality of operating environments; generating a test speech utterance for recognition by a speech recognition engine using a grammar; mixing the test speech utterance with each recorded background noise, resulting in a plurality of mixed test speech utterances, each mixed test speech utterance having different background noise; performing, for each of the mixed test speech utterances, speech recognition using the grammar and the mixed test speech utterance, resulting in speech recognition results for each of the mixed test speech utterances; and evaluating, for each recorded background noise, speech recognition reliability of the grammar in dependence upon the speech recognition results for the mixed test speech utterance having that recorded background noise.

    Adjusting A Speech Engine For A Mobile Computing Device Based On Background Noise
    8.
    Patent application
    Status: In force

    Publication No.: US20090271188A1

    Publication date: 2009-10-29

    Application No.: US12109151

    Filing date: 2008-04-24

    IPC classification: G10L15/00

    CPC classification: G10L21/0208 G10L15/20

    Abstract: Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates.
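
A minimal sketch of the sample, model, and configure steps in this abstract follows. It assumes a noise model can be reduced to the mean and variance of the sampled noise and that the current environment is already known by name; the abstract specifies neither, and the `NoiseModel` and `SpeechEngine` classes are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean, pvariance
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NoiseModel:
    """Hypothetical noise model: simple statistics of the sampled noise."""
    mean: float
    variance: float


def build_noise_models(samples: Dict[str, List[float]]) -> Dict[str, NoiseModel]:
    """Generate one noise model per operating environment."""
    return {env: NoiseModel(fmean(x), pvariance(x)) for env, x in samples.items()}


class SpeechEngine:
    """Hypothetical engine that holds the currently active noise model."""

    def __init__(self) -> None:
        self.noise_model: Optional[NoiseModel] = None

    def configure(self, model: NoiseModel) -> None:
        self.noise_model = model


# Microphone samples per environment (made-up values).
samples = {
    "car": [0.4, -0.4, 0.4, -0.4],
    "office": [0.1, -0.1, 0.1, -0.1],
}
models = build_noise_models(samples)

engine = SpeechEngine()
current_environment = "car"  # assumed to be detected elsewhere
engine.configure(models[current_environment])
```

Keeping one precomputed model per environment means switching environments only requires another `configure` call rather than resampling.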

    Adjusting a speech engine for a mobile computing device based on background noise
    9.
    Granted patent
    Status: In force

    Publication No.: US09076454B2

    Publication date: 2015-07-07

    Application No.: US13358097

    Filing date: 2012-01-25

    IPC classification: G10L15/20 G10L21/0208

    CPC classification: G10L21/0208 G10L15/20

    Abstract: Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates.

    Speech enabled media sharing in a multimodal application
    10.
    Granted patent
    Status: In force

    Publication No.: US08510117B2

    Publication date: 2013-08-13

    Application No.: US12500029

    Filing date: 2009-07-09

    IPC classification: G10L21/00

    Abstract: Speech enabled media sharing in a multimodal application including parsing, by a multimodal browser, one or more markup documents of a multimodal application; identifying, by the multimodal browser, in the one or more markup documents a web resource for display in the multimodal browser; loading, by the multimodal browser, a web resource sharing grammar that includes keywords for modes of resource sharing and keywords for targets for receipt of web resources; receiving, by the multimodal browser, an utterance matching a keyword for the web resource, a keyword for a mode of resource sharing and a keyword for a target for receipt of the web resource in the web resource sharing grammar thereby identifying the web resource, a mode of resource sharing, and a target for receipt of the web resource; and sending, by the multimodal browser, the web resource to the identified target for the web resource using the identified mode of resource sharing.
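
The keyword-matching step in this abstract can be illustrated with a small parser. The sketch assumes the sharing grammar is just three keyword tables (resources, modes, targets) and that the utterance has already been transcribed to text; the function name, the tables, and the example URL and address are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def parse_share_command(
    utterance: str,
    resources: Dict[str, str],
    modes: Sequence[str],
    targets: Dict[str, str],
) -> Optional[Tuple[str, str, str]]:
    """Return (resource, mode, target) if the utterance names all three."""
    words = set(utterance.lower().split())
    resource = next((url for kw, url in resources.items() if kw in words), None)
    mode = next((m for m in modes if m in words), None)
    target = next((addr for kw, addr in targets.items() if kw in words), None)
    if resource is None or mode is None or target is None:
        return None  # the utterance did not match the whole grammar
    return resource, mode, target


resources = {"photo": "https://example.com/photo.jpg"}
modes = ("email", "text")
targets = {"bob": "bob@example.com"}

command = parse_share_command("email the photo to Bob", resources, modes, targets)
incomplete = parse_share_command("send the photo", resources, modes, targets)
```

A complete match yields everything needed to send the resource, while an utterance that omits the mode or the target is rejected rather than guessed at.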
