Amalgamating multimedia transcripts for closed captioning from a plurality of text to speech conversions

Granted invention patent

US09332319B2 Amalgamating multimedia transcripts for closed captioning from a plurality of text to speech conversions (in force)



Patent title: Amalgamating multimedia transcripts for closed captioning from a plurality of text to speech conversions
Application number: US12890744

Filing date: 2010-09-27
Publication number: US09332319B2

Publication date: 2016-05-03
Inventors: Johney Tsai, Matthew Miller, David Strong
Applicants: Johney Tsai, Matthew Miller, David Strong
Applicant address: Blue Bell, PA, US
Assignee: Unisys Corporation
Current assignee: Unisys Corporation
Current assignee address: Blue Bell, PA, US
Agent: Richard J. Gregson
Main classification: G10L15/32
IPC classifications: G10L15/32; G10L15/183; G10L15/26; H04N21/488


Abstract:

Methods and systems for converting speech to text are disclosed. One method includes analyzing multimedia content to determine the presence of closed captioning data. The method includes, upon detecting closed captioning data, indexing the closed captioning data as associated with the multimedia content. The method also includes, upon failure to detect closed captioning data in the multimedia content, extracting audio data from multimedia content, the audio data including speech data, performing a plurality of speech to text conversions on the speech data to create a plurality of transcripts of the speech data, selecting text from one or more of the plurality of transcripts to form an amalgamated transcript, and indexing the amalgamated transcript as associated with the multimedia content.
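The flow in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: `MediaItem`, `amalgamate`, and `index_media` are hypothetical names, the speech-to-text engines are plain callables standing in for real recognizers, the `audio` field stands in for audio already extracted from the content, and the selection step is a simple per-word majority vote that assumes the transcripts are already word-aligned. The abstract does not say how text is actually selected.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class MediaItem:
    """Hypothetical stand-in for a piece of multimedia content."""
    media_id: str
    audio: bytes                    # assumed: audio already extracted from the content
    captions: Optional[str] = None  # None means no closed captioning was detected


# Assumed engine interface: audio bytes in, transcript text out.
SpeechToText = Callable[[bytes], str]


def amalgamate(transcripts: Sequence[str]) -> str:
    """Pick, position by position, the word that most transcripts agree on.

    Assumption: transcripts are word-aligned. A real system would need an
    alignment step first. Ties fall to whichever word Counter reports first.
    """
    tokenized = [t.split() for t in transcripts]
    length = max((len(words) for words in tokenized), default=0)
    merged = []
    for i in range(length):
        candidates = [words[i] for words in tokenized if i < len(words)]
        merged.append(Counter(candidates).most_common(1)[0][0])
    return " ".join(merged)


def index_media(item: MediaItem,
                engines: Sequence[SpeechToText],
                index: Dict[str, str]) -> str:
    """Index existing captions if present, otherwise an amalgamated transcript."""
    if item.captions is not None:
        text = item.captions
    else:
        # Run every engine on the same audio, then merge the results.
        transcripts = [engine(item.audio) for engine in engines]
        text = amalgamate(transcripts)
    index[item.media_id] = text
    return text


# Usage with three fake engines that each make a different mistake.
fake_engines = [
    lambda audio: "the quick brown fox",
    lambda audio: "the quack brown fox",
    lambda audio: "the quick brown box",
]
search_index: Dict[str, str] = {}
result = index_media(MediaItem("clip-1", b"raw-audio"), fake_engines, search_index)
print(result)
```

A per-word vote is the simplest choice that shows why several conversions can outperform one: an error made by a single engine is outvoted wherever the others agree.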


Publication/grant documents

US20120078626A1 SYSTEMS AND METHODS FOR CONVERTING SPEECH IN MULTIMEDIA CONTENT TO TEXT Publication/grant date: 2012-03-29

Information lookup

Espacenet

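IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/28	.Constructional details of speech recognition systems
G10L15/32	..Multiple recognisers used in sequence or in parallel; score combination systems therefor, e.g. voting systems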