-
Publication No.: US20240087591A1
Publication date: 2024-03-14
Application No.: US18451736
Application date: 2023-08-17
Applicant: SomniQ, Inc.
Inventor: Rikko Sakaguchi, Hidenori Ishikawa
Abstract: Methods, systems and apparatuses for computer-generated visualization of speech are described herein. An example method of computer-generated visualization of speech including at least one segment includes: generating a graphical representation of an object corresponding to a segment of the speech; and displaying the graphical representation of the object on a screen of a computing device. Generating the graphical representation includes: representing a duration of the respective segment by a length of the object and representing intensity of the respective segment by a width of the object; and placing, in the graphical representation, a space between adjacent objects.
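The mapping described in this abstract (duration to length, intensity to width, a space between adjacent objects) can be illustrated with a minimal sketch. It is not the applicant's implementation: the `Segment` and `Bar` types, the `layout_segments` function, and the pixel scale, maximum width, and gap values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    duration_s: float  # how long the speech segment lasts, in seconds
    intensity: float   # assumed to be a normalized loudness in [0, 1]


@dataclass
class Bar:
    x: float       # left edge of the object along the time axis, in pixels
    length: float  # encodes the segment's duration
    width: float   # encodes the segment's intensity


def layout_segments(segments, px_per_second=100.0, max_width=40.0, gap=8.0):
    """Turn speech segments into drawable objects (hypothetical layout rule)."""
    bars = []
    x = 0.0
    for seg in segments:
        length = seg.duration_s * px_per_second
        # Clamp intensity to [0, 1] before scaling it to a width.
        width = max(0.0, min(1.0, seg.intensity)) * max_width
        bars.append(Bar(x=x, length=length, width=width))
        # Leave a fixed space before the next object.
        x += length + gap
    return bars


bars = layout_segments([Segment(0.5, 0.2), Segment(1.0, 0.9)])
```

With these assumed defaults, a longer segment yields a longer object and a louder segment a wider one; the fixed `gap` keeps adjacent objects visually separate.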
-
Publication No.: US20230317097A1
Publication date: 2023-10-05
Application No.: US18142165
Application date: 2023-05-02
Applicant: Distributed Creation Inc.
Inventor: Alejandro KORETZKY, Naveen Sasalu RAJASHEKHARAPPA
CPC classification number: G10L25/54, G06F16/65, G06F3/165, G06N3/08, G10L21/12, G10L21/14, G10L25/30, G06F18/214
Abstract: A method and system are provided for extracting features from digital audio signals which exhibit variations in pitch, timbre, decay, reverberation, and other psychoacoustic attributes and learning, from the extracted features, an artificial neural network model for generating contextual latent-space representations of digital audio signals. A method and system are also provided for learning an artificial neural network model for generating consistent latent-space representations of digital audio signals in which the generated latent-space representations are comparable for the purposes of determining psychoacoustic similarity between digital audio signals. A method and system are also provided for extracting features from digital audio signals and learning, from the extracted features, an artificial neural network model for generating latent-space representations of digital audio signals which take care of selecting salient attributes of the signals that represent psychoacoustic differences between the signals.
-
Publication No.: US11735204B2
Publication date: 2023-08-22
Application No.: US17404873
Application date: 2021-08-17
Applicant: SomniQ, Inc.
Inventor: Rikko Sakaguchi, Hidenori Ishikawa
Abstract: Methods, systems and apparatuses for computer-generated visualization of speech are described herein. An example method of computer-generated visualization of speech including at least one segment includes: generating a graphical representation of an object corresponding to a segment of the speech; and displaying the graphical representation of the object on a screen of a computing device. Generating the graphical representation includes: representing a duration of the respective segment by a length of the object and representing intensity of the respective segment by a width of the object; and placing, in the graphical representation, a space between adjacent objects.
-
Publication No.: US20180046433A1
Publication date: 2018-02-15
Application No.: US15790442
Application date: 2017-10-23
Applicant: Patient Prism LLC
Inventor: Michael G. Spiessbach, Amol Nirgudkar
IPC: G06F3/16, G06F17/22, H04M3/51, G06F17/24, H04L29/08, G06F3/0481, G10L15/08, G06F3/0482, H04M3/42
CPC classification number: G06F3/165, G06F3/04812, G06F3/0482, G06F17/2235, G06F17/241, G06Q30/0201, G10L15/08, G10L21/12, G10L2015/088, H04L67/02, H04L67/146, H04L67/22, H04M3/42221, H04M3/5175, H04M2203/303, H04M2203/305, H04M2203/403
Abstract: Merchant/consumer calls may be recorded and evaluated according to a variety of criteria. The call recordings and analyses thereof, as well as consumer tracking information, may be displayed in a user interface of a web-based online portal for convenience in evaluating the use and efficacy of marketing channels as well as the quality of merchant/consumer interactions. In an aspect, the user interface provides a representation of a variety of telephone calls as an interactive keyword cloud that presents business-value-specific keywords targeted for detection during such telephone calls. The keyword cloud may depict keywords in a range of colors, sizes, and relative positioning to connote varied degrees of significance, such as a relative rate of occurrence of keywords in the represented telephone calls. Each keyword in the keyword cloud may contain a hyperlink to related content such as a listing of telephone calls containing the keyword.
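As a rough illustration of how a keyword's relative rate of occurrence could drive its size in such a cloud, the sketch below scales font size linearly with the fraction of calls that mention each keyword. The function name `keyword_cloud_sizes`, the substring matching, and the point-size range are assumptions made for this example and are not taken from the patent.

```python
from collections import Counter


def keyword_cloud_sizes(call_transcripts, keywords, min_pt=12, max_pt=48):
    """Map each keyword to a font size based on the share of calls containing it."""
    counts = Counter()
    for text in call_transcripts:
        lowered = text.lower()
        for kw in keywords:
            # Simple case-insensitive substring match (an assumption; a real
            # system would more likely match on recognized words).
            if kw.lower() in lowered:
                counts[kw] += 1

    total = len(call_transcripts)
    sizes = {}
    for kw in keywords:
        rate = counts[kw] / total if total else 0.0
        # Linear interpolation between the smallest and largest font size.
        sizes[kw] = round(min_pt + rate * (max_pt - min_pt))
    return sizes


calls = [
    "I need an implant consult",
    "Do you take insurance?",
    "Insurance and implant pricing",
]
sizes = keyword_cloud_sizes(calls, ["implant", "insurance", "whitening"])
```

Color and relative positioning, which the abstract also mentions, could be derived from the same rate in a similar way.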
-
Publication No.: US09826090B2
Publication date: 2017-11-21
Application No.: US15377428
Application date: 2016-12-13
Applicant: Patient Prism LLC
Inventor: Michael G. Spiessbach, Amol Nirgudkar
CPC classification number: G06F3/165, G06F3/04812, G06F3/0482, G06F17/2235, G06F17/241, G06Q30/0201, G10L15/08, G10L21/12, G10L2015/088, H04L67/02, H04L67/146, H04L67/22, H04M3/42221, H04M3/5175, H04M2203/303, H04M2203/305, H04M2203/403
Abstract: Merchant/consumer calls may be recorded and evaluated according to a variety of criteria. The call recordings and analyses thereof, as well as consumer tracking information, may be displayed in a user interface of a web-based online portal for convenience in evaluating the use and efficacy of marketing channels as well as the quality of merchant/consumer interactions. In an aspect, the user interface provides call visualization in the form of audio data from a telephone call displayed as a waveform on a call timeline. The call may be (automatically or manually) annotated with various business-value-specific keywords spoken during the telephone call, and markers for these keywords can be presented on the call timeline to visually indicate the keyword and the time during the call when the keyword was spoken. A business value for the call may be determined based at least in part on keywords spoken during the call.
-
Publication No.: US09666208B1
Publication date: 2017-05-30
Application No.: US14968876
Application date: 2015-12-14
Applicant: Adobe Systems Incorporated
Inventor: Michael Rubin, James A. Moorer
CPC classification number: G10L21/12, G06F3/0481, G06F3/167, G06F17/211, G06T11/60, G06T2200/24, G10L15/04, G10L15/26, G10L21/10, G10L25/87
Abstract: The present disclosure includes a hybrid waveform system that displays a hybrid waveform to a user. In general, the hybrid waveform system provides a hybrid waveform to a user that uses converted readable text and waveforms to represent an audio segment. By providing a user with a hybrid waveform, the hybrid waveform system offers users with a number of benefits, such as providing an audio display that enables a user to quickly ascertain context information and audio information typically missing from audio transcriptions.
-
Publication No.: US20160277577A1
Publication date: 2016-09-22
Application No.: US15076572
Application date: 2016-03-21
Applicant: TopBox, LLC
Inventor: Jeffrey Stephen Yentis, Christopher Lee Tranquill, Brian Keith Timmons, Ryan Andrew Studer, Micheal Dean Dobson
IPC: H04M3/51, G10L21/12, G06F3/16, G06F17/24, G06F3/0482, G06F3/0484, G06F3/0481, H04M3/42, G06F17/30
CPC classification number: G10L21/12, G06F3/04817, G06F3/0482, G06F3/04842, G06F3/165, G06F16/3326, G06F16/685, G06F17/24, G06F17/241, G06Q10/063, H04M3/42221, H04M3/5175
Abstract: An interaction management system receives audio files of interactions between customers and customer service agents and client provided metadata from a client. The interaction management system provides an interface for creating enhanced metadata based on the received audio file and client provided metadata using a capture interface. The capture interface allows a user to label the audio file with event labels and sentiment labels at particular time stamps in the audio file. The interaction management system saves the captured metadata in an interaction file associated with the client provided audio file to be presented back to the user as a visual sequential representation of the captured data.
-
Publication No.: US20160217807A1
Publication date: 2016-07-28
Application No.: US15082959
Application date: 2016-03-28
Applicant: Securus Technologies, Inc.
Inventor: Jay Loring Gainsboro, Lee Davis Weinstein
CPC classification number: H04M3/568, G10L15/1807, G10L17/00, G10L21/12, G10L25/63, H04M3/2281, H04M3/42221, H04M2201/41
Abstract: A multi-party conversation analyzer and logger uses a variety of techniques including spectrographic voice analysis, absolute loudness measurements, directional microphones, and telephonic directional separation to determine the number of parties who take part in a conversation, and segment the conversation by speaking party. In one aspect, the invention monitors telephone conversations in real time to detect conditions of interest (for instance, calls to non-allowed parties or calls of a prohibited nature from prison inmates). In another aspect, automated prosody measurement algorithms are used in conjunction with speaker segmentation to extract emotional content of the speech of participants within a particular conversation, and speaker interactions and emotions are displayed in graphical form. A conversation database is generated which contains conversation recordings, and derived data such as transcription text, derived emotions, alert conditions, and correctness probabilities associated with derived data. Investigative tools allow flexible queries of the conversation database.
-
Publication No.: US20070071206A1
Publication date: 2007-03-29
Application No.: US11475541
Application date: 2006-06-26
Applicant: Jay Gainsboro, Lee Weinstein
Inventor: Jay Gainsboro, Lee Weinstein
CPC classification number: G10L25/63, G10L15/1807, G10L17/00, G10L21/12, H04M3/2281, H04M2201/41
Abstract: A multi-party conversation analyzer and logger uses a variety of techniques including spectrographic voice analysis, absolute loudness measurements, directional microphones, and telephonic directional separation to determine the number of parties who take part in a conversation, and segment the conversation by speaking party. In one aspect, the invention monitors telephone conversations in real time to detect conditions of interest (for instance, calls to non-allowed parties or calls of a prohibited nature from prison inmates). In another aspect, automated prosody measurement algorithms are used in conjunction with speaker segmentation to extract emotional content of the speech of participants within a particular conversation, and speaker interactions and emotions are displayed in graphical form. A conversation database is generated which contains conversation recordings, and derived data such as transcription text, derived emotions, alert conditions, and correctness probabilities associated with derived data. Investigative tools allow flexible queries of the conversation database.
-
Publication No.: US20180182396A1
Publication date: 2018-06-28
Application No.: US15823937
Application date: 2017-11-28
Applicant: SORIZAVA CO., LTD.
Inventor: Munhak AN
CPC classification number: G10L15/26, G06F17/279, G06F17/28, G06F17/30746, G10L15/08, G10L15/32, G10L21/0272, G10L21/12
Abstract: The present invention relates to a multi-speaker speech recognition correction system for determining a speaker of an utterance with a simple method and easily correcting speech-recognized text during speech recognition for a plurality of speakers. According to the present invention, when speech signals are input to a multi-speaker speech recognition system from a plurality of microphones which are each provided to a corresponding one of a plurality of speakers, the multi-speaker speech recognition correction system may detect a speech session from a time point at which input of each of the speech signals is started to a time point at which the input of the speech signal is stopped, and a speech recognizer may convert only the detected speech sessions into text so that a speaker of an utterance can be identified by a simple method and speech recognition can be carried out at a low cost.
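One way to picture the per-microphone speech-session detection is the sketch below, which treats a session as the span from the first frame at or above an energy threshold to the next frame below it. The abstract does not say how the start and stop of input are detected, so the thresholding rule, the `detect_sessions` name, and the frame length are assumptions for illustration only.

```python
def detect_sessions(frames_by_mic, threshold=0.1, frame_s=0.02):
    """Return (speaker, start_time, end_time) tuples, ordered by start time.

    frames_by_mic maps a speaker label to that speaker's microphone signal,
    given as a list of per-frame energy values (hypothetical input format).
    """
    sessions = []
    for speaker, energies in frames_by_mic.items():
        start = None
        for i, energy in enumerate(energies):
            active = energy >= threshold
            if active and start is None:
                start = i  # input on this microphone has started
            elif not active and start is not None:
                sessions.append((speaker, start * frame_s, i * frame_s))
                start = None  # input has stopped
        if start is not None:
            # The signal ended while the speaker was still talking.
            sessions.append((speaker, start * frame_s, len(energies) * frame_s))
    return sorted(sessions, key=lambda s: s[1])


frames = {
    "A": [0.0, 0.5, 0.6, 0.0, 0.0],
    "B": [0.0, 0.0, 0.0, 0.4, 0.4],
}
sessions = detect_sessions(frames, frame_s=1.0)
```

Because each microphone belongs to one speaker, the speaker label comes directly from the microphone, and only the returned spans would need to be passed to a speech recognizer.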
-