Patent search ap:"Chalapathy V. Neti" Page 1

1.

Granted invention patent
Audio-visual codebook dependent cepstral normalization (in force)
Publication (announcement) number: US07664637B2

Publication (announcement) date: 2010-02-16

Application number: US11932996

Application date: 2007-10-31

Applicant: Sabine Deligne, Chalapathy V. Neti, Gerasimos Potamianos

Inventor: Sabine Deligne, Chalapathy V. Neti, Gerasimos Potamianos

IPC: G10L15/00

CPC classification number: G10L15/20, G10L15/24

Abstract: An arrangement for yielding enhanced audio features towards the provision of enhanced audio-visual features for speech recognition. Input is provided in the form of noisy audio-visual features and noisy audio features related to the noisy audio-visual features.

2.

Granted invention patent
Audio-visual codebook dependent cepstral normalization (in force)
Publication (announcement) number: US07319955B2

Publication (announcement) date: 2008-01-15

Application number: US10307164

Application date: 2002-11-29

Applicant: Sabine Deligne, Chalapathy V. Neti, Gerasimos Potamianos

Inventor: Sabine Deligne, Chalapathy V. Neti, Gerasimos Potamianos

IPC: G10L15/00

CPC classification number: G10L15/20, G10L15/24

Abstract: An arrangement for yielding enhanced audio features towards the provision of enhanced audio-visual features for speech recognition. Input is provided in the form of noisy audio-visual features and noisy audio features related to the noisy audio-visual features.

3.

Granted invention patent
Speech driven lip synthesis using viseme based hidden Markov models (in force)
Publication (announcement) number: US06366885B1

Publication (announcement) date: 2002-04-02

Application number: US09384763

Application date: 1999-08-27

Applicant: Sankar Basu, Tanveer Atzal Faruquie, Chalapathy V. Neti, Nitendra Rajput, Andrew William Senior, L. Venkata Subramaniam, Ashish Verma

Inventor: Sankar Basu, Tanveer Atzal Faruquie, Chalapathy V. Neti, Nitendra Rajput, Andrew William Senior, L. Venkata Subramaniam, Ashish Verma

IPC: G10L21/06

CPC classification number: G11B27/10, G10L2021/105, G11B27/031

Abstract: A method of speech driven lip synthesis which applies viseme based training models to units of visual speech. The audio data is grouped into a smaller number of visually distinct visemes rather than the larger number of phonemes. These visemes then form the basis for a Hidden Markov Model (HMM) state sequence or the output nodes of a neural network. During the training phase, audio and visual features are extracted from input speech, which is then aligned according to the apparent viseme sequence with the corresponding audio features being used to calculate the HMM state output probabilities or the output of the neural network. During the synthesis phase, the acoustic input is aligned with the most likely viseme HMM sequence (in the case of an HMM based model) or with the nodes of the network (in the case of a neural network based system), which is then used for animation.
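
Illustrative note (not part of the patent record): the grouping of many phonemes into fewer visemes described in this abstract can be pictured as a many-to-one lookup. The sketch below is only a toy example. The table `PHONEME_TO_VISEME`, its class names, and the function `to_visemes` are hypothetical; the actual viseme inventory and the HMM or neural network stages are not given in the abstract and are not modelled here.

```python
# Toy many-to-one phoneme-to-viseme table (hypothetical, for illustration only).
# Phonemes that look alike on the lips share one viseme label.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
}


def to_visemes(phonemes):
    """Map a phoneme sequence to viseme labels.

    Phonemes missing from the toy table are passed through unchanged,
    which is a simplification made for this sketch.
    """
    return [PHONEME_TO_VISEME.get(ph, ph) for ph in phonemes]


if __name__ == "__main__":
    # "bat" and "mat" differ acoustically but collapse to the same visual units.
    print(to_visemes(["b", "a", "t"]))
    print(to_visemes(["m", "a", "t"]))
```

In this toy table, two words that differ only in their initial bilabial consonant yield identical viseme sequences, which is the reduction in unit count that the abstract refers to.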

4.

Invention application
METHOD AND APPARATUS FOR PERVASIVE AUTHENTICATION DOMAINS (in force)

Publication (announcement) number: US20080141357A1

Publication (announcement) date: 2008-06-12

Application number: US11932918

Application date: 2007-10-31

Applicant: Sabine Deligne, Chalapathy V. Neti, Gerasimos Potamianos

Inventor: Sabine Deligne, Chalapathy V. Neti, Gerasimos Potamianos

IPC: H04L9/32

CPC classification number: H04L63/08, H04L63/0428, H04L63/126

Abstract: Methods and apparatus for enabling a Pervasive Authentication Domain. A Pervasive Authentication Domain allows many registered Pervasive Devices to obtain authentication credentials from a single Personal Authentication Gateway and to use these credentials on behalf of users to enable additional capabilities for the devices. It provides an arrangement for a user to store credentials in one device (the Personal Authentication Gateway), and then make use of those credentials from many authorized Pervasive Devices without re-entering the credentials. It provides a convenient way for a user to share credentials among many devices, particularly when it is not convenient to enter credentials as in a smart wristwatch environment. It further provides an arrangement for disabling access to credentials to devices that appear to be far from the Personal Authentication Gateway as measured by metrics such as communications signal strengths.

5.

Granted invention patent
Assessing consistency between facial motion and speech signals in video (no longer in force)
Publication (announcement) number: US07046300B2

Publication (announcement) date: 2006-05-16

Application number: US10307181

Application date: 2002-11-29

Applicant: Giridharan Iyengar, Chalapathy V. Neti, Harriet J. Nock

Inventor: Giridharan Iyengar, Chalapathy V. Neti, Harriet J. Nock

IPC: H04N9/475

CPC classification number: G10L2021/105

Abstract: The use of multiple complementary classes of measure to assess face and speech consistency in video. In an exemplary embodiment, both synchrony measures and plausibility measures are employed.

6.

Granted invention patent
System and method for microphone activation using visual speech cues (no longer in force)
Publication (announcement) number: US06754373B1

Publication (announcement) date: 2004-06-22

Application number: US09616229

Application date: 2000-07-14

Applicant: Philippe de Cuetos, Giridharan R. Iyengar, Chalapathy V. Neti, Gerasimos Potamianos

Inventor: Philippe de Cuetos, Giridharan R. Iyengar, Chalapathy V. Neti, Gerasimos Potamianos

IPC: G06K9/00

CPC classification number: G10L25/78, G06K9/00335, G10L15/24

Abstract: A system for activating a microphone based on visual speech cues, in accordance with the invention, includes a feature tracker coupled to an image acquisition device. The feature tracker tracks features in an image of a user. A region of interest extractor is coupled to the feature tracker. The region of interest extractor extracts a region of interest from the image of the user. A visual speech activity detector is coupled to the region of interest extractor and measures changes in the region of interest to determine if a visual speech cue has been generated by the user. A microphone is turned on by the visual speech activity detector when a visual speech cue has been determined by the visual speech activity detector. Methods for activating a microphone based on visual speech cues are also included.
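
Illustrative note (not part of the patent record): the chain described in this abstract (region of interest extraction, measuring change in that region, switching the microphone) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the feature tracker is assumed to have already produced a fixed bounding box, "change" is taken to be the mean absolute pixel difference between consecutive frames, and the names `extract_roi`, `mic_states`, and `threshold` are hypothetical. The abstract does not say which change measure or decision rule the invention uses.

```python
import numpy as np


def extract_roi(frame, box):
    """Crop the region of interest; box = (top, left, height, width) is assumed given."""
    top, left, height, width = box
    return frame[top:top + height, left:left + width]


def mic_states(frames, box, threshold=5.0):
    """Return one on/off decision per frame.

    The microphone is 'on' for a frame when the mean absolute change of the
    region of interest relative to the previous frame exceeds the threshold
    (an assumed decision rule, chosen only for illustration).
    """
    states = []
    previous = None
    for frame in frames:
        roi = extract_roi(frame, box).astype(float)
        change = 0.0 if previous is None else float(np.abs(roi - previous).mean())
        states.append(change > threshold)
        previous = roi
    return states


if __name__ == "__main__":
    still = np.zeros((10, 10))
    moved = still.copy()
    moved[2:6, 2:6] = 50.0  # simulated mouth movement inside the box
    print(mic_states([still, still, moved, moved], box=(2, 2, 4, 4)))
```

A practical detector would likely smooth the decision over several frames so the microphone does not switch off between syllables; that refinement is left out of this sketch.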

7.

Granted invention patent
Method and system for multi-client access to a dialog system (in force)
Publication (announcement) number: US06377913B1

Publication (announcement) date: 2002-04-23

Application number: US09374026

Application date: 1999-08-13

Applicant: Daniel M. Coffman, Popani Gopalakrishnan, Ganesh N. Ramaswamy, Jan Kleindienst, Chalapathy V. Neti

Inventor: Daniel M. Coffman, Popani Gopalakrishnan, Ganesh N. Ramaswamy, Jan Kleindienst, Chalapathy V. Neti

IPC: G10L15/00

CPC classification number: G06F3/16

Abstract: In accordance with the invention, a method and system for accessing a dialog system employing a plurality of different clients, includes providing a first client device for accessing a conversational system and presenting a command to the conversational system by converting the command to a form understandable to the conversational system. The command is interpreted by employing a mediator, a dialog manager and a multi-modal history to determine the intent of the command based on a context of the command. A second client device is determined based on a predetermined device preference stored in the conversational system. An application is abstracted to perform the command, and the results of the performance of the command are sent to the second client device.

8.

Granted invention patent
System and method for annotating multi-modal characteristics in multimedia documents (in force)
Publication (announcement) number: US07793212B2

Publication (announcement) date: 2010-09-07

Application number: US10539890

Application date: 2003-12-19

Applicant: Hugh W. Adams, Jr., Giridharen Iyengar, Ching-Yung Lin, Chalapathy V. Neti, John R. Smith, Belle L. Tseng

Inventor: Hugh W. Adams, Jr., Giridharen Iyengar, Ching-Yung Lin, Chalapathy V. Neti, John R. Smith, Belle L. Tseng

IPC: G06F17/00

CPC classification number: G06F17/30038, G06F17/241

Abstract: A manual annotation system of multi-modal characteristics in multimedia files. There is provided an arrangement for selecting an observation modality of video with audio, video without audio, audio with video, or audio without video, to be used to annotate multimedia content. While annotating video or audio features in isolation results in less confidence in the identification of features, observing both audio and video simultaneously and annotating that observation results in a higher confidence level.

9.

Granted invention patent
Method and system for noise-robust speech processing with cochlea filters in an auditory model (no longer in force)
Publication (announcement) number: US5768474A

Publication (announcement) date: 1998-06-16

Application number: US581288

Application date: 1995-12-29

Applicant: Chalapathy V. Neti

Inventor: Chalapathy V. Neti

IPC: G10L15/02, G10L15/20, G01L5/06, G01L9/00

CPC classification number: G10L15/02, G10L15/20

Abstract: A method for noise-robust speech processing with cochlea filters within a computer system is disclosed. This invention provides a method for producing feature vectors from a segment of speech that is more robust to variations in the environment due to additive noise. A first output is produced by convolving a speech signal input with spatially dependent impulse responses that resemble cochlea filters. The temporal transient and the spatial transient of the first output are then enhanced by taking a time derivative and a spatial derivative, respectively, of the first output to produce a second output. Next, all the negative values of the second output are replaced with zeros. A feature vector is then obtained from each frame of the second output by a multiple resolution extraction. The parameters for the cochlea filters are finally optimized by minimizing the difference between a feature vector generated from a relatively noise-free speech signal input and a feature vector generated from a noisy speech signal input.
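
Illustrative note (not part of the patent record): the processing steps listed in this abstract can be followed in a short NumPy sketch. Everything specific here is an assumption made for illustration: the damped-sinusoid impulse responses in `cochlea_like_filterbank`, the finite-difference derivatives, the frame length, and the simple "average over 1 and then 2 time segments" reading of multiple resolution extraction. The abstract does not give the actual filter shapes, derivative operators, or extraction scheme, and the final parameter optimization is reduced to the objective `feature_distance` only.

```python
import numpy as np


def cochlea_like_filterbank(n_channels=8, length=64, fs=8000.0):
    """Hypothetical stand-in for cochlea filters: one damped sinusoid per
    channel, with centre frequency depending on channel position."""
    t = np.arange(length) / fs
    centres = np.linspace(300.0, 3000.0, n_channels)
    return np.stack([np.exp(-t * fc) * np.sin(2 * np.pi * fc * t) for fc in centres])


def extract_features(signal, filters, frame_len=80, scales=(1, 2)):
    # First output: convolve the signal with each impulse response (channels x time).
    first = np.stack([np.convolve(signal, h, mode="same") for h in filters])
    # Second output: finite differences along time (axis 1) and along
    # channel position (axis 0) stand in for the two derivatives.
    second = np.diff(np.diff(first, axis=1), axis=0)
    # Replace all negative values with zeros.
    second = np.maximum(second, 0.0)
    # Assumed multi-resolution extraction: per frame, average each channel
    # over the whole frame, then over each half of the frame.
    n_frames = second.shape[1] // frame_len
    vectors = []
    for i in range(n_frames):
        frame = second[:, i * frame_len:(i + 1) * frame_len]
        parts = [piece.mean(axis=1)
                 for s in scales
                 for piece in np.array_split(frame, s, axis=1)]
        vectors.append(np.concatenate(parts))
    return np.array(vectors)


def feature_distance(clean_features, noisy_features):
    """Objective the filter parameters would be tuned to minimise (mean squared difference)."""
    return float(np.mean((clean_features - noisy_features) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(800)
    noisy = clean + 0.1 * rng.standard_normal(800)
    bank = cochlea_like_filterbank()
    print(extract_features(clean, bank).shape)
    print(feature_distance(extract_features(clean, bank), extract_features(noisy, bank)))
```

With these assumed settings, each frame yields one value per channel pair at the coarse scale and two at the finer scale. Tuning the filter parameters would mean adjusting the filterbank so that `feature_distance` shrinks for matched clean and noisy inputs.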
