-
Publication No.: US20240331706A1
Publication Date: 2024-10-03
Application No.: US18741427
Filing Date: 2024-06-12
Inventors: Linhao DONG, Zhiyun FAN, Zejun MA
IPC Classification: G10L17/04
CPC Classification: G10L17/04
Abstract: A method, apparatus, device, and storage medium for speaker change point detection, the method including: acquiring target voice data to be detected; and extracting an acoustic feature characterizing acoustic information of the target voice data from the target voice data; encoding the acoustic feature to obtain speaker characterization vectors of the target voice data; integrating and firing the speaker characterization vectors of the target voice data based on a continuous integrate-and-fire (CIF) mechanism, to obtain a sequence of speaker characterizations in the target voice data; and determining the speaker change points, according to the sequence of the speaker characterizations bounded by the speaker change points in the target voice data. This method can effectively improve the accuracy of the detection result of a speaker change point in target voice data with a type of interaction.
-
Publication No.: US12094484B2
Publication Date: 2024-09-17
Application No.: US18360838
Filing Date: 2023-07-28
Applicant: ZHEJIANG LAB
Inventors: Jingsong Li, Zhenchuan Zhang, Tianshu Zhou, Yu Tian
IPC Classification: G10L21/0232, G10L17/02, G10L17/04, G10L25/30
CPC Classification: G10L21/0232, G10L17/02, G10L17/04, G10L25/30
Abstract: The present disclosure discloses a general speech enhancement method and apparatus using multi-source auxiliary information. The method includes the following steps: S1: building a training data set; S2: using the training data set to learn network parameters of a model, and building a speech enhancement model; S3: building a sound source information database in a pre-collection or on-site collection mode; S4: acquiring an input of the speech enhancement model; and S5: taking a noisy original signal as a main input of the speech enhancement model, taking auxiliary sound signals of a target source group and auxiliary sound signals of an interference source group as side inputs of the speech enhancement model for speech enhancement, and obtaining an enhanced speech signal.
-
Publication No.: US20240304190A1
Publication Date: 2024-09-12
Application No.: US18663779
Filing Date: 2024-05-14
Inventors: Taewoo LEE, Minseok KWON, Kyungtae KIM, Gajin SONG, Hoseon SHIN, Jungin LEE, Seokyeong JUNG
Abstract: An electronic device configured to perform speaker verification on a voice input to determine whether the voice input matches a voice of an enrolled speaker, based on determining that the voice input does not match the voice of the enrolled speaker, perform first speech recognition on the voice input based on a first automatic speech recognition (ASR) model, and based on determining that the voice input matches the voice of the enrolled speaker, perform second speech recognition on the voice input based on a sequence summarizing neural network (SSN) and a second ASR model.
-
Publication No.: US12039995B2
Publication Date: 2024-07-16
Application No.: US17667370
Filing Date: 2022-02-08
Inventors: Jun Wang, Wingyip Lam
IPC Classification: G10L21/0308, G10L13/02, G10L17/02, G10L17/04, G10L17/06, G10L17/22, G10L21/0208, G10L21/0232
CPC Classification: G10L21/0308, G10L13/02, G10L17/02, G10L17/04, G10L17/06, G10L17/22, G10L2021/02087, G10L21/0232
Abstract: This application discloses an audio signal processing method performed by an electronic device. According to this application, embedding processing is performed on a mixed audio signal by mapping the mixed audio signal to an embedding space, to obtain an embedding feature of the mixed audio signal in the embedding space; and generalized feature extraction is performed on the embedding feature, so that a generalized feature of a target component in the mixed audio signal can be obtained through extraction. The generalized feature of the target component has good generalization capability and expression capability, and can be used for different scenarios. Audio signal processing is performed on the mixed audio signal based on the generalized feature of the target component to obtain information of the audio signal of the target object, thereby improving the robustness and generalization of an audio signal processing process, and improving the accuracy of audio signal processing.
-
Publication No.: US20240232308A1
Publication Date: 2024-07-11
Application No.: US18152671
Filing Date: 2023-01-10
Applicant: Verint Americas Inc.
Inventors: Ian BEAVER, Vladislav LUZIN
Abstract: Certain aspects of the present disclosure provide techniques for receiving audio data comprising a user voice command; determining a task to be completed by a remote service based on the user voice command; determining that a reference voice print associated with the user is stored in a user account; authenticating the user by determining that a sample voice print based on the user voice command matches the reference voice print associated with the user; storing authentication evidence associated with the task; and providing proof of user authentication to the remote service in order to initiate the task with the remote service.
-
Publication No.: US12020711B2
Publication Date: 2024-06-25
Application No.: US17166525
Filing Date: 2021-02-03
Applicant: Nice Ltd.
Abstract: A system and method may classify a plurality of interactions, by: obtaining a plurality of voiceprints of the plurality of interactions, wherein each voiceprint of the plurality of voiceprints represents a speaker participating in an interaction of the plurality of interactions; calculating, for each interaction, a plurality of scores, wherein each score of the plurality of scores is indicative of a similarity between the voiceprint of the interaction and one voiceprint of a set of benchmark voiceprints; calculating, for each interaction, statistics of the scores; and determining that a plurality of interactions pertain to a single cluster of interactions based on statistics of the scores of the interactions in the cluster.
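
The scoring steps in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented system: the abstract does not name a similarity measure, a set of statistics, or a clustering rule, so cosine similarity, the (mean, population standard deviation) pair, the tolerance `tol`, and the function names `score_stats` and `same_cluster` are all hypothetical choices.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_stats(voiceprint: Sequence[float],
                benchmarks: List[Sequence[float]]) -> Tuple[float, float]:
    """Score one voiceprint against every benchmark voiceprint and
    summarize the scores as (mean, population standard deviation)."""
    scores = [cosine(voiceprint, b) for b in benchmarks]
    return mean(scores), pstdev(scores)


def same_cluster(stats_a: Tuple[float, float],
                 stats_b: Tuple[float, float],
                 tol: float = 0.05) -> bool:
    """Treat two interactions as one cluster when every statistic
    differs by at most `tol` (an assumed rule, for illustration)."""
    return all(abs(x - y) <= tol for x, y in zip(stats_a, stats_b))


# Hypothetical 2-dimensional voiceprints, for illustration only.
benchmarks = [[1.0, 0.0], [0.0, 1.0]]
stats_a = score_stats([1.0, 0.0], benchmarks)
stats_b = score_stats([2.0, 0.0], benchmarks)
stats_c = score_stats([1.0, 1.0], benchmarks)
print(same_cluster(stats_a, stats_b), same_cluster(stats_a, stats_c))
```

Comparing score statistics rather than raw voiceprints means interactions are grouped by how they relate to the same benchmark set, which is the idea the abstract describes.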
-
Publication No.: US12002467B2
Publication Date: 2024-06-04
Application No.: US18074513
Filing Date: 2022-12-05
Applicant: KYOCERA Corporation
Inventor: Yumiko Yamamoto
CPC Classification: G10L15/22, G06F16/433, G06F21/32, G06V40/50, G10L15/07, G10L2015/223, G10L2015/227
Abstract: A voice command system according to a first disclosure comprises a gateway apparatus having an interface configured to receive a voice command, and a controller configured to perform a registration process of registering a speaker permitted to receive the voice command. The controller is configured to perform an authentication process of rejecting a reception of the voice command when a speaker of the voice command is not registered, and permitting a reception of the voice command when a speaker of the voice command is registered. The controller is configured to perform the authentication process for each voice command.
-
Publication No.: US20240163397A1
Publication Date: 2024-05-16
Application No.: US18424706
Filing Date: 2024-01-26
Inventor: Mario Ferrari
Abstract: A hub apparatus (20) is designated to be used in a video communication system comprising the hub apparatus (20) and a plurality of mobile terminals (10a-10d) configured to be wirelessly connectable to the hub apparatus (20). The hub apparatus (20) comprises: a receiving unit (24) configured to receive from each mobile terminal (10) of the plurality of mobile terminals (10a-10d) a video stream, a current speaker indicator to indicate whether the user of the mobile terminal is speaking and an association information which associates the current speaker indicator transmitted by the mobile terminal with the video stream transmitted from such mobile terminal (10), and a generation unit (40) operatively connected to said receiving unit (24) and configured to generate an output video communication stream (6) based on the plurality of video streams received from each mobile terminal (10) of the plurality of mobile terminals (10a-10d), on the plurality of current speaker indicators received from each mobile terminal (10) of the plurality of mobile terminals (10a-10d) and on the plurality of association information received from each mobile terminal (10) of the plurality of mobile terminals (10a-10d).
-
Publication No.: US20240153509A1
Publication Date: 2024-05-09
Application No.: US18368459
Filing Date: 2023-09-14
Applicant: OTO Systems Inc.
IPC Classification: G10L17/06, G06N3/045, G06N3/049, G06N3/08, G10L17/02, G10L17/04, G10L17/18, G10L21/0272
CPC Classification: G10L17/06, G06N3/045, G06N3/049, G06N3/08, G10L17/02, G10L17/04, G10L17/18, G10L21/0272
Abstract: Systems, methods, and non-transitory computer-readable media can obtain a stream of audio waveform data that represents speech involving a plurality of speakers. As the stream of audio waveform data is obtained, a plurality of audio chunks can be determined. An audio chunk can be associated with one or more identity embeddings. The stream of audio waveform data can be segmented into a plurality of segments based on the plurality of audio chunks and respective identity embeddings associated with the plurality of audio chunks. A segment can be associated with a speaker included in the plurality of speakers. Information describing the plurality of segments associated with the stream of audio waveform data can be provided.
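
One way to picture the chunk-to-segment step in this abstract is the sketch below. It is an assumption-laden illustration, not the claimed system: the abstract does not say how identity embeddings are compared or how chunks are merged, so the cosine threshold `min_sim`, the first-seen embedding used as each speaker's reference, and the name `segment_stream` are all hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Chunk = Tuple[float, float, Sequence[float]]   # (start, end, identity embedding)
Segment = Tuple[float, float, int]             # (start, end, speaker index)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def segment_stream(chunks: List[Chunk], min_sim: float = 0.8) -> List[Segment]:
    """Assign each incoming chunk to a speaker and merge consecutive
    chunks of the same speaker into one segment."""
    speakers: List[Sequence[float]] = []   # reference embedding per speaker
    segments: List[Segment] = []
    for start, end, emb in chunks:
        # Find the most similar known speaker at or above the threshold.
        best: Optional[int] = None
        best_sim = min_sim
        for i, ref in enumerate(speakers):
            sim = cosine(emb, ref)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            # No match: register a new speaker with this embedding.
            speakers.append(emb)
            best = len(speakers) - 1
        if segments and segments[-1][2] == best:
            # Same speaker as the previous segment: extend it.
            segments[-1] = (segments[-1][0], end, best)
        else:
            segments.append((start, end, best))
    return segments


# Hypothetical chunks with 2-dimensional embeddings, for illustration only.
print(segment_stream([
    (0, 1, [1.0, 0.0]),
    (1, 2, [0.9, 0.1]),
    (2, 3, [0.0, 1.0]),
    (3, 4, [1.0, 0.05]),
]))
```

Because chunks are processed one at a time in arrival order, the sketch fits the streaming setting the abstract describes, where segments are built as the audio is obtained.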
-
Publication No.: US11978472B2
Publication Date: 2024-05-07
Application No.: US17210108
Filing Date: 2021-03-23
Applicant: Otter.ai, Inc.
Inventors: Yun Fu, Simon Lau, Kaisuke Nakajima, Julius Cheng, Gelei Chen, Sam Song Liang, James Mason Altreuter, Kean Kheong Chin, Zhenhao Ge, Hitesh Anand Gupta, Xiaoke Huang, James Francis McAteer, Brian Francis Williams, Tao Xing
CPC Classification: G10L21/10, G06F16/438, G10L17/02, G10L17/04, G10L17/22, H04L63/104
Abstract: A system for processing and presenting a conversation includes a sensor, a processor, and a presenter. The sensor is configured to capture an audio-form conversation. The processor is configured to automatically transform the audio-form conversation into a transformed conversation. The transformed conversation includes a synchronized text, wherein the synchronized text is synchronized with the audio-form conversation. The presenter is configured to present the transformed conversation including the synchronized text and the audio-form conversation. The presenter is further configured to present the transformed conversation to be navigable, searchable, assignable, editable, and shareable.