-
Publication No.: US12080317B2
Publication Date: 2024-09-03
Application No.: US17639317
Filing Date: 2020-08-27
IPC Classification: G10L15/20 , G10L21/02 , G10L21/0208 , G10L21/0316
CPC Classification: G10L21/0316 , G10L15/20 , G10L21/0208 , G10L2021/02082
Abstract: An apparatus and method of pre-conditioning audio for machine perception. Machine perception differs from human perception, and different processing parameters are used for machine perception applications (e.g., speech to text processing) as compared to those used for human perception applications (e.g., voice communications). These different parameters may result in pre-conditioned audio that is worsened for human perception yet improved for machine perception.
-
Publication No.: US12039988B1
Publication Date: 2024-07-16
Application No.: US18424695
Filing Date: 2024-01-26
Applicant: Nantong University
Inventors: Shibing Zhang , Jianrong Wu
CPC Classification: G10L21/02 , G10L15/063 , G10L15/20 , G10L25/51 , G10L2015/0631
Abstract: The present application discloses a method and a system for saturation diving heliumspeech unscrambling based on multi-objective optimization. In a system including at least a diver and a filter, a working language phonetic symbol library and a common working word library for divers are constructed. The divers read them one by one, and a phonetic symbol standard speech library, a phonetic symbol heliumspeech library and a common working word speech library are generated. The filter uses the multi-objective optimization algorithm to design its impulse response coefficients, corrects and unscrambles the tagged and sampled heliumspeech signal word by word, and continuously updates the impulse response coefficients to complete the heliumspeech unscrambling.
-
Publication No.: US12033649B2
Publication Date: 2024-07-09
Application No.: US17793539
Filing Date: 2021-01-18
IPC Classification: G10L21/02
CPC Classification: G10L21/02
Abstract: Embodiments are disclosed for noise floor estimation and noise reduction. In an embodiment, a method comprises: obtaining an audio signal; dividing the audio signal into a plurality of buffers; determining time-frequency samples for each buffer of the audio signal; for each buffer and for each frequency, determining a median (or mean) and a measure of an amount of variation of energy based on the samples in the buffer and samples in neighboring buffers that together span a specified time range of the audio signal; combining the median (or mean) and the measure of the amount of variation of energy into a cost function; for each frequency: determining a signal energy of a particular buffer of the audio signal that corresponds to a minimum value of the cost function; selecting the signal energy as the estimated noise floor of the audio signal; and reducing, using the estimated noise floor, noise in the audio signal.
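The steps in this abstract can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: it assumes one energy value per buffer and frequency bin, uses the population standard deviation as the "measure of variation", combines the two terms as `median + weight * deviation`, and reduces noise by plain spectral subtraction. The function names and the `half_span` and `weight` parameters are hypothetical.

```python
from statistics import median, pstdev


def estimate_noise_floor(energy, half_span=2, weight=1.0):
    """Estimate a per-frequency noise floor.

    energy[b][f] is the energy of buffer b at frequency bin f (assumed layout).
    For each bin, every buffer is scored with a cost built from the median and
    the spread of energies in a window of neighboring buffers; the energy of
    the lowest-cost buffer is taken as the noise floor for that bin.
    """
    n_buffers = len(energy)
    n_freqs = len(energy[0])
    floor = []
    for f in range(n_freqs):
        best_cost = None
        best_energy = None
        for b in range(n_buffers):
            lo = max(0, b - half_span)
            hi = min(n_buffers, b + half_span + 1)
            window = [energy[k][f] for k in range(lo, hi)]
            # Assumed cost function: low and steady energy scores best.
            cost = median(window) + weight * pstdev(window)
            if best_cost is None or cost < best_cost:
                best_cost = cost
                best_energy = energy[b][f]
        floor.append(best_energy)
    return floor


def reduce_noise(energy, floor):
    """Subtract the estimated floor from every buffer, clamping at zero."""
    return [[max(e - floor[f], 0.0) for f, e in enumerate(row)] for row in energy]


if __name__ == "__main__":
    # One frequency bin: three quiet buffers followed by three loud ones.
    energy = [[1.0], [1.0], [1.0], [9.0], [9.0], [9.0]]
    floor = estimate_noise_floor(energy, half_span=1)
    print(floor)
    print(reduce_noise(energy, floor))
```

Penalizing variation as well as level favors stretches that are both quiet and stationary, which is the usual intuition for where only background noise is present.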
-
Publication No.: US20240212699A1
Publication Date: 2024-06-27
Application No.: US18568678
Filing Date: 2022-06-09
Applicant: COCHL.INC.
Inventors: Yoon Chang HAN , Su Bin LEE , Jeong Soo PARK , Il Young JEONG , Don Moon LEE , Hyun Gui LIM
Abstract: An audio quality conversion device according to the present invention includes: a control unit having, mounted therein, an artificial neural network that learns using a plurality of pieces of audio data recorded in recording environments differing with respect to a predetermined audio event, and environmental data related to the recording environments corresponding to respective audio data; and an audio input unit receiving outside sounds to generate audio recording data, wherein the control unit converts, on the basis of a learning result of the artificial neural network, the audio recording data generated by means of the audio input unit.
-
Publication No.: US12014747B2
Publication Date: 2024-06-18
Application No.: US18308293
Filing Date: 2023-04-27
IPC Classification: G10L19/26 , G10L19/02 , G10L19/028 , G10L19/03 , G10L19/032 , G10L19/04 , G10L19/12 , G10L19/16 , G10L21/007 , G10L21/02 , G10L21/0208 , G10L21/0324 , G10L21/038 , G10L25/15 , G10L25/18
CPC Classification: G10L19/265 , G10L19/0204 , G10L19/03 , G10L19/032 , G10L19/12 , G10L19/16 , G10L19/26 , G10L21/007 , G10L21/02 , G10L21/0208 , G10L21/0324 , G10L25/15 , G10L25/18 , G10L19/02 , G10L19/028 , G10L19/04 , G10L21/038
Abstract: An audio encoder for encoding an audio signal having a lower frequency band and an upper frequency band includes: a detector for detecting a peak spectral region in the upper frequency band of the audio signal; a shaper for shaping the lower frequency band using shaping information for the lower band and for shaping the upper frequency band using at least a portion of the shaping information for the lower band, wherein the shaper is configured to additionally attenuate spectral values in the detected peak spectral region in the upper frequency band; and a quantizer and coder stage for quantizing a shaped lower frequency band and a shaped upper frequency band and for entropy coding quantized spectral values from the shaped lower frequency band and the shaped upper frequency band.
-
Publication No.: US20240194189A1
Publication Date: 2024-06-13
Application No.: US18077180
Filing Date: 2022-12-07
Abstract: An electronic device includes a far-field voice (FFV) processor including a source selection module. The source selection module receives a set of audio signals and determines, for each audio signal, whether the audio signal is relevant to an application. The source selection module receives several separate probability computations, with each probability computation providing a probability of the presence of a particular characteristic. Additionally, the source selection module receives one or more applications as well as relevance information (e.g., one or more relevant characteristics) associated with the one or more applications. The source selection module can use the respective probabilities to determine if one or more characteristics are present in an audio signal, and compare the characteristic(s) to the relevance information for the application. Using this information, the source selection module can determine, for each audio signal, to which respective application the audio signal is relevant.
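The selection logic described in this abstract can be illustrated with a small sketch. It is only one possible reading: it assumes a characteristic counts as "present" when its probability reaches a fixed threshold, and that a signal is relevant to an application when at least one present characteristic appears in that application's relevance list. The names `route_signals`, `threshold`, and the example characteristics are hypothetical.

```python
def route_signals(signal_probabilities, app_relevance, threshold=0.5):
    """Map each audio signal to the applications it appears relevant to.

    signal_probabilities: {signal_id: {characteristic: probability}}
    app_relevance:        {app_name: [relevant characteristics]}
    """
    routing = {}
    for signal_id, probs in signal_probabilities.items():
        # Characteristics judged present in this signal (assumed thresholding).
        present = {name for name, p in probs.items() if p >= threshold}
        routing[signal_id] = [
            app for app, wanted in app_relevance.items() if present & set(wanted)
        ]
    return routing


if __name__ == "__main__":
    signals = {
        "mic_a": {"speech": 0.9, "music": 0.1},
        "mic_b": {"speech": 0.2, "music": 0.8},
    }
    apps = {"assistant": ["speech"], "music_id": ["music"]}
    print(route_signals(signals, apps))
```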
-
Publication No.: US11996091B2
Publication Date: 2024-05-28
Application No.: US16989844
Filing Date: 2020-08-10
IPC Classification: G10L15/20 , G10L15/02 , G10L15/16 , G10L15/22 , G10L17/06 , G10L21/02 , G10L21/0272 , G10L21/0208
CPC Classification: G10L15/20 , G10L15/02 , G10L15/16 , G10L15/22 , G10L17/06 , G10L21/02 , G10L21/0272 , G10L2015/223 , G10L2021/02087
Abstract: A mixed speech recognition method, a mixed speech recognition apparatus, and a computer-readable storage medium are provided. The mixed speech recognition method includes: monitoring a speech input and detecting an enrollment speech and a mixed speech; acquiring speech features of a target speaker based on the enrollment speech; and determining speech belonging to the target speaker in the mixed speech based on the speech features of the target speaker. The enrollment speech includes preset speech information, and the mixed speech is non-enrollment speech inputted after the enrollment speech.
-
Publication No.: US11978467B2
Publication Date: 2024-05-07
Application No.: US17870759
Filing Date: 2022-07-21
Applicant: Dell Products, LP
Inventors: Peng Lip Goh , Deeder M. Aurongzeb , Eng Kang Chng
IPC Classification: H04R3/00 , G10L21/02 , G10L21/0216 , G10L25/84 , H04R1/08 , H04R1/32 , G10L21/0208
CPC Classification: G10L21/0216 , G10L25/84 , H04R1/08 , H04R1/323 , G10L2021/02087
Abstract: A speakerphone includes a processor, a memory device, a power management unit, a first microphone to receive audio waves, a second microphone to receive audio waves, and a third microphone to receive audio waves. The speakerphone may also include a digital signal processor (DSP) to detect a single-user mode activated at the speakerphone, process the audio waves received by the first microphone, second microphone, and third microphone to determine the wave phases of the audio waves received by the first microphone, second microphone, and third microphone, calculate a direction of a voice of a single user relative to the speakerphone, and process the voice of the single user and filter other voices detected by the first microphone, second microphone, and third microphone from the user's voice.
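How a direction can be computed from the phases at three microphones is sketched below. This is a textbook far-field, single-frequency estimate, not the method claimed in the patent: it assumes a plane wave, known 2D microphone positions, and spacing below half a wavelength so that phase differences do not wrap ambiguously. The function name, the microphone layout, and the speed-of-sound constant are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def direction_from_phases(mics, phases, freq_hz):
    """Return the source bearing in degrees from three microphone phases.

    mics:   three (x, y) positions in metres
    phases: phase in radians of the same frequency component at each mic
    For a plane wave arriving from unit direction u, the phase difference
    between mic i and mic 0 is k * (p_i - p_0) . u, with k = 2*pi*f/c.
    Two differences give a 2x2 linear system for u.
    """
    (x0, y0), (x1, y1), (x2, y2) = mics
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND

    def wrap(d):
        # Wrap a phase difference into (-pi, pi].
        return math.atan2(math.sin(d), math.cos(d))

    d1 = wrap(phases[1] - phases[0])
    d2 = wrap(phases[2] - phases[0])
    a11, a12 = x1 - x0, y1 - y0
    a21, a22 = x2 - x0, y2 - y0
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("microphones must not be collinear")
    # Cramer's rule for A * u = d / k.
    ux = (d1 * a22 - a12 * d2) / (k * det)
    uy = (a11 * d2 - a21 * d1) / (k * det)
    return math.degrees(math.atan2(uy, ux))


if __name__ == "__main__":
    mics = [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05)]
    freq = 1000.0
    k = 2.0 * math.pi * freq / SPEED_OF_SOUND
    u = (math.cos(math.radians(60.0)), math.sin(math.radians(60.0)))
    phases = [k * (x * u[0] + y * u[1]) for x, y in mics]
    print(direction_from_phases(mics, phases, freq))
```

Once a bearing is known, a beamformer or spatial filter could favor that direction and attenuate other talkers; that stage is outside this sketch.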
-
Publication No.: US11978464B2
Publication Date: 2024-05-07
Application No.: US17757122
Filing Date: 2021-01-22
Applicant: GOOGLE LLC
IPC Classification: G10L19/00 , G10L19/038 , G10L19/04 , G10L21/02 , G06N3/02
CPC Classification: G10L19/038 , G10L19/04 , G10L21/02 , G06N3/02 , G10L19/00
Abstract: A method includes receiving sampled audio data corresponding to utterances and training a machine learning (ML) model, using the sampled audio data, to generate a high-fidelity audio stream from a low bitrate input bitstream. The training of the ML model includes de-emphasizing the influence of low-probability distortion events in the sampled audio data on the trained ML model, where the de-emphasizing of the distortion events is achieved by the inclusion of a term in an objective function of the ML model, which term encourages low-variance predictive distributions of a next sample in the sampled audio data, based on previous samples of the audio data.
-
Publication No.: US11934636B2
Publication Date: 2024-03-19
Application No.: US18190530
Filing Date: 2023-03-27
Applicant: Snap Inc.
Inventor: Jesse Chand
IPC Classification: G06F3/0482 , G06F3/04817 , G06F3/16 , G10L21/003 , G10L21/02 , G10L21/0316 , G10L25/78 , G11B27/031 , G11B27/34
CPC Classification: G06F3/0482 , G06F3/167 , G11B27/031 , G11B27/34 , G06F3/04817 , G06F3/165 , G06F2203/04803 , G10L21/003 , G10L21/02 , G10L21/0316 , G10L25/78
Abstract: Disclosed are systems, methods, and computer-readable storage media to provide voice driven dynamic menus. One aspect disclosed is a method including receiving, by an electronic device, video data and audio data, displaying, by the electronic device, a video window, determining, by the electronic device, whether the audio data includes a voice signal, displaying, by the electronic device, a first menu in the video window in response to the audio data including a voice signal, displaying, by the electronic device, a second menu in the video window in response to a voice signal being absent from the audio data, receiving, by the electronic device, input from the displayed menu, and writing, by the electronic device, to an output device based on the received input.