-
Publication No.: US20240357212A1
Publication date: 2024-10-24
Application No.: US18745720
Filing date: 2024-06-17
Inventors: Stephan SCHREINER, Simone NEUKAM, Harald FUCHS, Jan PLOGSTIES, Stefan DOEHLA
IPC classification: H04N21/81, G10L19/00, G10L19/16, H04N21/435, H04N21/4363, H04N21/439, H04N21/442, H04N21/485
CPC classification: H04N21/8106, G10L19/00, G10L19/167, H04N21/435, H04N21/4363, H04N21/4394, H04N21/44222, H04N21/44227, H04N21/4852
Abstract: Audio data processor, having: a receiver interface for receiving encoded audio data and metadata related to the encoded audio data; a metadata parser for parsing the metadata to determine an audio data manipulation possibility; an interaction interface for receiving an interaction input and for generating, from the interaction input, interaction control data related to the audio data manipulation possibility; and a data stream generator for obtaining the interaction control data and the encoded audio data and the metadata and for generating an output data stream, the output data stream having the encoded audio data, at least a portion of the metadata, and the interaction control data.
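The data flow in this abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `OutputStream` class, the dictionary layout of the metadata, the `gain_range_db` field, and the choice to clamp a requested gain to the allowed range are assumptions made only to show how the four components might fit together.

```python
from dataclasses import dataclass


@dataclass
class OutputStream:
    encoded_audio: bytes        # passed through untouched, never decoded here
    metadata: dict              # at least a portion of the received metadata
    interaction_control: dict   # derived from the interaction input


def parse_manipulation_possibilities(metadata: dict) -> dict:
    """Metadata parser. Assumed layout: each audio object may list an
    allowed gain range in dB under the hypothetical key 'gain_range_db'."""
    return {
        name: obj["gain_range_db"]
        for name, obj in metadata["objects"].items()
        if "gain_range_db" in obj
    }


def make_interaction_control(possibilities: dict, user_input: dict) -> dict:
    """Interaction interface. Requests for objects that cannot be
    manipulated are dropped; other gains are clamped (an assumption)."""
    control = {}
    for name, gain_db in user_input.items():
        if name in possibilities:
            low, high = possibilities[name]
            control[name] = max(low, min(high, gain_db))
    return control


def generate_output_stream(encoded_audio: bytes, metadata: dict,
                           user_input: dict) -> OutputStream:
    """Data stream generator: bundle audio, metadata and control data."""
    possibilities = parse_manipulation_possibilities(metadata)
    control = make_interaction_control(possibilities, user_input)
    return OutputStream(encoded_audio, metadata, control)
```

The point of the sketch is that the encoded audio is forwarded as received, so a downstream device can still decode it and apply the interaction control data itself.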
-
Publication No.: US12118980B2
Publication date: 2024-10-15
Application No.: US18346657
Filing date: 2023-07-03
Applicant: Telepathy Labs, Inc.
Inventors: Martin Reber, Vijeta Avijeet
IPC classification: G10L25/30, G06F18/10, G06F18/21, G06F18/2135, G06N3/02, G06N3/042, G06N3/08, G06N5/02, G10L13/04, G10L13/08, G10L19/00
CPC classification: G10L13/08, G06F18/10, G06F18/2135, G06F18/217, G06N3/02, G06N3/042, G06N3/08, G06N5/02, G10L13/04, G10L19/00
Abstract: A technique improves training and speech quality of a text-to-speech (TTS) system having an artificial intelligence, such as a neural network. The TTS system is organized as a front-end subsystem and a back-end subsystem. The front-end subsystem is configured to provide analysis and conversion of text into input vectors, each having at least a base frequency, f0, a phoneme duration, and a phoneme sequence that is processed by a signal generation unit of the back-end subsystem. The signal generation unit includes the neural network interacting with a pre-existing knowledgebase of phonemes to generate audible speech from the input vectors. The technique applies an error signal from the neural network to correct imperfections of the pre-existing knowledgebase of phonemes to generate audible speech signals. A back-end training system is configured to train the signal generation unit by applying psychoacoustic principles to improve quality of the generated audible speech signal.
-
Publication No.: US12094885B2
Publication date: 2024-09-17
Application No.: US17209063
Filing date: 2021-03-22
Inventors: Min Soo Kim
IPC classification: G10L13/04, G06F3/041, G06F3/044, G06F18/10, G06F18/21, G06F18/2135, G06N3/02, G06N3/04, G06N3/042, G06N3/08, G06N5/02, G10L13/08, G10L19/00, H01L27/12, H05K1/02, H05K1/14, G02F1/1345, H10K59/131, H10K77/10
CPC classification: H01L27/124, G06F3/0412, G06F3/0446, G06F18/10, G06F18/2135, G06F18/217, G06N3/02, G06N3/042, G06N3/08, G06N5/02, G10L13/04, G10L13/08, G10L19/00, H01L27/1218, H05K1/028, H05K1/147, G02F1/13452, G02F2201/56, G06F2203/04102, H05K2201/056, H05K2201/10128, H05K2201/10136, H10K59/131, H10K77/10
Abstract: A display device includes: a display panel having a first side and a second side facing the first side in a first direction, the display panel including a recessed portion having a recessed shape from the first side of the display panel toward the second side in the first direction, the recessed portion including a side extended from the first side of the display panel; a pad portion disposed on a front surface of the display panel, the pad portion being adjacent to at least one side of the recessed portion; and a flexible printed circuit (FPC) connected to the pad portion, the FPC being bent to a rear surface of the display panel around the at least one side of the recessed portion, the rear surface opposing the front surface.
-
Publication No.: US12094480B2
Publication date: 2024-09-17
Application No.: US18228109
Filing date: 2023-07-31
Inventors: Lars Villemoes, Heiko Purnhagen, Per Ekstrand
IPC classification: G10L19/00, G10L19/008, G10L19/24, G10L19/26
CPC classification: G10L19/26, G10L19/008, G10L19/24
Abstract: A method for decoding an encoded audio bitstream is disclosed. The method includes receiving the encoded audio bitstream and decoding the audio data to generate a decoded lowband audio signal. The method further includes extracting high frequency reconstruction metadata and filtering the decoded lowband audio signal with an analysis filterbank to generate a filtered lowband audio signal. The method also includes extracting a flag indicating whether either spectral translation or harmonic transposition is to be performed on the audio data and regenerating a highband portion of the audio signal using the filtered lowband audio signal and the high frequency reconstruction metadata in accordance with the flag.
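The flag-controlled regeneration step can be pictured with a toy Python sketch. It is a strong simplification and not the patented method: it operates on one magnitude per subband rather than on complex filterbank samples, it models harmonic transposition only as an order-2 band mapping (band k is fed from band k // 2), and it treats the high frequency reconstruction metadata as a plain list of per-band gains. The function name and parameters are hypothetical.

```python
def regenerate_highband(lowband, envelope_gains, use_harmonic_transposition):
    """Return highband magnitudes for the bands directly above the lowband.

    lowband: magnitudes of subbands 0 .. n-1 (the filtered lowband signal).
    envelope_gains: one gain per highband subband (stand-in for the
        high frequency reconstruction metadata).
    use_harmonic_transposition: the extracted flag.
    """
    n = len(lowband)
    if len(envelope_gains) > n:
        raise ValueError("this toy covers at most one octave above the lowband")

    highband = []
    for i, gain in enumerate(envelope_gains):
        k = n + i  # absolute index of the highband subband being rebuilt
        if use_harmonic_transposition:
            source = k // 2   # frequency f is regenerated from f / 2
        else:
            source = k - n    # spectral translation: shift up by n bands
        highband.append(gain * lowband[source])
    return highband
```

With spectral translation the lowband is copied upward as a block, which preserves the spacing between spectral peaks; with the harmonic mapping each highband band is tied to the band at half its frequency, which keeps harmonic relationships intact. The flag lets the encoder pick whichever suits the content.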
-
Publication No.: US12094476B2
Publication date: 2024-09-17
Application No.: US17781978
Filing date: 2020-12-02
CPC classification: G10L19/167
Abstract: Embodiments are disclosed for channel-based audio (CBA) (e.g., 22.2-ch audio) to object-based audio (OBA) conversion. The conversion includes converting CBA metadata to object audio metadata (OAMD) and reordering the CBA channels based on channel shuffle information derived in accordance with channel ordering constraints of the OAMD. The OBA with reordered channels is rendered in a playback device using the OAMD or in a source device, such as a set-top box or audio/video recorder. In an embodiment, the CBA metadata includes signaling that indicates a specific OAMD representation to be used in the conversion of the metadata. In an embodiment, pre-computed OAMD is transmitted in a native audio bitstream (e.g., AAC) for transmission (e.g., over HDMI) or for rendering in a source device. In an embodiment, pre-computed OAMD is transmitted in a transport layer bitstream (e.g., ISO BMFF, MPEG4 audio bitstream) to a playback device or source device.
-
Publication No.: US12080303B2
Publication date: 2024-09-03
Application No.: US18514393
Filing date: 2023-11-20
IPC classification: G10L19/025, G10L19/00, G10L19/008, G10L19/02, G10L19/032, G10L19/16, H03H17/02, H04B1/66, H04B3/20, H04B3/21, H04L65/70, H04L65/75, H04N19/44, H04N19/625, H04N21/233
CPC classification: G10L19/025, G10L19/0017, G10L19/008, G10L19/0204, G10L19/032, G10L19/167, H03H17/0272, H04B1/667, H04B3/20, H04B3/21, H04L65/70, H04L65/75, H04N19/45, H04N19/625, H04N21/233, G10L19/0212
Abstract: An encoder operable to filter audio signals into a plurality of frequency band components, generate quantized digital components for each band, identify a potential for pre-echo events within the generated quantized digital components, generate an approximate signal by decoding the quantized digital components using inverse pulse code modulation, generate an error signal by comparing the approximate signal with the sampled audio signal, and process the error signal and quantized digital components. The encoder is operable to process the error signal by processing delayed audio signals and Q band values, determining the potential for pre-echo events from the Q band values, and determining scale factors and MDCT block sizes for the potential for pre-echo events.
-
Publication No.: US20240282315A1
Publication date: 2024-08-22
Application No.: US18653833
Filing date: 2024-05-02
Inventors: Kristofer Kjoerling
IPC classification: G10L19/00, G10L19/02, G10L21/038
CPC classification: G10L19/0017, G10L19/0204, G10L21/038
Abstract: The application relates to HFR (High Frequency Reconstruction/Regeneration) of audio signals. In particular, the application relates to a method and system for performing HFR of audio signals having large variations in energy level across the low frequency range which is used to reconstruct the high frequencies of the audio signal. A system configured to generate a plurality of high frequency subband signals covering a high frequency interval from a plurality of low frequency subband signals is described. The system comprises means for receiving the plurality of low frequency subband signals; means for receiving a set of target energies, each target energy covering a different target interval within the high frequency interval and being indicative of the desired energy of one or more high frequency subband signals lying within the target interval; means for generating the plurality of high frequency subband signals from the plurality of low frequency subband signals and from a plurality of spectral gain coefficients associated with the plurality of low frequency subband signals, respectively; and means for adjusting the energy of the plurality of high frequency subband signals using the set of target energies.
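The final step of this abstract, adjusting the generated high frequency subbands to a set of target energies, can be sketched in Python as follows. The sketch covers only that step and makes assumptions the abstract does not state: energy is taken as the mean of the squared samples over a target interval, each interval is given as a hypothetical `(start, stop, target_energy)` tuple over subband indices, and a silent interval is left unchanged. The spectral gain coefficients applied to the low frequency subbands are not modeled.

```python
import math


def adjust_to_target_energies(high_subbands, target_intervals):
    """Scale high frequency subband signals to match target energies.

    high_subbands: list of subband signals, each a list of real samples.
    target_intervals: list of (start, stop, target_energy); each tuple
        covers subbands start .. stop-1 (hypothetical representation).
    """
    adjusted = [list(band) for band in high_subbands]  # do not mutate input
    for start, stop, target_energy in target_intervals:
        samples = [x for band in high_subbands[start:stop] for x in band]
        if not samples:
            continue
        current_energy = sum(x * x for x in samples) / len(samples)
        if current_energy == 0.0:
            continue  # nothing to scale in a silent interval
        # Energy scales with the square of amplitude, hence the square root.
        gain = math.sqrt(target_energy / current_energy)
        for index in range(start, stop):
            adjusted[index] = [gain * x for x in high_subbands[index]]
    return adjusted
```

One gain is shared by all subbands inside a target interval, so the relative levels within the interval are kept while the interval as a whole reaches the requested energy.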
-
Publication No.: US12051430B2
Publication date: 2024-07-30
Application No.: US18195015
Filing date: 2023-05-09
Inventors: Takehiro Moriya, Yutaka Kamamoto, Noboru Harada
IPC classification: G10L19/00, G10L19/038, G10L19/07, G10L21/00, G10L19/005
CPC classification: G10L19/07, G10L19/038, G10L2019/0016, G10L19/005
Abstract: A coding method and a decoding method are provided which can use in combination a predictive coding and decoding method which is a coding and decoding method that can accurately express coefficients which are convertible into linear prediction coefficients with a small code amount and a coding and decoding method that can obtain correctly, by decoding, coefficients which are convertible into linear prediction coefficients of the present frame if a linear prediction coefficient code of the present frame is correctly input to a decoding device. A coding device includes: a predictive coding unit that obtains a first code by coding a differential vector formed of differentials between a vector of coefficients which are convertible into linear prediction coefficients of more than one order of the present frame and a prediction vector containing at least a predicted vector from a past frame, and obtains a quantization differential vector corresponding to the first code; and a non-predictive coding unit that generates a second code by coding a correction vector which is formed of differentials between the vector of the coefficients which are convertible into the linear prediction coefficients of more than one order of the present frame and the quantization differential vector or formed of some of elements of the differentials.
-
Publication No.: US12035018B2
Publication date: 2024-07-09
Application No.: US18347546
Filing date: 2023-07-05
Inventors: Stephan Schreiner, Simone Neukam, Harald Fuchs, Jan Plogsties, Stefan Doehla
IPC classification: H04N21/81, G10L19/00, G10L19/16, H04N21/435, H04N21/4363, H04N21/439, H04N21/442, H04N21/485
CPC classification: H04N21/8106, G10L19/00, G10L19/167, H04N21/435, H04N21/4363, H04N21/4394, H04N21/44222, H04N21/44227, H04N21/4852
Abstract: Audio data processor, having: a receiver interface for receiving encoded audio data and metadata related to the encoded audio data; a metadata parser for parsing the metadata to determine an audio data manipulation possibility; an interaction interface for receiving an interaction input and for generating, from the interaction input, interaction control data related to the audio data manipulation possibility; and a data stream generator for obtaining the interaction control data and the encoded audio data and the metadata and for generating an output data stream, the output data stream having the encoded audio data, at least a portion of the metadata, and the interaction control data.
-
Publication No.: US12033644B2
Publication date: 2024-07-09
Application No.: US17479912
Filing date: 2021-09-20
Applicant: SMULE, INC.
Inventors: Parag Chordia, Mark Godfrey, Alexander Rae, Prerna Gupta, Perry R. Cook
CPC classification: G10L19/02, G10H1/366, G10L19/00, G10L21/055, G10H2210/051, G10H2240/141, G10H2250/235
Abstract: Captured vocals may be automatically transformed using advanced digital signal processing techniques that provide captivating applications, and even purpose-built devices, in which mere novice user-musicians may generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing tracks and pitch corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In some cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, which may employ different signal processing and different automated transformations, may nonetheless be understood as speech-to-rap variations on the theme.