-
Publication No.: US11837220B2
Publication Date: 2023-12-05
Application No.: US17308800
Filing Date: 2021-05-05
Applicant: Electronics and Telecommunications Research Institute, The Trustees of Indiana University
Inventor: Minje Kim, Mi Suk Lee, Seung Kwon Beack, Jongmo Sung, Tae Jin Lee, Jin Soo Choi, Kai Zhen
Abstract: Disclosed is a speech processing apparatus and method using a densely connected hybrid neural network. The speech processing method includes inputting a time domain sample of N*1 dimension for an input speech into a densely connected hybrid network; passing the time domain sample through a plurality of dense blocks in the densely connected hybrid network; reshaping the time domain samples into M subframes by passing the time domain samples through the plurality of dense blocks; inputting the M subframes into gated recurrent unit (GRU) components of N/M-dimension; and outputting clean speech, in which noise is removed from the input speech, by passing the M subframes through the GRU components.
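The reshaping step in this abstract (N time-domain samples split into M subframes, each of N/M dimension) can be illustrated with a minimal sketch. It assumes the subframes are consecutive, non-overlapping slices, which the abstract does not state; the function name `reshape_into_subframes` is hypothetical, and the dense blocks and GRU components are not modeled.

```python
def reshape_into_subframes(samples, m):
    """Split a flat list of N samples into M consecutive subframes of N/M samples.

    Assumption: subframes are contiguous and non-overlapping.
    """
    n = len(samples)
    if m <= 0 or n % m != 0:
        raise ValueError("N must be a multiple of a positive M")
    width = n // m  # N/M, the per-subframe dimension fed to each GRU step
    return [samples[i * width:(i + 1) * width] for i in range(m)]


# Example: N = 12 samples reshaped into M = 4 subframes of 3 samples each.
frame = [float(i) for i in range(12)]
subframes = reshape_into_subframes(frame, 4)
```

Each of the M subframes would then serve as one time step of the recurrent part of the network.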
-
Publication No.: US11562757B2
Publication Date: 2023-01-24
Application No.: US17377157
Filing Date: 2021-07-15
Inventor: Seung Kwon Beack, Jongmo Sung, Mi Suk Lee, Tae Jin Lee, Woo-taek Lim, Inseon Jang, Jin Soo Choi
IPC: G10L19/06, G10L19/032
Abstract: An audio signal encoding method performed by an encoder includes identifying a time-domain audio signal in a unit of blocks, quantizing a linear prediction coefficient extracted, using frequency-domain linear predictive coding (LPC), from a combined block in which a current original block of the audio signal and a previous original block chronologically adjacent to the current original block are combined, generating a temporal envelope by dequantizing the quantized linear prediction coefficient, extracting a residual signal from the combined block based on the temporal envelope, quantizing the residual signal by one of time-domain quantization and frequency-domain quantization, and transforming the quantized residual signal and the quantized linear prediction coefficient into a bitstream.
-
Publication No.: US11456001B2
Publication Date: 2022-09-27
Application No.: US16814103
Filing Date: 2020-03-10
Applicant: Electronics and Telecommunications Research Institute, Kwangwoon University Industry-Academic Collaboration Foundation
Inventor: Seung Kwon Beack, Jongmo Sung, Mi Suk Lee, Tae Jin Lee, Hochong Park
IPC: G10L19/02, G06N3/04, G10L21/038, G10L19/032
Abstract: Disclosed are a method of encoding a high band of an audio, a method of decoding a high band of an audio, and an encoder and a decoder for performing the methods. The method of decoding a high band of an audio, the method performed by a decoder, includes identifying a parameter extracted through a first neural network, identifying side information extracted through a second neural network, and restoring a high band of an audio by applying the parameter and the side information to a third neural network.
-
Publication No.: US11133015B2
Publication Date: 2021-09-28
Application No.: US16180298
Filing Date: 2018-11-05
Inventor: Seung Kwon Beack, Woo-taek Lim, Jongmo Sung, Mi Suk Lee, Tae Jin Lee, Hui Yong Kim
IPC: G10L19/04, G10L25/30, G10L19/008
Abstract: A method of predicting a channel parameter of an original signal from a downmix signal is disclosed. The method may include generating an input feature map to be used to predict a channel parameter of the original signal based on a downmix signal of an original signal, determining an output feature map including a predicted parameter to be used to predict the channel parameter by applying the input feature map to a neural network, generating a label map including information associated with the channel parameter of the original signal, and predicting the channel parameter of the original signal by comparing the output feature map and the label map.
-
Publication No.: US09807340B2
Publication Date: 2017-10-31
Application No.: US14951005
Filing Date: 2015-11-24
Inventor: In Ki Hwang, Mi Suk Lee
CPC classification number: H04N7/144, H04N13/111, H04N13/128, H04N13/239, H04N13/271, H04N13/383
Abstract: The present invention relates to a method, and an apparatus therefor, for providing a natural eye-contact function to attendees of a video conference. When two or more remote attendees are at one site during a video conference using a video conference system, the method uses a stereo image and a depth image to estimate a precise depth value of the occlusion region and to improve the quality of the composite eye-contact image.
-
Publication No.: US11862183B2
Publication Date: 2024-01-02
Application No.: US17368390
Filing Date: 2021-07-06
Inventor: Jongmo Sung, Seung Kwon Beack, Mi Suk Lee, Tae Jin Lee, Woo-taek Lim, Inseon Jang
IPC: G10L19/032
CPC classification number: G10L19/032
Abstract: An audio signal encoding and decoding method using a neural network model, a method of training the neural network model, and an encoder and decoder performing the methods are disclosed. The encoding method includes computing the first feature information of an input signal using a recurrent encoding model, computing an output signal from the first feature information using a recurrent decoding model, calculating a residual signal by subtracting the output signal from the input signal, computing the second feature information of the residual signal using a nonrecurrent encoding model, and converting the first feature information and the second feature information to a bitstream.
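The two-stage structure of this encoding method (code the input, decode it, then code the residual separately) can be sketched as follows. This is only an illustration of the data flow: the recurrent and nonrecurrent neural models are replaced by hypothetical stand-ins (a coarse and a fine uniform quantizer), and the `decode` function is an assumption, since the abstract describes only the encoding steps in detail.

```python
def coarse_encode(signal, step=0.5):
    """Stand-in for the recurrent encoding model (NOT a neural network)."""
    return [round(x / step) for x in signal]


def coarse_decode(codes, step=0.5):
    """Stand-in for the recurrent decoding model."""
    return [c * step for c in codes]


def fine_encode(signal, step=0.05):
    """Stand-in for the nonrecurrent encoding model applied to the residual."""
    return [round(x / step) for x in signal]


def fine_decode(codes, step=0.05):
    """Assumed counterpart of fine_encode; not described in the abstract."""
    return [c * step for c in codes]


def encode(signal):
    first = coarse_encode(signal)                        # first feature information
    output = coarse_decode(first)                        # output signal of stage one
    residual = [x - y for x, y in zip(signal, output)]   # input minus output
    second = fine_encode(residual)                       # second feature information
    return first, second                                 # both go into the bitstream


def decode(first, second):
    """Assumed decoder: sum of the stage-one output and the decoded residual."""
    return [a + r for a, r in zip(coarse_decode(first), fine_decode(second))]


signal = [0.12, -0.73, 0.4, 0.91]
first, second = encode(signal)
reconstructed = decode(first, second)
```

In this toy setup the second stage only has to represent what the first stage missed, which is the motivation for coding the residual separately.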
-
Publication No.: US11790926B2
Publication Date: 2023-10-17
Application No.: US17156006
Filing Date: 2021-01-22
Applicant: Electronics and Telecommunications Research Institute, The Trustees of Indiana University
Inventor: Mi Suk Lee, Seung Kwon Beack, Jongmo Sung, Tae Jin Lee, Jin Soo Choi, Minje Kim, Kai Zhen
IPC: G10L19/038, G10L19/028, G10L25/18, G10L25/21, G10L25/30
CPC classification number: G10L19/038, G10L19/028, G10L25/18, G10L25/21, G10L25/30
Abstract: A method and apparatus for processing an audio signal are disclosed. According to an example embodiment, a method of processing an audio signal may include acquiring a final audio signal for an initial audio signal using a plurality of neural network models generating output audio signals by encoding and decoding input audio signals, calculating a difference between the initial audio signal and the final audio signal in a time domain, converting the initial audio signal and the final audio signal into Mel-spectra, calculating a difference between the Mel-spectra of the initial audio signal and the final audio signal in a frequency domain, training the plurality of neural network models based on results calculated in the time domain and the frequency domain, and generating a new final audio signal distinguished from the final audio signal from the initial audio signal using the trained neural network models.
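The training criterion in this abstract combines a time-domain difference with a difference between Mel-spectra. A minimal sketch of such a combined loss is given below. Several details are assumptions not taken from the abstract: squared error as the difference measure, a simple weighted sum of the two terms, a naive DFT, triangular Mel filters, and the parameter values (`sample_rate`, `n_mels`, `weight`).

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def magnitude_spectrum(x):
    """Naive DFT magnitudes for bins 0..N/2 (fine for short illustrative frames)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def mel_spectrum(x, sample_rate=16000, n_mels=8):
    """Pool DFT magnitudes with triangular filters spaced evenly on the Mel scale."""
    mags = magnitude_spectrum(x)
    bin_hz = (sample_rate / 2.0) / (len(mags) - 1)
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bands = []
    for b in range(n_mels):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        total = 0.0
        for k, mag in enumerate(mags):
            f = k * bin_hz
            if lo < f <= mid:
                total += mag * (f - lo) / (mid - lo)   # rising slope
            elif mid < f < hi:
                total += mag * (hi - f) / (hi - mid)   # falling slope
        bands.append(total)
    return bands


def combined_loss(initial, final, weight=1.0):
    """Time-domain MSE plus weighted MSE between Mel-spectra (assumed form)."""
    time_term = sum((a - b) ** 2 for a, b in zip(initial, final)) / len(initial)
    mel_a, mel_b = mel_spectrum(initial), mel_spectrum(final)
    freq_term = sum((a - b) ** 2 for a, b in zip(mel_a, mel_b)) / len(mel_a)
    return time_term + weight * freq_term


reference = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
estimate = [0.9 * v for v in reference]
loss_same = combined_loss(reference, reference)
loss_diff = combined_loss(reference, estimate)
```

In training, a value like `loss_diff` would be minimized with respect to the parameters of the neural network models; that optimization step is outside the scope of this sketch.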
-
Publication No.: US11729597B2
Publication Date: 2023-08-15
Application No.: US17534563
Filing Date: 2021-11-24
Inventor: Seung Il Myong, Woo Sug Jung, Mi Suk Lee
CPC classification number: H04W4/90, G06T19/003, H04W4/38, H04W4/80
Abstract: Provided is a digital twin disaster management system customized to maintain safety in urban underground tunnels, including: a sensor sub-system configured to detect environmental information, status information and image information in the urban underground tunnels; a digital twin model management sub-system configured to create and update a virtual space corresponding to the urban underground tunnels using information provided from the sensor sub-system and 3D space, insert various types of attributes into the virtual space, detect tagging information, predict the spread of each disaster, and infer a degree of risk of a management facility; a disaster management sub-system having a control function of conducting centralized supervision by displaying information about components installed in the urban underground tunnels in the metaverse space and recording a situation; and a network sub-system configured to provide the virtual space to a user terminal of an external inspector.
-
Publication No.: US11664037B2
Publication Date: 2023-05-30
Application No.: US17326035
Filing Date: 2021-05-20
Applicant: Electronics and Telecommunications Research Institute, The Trustees of Indiana University
Inventor: Woo-taek Lim, Seung Kwon Beack, Jongmo Sung, Mi Suk Lee, Tae Jin Lee, Inseon Jang, Minje Kim, Haici Yang
IPC: G10L19/032, G10L21/0272
CPC classification number: G10L19/032, G10L21/0272
Abstract: Methods of encoding and decoding a speech signal using a neural network model that recognizes sound sources, and encoding and decoding apparatuses for performing the methods are provided. A method of encoding a speech signal includes identifying an input signal for a plurality of sound sources; generating a latent signal by encoding the input signal; obtaining a plurality of sound source signals by separating the latent signal for each of the plurality of sound sources; determining a number of bits used for quantization of each of the plurality of sound source signals according to a type of each of the plurality of sound sources; quantizing each of the plurality of sound source signals based on the determined number of bits; and generating a bitstream by combining the plurality of quantized sound source signals.
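The last three steps of this encoding method (choosing a bit count per sound source according to its type, quantizing each source with that bit count, and combining the results) can be sketched as below. The neural encoding into a latent signal and the source separation are not modeled; the source types, the `BITS_BY_TYPE` table, the uniform quantizer over [-1, 1], and the list-of-tuples "bitstream" are all hypothetical choices for illustration.

```python
# Hypothetical allocation: more bits for sources assumed to matter more perceptually.
BITS_BY_TYPE = {"speech": 8, "noise": 2}


def quantize(signal, bits):
    """Uniform quantizer over [-1, 1] with 2**bits levels; returns integer indices."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    return [min(levels - 1, max(0, round((x + 1.0) / step))) for x in signal]


def encode_sources(sources):
    """sources: list of (source_type, separated_signal) pairs.

    Returns a list of (source_type, bits, indices) tuples standing in for the
    combined bitstream.
    """
    stream = []
    for kind, signal in sources:
        bits = BITS_BY_TYPE[kind]                 # bit count depends on source type
        stream.append((kind, bits, quantize(signal, bits)))
    return stream


sources = [
    ("speech", [0.10, -0.40, 0.90]),
    ("noise", [0.05, -0.02, 0.00]),
]
stream = encode_sources(sources)
total_bits = sum(bits * len(indices) for _, bits, indices in stream)
```

With this allocation the speech source consumes most of the bit budget, while the noise source is represented only coarsely.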
-
Publication No.: US11508386B2
Publication Date: 2022-11-22
Application No.: US16843649
Filing Date: 2020-04-08
Applicant: Electronics and Telecommunications Research Institute, Kwangwoon University Industry-Academic Collaboration Foundation
Inventor: Hochong Park, Seung Kwon Beack, Jongmo Sung, Seong-Hyeon Shin, Mi Suk Lee, Tae Jin Lee, Jin Soo Choi
Abstract: An inventive concept relates to an audio coding method to which CNN-based frequency spectrum recovery is applied. The inventive concept transmits a part of the frequency spectral coefficients generated in transform coding to a decoder, and the decoder recovers the frequency spectral coefficients that were not transmitted. Furthermore, the signs of the frequency spectral coefficients are transmitted from an encoder to the decoder depending on a sign transmission rule.