Abstract:
A data processing method for acoustic events includes: establishing, in software, a simulated acoustic frequency event module, a data capturing module, and a sound application decision module; setting a simulated hardware parameter for the simulated acoustic frequency event module; inputting a sound signal to a frequency filtering module of the simulated acoustic frequency event module and obtaining metadata from a frequency event quantizer of the simulated acoustic frequency event module; dividing the metadata into multiple frames according to a time interval by the data capturing module; accumulating an event count for each frame by the data capturing module; setting a label for each frame according to its event count; storing the frames, event counts, and labels in a database; and training a decision model by the sound application decision module according to the database and a sound application.
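A minimal sketch of the framing and labeling steps, assuming the frequency event quantizer emits a list of event timestamps in seconds; the function name, the fixed frame interval, and the count-threshold labeling rule are illustrative assumptions, and the list of records stands in for the database:

```python
import numpy as np

def frame_and_label(event_times, frame_interval=0.02, count_threshold=5):
    """Divide quantizer events into fixed-length frames, accumulate the
    event count of each frame, and label each frame by thresholding it."""
    event_times = np.asarray(event_times, dtype=float)
    if event_times.size == 0:
        return []
    n_frames = int(np.floor(event_times.max() / frame_interval)) + 1
    records = []  # stand-in for the database of (frame, count, label) rows
    for i in range(n_frames):
        start, end = i * frame_interval, (i + 1) * frame_interval
        count = int(np.sum((event_times >= start) & (event_times < end)))
        label = 1 if count >= count_threshold else 0  # e.g. "active" frame
        records.append({"frame": i, "event_count": count, "label": label})
    return records

print(frame_and_label([0.001, 0.003, 0.015, 0.021, 0.022, 0.09])[0])
# {'frame': 0, 'event_count': 3, 'label': 0}
```

The stored records would then serve as training data for the decision model.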
Abstract:
Provided are a voice activity detection method and apparatus, an electronic device, and a storage medium, relating to the technical field of voice processing, for example to artificial intelligence and deep learning. A specific implementation is as follows. A first audio signal is acquired, and a frequency domain feature of the first audio signal is extracted; the frequency domain feature is then input into a voice activity detection model, and a voice presence detection result output by the model is obtained, where the voice activity detection model is configured to detect whether voice is present in the first audio signal.
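A minimal sketch of the two-stage pipeline, assuming the frequency domain feature is a per-frame log magnitude spectrum; the thresholding function is a stand-in for the trained voice activity detection model, which the abstract does not specify:

```python
import numpy as np

def frequency_feature(signal, frame_len=512, hop=256):
    """Frame the signal and compute the log magnitude spectrum per frame."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.log1p(np.abs(np.fft.rfft(frames, axis=-1)))

def detect_voice(features, threshold=4.0):
    """Toy stand-in for the detection model: threshold the mean
    log-spectral energy of each frame; the real model is learned."""
    return features.mean(axis=-1) > threshold  # True where voice is judged present

signal = np.random.default_rng(0).standard_normal(16000)
print(detect_voice(frequency_feature(signal)).shape)  # one flag per frame
```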
Abstract:
Methods and systems for providing consistency in noise reduction during speech and non-speech periods are provided. First and second signals are received. The first signal includes at least a voice component. The second signal includes at least the voice component modified by human tissue of a user. First and second weights may be assigned per subband to the first and second signals, respectively. The first and second signals are processed to obtain respective first and second full-band power estimates. During periods when the user's speech is not present, the first weight and the second weight are adjusted based at least partially on the first full-band power estimate and the second full-band power estimate. The first and second signals are blended based on the adjusted weights to generate an enhanced voice signal. The second signal may be aligned with the first signal prior to the blending.
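A minimal sketch of the weight adjustment and blending, assuming both signals are already split into subbands (rows are subbands) and a speech-presence flag is available; the specific adjustment rule shown, pulling the second signal's weights toward full-band power parity during non-speech, is an illustrative reading rather than the patented rule:

```python
import numpy as np

def blend_signals(sig1, sig2, w1, w2, speech_present, alpha=0.9):
    """sig1, sig2: (n_subbands, n_samples) arrays; w1, w2: per-subband weights.
    Returns the blended enhanced signal plus the (possibly adjusted) weights."""
    p1 = np.mean(sig1 ** 2)  # full-band power estimate of the first signal
    p2 = np.mean(sig2 ** 2)  # full-band power estimate of the second signal
    if not speech_present and p2 > 0:
        # Nudge the second weights toward power parity with the first signal.
        w2 = w2 * (alpha + (1 - alpha) * np.sqrt(p1 / p2))
    enhanced = w1[:, None] * sig1 + w2[:, None] * sig2
    return enhanced, w1, w2
```

Alignment of the second signal to the first (for example, delay compensation) would happen before this call.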
Abstract:
The invention relates to audio signal processing. More specifically, the invention relates to enhancing multichannel audio, such as television audio, by applying a gain to the audio that has been smoothed between portions of the audio. The invention relates to methods, apparatus for performing such methods, and to software stored on a computer-readable medium for causing a computer to perform such methods.
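A minimal sketch of gain smoothing between portions of multichannel audio, using a one-pole smoother so block-to-block gain changes stay gradual; the block layout and smoothing constant are illustrative assumptions:

```python
import numpy as np

def apply_smoothed_gain(blocks, target_gains, smoothing=0.8):
    """blocks: list of (channels, samples) arrays, one per audio portion;
    target_gains: desired gain per block. The applied gain is smoothed so
    it changes gradually between adjacent portions."""
    g = target_gains[0]
    out = []
    for block, target in zip(blocks, target_gains):
        g = smoothing * g + (1 - smoothing) * target  # move toward the target
        out.append(g * block)  # same smoothed gain applied to every channel
    return out
```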
Abstract:
The various implementations described enable voice activity detection and/or pitch estimation for speech signal processing in, for example and without limitation, hearing aids, speech recognition and interpretation software, telephony, and various applications for smartphones and/or wearable devices. In particular, some implementations include systems, methods and/or devices operable to detect voice activity in an audible signal by determining a voice activity indicator value that is a normalized function of signal amplitudes at two or more sets of spectral locations associated with a candidate pitch. In some implementations, voice activity is considered detected when the voice activity indicator value breaches a threshold value. Additionally and/or alternatively, in some implementations, analysis of the audible signal provides a pitch estimate of detectable voice activity.
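A minimal sketch of such an indicator, assuming the two sets of spectral locations are the harmonic bins of the candidate pitch and the bins midway between harmonics, and that the normalization is the difference-over-sum of the two amplitude sums; both choices are illustrative:

```python
import numpy as np

def voice_activity_indicator(mag_spectrum, fs, frame_len, pitch_hz, n_harm=8):
    """Normalized indicator in [-1, 1]: near 1 when energy concentrates at
    harmonics of the candidate pitch, near 0 or below otherwise."""
    bin_hz = fs / frame_len
    harm = np.array([round(k * pitch_hz / bin_hz) for k in range(1, n_harm + 1)])
    mid = np.array([round((k + 0.5) * pitch_hz / bin_hz) for k in range(1, n_harm + 1)])
    harm, mid = harm[harm < len(mag_spectrum)], mid[mid < len(mag_spectrum)]
    a, b = mag_spectrum[harm].sum(), mag_spectrum[mid].sum()
    return (a - b) / (a + b + 1e-12)

def voice_detected(indicator, threshold=0.5):
    return indicator > threshold  # detected when the indicator breaches the threshold
```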
Abstract:
There is provided a sound processing apparatus including: a sound separation unit that separates an input sound into a plurality of sounds caused by a plurality of sound sources; a sound type estimation unit that estimates the sound types of the plurality of sounds separated by the sound separation unit; a mixing ratio calculation unit that calculates a mixing ratio for each sound in accordance with the sound type estimated by the sound type estimation unit; and a sound mixing unit that mixes the plurality of sounds separated by the sound separation unit at the mixing ratio calculated by the mixing ratio calculation unit.
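A minimal sketch of the four-unit pipeline with the separation and type-estimation stages injected as callables, since the abstract does not specify them; the type-to-ratio table is an illustrative assumption:

```python
import numpy as np

TYPE_RATIO = {"speech": 1.0, "music": 0.6, "noise": 0.1}  # assumed weights

def process_sound(input_sound, separate, estimate_type):
    sources = separate(input_sound)                     # sound separation unit
    types = [estimate_type(s) for s in sources]         # sound type estimation unit
    ratios = [TYPE_RATIO.get(t, 0.5) for t in types]    # mixing ratio calculation unit
    return sum(r * s for r, s in zip(ratios, sources))  # sound mixing unit

# Stub usage: a separator returning two copies and a classifier that always
# answers "speech"; real units would do source separation and classification.
out = process_sound(np.zeros(100),
                    separate=lambda x: [x, x],
                    estimate_type=lambda s: "speech")
```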
Abstract:
Acoustic Voice Activity Detection (AVAD) methods and systems are described. The AVAD methods and systems, including corresponding algorithms or programs, use microphones to generate virtual directional microphones which have very similar noise responses and very dissimilar speech responses. The ratio of the energies of the virtual microphones is then calculated over a given window size and the ratio can then be used with a variety of methods to generate a VAD signal. The virtual microphones can be constructed using either an adaptive or a fixed filter.
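A minimal sketch of the windowed energy-ratio decision, assuming the two virtual directional microphone signals have already been formed; the window size and threshold are illustrative, and the abstract notes the ratio can feed a variety of decision methods:

```python
import numpy as np

def avad(v1, v2, window=256, threshold=2.0, eps=1e-12):
    """v1: virtual mic with strong speech response; v2: virtual mic with a
    similar noise response. Flags speech where the energy ratio is large."""
    n = min(len(v1), len(v2)) // window
    vad = np.zeros(n, dtype=bool)
    for i in range(n):
        s = slice(i * window, (i + 1) * window)
        ratio = np.sum(v1[s] ** 2) / (np.sum(v2[s] ** 2) + eps)
        vad[i] = ratio > threshold
    return vad
```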
Abstract:
A harmonic-structure acoustic signal detection device that does not depend on level fluctuations of the input signal, including: an FFT unit that performs an FFT on the input signal and calculates a power spectrum component for each frame; a harmonic structure extraction unit that retains only the harmonic structure of the power spectrum component; a voiced feature evaluation unit that evaluates the correlation between the harmonic structures of adjacent frames extracted by the harmonic structure extraction unit, thereby judges whether or not a segment is a vowel segment, and extracts the voiced segment; and a speech segment determination unit that determines a speech segment according to the continuity and duration of the output of the voiced feature evaluation unit.
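A minimal sketch of the voiced-feature evaluation, approximating the harmonic structure by spectral peak-picking and declaring a frame voiced when its structure correlates strongly with the previous frame's; both the peak-picking and the correlation threshold are illustrative stand-ins:

```python
import numpy as np

def harmonic_structure(power_spec):
    """Keep only local peaks of the power spectrum (crude extraction unit)."""
    out = np.zeros_like(power_spec)
    peaks = (power_spec[1:-1] > power_spec[:-2]) & (power_spec[1:-1] > power_spec[2:])
    out[1:-1][peaks] = power_spec[1:-1][peaks]
    return out

def voiced_flags(frames, corr_threshold=0.5):
    """frames: (n_frames, n_bins) power spectra; flag frames whose harmonic
    structure correlates with the preceding frame's (vowel-like segments)."""
    structs = np.array([harmonic_structure(f) for f in frames])
    flags = np.zeros(len(frames), dtype=bool)
    for i in range(1, len(frames)):
        a, b = structs[i - 1], structs[i]
        denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
        flags[i] = (a @ b) / denom > corr_threshold
    return flags
```

The speech segment determination unit would then smooth these flags, requiring enough consecutive voiced frames before declaring a speech segment.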
Abstract:
An input audio signal is divided on a block-by-block basis, and a frequency domain conversion is performed on each of the blocks. If it is decided that there are one or more shift points in the voiced (V)/unvoiced (UV) decision data across all bands, the voiced bands of the frequency domain data for a block are searched for the voiced band B_VH with the highest center frequency. The number N_V of voiced bands having a center frequency lower than that of band B_VH is then found, so as to decide whether the proportion of voiced bands is equal to or higher than a predetermined threshold N_th, thereby deciding a single V/UV boundary point. The per-band V/UV decision data can thus be replaced by information on one demarcation across all bands, reducing data volume and bit rate.
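A minimal sketch of collapsing the per-band V/UV decisions into one boundary, loosely following the steps above; the proportion threshold is illustrative:

```python
import numpy as np

def single_vuv_boundary(vuv_flags, proportion_threshold=0.5):
    """vuv_flags: per-band V/UV decisions ordered by center frequency
    (True = voiced). Returns the boundary index below which all bands are
    declared voiced, or 0 if the voiced proportion is too small."""
    voiced = np.flatnonzero(vuv_flags)
    if voiced.size == 0:
        return 0                        # every band unvoiced
    b_vh = voiced[-1]                   # voiced band with highest center freq
    if voiced.size / (b_vh + 1) >= proportion_threshold:
        return b_vh + 1                 # bands [0, b_vh] treated as voiced
    return 0

print(single_vuv_boundary(np.array([True, True, False, True, False])))  # 4
```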