Abstract:
This application relates to a method, performed by a computer device, for matching music with a video, and to a storage medium. The method includes: determining a cut speed of a video; determining a long-time audio speed corresponding to each of a plurality of pieces of candidate music according to a high-scale point and a music duration of the candidate music; selecting matched music from the pieces of candidate music according to the cut speed and the corresponding long-time audio speeds; determining, according to a video duration of the video and a high-scale point corresponding to the matched music, a short-time audio speed corresponding to each music clip in the matched music; and determining a target music clip in the matched music according to the cut speed of the video and the corresponding short-time audio speed, and synthesizing the target music clip and the video to obtain a target video.
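The abstract does not define its speeds precisely. Assume that the cut speed means shot cuts per second of video, that the long-time audio speed is the number of high-scale points divided by the music duration, and that the short-time audio speed is the same ratio computed over a clip as long as the video. Under those assumptions, the selection steps can be sketched as follows. All names (`Music`, `cut_speed`, `select_target_clip`, the 1 s window step) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Music:
    name: str
    duration: float          # track length in seconds
    high_scale_points: list  # timestamps (s) of the track's high-scale points


def cut_speed(cut_times, video_duration):
    """Assumed definition: shot cuts per second of video."""
    return len(cut_times) / video_duration


def long_time_audio_speed(music):
    """Assumed definition: high-scale points per second over the whole track."""
    return len(music.high_scale_points) / music.duration


def select_matched_music(candidates, speed):
    """Candidate whose long-time audio speed is closest to the video's cut speed."""
    return min(candidates, key=lambda m: abs(long_time_audio_speed(m) - speed))


def select_target_clip(music, video_duration, speed, step=1.0):
    """Slide a window as long as the video over the track (hypothetical 1 s step) and
    return the (start, end) clip whose short-time audio speed is closest to `speed`."""
    best, best_gap = None, float("inf")
    start = 0.0
    while start + video_duration <= music.duration:
        end = start + video_duration
        points = sum(1 for t in music.high_scale_points if start <= t < end)
        gap = abs(points / video_duration - speed)  # short-time speed vs. cut speed
        if gap < best_gap:
            best, best_gap = (start, end), gap
        start += step
    return best


# Usage: a 10 s video with 4 cuts (0.4 cuts/s) and two candidate tracks.
speed = cut_speed([2.0, 4.0, 6.0, 8.0], 10.0)
slow = Music("slow", 60.0, [0.0, 10.0, 20.0, 30.0, 40.0, 50.0])
busy = Music("busy", 30.0, [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 15.0, 25.0])
matched = select_matched_music([slow, busy], speed)
clip = select_target_clip(matched, 10.0, speed)
print(matched.name, clip)  # busy (2.0, 12.0)
```

Picking the closest speed with `min` is the simplest possible selection rule. A production system might instead accept any candidate within a tolerance band or combine speed with other features.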
Abstract:
A mobile device responds in real time to media content presented on a media device, such as a television. The mobile device captures temporal fragments of audio-video content with its microphone, camera, or both and generates corresponding audio-video query fingerprints. The query fingerprints are transmitted to a search server located remotely or used with a search function on the mobile device for content search and identification. Audio features are extracted, and global onset detection on the audio signal is used to align input audio frames. Additional audio feature signatures are generated from local audio frame onsets, audio frame frequency-domain entropy, and maximum change in the spectral coefficients. Video frames are analyzed to find a television screen in the frames, and a detected active television quadrilateral is used to generate video fingerprints, which are combined with the audio fingerprints for more reliable content identification.
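To illustrate two of the audio features named above, the sketch below computes the frequency-domain entropy of one frame and flags frames whose energy jumps relative to the previous frame as onsets. The naive DFT, the energy-ratio onset rule, and the `ratio` parameter are assumptions for illustration, not the method described in the abstract.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (an FFT library would be used in practice)."""
    n = len(frame)
    return [abs(sum(frame[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
            for b in range(n // 2 + 1)]


def spectral_entropy(frame):
    """Shannon entropy (bits) of the normalized power spectrum of one frame."""
    power = [m * m for m in magnitude_spectrum(frame)]
    total = sum(power)
    if total == 0:
        return 0.0
    return -sum(p / total * math.log2(p / total) for p in power if p > 0)


def energy_onsets(signal, frame_len, ratio=2.0):
    """Indices of frames whose energy exceeds `ratio` times the previous frame's."""
    energies = [sum(x * x for x in signal[i:i + frame_len])
                for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [i for i in range(1, len(energies))
            if energies[i] > ratio * max(energies[i - 1], 1e-12)]


# Usage: a pure tone concentrates its energy in one bin (entropy near 0), while an
# impulse spreads it evenly over all 17 bins (entropy log2(17)).
tone = [math.cos(2 * math.pi * 4 * k / 32) for k in range(32)]
impulse = [1.0] + [0.0] * 31
print(spectral_entropy(tone), spectral_entropy(impulse))
print(energy_onsets([0.0] * 64 + [1.0] * 64, 32))  # [2]
```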
Abstract:
A method is described comprising receiving a stem signal and an audio mix signal, wherein the audio mix signal comprises information of the stem signal. The method includes applying a first transform to the stem signal to provide a first stem spectrum, applying a second transform to the stem signal to provide a second stem spectrum, generating a plurality of mix signals using the audio mix signal, applying a first transform to each mix signal of the plurality of mix signals to provide a corresponding first mix signal spectrum, applying a second transform to each mix signal of the plurality of mix signals to provide a corresponding second mix signal spectrum, and using information of the first stem spectrum, the second stem spectrum, a first mix signal spectrum, or a second mix signal spectrum to detect the information of the stem signal in the audio mix signal.
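The abstract leaves the two transforms and the way the mix signals are generated unspecified. The sketch below assumes that the first transform is a DFT magnitude spectrum, that the second is a coarse band-pooled log energy, and that the "plurality of mix signals" are stem-length segments cut from the mix at a fixed hop. A segment is reported when the averaged cosine similarity of both spectra exceeds a threshold. Every function name and parameter here is hypothetical.

```python
import cmath
import math


def dft_magnitude(x):
    """First transform (assumed): magnitude of the DFT, bins 0..N/2."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
            for b in range(n // 2 + 1)]


def band_log_energy(x, bands=4):
    """Second transform (assumed): log energy pooled into equal-width bands."""
    mag = dft_magnitude(x)
    width = math.ceil(len(mag) / bands)
    return [math.log1p(sum(m * m for m in mag[i:i + width]))
            for i in range(0, len(mag), width)]


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def detect_stem(stem, mix, hop, threshold=0.9):
    """Cut the mix into stem-length segments (the assumed 'plurality of mix signals'),
    compare both spectra of each segment with those of the stem, and return the
    start offsets where the averaged similarity exceeds the threshold."""
    s1, s2 = dft_magnitude(stem), band_log_energy(stem)
    hits = []
    for start in range(0, len(mix) - len(stem) + 1, hop):
        seg = mix[start:start + len(stem)]
        score = 0.5 * (cosine(s1, dft_magnitude(seg)) + cosine(s2, band_log_energy(seg)))
        if score > threshold:
            hits.append(start)
    return hits


# Usage: a 32-sample tone placed after a different tone in a 96-sample mix.
tone4 = [math.cos(2 * math.pi * 4 * k / 32) for k in range(32)]
tone10 = [math.cos(2 * math.pi * 10 * k / 32) for k in range(32)]
print(detect_stem(tone4, tone10 + tone4 + [0.0] * 32, hop=32))  # [32]
```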
Abstract:
Systems and techniques are provided for finding differences in nearly-identical audio recordings. A first version of an audio recording may be received. A second version of the audio recording may be received. A difference between the first version of the audio recording and the second version of the audio recording may be determined using time domain analysis and frequency domain analysis. The difference may be stored in a difference set. The difference set may allow the first version of the audio recording to be distinguished from the second version of the audio recording. The audio recording may be a music track. The first version of the audio recording may be an explicit version of the music track. The second version of the audio recording may be an edited version of the music track.
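A minimal sketch of combining time-domain and frequency-domain analysis follows, assuming both versions are sample-aligned and equally long. Each frame is flagged when the RMS of the sample-wise difference and the mean difference of the magnitude spectra both exceed a tolerance. Adjacent flagged frames are merged into (start, end) sample ranges that play the role of the difference set. The frame size, tolerances, and function names are assumptions.

```python
import cmath
import math


def frames(x, size):
    """Non-overlapping frames of `size` samples (a trailing partial frame is dropped)."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, size)]


def magnitudes(frame):
    """Naive DFT magnitudes for bins 0..N/2."""
    n = len(frame)
    return [abs(sum(frame[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
            for b in range(n // 2 + 1)]


def difference_set(a, b, size=32, time_tol=1e-3, freq_tol=1e-3):
    """Return merged (start, end) sample ranges where versions a and b differ.

    A frame is flagged only if BOTH the time-domain RMS of the sample-wise
    difference and the mean absolute difference of the magnitude spectra exceed
    their (hypothetical) tolerances.
    """
    diffs = []
    for i, (fa, fb) in enumerate(zip(frames(a, size), frames(b, size))):
        rms = math.sqrt(sum((p - q) ** 2 for p, q in zip(fa, fb)) / size)
        ma, mb = magnitudes(fa), magnitudes(fb)
        spec = sum(abs(p - q) for p, q in zip(ma, mb)) / len(ma)
        if rms > time_tol and spec > freq_tol:
            start, end = i * size, (i + 1) * size
            if diffs and diffs[-1][1] == start:
                diffs[-1] = (diffs[-1][0], end)  # extend the previous range
            else:
                diffs.append((start, end))
    return diffs


# Usage: the "edited" version mutes samples 32..95 of the "explicit" version.
explicit = [math.cos(2 * math.pi * 4 * k / 32) for k in range(128)]
edited = explicit[:32] + [0.0] * 64 + explicit[96:]
print(difference_set(explicit, edited))  # [(32, 96)]
```

Requiring both tests to fire is one design choice: a pure phase change alters the waveform but not the magnitude spectrum, so it is not reported as a difference.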
Abstract:
Audio advertisements for music services can be created and played so as to minimize any discontinuity a listener perceives relative to the preceding song. Specifically, a voice-over content item (e.g., a spoken advertisement produced without music) may be combined with a non-song musical content item (e.g., backing music produced specifically for advertisements) to create an audio advertisement that is played after a song on a streaming music service. The non-song musical content item may be selected based on its similarity to the preceding song (e.g., in genre, tempo, or harmony) and/or on a musical preference of the user (e.g., a preferred genre obtained through the user's music service account and/or selected by the user). Both the genre of the preceding song and the user's musical preference may be used to select the genre of the non-song musical content item, with a weighting that determines the probability of each genre being used for the advertisement.
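The weighting in the last sentence can be read as a probability split between the preceding song's genre and the user's preferred genre. The sketch below assumes a single weight `song_weight` for that split and uses tempo distance as the similarity criterion for picking a backing track within the chosen genre. The function names, the weight, and the track dictionary layout are hypothetical.

```python
import random


def choose_ad_genre(song_genre, user_genre, song_weight=0.6, rng=random):
    """Pick the backing-music genre: the preceding song's genre with probability
    `song_weight`, otherwise the user's preferred genre."""
    if song_genre == user_genre:
        return song_genre
    return rng.choices([song_genre, user_genre],
                       weights=[song_weight, 1.0 - song_weight], k=1)[0]


def select_backing_track(tracks, genre, song_tempo):
    """Among non-song tracks of the chosen genre, take the one whose tempo (BPM)
    is closest to the preceding song's tempo."""
    same_genre = [t for t in tracks if t["genre"] == genre]
    return min(same_genre, key=lambda t: abs(t["tempo"] - song_tempo))


# Usage: a rock song at 118 BPM for a listener who prefers jazz.
tracks = [
    {"name": "a", "genre": "rock", "tempo": 90},
    {"name": "b", "genre": "rock", "tempo": 120},
    {"name": "c", "genre": "jazz", "tempo": 118},
]
genre = choose_ad_genre("rock", "jazz", song_weight=0.6, rng=random.Random(0))
print(genre, select_backing_track(tracks, genre, 118)["name"])
```

Passing an explicit `random.Random` instance makes the draw reproducible in tests while leaving the default behavior random.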
Abstract:
Using a control of a rhythm input device that corresponds to a desired one of the plurality of performance parts constituting a performance data set (automatic accompaniment data set), a user inputs a desired search-object rhythm pattern as a query pattern. An input rhythm pattern storage section stores the input rhythm pattern (query pattern) in a RAM on the basis of a clock signal output from a bar line clock output section and input trigger data. A part identification section identifies the search-object performance part corresponding to the user-operated control. For the identified performance part, a rhythm pattern search section searches an automatic accompaniment database for an automatic accompaniment data set including a rhythm pattern that matches, i.e., has the highest similarity to, the input rhythm pattern (query pattern).
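The abstract does not specify the similarity measure. One simple possibility, assumed here, represents each rhythm pattern as onset positions within one bar, normalized to [0, 1), and scores two patterns by their symmetric mean nearest-onset distance. The search then returns the data set whose pattern for the identified part has the smallest distance. The database layout and names are hypothetical.

```python
def pattern_distance(query, candidate):
    """Symmetric mean nearest-onset distance between two rhythm patterns.
    Onsets are positions within one bar, normalized to [0, 1) (assumed)."""
    def one_way(a, b):
        return sum(min(abs(x - y) for y in b) for x in a) / len(a)
    return 0.5 * (one_way(query, candidate) + one_way(candidate, query))


def search_accompaniment(database, part, query):
    """Return the name of the accompaniment data set whose pattern for `part`
    is most similar (smallest distance) to the query pattern."""
    candidates = [(name, parts[part]) for name, parts in database.items() if part in parts]
    return min(candidates, key=lambda c: pattern_distance(query, c[1]))[0]


# Usage: a hypothetical database of two data sets with drum and bass parts.
db = {
    "rock": {"drums": [0.0, 0.25, 0.5, 0.75], "bass": [0.0, 0.5]},
    "bossa": {"drums": [0.0, 0.1875, 0.375, 0.625, 0.8125],
              "bass": [0.0, 0.375, 0.5, 0.875]},
}
print(search_accompaniment(db, "drums", [0.0, 0.26, 0.49, 0.76]))  # rock
print(search_accompaniment(db, "bass", [0.0, 0.375, 0.875]))       # bossa
```

Because the query is tapped by hand, slightly early or late onsets (0.26 instead of 0.25) only add small distances, which is why a nearest-onset measure tolerates human timing better than exact matching.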
Abstract:
A match score provides a semantically meaningful quantification of the aural similarity of two chromae from two corresponding audio sequences. The match score can be applied to the chroma pairs of two corresponding audio sequences and is independent of the lengths of the sequences, thereby permitting matches to be compared across subsequences of different lengths. Accordingly, a single cutoff match score that identifies “good” audio subsequence matches, with both good precision and good recall, can be determined. The function for determining the match score is obtained by establishing a function PM giving the probabilities that chroma correspondence scores indicate semantic correspondences and a function PR giving the probabilities that chroma correspondence scores indicate random correspondences, and then repeatedly updating PM and the match function, based on their existing values, as applied to audio subsequences with known semantic correspondences.
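One way to obtain a length-independent score from PM and PR, assumed here for illustration, is to average the log-likelihood ratio log(PM/PR) over the aligned chroma pairs. The sketch takes PM and PR as fixed histograms over binned cosine similarities and omits the iterative re-estimation described in the abstract. The binning, the cosine correspondence score, and all names are assumptions.

```python
import math


def chroma_similarity(a, b):
    """Cosine similarity of two 12-bin chroma vectors (assumed correspondence score)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def bin_index(score, bins):
    """Map a score in [0, 1] to one of `bins` equal-width histogram bins."""
    return min(int(score * bins), bins - 1)


def match_score(seq_a, seq_b, pm, pr):
    """Average log-likelihood ratio log(PM/PR) over aligned chroma pairs.

    pm[i] and pr[i] are the probabilities of a correspondence score falling in
    histogram bin i under semantic vs. random correspondence. Averaging rather
    than summing keeps the score independent of sequence length.
    """
    bins = len(pm)
    total = 0.0
    for a, b in zip(seq_a, seq_b):
        i = bin_index(chroma_similarity(a, b), bins)
        total += math.log(pm[i] / pr[i])
    return total / min(len(seq_a), len(seq_b))


# Usage: hypothetical 4-bin histograms; identical chromae land in the top bin.
pm = [0.1, 0.2, 0.3, 0.4]
pr = [0.4, 0.3, 0.2, 0.1]
c = [1.0] + [0.0] * 11              # pitch class C only
g = [0.0] * 7 + [1.0] + [0.0] * 4   # pitch class G only
print(match_score([c, g], [c, g], pm, pr), match_score([c] * 5, [c] * 5, pm, pr))
```

Both printed scores equal log 4, illustrating that a 2-frame and a 5-frame match are scored on the same scale, so a single cutoff can be applied to both.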
Abstract:
Disclosed, inter alia, is a method comprising: determining a divergence measure between a statistical distribution of audio features of a first audio track and a statistical distribution of audio features of at least one further audio track; determining a divergence measure threshold value from at least the divergence measure between the statistical distribution of audio features of the first audio track and the statistical distribution of audio features of the at least one further audio track; and comparing the divergence measure with the divergence measure threshold value.
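As a minimal sketch, assume that each track's audio features are summarized by a univariate Gaussian, that the divergence measure is the symmetrized Kullback–Leibler divergence, and that the threshold is the mean plus `k` standard deviations of the divergences from the first track to the further tracks. These choices and the function names are assumptions; the abstract only requires that the threshold be derived from at least the divergence measure itself.

```python
import math
import statistics


def gaussian_kl(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians (v = variance)."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def symmetric_kl(x, y):
    """Symmetrized KL divergence between Gaussians fitted to two feature samples."""
    m1, v1 = statistics.fmean(x), statistics.pvariance(x)
    m2, v2 = statistics.fmean(y), statistics.pvariance(y)
    return gaussian_kl(m1, v1, m2, v2) + gaussian_kl(m2, v2, m1, v1)


def dissimilar_tracks(first, others, k=1.0):
    """Indices of tracks in `others` whose divergence from `first` exceeds a
    threshold derived from those same divergences (mean + k * std, assumed)."""
    divs = [symmetric_kl(first, o) for o in others]
    threshold = statistics.fmean(divs) + k * statistics.pstdev(divs)
    return [i for i, d in enumerate(divs) if d > threshold]


# Usage: two slightly shifted feature sets and one far-off outlier.
first = [0.0, 1.0, 2.0, 3.0, 4.0]
others = [[x + 0.1 for x in first], [x + 0.2 for x in first], [x + 10.0 for x in first]]
print(dissimilar_tracks(first, others))  # [2]
```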
Abstract:
The invention concerns a method for generating a signature of a musical audio signal of a given duration, the method comprising the following steps:
- modelling (104) the musical audio signal to obtain, for each frequency band of a set of n frequency bands, a diagram representing the energy of the audio signal in that frequency band as a function of time over the given duration;
- determining (103) musical transition times t_k of the audio signal during the given duration;
- associating (105) each musical transition time t_k with an item of local information comprising a vector of n values representing, respectively, the energy of the audio signal in each of the n diagrams between t_k and the subsequent musical transition time t_{k+1}, and/or a vector of n values representing, respectively, the energy of the audio signal in each of the n diagrams between t_k and the preceding musical transition time t_{k-1};
- determining (106), on the basis of the local information associated with each musical transition time t_k, a key associated with that transition time, the determined keys forming a first set of keys of the audio signal;
- generating (107) a signature of the musical audio signal comprising pairs of keys from the first set of keys and the associated musical transition times t_k.
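The key derivation is not spelled out in the abstract. The sketch below assumes that the band energy diagrams are sampled per frame and that the transition times are frame indices. It also assumes that each key is an n-bit integer whose bit b is set when band b's mean energy between t_k and t_{k+1} exceeds its mean energy between t_{k-1} and t_k, so both local-information vectors are used. Each signature entry pairs a key with its transition time. The names and the key scheme are hypothetical.

```python
def segment_means(energies, start, end):
    """Mean energy of each band over frames [start, end)."""
    return [sum(band[start:end]) / (end - start) for band in energies]


def signature(energies, transitions):
    """Build (key, t_k) entries from band energies and transition frames.

    energies[b][f]: energy of band b at frame f (the n per-band diagrams).
    transitions: sorted frame indices t_0 < t_1 < ... of musical transitions.
    Key (assumed scheme): bit b is set when band b's mean energy after t_k
    (up to t_{k+1}) exceeds its mean energy before t_k (from t_{k-1}).
    """
    sig = []
    for k in range(1, len(transitions) - 1):
        prev_t, t, next_t = transitions[k - 1], transitions[k], transitions[k + 1]
        before = segment_means(energies, prev_t, t)
        after = segment_means(energies, t, next_t)
        key = sum(1 << b for b, (a, p) in enumerate(zip(after, before)) if a > p)
        sig.append((key, t))
    return sig


# Usage: three bands over nine frames, with transitions at frames 3 and 6.
energies = [
    [1, 1, 1, 5, 5, 5, 1, 1, 1],  # band 0 rises at frame 3
    [5, 5, 5, 1, 1, 1, 5, 5, 5],  # band 1 rises at frame 6
    [2] * 9,                      # band 2 is flat
]
print(signature(energies, [0, 3, 6, 9]))  # [(1, 3), (2, 6)]
```

To report times in seconds rather than frames, each t_k would be multiplied by the analysis hop duration.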