Abstract:
Speech parameters P_h and P_l are derived for consonant classification and recognition. The speech signal is separated into low- and high-frequency bands, the first time-derivative is obtained in each band, and the min-max difference of that derivative (the power dip) gives P_h and P_l. Plotting the distribution of P_h and P_l on a two-dimensional discriminant diagram classifies the consonant phoneme.
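A minimal Python sketch of the two parameters follows. It rests on several assumptions that the abstract does not fix:

- The band split uses a simple moving-average low-pass filter, and the high band is the residual.
- The derivative is taken of per-frame band power, as a first difference.
- The frame length and filter width are arbitrary.

The names `consonant_parameters`, `frame_len` and `lp_width` are hypothetical.

```python
def moving_average(x, width):
    """Causal moving-average low-pass filter (an assumed, deliberately simple band split)."""
    out, acc = [], 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= width:
            acc -= x[i - width]  # drop the sample that left the window
        out.append(acc / min(i + 1, width))
    return out


def frame_power(x, frame_len):
    """Mean-square power of consecutive, non-overlapping frames."""
    return [sum(s * s for s in x[i:i + frame_len]) / frame_len
            for i in range(0, len(x) - frame_len + 1, frame_len)]


def power_dip(power):
    """Max-min difference of the first time-derivative (first difference) of band power."""
    deriv = [b - a for a, b in zip(power, power[1:])]
    return max(deriv) - min(deriv) if deriv else 0.0


def consonant_parameters(x, frame_len=80, lp_width=8):
    """Return (P_h, P_l) for a list of samples; frame_len and lp_width are assumed values."""
    low = moving_average(x, lp_width)
    high = [a - b for a, b in zip(x, low)]  # high band = residual of the low-pass filter
    p_h = power_dip(frame_power(high, frame_len))
    p_l = power_dip(frame_power(low, frame_len))
    return p_h, p_l
```

Consider a signal that is loud, then silent, then loud again. In this model P_l comes out large because the low-band power drops and recovers, while P_h stays small. Each (P_h, P_l) pair would then be placed on the discriminant diagram.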
Abstract:
An inter-frame similarity between an input voice and a standard patterned word is calculated for each frame and for each standard patterned word. A posterior probability similarity is then produced by subtracting a constant value from each inter-frame similarity. The constant value is determined by analyzing voice data obtained from specified persons. It is chosen so that the posterior probability similarities are positive when a word in the input voice matches the standard patterned word and negative when it does not. An accumulated similarity is then calculated for each standard patterned word by accumulating the posterior probability similarities over the frames of the input voice according to a continuous dynamic programming (DP) matching operation. Finally, the standard patterned word whose accumulated similarity has the maximum value is output as the recognized word of the input voice.
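The accumulation step can be sketched in Python as follows. This is a simplified continuous DP recurrence, not the patent's exact one. The following are assumptions:

- the local path constraint (diagonal, vertical and horizontal steps);
- letting a match start at any input frame;
- the value of the constant `c`.

The function names are hypothetical. `sim[i][j]` holds the inter-frame similarity between input frame `i` and reference frame `j` of one word.

```python
NEG_INF = float("-inf")


def accumulated_similarity(sim, c):
    """Best accumulated posterior similarity for one word (simplified continuous DP)."""
    n_ref = len(sim[0])
    g_prev = [NEG_INF] * n_ref
    best = NEG_INF
    for row in sim:
        g = [NEG_INF] * n_ref
        for j in range(n_ref):
            p = row[j] - c  # posterior probability similarity: positive = match, negative = mismatch
            if j == 0:
                g[j] = p  # assumed: a match may start at any input frame
            else:
                g[j] = p + max(g_prev[j - 1], g_prev[j], g[j - 1])  # assumed local path
        best = max(best, g[-1])  # the word may end at any input frame
        g_prev = g
    return best


def recognize(sims_by_word, c):
    """Return the word with the maximum accumulated similarity, plus all scores."""
    scores = {w: accumulated_similarity(s, c) for w, s in sims_by_word.items()}
    return max(scores, key=scores.get), scores
```

Non-matching frames contribute negative values. As a result, a stretch of unrelated input lowers a word's score rather than inflating it.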
Abstract:
A method of speech recognition includes the following steps. The input speech is analyzed frame by frame to derive feature parameters. An input vector is generated from the feature parameters of a plurality of frames. Partial distances between the input vector and partial standard patterns are calculated periodically while shifting one frame at a time. The standard patterns correspond to the recognition-object words, and each standard pattern is composed of partial standard patterns that represent parts of the corresponding word. The partial distances are accumulated into distances between the input speech and the standard patterns, one distance per recognition-object word. When the input speech ends, these distances are compared and the minimum is selected. The recognition-object word corresponding to the minimum distance is taken as the recognition result.
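A minimal Python sketch rests on three assumptions:

- partial distances are Euclidean;
- the input vector is formed by concatenating `span` consecutive feature frames;
- accumulation runs left to right, with each frame shift either staying on the current partial pattern or advancing to the next.

The abstract does not state how partial distances are aligned. That recurrence, and the names below, are illustrative only.

```python
import math


def input_vectors(frames, span):
    """Concatenate `span` consecutive feature frames, shifting one frame at a time."""
    return [sum(frames[t:t + span], []) for t in range(len(frames) - span + 1)]


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def word_distance(vectors, partials):
    """Accumulate partial distances over the partial standard patterns of one word."""
    acc = [math.inf] * len(partials)
    for t, v in enumerate(vectors):
        new = [math.inf] * len(partials)
        for k, pattern in enumerate(partials):
            if t == 0:
                prev = 0.0 if k == 0 else math.inf  # must start on the first partial pattern
            else:
                # assumed: stay on partial pattern k or advance from k - 1
                prev = min(acc[k], acc[k - 1] if k > 0 else math.inf)
            new[k] = euclid(v, pattern) + prev
        acc = new
    return acc[-1]  # distance once the input speech ends


def recognize(frames, patterns, span=2):
    """Return the word with the minimum accumulated distance, plus all distances."""
    vectors = input_vectors(frames, span)
    dists = {w: word_distance(vectors, p) for w, p in patterns.items()}
    return min(dists, key=dists.get), dists
```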
Abstract:
A set of "m" feature parameters is generated every frame from reference speech that is spoken by at least one speaker and represents the recognition-object words, where "m" is a preset integer. A set of "n" types of standard patterns is generated in advance from the speech data of a plurality of speakers, where "n" is a preset integer. The feature parameters of the reference speech are matched against each standard pattern to produce, for every frame, a vector of "n" reference similarities. The reference similarity vectors of successive frames are arranged into temporal sequences, one for each recognition-object word. These sequences are registered in advance as dictionary similarity vector sequences. Input speech to be recognized is analyzed to generate "m" feature parameters per frame. These are matched against the standard patterns to produce, for every frame, a vector of "n" input-speech similarities. The input-speech similarity vectors of successive frames are arranged into a temporal sequence. This sequence is collated with the dictionary similarity vector sequences to recognize the input speech.
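A minimal Python sketch of the similarity-vector representation and the collation step follows. Two choices are assumptions, not taken from the abstract:

- the similarity measure, 1 / (1 + Euclidean distance);
- the use of dynamic time warping (DTW) for collation.

The function names are hypothetical.

```python
import math


def similarity_vector(features, standard_patterns):
    """n similarities between one frame's m features and the n standard patterns.

    Assumed similarity: 1 / (1 + Euclidean distance).
    """
    return [1.0 / (1.0 + math.dist(features, p)) for p in standard_patterns]


def similarity_sequence(frames, standard_patterns):
    """Temporal sequence of similarity vectors, one per frame."""
    return [similarity_vector(f, standard_patterns) for f in frames]


def dtw(a, b):
    """DTW distance between two similarity-vector sequences (assumed collation method)."""
    D = [[math.inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[-1][-1]


def recognize(frames, standard_patterns, dictionary):
    """dictionary maps each word to its registered similarity-vector sequence."""
    seq = similarity_sequence(frames, standard_patterns)
    return min(dictionary, key=lambda w: dtw(seq, dictionary[w]))
```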
Abstract:
Disclosed is a speech recognition apparatus comprising:

- a speech analysis portion for extracting the parameters needed to determine spoken words;
- a speech period detecting portion for extracting one or more combinations of speech periods using those parameters;
- a structure analysis portion for detecting feature points indicative of the phoneme structure of each word, and for determining a word by computing its similarity to proposed words according to the presence or absence of those feature points.

Because the speech period detecting portion proposes one or more combinations of speech periods, erroneous recognition caused by noise or similar disturbances is reduced. Only the feature points that help distinguish between words are extracted, following an analysis procedure provided for each word, which sharpens the determination. Operation is more stable than in conventional apparatus with respect to time-axis expansion and compression. The small number of parameters obtained through speech analysis reduces the amount of computation. These parameters are also stable against phoneme differences between speakers.
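The presence/absence scoring can be illustrated with a minimal Python sketch. The abstract does not define the feature-point names, the per-word expectations or the scoring rule, so all of these are hypothetical here. The assumed score is the fraction of expected feature points whose presence or absence agrees. Each candidate speech period is represented by the set of feature points detected in it.

```python
def presence_similarity(detected, expected):
    """Fraction of a word's expected feature points whose presence/absence matches.

    detected: set of feature-point names found in one candidate speech period.
    expected: hypothetical mapping feature-point name -> True (present) / False (absent).
    """
    hits = sum(1 for name, present in expected.items() if (name in detected) == present)
    return hits / len(expected)


def recognize(period_candidates, word_specs):
    """Score every proposed word against every candidate speech period; keep the best."""
    best_word, best_score = None, -1.0
    for detected in period_candidates:
        for word, expected in word_specs.items():
            score = presence_similarity(detected, expected)
            if score > best_score:
                best_word, best_score = word, score
    return best_word, best_score
```

Scoring every candidate period and keeping the best result lets a clean period win over one corrupted by noise.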
Abstract:
An apparatus for speech recognition uses each phoneme as the fundamental recognition unit and recognizes input speech by discriminating the phonemes in it. The apparatus comprises:

- a memory for storing standard patterns of phonemes or phoneme groups;
- a spectrum analyzer for obtaining parameters indicative of the spectrum of the input speech signal;
- a similarity calculator that uses a statistical distance measure to calculate the degree of similarity between the output of the spectrum analyzer and the standard patterns stored in the memory;
- a segmentation portion that segments the input speech using the time-dependent low- and high-frequency power variations of the input speech signal together with the results of the similarity calculator;
- a phoneme discriminator that recognizes phonemes using the results of the similarity calculator.
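As an illustration of a statistical distance measure, the Python sketch below classifies one frame of spectral parameters by the squared Mahalanobis distance with a diagonal covariance. The abstract does not say which statistical measure is used. The choice of Mahalanobis distance, the diagonal covariance and the `(mean, variance)` layout of the stored standard patterns are therefore assumptions.

```python
def mahalanobis_sq(x, mean, var):
    """Squared Mahalanobis distance with a diagonal covariance (variances only)."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))


def classify_frame(x, standards):
    """Return the phoneme whose standard pattern is statistically closest to frame x.

    standards maps a phoneme label to an assumed (mean, variance) pair of equal-length lists.
    """
    return min(standards, key=lambda ph: mahalanobis_sq(x, *standards[ph]))


def label_frames(frames, standards):
    """Frame-by-frame phoneme labels; a segmentation stage would refine these."""
    return [classify_frame(x, standards) for x in frames]
```

Dividing by the variance means a parameter that varies widely within a phoneme counts for less than one that is tightly clustered.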
Abstract:
Linear prediction coefficients of a speech signal containing unknown words are derived for each of successive periodic frame intervals. For every frame over the duration of an individual phoneme, the degree of similarity between the stored coefficients of known words and the derived coefficients of the unknown words is calculated. The degree of similarity for each phoneme is therefore available at the end of that phoneme. Phoneme segmentation data are derived from the speech signal and combined with the degree of similarity calculated over each phoneme to derive the phoneme strings of the speech signal. The derived phoneme strings are compared with the phoneme strings stored in a word dictionary to indicate the stored word with the greatest similarity to them.
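Two pieces of this pipeline can be sketched in Python.

- Linear prediction coefficients are computed by the autocorrelation method with the Levinson-Durbin recursion. This is a standard technique, not necessarily the one used in the patent.
- The dictionary lookup uses Levenshtein edit distance as a stand-in for the unspecified phoneme-string similarity.

The names `lpc`, `edit_distance` and `best_word` are hypothetical.

```python
def autocorrelation(frame, order):
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]


def lpc(frame, order):
    """Levinson-Durbin: coefficients a[1..p] of the predictor x[n] ~ sum_k a[k] * x[n - k]."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted frame: remaining coefficients stay 0
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # prediction error shrinks at each order
    return a[1:]


def edit_distance(a, b):
    """Levenshtein distance between two phoneme strings (assumed similarity measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def best_word(phonemes, dictionary):
    """Dictionary word whose stored phoneme string is closest to the derived one."""
    return min(dictionary, key=lambda w: edit_distance(phonemes, dictionary[w]))
```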
Abstract:
An input signal is applied to a clamping circuit, where its level is clamped at a desired base line voltage in response to a clamping control signal. The clamping control signal is a pulse train. Pulses are applied to the clamping circuit only while the level of the clamping circuit's output signal is within a given range. Therefore, when the input level rises slowly, the input signal is intermittently clamped, so that the output level stays close to the base line voltage. When the rate of rise exceeds a given value, no clamping is performed, so that the output level follows the input level. To determine whether the output level is within the given range, a reference voltage higher than the base line voltage is used when shaping a two-level signal. If the input signal is a three-level signal, another reference voltage, lower than the base line voltage, is used as well.
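The behaviour can be illustrated with a discrete-time Python simulation. This is a behavioural model, not a circuit:

- clamping is modelled as resetting an offset so that the output equals the base line;
- pulses arrive every `pulse_every` samples;
- the reference levels and names are hypothetical.

Passing `low_ref=None` models the two-level case, where only the upper reference is checked.

```python
def simulate_clamp(inputs, base=0.0, high_ref=0.5, low_ref=None, pulse_every=4):
    """Output samples of an intermittently clamped signal (behavioural model, assumed values)."""
    offset = 0.0  # level shift currently applied by the clamp
    out = []
    for n, v in enumerate(inputs):
        y = v - offset
        # the output is "within range" only between the reference levels
        in_range = y < high_ref and (low_ref is None or y > low_ref)
        if n % pulse_every == 0 and in_range:
            offset = v - base  # clamp pulse: pull the output back to the base line
            y = base
        out.append(y)
    return out
```

In this model the rate threshold emerges from the pulse spacing. If the output leaves the range between two pulses, the next pulse is withheld and the output tracks the input.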
Abstract:
In a voice recognition method, a d-by-J dimensional reference voice pattern is prepared for each target word, where J denotes a predetermined number of frames and d denotes a predetermined number of characterizing parameters per frame. A spoken input word is partitioned into J frames between its start and end points. For each frame, d characteristic parameters are extracted to form a d-by-J dimensional input time-series vector. The resemblance between the input vector and each reference voice pattern is then calculated using a statistical distance measure. The spoken word is identified as the word of the reference pattern providing the highest resemblance. Because the input voice word is normalized in both spectrum and time, the method requires fewer calculations and yet attains a high recognition rate.
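A minimal Python sketch of the time normalization and matching follows. Two choices are assumptions:

- partitioning is done by averaging the frames in each of J equal segments;
- the statistical distance measure is a diagonal-covariance Mahalanobis distance.

The reference patterns are stored as flattened d-by-J mean and variance lists, and all names are hypothetical. The smallest distance corresponds to the highest resemblance.

```python
def normalize_time(frames, J):
    """Partition the frames between start and end points into J segments and average each."""
    T, d = len(frames), len(frames[0])
    out = []
    for j in range(J):
        lo = j * T // J
        hi = max((j + 1) * T // J, lo + 1)  # at least one frame per segment
        seg = frames[lo:hi]
        out.append([sum(f[k] for f in seg) / len(seg) for k in range(d)])
    return out


def to_vector(frames, J):
    """Flatten the J x d normalized frames into one d*J time-series vector."""
    return [v for frame in normalize_time(frames, J) for v in frame]


def distance(x, mean, var):
    """Diagonal-covariance (squared) Mahalanobis distance; an assumed statistical measure."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))


def recognize(frames, references, J):
    """references: word -> (mean, variance), each a flat list of length d*J."""
    x = to_vector(frames, J)
    return min(references, key=lambda w: distance(x, *references[w]))
```

Because every utterance is mapped to the same d*J length, a word spoken slowly (more frames) can be compared directly with a reference recorded at a different speed.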
Abstract:
An electronic engraving and recording system comprises:

- a television camera for picking up an image of an object and converting it into an electrical signal;
- means for generating control signals on the basis of the synchronizing signal used in the television camera;
- memory means for storing the electrical signal under control of the control signals;
- means for engraving and recording the image according to the signal read out from the memory means under control of the control signals;
- monitoring display means for displaying the visible image of the object in response to the signal read out from the memory means.

An image of an object can be engraved and recorded on a card simply and within a short time, without requiring any photographic original of the object.

This invention relates to electronic engraving and recording systems. More particularly, it relates to a system of the kind described above that can electronically engrave and record an image of an object on a sheet, such as a card of suitable material that has a flat, smooth surface and is highly durable against wear.

Various attempts have been made to identify the true user of an identification card such as a credit card, ID card, bank card, cash dispenser card, oil card, key card, consultation ticket, communication ticket, or license card. In one prior attempt, a photographic print of the face or other features of the user is affixed to the base plate of a card. In another, such a photographic print is used as an original to produce a printing plate, which is then used to print a picture of the user's face or other features on the base plate of a card.

These methods, however, have various defects.

- **Forgery.** Affixing a photograph of the user's face is unsatisfactory from the standpoint of preventing forgery. If the card were stolen or lost, it could be used illicitly by replacing the affixed photograph, and the true user could suffer unexpected damage.
- **Thickness.** The photograph-bearing portion of the card is thicker by the thickness of the photograph. When, for example, the card is magnetically verified by a suitable mechanical apparatus, this frequently causes problems in the mechanical handling of the card.
- **Cost.** The printing method requires an individual printing plate for each card, which considerably increases the manufacturing cost of such a card.
- **Durability.** With both prior-art methods, the photographic or printed picture of the user's face on the base plate of the card is not durable. It tends to wear away during prolonged use until its identifying function is lost, making it difficult to verify the identity of the user during the valid term of the card.

It is therefore an object of the present invention to provide an electronic engraving and recording system that eliminates the need to prepare a photographic original of an object such as a person.

Another object of the present invention is to provide an electronic engraving and recording system with a visible image display means that can be easily monitored. The displayed visible image is a magnified version of the image to be engraved on a card, with exactly the same magnification in length and width.

The electronic engraving and recording system according to the present invention includes:

- image pickup means for picking up an image of an object, such as a person whose picture is to be engraved on a card;
- signal storage means for storing a digital signal obtained by converting an analog signal representative of a still picture of the image picked up by the pickup means;
- D-A converting means for reading out the digital signal stored in the storage means and converting it into an analog signal;
- visible image display means, or monitoring means, for displaying a visible image in response to the analog signal from the D-A converting means;
- engraving means for engraving the image corresponding to the still picture of the object on a card that is highly durable against wear.
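The signal path of this system can be modelled in a few lines of Python:

1. A-D conversion of a still picture;
2. storage in memory;
3. D-A readout feeding both the monitor and the engraver.

The description does not specify the 8-bit quantization, the 0 to 1 full-scale analog range or the function names; these are assumptions for illustration. The engraver and monitor are represented only by the list each would receive.

```python
BITS = 8             # assumed quantization depth
LEVELS = 2 ** BITS
V_MAX = 1.0          # assumed full-scale analog level


def a_to_d(samples):
    """Quantize analog samples in [0, V_MAX] to integer codes for the frame memory."""
    return [min(LEVELS - 1, max(0, round(s / V_MAX * (LEVELS - 1)))) for s in samples]


def d_to_a(codes):
    """Convert stored codes back to analog levels (D-A converting means)."""
    return [c / (LEVELS - 1) * V_MAX for c in codes]


def record_still(frame):
    """Store one still picture, then read it out for both the monitor and the engraver."""
    memory = a_to_d(frame)        # signal storage means holds the digital picture
    analog = d_to_a(memory)       # a single readout drives both outputs
    monitor_feed = list(analog)   # visible image display means
    engraver_feed = list(analog)  # engraving means
    return memory, monitor_feed, engraver_feed
```

Because both outputs come from the same memory readout, the monitor shows the same picture data that the engraver receives.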