Abstract:
A method for generating an augmented reality (AR) module by an apparatus is described. Processing circuitry of the apparatus obtains preset third-party software development interface information, in which an AR core engine and an AR rendering engine are uniformly encapsulated, and through which a system parameter and pose information associated with the AR core engine are passed into the AR rendering engine. The processing circuitry generates, according to the third-party software development interface information and a document configured to correspond to the third-party software development interface information, an AR module of a mobile client, the correspondingly configured document comprising interface usage information of the AR core engine and the AR rendering engine.
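The "uniform encapsulation" described above can be pictured as a single facade class that hides both engines from the mobile client. The following is a minimal sketch only; every class and method name here (`ARCoreEngine`, `ARRenderEngine`, `ARInterface`, `render_frame`) is a hypothetical stand-in, not an API from the patent.

```python
class ARCoreEngine:
    """Stand-in for the AR core engine: supplies pose and system parameters."""
    def current_pose(self):
        return {"position": (0.0, 0.0, 0.0), "rotation": (0.0, 0.0, 0.0, 1.0)}

    def system_params(self):
        return {"fov": 60.0, "near": 0.1, "far": 100.0}


class ARRenderEngine:
    """Stand-in for the AR rendering engine: consumes pose and parameters."""
    def render(self, pose, params):
        return f"rendered frame at {pose['position']} with fov={params['fov']}"


class ARInterface:
    """Uniform third-party interface wrapping both engines, so an AR module
    generated for a mobile client only talks to this single facade."""
    def __init__(self):
        self._core = ARCoreEngine()
        self._renderer = ARRenderEngine()

    def render_frame(self):
        # The system parameter and pose information associated with the
        # core engine are passed into the rendering engine, as described.
        pose = self._core.current_pose()
        params = self._core.system_params()
        return self._renderer.render(pose, params)
```

The design point is that the generated AR module depends only on `ARInterface`, never on either engine directly.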
Abstract:
A biometric-based authentication method, an apparatus, and a system are described. The method includes: receiving a biometric image to be authenticated sent by a client; performing feature extraction on the biometric image to be authenticated to obtain a biometric template to be authenticated; comparing the biometric template to be authenticated with a locally stored biometric template; and returning an authentication result. In this case, the feature extraction process may be implemented at the cloud server side. As such, the complexity of the client may be reduced, the extensibility of the client may be increased, the limitation that biometric recognition may only be implemented on the client may be eliminated, and diversified utilization may be supported.
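The server-side flow (receive image, extract template, compare with the stored template, return a result) can be sketched as below. This is a toy illustration under stated assumptions: real feature extraction is replaced by a hash, and the template store is an in-memory dict; both are hypothetical.

```python
import hashlib


def extract_template(image_bytes):
    # Hypothetical stand-in for real biometric feature extraction:
    # a deterministic digest of the image bytes.
    return hashlib.sha256(image_bytes).hexdigest()


# Hypothetical locally stored templates, keyed by user id.
STORED_TEMPLATES = {"alice": extract_template(b"alice-fingerprint")}


def authenticate(user_id, image_bytes):
    """Flow from the abstract: extract a template from the received
    biometric image, compare it with the locally stored template,
    and return the authentication result."""
    candidate = extract_template(image_bytes)
    stored = STORED_TEMPLATES.get(user_id)
    return stored is not None and candidate == stored
```

In a real deployment, the comparison would use a similarity score over feature vectors rather than exact equality.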
Abstract:
A method of interacting with an audience of multimedia content is disclosed. The method includes receiving, from a client device, data associated with a piece of multimedia content from a group of pieces of multimedia content that is presented to a user of the client device. The data is obtained at the client device in response to an instruction provided to the client device by the user. The method includes determining, based on the data, an identifier of the piece of multimedia content from a set of identifiers, each of which identifies a piece of multimedia content from the group of pieces of multimedia content. The method includes retrieving, based on the identifier of the piece of multimedia content, interactive content associated with the piece of multimedia content. The method includes sending the interactive content to the client device such that the client device presents the interactive content to the user.
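The lookup chain in this abstract (data → identifier → interactive content → client) can be sketched with two maps. All names and sample values below are hypothetical, not taken from the patent.

```python
# Hypothetical index mapping data obtained at the client device
# to an identifier of a piece of multimedia content.
ID_INDEX = {"fp-concert-clip": "content-42"}

# Hypothetical store mapping identifiers to interactive content.
CONTENT_STORE = {"content-42": {"poll": "Which song should play next?"}}


def handle_interaction(data):
    """Flow from the abstract: determine the identifier of the piece of
    multimedia content from the received data, retrieve the associated
    interactive content, and return it for sending back to the client."""
    content_id = ID_INDEX.get(data)
    if content_id is None:
        return None
    return CONTENT_STORE[content_id]
```

In practice the identifier determination might be a fingerprint match rather than an exact key lookup.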
Abstract:
A method is performed at a device having one or more processors and memory. The device establishes a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data. The device establishes a second-level DNN model by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, wherein the second-level DNN model specifies a plurality of high-level voiceprint features. Using the second-level DNN model, the device registers a first high-level voiceprint feature sequence for a user based on a registration speech sample received from the user. The device performs speaker verification for the user based on the first high-level voiceprint feature sequence registered for the user.
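The registration and verification steps can be sketched independently of the DNN itself. In this minimal illustration, a character-frequency vector stands in for the second-level DNN's high-level voiceprint features, and cosine similarity with an assumed threshold stands in for the verification decision; none of these specifics come from the patent.

```python
import math
from collections import Counter


def extract_voiceprint(speech):
    # Toy stand-in for the second-level DNN feature extractor:
    # normalized character frequencies of the "speech" string.
    counts = Counter(speech)
    return {ch: n / len(speech) for ch, n in counts.items()}


def cosine(a, b):
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def register(registration_sample):
    """Register a voiceprint feature representation for a user."""
    return extract_voiceprint(registration_sample)


def verify(registered, test_sample, threshold=0.9):
    """Speaker verification: accept if the test sample's features are
    close enough to the registered features (threshold is assumed)."""
    return cosine(registered, extract_voiceprint(test_sample)) >= threshold
```

A real system would compare sequences of frame-level DNN embeddings, but the register/verify structure is the same.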
Abstract:
Systems and methods are provided for adding punctuations. For example, one or more first feature units are identified in a voice file taken as a whole; the voice file is divided into multiple segments by detecting silences in the voice file; one or more second feature units are identified in the voice file; a first aggregate weight of first punctuation states of the voice file and a second aggregate weight of second punctuation states of the voice file are determined, using a language model established based on word separation and third semantic features; a weighted calculation is performed to generate a third aggregate weight based on a linear combination associated with the first aggregate weight and the second aggregate weight; and one or more final punctuations are added to the voice file based on at least information associated with the third aggregate weight.
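The weighted calculation over the two aggregate weights can be sketched directly. The interpolation coefficient `alpha` below is an assumption for illustration; the abstract only states that a linear combination is used.

```python
def combined_weight(w1, w2, alpha=0.6):
    """Third aggregate weight as a linear combination of the first and
    second aggregate weights. alpha is an assumed coefficient."""
    return alpha * w1 + (1 - alpha) * w2


def pick_punctuation(candidates):
    """candidates: list of (punctuated_text, first_weight, second_weight).
    Selects the candidate whose combined (third) weight is highest,
    i.e. the final punctuation added to the voice file."""
    return max(candidates, key=lambda c: combined_weight(c[1], c[2]))[0]
```

For example, with weights (0.9, 0.7) versus (0.3, 0.4), the first candidate's combined weight dominates and its punctuation is kept.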
Abstract:
A parallel data processing method based on multiple graphics processing units (GPUs) is provided, including: creating, in a central processing unit (CPU), a plurality of worker threads for controlling a plurality of worker groups respectively, the worker groups including a plurality of GPUs; binding each worker thread to a corresponding GPU; loading one batch of training data from a nonvolatile memory to a GPU video memory corresponding to one worker group; transmitting, between the plurality of GPUs corresponding to one worker group, data required for the data processing performed by the GPUs through peer-to-peer transmission; and controlling the plurality of GPUs to perform data processing in parallel through the worker threads.
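The worker-thread structure can be sketched with ordinary CPU threads, with each thread bound to a (simulated) GPU and processing its own batch. This is a shape-only illustration: the doubling step stands in for GPU computation, and no real GPU binding or peer-to-peer transfer is performed.

```python
import threading


def worker(gpu_id, batch, results, lock):
    # One worker thread, notionally bound to GPU `gpu_id`,
    # processes its batch (here, a trivial stand-in computation).
    processed = [x * 2 for x in batch]
    with lock:
        results[gpu_id] = processed


def run_parallel(batches):
    """Create one worker thread per batch/GPU, run them in parallel,
    and collect per-GPU results."""
    results, lock, threads = {}, threading.Lock(), []
    for gpu_id, batch in enumerate(batches):
        t = threading.Thread(target=worker, args=(gpu_id, batch, results, lock))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results
```

In a real CUDA setting each thread would select its device before launching kernels, and peer-to-peer copies would move intermediate data between GPUs in the same worker group.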
Abstract:
A method of testing and monitoring a real-time streaming media recognition service provider is performed at a computer system. The computer system obtains a streaming media signal source, selects a testing sample from the streaming media signal source, records characteristics of the testing sample, and obtains an expected output according to the characteristics of the testing sample. Next, the computer system converts the testing sample into a digital streaming format preset by the service provider and initiates a media recognition request according to the testing sample in the digital streaming format to the service provider. After receiving a media recognition result of the testing sample returned by the service provider according to the media recognition request, the computer system compares the media recognition result with the expected output and indicates whether the service provider is normal in accordance with the comparison result.
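The final comparison step of this monitoring method can be sketched as a small check function. The `recognize` callable below is an injected stand-in for the service provider's recognition endpoint; its name and the sample bytes are hypothetical.

```python
def check_provider(recognize, sample, expected):
    """Send a testing sample to the provider's recognition endpoint,
    compare the result with the expected output, and report whether
    the provider is normal."""
    result = recognize(sample)
    return {"result": result, "expected": expected, "normal": result == expected}


# Fake provider standing in for the real streaming recognition service.
def fake_recognize(sample):
    return "hello world" if sample == b"\x00\x01audio" else ""
```

A production monitor would also cover timeouts and malformed responses, treating those as abnormal as well.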
Abstract:
A method, device, and system for providing a language service are disclosed. In some embodiments, the method is performed at a computer system having one or more processors and memory for storing programs to be executed by the one or more processors. The method includes receiving a first message from a client device. The method includes determining whether the first message is in a first language or a second language different from the first language. The method includes translating the first message into a second message in the second language if the first message is in the first language. The method includes, alternatively, generating a third message in the second language if the first message is in the second language, where the third message includes a conversational response to the first message. The method further includes returning one of the second message and the third message to the client device.
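The branch between translating and replying can be sketched as a small router with injected callables. The language-detection heuristic below (non-ASCII means "first language") and all function names are hypothetical stand-ins for illustration only.

```python
def handle_message(message, detect_language, translate, generate_reply):
    """Routing from the abstract: translate the message if it is in the
    first language; otherwise generate a conversational reply in the
    second language. Either way, a second-language message is returned."""
    if detect_language(message) == "first":
        return translate(message)
    return generate_reply(message)


# Toy stand-ins: treat any non-ASCII message as the first language.
detect = lambda m: "first" if any(ord(c) > 127 for c in m) else "second"
translate = lambda m: f"[translated] {m}"
reply = lambda m: f"[reply to] {m}"
```

Injecting the three callables keeps the routing logic independent of any particular translation or dialogue backend.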
Abstract:
A method includes: acquiring data samples; performing categorized sentence mining in the acquired data samples to obtain categorized training samples for multiple categories; building a text classifier based on the categorized training samples; classifying the data samples using the text classifier to obtain a class vocabulary and a corpus for each category; mining the corpus for each category according to the class vocabulary for the category to obtain a respective set of high-frequency language templates; training on the templates for each category to obtain a template-based language model for the category; training on the corpus for each category to obtain a class-based language model for the category; training on the class vocabulary for each category to obtain a lexicon-based language model for the category; building a speech decoder according to an acoustic model, the class-based language model and the lexicon-based language model for any given field, and the data samples.
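The early steps of this pipeline (classify the data samples, then accumulate a corpus and a class vocabulary per category) can be sketched as follows. The keyword-overlap classifier and the seed vocabulary are toy assumptions; the real method builds a trained text classifier from the categorized training samples.

```python
from collections import Counter, defaultdict

# Hypothetical seed vocabulary per category, for the toy classifier.
SEED_KEYWORDS = {
    "sports": {"game", "score", "team"},
    "finance": {"stock", "market", "price"},
}


def classify(sentence):
    # Toy stand-in for the text classifier: the category whose seed
    # keywords overlap most with the sentence wins.
    words = set(sentence.lower().split())
    return max(SEED_KEYWORDS, key=lambda cat: len(words & SEED_KEYWORDS[cat]))


def build_class_resources(samples):
    """Classify the data samples to obtain a corpus and a class
    vocabulary (word counts) for each category."""
    corpus, vocab = defaultdict(list), defaultdict(Counter)
    for s in samples:
        cat = classify(s)
        corpus[cat].append(s)
        vocab[cat].update(s.lower().split())
    return corpus, vocab
```

The per-category corpus and vocabulary are the inputs to the later template mining and language-model training steps.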
Abstract:
Systems and methods are provided for adding punctuations. For example, one or more first feature units are identified in a voice file taken as a whole; the voice file is divided into multiple segments; one or more second feature units are identified in the voice file; a first aggregate weight of first punctuation states of the voice file and a second aggregate weight of second punctuation states of the voice file are determined, using a language model established based on word separation and third semantic features; a weighted calculation is performed to generate a third aggregate weight based on at least information associated with the first aggregate weight and the second aggregate weight; and one or more final punctuations are added to the voice file based on at least information associated with the third aggregate weight.