Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improved pronunciation. One of the methods includes receiving, from a user device, data that represents an audible pronunciation of the name of an individual. The method includes identifying one or more other users that are members of a social circle of which the individual is a member. The method includes identifying one or more devices associated with the other users. The method also includes providing information that identifies the individual and the data representing the audible pronunciation to the one or more identified devices.
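The steps of this method can be illustrated with a minimal sketch. It assumes an in-memory view of the social network; the `circles` and `devices` mappings, the function name `share_pronunciation`, and the sample identifiers are all hypothetical and are not taken from the specification.

```python
def share_pronunciation(circles, devices, individual, audio):
    """Return {device_id: (individual, audio)} for each device to notify.

    circles: circle name -> set of member ids (assumed data layout)
    devices: user id -> list of device ids (assumed data layout)
    """
    # Identify other users who share at least one circle with the individual.
    others = set()
    for members in circles.values():
        if individual in members:
            others |= members - {individual}

    # Identify the devices associated with those users and pair each one
    # with the individual's identity and the pronunciation data.
    deliveries = {}
    for user in sorted(others):
        for device in devices.get(user, []):
            deliveries[device] = (individual, audio)
    return deliveries


circles = {
    "family": {"ana", "ben"},
    "work": {"ana", "cy"},
    "chess": {"cy", "dee"},  # "ana" is not a member of this circle
}
devices = {
    "ben": ["ben-phone"],
    "cy": ["cy-phone", "cy-tablet"],
    "dee": ["dee-phone"],
}
deliveries = share_pronunciation(circles, devices, "ana", b"\x00\x01")
```

In this sketch "dee" receives nothing, because "dee" shares a circle only with "cy" and not with the individual whose name was recorded.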
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for building language models. One of the methods includes identifying a first group of one or more users associated with a user in a social network. The method includes identifying first linguistic information associated with the first group. The method includes generating a first language model based on the first linguistic information. The method includes identifying a second group of one or more users associated with the user. The method includes identifying second linguistic information associated with the second group. The method includes generating a second language model based on the second linguistic information. The method includes associating the first language model and the second language model with the user.
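As a rough illustration of the method, the sketch below builds one language model per group and associates both with the user. It assumes that "linguistic information" is simply the text written by group members and that a unigram relative-frequency table is an acceptable stand-in for a language model; the abstract specifies neither, and every name here is hypothetical.

```python
from collections import Counter


def unigram_model(texts):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def build_user_models(user, groups, posts):
    """Build one model per group and associate all of them with the user.

    groups: group name -> list of member ids (assumed data layout)
    posts:  member id -> list of texts written by that member (assumed)
    """
    models = {}
    for name, members in groups.items():
        texts = [text for member in members for text in posts.get(member, [])]
        models[name] = unigram_model(texts)
    return {user: models}


groups = {"coworkers": ["bo", "cy"], "family": ["di"]}
posts = {
    "bo": ["deploy the build"],
    "cy": ["the build failed"],
    "di": ["dinner at six"],
}
user_models = build_user_models("ana", groups, posts)
```

Keeping the models separate, rather than merging them, lets a recognizer pick the model that matches the group the user is currently addressing.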
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for acoustic model generation. One of the methods includes identifying one or more demographic characteristics for a user of a social networking site. The method includes receiving speech data from the user, the speech data associated with a user device. The method includes storing the speech data in association with the demographic characteristics of the user and with the user device.
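The storage step can be sketched as a store keyed by demographic characteristics and device, so that speech samples can later be retrieved per group. This is only an illustration: the `SpeechStore` class, the key layout, and the sample values are assumptions and do not come from the specification.

```python
from collections import defaultdict


class SpeechStore:
    """Hypothetical in-memory store of speech samples."""

    def __init__(self):
        self._buckets = defaultdict(list)

    @staticmethod
    def _key(demographics, device):
        # Sort the items so the key does not depend on dict insertion order.
        return (tuple(sorted(demographics.items())), device)

    def add(self, demographics, device, audio):
        """Store audio in association with the demographics and the device."""
        self._buckets[self._key(demographics, device)].append(audio)

    def samples(self, demographics, device):
        """Return a copy of all audio stored for this demographics/device pair."""
        return list(self._buckets.get(self._key(demographics, device), []))


store = SpeechStore()
store.add({"age_range": "25-34", "region": "US-South"}, "phone-x", b"sample-1")
store.add({"region": "US-South", "age_range": "25-34"}, "phone-x", b"sample-2")
store.add({"age_range": "65+", "region": "US-South"}, "phone-x", b"sample-3")
```

Grouping samples this way would allow a separate acoustic model to be trained for each combination of demographic group and device type.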
Abstract:
The subject matter of this specification can be implemented in a computer-implemented method that includes receiving utterances and transcripts thereof. The method includes analyzing the utterances and transcripts to determine certain attributes, such as distances between prosodic contours for pairs of utterances. A model can be generated that can be used to estimate a distance between a determined prosodic contour for a received utterance and an unknown prosodic contour for a synthesized utterance when given a distance between attributes for text associated with the received utterance and the synthesized utterance.
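One way to picture such a model is a one-variable least-squares fit that maps a text-attribute distance to an estimated prosodic-contour distance. The sketch below is an assumption-laden illustration: the abstract does not say that the model is linear, that contours are equal-length pitch sequences, or that mean absolute difference is the contour distance, and all names and numbers are hypothetical.

```python
def contour_distance(a, b):
    """Mean absolute difference between two equal-length pitch contours."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Training pairs: (text-attribute distance, contour of one utterance,
# contour of the other). The values are made up for illustration.
pairs = [
    (0.0, [100, 100], [101, 101]),
    (1.0, [100, 100], [103, 103]),
    (2.0, [100, 100], [105, 105]),
]
text_distances = [d for d, _, _ in pairs]
contour_distances = [contour_distance(a, b) for _, a, b in pairs]
slope, intercept = fit_line(text_distances, contour_distances)


def estimate_contour_distance(text_distance):
    """Estimate the distance to the unknown contour of a synthesized utterance."""
    return slope * text_distance + intercept


estimate = estimate_contour_distance(3.0)
```

The estimate only requires the text of the synthesized utterance, which is the point of the model: the contour of that utterance is not yet known.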
Abstract:
A computer-implemented technique includes receiving, at a computing device including one or more processors, a touch input from a user. The touch input includes (i) a spot input indicating a request to provide a speech input to the computing device followed by (ii) a slide input indicating a desired language for automatic speech recognition of the speech input. The technique includes receiving, at the computing device, the speech input from the user. The technique includes obtaining, at the computing device, one or more recognized characters resulting from automatic speech recognition of the speech input using the desired language. The technique also includes outputting, at the computing device, the one or more recognized characters.
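The touch-then-speak flow can be sketched as follows. The sketch assumes that the slide direction selects the language, that screen coordinates grow downward, and that the recognizer is a caller-supplied function; the direction-to-language mapping, the distance threshold, and the `fake_recognize` stand-in are hypothetical and do not represent any real speech recognition API.

```python
import math


def language_from_slide(start, end, languages, min_distance=20.0):
    """Map a slide from start to end onto a language code, or None.

    start, end: (x, y) touch points; y grows downward (assumed).
    languages:  direction name ("up", "down", "left", "right") -> language code.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    if math.hypot(dx, dy) < min_distance:
        return None  # too short to count as a slide
    if abs(dx) >= abs(dy):
        direction = "right" if dx > 0 else "left"
    else:
        direction = "down" if dy > 0 else "up"
    return languages.get(direction)


def handle_touch_and_speech(start, end, audio, recognize, languages, default="en-US"):
    """Choose the language from the slide, then recognize and output the text."""
    language = language_from_slide(start, end, languages) or default
    return recognize(audio, language)


def fake_recognize(audio, language):
    """Stand-in recognizer used only to make the sketch runnable."""
    return f"[{language}] {len(audio)} bytes"


languages = {"up": "ja-JP", "right": "es-ES"}
result = handle_touch_and_speech((100, 100), (100, 40), b"abcd", fake_recognize, languages)
```

Here the user touches the spot at (100, 100) and slides upward, so the speech input is recognized with the language mapped to "up".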