Abstract:
Speech audio that is intended for transcription into textual form is received. The received speech audio is divided into first speech segments. A plurality of speakers is identified. A speaker is configured for repeating in spoken form a first speech segment that the speaker has listened to. A subset of speakers is determined for sending each first speech segment. Each first speech segment is sent to the subset of speakers determined for the particular first speech segment. The second speech segments are received from the speakers. The second speech segment is a re-spoken version of a first speech segment that has been generated by a speaker by repeating in spoken form the first speech segment. The second speech segments are processed to generate partial transcripts. The partial transcripts are combined to generate a complete transcript that is a textual representation corresponding to the received speech audio.
Abstract:
An example method is provided and includes receiving a video bitstream in a network environment; detecting a question in a decoded audio portion of a video bitstream; and marking a segment of the video bitstream with a tag. The tag may correspond to a location of the question in the video bitstream, and can facilitate consumption of the video bitstream. The method can further include detecting keywords in the question, and combining the keywords to determine a content of the question. In specific embodiments, the method can also include receiving the question and a corresponding answer from a user interaction, crowdsourcing the question by a plurality of users, counting a number of questions in the video bitstream and other features.
Abstract:
An example method is provided and includes receiving a video bitstream in a network environment; detecting a question in a decoded audio portion of a video bitstream; and marking a segment of the video bitstream with a tag. The tag may correspond to a location of the question in the video bitstream, and can facilitate consumption of the video bitstream. The method can further include detecting keywords in the question, and combining the keywords to determine a content of the question. In specific embodiments, the method can also include receiving the question and a corresponding answer from a user interaction, crowdsourcing the question by a plurality of users, counting a number of questions in the video bitstream and other features.
Abstract:
Speech audio that is intended for transcription into textual form is received. The received speech audio is divided into first speech segments. A plurality of speakers is identified. A speaker is configured for repeating in spoken form a first speech segment that the speaker has listened to. A subset of speakers is determined for sending each first speech segment. Each first speech segment is sent to the subset of speakers determined for the particular first speech segment. The second speech segments are received from the speakers. The second speech segment is a re-spoken version of a first speech segment that has been generated by a speaker by repeating in spoken form the first speech segment. The second speech segments are processed to generate partial transcripts. The partial transcripts are combined to generate a complete transcript that is a textual representation corresponding to the received speech audio.