Abstract:
Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for captioning a media presentation. The method includes receiving automatic speech recognition (ASR) output from a media presentation and a transcription of the media presentation. The method includes selecting via a processor a pair of anchor words in the media presentation based on the ASR output and transcription and generating captions by aligning the transcription with the ASR output between the selected pair of anchor words. The transcription can be human-generated. Selecting pairs of anchor words can be based on a similarity threshold between the ASR output and the transcription. In one variation, commonly used words on a stop list are ineligible as anchor words. The method includes outputting the media presentation with the generated captions. The presentation can be a recording of a live event.
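The anchor-word alignment described above can be illustrated with a minimal sketch. This is not the patented method itself: it assumes exact word matches stand in for the similarity threshold, a small hypothetical stop list, and linear interpolation of ASR timestamps between anchor pairs.

```python
# Sketch of anchor-word captioning: words that match exactly in the ASR
# output and the human transcription (and are not on a stop list) serve as
# anchors; transcription words between anchors get interpolated timestamps.
# The stop list and exact-match rule are illustrative simplifications.

STOP_LIST = {"the", "a", "an", "and", "of", "to", "in"}  # hypothetical

def select_anchors(asr, transcript):
    """Return (asr_index, transcript_index) pairs where the same
    non-stop-list word appears in both sequences, in order."""
    anchors, j = [], 0
    for i, (word, _time) in enumerate(asr):
        if word in STOP_LIST:
            continue  # commonly used words are ineligible as anchors
        for k in range(j, len(transcript)):
            if transcript[k] == word:
                anchors.append((i, k))
                j = k + 1
                break
    return anchors

def align(asr, transcript):
    """Assign a timestamp to every transcription word between anchors
    by linear interpolation of the ASR anchor timestamps."""
    anchors = select_anchors(asr, transcript)
    captions = []
    for (i0, k0), (i1, k1) in zip(anchors, anchors[1:]):
        t0, t1 = asr[i0][1], asr[i1][1]
        span = max(k1 - k0, 1)
        for k in range(k0, k1):
            t = t0 + (t1 - t0) * (k - k0) / span
            captions.append((transcript[k], round(t, 2)))
    if anchors:
        i_last, k_last = anchors[-1]
        captions.append((transcript[k_last], asr[i_last][1]))
    return captions

# ASR misrecognizes "world" as "wrld"; the human transcription is correct.
asr = [("hello", 0.0), ("wrld", 0.5), ("this", 1.0),
       ("is", 1.2), ("captioning", 1.5)]
transcript = ["hello", "world", "this", "is", "captioning"]
captions = align(asr, transcript)
```

Note that the misrecognized word "world" still receives a reasonable caption time because it lies between two anchors, which is the point of the technique.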
Abstract:
Disclosed herein are systems, methods, and computer-readable media for temporally adaptive media playback. The method for adaptive media playback includes estimating or determining an amount of time between a first event and a second event, selecting media content to fill the estimated amount of time between the first event and the second event, and playing the selected media content, possibly at a moderately different speed, to fit the time interval. One embodiment includes events that are destination-based or temporal-based. Another embodiment includes adding, removing, speeding up, or slowing down selected media content in order to fit the estimated amount of time between the first event and the second event, or modifying the selected media content to adjust to an updated estimated amount of time. Another embodiment bases selected media content on a user or group profile.
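A simplified sketch of the idea, under assumptions not stated in the abstract: the gap between the two events is given directly in seconds, content is chosen greedily by duration, and a single playback-rate factor stretches or compresses the selection to fit the interval exactly.

```python
# Temporally adaptive playback sketch: fill an estimated gap between two
# events with media items, then compute a speed factor so the selection
# fits the interval. The 10% overshoot allowance is an illustrative choice.

def select_content(library, gap_seconds):
    """Greedily pick (title, duration) items, longest first, until no
    further item fits within 110% of the gap."""
    chosen, total = [], 0.0
    for title, duration in sorted(library, key=lambda x: -x[1]):
        if total + duration <= gap_seconds * 1.1:
            chosen.append((title, duration))
            total += duration
    return chosen, total

def playback_rate(total, gap_seconds):
    """Speed factor (>1 means play faster) so content fits the interval."""
    return total / gap_seconds

library = [("news", 120.0), ("traffic", 60.0), ("song", 200.0), ("ad", 30.0)]
chosen, total = select_content(library, 300.0)
rate = playback_rate(total, 300.0)  # slightly above 1: play a bit faster
```

If the estimated gap later shrinks or grows, rerunning both functions with the updated estimate corresponds to the abstract's adjustment embodiment.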
Abstract:
Speaker content generated in an audio conference is selectively and visually represented. A profile for each audience member who participates in the audio conference is obtained. Speaker content spoken during the audio conference is monitored. Different weights are applied to words included in the speaker content according to a parameter of the profile for each of the audience members. A relation between the speaker content and the profile for each of the audience members is determined. Visual representations of the speaker content are presented to select members among the audience members based on the determined relation.
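The weighting-and-selection step can be sketched as follows. The profile schema (an `interests` map of word weights) and the relevance threshold are hypothetical; they stand in for whatever profile parameter and relation measure an implementation would use.

```python
# Sketch: weight the words a speaker said by each audience member's profile,
# then present visual representations only to members whose relevance score
# clears a threshold. Profile schema and threshold are illustrative.

def relevance(speech_words, profile):
    """Sum of per-word profile weights for words the speaker said."""
    weights = profile.get("interests", {})  # word -> weight (hypothetical)
    return sum(weights.get(w, 0.0) for w in speech_words)

def select_viewers(speech_words, profiles, threshold=1.0):
    """Return the members whose profile relation meets the threshold."""
    return [name for name, p in profiles.items()
            if relevance(speech_words, p) >= threshold]

profiles = {
    "alice": {"interests": {"budget": 0.8, "revenue": 0.6}},
    "bob":   {"interests": {"kubernetes": 1.0}},
}
speech = ["the", "budget", "and", "revenue", "forecast"]
viewers = select_viewers(speech, profiles)  # only alice's profile relates
```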
Abstract:
An interactive conference is supplemented based on terminology content. Terminology content from a plurality of devices connected to the interactive conference is monitored. A set of words from the terminology content is selected. Supplemental media content at an external source is identified based on the selected set of words, and selectively made available to a device connected to the interactive conference.
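A minimal sketch of the monitoring-and-supplementing flow. The common-word filter, the frequency-based selection, and the in-memory dictionary standing in for an external media source are all assumptions made for illustration.

```python
# Sketch: collect terminology from several conference devices, select the
# most frequent non-trivial words, and look them up in a stand-in for an
# external supplemental-media source.

from collections import Counter

COMMON = {"the", "a", "and", "we", "to", "of", "on"}  # illustrative filter

def select_terms(device_streams, top_n=2):
    """Count words across all device streams, ignoring common words."""
    counts = Counter(w.lower() for stream in device_streams for w in stream
                     if w.lower() not in COMMON)
    return [w for w, _ in counts.most_common(top_n)]

def find_supplements(terms, external_index):
    """external_index: term -> media URL (hypothetical external source)."""
    return {t: external_index[t] for t in terms if t in external_index}

streams = [["We", "discussed", "latency", "and", "latency", "budgets"],
           ["Latency", "spikes", "and", "throughput"]]
index = {"latency": "https://example.com/latency-explainer",
         "throughput": "https://example.com/throughput-talk"}
terms = select_terms(streams)
supplements = find_supplements(terms, index)
```

The resulting `supplements` map is what would be selectively made available to devices connected to the conference.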
Abstract:
Disclosed herein are systems, methods, and computer-readable media for adaptive media playback based on destination. The method for adaptive media playback comprises determining one or more destinations, collecting media content that is relevant to or describes the one or more destinations, assembling the media content into a program, and outputting the program. In various embodiments, media content may be advertising, consumer-generated, based on real-time events, based on a schedule, or assembled to fit within an estimated available time. Media content may be assembled using an adaptation engine that selects a plurality of media segments that fit in the estimated available time, orders the plurality of media segments, alters at least one of the plurality of media segments to fit the estimated available time, if necessary, and creates a playlist of selected media content containing the plurality of media segments.
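The adaptation engine's select/order/alter/playlist steps can be sketched as below. The tag-based relevance test, shortest-first ordering, and trim-to-fit alteration are illustrative choices, not the claimed implementation.

```python
# Sketch of an adaptation engine: select segments relevant to the
# destination, order them, clip the last one to fit the estimated available
# time if necessary, and emit a playlist. All policies are illustrative.

def build_playlist(segments, destination, available):
    """segments: list of (title, tags, duration_seconds).
    Returns (playlist, total_seconds)."""
    relevant = [s for s in segments if destination in s[1]]
    relevant.sort(key=lambda s: s[2])  # order segments, shortest first
    playlist, total = [], 0.0
    for title, _tags, duration in relevant:
        if total >= available:
            break
        clipped = min(duration, available - total)  # alter to fit, if needed
        playlist.append((title, clipped))
        total += clipped
    return playlist, total

segments = [("history", {"paris"}, 120.0),
            ("food", {"paris"}, 90.0),
            ("surfing", {"hawaii"}, 60.0)]
playlist, total = build_playlist(segments, "paris", 150.0)
```

Here the 120-second "history" segment is clipped to 60 seconds so the program exactly fills the 150 seconds of estimated available time.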
Abstract:
Systems, methods, and computer-readable media for dynamically constructing personalized contextual video programs include extracting video metadata from a video program, extracting component metadata from video components stored in a media object library, extracting viewer preferences, receiving synchronization information about the video program, identifying a video program segment susceptible to inserting a video component, and transmitting, to a playback device, the video component and instructions detailing how to insert the video component in the video program segment. Video metadata can be extracted in real time. A viewer profile can be based on demographic information and user behavior. The video program and the video component can be combined before transmitting the video component and instructions to the playback device. A video component can be selected based on which advertiser offers to pay the most. The transmitted video component and set of instructions can be stored as a construction list for future use.
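A small sketch of component selection and instruction building, under an assumed schema: components carry tag sets and advertiser bids, and one video segment is flagged insertable. The highest-bid rule mirrors the abstract's advertiser-selection variation.

```python
# Sketch: pick the viewer-relevant video component with the highest
# advertiser bid, then build insertion instructions for the playback device.
# Dict schemas ('id', 'tags', 'bid', 'insertable') are hypothetical.

def choose_component(components, viewer_prefs):
    """Among components whose tags overlap the viewer's preferences,
    select the one whose advertiser offers to pay the most."""
    matching = [c for c in components if c["tags"] & viewer_prefs]
    return max(matching, key=lambda c: c["bid"]) if matching else None

def build_instructions(video_segments, component):
    """Find a segment susceptible to insertion and describe the insert."""
    for seg in video_segments:
        if seg.get("insertable"):
            return {"segment": seg["id"], "insert": component["id"],
                    "at": seg["start"]}
    return None

components = [{"id": "ad-cars",   "tags": {"cars"},   "bid": 2.0},
              {"id": "ad-travel", "tags": {"travel"}, "bid": 5.0}]
segments = [{"id": "s1", "start": 0.0,  "insertable": False},
            {"id": "s2", "start": 42.0, "insertable": True}]
chosen = choose_component(components, {"travel", "cars"})
plan = build_instructions(segments, chosen)
```

The `plan` dict plays the role of the transmitted instructions, which per the abstract could also be stored as a construction list for future use.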
Abstract:
Disclosed herein are systems, methods, and computer-readable media for rich media annotation, the method comprising receiving a first recorded media content, receiving at least one audio annotation about the first recorded media, extracting metadata from the at least one audio annotation, and associating all or part of the metadata with the first recorded media content. Additional data elements may also be associated with the first recorded media content. Where the audio annotation is a telephone conversation, the recorded media content may be captured via the telephone. The recorded media content, audio annotations, and/or metadata may be stored in a central repository which may be modifiable. Speech characteristics such as prosody may be analyzed to extract additional metadata. In one aspect, a specially trained grammar identifies and recognizes metadata.
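The extract-and-associate step can be sketched simply. A real system would apply a trained grammar and prosody analysis to the audio; here, plain pattern matching over the annotation's recognized text stands in, and the patterns chosen are illustrative.

```python
# Sketch: extract simple metadata (people, dates) from an audio annotation's
# recognized text and associate it with the recorded media content.
# The regex patterns are illustrative stand-ins for a trained grammar.

import re

def extract_metadata(annotation_text):
    """Pull names after 'with' and ISO-style dates as metadata."""
    meta = {}
    people = re.findall(r"with (\w+)", annotation_text)
    if people:
        meta["people"] = people
    dates = re.findall(r"\d{4}-\d{2}-\d{2}", annotation_text)
    if dates:
        meta["dates"] = dates
    return meta

def annotate(media_record, annotation_text):
    """Associate extracted metadata with the recorded media content."""
    media_record.setdefault("metadata", {}).update(
        extract_metadata(annotation_text))
    return media_record

record = {"id": "clip-7"}
note = "Lunch with Maria on 2008-06-15"
annotated = annotate(record, note)
```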