Abstract:
A videoconference apparatus and method coordinate a stationary view obtained with a stationary camera to an adjustable view obtained with an adjustable camera. The stationary camera can be a web camera, while the adjustable camera can be a pan-tilt-zoom camera. As the stationary camera obtains video, participants are detected and localized by establishing a static perimeter around a participant in which no motion is detected. Thereafter, if no motion is detected in the perimeter, any personage objects such as head, face, or shoulders which are detected in the region bounded by the perimeter are determined to correspond to the participant.
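The static-perimeter test described above can be sketched as follows. This is only an illustration under stated assumptions: the boolean motion grid, the rectangular box, and the function names `perimeter_is_static` and `objects_for_participant` are hypothetical and do not come from the abstract.

```python
def perimeter_is_static(motion, top, left, bottom, right):
    """Return True if no cell on the box border shows motion.

    `motion` is a 2D list of booleans (True = motion detected); the box
    coordinates are inclusive. Both are assumptions for illustration.
    """
    for x in range(left, right + 1):
        if motion[top][x] or motion[bottom][x]:
            return False
    for y in range(top, bottom + 1):
        if motion[y][left] or motion[y][right]:
            return False
    return True


def objects_for_participant(motion, box, detections):
    """Attribute detections strictly inside a static perimeter to one participant.

    `detections` is a list of (row, col) centers of detected heads, faces,
    or shoulders. If the perimeter shows motion, nothing is attributed.
    """
    top, left, bottom, right = box
    if not perimeter_is_static(motion, top, left, bottom, right):
        return []
    return [(y, x) for (y, x) in detections
            if top < y < bottom and left < x < right]
```

Motion inside the box does not affect the result; only motion on the border does, which mirrors the idea that a participant may move freely within an undisturbed perimeter.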
Abstract:
A videoconferencing apparatus automatically tracks speakers in a room and dynamically switches between a controlled, people-view camera and a fixed, room-view camera. When no one is speaking, the apparatus shows the room view to the far end. When there is a dominant speaker in the room, the apparatus directs the people-view camera at the dominant speaker and switches from the room-view camera to the people-view camera. When there is a new speaker in the room, the apparatus switches to the room-view camera first, directs the people-view camera at the new speaker, and then switches to the people-view camera directed at the new speaker. When there are two near-end speakers engaged in a conversation, the apparatus steers and zooms the people-view camera so that both speakers are in view.
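The switching rules above can be read as a small state machine. The sketch below is a simplification, not the apparatus's actual control logic: the function name `next_view`, the string labels, and the assumption that the people-view camera finishes moving within one step are all hypothetical. The two-speaker conversation case is not modeled.

```python
def next_view(speaker, people_cam_target):
    """Return (view_to_send, new_people_cam_target) for one update step.

    `speaker` is the current dominant speaker (None if nobody is talking);
    `people_cam_target` is whoever the people-view camera currently frames.
    """
    if speaker is None:
        # Nobody is speaking: show the room view.
        return ("room", people_cam_target)
    if people_cam_target == speaker:
        # The camera already frames the speaker: show the people view.
        return ("people", people_cam_target)
    # New speaker: show the room view while the camera is redirected.
    return ("room", speaker)
```

Calling the function repeatedly with the returned target reproduces the room-first hand-off: a new speaker yields the room view on one step and the people view on the next.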
Abstract:
Methods and systems for cancellation of table noise in a speaker system used for video or audio conferencing are disclosed. Table noise is cancelled by using a vertical microphone array to distinguish the tilt angle of sound received by a microphone. If the sound is close to horizontal, the audio is muted. If the sound is above a given angle from horizontal, it is not muted, as this indicates a person speaking. This eliminates paper rustling, keyboard clicks and the like.
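A minimal sketch of the tilt-angle test, assuming a two-microphone vertical array and a far-field source, so that the arrival-time difference between the upper and lower microphones gives the elevation via sin(angle) = c * delay / spacing. The spacing, the 15-degree threshold, and the function names are illustrative assumptions, not values from the abstract.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in room-temperature air


def tilt_angle_deg(delay_s, spacing_m):
    """Elevation above horizontal, from the inter-microphone delay.

    A positive `delay_s` means the sound reached the upper microphone first.
    """
    ratio = SPEED_OF_SOUND * delay_s / spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against measurement noise
    return math.degrees(math.asin(ratio))


def should_mute(delay_s, spacing_m=0.05, threshold_deg=15.0):
    """Mute sound arriving near (or below) the horizontal plane."""
    return tilt_angle_deg(delay_s, spacing_m) < threshold_deg
```

With these assumed defaults, a zero delay (sound travelling along the table surface) is muted, while a delay corresponding to a 30-degree elevation is passed through.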
Abstract:
A videoconference apparatus at a first location detects audio from a location and determines whether the sound should be included in an audio-video stream sent to a second location, or excluded as an interfering noise. Determining whether to include the audio involves using a face detector to see if there is a face at the source of the sound. If a face is present, the audio data from the location will be transmitted to the second location. If a face is not present, additional motion checks are performed to determine whether the sound corresponds to a person talking (such as a presenter at a meeting) or whether the sound is instead unwanted noise.
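The decision order above can be sketched as a two-stage gate. The abstract does not specify what the motion checks are, so this sketch collapses them into a single boolean; the function name `should_transmit` and both parameters are hypothetical.

```python
def should_transmit(face_at_source, motion_checks_pass):
    """Decide whether audio from a sound source is sent to the far end.

    `face_at_source`: result of a face detector at the sound's location.
    `motion_checks_pass`: combined result of the fallback motion checks
    (an assumed simplification of whatever checks the apparatus performs).
    """
    if face_at_source:
        # A face at the source is sufficient on its own.
        return True
    # No face: fall back to motion, e.g. a presenter facing away.
    return motion_checks_pass
```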
Abstract:
A system ensures that the best available view of a person's face is included in a video stream when the person's face is being captured by multiple cameras at multiple angles at a first endpoint. The system uses one or more microphone arrays to capture direct-reverberant ratio information corresponding to the views, and determines which view most closely matches a view of the person looking directly at the camera, thereby improving the experience for viewers at a second endpoint.
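One way to picture the selection step is to compute a direct-to-reverberant ratio per view and pick the largest. This rests on an assumption the abstract does not state explicitly: that the view whose microphone array measures the highest ratio is the one the person is facing. The names `drr_db` and `best_view` and the energy inputs are hypothetical.

```python
import math


def drr_db(direct_energy, reverberant_energy):
    """Direct-to-reverberant ratio in decibels (both energies must be > 0)."""
    return 10.0 * math.log10(direct_energy / reverberant_energy)


def best_view(energies_by_view):
    """Pick the view with the highest direct-to-reverberant ratio.

    `energies_by_view` maps a view name to a (direct, reverberant) pair.
    """
    return max(energies_by_view, key=lambda v: drr_db(*energies_by_view[v]))
```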
Abstract:
A videoconferencing endpoint includes at least one processor, a number of microphones, and at least one camera. The endpoint can receive audio information and visual motion information during a teleconferencing session. The audio information includes one or more angles with respect to the microphones from a location of a teleconferencing session. The audio information is evaluated automatically to determine at least one candidate angle corresponding to a possible location of an active talker. The candidate angle can be analyzed further with respect to the motion information to determine whether the candidate angle correctly corresponds to a person who is speaking during the teleconferencing session.
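The cross-check between audio and motion can be sketched as a filter over candidate angles. This is an illustrative reading only: the tolerance value, the use of a single pan angle in degrees without wraparound, and the function name `confirm_talker_angles` are assumptions.

```python
def confirm_talker_angles(candidate_angles_deg, motion_angles_deg,
                          tolerance_deg=10.0):
    """Keep audio-derived candidate angles that coincide with visual motion.

    A candidate is kept if any detected motion lies within `tolerance_deg`
    of it. Angles are plain pan angles (e.g. -90 to 90), so no wraparound
    handling is attempted.
    """
    return [
        angle for angle in candidate_angles_deg
        if any(abs(angle - m) <= tolerance_deg for m in motion_angles_deg)
    ]
```

A candidate produced by a reflection or a noise source with no nearby motion is dropped, while one that lines up with a moving person is retained.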
Abstract:
A videoconferencing system has a plurality of displays arranged side-by-side. Top loudspeakers are arranged adjacent the tops of the displays, and bottom loudspeakers are arranged adjacent the bottoms of the displays. A control unit operatively coupled to the displays and the loudspeakers routes video to each of the displays and routes audio corresponding to each display to any of the top and bottom loudspeakers arranged adjacent the display. Thus, the top and bottom loudspeakers form a vertical pair of loudspeakers that outputs the corresponding audio for its respective display. In this way, the audio for the video of a given display is perceived by participants to originate from the center of the given display. If one of the loudspeakers is not provided, gain setting and mixing between adjacent sets of loudspeakers can produce a virtual loudspeaker for the one that is missing.
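The abstract does not say how the gains for a virtual loudspeaker are chosen. As one plausible illustration, the sketch below uses constant-power panning between two adjacent loudspeakers, a standard audio technique swapped in here for demonstration; the function name `pan_gains` and the 0-to-1 position parameter are hypothetical.

```python
import math


def pan_gains(position):
    """Constant-power gains for two adjacent loudspeakers.

    `position` runs from 0.0 (entirely at the first loudspeaker) to 1.0
    (entirely at the second). The squared gains always sum to 1, so the
    perceived loudness stays roughly constant as the virtual source moves.
    """
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)
```

A position of 0.5 drives both loudspeakers equally, placing the phantom source midway between them, which is where a missing loudspeaker between two neighbors would sit.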