摘要:
Multi-modal, multi-lingual devices can be employed to consolidate numerous items including, but not limited to, keys, remote controls, image capture devices, audio recorders, cellular telephone functionalities, location/direction detectors, health monitors, calendars, gaming devices, smart home inputs, pens, optical pointing devices or the like. For example, a corner of a cellular telephone can be used as an electronic pen. Moreover, the device can be used to snap multiple pictures stitching them together to create a panoramic image. A device can automate ignition of an automobile, initiate appliances, etc. based upon relative distance. The device can provide for near to eye capabilities for enhanced image viewing. Multiple cameras/sensors can be provided on a single device to provide for stereoscopic capabilities. The device can also provide assistance to blind, privacy, etc. by consolidating services.
摘要:
A system facilitates managing one or more devices utilized for communicating data within a telepresence session. A telepresence session can be initiated within a communication framework that includes a first user and one or more second users. In response to determining a temporary absence of the first user from the telepresence session, a recordation of the telepresence session is initialized to enable a playback of a portion or a summary of the telepresence session that the first user has missed.
摘要:
Described is a technology by which video, which may be relatively high-resolution video, is efficiently processed to determine whether the video contains a specified action. The video corresponds to a spatial-temporal volume. The volume is searched with a top-k search that finds a plurality of the most likely sub-volumes simultaneously in a single search round. The score volumes of larger spatial resolution videos may be down-sampled into lower-resolution score volumes prior to searching.
摘要:
In some implementations, invisible light is emitted toward a subject being imaged in a low-light environment. A camera having a first color image sensor captures an image of the subject. Image processing is used to correct distortion in the image caused by the invisible light, and an augmented color image is output.
摘要:
Techniques and technologies for tracking a face with a plurality of cameras wherein a geometry between the cameras is initially unknown. One disclosed method includes detecting a head with two of the cameras and registering a head model with the image of the head (as detected by one of the cameras). The method also includes back projecting the other detected face image to the head model and determining a head pose from the back-projected head image. Furthermore, the determined geometry is used to track the face with at least one of the cameras.
摘要:
Described is labeling a video session with metadata representing a recognized person or object, such as to identify a person corresponding to a recognized face when that face is being shown during the video session. The identification may be made by overlaying text on the video session, e.g., the person's name and/or other related information. Facial recognition and/or other (e.g., voice) recognition may be used to identify a person. The facial recognition process may be made more efficient by using known narrowing information, such as calendar information that indicates who the invitees are to a meeting that is being shown in the video session.
摘要:
Gaze tracking or other interest indications are used during a video conference to determine one or more audio sources that are of interest to one or more participants to the video conference, such as by determining a conversation from among multiple conversations that a subset of participants are participating in or listening to, for enhancing the audio experience of one or more of the participants.
摘要:
The computer-readable media provides improved procedures to estimate head motion between two images of a face. Locations of a number of distinct facial features are determined in two images. The locations are converted into as a set of physical face parameters based on the symmetry of the identified distinct facial features. An estimation objective function is determined by: (a) estimating each of the set of physical parameters, (b) estimating a first head pose transform corresponding to the first image, and (c) estimating a second head pose transform corresponding to the second image. The motion is estimated between the two images based on the set of physical face parameters by multiplying each term of the estimation objective function by a weighted contribution factor based on the confidence of data corresponding to the estimation objective function.
摘要:
A system and method for automatically determining if a remote client is a human or a computer. A set of HIP design guidelines which are important to ensure the security and usability of a HIP system are described. Furthermore, one embodiment of this new HIP system and method is based on human face and facial feature detection. Because human face is the most familiar object to all human users the embodiment of the invention employing a face is possibly the most universal HIP system so far.
摘要:
A system that facilitates blind source separation in a distributed microphone meeting environment for improved teleconferencing. Input sensor (e.g., microphone) signals are transformed to the frequency-domain and independent component analysis is applied to compute estimates of frequency-domain processing matrices. Modified permutations of the processing matrices are obtained based upon a maximum magnitude based de-permutation scheme. Estimates of the plurality of source signals are provided based upon the modified frequency-domain processing matrices and input sensor signals.Optionally, segments during which the set of active sources is a subset of the set of all sources can be exploited to compute more accurate estimates of frequency-domain mixing matrices. Source activity detection can be applied to determine which speaker(s), if any, are active. Thereafter, a least squares post-processing of the frequency-domain independent components analysis outputs can be employed to adjust the estimates of the source signals based on source inactivity.