Abstract:
A system and method for multimodal classification of user characteristics are described. The method comprises receiving audio and other inputs, extracting fundamental frequency information from the audio input, extracting other feature information from a video input, and classifying the fundamental frequency information, textual information, and video feature information using a multimodal neural network.
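The fundamental frequency extraction step can be illustrated with a minimal sketch. The abstract does not say how the fundamental frequency is obtained, so the autocorrelation peak picking below is an assumption, as are the function name `estimate_f0` and the 50 to 500 Hz search range.

```python
import math

def estimate_f0(samples, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate a fundamental frequency in Hz from the autocorrelation peak.

    Illustrative assumption: the abstract does not name an extraction method.
    """
    lag_min = int(sample_rate / f_max)   # shortest period considered
    lag_max = int(sample_rate / f_min)   # longest period considered
    best_lag, best_score = None, float("-inf")
    for lag in range(lag_min, min(lag_max, len(samples) - 1) + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        raise ValueError("signal too short for the requested frequency range")
    return sample_rate / best_lag

# Synthetic test tone: 0.1 s of a 200 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(800)]
f0 = estimate_f0(tone, sample_rate)
```

In a pipeline like the one described, a value such as `f0` would be one of several features passed to the classifier alongside the textual and video features.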
Abstract:
A system and method for conditioning execution of a control function on a determination of whether or not a person's attention is directed toward a predetermined device. The method involves acquiring data concerning the activity of a person who is in the proximity of the device, the data being in the form of one or more temporal samples. One or more of the temporal samples is then analyzed to determine if the person's activity during the time of the analyzed samples indicates that the person's attention is not directed toward the device. The results of the determination are used to ascertain whether or not the control function should be performed.
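The gating logic described above might be sketched as follows. The abstract does not specify what the temporal samples contain or how they are analyzed, so the `Sample` type, the gaze-offset field, both thresholds, and the `run_when_away` flag are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    timestamp: float         # seconds since acquisition started
    gaze_offset_deg: float   # hypothetical: angle between gaze and the device

def attention_is_away(samples, max_offset_deg=20.0, min_away_fraction=0.5):
    """Return True if the analyzed samples suggest attention is not on the device."""
    if not samples:
        return False  # assumption: no samples means no evidence of inattention
    away = sum(1 for s in samples if s.gaze_offset_deg > max_offset_deg)
    return away / len(samples) >= min_away_fraction

def gate(control_function, samples, run_when_away=False):
    """Run control_function only if the attention determination allows it."""
    if attention_is_away(samples) == run_when_away:
        return control_function()
    return None

looking = [Sample(0.0, 5.0), Sample(0.1, 8.0)]
looking_away = [Sample(0.0, 45.0), Sample(0.1, 60.0)]
result_looking = gate(lambda: "done", looking)
result_away = gate(lambda: "done", looking_away)
```

The `run_when_away` flag covers both polarities, since the abstract leaves open whether the control function is performed when attention is directed toward the device or when it is not.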
Abstract:
The present disclosure is related to unmanned aerial vehicles or drones that have a capability of quickly swapping batteries. This may be accomplished even as the drone continues to fly. A drone consistent with the present disclosure may drop one battery and pick up another using an attachment mechanism. Attachment mechanisms of the present disclosure may include electro-magnets, mechanical actuators, pins, or hooks. Systems consistent with the present disclosure may also include locations where replacement batteries may be provided to aircraft via actuation devices coupled to a physical location.
Abstract:
A method for providing an image of an HMD user to a non-HMD user includes receiving a first image of a user, including the user's facial features, captured by an external camera when the user is not wearing a head mounted display (HMD). A second image capturing a portion of the facial features of the user when the user is wearing the HMD is received. Image overlay data is generated by mapping contours of facial features captured in the second image with contours of corresponding facial features captured in the first image. The image overlay data is forwarded to the HMD for rendering on a second display screen that is mounted on a front face of the HMD.
Abstract:
A virtual object can be controlled using one or more touch interfaces. A location for a first touch input can be determined on a first touch interface. A location for a second touch input can be determined on a second touch interface. A three-dimensional segment can be generated using the location of the first touch input, the location of the second touch input, and a pre-determined spatial relationship between the first touch interface and the second touch interface. The virtual object can be manipulated using the three-dimensional segment as a control input.
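The segment construction can be sketched under one assumed spatial relationship: two parallel touch interfaces (for example, front and back surfaces) that share x and y axes and are separated by a known distance. The abstract only requires that the relationship be pre-determined, so this geometry and the names `touch_segment` and `segment_direction` are illustrative.

```python
import math

def touch_segment(first_touch, second_touch, separation):
    """Build a 3D segment from two 2D touch locations.

    Assumption: the first interface lies in the plane z = 0 and the second
    in the plane z = -separation, with aligned x and y axes.
    """
    x1, y1 = first_touch
    x2, y2 = second_touch
    return (x1, y1, 0.0), (x2, y2, -float(separation))

def segment_direction(segment):
    """Unit vector from the first endpoint of the segment to the second."""
    (x1, y1, z1), (x2, y2, z2) = segment
    delta = (x2 - x1, y2 - y1, z2 - z1)
    length = math.sqrt(sum(c * c for c in delta))
    if length == 0.0:
        raise ValueError("segment endpoints coincide")
    return tuple(c / length for c in delta)

segment = touch_segment((0.0, 0.0), (3.0, 0.0), separation=4.0)
direction = segment_direction(segment)
```

A direction vector like this could serve as the control input, for instance by tilting the virtual object to follow the segment as the two touch points move relative to each other.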
Abstract:
A domain adaptation module is used to optimize a first domain derived from a second domain using respective outputs from respective parallel hidden layers of the domains.
Abstract:
A system, method, and computer program product for hierarchical categorization of sound comprise one or more neural networks implemented on one or more processors. The one or more neural networks are configured to categorize a sound into a coarse categorization of two or more hierarchical tiers and a finest-level categorization in the hierarchy. The sound categorization may be used to search a database for similar or contextually related sounds.
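The shape of the output, two coarse tiers plus a finest-level label, can be illustrated without a neural network. The sketch below assumes some model has already produced finest-level scores and walks a hypothetical taxonomy from the top down; the labels, the `TAXONOMY` table, and the `categorize` function are invented for illustration and are not taken from the disclosure.

```python
from collections import defaultdict

# Hypothetical taxonomy: finest-level label -> (tier-1 label, tier-2 label).
TAXONOMY = {
    "dog_bark": ("animal", "domestic"),
    "cat_meow": ("animal", "domestic"),
    "wolf_howl": ("animal", "wild"),
    "car_horn": ("vehicle", "road"),
}

def categorize(fine_scores, taxonomy=TAXONOMY):
    """Pick a coarse label at each tier, then the best fine label beneath it."""
    candidates = dict(fine_scores)
    tier_count = len(next(iter(taxonomy.values())))
    coarse = []
    for depth in range(tier_count):
        # Sum the fine-level scores that fall under each label of this tier.
        totals = defaultdict(float)
        for label, score in candidates.items():
            totals[taxonomy[label][depth]] += score
        best = max(totals, key=totals.get)
        coarse.append(best)
        # Keep only the fine labels under the chosen coarse label.
        candidates = {label: score for label, score in candidates.items()
                      if taxonomy[label][depth] == best}
    finest = max(candidates, key=candidates.get)
    return tuple(coarse), finest

scores = {"dog_bark": 0.4, "cat_meow": 0.1, "wolf_howl": 0.2, "car_horn": 0.3}
coarse_labels, finest_label = categorize(scores)
```

Choosing top-down keeps the finest label consistent with the coarse tiers above it, and the resulting tuple of labels could act as a key when searching a database for related sounds.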
Abstract:
In sequence-level prediction of a sequence of frames of high-dimensional data, one or more affective labels are provided at the end of the sequence. Each label pertains to the entire sequence of frames. An action is taken with an agent controlled by a machine learning algorithm for a current frame of the sequence at a current time step. An output of the action represents an affective label prediction for the frame at the current time step. A pool of actions taken up until the current time step, including the action taken with the agent, is transformed into a predicted affective history for a subsequent time step. A reward is generated on predicted actions up to the current time step by comparing the predicted actions against corresponding annotated affective labels.
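One possible reading of the history and reward steps is sketched below. The abstract does not define the transformation or the reward function, so the majority vote in `predicted_history`, the +1/-1 reward in `step_rewards`, and the example labels are all assumptions.

```python
from collections import Counter

def predicted_history(actions):
    """Summarize the pool of actions so far as a single affective label.

    Assumption: a majority vote, with ties going to the label seen first.
    """
    counts = Counter(actions)
    return max(counts, key=counts.get)

def step_rewards(actions, sequence_label):
    """Reward each time step by comparing the history so far with the label.

    Assumption: +1.0 when the predicted history matches the annotated
    sequence-level label, -1.0 otherwise.
    """
    rewards = []
    for t in range(1, len(actions) + 1):
        history = predicted_history(actions[:t])
        rewards.append(1.0 if history == sequence_label else -1.0)
    return rewards

# Per-frame predictions from a hypothetical agent; the label covers the whole sequence.
actions = ["neutral", "happy", "happy"]
rewards = step_rewards(actions, sequence_label="happy")
```

This turns a single end-of-sequence label into a per-step signal, which is the kind of reward a reinforcement learning agent needs to update after each frame.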
Abstract:
Methods and systems for providing real-world assistance by a robot utility and interface device (RUID) are provided. A method provides for identifying a position of a user in a physical environment and a surface within the physical environment for projecting an interactive interface. The method also provides for moving to a location within the physical environment based on the position of the user and the surface for projecting the interactive interface. Moreover, the method provides for capturing a plurality of images of the interactive interface while the interactive interface is being interacted with by the user and for determining a selection of an input option made by the user.
Abstract:
For image captioning such as for computer game images or other images, bottom-up attention is combined with top-down attention to provide a multi-level residual attention-based image captioning model. A residual attention mechanism is first applied in the Faster R-CNN network to learn better feature representations for each region by taking spatial information into consideration. In the image captioning network, taking the extracted regional features as input, a second residual attention network is implemented to fuse the regional features attentionally for subsequent caption generation.