Abstract:
Systems and methods for bird's eye view (BEV) segmentation are provided. In one embodiment, a method includes receiving an input image from an image sensor on an agent. The input image is a perspective space image defined relative to the position and viewing direction of the agent. The method includes extracting features from the input image. The method includes estimating a depth map that includes depth values for the pixels of the input image. The method includes generating a 3D point map including points corresponding to the pixels of the input image. The method includes generating a voxel grid by voxelizing the 3D point map into a plurality of voxels. The method includes generating a feature map by extracting feature vectors for the pixels based on the points included in the voxels, and generating a BEV segmentation based on the feature map.
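As a rough illustration of the lift-and-voxelize steps this abstract describes, the following NumPy sketch unprojects pixels with their estimated depths into a 3D point map and mean-pools per-pixel feature vectors into a voxel grid. The camera intrinsics (fx, fy, cx, cy), the grid origin and size, and the mean-pooling choice are illustrative assumptions, not the claimed method.

```python
import numpy as np

def unproject(depth, fx, fy, cx, cy):
    """Lift each pixel (u, v) with depth d to a 3D point (x, y, z)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)  # (H*W, 3) point map

def voxelize_mean(points, feats, origin, voxel_size=0.5, grid=(64, 64, 8)):
    """Scatter per-pixel feature vectors into a voxel grid by mean pooling."""
    idx = np.clip(((points - origin) / voxel_size).astype(int),
                  0, np.array(grid) - 1)
    vox = np.zeros(grid + (feats.shape[-1],))
    cnt = np.zeros(grid)
    for (i, j, k), f in zip(idx, feats):
        vox[i, j, k] += f
        cnt[i, j, k] += 1
    return vox / np.maximum(cnt[..., None], 1)  # collapse height axis downstream for BEV
```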
Abstract:
A system and method for providing unsupervised domain adaptation for spatio-temporal action localization that include receiving video data associated with a source domain and a target domain that are associated with a surrounding environment of a vehicle. The system and method also include analyzing the video data associated with the source domain and the target domain and determining a key frame of the source domain and a key frame of the target domain. The system and method additionally include completing an action localization model to model a temporal context of actions occurring within the key frame of the source domain and the key frame of the target domain and completing an action adaptation model to localize individuals and their actions and to classify the actions based on the video data. The system and method further include combining losses to complete spatio-temporal action localization of individuals and actions.
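The final step, combining losses from the action localization and action adaptation models, might look schematically like the following sketch; the individual loss terms and the lambda weights are assumptions for illustration, not the patented formulation.

```python
def total_loss(action_cls_loss, temporal_ctx_loss, domain_adv_loss,
               lam_ctx=1.0, lam_adv=0.1):
    """Combine per-model losses for joint spatio-temporal action localization.

    lam_adv scales an adversarial source-vs-target domain loss; lam_ctx scales
    the temporal-context modeling loss. Both weights are illustrative.
    """
    return action_cls_loss + lam_ctx * temporal_ctx_loss + lam_adv * domain_adv_loss
```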
Abstract:
A system and method for egocentric-vision based future vehicle localization that include receiving at least one egocentric first person view image of a surrounding environment of a vehicle. The system and method also include encoding at least one past bounding box trajectory associated with at least one traffic participant that is captured within the at least one egocentric first person view image and encoding a dense optical flow of the egocentric first person view image associated with the at least one traffic participant. The system and method further include decoding at least one future bounding box associated with the at least one traffic participant based on a final hidden state of the at least one past bounding box trajectory encoding and a final hidden state of the dense optical flow encoding.
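A minimal PyTorch sketch of the two-stream encode/decode structure the abstract describes: a past-box encoder and an optical-flow encoder whose final hidden states condition a future-box decoder. All module names, input sizes, and the choice of GRUs are assumptions for illustration, not the patented architecture.

```python
import torch
import torch.nn as nn

class FutureBoxPredictor(nn.Module):
    def __init__(self, hidden=128, horizon=10):
        super().__init__()
        self.box_enc = nn.GRU(input_size=4, hidden_size=hidden, batch_first=True)
        self.flow_enc = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        self.dec = nn.GRU(input_size=2 * hidden, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 4)  # one future box (x, y, w, h) per step
        self.horizon = horizon

    def forward(self, past_boxes, flow):
        # past_boxes: (B, T, 4) past bounding boxes; flow: (B, T, 2) pooled flow
        _, h_box = self.box_enc(past_boxes)   # final hidden state, box stream
        _, h_flow = self.flow_enc(flow)       # final hidden state, flow stream
        ctx = torch.cat([h_box, h_flow], dim=-1)            # (1, B, 2*hidden)
        steps = ctx.transpose(0, 1).repeat(1, self.horizon, 1)
        out, _ = self.dec(steps)
        return self.head(out)                 # (B, horizon, 4) future boxes
```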
Abstract:
Systems and techniques for scene classification and prediction are provided herein. A first series of image frames of an environment may be captured from a moving vehicle. Traffic participants within the environment may be identified and masked based on a first convolutional neural network (CNN). Temporal classification may be performed to generate a series of image frames associated with temporal predictions based on a scene classification model that uses CNNs and a long short-term memory (LSTM) network. Additionally, scene classification may occur based on global average pooling. Feature vectors may be generated based on different series of image frames, and a fusion feature vector may be obtained by performing data fusion based on a first feature vector, a second feature vector, a third feature vector, etc. In this way, a behavior predictor may generate a predicted driver behavior based on the fusion feature vector.
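The temporal-classification and fusion steps might be sketched as follows in PyTorch; the feature dimensions, the concatenation-based data fusion, and the module names are illustrative assumptions rather than the claimed design.

```python
import torch
import torch.nn as nn

class TemporalSceneHead(nn.Module):
    """LSTM over per-frame CNN features, then global average pooling."""
    def __init__(self, feat_dim=512, hidden=256, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, n_classes)

    def forward(self, frame_feats):        # (B, T, feat_dim) CNN features
        seq, _ = self.lstm(frame_feats)    # per-frame temporal predictions
        return self.cls(seq.mean(dim=1))   # global average pooling over time

def fuse(vectors):
    """Data fusion: concatenate feature vectors from different frame series."""
    return torch.cat(vectors, dim=-1)

# e.g. fused = fuse([f_raw, f_masked, f_temporal]); fed to a behavior predictor
```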
Abstract:
A driver assistance system takes as input a number of different types of vehicle environment inputs including positions of objects in the vehicle's environment. The system identifies possible outcomes that may occur as a result of the positions of the objects in the environment. The possible outcomes include predicted positions for the objects involved in each outcome. The system uses the inputs to determine a likelihood of occurrence of each of the possible outcomes. The system also uses the inputs to determine a current risk value for objects as well as predicted risk values for objects for the possible outcomes. A total risk value can be determined by aggregating the current and predicted risk values of an object weighted by the likelihood of occurrence. Total risk values for objects can be used to determine how the driver assistance system responds to the inputs.
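The aggregation of current and predicted risk can be shown with a small worked sketch. Reading the abstract as weighting each outcome's predicted risk by that outcome's likelihood of occurrence is an interpretation, and all names below are illustrative.

```python
def total_risk(current_risk, outcomes):
    """outcomes: list of (likelihood, predicted_risk) pairs for one object.

    Total risk aggregates the object's current risk with its predicted risks,
    each predicted risk weighted by the likelihood its outcome occurs.
    """
    return current_risk + sum(p * r for p, r in outcomes)

# e.g. total_risk(0.2, [(0.7, 0.1), (0.3, 0.9)])
#      = 0.2 + 0.07 + 0.27 = approximately 0.54
```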
Abstract:
A computer-implemented method for interactive vehicle package design includes receiving vehicle occupant package design model data including a task to be executed and receiving parameters defining a virtual human subject for executing the task, wherein the virtual human subject includes a plurality of degrees of freedom. The method includes determining a plurality of motion descriptors of the virtual human subject, including determining a manipulation over time of the degrees of freedom of the virtual human subject during accomplishment of the task, and determining one or more performance metrics based on the motion descriptors. The method includes generating a visual representation of the vehicle occupant package design model, the virtual human subject, and the task to be executed based on at least one of the motion descriptors and the one or more performance metrics for output on a display.
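One way to picture the motion descriptors and a derived performance metric is the following sketch; the data fields and the joint-excursion metric are assumptions, since the abstract does not name specific descriptors or metrics.

```python
from dataclasses import dataclass

@dataclass
class MotionDescriptor:
    t: float            # time during accomplishment of the task
    joint_angles: dict  # degree-of-freedom name -> angle at time t

def joint_excursion(descriptors, joint):
    """One possible performance metric: total angular travel of one joint."""
    angles = [d.joint_angles[joint] for d in descriptors]
    return sum(abs(b - a) for a, b in zip(angles, angles[1:]))
```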
Abstract:
A system and method for providing social-stage spatio-temporal multi-modal future forecasting that include receiving environment data associated with a surrounding environment of an ego vehicle and implementing graph convolutions to obtain attention weights that are respectively associated with agents that are located within the surrounding environment. The system and method also include decoding multi-modal trajectories and probabilities for each of the agents. The system and method further include controlling at least one vehicle system of the ego vehicle based on the predicted trajectories associated with each of the agents and rankings of the probabilities associated with each of the predicted trajectories.
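A minimal sketch of one graph-convolution step yielding a per-agent attention weight, in the spirit of the abstract; the adjacency matrix, feature sizes, and the dot-product scoring are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def graph_attention(agent_feats, adjacency, weight):
    """agent_feats: (N, D) per-agent features; adjacency: (N, N); weight: (D, D)."""
    msg = adjacency @ agent_feats @ weight                 # aggregate neighbor features
    attn = F.softmax((agent_feats * msg).sum(-1), dim=0)   # one weight per agent
    return attn  # attention weights respectively associated with the N agents
```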
Abstract:
According to one aspect, composite-field-based single-shot trajectory prediction may include receiving an image of an environment including a number of agents, extracting a set of features from the image, receiving the image of the environment, encoding a set of trajectories from the image, concatenating the set of features and the set of trajectories from the image to generate an interaction module input, receiving the interaction module input, encoding a set of interactions between the number of agents and between the number of agents and the environment, concatenating the set of interactions and a localization composite field map to generate a decoder input, receiving the decoder input, generating the localization composite field map and an association composite field map, and generating a set of trajectory predictions for the number of agents based on the localization composite field map and the association composite field map.
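The localization composite field map can be pictured as a per-cell confidence plus an offset vector toward a predicted agent location. The following sketch decodes such a field under that assumption; the field layout, stride, and threshold are illustrative choices, not the claimed representation.

```python
import numpy as np

def decode_localization_field(field, stride=8, thresh=0.5):
    """field: (3, H, W) storing [confidence, dx, dy] per cell.

    Returns predicted agent locations in image coordinates plus confidences.
    """
    conf, dx, dy = field
    ys, xs = np.nonzero(conf > thresh)            # cells confident of an agent
    pts = np.stack([(xs + dx[ys, xs]) * stride,   # cell + offset, scaled to pixels
                    (ys + dy[ys, xs]) * stride], axis=-1)
    return pts, conf[ys, xs]
```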
Abstract:
Systems and methods for estimating velocity of an autonomous vehicle and state information of a surrounding vehicle are provided. In some aspects, the system includes a memory that stores instructions for executing processes for estimating velocity of an autonomous vehicle and state information of the surrounding vehicle and a processor configured to execute the instructions. In various aspects, the processes include: receiving image data from an image capturing device; performing a ground plane estimation by predicting a depth of points on a road surface based on an estimated pixel-level depth; determining a three-dimensional (3D) bounding box of the surrounding vehicle; determining the state information of the surrounding vehicle based on the ground plane estimation and the 3D bounding box; and determining the velocity of the autonomous vehicle based on an immovable object relative to the autonomous vehicle. In some aspects, an operation of the autonomous vehicle may be controlled based on at least one of the state information or the velocity of the autonomous vehicle.
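The ego-velocity step rests on a simple observation: an immovable object's apparent motion in the ego frame is the negative of the ego motion. A sketch under that assumption, with all names illustrative:

```python
import numpy as np

def ego_velocity(landmark_pos_t0, landmark_pos_t1, dt):
    """Estimate ego velocity from one immovable object's positions in the
    ego frame at two times; its apparent motion mirrors the ego motion."""
    return -(np.asarray(landmark_pos_t1) - np.asarray(landmark_pos_t0)) / dt

# e.g. a sign appearing 1.5 m further back after 0.1 s implies ~15 m/s forward speed
```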