Publication number: US11921824B1
Publication date: 2024-03-05
Application number: US17215209
Application date: 2021-03-29
Applicant: Amazon Technologies, Inc.
Inventors: Todd Hester, Sheng Chen, Mark Buckler, Ayan Tuhinendu Sinha, Hitesh Arora, Michael Lawrence LeKander, Hamed Pirsiavash
CPC classification numbers: G06F18/25, B25J9/1697, G06F18/2163, G06N3/045, G06N3/08, G06V20/10
Abstract: Techniques are generally described for fusing sensor data of different modalities using a transformer. In various examples, first sensor data may be received from a first sensor and second sensor data may be received from a second sensor. A first feature representation of the first sensor data may be generated using a first machine learning model and a second feature representation of the second sensor data may be generated using a second machine learning model. In some examples, a modified first feature representation of the first sensor data may be generated based at least in part on a self-attention mechanism of a transformer encoder. The modified first feature representation may be generated based at least in part on the first feature representation and the second feature representation. A computer vision task may be performed using the modified first feature representation.
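The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the two per-sensor models have already produced token-like feature vectors of equal width, and it uses a single attention head with random, untrained weights and a residual connection, leaving out the layer normalization and feed-forward sublayers of a full transformer encoder. The names fuse_with_self_attention, random_matrix, and the token counts are hypothetical and chosen only for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_with_self_attention(first_feats, second_feats, w_q, w_k, w_v):
    """Single-head self-attention over the tokens of both modalities (hypothetical helper).

    Returns the modified first-modality tokens and the full attention matrix.
    """
    n_first = len(first_feats)
    # Joint sequence: every token can attend to tokens of either sensor.
    tokens = first_feats + second_feats
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    attn = [softmax([s / scale for s in row]) for row in scores]
    mixed = matmul(attn, v)
    # Residual connection: original token plus its attention-weighted mix.
    fused = [[t + m for t, m in zip(tok, mix)] for tok, mix in zip(tokens, mixed)]
    # Keep only the rows for the first sensor; they now also depend on the second.
    return fused[:n_first], attn


def random_matrix(rows, cols, rng, scale=1.0):
    """Stand-in for learned weights or model outputs (assumption: Gaussian noise)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 8                                    # assumed shared feature width
first_feats = random_matrix(4, d, rng)   # e.g. 4 tokens from the first model
second_feats = random_matrix(3, d, rng)  # e.g. 3 tokens from the second model
w_q, w_k, w_v = (random_matrix(d, d, rng, scale=d ** -0.5) for _ in range(3))

modified_first, attn = fuse_with_self_attention(
    first_feats, second_feats, w_q, w_k, w_v
)
print(len(modified_first), len(modified_first[0]))  # 4 8
```

Because attention is computed over the joint sequence, each returned row mixes information from both sensors, which matches the abstract's statement that the modified first feature representation is based on both feature representations. A downstream computer vision head would then consume modified_first in place of the original first-sensor features.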