- Patent title: Transformer-based architecture for transform coding of media
- Application number: US17486732
- Filing date: 2021-09-27
- Publication number: US12120348B2
- Publication date: 2024-10-15
- Inventors: Yinhao Zhu, Yang Yang, Taco Sebastiaan Cohen
- Applicant: QUALCOMM Incorporated
- Applicant address: US CA San Diego
- Patentee: QUALCOMM INCORPORATED
- Current patentee: QUALCOMM INCORPORATED
- Current patentee address: US CA San Diego
- Agency: Polsinelli LLP
- Main classification: G06T9/00
- IPC classifications: G06T9/00; G06N3/0455; G06T7/11; G06V10/26; H04N1/387; H04N9/877; H04N13/00; H04N19/543; H04N19/60
Abstract:
Systems and techniques are described herein for processing media data using a neural network system. For instance, a process can include obtaining a latent representation of a frame of encoded image data and generating, by a plurality of decoder transformer layers of a decoder sub-network using the latent representation of the frame of encoded image data as input, a frame of decoded image data. At least one decoder transformer layer of the plurality of decoder transformer layers includes: one or more transformer blocks for generating one or more patches of features and determining self-attention locally within one or more window partitions and shifted window partitions applied over the one or more patches; and a patch un-merging engine for decreasing a respective size of each patch of the one or more patches.
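
The abstract names two data-layout steps, window partitioning (plain and shifted) and patch un-merging, without specifying them. The following is a minimal sketch of one plausible reading, not the patented implementation. It assumes that a shifted window partition is a cyclic roll of the patch grid followed by the same non-overlapping split, and that patch un-merging rearranges each feature vector of length 4C into a 2 x 2 block of patches with C features each. The function names `window_partition` and `patch_unmerge` are hypothetical, and the attention computation and any learned projections are omitted.

```python
def window_partition(grid, window, shift=0):
    """Split an H x W grid into non-overlapping window x window groups.

    Assumption: a non-zero shift cyclically rolls the grid up and to the
    left before splitting, which yields the shifted window partitions.
    """
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be divisible by the window size")
    rolled = [
        [grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
        for r in range(h)
    ]
    windows = []
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            # Flatten each window row by row; attention would run per window.
            windows.append([
                rolled[r][c]
                for r in range(r0, r0 + window)
                for c in range(c0, c0 + window)
            ])
    return windows


def patch_unmerge(grid):
    """Turn an H x W grid of 4C-length vectors into a 2H x 2W grid of C-length vectors.

    Assumption: this is a pure rearrangement; a real layer would likely
    pair it with a learned linear projection.
    """
    h, w = len(grid), len(grid[0])
    c = len(grid[0][0]) // 4
    out = [[None] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for col in range(w):
            feat = grid[r][col]
            if len(feat) != 4 * c:
                raise ValueError("feature length must be a multiple of 4")
            for k in range(4):
                dr, dc = divmod(k, 2)  # position inside the 2 x 2 block
                out[2 * r + dr][2 * col + dc] = feat[k * c:(k + 1) * c]
    return out


if __name__ == "__main__":
    patches = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(window_partition(patches, 2))           # regular windows
    print(window_partition(patches, 2, shift=1))  # shifted windows
    print(patch_unmerge([[[1, 2, 3, 4, 5, 6, 7, 8]]]))
```

In this sketch each un-merging step doubles the grid resolution while dividing the per-patch feature length by four, which matches the abstract's description of decreasing the size of each patch as the decoder progresses.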