Multi-Axis Vision Transformer
    Invention Application

    Publication No.: US20250022269A1

    Publication Date: 2025-01-16

    Application No.: US18902546

    Filing Date: 2024-09-30

    Applicant: Google LLC

    Abstract: Provided is an efficient and scalable attention model that can be referred to as multi-axis attention. Example implementations can include two aspects: blocked local and dilated global attention. These design choices allow global-local spatial interactions on arbitrary input resolutions with only linear complexity. The present disclosure also presents a new architectural element by effectively blending the proposed multi-axis attention model with convolutions. In addition, the present disclosure proposes a simple hierarchical vision backbone, example implementations of which can be referred to as MaxViT, by simply repeating the basic building block over multiple stages. Notably, MaxViT is able to “see” globally throughout the entire network, even in earlier, high-resolution stages.
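The blocked-local and dilated-global axes described above come down to two tensor partitions; attention itself is omitted here. A minimal numpy sketch, assuming a square feature map whose sides divide evenly by the window size `p` and grid size `g` (both hypothetical parameters for illustration):

```python
import numpy as np

def block_partition(x, p):
    """Split (H, W, C) features into non-overlapping p x p windows.

    Attention within each window gives the 'blocked local' axis.
    """
    H, W, C = x.shape
    x = x.reshape(H // p, p, W // p, p, C)
    # -> (num_windows, p*p, C): each slice along axis 0 is one local window
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, p * p, C)

def grid_partition(x, g):
    """Split (H, W, C) features into a g x g dilated grid.

    Each group gathers one pixel from every tile at stride H//g, so
    attention within a group spans the whole map ('dilated global' axis).
    """
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C)
    # -> (num_groups, g*g, C): axis 1 covers the full image sparsely
    return x.transpose(1, 3, 0, 2, 4).reshape(-1, g * g, C)
```

Because the window and grid sizes are fixed while the number of windows/groups grows with the input, attention cost stays linear in the number of pixels, which is the resolution-scalability claim in the abstract.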

    Compression-Informed Video Super-Resolution
    Invention Publication

    Publication No.: US20240022760A1

    Publication Date: 2024-01-18

    Application No.: US18256837

    Filing Date: 2021-08-05

    Applicant: Google LLC

    Abstract: Example aspects of the present disclosure are directed to systems and methods which feature a machine-learned video super-resolution (VSR) model which has been trained using a bi-directional training approach. In particular, the present disclosure provides a compression-informed (e.g., compression-aware) super-resolution model that can perform well on real-world videos with different levels of compression. Specifically, example models described herein can include three modules to robustly restore the missing information caused by video compression. First, a bi-directional recurrent module can be used to reduce the accumulated warping error from the random locations of the intra-frame from compressed video frames. Second, a detail-aware flow estimation module can be added to enable recovery of high resolution (HR) flow from compressed low resolution (LR) frames. Finally, a Laplacian enhancement module can add high-frequency information to the warped HR frames washed out by video encoding.
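The Laplacian enhancement step admits a simple illustration: subtract a blurred copy of the frame to isolate its high-frequency band, then add that band back with a gain. A hedged numpy sketch using a plain box blur in place of the patent's learned module (`box_blur` and `alpha` are illustrative, not from the disclosure):

```python
import numpy as np

def box_blur(img, k=3):
    """Simple k x k box filter used to split an image into low/high bands."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def laplacian_enhance(warped_hr, alpha=1.0):
    """Re-inject high-frequency detail washed out by video encoding.

    The Laplacian band (image minus its blur) approximates the detail
    lost to compression; scaling it by alpha and adding it back
    sharpens the warped HR frame.
    """
    high_freq = warped_hr - box_blur(warped_hr)
    return warped_hr + alpha * high_freq
```

On flat regions the band is zero and the frame passes through unchanged; around edges the band is signed, so edges are pushed apart (sharpened).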

    Machine Learning Models Featuring Resolution-Flexible Multi-Axis Attention Blocks

    Publication No.: US20250069382A1

    Publication Date: 2025-02-27

    Application No.: US18726881

    Filing Date: 2023-01-05

    Applicant: Google LLC

    Abstract: Provided are machine learning systems and models featuring resolution-flexible multi-axis attention blocks. In particular, the present disclosure provides example multi-axis MLP based architectures (example implementations of which can be generally referred to as MAXIM) that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. In some implementations, MAXIM can use a UNet-shaped hierarchical structure and supports long-range interactions enabled by spatially-gated MLPs. Specifically, some example implementations of MAXIM can contain two MLP-based building blocks: a multi-axis gated MLP that allows for efficient and scalable spatial mixing of local and global visual cues, and a cross-gating block, an alternative to cross-attention, which accounts for cross-feature mutual conditioning.
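The spatially-gated MLP idea can be sketched in isolation: split the channels in half, mix one half across the token axis, and use the result as a multiplicative gate on the other half. A minimal numpy sketch assuming a flat (tokens, channels) layout; `w_spatial` stands in for a learned token-mixing matrix and is purely illustrative:

```python
import numpy as np

def spatial_gating(x, w_spatial, b_spatial=1.0):
    """Spatial gating unit in the style of a gated MLP.

    x: (tokens, channels) with an even channel count. Half the channels
    (u) pass through; the other half (v) are mixed across the token
    axis and used as a multiplicative gate, giving long-range spatial
    interaction without attention.
    """
    u, v = np.split(x, 2, axis=-1)
    # Token mixing along the spatial axis; initializing w_spatial near
    # zero with bias 1 makes the unit start close to identity.
    v = w_spatial @ v + b_spatial
    return u * v
```

With `w_spatial = 0` the gate is the constant bias, so the unit reduces to a plain channel split — the near-identity initialization commonly used for gated MLPs.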

    Memory-Guided Video Object Detection

    Publication No.: US20220189170A1

    Publication Date: 2022-06-16

    Application No.: US17432221

    Filing Date: 2019-02-22

    Applicant: Google LLC

Abstract: Systems and methods for detecting objects in a video are provided. A method can include inputting a video comprising a plurality of frames into an interleaved object detection model comprising a plurality of feature extractor networks and a shared memory layer. For each of one or more frames, the method can include selecting one of the plurality of feature extractor networks to analyze the one or more frames, analyzing the one or more frames by the selected feature extractor network to determine one or more features of the one or more frames, determining an updated set of features based at least in part on the one or more features and one or more previously extracted features extracted from a previous frame stored in the shared memory layer, and detecting an object in the one or more frames based at least in part on the updated set of features.
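The interleaving-plus-shared-memory control flow can be sketched independently of any particular networks. A minimal Python sketch in which `heavy`, `light`, `period`, and the EMA-style fusion are illustrative stand-ins for the patent's feature extractor networks and memory layer:

```python
def run_interleaved_detector(frames, heavy, light, period=3, mix=0.5):
    """Interleave two feature extractors over a frame stream.

    A heavy (accurate) extractor runs every `period` frames; a light
    (fast) extractor covers the rest. A shared memory slot carries the
    last fused features forward, and each step fuses new features with
    memory before detection would run on them.
    """
    memory = None
    fused_per_frame = []
    for i, frame in enumerate(frames):
        extractor = heavy if i % period == 0 else light
        feats = extractor(frame)
        if memory is None:
            fused = feats
        else:
            # Exponential blend of fresh features with remembered ones.
            fused = [mix * f + (1 - mix) * m for f, m in zip(feats, memory)]
        memory = fused
        fused_per_frame.append(fused)
    return fused_per_frame
```

The memory blend lets the cheap extractor's weaker features lean on the last strong ones, which is the intuition behind running the accurate network only intermittently.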

    Pose empowered RGB-flow net
    Invention Grant

    Publication No.: US11776156B2

    Publication Date: 2023-10-03

    Application No.: US17303969

    Filing Date: 2021-06-11

    Applicant: Google LLC

    Abstract: A method includes receiving video data that includes a series of frames of image data. Here, the video data is representative of an actor performing an activity. The method also includes processing the video data to generate a spatial input stream including a series of spatial images representative of spatial features of the actor performing the activity, a temporal input stream representative of motion of the actor performing the activity, and a pose input stream including a series of images representative of a pose of the actor performing the activity. Using at least one neural network, the method also includes processing the temporal input stream, the spatial input stream, and the pose input stream. The method also includes classifying, by the at least one neural network, the activity based on the temporal input stream, the spatial input stream, and the pose input stream.
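One simple way to combine the three streams, not necessarily the one claimed, is late fusion: average the per-stream class probabilities and take the top class. A minimal numpy sketch with hypothetical per-stream logits; the networks that would produce them are not shown:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def classify_activity(spatial_logits, temporal_logits, pose_logits):
    """Late-fuse the spatial, temporal, and pose streams.

    Each stream votes with a probability distribution over activity
    classes; the average distribution decides the final label.
    """
    probs = (softmax(spatial_logits)
             + softmax(temporal_logits)
             + softmax(pose_logits)) / 3.0
    return int(np.argmax(probs)), probs
```

Averaging probabilities rather than raw logits keeps a single overconfident stream from dominating the vote.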

    Pose Empowered RGB-Flow Net
    Invention Application

    Publication No.: US20210390733A1

    Publication Date: 2021-12-16

    Application No.: US17303969

    Filing Date: 2021-06-11

    Applicant: Google LLC

    Abstract: A method includes receiving video data that includes a series of frames of image data. Here, the video data is representative of an actor performing an activity. The method also includes processing the video data to generate a spatial input stream including a series of spatial images representative of spatial features of the actor performing the activity, a temporal input stream representative of motion of the actor performing the activity, and a pose input stream including a series of images representative of a pose of the actor performing the activity. Using at least one neural network, the method also includes processing the temporal input stream, the spatial input stream, and the pose input stream. The method also includes classifying, by the at least one neural network, the activity based on the temporal input stream, the spatial input stream, and the pose input stream.

    Machine-Learned Models for Imperceptible Message Watermarking in Videos

    Publication No.: US20240020788A1

    Publication Date: 2024-01-18

    Application No.: US18256783

    Filing Date: 2021-03-24

    Applicant: Google LLC

    CPC classification number: G06T1/0085 G06T2201/0083

    Abstract: Systems and methods of the present disclosure are directed to a computing system. The computing system can obtain a message vector and video data comprising a plurality of video frames. The computing system can process the input video with a transformation portion of a machine-learned watermark encoding model to obtain a three-dimensional feature encoding of the input video. The computing system can process the three-dimensional feature encoding of the input video and the message vector with an embedding portion of the machine-learned watermark encoding model to obtain spatial-temporal watermark encoding data descriptive of the message vector. The computing system can generate encoded video data comprising a plurality of encoded video frames, wherein at least one of the plurality of encoded video frames includes the spatial-temporal watermark encoding data.
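As a classical stand-in for the learned encoder/decoder pair, a spread-spectrum scheme exhibits the same contract: a message vector becomes a low-amplitude spatial-temporal residual added to the frames, and the bits are recovered by correlation. A hedged numpy sketch; every name and parameter here is illustrative, not from the disclosure:

```python
import numpy as np

def embed_message(frames, message_bits, strength=0.01, seed=0):
    """Embed bits as a faint pseudo-random residual across all frames.

    Each bit owns a pseudo-random spatial-temporal pattern; the signed
    sum of patterns is added at low amplitude, so the video is visually
    unchanged but the bits remain statistically detectable.
    """
    rng = np.random.default_rng(seed)
    patterns = rng.standard_normal((len(message_bits),) + frames.shape)
    signs = np.where(np.asarray(message_bits) > 0, 1.0, -1.0)
    residual = np.tensordot(signs, patterns, axes=1)
    return frames + strength * residual

def decode_message(encoded, original, n_bits, seed=0):
    """Recover bits by correlating the residual with each pattern."""
    rng = np.random.default_rng(seed)
    patterns = rng.standard_normal((n_bits,) + encoded.shape)
    residual = encoded - original
    corr = np.tensordot(patterns, residual, axes=residual.ndim)
    return (corr > 0).astype(int).tolist()
```

This decoder needs the original video (non-blind), whereas the patent's learned decoder would not; the sketch only illustrates the embed-then-recover round trip.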

    Pose Empowered RGB-Flow Net
    Invention Publication

    Publication No.: US20230419538A1

    Publication Date: 2023-12-28

    Application No.: US18464912

    Filing Date: 2023-09-11

    Applicant: Google LLC

    Abstract: A method includes receiving video data that includes a series of frames of image data. Here, the video data is representative of an actor performing an activity. The method also includes processing the video data to generate a spatial input stream including a series of spatial images representative of spatial features of the actor performing the activity, a temporal input stream representative of motion of the actor performing the activity, and a pose input stream including a series of images representative of a pose of the actor performing the activity. Using at least one neural network, the method also includes processing the temporal input stream, the spatial input stream, and the pose input stream. The method also includes classifying, by the at least one neural network, the activity based on the temporal input stream, the spatial input stream, and the pose input stream.
