-
Publication Number: US20250054306A1
Publication Date: 2025-02-13
Application Number: US18797297
Filing Date: 2024-08-07
Applicant: Google LLC
Inventor: Daniel S. Cohen , Christopher R. Conover , Emily Rose Smith , Anoop Menon , Benjamin Lehn , Sudheendra Vijayanarasimhan , Bo Hu , Shen Yan , Xuehan Xiong , David Alexander Ross
IPC: G06V20/40 , G06V10/70 , H04N21/8549
Abstract: Aspects of the disclosure are directed to methods and systems for short form previews of long form media items. A server can provide, to an artificial intelligence (AI) model, a long form media item to be shared with users. The server can receive, from the AI model, one or more frames that are predicted to contain content that is of interest to the users. The server can extract a segment of the long form media item that corresponds to the one or more frames, where the extracted segment corresponds to a short form media item preview. The short form media item preview can be provided for presentation to the users.
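The abstract describes a pipeline in which a model flags frames of interest and the server cuts the surrounding segment as a preview. A minimal sketch of that segment-extraction step, assuming the model returns frame indices and the server pads them with a few seconds of context (the function name, padding, and clamping are illustrative assumptions, not the patented method):

```python
# Hypothetical sketch: given frame indices an AI model predicted to be
# of interest, compute the (start, end) timestamps of a short-form
# preview clip within the long-form media item.

def extract_preview_segment(interest_frames, fps, pad_seconds=2.0, duration=None):
    """Return (start_sec, end_sec) of a short-form preview segment.

    interest_frames: frame indices flagged by the model
    fps: frames per second of the long-form media item
    pad_seconds: context kept around the flagged frames
    duration: total media length in seconds, used to clamp the end (optional)
    """
    first, last = min(interest_frames), max(interest_frames)
    start = max(0.0, first / fps - pad_seconds)
    end = last / fps + pad_seconds
    if duration is not None:
        end = min(end, duration)
    return start, end

# Example: frames 300-450 of a 30 fps, 120-second video flagged as interesting.
print(extract_preview_segment([300, 375, 450], fps=30.0, duration=120.0))  # → (8.0, 17.0)
```

The extracted (start, end) window would then be cut from the long-form item and served as the short-form preview.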
-
Publication Number: US20230419538A1
Publication Date: 2023-12-28
Application Number: US18464912
Filing Date: 2023-09-11
Applicant: Google LLC
Inventor: Yinxiao Li , Zhichao Lu , Xuehan Xiong , Jonathan Huang
IPC: G06T7/73
CPC classification number: G06T7/73 , G06T2207/20081 , G06T2207/30196 , G06T2207/20084 , G06T2207/10016
Abstract: A method includes receiving video data that includes a series of frames of image data. Here, the video data is representative of an actor performing an activity. The method also includes processing the video data to generate a spatial input stream including a series of spatial images representative of spatial features of the actor performing the activity, a temporal input stream representative of motion of the actor performing the activity, and a pose input stream including a series of images representative of a pose of the actor performing the activity. Using at least one neural network, the method also includes processing the temporal input stream, the spatial input stream, and the pose input stream. The method also includes classifying, by the at least one neural network, the activity based on the temporal input stream, the spatial input stream, and the pose input stream.
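The three-stream design above (spatial, temporal/motion, and pose streams processed by a neural network and used jointly for classification) can be sketched as follows. The layer shapes, mean-pooling fusion, and softmax head below are simplified illustrative assumptions, not the claimed architecture:

```python
import numpy as np

# Illustrative three-stream fusion: each stream (spatial, temporal,
# pose) is embedded separately, the embeddings are concatenated, and a
# linear softmax head classifies the activity.

rng = np.random.default_rng(0)
NUM_CLASSES = 10

def embed(stream, w):
    # mean-pool the per-frame features, then project with a ReLU
    return np.maximum(stream.mean(axis=0) @ w, 0.0)

def classify(spatial, temporal, pose, weights):
    ws, wt, wp, w_head = weights
    fused = np.concatenate([embed(spatial, ws),
                            embed(temporal, wt),
                            embed(pose, wp)])
    logits = fused @ w_head
    e = np.exp(logits - logits.max())
    return e / e.sum()                      # class probabilities

# Toy inputs: 16 frames, 64-dim features per stream.
streams = [rng.normal(size=(16, 64)) for _ in range(3)]
weights = ([rng.normal(size=(64, 32)) * 0.1 for _ in range(3)] +
           [rng.normal(size=(96, NUM_CLASSES)) * 0.1])
probs = classify(*streams, weights)
print(probs.shape)
```

In the claimed method each stream would carry real signals (RGB frames, motion such as optical flow, and pose renderings) rather than random features, but the fuse-then-classify shape is the same.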
-
Publication Number: US20240346824A1
Publication Date: 2024-10-17
Application Number: US18634794
Filing Date: 2024-04-12
Applicant: Google LLC
Inventor: Alexey Alexeevich Gritsenko , Xuehan Xiong , Josip Djolonga , Mostafa Dehghani , Chen Sun , Mario Lucic , Cordelia Luise Schmid , Anurag Arnab
IPC: G06V20/40 , G06T7/73 , G06V10/62 , G06V10/764 , G06V10/77 , G06V10/774 , G06V10/776 , G06V10/82
CPC classification number: G06V20/46 , G06T7/73 , G06V10/62 , G06V10/764 , G06V10/7715 , G06V10/774 , G06V10/776 , G06V10/82 , G06T2207/10016 , G06T2207/20081 , G06T2207/20084
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing action localization on an input video. In particular, a system maintains a set of query vectors and uses the input video and the set of query vectors to generate an action localization output for the input video. The action localization output includes, for each of one or more agents depicted in the video, data specifying, for each of one or more video frames in the video, a respective bounding box in the video frame that depicts the agent and a respective action from a set of actions that is being performed by the agent in the video frame.
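The maintained query vectors can be pictured as a DETR-style decoding step: each query attends over a frame's features and is decoded into a bounding box plus an action label. The single-head attention and linear heads below are simplified stand-ins for illustration, with all dimensions assumed:

```python
import numpy as np

# Rough sketch of query-vector action localization for one frame: a
# fixed set of query vectors cross-attends over per-frame features and
# each query is decoded into a bounding box and an action id.

rng = np.random.default_rng(1)
D, NUM_QUERIES, NUM_ACTIONS = 32, 4, 6

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def localize_frame(frame_feats, queries, w_box, w_act):
    attn = softmax(queries @ frame_feats.T / np.sqrt(D))  # queries read frame
    read = attn @ frame_feats                             # (Q, D)
    boxes = 1.0 / (1.0 + np.exp(-(read @ w_box)))         # (Q, 4), in [0, 1]
    actions = softmax(read @ w_act)                       # (Q, NUM_ACTIONS)
    return boxes, actions.argmax(axis=-1)

frame_feats = rng.normal(size=(49, D))                    # e.g. a 7x7 patch grid
queries = rng.normal(size=(NUM_QUERIES, D))
boxes, action_ids = localize_frame(frame_feats, queries,
                                   rng.normal(size=(D, 4)),
                                   rng.normal(size=(D, NUM_ACTIONS)))
print(boxes.shape, action_ids.shape)
```

Run per frame, this yields the per-agent, per-frame (bounding box, action) pairs the abstract describes; in the real system the queries and heads would be learned end to end.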
-
Publication Number: US11776156B2
Publication Date: 2023-10-03
Application Number: US17303969
Filing Date: 2021-06-11
Applicant: Google LLC
Inventor: Yinxiao Li , Zhichao Lu , Xuehan Xiong , Jonathan Huang
IPC: G06T7/73
CPC classification number: G06T7/73 , G06T2207/10016 , G06T2207/20081 , G06T2207/20084 , G06T2207/30196
Abstract: A method includes receiving video data that includes a series of frames of image data. Here, the video data is representative of an actor performing an activity. The method also includes processing the video data to generate a spatial input stream including a series of spatial images representative of spatial features of the actor performing the activity, a temporal input stream representative of motion of the actor performing the activity, and a pose input stream including a series of images representative of a pose of the actor performing the activity. Using at least one neural network, the method also includes processing the temporal input stream, the spatial input stream, and the pose input stream. The method also includes classifying, by the at least one neural network, the activity based on the temporal input stream, the spatial input stream, and the pose input stream.
-
Publication Number: US20210390733A1
Publication Date: 2021-12-16
Application Number: US17303969
Filing Date: 2021-06-11
Applicant: Google LLC
Inventor: Yinxiao Li , Zhichao Lu , Xuehan Xiong , Jonathan Huang
IPC: G06T7/73
Abstract: A method includes receiving video data that includes a series of frames of image data. Here, the video data is representative of an actor performing an activity. The method also includes processing the video data to generate a spatial input stream including a series of spatial images representative of spatial features of the actor performing the activity, a temporal input stream representative of motion of the actor performing the activity, and a pose input stream including a series of images representative of a pose of the actor performing the activity. Using at least one neural network, the method also includes processing the temporal input stream, the spatial input stream, and the pose input stream. The method also includes classifying, by the at least one neural network, the activity based on the temporal input stream, the spatial input stream, and the pose input stream.
-
Publication Number: US20240371164A1
Publication Date: 2024-11-07
Application Number: US18652703
Filing Date: 2024-05-01
Applicant: Google LLC
Inventor: Shen Yan , Xuehan Xiong , Arsha Nagrani , Anurag Arnab , David Alexander Ross , Cordelia Schmid
IPC: G06V20/40 , G06V10/774 , G06V10/80
Abstract: Methods and systems for video localization using artificial intelligence are provided herein. A set of video embeddings representing features of one or more video frames of a media item and a set of textual embeddings corresponding to an event associated with the media item are obtained. Fused video-textual data is generated based on the set of video embeddings and the set of textual embeddings. The fused video-textual data indicates features of the video frames of the media item and textual data pertaining to the media item. The fused video-textual data is provided as an input to an artificial intelligence (AI) model trained to perform multiple video localization tasks with respect to media items of a platform. One or more outputs of the AI model are obtained. A segment of the media item that depicts the event is determined based on the one or more outputs of the AI model.
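One simple way to picture fusing video and text embeddings for localization is per-frame cosine similarity between the frame embeddings and the event's text embedding, with the matching run of frames returned as the segment. The thresholding and dot-product fusion below are assumptions for illustration, not the claimed model:

```python
import numpy as np

# Simplified illustration: score each frame embedding against a textual
# event embedding and return the contiguous span of frames whose
# similarity clears a threshold.

def localize_event(video_emb, text_emb, threshold=0.5):
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    scores = v @ t                          # per-frame cosine similarity
    hot = scores > threshold
    if not hot.any():
        return None                         # event not depicted
    idx = np.flatnonzero(hot)
    return int(idx[0]), int(idx[-1])        # (start_frame, end_frame)

# Toy example: frames 3-5 align with the event description.
video_emb = np.full((8, 4), 0.1)
video_emb[3:6] = np.array([1.0, 0.0, 0.0, 0.0])
text_emb = np.array([1.0, 0.0, 0.0, 0.0])
print(localize_event(video_emb, text_emb))  # → (3, 5)
```

In the described system, the fused video-textual representation is instead fed to a single AI model trained for multiple localization tasks, which emits the segment boundaries directly.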