-
Publication No.: US20230297852A1
Publication Date: 2023-09-21
Application No.: US18007379
Application Date: 2021-07-29
Applicant: Google LLC
Inventor: Li Zhang , Andrew Gerald Howard , Brendan Wesley Jou , Yukun Zhu , Mingda Zhang , Andrey Zhmoginov
IPC: G06N5/022
CPC classification number: G06N5/022
Abstract: Example implementations of the present disclosure combine efficient model design and dynamic inference. With a standalone lightweight model, unnecessary computation on easy examples is avoided, and the information extracted by the lightweight model also guides the synthesis of a specialist network from the basis models. Extensive experiments on ImageNet show that an example BasisNet is particularly effective for image classification, and a BasisNet-MV3 achieves 80.3% top-1 accuracy with 290M MAdds without early termination.
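The abstract names two roles for the lightweight model: skipping work on easy inputs and steering how basis models are merged into a per-input specialist. The following is a minimal sketch of both ideas under assumptions the abstract does not state: that the specialist's kernels are a weighted sum of same-shaped basis kernels, that the weights come from a softmax over the lightweight model's outputs, and that early exit uses a confidence threshold. `synthesize_kernel`, `should_exit_early`, and the 0.9 threshold are hypothetical names and values, not the patented method.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns lightweight-model logits into mixing weights."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def synthesize_kernel(basis_kernels: Sequence[Matrix], coeffs: Sequence[float]) -> Matrix:
    """Weighted sum of same-shaped basis kernels (assumed combination rule)."""
    if len(basis_kernels) != len(coeffs):
        raise ValueError("need one coefficient per basis kernel")
    rows, cols = len(basis_kernels[0]), len(basis_kernels[0][0])
    return [
        [sum(c * k[i][j] for c, k in zip(coeffs, basis_kernels)) for j in range(cols)]
        for i in range(rows)
    ]


def should_exit_early(class_probs: Sequence[float], threshold: float = 0.9) -> bool:
    """Hypothetical early-exit rule: stop if the lightweight model is confident enough."""
    return max(class_probs) >= threshold


# Usage: two 2x2 basis kernels mixed with equal weights (tied logits).
basis = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
coeffs = softmax([0.0, 0.0])
print(synthesize_kernel(basis, coeffs))  # [[0.5, 1.0], [1.0, 0.5]]
```

Because the mixing weights are computed per input, each example gets its own kernel at roughly the cost of one weighted sum, rather than running every basis model separately.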
-
Publication No.: US20230267942A1
Publication Date: 2023-08-24
Application No.: US17601042
Application Date: 2020-10-01
Applicant: Google LLC
Inventor: Anatoly Efros , Noam Etzion-Rosenberg , Tal Remez , Oran Lang , Inbar Mosseri , Israel Or Weinstein , Benjamin Schlesinger , Michael Rubinstein , Ariel Ephrat , Yukun Zhu , Stella Laurenzo , Amit Pitaru , Yossi Matias
IPC: G10L21/0208 , G10L25/57
CPC classification number: G10L21/0208 , G10L25/57 , G10L2021/02087
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: receiving, by a user device, a first indication of one or more first speakers visible in a current view recorded by a camera of the user device, in response, generating a respective isolated speech signal for each of the one or more first speakers that isolates speech of the first speaker in the current view and sending the isolated speech signals for each of the one or more first speakers to a listening device operatively coupled to the user device, receiving, by the user device, a second indication of one or more second speakers visible in the current view recorded by the camera of the user device, and in response generating and sending a respective isolated speech signal for each of the one or more second speakers to the listening device.
-
Publication No.: US20190370648A1
Publication Date: 2019-12-05
Application No.: US16425900
Application Date: 2019-05-29
Applicant: Google LLC
Inventor: Barret Zoph , Jonathon Shlens , Yukun Zhu , Maxwell Donald Emmet Collins , Liang-Chieh Chen , Gerhard Florian Schroff , Hartwig Adam , Georgios Papandreou
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining neural network architectures. One of the methods includes obtaining training data for a dense image prediction task; and determining an architecture for a neural network configured to perform the dense image prediction task, comprising: searching a space of candidate architectures to identify one or more best performing architectures using the training data, wherein each candidate architecture in the space of candidate architectures comprises (i) the same first neural network backbone that is configured to receive an input image and to process the input image to generate a plurality of feature maps and (ii) a different dense prediction cell configured to process the plurality of feature maps and to generate an output for the dense image prediction task; and determining the architecture for the neural network based on the best performing candidate architectures.
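The claim fixes the backbone and varies only the dense prediction cell, so the search reduces to enumerating or sampling cells and scoring each one. The abstract does not name a search algorithm, so the sketch below substitutes plain random search. The operation list `OPS`, the branch encoding, and `random_search` are all hypothetical. `evaluate` stands in for training the shared backbone plus a candidate cell on the training data and returning a validation metric.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical per-branch operations; the abstract does not enumerate them.
OPS = ("conv1x1", "sep_conv3x3_rate1", "sep_conv3x3_rate6", "sep_conv3x3_rate12", "avg_pool")

# A cell is a tuple of branches; each branch is (input_index, op), where input 0 is
# the backbone's feature maps and input i > 0 is the output of branch i - 1.
Cell = Tuple[Tuple[int, str], ...]


def sample_cell(num_branches: int, rng: random.Random) -> Cell:
    """Draw one candidate dense prediction cell from the search space."""
    return tuple((rng.randrange(b + 1), rng.choice(OPS)) for b in range(num_branches))


def random_search(
    evaluate: Callable[[Cell], float],
    num_trials: int,
    num_branches: int = 3,
    top_k: int = 1,
    seed: int = 0,
) -> List[Tuple[float, Cell]]:
    """Score distinct sampled cells and return the top_k (score, cell) pairs, best first."""
    rng = random.Random(seed)
    seen = set()
    scored: List[Tuple[float, Cell]] = []
    for _ in range(num_trials):
        cell = sample_cell(num_branches, rng)
        if cell in seen:
            continue  # skip duplicates so each architecture is evaluated once
        seen.add(cell)
        scored.append((evaluate(cell), cell))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Usage with a toy proxy score that simply favours 1x1 convolutions.
best = random_search(lambda c: sum(op == "conv1x1" for _, op in c), num_trials=50, top_k=3)
print(best)
```

In practice each call to `evaluate` is the expensive step, which is why sharing one backbone across all candidates matters.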
-
Publication No.: US12073844B2
Publication Date: 2024-08-27
Application No.: US17601042
Application Date: 2020-10-01
Applicant: Google LLC
Inventor: Anatoly Efros , Noam Etzion-Rosenberg , Tal Remez , Oran Lang , Inbar Mosseri , Israel Or Weinstein , Benjamin Schlesinger , Michael Rubinstein , Ariel Ephrat , Yukun Zhu , Stella Laurenzo , Amit Pitaru , Yossi Matias
IPC: G10L21/0208 , G10L17/00 , G10L21/0272 , G10L25/57
CPC classification number: G10L21/0208 , G10L17/00 , G10L21/0272 , G10L25/57 , G10L2021/02087
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: receiving, by a user device, a first indication of one or more first speakers visible in a current view recorded by a camera of the user device, in response, generating a respective isolated speech signal for each of the one or more first speakers that isolates speech of the first speaker in the current view and sending the isolated speech signals for each of the one or more first speakers to a listening device operatively coupled to the user device, receiving, by the user device, a second indication of one or more second speakers visible in the current view recorded by the camera of the user device, and in response generating and sending a respective isolated speech signal for each of the one or more second speakers to the listening device.
-
Publication No.: US20230222628A1
Publication Date: 2023-07-13
Application No.: US17572923
Application Date: 2022-01-11
Applicant: Google LLC
Inventor: Yang Zhao , Yu-Chuan Su , Chun-Te Chu , Yandong Li , Marius Renn , Yukun Zhu , Xuhui Jia , Bradley Ray Green
CPC classification number: G06T5/001 , G06V40/168 , G06T2207/30201 , G06T2207/20081 , G06T2207/20084
Abstract: Systems and methods for training a restoration model can leverage training for two sub-tasks to train the restoration model to generate realistic and identity-preserved outputs. The systems and methods can balance the training of the generation task and the reconstruction task to ensure the generated outputs preserve the identity of the original subject while generating realistic outputs. The systems and methods can further leverage a feature quantization model and skip connections to improve the model output and overall training.
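The abstract mentions a feature quantization model and a balance between the generation and reconstruction tasks, but it gives no details on either. This sketch assumes one common form of feature quantization, vector quantization, where each feature vector is replaced by its nearest codebook entry. It also assumes the balance is a simple convex weighting of the two sub-task losses. `quantize`, `balanced_loss`, and `alpha` are hypothetical names, not the disclosed design.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> Tuple[List[int], List[List[float]]]:
    """Replace each feature vector with its nearest codebook entry (Euclidean distance)."""
    indices: List[int] = []
    quantized: List[List[float]] = []
    for f in features:
        idx = min(range(len(codebook)), key=lambda k: math.dist(f, codebook[k]))
        indices.append(idx)
        quantized.append(list(codebook[idx]))
    return indices, quantized


def balanced_loss(reconstruction_loss: float, generation_loss: float, alpha: float) -> float:
    """Hypothetical convex weighting between the reconstruction and generation sub-tasks."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * reconstruction_loss + (1.0 - alpha) * generation_loss


idx, q = quantize([[0.1, 0.0], [0.9, 1.1]], [[0.0, 0.0], [1.0, 1.0]])
print(idx, q)  # [0, 1] [[0.0, 0.0], [1.0, 1.0]]
print(balanced_loss(2.0, 4.0, alpha=0.25))  # 3.5
```

Raising `alpha` favours faithful reconstruction, which helps preserve identity. Lowering it favours the generation objective, which pushes toward more realistic outputs.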
-
Publication No.: US20250166135A1
Publication Date: 2025-05-22
Application No.: US18951203
Application Date: 2024-11-18
Applicant: Google LLC
Inventor: Yu-Chuan Su , Hsin-Ping Huang , Ming-Hsuan Yang , Deqing Sun , Lu Jiang , Yukun Zhu , Xuhui Jia
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controllable video generation. One of the methods includes receiving a text prompt that specifies an object; receiving a control input that comprises an image that depicts a particular instance of the object; generating a video that comprises a respective video frame at each of a plurality of time steps in the video and that depicts the particular instance of the object. Generating the video includes, at each of the plurality of time steps: obtaining a text prompt embedding; obtaining a control input embedding; and generating the respective video frame at the time step using a video generation neural network while the video generation neural network is conditioned on the text prompt embedding and on the control input embedding.
-
Publication No.: US20250111675A1
Publication Date: 2025-04-03
Application No.: US18900467
Application Date: 2024-09-27
Applicant: Google LLC
Inventor: Hui Miao , Chun-Te Chu , Mingyan Gao , Huanfen Yao , Ting Liu , Long Zhao , Liangzhe Yuan , Yukun Zhu , Vinay Kumar Bettadapura , Ye Jin
IPC: G06V20/40 , G06V10/74 , G06V10/75 , G06V10/762 , G06V10/80
Abstract: Methods and systems for media trend detection and maintenance are provided herein. A set of media items each having common media characteristics is identified. A set of pose values is determined for each respective media item of the set of media items. Each pose value is associated with a particular predefined pose for objects depicted by the set of media items. A set of distance scores is calculated. Each distance score represents a distance between the respective set of pose values determined for a media item and a respective set of pose values determined for an additional media item. A coherence score is determined for the set of media items based on the calculated set of distance scores. Responsive to a determination that the coherence score satisfies one or more coherence criteria, a determination is made that the set of media items corresponds to a media trend of a platform.
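The abstract describes a pipeline: per-item pose values, then pairwise distance scores, then a coherence score, then a criterion check. The sketch below traces that pipeline with choices the abstract leaves open, all of them assumptions. Pose values are treated as numeric vectors, distance is Euclidean, coherence is `1 / (1 + mean pairwise distance)`, and the coherence criterion is a single hypothetical threshold `min_coherence`.

```python
import itertools
import math
from typing import Sequence


def coherence_score(pose_values: Sequence[Sequence[float]]) -> float:
    """Map the mean pairwise pose distance across media items to a score in (0, 1].

    pose_values[i][p] is the (assumed numeric) value of predefined pose p for item i.
    """
    pairs = list(itertools.combinations(pose_values, 2))
    if not pairs:
        raise ValueError("need at least two media items")
    distances = [math.dist(a, b) for a, b in pairs]  # one distance score per item pair
    mean_distance = sum(distances) / len(distances)
    return 1.0 / (1.0 + mean_distance)  # identical poses -> 1.0, diverging poses -> 0


def is_media_trend(pose_values: Sequence[Sequence[float]], min_coherence: float = 0.5) -> bool:
    """Hypothetical coherence criterion: a single minimum threshold."""
    return coherence_score(pose_values) >= min_coherence


print(coherence_score([[0.2, 0.8], [0.2, 0.8], [0.2, 0.8]]))  # 1.0
print(is_media_trend([[0.0, 0.0], [3.0, 4.0]]))  # False (score is 1/6)
```

Pairwise comparison grows quadratically with the number of items. For large sets, a real system might instead compare each item against a centroid or a sample.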
-
Publication No.: US20240428816A1
Publication Date: 2024-12-26
Application No.: US18797400
Application Date: 2024-08-07
Applicant: Google LLC
Inventor: Anatoly Efros , Noam Etzion-Rosenberg , Tal Remez , Oran Lang , Inbar Mosseri , Israel Or Weinstein , Benjamin Schlesinger , Michael Rubinstein , Ariel Ephrat , Yukun Zhu , Stella Laurenzo , Amit Pitaru , Yossi Matias
IPC: G10L21/0208 , G10L17/00 , G10L21/0272 , G10L25/57
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: receiving, by a user device, a first indication of one or more first speakers visible in a current view recorded by a camera of the user device, in response, generating a respective isolated speech signal for each of the one or more first speakers that isolates speech of the first speaker in the current view and sending the isolated speech signals for each of the one or more first speakers to a listening device operatively coupled to the user device, receiving, by the user device, a second indication of one or more second speakers visible in the current view recorded by the camera of the user device, and in response generating and sending a respective isolated speech signal for each of the one or more second speakers to the listening device.
-
Publication No.: US11949724B2
Publication Date: 2024-04-02
Application No.: US17459964
Application Date: 2021-08-27
Applicant: Google LLC
Inventor: Colvin Pitts , Yukun Zhu , Xuhui Jia
CPC classification number: H04L65/403 , G06T11/00 , G06V20/41 , G06V40/20 , H04L63/105 , H04N5/272
Abstract: A computing system and method are provided for safe and privacy-preserving video representations of participants in a videoconference. In particular, the present disclosure provides a general pipeline for generating reconstructions of videoconference participants based on semantic statuses and/or activity statuses of the participants. The systems and methods of the present disclosure allow for videoconferences that convey necessary or meaningful information about participants through generalized representations of them, while leveraging machine-learning models to filter unnecessary or unwanted information from those representations.
-
Publication No.: US20230113131A1
Publication Date: 2023-04-13
Application No.: US17909579
Application Date: 2020-03-05
Applicant: Shawn O'Banion , Wenhuan WEI , Yukun ZHU , Google LLC
Inventor: Shawn Ryan O'Banion , Wenhuan Wei , Yukun Zhu
Abstract: The present disclosure is directed to systems and methods for performing automated labeling of images. Labeled images can be used to train machine-learned models to infer image attributes such as quality for suggesting user actions.
-