Invention Publication
- Patent Title: METHODS AND MODULES FOR ACCELERATING INFERENCE VIA DISTRIBUTED DEVICES
- Application No.: US18087897
- Application Date: 2022-12-23
- Publication No.: US20240220829A1
- Publication Date: 2024-07-04
- Inventor: Chenghao Hu, Baochun Li, Zhixiang Chi, Yuanhao Yu, Yang Wang, Jin Tang
- Applicant: Huawei Technologies Canada Co., Ltd., The Governing Council of the University of Toronto
- Applicant Address: CA Kanata
- Assignee: Huawei Technologies Canada Co., Ltd., The Governing Council of the University of Toronto
- Current Assignee: Huawei Technologies Canada Co., Ltd., The Governing Council of the University of Toronto
- Current Assignee Address: CA Kanata
- Main IPC: G06N5/04
- IPC: G06N5/04

Abstract:
Methods and modules for accelerating inference computations in transformer models using edge devices include partitioning inputs for each layer and synchronizing between transformer layers. A method includes receiving a transformer input, partitioning the transformer input into two or more first-stage divisions, processing each first-stage division into a processed first-stage division, and combining the processed first-stage divisions into a first output. A module includes a computing device for partitioning a transformer input into two or more divisions, transmitting each of the divisions, and receiving processed divisions, as well as two or more transformer processing units, each for receiving a division from the computing device, processing the division into a processed division, and sending the processed division to the computing device.
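
The partition, process, and combine flow described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the names `partition`, `process_division`, and `run_stage` are hypothetical, worker threads stand in for the distributed transformer processing units, and a position-wise ReLU stands in for the per-division transformer computation. The sketch also assumes the input is split along the token (sequence) dimension, which the abstract does not specify.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_divisions):
    """Split a list of token vectors into contiguous, near-equal divisions."""
    base, extra = divmod(len(rows), num_divisions)
    divisions, start = [], 0
    for i in range(num_divisions):
        # The first `extra` divisions each take one additional token.
        size = base + (1 if i < extra else 0)
        divisions.append(rows[start:start + size])
        start += size
    return divisions


def process_division(division):
    """Hypothetical stand-in for one transformer processing unit.

    Applies a position-wise ReLU to every token vector; a real unit
    would run its share of a transformer layer instead.
    """
    return [[max(0.0, x) for x in token] for token in division]


def run_stage(rows, num_units=2):
    """Partition the input, process each division, and combine the results."""
    divisions = partition(rows, num_units)
    # Threads simulate separate devices (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_units) as pool:
        # pool.map returns results in input order, so token order is preserved.
        processed = list(pool.map(process_division, divisions))
    # Combine: concatenate the processed divisions into one output.
    return [token for division in processed for token in division]


if __name__ == "__main__":
    tokens = [[1.0, -2.0], [-3.0, 4.0], [5.0, -6.0]]
    print(run_stage(tokens, num_units=2))
```

Combining at the end of each stage acts as the synchronization point between layers: the next stage starts only after every division of the current stage has been returned.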