-
Publication No.: US20240184704A1
Publication Date: 2024-06-06
Application No.: US18524706
Application Date: 2023-11-30
Inventors: Huaisheng ZHANG, Zhongyu TAO, Yuefan ZHANG
IPC Classification: G06F12/0868, G06F5/01, G06F7/523, G06F7/535, G06F12/0873
CPC Classification: G06F12/0868, G06F5/01, G06F7/523, G06F7/535, G06F12/0873
Abstract: A method and an apparatus for reading cache data, and a storage medium, are provided. The method includes: receiving a read instruction; converting at least one type of address offset position corresponding to the read instruction into a coupled address offset position according to a preset rule; performing a matching operation in a data group according to the coupled address offset position corresponding to the at least one type of address offset position, and obtaining the corresponding cache data; and reading the cache data obtained through the matching. The method can perform matching and reading operations simultaneously for at least two combined address offset positions, which greatly improves data reading efficiency and doubles cache data read throughput without significantly increasing hardware logic. The configuration is also applicable to read-only, read-write, and write-only caches, giving it broad versatility.
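The abstract leaves the coupling rule unspecified. The Python sketch below is a toy model, not the patented circuit: it assumes a set-associative cache with hypothetical geometry (`LINE_BYTES`, `NUM_SETS`) and assumes that two reads whose offsets fall in the same cache line can be coupled so that a single tag match serves both. The `tag_lookups` counter makes the saving visible.

```python
from dataclasses import dataclass, field

# Hypothetical geometry: the abstract does not specify line size or set count.
LINE_BYTES = 64   # bytes per cache line
NUM_SETS = 16     # number of sets ("data groups")


def split(addr: int) -> tuple[int, int, int]:
    """Split a byte address into (tag, set index, offset within the line)."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_SETS
    tag = addr // (LINE_BYTES * NUM_SETS)
    return tag, index, offset


@dataclass
class ToyCache:
    # sets[index] maps tag -> line contents
    sets: list = field(default_factory=lambda: [{} for _ in range(NUM_SETS)])
    tag_lookups: int = 0  # counts matching operations performed

    def fill(self, addr: int, line: bytes) -> None:
        tag, index, _ = split(addr)
        self.sets[index][tag] = line

    def _match(self, tag: int, index: int):
        self.tag_lookups += 1
        return self.sets[index].get(tag)

    def read_pair(self, a: int, b: int):
        """Serve two reads; offsets in the same line are coupled into one match."""
        ta, ia, oa = split(a)
        tb, ib, ob = split(b)
        if (ta, ia) == (tb, ib):
            line = self._match(ta, ia)  # one match covers both offsets
            return (None, None) if line is None else (line[oa], line[ob])
        la, lb = self._match(ta, ia), self._match(tb, ib)
        return (la[oa] if la is not None else None,
                lb[ob] if lb is not None else None)
```

With a line filled at address 0, `read_pair(3, 10)` returns `(3, 10)` after a single match, while two reads landing in different lines still cost two matches.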
-
Publication No.: US20240289048A1
Publication Date: 2024-08-29
Application No.: US18226687
Application Date: 2023-07-26
Inventors: Zhulin CHANG, Huaisheng ZHANG, Xin JIN
IPC Classification: G06F3/06
CPC Classification: G06F3/0656, G06F3/0604, G06F3/0673
Abstract: Disclosed are a method and an apparatus for loading task data, a computer device, a storage medium, and a computer program product. The method includes: after a task is initiated, analyzing the various types of buffers involved in the task and determining whether each buffer satisfies a preset read-only buffer condition; determining a buffer that satisfies the read-only buffer condition to be a read-only buffer; mapping the read-only buffer into a matched read-only storage space based on space information of the read-only storage space and size information of the read-only buffer, and obtaining the corresponding read-only mapping information; and loading the task data in the read-only buffer into the matched read-only storage space based on the read-only mapping information. The method improves the efficiency of loading task data.
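A minimal Python sketch of these steps, under assumptions the abstract does not state: the read-only condition is modeled as "the task never writes the buffer" (a hypothetical `writes` counter), the read-only storage space is a flat byte array of fixed capacity, and mapping uses simple first-fit placement. The names `Buffer`, `map_read_only`, and `load` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    name: str
    data: bytes
    writes: int  # hypothetical: number of writes the task performs to this buffer


def is_read_only(buf: Buffer) -> bool:
    # Hypothetical read-only buffer condition: the task never writes it.
    return buf.writes == 0


def map_read_only(buffers, capacity: int) -> dict:
    """First-fit mapping of read-only buffers into a space of `capacity` bytes.

    Returns the mapping information: buffer name -> (offset, size).
    """
    mapping, cursor = {}, 0
    for buf in buffers:
        size = len(buf.data)  # size information of the buffer
        if is_read_only(buf) and cursor + size <= capacity:
            mapping[buf.name] = (cursor, size)
            cursor += size
    return mapping


def load(buffers, mapping: dict, capacity: int) -> bytearray:
    """Copy each mapped buffer's data into the read-only space at its offset."""
    space = bytearray(capacity)
    for buf in buffers:
        if buf.name in mapping:
            offset, size = mapping[buf.name]
            space[offset:offset + size] = buf.data
    return space
```

Buffers that fail the condition, or that no longer fit, simply stay out of the mapping; a real loader would presumably fall back to ordinary memory for them.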
-
Publication No.: US20240020094A1
Publication Date: 2024-01-18
Application No.: US18222101
Application Date: 2023-07-14
Inventors: Yuqin YU, Yaohui ZENG, Renyu BIAN, Huaisheng ZHANG
IPC Classification: G06F7/523
CPC Classification: G06F7/523
Abstract: A multiplication-accumulation method and apparatus, a processor, and a computer program product are provided. The method includes: when a logical operation unit performs a single-precision floating-point multiplication-accumulation operation, combining the two half-precision multiplier-accumulators in each single-precision multiplication-accumulation unit to perform the operation on the single-precision floating-point numbers to be processed, yielding the corresponding single-precision results, for a total of N results; and when the logical operation unit performs a half-precision floating-point multiplication-accumulation operation, having each half-precision multiplier-accumulator perform the operation on the half-precision floating-point numbers to be processed, yielding the corresponding half-precision results, for a total of 2N results. This improves the utilization of the multiplier-accumulators.
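The sketch below models the two operating modes in Python. It is an illustration under stated assumptions, not the patented datapath: `mac_step` only shows the lane count (N single-precision results versus 2N half-precision results per step), with IEEE rounding done through the standard `struct` module, and `mul_via_halves` shows one common technique, assumed here rather than confirmed by the abstract, for combining narrower multipliers into a wider mantissa product through shifted partial products.

```python
import struct


def round_to(x: float, fmt: str) -> float:
    """Round a Python float to IEEE binary16 ('e') or binary32 ('f')."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def mac_step(n_units: int, mode: str, a, b, acc):
    """One multiply-accumulate step across all lanes of a hypothetical MAC array.

    Each unit holds two half-precision MACs: in 'fp32' mode they act together as
    one lane (N results); in 'fp16' mode each works alone (2N results).
    """
    lanes = n_units if mode == "fp32" else 2 * n_units
    fmt = "f" if mode == "fp32" else "e"
    if not (len(a) == len(b) == len(acc) == lanes):
        raise ValueError(f"{mode} mode expects {lanes} operands per input")
    # Computed in double precision, then rounded once to the target format.
    return [round_to(c + x * y, fmt) for x, y, c in zip(a, b, acc)]


def mul_via_halves(a: int, b: int, half_bits: int = 12) -> int:
    """Multiply two unsigned (2*half_bits)-bit integers from half-width partial products."""
    mask = (1 << half_bits) - 1
    ah, al = a >> half_bits, a & mask
    bh, bl = b >> half_bits, b & mask
    # Four narrow products, shifted into place and summed.
    return ((ah * bh) << (2 * half_bits)) + ((ah * bl + al * bh) << half_bits) + al * bl
```

With `n_units=2`, the same hardware accepts four operand pairs in `fp16` mode but only two in `fp32` mode, which is the utilization trade-off the abstract describes.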
-
Publication No.: US20240176585A1
Publication Date: 2024-05-30
Application No.: US18225467
Application Date: 2023-07-24
Inventors: Yaohui ZENG, Renyu BIAN, Huaisheng ZHANG
Abstract: The present application relates to a data processing method, a computer device, and a storage medium. The method includes: acquiring the data formats of two pieces of input data, the two data formats being the same; determining, from a plurality of preset data conversion algorithms, a target data conversion algorithm that matches the data format, and using it to convert the format of the two pieces of input data to obtain at least two pieces of target input data; processing the target input data with a multiplier to obtain a preliminary operation result; and determining truncation bit widths corresponding to the two pieces of input data and processing the preliminary operation result according to those truncation bit widths to obtain the multiplication result for the two pieces of input data.
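A hedged Python sketch of this flow, with every specific assumed: the format table `FORMATS`, the conversion algorithms (sign- or zero-extension chosen by signedness), and the default truncation width of twice the input width are all hypothetical. Python's unbounded integers stand in for the hardware multiplier.

```python
from typing import Optional

# Hypothetical format table: name -> (bit width, signed?)
FORMATS = {"int8": (8, True), "uint8": (8, False), "int16": (16, True)}


def sign_extend(raw: int, bits: int) -> int:
    """Interpret the low `bits` bits of `raw` as a two's-complement value."""
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw >> (bits - 1) else raw


def zero_extend(raw: int, bits: int) -> int:
    """Interpret the low `bits` bits of `raw` as an unsigned value."""
    return raw & ((1 << bits) - 1)


# Hypothetical "preset conversion algorithms", selected by the format's signedness.
CONVERTERS = {True: sign_extend, False: zero_extend}


def truncate(value: int, bits: int, signed: bool) -> int:
    """Keep the low `bits` bits of `value` and reinterpret them with the given signedness."""
    return sign_extend(value, bits) if signed else zero_extend(value, bits)


def multiply(fmt: str, raw_a: int, raw_b: int, trunc_bits: Optional[int] = None) -> int:
    bits, signed = FORMATS[fmt]               # both inputs share one format
    convert = CONVERTERS[signed]              # pick the matching conversion algorithm
    a, b = convert(raw_a, bits), convert(raw_b, bits)
    product = a * b                           # stand-in for the hardware multiplier
    width = trunc_bits if trunc_bits is not None else 2 * bits  # hypothetical default
    return truncate(product, width, signed)
```

For example, `multiply("int8", 0xFF, 0x02)` returns -2 because 0xFF is read as -1, whereas the same bits in `uint8` give 510; truncating the `int8` product 0x80 × 0x80 to 8 bits wraps 16384 to 0.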
-
Publication No.: US20240004702A1
Publication Date: 2024-01-04
Application No.: US17952730
Application Date: 2022-09-26
Inventors: Huaisheng ZHANG, Xiangxiang CHEN, Baohua LI
IPC Classification: G06F9/48
CPC Classification: G06F9/4881
Abstract: Disclosed are a thread construction method and device. The method includes: dividing a workload into a plurality of work groups; for any work group, selecting a pattern type that matches the size of the work group, and determining a target thread construction pattern from a plurality of candidate thread construction patterns corresponding to that pattern type; and constructing a plurality of threads according to the target thread construction pattern, each thread being composed of a plurality of consecutive work items in the work group. The work item index of at least one key work item in the work item sequence of each thread is cached to obtain the work item index of each thread, which is used to schedule the work items corresponding to that thread to a processing unit.
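One way to picture the construction step is the Python sketch below, whose details are all hypothetical: pattern types are chosen by a size threshold, each type offers a few candidate thread widths, the width that leaves the fewest idle slots wins, and the "key" work item of each thread is assumed to be its first one, so a single cached index recovers the whole consecutive run.

```python
# Hypothetical pattern types and their candidate thread widths (work items per thread).
PATTERNS = {"small": [4, 8], "large": [16, 32]}


def pattern_type(group_size: int) -> str:
    return "small" if group_size <= 32 else "large"


def pick_width(group_size: int) -> int:
    """Pick the candidate width that leaves the fewest idle slots (ties: wider)."""
    def idle(w: int) -> int:
        return -(-group_size // w) * w - group_size   # ceil(n / w) * w - n
    return min(PATTERNS[pattern_type(group_size)], key=lambda w: (idle(w), -w))


def build_threads(workload: int, group_size: int):
    """Return (group_id, key_index, count) for every thread.

    Only the first work item's index (the assumed key item) is cached; the other
    indices of the thread are key_index + lane.
    """
    threads = []
    for gid, start in enumerate(range(0, workload, group_size)):
        size = min(group_size, workload - start)
        width = pick_width(size)
        for key in range(start, start + size, width):
            threads.append((gid, key, min(width, start + size - key)))
    return threads


def work_items(thread) -> range:
    """Recover a thread's consecutive work item indices from its cached key index."""
    _, key, count = thread
    return range(key, key + count)
```

For instance, `build_threads(10, 8)` yields `[(0, 0, 8), (1, 8, 2)]`: the full group of 8 fits one 8-wide thread, and the tail group of 2 uses a 4-wide pattern with two idle slots.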
-
Publication No.: US20240004615A1
Publication Date: 2024-01-04
Application No.: US18216809
Application Date: 2023-06-30
Inventors: Zhongyu TAO, Huaisheng ZHANG, Renyu BIAN
Abstract: A convolution operation method and apparatus, a matrix decompression device, and a graphics processor are provided. The method includes: for any sub-feature map in an original feature map, loading, from a preset memory layout, at least one target feature tile constituting that sub-feature map, where the memory layout is obtained by writing at least one feature tile into memory according to a preset data arrangement, and the at least one feature tile is obtained by tiling the original feature map; decompressing the feature map composed of the at least one target feature tile according to a convolution parameter of a convolutional layer to obtain a destination decompressed matrix; and performing a matrix multiplication on the destination decompressed matrix and the decompressed matrix corresponding to a convolution kernel to obtain the convolution result of the original feature map. The present disclosure may improve convolution efficiency.
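The flow can be sketched in pure Python for a single-channel feature map with stride 1 and no padding. This sketch assumes that the "decompression" step behaves like im2col-style unfolding, which the abstract does not confirm; the tile size, the dict-of-tiles memory layout, and the function names are hypothetical. As in most deep-learning frameworks, the kernel is applied as a cross-correlation.

```python
def tile(fmap, t):
    """Store a 2-D feature map as t x t tiles keyed by (tile_row, tile_col)."""
    h, w = len(fmap), len(fmap[0])
    return {(r // t, c // t): [row[c:c + t] for row in fmap[r:r + t]]
            for r in range(0, h, t) for c in range(0, w, t)}


def load_region(tiles, t, r0, r1, c0, c1):
    """Gather rows r0..r1-1 and cols c0..c1-1, touching only the tiles that cover them."""
    return [[tiles[(r // t, c // t)][r % t][c % t] for c in range(c0, c1)]
            for r in range(r0, r1)]


def unfold(region, kh, kw):
    """im2col: one row per output position, kh*kw columns (stride 1, no padding)."""
    oh, ow = len(region) - kh + 1, len(region[0]) - kw + 1
    return [[region[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            for i in range(oh) for j in range(ow)]


def conv_sub_map(tiles, t, kernel, r0, c0, oh, ow):
    """Compute the oh x ow output block whose top-left output index is (r0, c0)."""
    kh, kw = len(kernel), len(kernel[0])
    # Input window = output block plus a (kh-1) x (kw-1) halo.
    region = load_region(tiles, t, r0, r0 + oh + kh - 1, c0, c0 + ow + kw - 1)
    cols = unfold(region, kh, kw)                  # (oh*ow) x (kh*kw) matrix
    k = [v for row in kernel for v in row]         # kernel flattened to a kh*kw vector
    flat = [sum(x * y for x, y in zip(row, k)) for row in cols]  # matrix-vector product
    return [flat[i * ow:(i + 1) * ow] for i in range(oh)]
```

Each output block reads only the tiles overlapping its input window, and the heavy arithmetic reduces to one matrix product, which is the property that makes this layout attractive on a GPU.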