-
Publication No.: US20220198281A1
Publication Date: 2022-06-23
Application No.: US17127104
Filing Date: 2020-12-18
Inventors: Jan Van Lunteren, Nikolas Ioannou, Nikolaos Papandreou, Thomas Parnell, Andreea Anghel, Charalampos Pozidis
IPC Classes: G06N5/00, G06F16/901, G06F16/903, G06N5/04, G06F9/30, G06F12/0811
Abstract: An approach for accelerating decision-tree inference by accessing one or more decision trees, wherein each decision tree accessed comprises decision tree nodes, including nodes grouped into one or more supersets designed for joint execution. For each decision tree accessed, the nodes are executed to obtain an outcome for that tree. For each superset of each decision tree, the nodes of the superset are jointly executed by loading attributes of the superset's nodes into a respective cache line of the cache memory and processing those attributes from that cache line, until an inference result is returned based on the one or more outcomes.
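A minimal software sketch of the cache-line idea above. The 8-byte node layout, the leaf encoding (negative child values), and the 8-nodes-per-64-byte-line figure are illustrative assumptions, not details from the patent:

```python
import struct

# Hypothetical packed node layout: feature index (uint16), threshold
# (float32), left/right child (int8 each) = 8 bytes per node, so a
# 64-byte cache line holds a superset of 8 nodes.
NODE_FMT = "<Hfbb"
NODE_SIZE = struct.calcsize(NODE_FMT)  # 8 bytes, no padding with "<"

def pack_superset(nodes):
    """Pack (feature, threshold, left, right) tuples contiguously, as if
    loading the whole superset with a single cache-line fill."""
    return b"".join(struct.pack(NODE_FMT, *n) for n in nodes)

def traverse_superset(line, sample, start=0):
    """Walk nodes inside one packed superset; a negative child value
    encodes a leaf: child -k means leaf id k-1."""
    idx = start
    while idx >= 0:
        feat, thr, left, right = struct.unpack_from(NODE_FMT, line, idx * NODE_SIZE)
        idx = left if sample[feat] <= thr else right
    return -idx - 1
```

All node attributes needed for the joint execution sit in one contiguous buffer, which is the property that lets the hardware keep the whole superset in a single cache line.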
-
Publication No.: US10372358B2
Publication Date: 2019-08-06
Application No.: US14941837
Filing Date: 2015-11-16
Inventor: Jan Van Lunteren
Abstract: A reconfigurable computing device having a plurality of reconfigurable partitions, adapted to perform parallel processing of operand data by the partitions, is provided. The computing system includes a memory unit adapted to store configuration data for configuring the partitions of the computing device, operand data to be processed by the configured partitions, and the processing results of the operand data. A programmable memory access processor having a predefined program is also provided. The access processor performs address generation, address mapping and access scheduling for retrieving the configuration data and the operand data from the memory unit and for storing the processing results in the memory unit. The access processor also transfers the configuration data and the operand data from the memory unit to the computing device.
-
Publication No.: US20190026037A1
Publication Date: 2019-01-24
Application No.: US16137786
Filing Date: 2018-09-21
Inventor: Jan Van Lunteren
IPC Classes: G06F3/06, G11C5/02, G06F12/0802
CPC Classes: G06F3/0629, G06F3/0604, G06F3/0644, G06F3/0673, G06F12/0292, G06F12/0607, G06F2212/1012
Abstract: A reconfigurable computing device having a plurality of reconfigurable partitions, adapted to perform parallel processing of operand data by the partitions, is provided. The computing system includes a memory unit adapted to store configuration data for configuring the partitions of the computing device, operand data to be processed by the configured partitions, and the processing results of the operand data. A programmable memory access processor having a predefined program is also provided. The access processor performs address generation, address mapping and access scheduling for retrieving the configuration data and the operand data from the memory unit and for storing the processing results in the memory unit. The access processor also transfers the configuration data and the operand data from the memory unit to the computing device.
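A toy sketch of the address generation and address mapping roles described above. The strided access pattern and the modulo bank mapping are illustrative assumptions; the patent's predefined program could implement arbitrary schedules:

```python
def address_stream(base, stride, count, bank_count):
    """Generate strided element addresses and map each onto a
    (bank, offset) pair, cycling through banks so that consecutive
    accesses can be served by different memory banks in parallel."""
    for i in range(count):
        addr = base + i * stride
        # Simple interleaved mapping: low bits select the bank.
        yield addr, (addr % bank_count, addr // bank_count)
```

In the patented device this kind of logic runs on the access processor, keeping the compute partitions free of address arithmetic.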
-
Publication No.: US09582474B2
Publication Date: 2017-02-28
Application No.: US14313132
Filing Date: 2014-06-24
Inventors: Hoi Sun Ng, Jan Van Lunteren
IPC Classes: G06F17/14
CPC Classes: G06F17/142
Abstract: A method, apparatus, and computer program product for performing an FFT computation. The method includes: providing first and second input data elements in multiple memory areas of a memory unit; in each of a number of consecutive computation stages, performing multiple butterfly operations, each based on a first and a second input data element, to obtain two output data elements, wherein the first and second input data elements for a plurality of butterfly operations are simultaneously retrieved from predetermined memory locations of first and second memory areas; and, for each stage, storing the two output data elements in the memory unit as input data elements for the next stage according to a mapping scheme configured to store output data elements at memory locations in the first and second memory areas so that they are simultaneously retrievable as input data elements for a plurality of butterfly operations of the subsequent computation stage.
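To make the stage-to-stage mapping idea concrete, here is a Stockham-style out-of-place radix-2 FFT, in which each stage reads one buffer and writes the other at regular strides. The two buffers stand in for the two memory areas; this is an illustrative stand-in, not the specific mapping scheme claimed by the patent:

```python
import cmath

def fft(x):
    """Stockham-style radix-2 DIF FFT (len(x) must be a power of two).
    Each stage writes butterfly outputs into the other buffer at positions
    from which the next stage can fetch both inputs at fixed strides."""
    n = len(x)
    a, b = list(x), [0j] * n
    s, m = 1, n
    while m > 1:
        h = m // 2
        for p in range(h):
            w = cmath.exp(-2j * cmath.pi * p / m)  # stage twiddle factor
            for q in range(s):
                u = a[q + s * p]
                v = a[q + s * (p + h)]
                b[q + s * (2 * p)] = u + v
                b[q + s * (2 * p + 1)] = (u - v) * w
        a, b = b, a  # ping-pong between the two "memory areas"
        m, s = h, s * 2
    return a
```

A side effect of this formulation is that no separate bit-reversal pass is needed: the output emerges in natural order.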
-
Publication No.: US20150121037A1
Publication Date: 2015-04-30
Application No.: US14496132
Filing Date: 2014-09-25
Inventor: Jan Van Lunteren
CPC Classes: G06F9/3824, G06F9/3851, G06F9/3889
Abstract: A processing device includes an execute processor configured to execute data processing instructions, and an access processor configured to be coupled with a memory system to execute memory access instructions. The execute processor and the access processor are logically separated units; the execute processor has an execute processor input register file with input registers, and a data processing instruction is executed as soon as all operands for the respective data processing instruction are available in the input registers.
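A toy model of the fire-when-operands-arrive rule above. The program representation and the `deliver` interface are invented for illustration; the point is that execution order follows operand availability, not memory-access completion order:

```python
class DecoupledCore:
    """Toy access/execute split: instructions fire as soon as all their
    source operands have been delivered to the input register file."""
    def __init__(self, program):
        self.program = program  # list of (dst_reg, fn, src_regs)
        self.regs = {}          # execute-processor input registers
        self.done = []          # firing order, for inspection

    def deliver(self, reg, value):
        """Access processor delivers a loaded operand; any instruction whose
        operands are now all present executes immediately (cascading)."""
        self.regs[reg] = value
        progress = True
        while progress:
            progress = False
            for instr in list(self.program):
                dst, fn, srcs = instr
                if all(s in self.regs for s in srcs):
                    self.regs[dst] = fn(*(self.regs[s] for s in srcs))
                    self.done.append(dst)
                    self.program.remove(instr)
                    progress = True
```

Delivering a single operand can trigger a chain of dependent instructions, which mirrors how the decoupled execute processor drains ready work while the access processor continues fetching.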
-
Publication No.: US20140172766A1
Publication Date: 2014-06-19
Application No.: US13714910
Filing Date: 2012-12-14
Inventor: Jan Van Lunteren
IPC Classes: G06N5/02
CPC Classes: G06F9/444, G06F9/4498, G06N5/047, H04L45/00
Abstract: Embodiments of the disclosure include a method for partitioning a deterministic finite automaton (DFA), comprising a plurality of states, into a plurality of groups. The method includes selecting, with a processing device, a subset of the plurality of states and mapping each state of the subset onto a group of the plurality of groups by assigning one or more transition rules associated with each state to a rule line of the group, wherein each rule line is assigned at most two transition rules and an extended address associated with one of the at most two transition rules. The method also includes iteratively processing each state of the subset mapped onto the group by removing the extended address from each rule line whose transition rules refer to a current state, if the transition rules in the rule line branch within the group.
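A simplified sketch of the rule-line packing described above. The data shapes (`states` as symbol/next-state pairs, a `group_of` assignment) are invented for illustration, and the iterative extended-address removal is collapsed into a single check of whether a rule line branches outside its group:

```python
def map_states_to_group(states, group_id, group_of):
    """Split each state's transition rules into rule lines of at most two
    rules. A rule line only needs an extended address when some rule in it
    targets a state outside this group; lines branching entirely within
    the group can drop it."""
    lines = []
    for state, rules in states.items():  # rules: list of (symbol, next_state)
        for i in range(0, len(rules), 2):
            pair = rules[i:i + 2]
            external = any(group_of.get(nxt) != group_id for _, nxt in pair)
            lines.append({"state": state,
                          "rules": pair,
                          "needs_ext_addr": external})
    return lines
```

Dropping the extended address from intra-group rule lines is what makes the per-group memory representation compact.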
-
Publication No.: US20240311148A1
Publication Date: 2024-09-19
Application No.: US18183187
Filing Date: 2023-03-14
Inventor: Jan Van Lunteren
IPC Classes: G06F9/30, G06F16/901
CPC Classes: G06F9/30036, G06F16/9027, G06N5/01, G06N5/04, G06N20/00
Abstract: Methods are provided for inference processing of a decision tree model in a processing apparatus which executes vector instructions to perform inference computations on vectors of operands stored in vector registers of the apparatus. Such a method includes, for each decision tree of the model, indexing nodes of the tree by consecutive node indexes which are assigned to nodes in breadth-first order and increase with node-depth in the tree. During the inference processing, a vector of N node indexes, corresponding to a set of nodes for which N inference computations will be performed in parallel, is stored in a vector register of the apparatus. The method further includes adaptively selecting the granularity N of the vector of node indexes, in dependence on (at least) the node-depth of nodes in the set, to accelerate inference processing of the model.
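A NumPy sketch of traversal with breadth-first node indexes, where node i has children 2i+1 and 2i+2, so one vector operation advances all lanes at once. The complete-tree layout and array names are illustrative assumptions, and the adaptive selection of N is not modeled here:

```python
import numpy as np

def predict(feature, threshold, value, depth, X):
    """feature/threshold/value are breadth-first node arrays of a complete
    binary tree of the given depth; leaves occupy the last level.
    Returns one prediction per row of X."""
    idx = np.zeros(len(X), dtype=np.int64)  # every lane starts at the root
    rows = np.arange(len(X))
    for _ in range(depth):
        # Vectorized comparison for all lanes, then child selection:
        # left child 2i+1, right child 2i+2 (breadth-first indexing).
        go_right = X[rows, feature[idx]] > threshold[idx]
        idx = 2 * idx + 1 + go_right
    return value[idx]
```

Because indexes at a given depth fall in a known range, narrower index types (and hence wider vectors, larger N) suffice near the root, which is the lever the adaptive-granularity method exploits.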
-
Publication No.: US20230016368A1
Publication Date: 2023-01-19
Application No.: US17376223
Filing Date: 2021-07-15
Abstract: A method is provided for accelerating machine learning inferences. The method uses an ensemble model run on input data. This ensemble model involves several base learners, each of which has been trained. The method first schedules tasks for execution; as a result of the task scheduling, one of the base learners is executed based on a subset of the input data. The execution of the tasks is then started to obtain respective task outcomes. An exit condition is repeatedly evaluated while executing the tasks, by computing a deterministic function of the task outcomes obtained so far; the output values of this deterministic function indicate whether an inference result of the ensemble model has converged. Accordingly, the execution of the tasks can be interrupted if the exit condition last evaluated is found to be fulfilled. Eventually, an inference result of the ensemble model is estimated based on the task outcomes.
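A hedged sketch of the early-exit idea above, using a simple vote-margin test as the deterministic convergence function for a classification ensemble; the patent's function may differ:

```python
def ensemble_infer(learners, x):
    """Run base learners one task at a time; stop as soon as the leading
    class's vote margin exceeds the number of learners still unexecuted,
    at which point the majority outcome can no longer change."""
    votes = {}
    for i, learner in enumerate(learners):
        out = learner(x)
        votes[out] = votes.get(out, 0) + 1
        remaining = len(learners) - i - 1
        ranked = sorted(votes.values(), reverse=True)
        lead = ranked[0] - (ranked[1] if len(ranked) > 1 else 0)
        if lead > remaining:  # exit condition fulfilled: result converged
            break
    # Estimated inference result, plus how many tasks actually ran.
    return max(votes, key=votes.get), i + 1
```

When the base learners agree early, most tasks are skipped, which is where the acceleration comes from.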
-
Publication No.: US10740116B2
Publication Date: 2020-08-11
Application No.: US14841825
Filing Date: 2015-09-01
Abstract: A method for performing enhanced pattern scanning includes the steps of: providing a three-dimensional memory structure including multiple physical memory elements; compiling multiple programmable finite state machines, each representing at least one deterministic finite automaton data structure, the data structure being distributed over at least a subset of the physical memory elements; configuring a subset of the programmable finite state machines to operate in parallel on a same input data stream, while each of the subset of programmable finite state machines processes a different pattern subset; and providing a local result processor that transfers at least a part of a match state from the deterministic finite automaton data structures to corresponding registers within the local result processor, the part of the match state being manipulated based on instructions embedded within the deterministic finite automaton data structures.
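An illustrative software analogue of several FSMs stepping in lockstep over one input stream, one DFA per pattern, with match positions recorded in per-FSM "result registers". The KMP-style construction is a stand-in; the 3D memory structure and the embedded result-processor instructions are not modeled:

```python
def build_dfa(pat):
    """KMP failure table for one pattern (its substring-matching DFA)."""
    fail = [0] * len(pat)
    k = 0
    for i in range(1, len(pat)):
        while k and pat[i] != pat[k]:
            k = fail[k - 1]
        if pat[i] == pat[k]:
            k += 1
        fail[i] = k
    return fail

def step(pat, fail, state, ch):
    """Advance one DFA by one input character."""
    while state and (state == len(pat) or pat[state] != ch):
        state = fail[state - 1]
    if state < len(pat) and pat[state] == ch:
        state += 1
    return state

def scan(patterns, stream):
    """All FSMs consume the same stream; each keeps its own match list."""
    dfas = [build_dfa(p) for p in patterns]
    states = [0] * len(patterns)
    matches = [[] for _ in patterns]   # per-FSM local result registers
    for pos, ch in enumerate(stream):
        for i, pat in enumerate(patterns):  # in hardware: in parallel
            states[i] = step(pat, dfas[i], states[i], ch)
            if states[i] == len(pat):
                matches[i].append(pos - len(pat) + 1)
    return matches
```

Each FSM only tracks its own pattern subset, so the per-FSM state stays small even as the total pattern set grows.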
-
Publication No.: US10209890B2
Publication Date: 2019-02-19
Application No.: US15471372
Filing Date: 2017-03-28
Abstract: A computing system includes a host processor, an access processor having a command port, a near memory accelerator, and a memory unit. The system is adapted to run a software program on the host processor and to offload an acceleration task of the software program to the near memory accelerator. The system is further adapted to provide, via the command port, a first communication path for direct communication between the software program and the near memory accelerator, and to provide, via the command port and the access processor, a second communication path for indirect communication between the software program and the near memory accelerator. A related computer-implemented method and a related computer program product are also disclosed.
-