Abstract:
A communication configuration apparatus for constructing a communication topology structure based on a plurality of processing nodes may be included in a combined processing apparatus. The combined processing apparatus further includes an interconnection interface and other processing apparatuses. The communication configuration apparatus interacts with the other processing apparatuses to jointly complete a computing operation specified by a user. The combined processing apparatus further includes a storage apparatus, which is connected to the communication configuration apparatus and to the other processing apparatuses, respectively, and stores their data. A technical solution of the present disclosure may improve the efficiency of inter-chip communication.
Abstract:
In a data processing method in the field of artificial intelligence, a first processor of a data processing device sends a first search message to a second processor, and receives a second search message from a third processor. The first search message includes first data, and is for searching for an embedding parameter of the first data. The second search message includes second data and is for searching for an embedding parameter of the second data. The second processor and the third processor are respectively a next-hop processor and a previous-hop processor of the first processor in a ring communication architecture in which the first processor is located. The first, second, and third processors are among multiple processors in a data training system.
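
The next-hop and previous-hop relationship in the ring can be illustrated with a small simulation. This is only a sketch under assumptions the abstract does not state: processors are numbered 0 to n-1, each holds a shard of the embedding table as a dictionary, and a search message is forwarded one hop at a time until it returns to its origin. The names `ring_neighbors` and `ring_lookup` are hypothetical.

```python
def ring_neighbors(rank, world_size):
    """Return (previous_hop, next_hop) for `rank` in a ring of `world_size` processors."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank out of range")
    return (rank - 1) % world_size, (rank + 1) % world_size


def ring_lookup(queries, shards):
    """Simulate search messages travelling around the ring.

    queries[r] is the list of ids processor r wants embeddings for.
    shards[r] maps id -> embedding for the ids stored on processor r
    (the sharding scheme is an assumption made for illustration).
    """
    n = len(shards)
    # held[r] is the search message currently held by processor r.
    held = [{"origin": r, "ids": list(q), "found": {}} for r, q in enumerate(queries)]
    for _ in range(n):
        # Every processor sends its message to its next hop, which means
        # every processor also receives one from its previous hop.
        forwarded = [None] * n
        for r in range(n):
            _, next_hop = ring_neighbors(r, n)
            forwarded[next_hop] = held[r]
        held = forwarded
        # Each processor answers the part of the message it can serve locally.
        for r in range(n):
            message = held[r]
            for item in message["ids"]:
                if item in shards[r]:
                    message["found"][item] = shards[r][item]
    # After n hops every message is back at the processor that created it.
    results = [None] * n
    for message in held:
        results[message["origin"]] = message["found"]
    return results
```

In this sketch each processor only ever talks to its two ring neighbours, so no processor needs a direct link to every other one.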
Abstract:
Techniques for multi-dimensional network sorted array merging. A first switch of a plurality of switches of an apparatus may receive a first element of a first array and a first element of a second array. The first switch may determine that the first element of the first array is less than the first element of the second array. The first switch may cause the first element of the first array to be stored as a first element of an output array.
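
The comparison step described above is the core of an ordinary two-way merge. The sketch below is a sequential software analogue, not the switch-based hardware the abstract describes; the function name `merge_sorted` and the handling of ties (the element of the second array is taken when the first is not strictly less) are assumptions.

```python
def merge_sorted(first, second):
    """Merge two ascending lists into one ascending output list."""
    output = []
    i = j = 0
    while i < len(first) and j < len(second):
        if first[i] < second[j]:
            # The head of the first array is smaller, so it is stored next.
            output.append(first[i])
            i += 1
        else:
            output.append(second[j])
            j += 1
    # One input is exhausted; the rest of the other is already sorted.
    output.extend(first[i:])
    output.extend(second[j:])
    return output
```

For example, `merge_sorted([1, 4, 9], [2, 3, 10])` compares 1 with 2 first and stores 1 as the first element of the output.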
Abstract:
Systems, computer-implemented methods, and computer program products to facilitate updating, such as averaging and/or training, of one or more statistical sets are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can include a computing component that averages a statistical set, provided by the system, with an additional statistical set, that is compatible with the statistical set, to compute an averaged statistical set, where the additional statistical set is obtained from a selected additional system of a plurality of additional systems. The computer executable components also can include a selecting component that selects the selected additional system according to a randomization pattern.
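
A minimal sketch of the two components follows. It assumes that a statistical set can be represented as a dictionary of named numeric values, that two sets are "compatible" when they have the same keys, and that the randomization pattern is a uniform random choice; none of these details come from the abstract, and the names `average_sets` and `select_peer` are hypothetical.

```python
import random


def average_sets(local, remote):
    """Average two compatible statistical sets key by key."""
    if local.keys() != remote.keys():
        raise ValueError("statistical sets are not compatible")
    return {key: (local[key] + remote[key]) / 2 for key in local}


def select_peer(peers, rng):
    """Pick one additional system at random (uniform choice is an assumption)."""
    if not peers:
        raise ValueError("no additional systems available")
    return rng.choice(peers)


def update_step(local, peer_sets, rng):
    """Select a peer at random and average the local set with that peer's set."""
    peer = select_peer(sorted(peer_sets), rng)
    return peer, average_sets(local, peer_sets[peer])
```

Passing an explicit `random.Random` instance as `rng` keeps the selection reproducible when a seed is fixed.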
Abstract:
In an embodiment, a system on a chip (SOC) comprises a semiconductor die on which circuitry is formed, wherein the circuitry comprises a plurality of agents and a plurality of network switches coupled to the plurality of agents. The plurality of network switches are interconnected to form a plurality of physically and logically independent networks. A first network of the plurality of physically and logically independent networks is constructed according to a first topology and a second network of the plurality of physically and logically independent networks is constructed according to a second topology that is different from the first topology. For example, the first topology may be a ring topology and the second topology may be a mesh topology. In an embodiment, coherency may be enforced on the first network and the second network may be a relaxed order network.
Abstract:
A computer includes a plurality of processing nodes arranged in two-dimensional arrays in respective front and rear layers. Each processing node has a set of activatable links. When a link is activated, transmission of data items between the nodes connected via that link is enabled. When the link is not activated, transmission of data items between those nodes is prevented. The set of activatable links includes a respective link which connects the processing node to each adjacent node in the array, and to a facing processing node in the other layer. An allocation engine is configured to receive an allocation instruction and is connected to the processing nodes to selectively activate the links in a configuration.
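
The link structure can be sketched as follows. The sketch assumes that nodes are addressed as `(layer, row, col)` with layer 0 as the front layer and layer 1 as the rear layer, and that "adjacent" means the four horizontal and vertical neighbours; the abstract does not specify either point. `candidate_links` and `LinkState` are hypothetical names.

```python
def candidate_links(layer, row, col, rows, cols):
    """List the nodes that (layer, row, col) could be linked to."""
    links = []
    # Adjacent nodes within the same two-dimensional array.
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < rows and 0 <= c < cols:
            links.append((layer, r, c))
    # The facing node at the same position in the other layer.
    links.append((1 - layer, row, col))
    return links


class LinkState:
    """Tracks which links are currently activated."""

    def __init__(self):
        self.active = set()

    def activate(self, a, b):
        # A link is undirected, so it is stored as an unordered pair.
        self.active.add(frozenset((a, b)))

    def can_transmit(self, a, b):
        # Data may flow only over a link that has been activated.
        return frozenset((a, b)) in self.active
```

An allocation engine in this model would call `activate` for a chosen subset of the candidate links, leaving all other links unable to carry data.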
Abstract:
An interconnect for an integrated circuit communicating transactions between initiator Intellectual Property (IP) cores and multiple target IP cores coupled to the interconnect is generally described. The interconnect routes the transactions between the target IP cores and initiator IP cores in the integrated circuit. A first aggregate target of the target IP cores includes two or more memory channels that are interleaved in an address space for the first aggregate target in an address map. Each memory channel is divided into defined memory interleave segments, which are then interleaved with memory interleave segments from other memory channels. The address map is divided into two or more regions. Each interleaved memory interleave segment is assigned to at least one of those regions and populates the address space for that region, and parameters associated with the regions and memory interleave segments are configurable.
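
One common way to realize this kind of interleaving is round-robin assignment of fixed-size segments to channels. The sketch below assumes that scheme, which the abstract does not confirm; the function name `decode_address` and its parameters (`region_base`, `region_size`, `segment_size`, `num_channels`) are hypothetical stand-ins for the configurable region and segment parameters.

```python
def decode_address(addr, region_base, region_size, segment_size, num_channels):
    """Map a system address to (channel, offset within that channel).

    Assumes consecutive interleave segments of a region are assigned to
    channels in round-robin order.
    """
    if not region_base <= addr < region_base + region_size:
        raise ValueError("address is outside the region")
    offset = addr - region_base
    segment_index = offset // segment_size       # which interleave segment
    channel = segment_index % num_channels       # round-robin channel choice
    local_segment = segment_index // num_channels  # segment number inside the channel
    channel_offset = local_segment * segment_size + offset % segment_size
    return channel, channel_offset
```

With two channels and 4096-byte segments, addresses 0 to 4095 fall on channel 0, addresses 4096 to 8191 on channel 1, and address 8192 returns to channel 0 at channel offset 4096.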
Abstract:
Provided are a reconfigurable processor and a method of operating the same, the reconfigurable processor including: a configurable memory configured to receive a task execution instruction from a control processor; and a plurality of reconfigurable arrays, each configured to receive configuration information from the configurable memory, wherein each of the plurality of reconfigurable arrays simultaneously executes a task based on the configuration information.
Abstract:
A data processing device includes: data processing stages each having a processing element, a stage memory and an event controller; and an inter-stage bus connecting the stages via an access point. External events and process completion events are input into the event controller, which generates a task start event for the processing element according to those events. Each access point has an access table storing a data write history when the processing element writes data in the memory in a memory access process. The processing element executes an event access process indicative of memory access process completion after the processing element completes the memory access process to the memory via the access point. The access point executes another event access process for inputting the process completion event into the controller of another stage, based on the data write history when the processing element executes the event access process.
Abstract:
A data processing node has an inter-node messaging module including a plurality of sets of registers each defining an instance of a GET/PUT context and a plurality of data processing cores each coupled to the inter-node messaging module. Each one of the data processing cores includes a mapping function for mapping each one of a plurality of user level processes to a different one of the sets of registers and thereby to a respective GET/PUT context instance. Mapping each one of the user level processes to the different one of the sets of registers enables a particular one of the user level processes to utilize the respective GET/PUT context instance thereof for performing a GET/PUT action to a ring buffer of a different data processing node coupled to the data processing node through a fabric without involvement of an operating system of any one of the data processing cores.