ENABLING HIERARCHICAL DATA LOADING IN A RESISTIVE PROCESSING UNIT (RPU) ARRAY FOR REDUCED COMMUNICATION COST

    Publication No.: US20220300797A1

    Publication date: 2022-09-22

    Application No.: US17203705

    Filing date: 2021-03-16

    Abstract: An electronic circuit includes word lines; bit lines intersecting the word lines at a plurality of grid points; and resistive processing units located at the grid points. Baseline stochastic pulse input units are coupled to the word lines; differential stochastic pulse input units are coupled to the word lines; and bitline stochastic pulse input units are coupled to the bit lines. Control circuitry coupled to the pulse input units is configured to cause each of the baseline stochastic pulse input units to generate a baseline pulse train using base input data, each of the differential stochastic pulse input units to generate a differential pulse train using differential input data defining differences from the base input data, and each of the bitline stochastic pulse input units to generate a bitline pulse train using bit line input data. Neural network weights can thus be stored in the resistive processing units.

    BIT-SERIAL COMPUTATION WITH DYNAMIC FREQUENCY MODULATION FOR ERROR RESILIENCY IN NEURAL NETWORK

    Publication No.: US20210208847A1

    Publication date: 2021-07-08

    Application No.: US16737440

    Filing date: 2020-01-08

    IPC classification: G06F7/523 G06F7/50 G06N3/08

    Abstract: A system is provided for error resiliency in a bit-serial computation. A delay monitor enforces an overall processing duration threshold for bit-serial processing of all iterations of the bit-serial computation, while determining a threshold for processing each iteration. At least some iterations correspond to a respective bit in an input bit sequence. A clock generator generates a clock signal for controlling performance of the iterations. Each of the iteration units performs a particular iteration, starting with the Most Significant Bit (MSB) of the input bit sequence and continuing in descending order of bit significance. When at least one iteration requires more time to complete than the current value of the threshold, the threshold is selectively increased for that iteration, and at least one subsequent iteration whose iteration-level processing duration exceeds the remaining amount of the overall processing duration for all iterations is skipped.
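
The scheme in this abstract can be illustrated in software, although the patent describes hardware. The following is only a minimal sketch under stated assumptions: the function name `bit_serial_multiply`, the use of multiplication as the bit-serial operation, and the simulated per-iteration `durations` are all hypothetical and are not taken from the patent.

```python
def bit_serial_multiply(a, b, bits, durations, total_budget):
    """Approximate a * b by processing the bits of b from MSB to LSB.

    Assumptions (not from the patent): durations[i] is a simulated time
    cost for the i-th iteration (i = 0 handles the MSB), and total_budget
    is the overall processing duration allowed for all iterations.
    Returns the (possibly approximate) product and the skipped bit positions.
    """
    threshold = total_budget / bits   # nominal per-iteration threshold
    remaining = total_budget          # time left in the overall budget
    result = 0
    skipped = []
    for i in range(bits):
        bit_pos = bits - 1 - i        # MSB first, descending significance
        needed = durations[i]
        if needed > remaining:
            # This iteration no longer fits in the overall budget: skip it.
            skipped.append(bit_pos)
            continue
        if needed > threshold:
            # Slow iteration: raise the threshold so it can still complete.
            threshold = needed
        if (b >> bit_pos) & 1:
            result += a << bit_pos    # accumulate the partial product
        remaining -= needed
    return result, skipped


# Every iteration fits: the result is exact.
exact, _ = bit_serial_multiply(3, 0b1011, 4, [1, 1, 1, 1], 4.0)
# The second iteration is slow, so the two least significant bits are dropped.
approx, dropped = bit_serial_multiply(3, 0b1011, 4, [1, 3, 1, 1], 4.0)
```

Because the most significant bits are processed first, any iterations that are skipped contribute only the low-order part of the result, which bounds the error.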

    Machine learning accelerator with decision tree interconnects

    Publication No.: US12112242B2

    Publication date: 2024-10-08

    Application No.: US16986506

    Filing date: 2020-08-06

    IPC classification: G06N20/00 G06N5/01

    CPC classification: G06N20/00 G06N5/01

    Abstract: Techniques for performing improved machine learning using decision trees are disclosed. In one example, a system includes a plurality of decision tree structures, and configuration logic operatively coupled to the plurality of decision tree structures. The configuration logic selectively configures the plurality of decision tree structures to form at least one of: one or more combined decision tree structures, wherein a combined decision tree structure comprises multiple interconnected ones of the plurality of decision tree structures; and one or more individual decision tree structures, wherein an individual decision tree structure comprises a single one of the plurality of decision tree structures.

    Similarity-based hierarchical data loading for machine learning training

    Publication No.: US11354595B2

    Publication date: 2022-06-07

    Application No.: US16837133

    Filing date: 2020-04-01

    Abstract: Original data for machine learning training can be received. The original data can be divided into baseline data and difference data. The baseline data and the difference data can be stored in different memory devices of the memory hierarchy associated with a computer, wherein the baseline data is stored in a first memory device having faster access speed than a second memory device in which the difference data is stored. The baseline data and the difference data can be loaded from the different memory devices. The original data can be reconstructed from the baseline data and the difference data. The reconstructed original data can be fed to a machine learning model to train the machine learning model.
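
The divide-and-reconstruct steps in this abstract can be sketched as follows. This is an illustration under assumptions, not the patented method: the function names are hypothetical, the baseline is taken to be the rounded element-wise mean of integer samples (the abstract does not say how the baseline is chosen), and two plain dictionaries stand in for the faster and slower memory devices.

```python
def split_baseline_difference(samples):
    """Divide integer samples into one shared baseline and per-sample differences.

    Assumption: the baseline is the rounded element-wise mean of the samples.
    """
    n = len(samples)
    width = len(samples[0])
    baseline = [round(sum(s[i] for s in samples) / n) for i in range(width)]
    differences = [[s[i] - baseline[i] for i in range(width)] for s in samples]
    return baseline, differences


def reconstruct(baseline, differences):
    """Rebuild the original samples by adding each difference to the baseline."""
    return [[b + d for b, d in zip(baseline, diff)] for diff in differences]


original = [[10, 20, 30], [12, 19, 33], [11, 21, 30]]
baseline, differences = split_baseline_difference(original)

# Hypothetical stand-ins for the two levels of the memory hierarchy.
fast_store = {"baseline": baseline}        # small, shared, read often
slow_store = {"differences": differences}  # one entry per sample

restored = reconstruct(fast_store["baseline"], slow_store["differences"])
```

When the samples are similar, the differences are small values, so they may be cheaper to store and transfer than the full samples.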

    JOB SCHEDULING BASED ON NODE AND APPLICATION CHARACTERISTICS

    Publication No.: US20190163540A1

    Publication date: 2019-05-30

    Application No.: US15827208

    Filing date: 2017-11-30

    IPC classification: G06F9/50 G06F9/455

    Abstract: Aspects of the present invention disclose a method, computer program product, and system for scheduling an application. The method includes one or more processors receiving a task, wherein the task includes instructions indicating desired nodes to perform the task through programs. The method further includes one or more processors identifying application characteristic information and node characteristic information associated with nodes within a data center composed of nodes. The application characteristic information includes resource utilization information for applications on nodes within the data center. The method further includes one or more processors determining that the nodes reach a threshold level of power consumption. The threshold level is a pre-set maximum amount of power utilized by a node within the data center. The method further includes one or more processors determining a node consuming an amount of power that is below a threshold level of power consumption in the data center.
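
The node selection described in this abstract might look like the sketch below. Everything named here is an assumption for illustration: the `Node` fields, the `pick_node` function, and the tie-breaking rule (lowest power draw among nodes under the threshold) do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    power_watts: float  # current power consumption of the node


def pick_node(nodes: List[Node], desired: Set[str],
              power_threshold: float) -> Optional[Node]:
    """Pick a node for a task, honoring a per-node power threshold.

    A desired node is used if it is below the threshold. Otherwise the
    node with the lowest power draw below the threshold is chosen
    (assumed tie-breaking rule). Returns None if no node qualifies.
    """
    for node in nodes:
        if node.name in desired and node.power_watts < power_threshold:
            return node
    below = [n for n in nodes if n.power_watts < power_threshold]
    if not below:
        return None
    return min(below, key=lambda n: n.power_watts)


cluster = [Node("a", 500.0), Node("b", 300.0), Node("c", 250.0)]
# Node "a" is requested but has reached the 400 W threshold.
chosen = pick_node(cluster, desired={"a"}, power_threshold=400.0)
```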

    POWER EFFICIENCY-AWARE NODE COMPONENT ASSEMBLY

    Publication No.: US20190033944A1

    Publication date: 2019-01-31

    Application No.: US15658494

    Filing date: 2017-07-25

    IPC classification: G06F1/32

    Abstract: Sub-components assembled into a computer are selected based on sub-component power efficiency levels (for example, low, medium, high) and/or anticipated usage of the computer. Multiple units of each type of sub-component (for example, a CPU) are tested to determine a power efficiency level of each unit. Computers in which sub-component efficiency levels are desired to match an overall computer efficiency level receive sub-component units of the corresponding efficiency level. Computers anticipated to run applications that make intensive use of a given type of sub-component receive units having a higher efficiency level. Computers anticipated to run applications that make little use of a given type of sub-component receive a unit having a lower efficiency level. Computers anticipated to run a wide variety of applications of no particular usage intensity for a given type of sub-component receive a unit having an average efficiency level.

    Thermal- and spatial-aware task scheduling

    Publication No.: US09817697B2

    Publication date: 2017-11-14

    Application No.: US15080689

    Filing date: 2016-03-25

    IPC classification: G06F9/46 G06F9/48 G06F9/50

    CPC classification: G06F9/5094

    Abstract: A method, apparatus, and computer program product are provided for thermal- and spatial-aware task scheduling. The method may include monitoring a temperature for each core of a central processing unit having a plurality of cores; determining, from the monitoring, a set of hotspot cores from the plurality of cores; determining temperature information and distance information for each hotspot core in the set of hotspot cores relative to each of the other cores on the central processing unit; calculating a placement metric for each core of the central processing unit based at least on the determined distance information and the determined temperature information; and scheduling a task by allocating the task to one or more cores of the central processing unit according to the placement metric.
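
The steps in this abstract (find hotspots, combine temperature and distance into a placement metric, allocate by that metric) can be sketched as below. The metric used here, a core's own temperature plus each hotspot's temperature divided by its Euclidean distance, is an assumed formula chosen for illustration; the abstract does not specify one, and all function names are hypothetical.

```python
import math


def find_hotspots(temps, threshold):
    """Return the cores whose temperature is at or above the threshold."""
    return [core for core, t in temps.items() if t >= threshold]


def placement_metric(core, temps, positions, hotspots):
    """Lower is better: own temperature plus heat from nearby hotspots.

    Assumed formula: each hotspot contributes its temperature divided by
    its Euclidean distance to the candidate core.
    """
    score = temps[core]
    for h in hotspots:
        if h == core:
            continue
        score += temps[h] / math.dist(positions[core], positions[h])
    return score


def schedule(temps, positions, threshold):
    """Allocate the task to the core with the lowest placement metric."""
    hotspots = find_hotspots(temps, threshold)
    # Prefer non-hotspot cores; fall back to all cores if every core is hot.
    candidates = [c for c in temps if c not in hotspots] or list(temps)
    return min(candidates,
               key=lambda c: placement_metric(c, temps, positions, hotspots))


# A 2x2 grid of cores in which core 0 is the only hotspot.
positions = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}
temps = {0: 90.0, 1: 60.0, 2: 60.0, 3: 60.0}
target = schedule(temps, positions, threshold=80.0)
```

In this example the diagonal core 3 is selected because it is the farthest from the hotspot, so the hotspot adds the least to its metric.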