Monitoring performance of a processing device to manage non-precise events

    Publication No.: US10365988B2

    Publication Date: 2019-07-30

    Application No.: US15705854

    Filing Date: 2017-09-15

    Applicant: Intel Corporation

    IPC Classification: G06F11/00 G06F11/34

    Abstract: Embodiments disclosed herein provide for monitoring performance of a processing device to manage non-precise events. A processing device includes a performance counter to track a non-precise event and to increment upon occurrence of the non-precise event, wherein the non-precise event comprises a first type of performance event that is not linked to an instruction in an instruction trace. The processing device also includes a first handler circuit to generate and store a first record, the first record comprising architectural metadata defining a state of the processing device at a time of generation of the first record, wherein the first handler circuit is to generate records corresponding to precise events. The processing device further includes a second handler circuit communicably coupled to the first handler circuit, the second handler circuit to cause the first handler circuit to generate a second record for the non-precise event upon overflow of the performance counter.
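
The counter-overflow flow in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the class names, the `limit` sampling period, the event name, and the shape of the state snapshot are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Snapshot of (hypothetical) architectural state at generation time."""
    event: str
    instruction_pointer: int
    registers: dict


@dataclass
class RecordHandler:
    """Stands in for the first handler circuit: builds and stores records."""
    records: list = field(default_factory=list)

    def generate(self, event, state):
        # Copy the register dict so later state changes do not alter the record.
        record = Record(event, state["ip"], dict(state["regs"]))
        self.records.append(record)
        return record


@dataclass
class OverflowCounter:
    """Counts one non-precise event and reports overflow at a fixed limit."""
    event: str
    limit: int      # assumed sampling period, not taken from the abstract
    value: int = 0

    def increment(self):
        self.value += 1
        if self.value >= self.limit:
            self.value = 0      # re-arm for the next sampling period
            return True
        return False


def on_event(counter, handler, state):
    """Second-handler role: on overflow, ask the record handler for a record."""
    if counter.increment():
        return handler.generate(counter.event, state)
    return None


handler = RecordHandler()
counter = OverflowCounter(event="example_non_precise_event", limit=3)
state = {"ip": 0x400000, "regs": {"rax": 0}}
for step in range(7):
    state["ip"] += 4
    state["regs"]["rax"] = step
    on_event(counter, handler, state)
```

With a limit of 3 and seven events, the model stores two records, one at the third event and one at the sixth, each holding the state as it was when the record was generated.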

    Apparatuses, methods, and systems for memory disambiguation

    Publication No.: US10067762B2

    Publication Date: 2018-09-04

    Application No.: US15201218

    Filing Date: 2016-07-01

    Applicant: Intel Corporation

    IPC Classification: G06F9/30

    Abstract: Apparatuses, methods, and systems relating to memory disambiguation are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, an execution unit to execute the decoded instruction, a retirement unit to retire an executed instruction in program order, and a memory disambiguation circuit to allocate an entry in a memory disambiguation table for a first load instruction that is to be flushed for a memory ordering violation, the entry comprising a counter value and an instruction pointer for the first load instruction.
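
The table allocation described above can be pictured with a minimal model. The abstract only states that an entry holds a counter value and the load's instruction pointer; the table capacity, the initial counter value, and the oldest-first replacement policy below are illustrative assumptions, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TableEntry:
    instruction_pointer: int
    counter: int


class DisambiguationTable:
    """Fixed-capacity table keyed by load instruction pointer."""

    def __init__(self, capacity=4, initial_counter=3):
        # capacity and initial_counter are assumptions, not from the abstract
        self.capacity = capacity
        self.initial_counter = initial_counter
        self.entries = {}   # instruction pointer -> TableEntry

    def allocate_on_violation(self, load_ip):
        """Record a load that was flushed for a memory ordering violation."""
        if load_ip not in self.entries and len(self.entries) >= self.capacity:
            # Assumed replacement policy: evict the oldest allocated entry.
            oldest_ip = next(iter(self.entries))
            del self.entries[oldest_ip]
        entry = TableEntry(load_ip, self.initial_counter)
        self.entries[load_ip] = entry
        return entry

    def lookup(self, load_ip):
        """Return the entry for this load, or None if it was never flushed."""
        return self.entries.get(load_ip)


table = DisambiguationTable(capacity=2)
for ip in (0x10, 0x20, 0x30):
    table.allocate_on_violation(ip)
```

How the counter value is later consumed is not stated in the abstract, so the sketch stops at allocation and lookup.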

    Monitoring performance of a processing device to manage non-precise events

    Publication No.: US09766999B2

    Publication Date: 2017-09-19

    Application No.: US14292140

    Filing Date: 2014-05-30

    Applicant: Intel Corporation

    IPC Classification: G06F11/00 G06F11/34

    Abstract: In accordance with embodiments disclosed herein, there are provided systems and methods for monitoring performance of a processing device to manage non-precise events. A processing device includes a performance counter to increment upon occurrence of a non-precise event in the processing device. The processing device also includes a precise event based sampling (PEBS) enable control communicably coupled to the performance counter. The processing device also includes a PEBS handler to generate and store a PEBS record including architectural metadata defining a state of the processing device at a time of generation of the PEBS record. The processing device further includes a non-precise event based sampling (NPEBS) module communicably coupled to the PEBS control and the PEBS handler. The NPEBS module causes the PEBS handler to generate the PEBS record for the non-precise event upon overflow of the performance counter.

    METHODS, SYSTEMS, AND APPARATUSES FOR VARIABLE WIDTH UNALIGNED FETCH IN A PROCESSOR

    Publication No.: US20240220253A1

    Publication Date: 2024-07-04

    Application No.: US18148397

    Filing Date: 2022-12-29

    Applicant: Intel Corporation

    IPC Classification: G06F9/30 G06F12/0875

    CPC Classification: G06F9/30047 G06F12/0875

    Abstract: Techniques for implementing a variable width unaligned fetch for instructions are described. In certain examples, a hardware processor core includes fetch circuitry to perform a single fetch operation to fetch from a paged memory: (i) a multiple cache line width of instruction data, between a minimum width that is greater than one cache line and a maximum width that is a plurality of cache lines, when the multiple cache line width of the instruction data does not include a page boundary of the paged memory, and (ii) less than or equal to one cache line width of the instruction data when the multiple cache line width of the instruction data does include the page boundary of the paged memory; decoder circuitry to decode a single instruction, comprising an opcode, from the instruction data into a decoded instruction; and execution circuitry to execute the decoded instruction according to the opcode.
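
The two fetch cases in this abstract reduce to a width decision that a short function can illustrate. This is a sketch under assumptions: a 64-byte cache line, a 4 KiB page, and a two-line maximum window are chosen for illustration, "includes a page boundary" is read as "the window would cross into the next page", and the function name is hypothetical. It also always requests the maximum window rather than modeling the full minimum-to-maximum range.

```python
CACHE_LINE_BYTES = 64   # assumed line size
PAGE_BYTES = 4096       # assumed page size


def fetch_bytes(address, max_lines=2):
    """Return how many bytes a single fetch starting at `address` would cover."""
    window = max_lines * CACHE_LINE_BYTES
    last_byte = address + window - 1
    if address // PAGE_BYTES == last_byte // PAGE_BYTES:
        # Case (i): the whole multi-line window stays inside one page.
        return window
    # Case (ii): the window would cross a page boundary, so fetch only up to
    # the end of the current cache line (at most one cache line of data).
    return CACHE_LINE_BYTES - (address % CACHE_LINE_BYTES)
```

For example, under these assumptions a fetch starting at `0x0F80` stays within the page and covers 128 bytes, while one starting at `0x0FF0` would cross into the next page and is cut back to the 16 bytes remaining in its cache line.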

    Physical register table for eliminating move instructions

    Publication No.: US10417001B2

    Publication Date: 2019-09-17

    Application No.: US13728416

    Filing Date: 2012-12-27

    Applicant: Intel Corporation

    IPC Classification: G06F9/30 G06F9/38 G06F12/02

    Abstract: Embodiments of an invention for a physical register table for eliminating move instructions are disclosed. In one embodiment, a processor includes a physical register file, a register allocation table, and a physical register table. The register allocation table is to store mappings of logical registers to physical registers. The physical register table is to store entries including pointers to physical registers in the mappings. The number of entry locations in the physical register table is less than the number of physical registers in the physical register file.
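
One way to picture the relationship between the three structures is a toy renamer in which a move is eliminated by pointing two logical registers at the same physical register. Everything beyond the abstract is an assumption here: the sharing counts kept in the small table, the fallback to a real copy when the table is full, and the immediate freeing of registers (real hardware would wait for retirement). The class and method names are hypothetical.

```python
class Renamer:
    def __init__(self, num_physical=8, prt_entries=2):
        # The table is deliberately smaller than the physical register file.
        assert prt_entries < num_physical
        self.free = list(range(num_physical))   # free physical registers
        self.rat = {}                            # logical -> physical
        self.prt = {}                            # physical -> number of sharers
        self.prt_entries = prt_entries

    def _release(self, logical):
        """Drop a logical register's old mapping, freeing it if unshared."""
        phys = self.rat.pop(logical, None)
        if phys is None:
            return
        if phys in self.prt:
            self.prt[phys] -= 1
            if self.prt[phys] == 1:
                del self.prt[phys]   # one owner left: no longer shared
            return
        self.free.append(phys)

    def write(self, logical):
        """Rename an ordinary destination write to a fresh physical register."""
        self._release(logical)
        phys = self.free.pop(0)
        self.rat[logical] = phys
        return phys

    def move(self, dst, src):
        """Try to eliminate `mov dst, src`; return True if it was eliminated."""
        phys = self.rat[src]
        if self.rat.get(dst) == phys:
            return True              # already the same physical register
        self._release(dst)
        if phys not in self.prt and len(self.prt) >= self.prt_entries:
            self.write(dst)          # table full: fall back to a real copy
            return False
        self.prt[phys] = self.prt.get(phys, 1) + 1
        self.rat[dst] = phys
        return True
```

In this model, with a one-entry table, a second eliminated move has to wait until the first shared register stops being shared, which shows the trade-off of keeping the table smaller than the register file.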

    System and Method for Load Balancing in Out-of-Order Clustered Decoding

    Publication No.: US20180088956A1

    Publication Date: 2018-03-29

    Application No.: US15280460

    Filing Date: 2016-09-29

    Applicant: Intel Corporation

    Inventor: Jonathan D. Combs

    IPC Classification: G06F9/38 G06F9/30

    Abstract: A processor includes a back end to execute decoded instructions and a front end. The front end includes two decode clusters and circuitry to receive data elements representing undecoded instructions, in program order, and to direct subsets of the data elements to the decode clusters. An IP generator directs one subset of data elements to the first cluster, detects a condition indicating that a load balancing action should be taken, and directs a subset of data elements immediately following the first subset in program order to the first or second decode cluster dependent on the action taken. The action may include annotating a BTB entry, inserting a fake branch in the BTB, forcing a cluster switch, or suppressing a cluster switch. The detected condition may be a predicted taken branch or an annotation thereof, or a heuristic based on a queue state, a count of uops, or a latency value.
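
The steering decision described in this abstract can be sketched as a loop over instructions in program order. This is a simplified illustration, not the patented front end: it models only two of the listed conditions (a taken branch and a uop-count heuristic), always responds by switching clusters, and the `uop_limit` threshold and function name are hypothetical.

```python
def assign_clusters(instructions, uop_limit=8):
    """Map each instruction to decode cluster 0 or 1.

    `instructions` is a list of (uop_count, is_taken_branch) tuples in
    program order. The cluster switches after a taken branch, or when the
    uops sent to the current cluster since the last switch reach `uop_limit`
    (an assumed load-balancing heuristic).
    """
    cluster = 0
    uops_since_switch = 0
    assignment = []
    for uops, is_taken_branch in instructions:
        assignment.append(cluster)
        uops_since_switch += uops
        if is_taken_branch or uops_since_switch >= uop_limit:
            cluster ^= 1             # following instructions go to the other cluster
            uops_since_switch = 0
    return assignment


stream = [(1, False), (1, True), (2, False), (2, False),
          (1, False), (1, True), (1, False)]
clusters = assign_clusters(stream, uop_limit=4)
```

In the sample stream, the switches after the second and sixth instructions come from taken branches, while the switch after the fourth instruction is forced by the uop count alone.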

    System and method for load balancing in out-of-order clustered decoding

    Publication No.: US10331454B2

    Publication Date: 2019-06-25

    Application No.: US15280460

    Filing Date: 2016-09-29

    Applicant: Intel Corporation

    Inventor: Jonathan D. Combs

    IPC Classification: G06F9/38

    Abstract: A processor includes a back end to execute decoded instructions and a front end. The front end includes two decode clusters and circuitry to receive data elements representing undecoded instructions, in program order, and to direct subsets of the data elements to the decode clusters. An IP generator directs one subset of data elements to the first cluster, detects a condition indicating that a load balancing action should be taken, and directs a subset of data elements immediately following the first subset in program order to the first or second decode cluster dependent on the action taken. The action may include annotating a BTB entry, inserting a fake branch in the BTB, forcing a cluster switch, or suppressing a cluster switch. The detected condition may be a predicted taken branch or an annotation thereof, or a heuristic based on a queue state, a count of uops, or a latency value.