Abstract:
An apparatus and method are described for executing program code on both latency-optimized execution logic and throughput-optimized execution logic within a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; and logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.
Abstract:
A semiconductor chip is described having a load collision detection circuit comprising a first bloom filter circuit. The semiconductor chip has a store collision detection circuit comprising a second bloom filter circuit. The semiconductor chip has one or more processing units capable of executing ordered parallel threads coupled to the load collision detection circuit and the store collision detection circuit. The load collision detection circuit and the store collision detection circuit are to detect younger stores for load operations of said threads and younger loads for store operations of said threads.
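The two-filter arrangement can be illustrated in software. The following is a minimal sketch only, not the circuit described in the abstract: the class names, filter size, hash construction, and method names are all hypothetical, and the sketch ignores thread ordering (it does not track which access is younger). It shows only the core property a Bloom filter provides: a recorded address is never missed, while an unrecorded address may occasionally be reported as a possible collision.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit vector

    def _positions(self, address):
        # SHA-256 is a software stand-in (assumption) for the cheap hash
        # functions a hardware filter would use.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{address}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, address):
        for pos in self._positions(address):
            self.bits |= 1 << pos

    def might_contain(self, address):
        return all((self.bits >> pos) & 1 for pos in self._positions(address))


class CollisionDetector:
    """Hypothetical pairing of a load filter and a store filter."""

    def __init__(self):
        self.load_filter = BloomFilter()   # addresses already loaded
        self.store_filter = BloomFilter()  # addresses already stored

    def record_load(self, address):
        self.load_filter.add(address)

    def record_store(self, address):
        self.store_filter.add(address)

    def store_may_collide(self, address):
        # A store checks the load filter for a possibly conflicting load.
        return self.load_filter.might_contain(address)

    def load_may_collide(self, address):
        # A load checks the store filter for a possibly conflicting store.
        return self.store_filter.might_contain(address)
```

A positive answer from either check means only that a collision is possible; a real design would need a follow-up step to confirm or conservatively handle it.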
Abstract:
A system on a chip (SoC) is provided including processing cores and a root complex. Transaction requests are communicated between a root port of the root complex and a device, the root port including electrical idle (EI) exit detect circuitry and a reference clock source. The root port supports a first link state, in which the reference clock source and EI exit detect circuitry of the root port are disabled but a common mode voltage is maintained, and a second link state, in which the reference clock source and EI exit detect circuitry are disabled and the common mode voltage is not maintained. The root port transitions to the first link state based on a service latency requirement of the device being less than a threshold and to the second link state based on the service latency requirement being greater than or equal to the threshold.
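The threshold rule in the last sentence can be sketched as a small selection function. This is an illustration under stated assumptions: the names `LinkState` and `select_link_state` are hypothetical, and the microsecond unit is assumed, since the abstract gives no units.

```python
from enum import Enum


class LinkState(Enum):
    # Both states disable the reference clock source and EI exit detect circuitry.
    FIRST = "common mode voltage maintained"
    SECOND = "common mode voltage not maintained"


def select_link_state(service_latency_us, threshold_us):
    """Pick a link state from the device's service latency requirement.

    Units (microseconds) are an assumption made for this sketch.
    """
    if service_latency_us < threshold_us:
        return LinkState.FIRST   # tight latency requirement: keep common mode voltage
    return LinkState.SECOND      # requirement >= threshold: drop common mode voltage
```

Note that a requirement exactly equal to the threshold selects the second state, matching the "greater than or equal to" wording.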
Abstract:
A processor includes circuitry to decode at least one instruction and an execution unit. The decoded instruction may compute a floating point result. The execution unit includes circuitry to execute the instruction to determine the floating point result, compute the amount of precision lost in a mantissa of the floating point result, compare the amount of precision lost to a numeric accumulation error precision threshold, determine whether a numeric accumulation error occurred based on the comparison, and write a value to a flag. The amount of precision lost corresponds to a plurality of bits lost in the mantissa of the floating point result. The value to be written to the flag may be based on the determination that the numeric accumulation error occurred. The flag may be for notification that the numeric accumulation error occurred.
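The compare-and-flag sequence can be approximated in software for a single addition. This is a rough sketch, not the claimed circuitry: it assumes IEEE 754 binary64 operands, estimates the precision lost as the alignment shift between the two operands' exponents (an upper bound, since shifted-out bits may be zero), and assumes the flag is set when the loss is at least the threshold. The function names are hypothetical.

```python
import math

MANTISSA_BITS = 53  # binary64 significand width, including the implicit bit


def bits_lost_in_add(a, b):
    """Upper-bound estimate of mantissa bits shifted out of the smaller operand."""
    if a == 0.0 or b == 0.0:
        return 0
    _, exp_a = math.frexp(a)  # frexp returns (fraction, exponent)
    _, exp_b = math.frexp(b)
    return min(MANTISSA_BITS, abs(exp_a - exp_b))


def add_with_flag(a, b, threshold_bits):
    """Return (result, bits_lost, flag); flag marks a numeric accumulation error."""
    result = a + b
    lost = bits_lost_in_add(a, b)
    flag = lost >= threshold_bits  # ">=" is an assumption; the abstract says only "compare"
    return result, lost, flag
```

For example, adding `1.0` to `2.0**60` shifts the smaller operand entirely out of the mantissa, so the sum equals `2.0**60` and the flag is raised for any threshold up to 53 bits.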
Abstract:
An apparatus and method are described for enforcement of reserved bits. For example, one embodiment of a processor comprises: a memory management unit to store a set of bits including a set of reserved bits to a system memory; reserved bit enforcement logic to generate a pseudo-random pattern in the reserved bits and an error correction code over the pseudo-random pattern prior to storing the reserved bits; the memory management unit to load the reserved bits including the pseudo-random pattern and the error correction code; the reserved bit enforcement logic to use the error correction code to determine whether the reserved bits have been modified by software; and the processor to generate an error condition if the reserved bits have been modified and to continue normal execution if they have not.
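The store-then-verify flow can be sketched with a small code. The abstract does not name the error correction code or the field width, so this example assumes a 7-bit reserved field holding a 4-bit random pattern plus the 3 parity bits of a Hamming(7,4) code, used here only for detection. All function names are hypothetical.

```python
import secrets


def hamming74_parity(data4):
    """Return the 3 Hamming(7,4) parity bits for a 4-bit value."""
    d1 = (data4 >> 3) & 1
    d2 = (data4 >> 2) & 1
    d3 = (data4 >> 1) & 1
    d4 = data4 & 1
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return (p1 << 2) | (p2 << 1) | p3


def fill_reserved():
    """Build the 7-bit reserved field: random 4-bit pattern, then its parity."""
    pattern = secrets.randbits(4)
    return (pattern << 3) | hamming74_parity(pattern)


def reserved_unmodified(reserved7):
    """Recompute the parity on load and compare it with the stored parity."""
    pattern = (reserved7 >> 3) & 0xF
    parity = reserved7 & 0x7
    return hamming74_parity(pattern) == parity
```

Because the code has minimum distance 3, any change of one or two bits is caught. A write that happens to produce another valid codeword (for example, all zeros) passes this simplified check, which is one reason a real design would likely use a wider field.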