Abstract:
Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment, a processor functional unit is provided comprising a frontend unit, an execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between an output and an input of the execution core unit, and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal upon reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.
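As a rough behavioral analogy only (every type and operation below is hypothetical, not taken from the abstract), the idea can be pictured in C: a half-pumped unit produces one half of a vector result per sub-cycle, and a dependent instruction that starts on the half produced most recently can consume the forwarded data one sub-cycle earlier. The order flag stands in for the forwarding order control signal derived from clock parity.

```c
#include <stdint.h>

/* Hypothetical 128-bit vector split into two 64-bit halves (2 x 32-bit words each). */
typedef struct { uint32_t lo[2]; uint32_t hi[2]; } vec128;

/* Stand-in for one half-width SIMD operation executed in one sub-cycle. */
static void exec_half(uint32_t dst[2], const uint32_t a[2], const uint32_t b[2]) {
    dst[0] = a[0] + b[0];
    dst[1] = a[1] + b[1];
}

/* order == 0: process the low half first, then the high half; order == 1: reverse.
 * Choosing "order" for a dependent instruction from the sub-cycle (clock parity)
 * in which the producer finished lets the most recently produced half be consumed
 * first, shortening the issue-to-issue latency. */
static void exec_vec_add(vec128 *d, const vec128 *a, const vec128 *b, int order) {
    if (order == 0) {
        exec_half(d->lo, a->lo, b->lo);
        exec_half(d->hi, a->hi, b->hi);
    } else {
        exec_half(d->hi, a->hi, b->hi);
        exec_half(d->lo, a->lo, b->lo);
    }
}
```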
Abstract:
A method for converting a signed fixed point number into a floating point number that includes reading an input number corresponding to a signed fixed point number to be converted, determining whether the input number is less than zero, setting a sign bit based upon whether the input number is less than zero or greater than or equal to zero, computing a first intermediate result by exclusive-ORing the input number with the sign bit, computing leading zeros of the first intermediate result, padding the first intermediate result based upon the sign bit, computing a second intermediate result by shifting the padded first intermediate result to the left by the leading zeros, computing an exponent portion and a fraction portion, conditionally incrementing the fraction portion based on the sign bit, correcting the exponent portion and the fraction portion if the incremented fraction portion overflows, and returning the floating point number.
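A simplified C sketch of such a conversion for a signed 32-bit integer to an IEEE-754 single-precision bit pattern, covering the same building blocks the abstract names: sign extraction, XOR with the sign, leading-zero count, normalization shift, exponent/fraction assembly, a conditional fraction increment, and the overflow correction. It is only an illustration of these steps, not the claimed sequence; in particular it applies the two's-complement increment up front rather than deferring it into the fraction increment, and it uses the GCC/Clang built-in __builtin_clz.

```c
#include <stdint.h>

/* Simplified illustration only: convert a signed 32-bit integer to the bit
 * pattern of an IEEE-754 single-precision float, round-to-nearest-even. */
uint32_t int32_to_float_bits(int32_t x)
{
    if (x == 0)
        return 0;                                       /* +0.0f */

    uint32_t sign = (x < 0) ? 1u : 0u;
    /* XOR with the sign (replicated to a mask), then add the sign:
     * for negative inputs this is a two's-complement negate. */
    uint32_t mag = ((uint32_t)x ^ (sign ? 0xFFFFFFFFu : 0u)) + sign;

    int lz  = __builtin_clz(mag);                       /* GCC/Clang built-in */
    int exp = 127 + 31 - lz;                            /* biased exponent */

    uint64_t norm = (uint64_t)mag << (lz + 32);         /* leading 1 at bit 63 */
    uint32_t frac = (uint32_t)(norm >> 40) & 0x7FFFFFu; /* 23 fraction bits */
    uint64_t rest = norm & 0xFFFFFFFFFFull;             /* bits below the fraction */

    uint64_t half = 1ull << 39;                         /* round to nearest even */
    if (rest > half || (rest == half && (frac & 1u)))
        frac++;
    if (frac == 0x800000u) {                            /* fraction overflow: fix exponent */
        frac = 0;
        exp++;
    }
    return (sign << 31) | ((uint32_t)exp << 23) | frac;
}
```

Reinterpreting the returned bits as a float (for example via memcpy) yields the nearest representable single-precision value of the input.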
Abstract:
Instead of a processor with an instruction set architecture (ISA) that includes only fixed architected operands, an improved processor supports additional characteristic bits for computing instructions (e.g., multiply-add and load/store instructions). These additional bits for certain instructions influence how the processor processes those instructions. In addition, a new instruction is introduced to make further use of the proposed method. Typically, these additional characteristic bits, as well as the new instruction, can be generated automatically by compilers to provide instruction sequences that are relatively well suited to the processor.
Abstract:
The invention relates to a method and system for the design and implementation of state machine engines. A first constraints-checking step checks a state transition function created by a designer against constraints imposed by the implementation technology in order to detect all portions of the state transition function that conflict with the constraints. A subsequent conflict resolution step tries to determine one or more suggested ways to meet the conflicting constraints by investigating how the original state transition function can be modified such that all constraints are met. A final presentation and selection step provides the designer with textual and/or graphical results of the constraints check and the suggested modifications. The modifications can be accepted interactively, or the state transition function can be changed manually. In the latter case, the modified state transition function is processed again, starting with the constraints-checking step.
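A minimal C sketch of the first, constraints-checking step, assuming (purely for illustration) that the implementation technology limits each state to at most MAX_OUT_TRANSITIONS outgoing transition rules; a real checker would cover every technology constraint and feed the detected conflicts into the subsequent resolution step.

```c
#include <stdio.h>

#define MAX_OUT_TRANSITIONS 4   /* assumed technology limit, illustration only */

struct transition { int from_state; int to_state; unsigned char input; };

/* Reports every state whose outgoing-transition count exceeds the assumed
 * limit, mimicking "detect all portions in conflict with the constraints".
 * Returns the number of violating states. */
static int check_fanout(const struct transition *t, int n, int num_states) {
    int violations = 0;
    for (int s = 0; s < num_states; s++) {
        int fanout = 0;
        for (int i = 0; i < n; i++)
            if (t[i].from_state == s)
                fanout++;
        if (fanout > MAX_OUT_TRANSITIONS) {
            printf("state %d: %d outgoing transitions (limit %d)\n",
                   s, fanout, MAX_OUT_TRANSITIONS);
            violations++;
        }
    }
    return violations;
}
```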
Abstract:
The present invention improves the hash table lookup operation by using a new processor cache architecture. Speculative processing of entries stored in the cache is combined with delayed evaluation of cache entries. Speculative processing means that each cache entry retrieved from main memory in a step of the hash table lookup operation is assumed to already contain the selected hash table entry. Delayed evaluation means that certain steps of the lookup operation are performed in parallel with others. In advantageous embodiments the invention can also be used in conjunction with a hierarchy of inclusive caches. The preferred embodiments of the invention involve a new approach for a transition rule cache of a BaRT-FSM controller.
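A software analogy (all structure and field names assumed, not from the abstract) of the two ideas: the entry fetched from the cache is processed as if it were already the selected hash table entry, and the check that it really is the right entry is evaluated afterwards, only accepting or discarding the speculative result. The actual benefit comes from performing these steps in parallel in hardware, which sequential C code can only hint at.

```c
#include <stdint.h>

/* Hypothetical cache line holding one hash table entry. */
struct cache_line { uint32_t tag; uint32_t key; uint32_t value; };

/* Speculative processing: use the entry as if it matched and compare the key
 * right away. Delayed evaluation: the tag check that confirms the line really
 * belongs to this lookup is applied afterwards to validate the result. */
static int lookup(const struct cache_line *line, uint32_t tag, uint32_t key,
                  uint32_t *out) {
    int key_matches = (line->key == key);         /* speculative step */
    uint32_t speculative_value = line->value;
    int tag_matches = (line->tag == tag);         /* delayed evaluation */
    if (key_matches && tag_matches) {
        *out = speculative_value;
        return 1;
    }
    return 0;   /* miss: fall back to the full main-memory lookup */
}
```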
Abstract:
The invention relates to a method of optimizing a state transition function specification for a state machine engine based on a probability distribution for the state transitions. In the preferred embodiment of the invention, a B-FSM state machine engine accesses a transition rule memory using a processor cache. The invention improves the cache hit rate by exploiting the probability distribution. The N transition rules that comprise a hash table entry are loaded in a burst mode from the main memory and transferred to the processor cache. Because the comparison of the actual state and input values against each transition rule can start immediately after that rule has been received, overall performance is improved because the transition rule that is most likely to be selected is the first to be transferred as part of the burst access.
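A small C sketch (field names assumed) of the optimization described above: the N transition rules that form one hash table entry are ordered by descending selection probability before being stored, so a burst read delivers the most probable rule first and its comparison can begin as soon as it arrives.

```c
#include <stdlib.h>

/* Hypothetical transition rule annotated with its estimated selection probability. */
struct rule { unsigned state; unsigned input; unsigned next_state; double probability; };

/* qsort comparator: higher probability sorts first (descending order). */
static int by_probability_desc(const void *a, const void *b) {
    double pa = ((const struct rule *)a)->probability;
    double pb = ((const struct rule *)b)->probability;
    return (pa < pb) - (pa > pb);
}

/* Order the rules within one hash table entry so that the rule most likely
 * to be selected is transferred first in the burst access. */
static void order_entry(struct rule *entry, size_t n_rules) {
    qsort(entry, n_rules, sizeof *entry, by_probability_desc);
}
```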
Abstract:
A method detects active power dissipation in an integrated circuit. The method includes receiving a hardware design for the integrated circuit having one or more clock domains, wherein the hardware design comprises a local clock buffer for a clock domain, and wherein the local clock buffer is configured to receive a clock signal and an actuation signal. The method includes adding instrumentation logic to the design for the clock domain, wherein the instrumentation logic is configured to compare a first value of the actuation signal determined at a beginning point of a test period to a second value of the actuation signal determined at a time when the clock domain is in an idle condition. The method includes detecting that the clock domain includes unintended active power dissipation in response to the first value of the actuation signal not being equal to the second value of the actuation signal.
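A behavioral C sketch (names assumed) of the added instrumentation logic: the actuation signal of the local clock buffer is sampled once at the start of the test period and once when the clock domain is idle, and a mismatch between the two samples is reported as unintended active power dissipation in that clock domain.

```c
#include <stdbool.h>

/* Captured value of the actuation signal at the start of the test period. */
struct power_check { bool start_value; };

static void capture_at_test_start(struct power_check *c, bool actuation) {
    c->start_value = actuation;
}

/* Compare against the value sampled while the clock domain is idle;
 * differing values flag unintended active power dissipation. */
static bool unintended_power(const struct power_check *c, bool actuation_when_idle) {
    return actuation_when_idle != c->start_value;
}
```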
Abstract:
Various systems and processes may be used to speed up multi-threaded execution. In certain implementations, a system and process may include the ability to write results of a first group of execution units associated with a first register file into the first register file using a first write port of the first register file and write results of a second group of execution units associated with a second register file into the second register file using a first write port of the second register file. The system and process may also include the ability to connect, in a shared register file mode, results of the second group of execution units to a second write port of the first register file and connect, in a split register file mode, results of a part of the first group of execution units to the second write port of the first register file.
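A behavioral C sketch (names, widths, and the enum are assumed for illustration) of how the second write port of the first register file is sourced: in shared register file mode it carries results from the second group of execution units, while in split mode it carries results from part of the first group.

```c
#include <stdint.h>

enum rf_mode { SHARED_RF, SPLIT_RF };

/* Select the data that drives the second write port of the first register
 * file, depending on the operating mode. */
static uint64_t second_write_port_rf0(enum rf_mode mode,
                                      uint64_t result_group1,       /* second EU group */
                                      uint64_t result_group0_part)  /* part of first EU group */
{
    return (mode == SHARED_RF) ? result_group1 : result_group0_part;
}
```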
Abstract:
Various systems, apparatuses, processes, and/or products may be used to calculate an SHA-2 hash function in a general-purpose processor. In some implementations, a system, apparatus, process, and/or product may include the ability to calculate at least one SHA-2 sigma function by using an execution unit adapted for performing a processor instruction, the execution unit including an integrated circuit primarily designed for calculating the SHA-2 sigma function(s), and calculating the SHA-2 hash function with general-purpose hardware processing components of the processor based on the sigma function(s). In certain implementations, the calculation of the SHA-2 sigma function(s) can be performed by the integrated circuit within a single instruction, allowing for a faster calculation of the SHA-2 hash function.
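For reference, the four SHA-256 sigma functions that the abstract moves into a dedicated execution unit, written in plain C so the single-instruction variant has a clear functional specification; the SHA-512 variants have the same structure with different rotate and shift amounts.

```c
#include <stdint.h>

/* Rotate right by n bits (0 < n < 32 for the uses below). */
static inline uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32 - n));
}

/* Message-schedule sigma functions. */
static inline uint32_t sigma0(uint32_t x) { return rotr32(x, 7)  ^ rotr32(x, 18) ^ (x >> 3);  }
static inline uint32_t sigma1(uint32_t x) { return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10); }

/* Compression-function Sigma functions. */
static inline uint32_t Sigma0(uint32_t x) { return rotr32(x, 2)  ^ rotr32(x, 13) ^ rotr32(x, 22); }
static inline uint32_t Sigma1(uint32_t x) { return rotr32(x, 6)  ^ rotr32(x, 11) ^ rotr32(x, 25); }
```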
Abstract:
A hardware circuit component configured to support vector operations in a scalar data path. The hardware circuit component is configured to operate in a vector mode configuration and in a scalar mode configuration. The hardware circuit component is configured to split the scalar mode configuration into a left half and a right half of the vector mode configuration. The hardware circuit component is configured to perform one or more bit shifts over one or more stages of interconnected multiplexers in the vector mode configuration. The hardware circuit component is configured to include duplicated coarse shift multiplexers at bit positions that receive data from both the left half and the right half of the vector mode configuration, resulting in one or more coarse shift multiplexers sharing a bit position.
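A behavioral C sketch of what the two configurations mean for the result (the multiplexer duplication itself is a structural detail not modeled here): in scalar mode the data path performs one full-width shift, while in vector mode the same data path behaves as two independent half-width shifts on the left and right halves. A 64-bit width and a shift amount below 32 are assumed for illustration.

```c
#include <stdint.h>

/* Left shift on a 64-bit data path.
 * Scalar mode: one 64-bit shift.
 * Vector mode: two independent 32-bit shifts on the upper and lower halves.
 * Assumes amount < 32 so each lane's shift stays well defined. */
static uint64_t shift_left(uint64_t value, unsigned amount, int vector_mode) {
    if (!vector_mode)
        return value << amount;

    uint32_t left  = (uint32_t)(value >> 32) << amount;  /* left (upper) lane */
    uint32_t right = (uint32_t)value << amount;           /* right (lower) lane */
    return ((uint64_t)left << 32) | right;
}
```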