Abstract:
An apparatus and method are described for performing efficient Boolean operations in a pipelined processor which, in one embodiment, does not natively support three-operand instructions. For example, a processor according to one embodiment of the invention comprises: a set of registers for storing packed operands; Boolean operation logic to execute a single instruction which uses three or more source operands packed in the set of registers, the Boolean operation logic to read at least three source operands and an immediate value to perform a Boolean operation on the three source operands, wherein the Boolean operation comprises: combining a bit read from each of the three operands to form an index to the immediate value, the index identifying a bit position within the immediate value; reading the bit from the identified bit position of the immediate value; and storing the bit from the identified bit position of the immediate value in a destination register.
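The index-and-lookup step described above can be modeled in software. The following is a minimal sketch, not the claimed hardware: the function name `ternary_logic`, the `width` parameter, and the choice to treat the bit from the first operand as the most significant index bit are all assumptions made for illustration.

```python
def ternary_logic(a: int, b: int, c: int, imm8: int, width: int = 32) -> int:
    """Apply a three-input Boolean function, given as an 8-bit truth table,
    to every bit position of three operands (software model, assumed bit order)."""
    result = 0
    for i in range(width):
        # Combine one bit from each source operand into a 3-bit index.
        # Assumption: operand a supplies the most significant index bit.
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        # Read the bit at that position of the immediate value and
        # store it at position i of the destination.
        result |= ((imm8 >> index) & 1) << i
    return result


# Under the assumed bit order, 0x96 encodes a ^ b ^ c and 0x80 encodes a & b & c.
xor3 = ternary_logic(0xF0F0, 0xCCCC, 0xAAAA, 0x96, width=16)
and3 = ternary_logic(0xF0F0, 0xCCCC, 0xAAAA, 0x80, width=16)
```

Because the immediate is just a truth table, any of the 256 possible three-input Boolean functions can be selected without changing the logic itself.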
Abstract:
In one embodiment, the present disclosure provides a method capable of processing a variety of different operations. A method according to one embodiment may include loading configuration data from a shared memory unit into a hardware configuration register, the hardware configuration register located within circuitry included within a hardware accelerator unit. The method may also include issuing a command set from a microengine to the hardware accelerator unit having the circuitry. The method may additionally include receiving the command set at the circuitry from the microengine, the command set configured to allow for the processing of a variety of different operations. The method may further include processing an appropriate operation based upon the configuration data loaded into the hardware configuration register. Of course, many alternatives, variations and modifications are possible without departing from this embodiment.
Abstract:
Embodiments of an invention for SMS4 acceleration hardware are disclosed. In an embodiment, an apparatus includes SMS4 hardware and key transformation hardware. The SMS4 hardware is to execute a round of encryption and a round of key expansion. The key transformation hardware is to transform a key to provide for the SMS4 hardware to execute a round of decryption.
Abstract:
A method and apparatus to perform Cyclic Redundancy Check (CRC) operations on a data block using a plurality of different n-bit polynomials is provided. A flexible CRC instruction performs a CRC operation using a programmable n-bit polynomial. The n-bit polynomial is provided to the CRC instruction by storing the n-bit polynomial in one of two operands.
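To illustrate what a CRC with a programmable polynomial looks like, here is a minimal bit-serial software sketch. It is not the instruction described above: the function name `crc_update`, its parameters, the MSB-first bit order, and the requirement that the width be at least 8 bits are assumptions chosen for the example.

```python
def crc_update(crc: int, data: bytes, poly: int, n: int) -> int:
    """Bitwise MSB-first CRC over `data` using an n-bit polynomial.

    `poly` holds the polynomial without its implicit leading term.
    Assumes n >= 8 and no input/output reflection (illustrative choices).
    """
    top = 1 << (n - 1)
    mask = (1 << n) - 1
    for byte in data:
        # Fold the next input byte into the high-order bits of the remainder.
        crc ^= byte << (n - 8)
        for _ in range(8):
            # Shift out one bit; subtract (XOR) the polynomial when it was set.
            crc = ((crc << 1) ^ poly) if crc & top else (crc << 1)
            crc &= mask
    return crc


# The same routine serves different polynomials and widths.
crc16 = crc_update(0x0000, b"123456789", poly=0x1021, n=16)
crc8 = crc_update(0x00, b"123456789", poly=0x07, n=8)
```

Passing the polynomial as an argument mirrors the idea of supplying it in an operand instead of fixing it in the logic.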
Abstract:
The present disclosure provides a method for generating RAID syndromes. In one embodiment the method may include loading a first data byte of a first disk block and a first data byte of a second disk block from a storage device to an arithmetic logic unit. The method may further include XORing the first data byte of the first disk block and the first data byte of the second disk block to generate a first result and storing the first result in a results buffer. The method may also include iteratively loading intermediate data bytes corresponding to the first disk block and intermediate data bytes corresponding to the second disk block from the storage device to the arithmetic logic unit. The method may additionally include XORing the intermediate data bytes corresponding to the first disk block and the intermediate data bytes corresponding to the second disk block to generate intermediate results and generating a RAID syndrome based, at least in part, on the intermediate results. Of course, many alternatives, variations and modifications are possible without departing from this embodiment.
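The byte-by-byte XOR loop described above can be sketched in software as follows. This is an illustration only: the function name `xor_syndrome` and the in-memory `bytes` blocks are assumptions, and the sketch covers only a plain XOR parity result over two blocks, not any other syndrome the disclosure may address.

```python
def xor_syndrome(block_a: bytes, block_b: bytes) -> bytes:
    """XOR two equal-sized disk blocks byte by byte into a results buffer."""
    if len(block_a) != len(block_b):
        raise ValueError("disk blocks must have the same length")
    results = bytearray(len(block_a))  # stands in for the results buffer
    for i, (x, y) in enumerate(zip(block_a, block_b)):
        # Load one byte from each block, XOR them, and store the result.
        results[i] = x ^ y
    return bytes(results)


block_a = bytes([0x12, 0x34, 0x56, 0x78])
block_b = bytes([0xFF, 0x00, 0xAA, 0x55])
parity = xor_syndrome(block_a, block_b)

# XOR parity lets either block be rebuilt from the parity and the other block.
rebuilt_b = xor_syndrome(parity, block_a)
```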
Abstract:
Methods and apparatus for low-latency link compression schemes. Under the schemes, packets or messages are dynamically selected for compression in view of current transmit queue levels. The latency incurred during compression and decompression is not added to the data-path, but sits on the side of the transmit queue. The system monitors the queue depth and initiates compression jobs accordingly. Different compression levels may be dynamically selected and used based on queue depth. Under various schemes, either packets or messages are enqueued in the transmit queue or pointers to such packets and messages are enqueued. Additionally, packets/messages may be compressed prior to being enqueued, or after being enqueued, wherein an original uncompressed packet is replaced with a compressed packet. Compressed and uncompressed packets may be stored in queues or buffers and transmitted using different numbers of transmit cycles based on their compression ratios. The schemes may be implemented to improve the effective bandwidth of various types of links, including serial links, bus-type links, and socket-to-socket links in multi-socket systems.
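A rough software analogy of choosing a compression level from the transmit queue depth is sketched below. Everything specific here is an assumption for illustration: the thresholds `low` and `high`, the helper names `pick_level` and `enqueue`, the compress-before-enqueue variant, and the use of Python's `zlib` as a stand-in for whatever compressor a real link would use.

```python
import zlib
from collections import deque


def pick_level(depth: int, low: int = 4, high: int = 16) -> int:
    """Map queue depth to a zlib level; 0 means 'send uncompressed'.

    The thresholds are hypothetical. A shallow queue means a packet would be
    sent almost immediately, so there is no waiting time to hide compression in.
    """
    if depth < low:
        return 0
    if depth < high:
        return 1  # fast, light compression
    return 6      # deeper queue: more time available, compress harder


def enqueue(queue: deque, payload: bytes) -> None:
    """Enqueue a (is_compressed, data) pair, compressing before enqueue."""
    level = pick_level(len(queue))
    if level:
        compressed = zlib.compress(payload, level)
        if len(compressed) < len(payload):
            queue.append((True, compressed))
            return
    # Queue is shallow, or compression did not help: keep the original.
    queue.append((False, payload))


tx_queue = deque()
enqueue(tx_queue, b"abc" * 1000)   # empty queue: sent as-is
for _ in range(20):
    tx_queue.append((False, b"filler"))
enqueue(tx_queue, b"abc" * 1000)   # deep queue: compressed
```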
Abstract:
Techniques are described herein to overlay and merge any number of tables of equivalent size and structure. Bits or patterns of bits that are similar among tables may be set to a voltage value representative of respective logical ‘0’ or ‘1’. The bits that are different among the tables may be connected to either the value of a table selection signal or its inverse.
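The overlay idea can be illustrated with a small software model. It is restricted to two tables and a single select signal, which is a simplifying assumption, and the names `merge_tables`, `read_merged`, and the wiring labels `'0'`, `'1'`, `'S'`, `'N'` are hypothetical.

```python
def merge_tables(table0, table1, width):
    """Overlay two equal-sized tables into one table of per-bit wirings.

    Each bit becomes '0' or '1' (constant, same in both tables),
    'S' (tied to the select signal), or 'N' (tied to its inverse).
    """
    if len(table0) != len(table1):
        raise ValueError("tables must have the same number of entries")
    merged = []
    for w0, w1 in zip(table0, table1):
        row = []
        for bit in range(width):
            b0, b1 = (w0 >> bit) & 1, (w1 >> bit) & 1
            if b0 == b1:
                row.append(str(b0))  # identical bit: hard-wired constant
            elif b1:
                row.append("S")      # 0 in table0, 1 in table1: follows select
            else:
                row.append("N")      # 1 in table0, 0 in table1: inverse of select
        merged.append(row)
    return merged


def read_merged(merged, index, select):
    """Read one entry; select=0 yields table0's value, select=1 table1's."""
    value = 0
    for bit, wiring in enumerate(merged[index]):
        if wiring in ("0", "1"):
            level = int(wiring)
        elif wiring == "S":
            level = select
        else:
            level = 1 - select
        value |= level << bit
    return value


t0 = [0x3A, 0x00, 0xFF, 0x5C]
t1 = [0x3B, 0x80, 0x0F, 0x5C]
merged = merge_tables(t0, t1, width=8)
```

Entries that are identical in both tables, such as the last one here, end up entirely as constants and need no connection to the select signal.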
Abstract:
In one embodiment, the present invention includes a processor having logic to perform a round of a cryptographic algorithm responsive to first and second round micro-operations to perform the round on first and second pairs of columns, where the logic includes dual datapaths that are half the width of the cryptographic algorithm width (or smaller). Additional logic may be used to combine the results of the first and second round micro-operations to obtain a round result. Other embodiments are described and claimed.
Abstract:
An embodiment includes at least one processing unit to perform at least first and second sets of diffusion-related operations to produce a resulting block from a data block, and that includes at least one stage and at least one other stage. The at least one stage is to select one of first operands and second operands input to the at least one other stage. The first and second operands are associated with the first and second sets of operations, respectively. The at least one other stage involves arithmetic and logical operations common to both the first and second sets of operations. At least one other processing unit is to perform at least one set of cryptographic-related operations (different, at least in part, from the first and second sets of operations) on at least one of (1) another block to produce the data block and (2) the resulting block.