Abstract:
A data processing apparatus and method are provided for performing rearrangement operations. The data processing apparatus has a register data store with a plurality of registers, each register storing a plurality of data elements. Processing circuitry is responsive to control signals to perform processing operations on the data elements. An instruction decoder is responsive to at least one but no more than N rearrangement instructions, where N is an odd plural number, to generate control signals to control the processing circuitry to perform a rearrangement process at least equivalent to: obtaining as source data elements the data elements stored in N registers of said register data store as identified by the at least one rearrangement instruction; performing a rearrangement operation to rearrange the source data elements between a regular N-way interleaved order and a de-interleaved order in order to produce a sequence of result data elements; and outputting the sequence of result data elements for storing in the register data store. This provides a particularly efficient technique for performing N-way interleave and de-interleave operations, where N is an odd number, resulting in high performance, low energy consumption, and reduced register use when compared with known prior art techniques.
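The rearrangement between interleaved and de-interleaved order can be illustrated with a small software model. This is only a sketch of the data movement for N = 3, not the hardware mechanism of the abstract; the function names `deinterleave` and `interleave` and the modelling of registers as Python lists are assumptions made for illustration.

```python
def deinterleave(regs):
    """Model N registers holding N-way interleaved data; return N registers
    in de-interleaved order (register k holds every N-th element, offset k)."""
    n = len(regs)
    flat = [x for reg in regs for x in reg]  # source elements in sequence
    return [flat[k::n] for k in range(n)]


def interleave(regs):
    """Inverse rearrangement: merge N de-interleaved registers back into a
    regular N-way interleaved sequence spread over N registers."""
    n = len(regs)
    width = len(regs[0])
    flat = [regs[k][i] for i in range(width) for k in range(n)]
    return [flat[r * width:(r + 1) * width] for r in range(n)]


# Three registers of four elements each, e.g. packed R, G, B samples.
packed = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
planes = deinterleave(packed)
```

Here `planes[0]` collects elements 0, 3, 6, 9 of the packed sequence, and `interleave(planes)` restores the original three registers.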
Abstract:
A data processing apparatus and method for performing multiply-accumulate operations are provided. The data processing apparatus includes data processing circuitry responsive to control signals to perform data processing operations on at least one input data element. Instruction decoder circuitry is responsive to a predicated multiply-accumulate instruction specifying as input operands a first input data element, a second input data element, and a predicate value, to generate control signals to control the data processing circuitry to perform a multiply-accumulate operation by: multiplying said first input data element and said second input data element to produce a multiplication data element; if the predicate value has a first value, producing a result accumulate data element by adding the multiplication data element to an initial accumulate data element; and if the predicate value has a second value, producing the result accumulate data element by subtracting the multiplication data element from the initial accumulate data element. Such an approach provides a particularly efficient mechanism for performing complex sequences of multiply-add and multiply-subtract operations, facilitating improvements in performance, energy consumption and code density when compared with known prior art techniques.
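The add-or-subtract behaviour selected by the predicate can be sketched in a few lines. The function name `predicated_mac` and the choice of `True` as the "first value" are assumptions for illustration; the complex multiplication is merely one plausible use of mixed multiply-add and multiply-subtract sequences.

```python
def predicated_mac(acc, a, b, predicate):
    """Multiply a by b, then add the product to acc when predicate is True
    (assumed 'first value') or subtract it when predicate is False."""
    product = a * b
    return acc + product if predicate else acc - product


def complex_multiply(x_re, x_im, y_re, y_im):
    """(x_re + i*x_im) * (y_re + i*y_im) built only from predicated MACs."""
    re = predicated_mac(0, x_re, y_re, True)
    re = predicated_mac(re, x_im, y_im, False)  # subtract x_im * y_im
    im = predicated_mac(0, x_re, y_im, True)
    im = predicated_mac(im, x_im, y_re, True)
    return re, im


result = complex_multiply(1, 2, 3, 4)  # (1 + 2i) * (3 + 4i)
```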
Abstract:
An apparatus and method are provided for performing rearrangement operations and arithmetic operations on data. The data processing apparatus has processing circuitry for performing Single Instruction Multiple Data (SIMD) processing operations and scalar processing operations, a register bank for storing data, and control circuitry responsive to program instructions to control the processing circuitry to perform data processing operations. The control circuitry is arranged to be responsive to a combined rearrangement arithmetic instruction to control the processing circuitry to perform a rearrangement operation and at least one SIMD arithmetic operation on a plurality of data elements stored in the register bank. The rearrangement operation is configurable by a size parameter derived at least in part from the register bank. The size parameter provides an indication of a number of data elements forming a rearrangement element for the purposes of the rearrangement operation. The associated method involves controlling processing circuitry to perform a rearrangement operation and at least one SIMD arithmetic operation in response to a combined rearrangement arithmetic instruction and providing the scalar logic size parameter to configure the rearrangement operation. A computer program product is also provided comprising at least one combined rearrangement arithmetic instruction.
Abstract:
Arithmetic coding utilises probability values associated with contexts and context indexed values. The probability values are stored within a random access memory 6 from where they are fetched to a cache memory 8 before being supplied to an arithmetic encoder and decoder 4. The context indexed values used are mapped to the plurality of contexts employed such that context indexed values used to process data values close by in a position within the stream of data values being processed have a greater statistical likelihood of sharing a group of contexts than context values used to process data values far away in position within the stream of data values. Thus, a group of contexts for which the probability values are fetched together into the cache memory 8 will have an increased statistical likelihood of being used together in close proximity in processing the stream of data values. This reduces the number of cache flush operations and cache line fill operations.
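The effect of grouping contexts on cache traffic can be illustrated with a toy model. This sketch assumes a cache that holds a single group of contexts at a time and a fixed `group_size`; both, along with the function name `count_line_fills`, are hypothetical simplifications and not details taken from the abstract.

```python
def count_line_fills(context_indices, group_size):
    """Count how often a one-line cache must be refilled when probability
    values are fetched a whole group of contexts at a time."""
    fills = 0
    cached_group = None
    for idx in context_indices:
        group = idx // group_size  # contexts are grouped by index range
        if group != cached_group:
            fills += 1             # miss: fetch this group's probabilities
            cached_group = group
    return fills


# Contexts used close together in the stream share a group ...
clustered = count_line_fills([0, 1, 2, 3, 4, 5, 6, 7], group_size=4)
# ... versus a mapping that scatters neighbouring uses across groups.
scattered = count_line_fills([0, 4, 1, 5, 2, 6, 3, 7], group_size=4)
```

Under these assumptions the clustered mapping needs far fewer line fills than the scattered one for the same number of context uses.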
Abstract:
An apparatus for processing data is provided comprising processing circuitry having permutation circuitry for performing permutation operations, a register bank having a plurality of registers for storing data, and control circuitry responsive to program instructions to control the processing circuitry to perform data processing operations. The control circuitry is arranged to be responsive to a control-generating instruction to generate, in dependence upon a bit-mask, control signals to configure the permutation circuitry for performing a permutation operation on an input operand. The bit-mask identifies within the input operand a first group of data elements having a first ordering and a second group of data elements having a second ordering, and the permutation operation is such that it preserves one of the first ordering and the second ordering but changes the other of the first ordering and the second ordering.
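One permutation with this property can be modelled as follows. The sketch assumes that mask bit 1 marks the first group and mask bit 0 the second, and that the "changed" ordering is a simple reversal; the abstract does not specify either, and `masked_permute` is a hypothetical name.

```python
def masked_permute(operand, mask):
    """Gather the elements selected by mask bit 1 in their original order,
    followed by the remaining elements in reversed order (assumed change)."""
    first = [x for x, m in zip(operand, mask) if m]
    second = [x for x, m in zip(operand, mask) if not m]
    return first + second[::-1]


permuted = masked_permute([10, 11, 12, 13, 14, 15], [1, 0, 1, 0, 0, 1])
```

The first group (10, 12, 15) keeps its relative order, while the second group (11, 13, 14) appears reversed.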
Abstract:
A method, computer program product and data processing apparatus are provided for filtering data, in particular for use in deblocking filters. The method comprises applying a plurality of m filter coefficients, each of which has a value that is a negative power of two and which together sum to one, to a plurality of m input data items to produce a filtered output data item. This is done by performing a sequence of averaging calculations: the input data items to which a smallest filter coefficient is to be applied are averaged to produce first averaged data, and the first averaged data is then averaged with other averaged input data or with input data items to which larger filter coefficients are to be applied. The plurality of m filter coefficients is thereby applied to the plurality of m input data items via a sequence of averaging calculations such that a data width of any calculated data does not exceed that of the input data being averaged.
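As a worked instance, the coefficients (1/4, 1/4, 1/2) can be applied with two averaging steps. The sketch below assumes 8-bit unsigned inputs and a truncating average; the rounding behaviour and the names `avg_u8` and `filter_3tap` are illustrative assumptions, not details from the abstract. The average is computed with a standard overflow-free identity so that no intermediate value exceeds 8 bits.

```python
def avg_u8(a, b):
    """Truncating average of two 8-bit values without a 9-bit intermediate:
    floor((a + b) / 2) == (a & b) + ((a ^ b) >> 1)."""
    return (a & b) + ((a ^ b) >> 1)


def filter_3tap(a, b, c):
    """Apply coefficients 1/4, 1/4, 1/2 to a, b, c by averaging only."""
    first = avg_u8(a, b)      # items with the smallest coefficient (1/4 each)
    return avg_u8(first, c)   # then average with the 1/2-weighted item


filtered = filter_3tap(10, 20, 40)  # exact weighted value is 27.5
```

Because each step truncates, the result can differ slightly from the exact weighted sum; here the exact value 27.5 becomes 27.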
Abstract:
Character codes 2 representing pictograph font characters 6 may be used to determine an address 8 within a variable length coded data stream 10 of pixel data for the whole font relevant to the character 6 concerned. This access is via a two-level table lookup with the first table level Table 1 returning an initial offset HuffOff within the coded data stream, an average size AvSz of data for a character and a pointer TB2Off to a second table Table 2. The second table is then used to look up an error value Err to correct an estimate of the address generated from the information in the first table using the error value Err and the position N within the second table Table 2 that led to the match. The pixel bitmaps 36 for pictograph characters 6 can be divided into smaller tiles 38 and each of these tiles given a code. The tile codes may then be Huffman coded to provide highly efficient compression of the pixel bitmap font data.
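One possible reading of the two-level lookup is sketched below. It assumes that Table 1 is indexed by the high bits of the character code, that Table 2 is searched for the low bits, and that the address is estimated as HuffOff + N * AvSz and then corrected by Err. The field names `count` and `low_code`, the `low_bits` split, and the function `lookup_address` are hypothetical; the abstract does not give these details.

```python
from typing import NamedTuple


class Table1Entry(NamedTuple):
    huff_off: int  # HuffOff: initial offset into the coded data stream
    av_sz: int     # AvSz: average coded size of one character
    tb2_off: int   # TB2Off: start of this group's entries in Table 2
    count: int     # hypothetical: number of Table 2 entries in the group


class Table2Entry(NamedTuple):
    low_code: int  # hypothetical: low bits of the character code to match
    err: int       # Err: correction to the estimated address


def lookup_address(code, table1, table2, low_bits=4):
    entry = table1[code >> low_bits]
    low = code & ((1 << low_bits) - 1)
    for n in range(entry.count):
        candidate = table2[entry.tb2_off + n]
        if candidate.low_code == low:
            # Estimate from Table 1 and position N, corrected by Err.
            return entry.huff_off + n * entry.av_sz + candidate.err
    raise KeyError(f"character code {code:#x} not in font")


table1 = {0x4E: Table1Entry(huff_off=1000, av_sz=50, tb2_off=0, count=3)}
table2 = [Table2Entry(0x0, 0), Table2Entry(0x3, -4), Table2Entry(0x7, 6)]
address = lookup_address(0x4E7, table1, table2)
```

Storing only a small error value per character, rather than a full address, is what keeps the second table compact in this reading.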
Abstract:
A digital signal processing system comprising a central processing unit core 2, a memory 8 and a coprocessor 4 operates using coprocessor memory access instructions (e.g. LDC, STC). The addressing mode information within these coprocessor memory access instructions (P, U, W, Offset) not only controls the addressing mode used by the central processing unit core 2 but is also used by the coprocessor 4 to determine the number of data words in the transfer being specified such that the coprocessor 4 can terminate the transfer at the appropriate time. Knowledge in advance of the number of words in a transfer is also advantageous in some bus systems, such as those that can be used with synchronous DRAM. The Offset field within the instruction may be used to specify changes to be made in the value provided by the central processing unit core 2 upon execution of a particular instruction and also to specify the number of words in the transfer. This arrangement is well suited to working through a regular array of data such as in digital signal processing operations. If the Offset field is not being used, then the number of words to be transferred may default to 1.
Abstract:
An apparatus and method for performing SIMD multiply-accumulate operations includes SIMD data processing circuitry responsive to control signals to perform data processing operations in parallel on multiple data elements. Instruction decoder circuitry is coupled to the SIMD data processing circuitry and is responsive to program instructions to generate the required control signals. The instruction decoder circuitry is responsive to a single instruction (referred to herein as a repeating multiply-accumulate instruction) having as input operands a first vector of input data elements, a second vector of coefficient data elements, and a scalar value indicative of a plurality of iterations required, to generate control signals to control the SIMD processing circuitry. In response to those control signals, the SIMD data processing circuitry performs the plurality of iterations of a multiply-accumulate process, each iteration involving performance of N multiply-accumulate operations in parallel in order to produce N multiply-accumulate data elements. For each iteration, the SIMD data processing circuitry determines N input data elements from said first vector and a single coefficient data element from the second vector to be multiplied with each of the N input data elements. The N multiply-accumulate data elements produced in a final iteration of the multiply-accumulate process are then used to produce N multiply-accumulate results. This approach provides a particularly energy-efficient mechanism for performing SIMD multiply-accumulate operations, as are required, for example, for FIR filter processes.
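The iteration structure can be modelled in software as below. The sketch assumes that iteration i takes the N input elements starting at position i of the first vector (a sliding window, as an FIR filter would need) and the i-th coefficient; how the N elements are actually selected is not stated in the abstract, and `repeating_mac` is a hypothetical name.

```python
def repeating_mac(inputs, coeffs, iterations, n):
    """Model of one repeating multiply-accumulate instruction producing
    n results, each accumulated over the given number of iterations."""
    acc = [0] * n
    for i in range(iterations):
        c = coeffs[i]                     # single coefficient per iteration
        for lane in range(n):             # n MACs, parallel in hardware
            acc[lane] += inputs[i + lane] * c
    return acc


# Four outputs of a 3-tap FIR computed by one modelled instruction.
fir_out = repeating_mac([1, 2, 3, 4, 5, 6], [1, 10, 100], iterations=3, n=4)
```

Each lane ends up holding `inputs[lane] * 1 + inputs[lane + 1] * 10 + inputs[lane + 2] * 100`, so the first vector must hold at least `iterations + n - 1` elements under this assumption.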