摘要:
Apparatus for processing data includes processing circuitry 16, 18, 20, 22, 24, 26 and decoder circuitry 14 for decoding program instructions. The program instructions decoded include a floating point pre-conversion instruction which performs round-to-nearest ties to even rounding upon the mantissa field of an input floating number to generate an output floating point number with the same mantissa length but with the mantissa rounded to a position corresponding to a shorter mantissa field. The output mantissa field includes a suffix of zero values concatenated the rounded value. The decoder for circuitry 14 is also responsive to an integer pre-conversion instruction to quantize and input integer value using round-to-nearest ties to even rounding to form an output integer operand with a number of significant bits matched to the mantissa size of a floating point number to which the integer is later to be converted using an integer-to-floating point conversion instruction.
摘要:
A graphics processing unit 2 includes a texture pipeline 6 having a first pipeline portion 18 and a second pipeline portion 20. A subject instruction within the first pipeline portion 18 is recirculated within the first pipeline portion 18 until descriptor data to be loaded from a memory 4 by that subject instruction has been cached within a shared descriptor cache 22. When the descriptor has been stored within the shared descriptor cache 22, then the subject instruction is passed to the second pipeline portion 20 where further processing operations are performed and the subject instruction recirculated until those further processing operations have completed. The descriptor data is locked within the shared descriptor cache 22 until there are no pending subject instructions within the texture pipeline 6 which required to use that descriptor data.
摘要:
A graphics processing unit 2 includes a texture pipeline 6 having a first pipeline portion 18 and a second pipeline portion 20. A subject instruction within the first pipeline portion 18 is recirculated within the first pipeline portion 18 until descriptor data to be loaded from a memory 4 by that subject instruction has been cached within a shared descriptor cache 22. When the descriptor has been stored within the shared descriptor cache 22, then the subject instruction is passed to the second pipeline portion 20 where further processing operations are performed and the subject instruction recirculated until those further processing operations have completed. The descriptor data is locked within the shared descriptor cache 22 until there are no pending subject instructions within the texture pipeline 6 which required to use that descriptor data.
摘要:
A processing pipeline 6, 8, 10, 12 is provided with a main query stage 20 and a fetch stage 22. A buffer 24 stores program instructions which have missed within a cache memory 14. Query generation circuitry within the main query stage 20 and within a buffer query stage 26 serve to concurrently generate a main query request and a buffer query request sent to the cache memory 14. The cache memory returns a main query response and a buffer query response. Arbitration circuitry 28 controls multiplexers 30, 32 and 34 to direct the program instruction at the main query stage 20, and the program instruction stored within the buffer 24 and the buffer query stage 26 to pass either to the fetch stage 22 or to the buffer 24. The multiplexer 30 can also select a new instruction to be passed to the main query stage 20.
摘要:
A processing pipeline 6, 8, 10, 12 is provided with a main query stage 20 and a fetch stage 22. A buffer 24 stores program instructions which have missed within a cache memory 14. Query generation circuitry within the main query stage 20 and within a buffer query stage 26 serve to concurrently generate a main query request and a buffer query request sent to the cache memory 14. The cache memory returns a main query response and a buffer query response. Arbitration circuitry 28 controls multiplexers 30, 32 and 34 to direct the program instruction at the main query stage 20, and the program instruction stored within the buffer 24 and the buffer query stage 26 to pass either to the fetch stage 22 or to the buffer 24. The multiplexer 30 can also select a new instruction to be passed to the main query stage 20.
摘要:
A data processing apparatus comprises processing circuitry arranged to process processing threads using resources accessible to the processing circuitry. A pipeline is provided for handling at least two pending threads awaiting processing by the processing circuitry. The pipeline includes at least one resource-requesting pipeline stage for requesting access to resources for the pending threads. A priority controller controls priority levels of the pending threads. The priority levels define a priority with which pending threads are granted access to resources. When a pending thread reaches a final pipeline stage, if the request resources are not yet available then the priority level of that thread is raised selectively and the thread is returned to a first pipeline stage of the pipeline. If the requested resources are available then the thread is forwarded from the pipeline.
摘要:
Apparatus for processing data includes processing circuitry 16, 18, 20, 22, 24, 26 and decoder circuitry 14 for decoding program instructions. The program instructions decoded include a floating point pre-conversion instruction which performs round-to-nearest ties to even rounding upon the mantissa field of an input floating number to generate an output floating point number with the same mantissa length but with the mantissa rounded to a position corresponding to a shorter mantissa field. The output mantissa field includes a suffix of zero values concatenated the rounded value. The decoder for circuitry 14 is also responsive to an integer pre-conversion instruction to quantise and input integer value using round-to-nearest ties to even rounding to form an output integer operand with a number of significant bits matched to the mantissa size of a floating point number to which the integer is later to be converted using an integer-to-floating point conversion instruction.
摘要:
A memory management arrangement includes a memory management unit 1, a cache memory 2 and a queue arrangement 3. The queue 3 is a first-in, first-out (FIFO) buffer which can queue failed memory access requests and return them as inputs to the memory management unit 1 via the bus 5 for retrying through the memory management unit at a later time.If a memory access request sent to the memory management unit 1 experiences a cache “miss”, instead of blocking memory access requests until the required address data has been loaded into the cache 2, the memory management unit 1 operates to place the failed memory access request in the replay queue 3, and allows subsequent memory access requests to continue.The failed memory access requests in the queue 3 are then continuously circulated through the memory management unit 1 from the queue alternately with new memory access requests from other access initiators 4.
摘要:
A graphics processing unit 2 includes a texture pipeline 6 which performs filter operations upon texture values. If the texture values are integer texture values, then they may be processed by the texture pipeline in a variable order corresponding to the order in which they are retrieved from a memory 4. If the texture values are floating point texture values, then they are processed in a fixed order in order to ensure result invariants as the filter operation is non-associative for floating point values. The filter operation is not commenced until all of the floating point texture values have been retrieved from the memory 4 and other available for processing.
摘要:
A data processing apparatus and method are provided for processing a received workload in order to generate result data. A thread group generator generates from the received workload a plurality of thread groups to be executed to process the received workload. Each thread may be either an active thread whose output is required to form the result data, or a dummy thread required to resolve the inter-thread dependency for one of the active threads but whose output is not required to form the result data. Execution flow modification circuitry is responsive to the received thread group having at least one dummy thread, to cause the thread execution unit to selectively omit at least part of the execution of at least one of the plurality of instructions when executing each dummy thread, in dependence on control information associated with the predetermined program.