SIMD Operand Permutation with Selection from among Multiple Registers

    公开(公告)号:US20230325196A1

    公开(公告)日:2023-10-12

    申请号:US18299452

    申请日:2023-04-12

    Applicant: Apple Inc.

    CPC classification number: G06F9/3887 G06T1/60 G06T1/20 G06F9/30098

    Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.

    Floating-point Division Circuitry with Subnormal Support

    公开(公告)号:US20220317971A1

    公开(公告)日:2022-10-06

    申请号:US17218041

    申请日:2021-03-30

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to circuitry for floating-point division. In some embodiments, the circuitry is configured to generate a subnormal result for a division operation that divides a numerator by a denominator. The circuitry may include floating-point circuitry configured to perform a reciprocal operation to determine a normalized mantissa value for the reciprocal of a floating-point representation of the denominator. The circuitry may further include fixed-point circuitry configured to multiply a fixed-point representation of the normalized mantissa value for the reciprocal by a mantissa of the numerator to generate an initial value. Control circuitry may determine error data for the initial value and generate a final subnormal mantissa result for the division operation based on the error data and the initial value. Embodiments with multiple modes with different accuracy guarantees are disclosed.

    Datapath circuitry for math operations using SIMD pipelines

    公开(公告)号:US11256518B2

    公开(公告)日:2022-02-22

    申请号:US16597625

    申请日:2019-10-09

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to sharing operands among SIMD threads for a larger arithmetic operation. In some embodiments, a set of multiple hardware pipelines is configured to execute single-instruction multiple-data (SIMD) instructions for multiple threads in parallel, where ones of the hardware pipelines include execution circuitry configured to perform floating-point operations using one or more pipeline stages of the pipeline and first routing circuitry configured to select, from among thread-specific operands stored for the hardware pipeline and from one or more other pipelines in the set, a first input operand for an operation by the execution circuitry. In some embodiments, a device is configured to perform a mathematical operation on source input data structures stored across thread-specific storage for the set of hardware pipelines, by executing multiple SIMD floating-point operations using the execution circuitry and the first routing circuitry. This may improve performance and reduce power consumption for matrix multiply and reduction operations, for example.

    SIMD Operand Permutation with Selection from among Multiple Registers

    公开(公告)号:US20210406031A1

    公开(公告)日:2021-12-30

    申请号:US17470682

    申请日:2021-09-09

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.

    Power saving with dynamic pulse insertion

    公开(公告)号:US10270434B2

    公开(公告)日:2019-04-23

    申请号:US15046926

    申请日:2016-02-18

    Applicant: Apple Inc.

    Abstract: A method and apparatus for saving power in integrated circuits is disclosed. An IC includes functional circuit blocks which are not placed into a sleep mode when idle. A power management circuit may monitor the activity levels of the functional circuit blocks not placed into a sleep mode. When the power management circuit detects that an activity level of one of the non-sleep functional circuit blocks is less than a predefined threshold, it reduce the frequency of a clock signal provided thereto by scheduling only one pulse of a clock signal for every N pulses of the full frequency clock signal. The remaining N−1 pulses of the clock signal may be inhibited. If a high priority transaction inbound for the functional circuit block is detected, an inserted pulse of the clock signal may be provided to the functional unit irrespective of when a most recent regular pulse was provided.

    Parallel processing circuitry for encoded fields of related threads

    公开(公告)号:US10089077B1

    公开(公告)日:2018-10-02

    申请号:US15402820

    申请日:2017-01-10

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus performs an arithmetic operation using first circuitry, on type value inputs for different threads that are encoded to represent values to be operated on by the first circuitry. In some embodiments, second arithmetic circuitry is configured to perform an arithmetic operation on an output of the first circuitry and an input (e.g., address information such as a base and an offset) that is common to the different threads and has a greater number of bits than the output of the first circuitry. In various embodiments, disclosed techniques may allow decoding of encoded values for different threads (which may reduce memory requirements relative to non-encoded values) with a shorter critical path and lower power consumption, e.g., relative to sequential decoding.

    Floating-point division circuitry with subnormal support

    公开(公告)号:US11836459B2

    公开(公告)日:2023-12-05

    申请号:US17218041

    申请日:2021-03-30

    Applicant: Apple Inc.

    CPC classification number: G06F7/4873 G06F7/4876 G06F7/49984

    Abstract: Techniques are disclosed relating to circuitry for floating-point division. In some embodiments, the circuitry is configured to generate a subnormal result for a division operation that divides a numerator by a denominator. The circuitry may include floating-point circuitry configured to perform a reciprocal operation to determine a normalized mantissa value for the reciprocal of a floating-point representation of the denominator. The circuitry may further include fixed-point circuitry configured to multiply a fixed-point representation of the normalized mantissa value for the reciprocal by a mantissa of the numerator to generate an initial value. Control circuitry may determine error data for the initial value and generate a final subnormal mantissa result for the division operation based on the error data and the initial value. Embodiments with multiple modes with different accuracy guarantees are disclosed.

    Routing circuitry for permutation of single-instruction multiple-data operands

    公开(公告)号:US11294672B2

    公开(公告)日:2022-04-05

    申请号:US16548812

    申请日:2019-08-22

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to routing circuitry configured to perform permute operations for operands of threads in a single-instruction multiple-data group. In some embodiments, an apparatus includes hierarchical operand routing circuitry configured to route operands between a set of single-instruction multiple-data (SIMD) pipelines based on a permute instruction. In some embodiments, the routing circuitry includes a first level and a second level. The first level may include a set of multiple crossbar circuits each configured to receive operands from a respective subset of the pipelines and output one or more of the received operands on multiple output lines based on the permute instruction, where the crossbar circuits support full permutation within a respective subset. A second level may be configured to select an operand from a previous level for each of the pipelines, and may select from among only a portion of output operands from the previous level to provide an operand for a respective pipeline.

    Pessimistic dependency handling based on storage regions

    公开(公告)号:US10114650B2

    公开(公告)日:2018-10-30

    申请号:US14629464

    申请日:2015-02-23

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to handling dependencies between instructions. In one embodiment, an apparatus includes decode circuitry and dependency circuitry. In this embodiment, the decode circuitry is configured to receive an instruction that specifies a destination location and determine a first storage region that includes the destination location. In this embodiment, the storage region is one of a plurality of different storage regions accessible by instructions processed by the apparatus. In this embodiment, the dependency circuitry is configured to stall the instruction until one or more older instructions that specify source locations in the first storage region have read their source locations. The disclosed techniques may be described as “pessimistic” dependency handling, which may, in some instances, maintain performance while limiting complexity, power consumption, and area of dependency logic.

    OPERAND CACHE CONTROL TECHNIQUES
    10.
    发明申请
    OPERAND CACHE CONTROL TECHNIQUES 有权
    操作缓存控制技术

    公开(公告)号:US20170075810A1

    公开(公告)日:2017-03-16

    申请号:US14851859

    申请日:2015-09-11

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to per-pipeline control for an operand cache. In some embodiments, an apparatus includes a register file and multiple execution pipelines. In some embodiments, the apparatus also includes an operand cache that includes multiple entries that each include multiple portions that are each configured to store an operand for a corresponding execution pipeline. In some embodiments, the operand cache is configured, during operation of the apparatus, to store data in only a subset of the portions of an entry. In some embodiments, the apparatus is configured to store, for each entry in the operand cache, a per-entry validity value that indicates whether the entry is valid and per-portion state information that indicates whether data for each portion is valid and whether data for each portion is modified relative to data in a corresponding entry in the register file.

    Abstract translation: 公开了关于操作数高速缓存的每流水线控制的技术。 在一些实施例中,装置包括寄存器文件和多个执行流水线。 在一些实施例中,设备还包括操作数高速缓存,其包括多个条目,每个条目包括多个部分,每个部分被配置为存储对应的执行管线的操作数。 在一些实施例中,操作数高速缓存在设备的操作期间被配置为仅在条目的部分的子集中存储数据。 在一些实施例中,所述装置被配置为针对操作数高速缓存中的每个条目存储指示该条目是否有效的每条目有效值以及指示每个部分的数据是否有效的每部分状态信息以及数据是否 对于每个部分相对于寄存器文件中相应条目中的数据进行修改。

Patent Agency Ranking