Systems, methods, and computer program products for performing mathematical operations
    3.
    Granted patent (status: active)

    Publication No.: US09489342B2

    Publication date: 2016-11-08

    Application No.: US14127178

    Filing date: 2013-06-21

    Applicant: Intel Corporation

    Inventors: Niraj Gupta Karthik N

    IPC classification: G06F17/10 G06F17/15 G06F17/16

    CPC classification: G06F17/10 G06F17/15 G06F17/16

    Abstract: The system has first, second, third, and fourth subsystems. Each subsystem has first and second multipliers coupled, respectively, to first and second adders. Each multiplier has two inputs. The first adder is coupled to a first output, a first accumulator, and a bit shifter. The bit shifter is coupled to a third adder. The third adder is coupled to a multiplexer. The multiplexer is coupled to a second output and a second accumulator. The second adder is coupled to the third adder and the multiplexer. The first outputs of the first and second subsystems are coupled directly to a fourth adder, the second outputs of the first and second subsystems are coupled directly to a fifth adder, the first outputs of the third and fourth subsystems are coupled directly to a sixth adder, and the second outputs of the third and fourth subsystems are coupled directly to a seventh adder.

    USING A GLOBAL BARRIER TO SYNCHRONIZE ACROSS LOCAL THREAD GROUPS IN GENERAL PURPOSE PROGRAMMING ON GPU
    5.
    Patent application (status: active)

    Publication No.: US20150187042A1

    Publication date: 2015-07-02

    Application No.: US14563601

    Filing date: 2014-12-08

    Applicant: Intel Corporation

    Inventor: Niraj Gupta

    IPC classification: G06T1/20 G06F9/38

    CPC classification: G06F9/3851 G06F9/48

    Abstract: Methods and systems may synchronize workloads across local thread groups. The methods and systems may provide for receiving, at a graphics processor, a workload from a host processor and receiving, at a plurality of processing elements, a plurality of threads that form one or more local thread groups. Additionally, the processing of the workload may be synchronized across the one or more thread groups. In one example, the global barrier determines that all threads across the thread groups have been completed without polling.
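
The idea of a single barrier shared by several local thread groups can be illustrated with a CPU-side analogy. The sketch below is not the GPU mechanism described in the application; it assumes ordinary Python threads, and the names `run_with_global_barrier`, `num_groups`, and `threads_per_group` are hypothetical. `threading.Barrier` blocks waiting threads rather than having them poll a flag.

```python
import threading

def run_with_global_barrier(num_groups=3, threads_per_group=4):
    """Run two phases across all groups, separated by one global barrier.

    Returns, for every thread, how many threads had finished phase 1
    at the moment that thread entered phase 2.
    """
    total = num_groups * threads_per_group
    barrier = threading.Barrier(total)  # waiters block; they do not spin
    lock = threading.Lock()
    phase1_done = []
    seen_at_phase2 = []

    def worker(group, lane):
        # Phase 1: each thread records that its share of the work is done.
        with lock:
            phase1_done.append((group, lane))
        # Global barrier: nobody proceeds until every group has arrived.
        barrier.wait()
        # Phase 2: all phase-1 results are now guaranteed to be visible.
        with lock:
            seen_at_phase2.append(len(phase1_done))

    threads = [
        threading.Thread(target=worker, args=(g, lane))
        for g in range(num_groups)
        for lane in range(threads_per_group)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_at_phase2
```

Calling `run_with_global_barrier(3, 4)` returns one count per thread; each count equals the total number of threads, because no thread passes the barrier until all groups have finished phase 1.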

    Techniques for connected component labeling
    6.
    Granted patent (status: active)

    Publication No.: US09042652B2

    Publication date: 2015-05-26

    Application No.: US13666913

    Filing date: 2012-11-01

    Applicant: Intel Corporation

    IPC classification: G06K9/34 G06K9/46 G06T7/00

    Abstract: An apparatus may include a memory, a processor circuit, and a connected component labeling module. The connected component labeling module may be operative on the processor circuit to determine one or more connected components during reading of an image comprising a multiplicity of pixels from the memory, assign a label to a plurality of pixels of the multiplicity of pixels, generate one or more label connections for a respective one or more labels, each label connection linking a higher label to a lowest label for the same connected component, and write to the memory, for each label of the one or more labels, a lowest label as defined by the label connection for that label after a label is assigned to each pixel.
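
The labeling scheme in the abstract (provisional labels, label connections from a higher label to the lowest label of the same component, then a final write of the lowest label) resembles classic two-pass labeling. The following is a minimal software sketch under assumptions not stated in the abstract: 4-connectivity, a binary image given as a list of lists, and the hypothetical names `label_components`, `lowest`, `find`, and `connect`.

```python
def label_components(image):
    """Two-pass, 4-connected labeling of a 2D list of 0/1 pixels."""
    rows = len(image)
    cols = len(image[0]) if image else 0
    labels = [[0] * cols for _ in range(rows)]
    lowest = {}  # label connection: label -> a lower (or equal) label

    def find(label):
        # Follow connections until the lowest label of the component.
        while lowest[label] != label:
            lowest[label] = lowest[lowest[label]]
            label = lowest[label]
        return label

    def connect(a, b):
        # Link the higher of the two lowest labels to the lower one.
        ra, rb = find(a), find(b)
        if ra != rb:
            lowest[max(ra, rb)] = min(ra, rb)

    next_label = 1
    # First pass: assign provisional labels while reading the image.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                labels[r][c] = min(up, left)
                connect(up, left)
            elif up or left:
                labels[r][c] = up or left
            else:
                lowest[next_label] = next_label
                labels[r][c] = next_label
                next_label += 1
    # Second pass: write back the lowest label for every labeled pixel.
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                labels[r][c] = find(labels[r][c])
    return labels
```

For a U-shaped image such as `[[1, 0, 1], [1, 0, 1], [1, 1, 1]]`, the two arms first receive labels 1 and 2; the connection created at the bottom-right pixel resolves both to label 1 in the second pass.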

    Initiation of cache flushes and invalidations on graphics processors
    7.
    Granted patent (status: active)

    Publication No.: US09563561B2

    Publication date: 2017-02-07

    Application No.: US13926328

    Filing date: 2013-06-25

    Applicant: Intel Corporation

    IPC classification: G06F12/08

    Abstract: Methods and systems may provide for receiving, at a graphics processor, a workload from a host processor and using a kernel on the graphics processor to issue a thread group for execution of the workload on the graphics processor. Additionally, one or more coherency messages may be initiated, by the graphics processor, in response to a thread-related condition of one or more caches on the graphics processor. In one example, the thread-related condition is associated with the execution of the workload on the graphics processor and indicates that the one or more caches on the graphics processor are not coherent with a system memory associated with the host processor.

    PARALLEL FLOOD-FILL TECHNIQUES AND ARCHITECTURE
    8.
    Patent application (status: under examination, published)

    Publication No.: US20150077422A1

    Publication date: 2015-03-19

    Application No.: US14550214

    Filing date: 2014-11-21

    Applicant: INTEL CORPORATION

    IPC classification: G06T1/20

    CPC classification: G06T1/20

    Abstract: Flood-fill techniques and architecture are disclosed. In accordance with one embodiment, the architecture comprises a hardware primitive with a software interface which collectively allow for both data-based and task-based parallelism in executing a flood-fill process. The hardware primitive is defined to do the flood-fill function and is scalable and may be implemented with a bitwise definition that can be tuned to meet power/performance targets, in some embodiments. In executing a flood-fill operation, and in accordance with an example embodiment, the software interface produces parallel threads and issues them to processing elements, such that each of the threads can run independently until done. Each processing element in turn accesses a flood-fill hardware primitive, each of which is configured to flood a seed inside an N×M image block. In some cases, processing element commands to the flood-fill hardware primitive(s) can be queued and acted upon pursuant to an arbitration scheme.
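
As a rough software stand-in for the arrangement in the abstract, the sketch below floods a seed inside one N×M block and runs several such block jobs as independent threads. It is only an illustration: the hardware primitive, its bitwise definition, and the arbitration scheme are not modeled, 4-connectivity is assumed, and the names `flood_block` and `flood_blocks_parallel` are hypothetical.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def flood_block(block, seed, fill):
    """Return a copy of one N x M block with the seed's region set to `fill`."""
    n, m = len(block), len(block[0])
    out = [row[:] for row in block]
    sr, sc = seed
    target = out[sr][sc]
    if target == fill:
        return out  # nothing to do; also avoids an endless loop
    queue = deque([(sr, sc)])
    out[sr][sc] = fill
    while queue:
        r, c = queue.popleft()
        # Visit the four direct neighbours that still hold the target value.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < n and 0 <= nc < m and out[nr][nc] == target:
                out[nr][nc] = fill
                queue.append((nr, nc))
    return out

def flood_blocks_parallel(jobs, workers=4):
    """Run independent (block, seed, fill) jobs on a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: flood_block(*job), jobs))
```

Each job works on its own copy of a block, so the threads share no mutable state and can run until done without coordinating, which mirrors the independence of the threads described in the abstract.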

    Efficient method and hardware implementation for nearest neighbor search

    Publication No.: US10380106B2

    Publication date: 2019-08-13

    Application No.: US14564151

    Filing date: 2014-12-09

    Applicant: Intel Corporation

    Inventor: Niraj Gupta

    Abstract: Systems and methods may provide feature matching in object-recognition applications. The systems and methods may determine various features of an object and determine the type of object to which the features correspond. The systems and methods may also detect objects within a database and extract vectors based on unique features of the objects. The extracted vectors may be stored in a memory such as a buffer. The extracted vectors may be used to match against a database of objects of interest or test vectors. Features within the objects may then be quickly and efficiently determined based on the best matches between the extracted vectors and the test vectors, thereby determining suitable best matches while avoiding the necessity to search the full database.
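
The matching step (finding, for each extracted vector, the closest test vector) can be sketched as a plain exhaustive comparison. This is a baseline illustration only, not the reduced-search hardware method the patent claims; it assumes squared Euclidean distance and equal-length numeric vectors, and the name `best_matches` is hypothetical.

```python
def best_matches(extracted, test_vectors):
    """For each extracted vector, return (index of closest test vector, squared distance)."""
    results = []
    for vec in extracted:
        best_index, best_dist = -1, float("inf")
        for i, ref in enumerate(test_vectors):
            # Squared Euclidean distance; the square root is not needed for ranking.
            dist = sum((a - b) ** 2 for a, b in zip(vec, ref))
            if dist < best_dist:
                best_index, best_dist = i, dist
        results.append((best_index, best_dist))
    return results
```

With extracted vectors `[[0, 0], [5, 5]]` and test vectors `[[1, 0], [4, 6], [10, 10]]`, the first vector matches test vector 0 at squared distance 1 and the second matches test vector 1 at squared distance 2.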

    Parallel flood-fill techniques and architecture

    Publication No.: US09972062B2

    Publication date: 2018-05-15

    Application No.: US14550214

    Filing date: 2014-11-21

    Applicant: INTEL CORPORATION

    IPC classification: G06F15/80 G09G5/02 G06T1/20

    CPC classification: G06T1/20

    Abstract: Flood-fill techniques and architecture are disclosed. In accordance with one embodiment, the architecture comprises a hardware primitive with a software interface which collectively allow for both data-based and task-based parallelism in executing a flood-fill process. The hardware primitive is defined to do the flood-fill function and is scalable and may be implemented with a bitwise definition that can be tuned to meet power/performance targets, in some embodiments. In executing a flood-fill operation, and in accordance with an example embodiment, the software interface produces parallel threads and issues them to processing elements, such that each of the threads can run independently until done. Each processing element in turn accesses a flood-fill hardware primitive, each of which is configured to flood a seed inside an N×M image block. In some cases, processing element commands to the flood-fill hardware primitive(s) can be queued and acted upon pursuant to an arbitration scheme.