Method for managing hardware resources within a simultaneous multi-threaded processing system
    1.
    Type: Granted patent
    Status: In force

    Publication No.: US08640109B2

    Publication date: 2014-01-28

    Application No.: US13444013

    Filing date: 2012-04-11

    IPC classification: G06F9/44

    CPC classification: G06F11/3442 G06F8/443

    Abstract: A method for managing hardware resources and threads within a data processing system is disclosed. Compilation attributes of a function are collected during and after the compilation of the function. Pre-processing attributes of the function are also collected before the function executes. The collected attributes are then analyzed, and a runtime configuration is assigned to the function based on the result of the attribute analysis. The runtime configuration may include, for example, the designation of the function to be executed under either a single-threaded mode or a simultaneous multi-threaded mode. During the execution of the function, real-time attributes of the function are continuously collected. If necessary, the runtime configuration under which the function is being executed can be changed based on the real-time attributes collected during execution.
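
    Illustrative sketch (not taken from the patent): the abstract describes assigning a runtime mode from collected attributes and revising it from real-time measurements. The attribute names, the threshold, and the heuristic below are all assumptions chosen only to make the control flow concrete.

```python
from dataclasses import dataclass


@dataclass
class FunctionAttributes:
    """Hypothetical attributes; the abstract does not enumerate them."""
    memory_stall_ratio: float  # assumed: fraction of time stalled on memory (0..1)
    parallel_regions: int      # assumed: number of independent work regions


def assign_runtime_config(attrs: FunctionAttributes) -> str:
    """Return "SMT" or "ST" from the analyzed attributes (assumed heuristic)."""
    if attrs.parallel_regions > 1 and attrs.memory_stall_ratio >= 0.3:
        return "SMT"
    return "ST"


def run_with_monitoring(initial: FunctionAttributes, samples):
    """Assign an initial mode, then re-evaluate it for each real-time sample."""
    config = assign_runtime_config(initial)
    history = [config]
    for sample in samples:
        proposed = assign_runtime_config(sample)
        if proposed != config:
            config = proposed  # change the runtime configuration mid-execution
        history.append(config)
    return history
```

    Here `run_with_monitoring` returns the mode in effect after each sample, so a caller can see where the configuration switched.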

    METHOD TO REDUCE QUEUE SYNCHRONIZATION OF MULTIPLE WORK ITEMS IN A SYSTEM WITH HIGH MEMORY LATENCY BETWEEN PROCESSING NODES
    2.
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20130254776A1

    Publication date: 2013-09-26

    Application No.: US13621215

    Filing date: 2012-09-15

    IPC classification: G06F9/50

    Abstract: A method efficiently dispatches and completes a work element within a multi-node data processing system that has a global command queue (GCQ) and at least one high latency node. The method comprises: at the high latency processor node, work scheduling logic establishing a local command/work queue (LCQ) in which multiple work items for execution by local processing units can be staged prior to execution; a first local processing unit retrieving, via a work request, a larger chunk of work than the local processing unit can complete in a normal work completion/execution cycle; storing the retrieved larger chunk of work in the LCQ; enabling the first local processing unit to locally schedule and complete portions of the work stored within the LCQ; and transmitting a next work request to the GCQ only when all the work within the LCQ has been dispatched by the local processing units.
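
    Illustrative sketch (not taken from the patent): a minimal single-process model of the GCQ/LCQ interaction, assuming work items are plain integers and that each call to the hypothetical `fetch` method stands for one expensive remote work request. Class and method names are invented for illustration.

```python
from collections import deque


class GlobalCommandQueue:
    """Stand-in for the GCQ; counts how many work requests it serves."""

    def __init__(self, items):
        self._items = deque(items)
        self.requests = 0

    def fetch(self, chunk_size):
        """Serve one work request with up to chunk_size items."""
        self.requests += 1
        chunk = []
        while self._items and len(chunk) < chunk_size:
            chunk.append(self._items.popleft())
        return chunk


class HighLatencyNode:
    """Stages a large chunk in a local queue (LCQ) and works from it."""

    def __init__(self, gcq, chunk_size):
        self.gcq = gcq
        self.chunk_size = chunk_size
        self.lcq = deque()

    def next_item(self):
        # Go back to the GCQ only when the LCQ has been fully dispatched.
        if not self.lcq:
            self.lcq.extend(self.gcq.fetch(self.chunk_size))
        return self.lcq.popleft() if self.lcq else None


def drain(node):
    """Dispatch items until both the LCQ and the GCQ are empty."""
    done = []
    while (item := node.next_item()) is not None:
        done.append(item)
    return done
```

    With ten items, a chunk size of 4 needs four requests (including the final empty one), while a chunk size of 1 needs eleven, which is the synchronization traffic the larger chunk is meant to avoid.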

    System for iterative interactive ray tracing in a multiprocessor environment
    3.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US08525826B2

    Publication date: 2013-09-03

    Application No.: US12188290

    Filing date: 2008-08-08

    IPC classification: G06T15/00

    Abstract: A method comprises receiving scene model data including a scene geometry model and a plurality of pixel data describing objects arranged in a scene. The method generates a primary ray based on a selected first pixel data. In the event the primary ray intersects an object in the scene, the method determines primary hit color data and generates a plurality of secondary rays. The method groups the secondary rays into secondary ray packets and arranges the packets in a queue based on the octant of each direction vector in the secondary ray packet. The method generates secondary color data based on the secondary ray packets in the queue and generates a pixel color based on the primary hit color data and the secondary color data. The method generates an image based on the pixel color for the pixel data.
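
    Illustrative sketch (not taken from the patent): one way to group secondary rays by the octant of their direction vector and queue the resulting packets. The ray representation (an `(origin, direction)` tuple), the packet size, and the octant numbering are assumptions.

```python
from collections import defaultdict


def octant(direction):
    """Octant index 0..7 built from the signs of (dx, dy, dz)."""
    dx, dy, dz = direction
    return int(dx < 0) | (int(dy < 0) << 1) | (int(dz < 0) << 2)


def build_packet_queue(secondary_rays, packet_size=4):
    """Group rays that share an octant into packets, ordered by octant.

    Each ray is an (origin, direction) tuple; each queue entry is
    (octant_index, [rays...]) with at most packet_size rays.
    """
    by_octant = defaultdict(list)
    for ray in secondary_rays:
        by_octant[octant(ray[1])].append(ray)

    queue = []
    for index in sorted(by_octant):
        rays = by_octant[index]
        for start in range(0, len(rays), packet_size):
            queue.append((index, rays[start:start + packet_size]))
    return queue
```

    Rays in the same octant have direction components with matching signs, so a packet built this way tends to traverse the scene coherently.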

    Efficient Multi-Level Software Cache Using SIMD Vector Permute Functionality
    4.
    Type: Patent application
    Status: In force

    Publication No.: US20110161548A1

    Publication date: 2011-06-30

    Application No.: US12648667

    Filing date: 2009-12-29

    IPC classification: G06F12/08 G06F12/00

    Abstract: A cache manager receives a request for data, which includes a requested effective address. The cache manager determines whether the requested effective address matches a most recently used effective address stored in a mapped tag vector. When the most recently used effective address matches the requested effective address, the cache manager identifies a corresponding cache location and retrieves the data from the identified cache location. However, when the most recently used effective address fails to match the requested effective address, the cache manager determines whether the requested effective address matches a subsequent effective address stored in the mapped tag vector. When the cache manager determines a match to a subsequent effective address, the cache manager identifies a different cache location corresponding to the subsequent effective address and retrieves the data from the different cache location.
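
    Illustrative sketch (not taken from the patent): a scalar model of the two-step lookup, assuming the mapped tag vector is a list of `(effective_address, cache_location)` pairs with the most recently used entry first. The SIMD permute step named in the title is not modeled, and miss handling is left out because the abstract does not describe it.

```python
def lookup(tag_vector, cache_lines, requested_ea):
    """Return cached data for requested_ea, or None on a miss.

    tag_vector: list of (effective_address, cache_location), MRU entry first.
    cache_lines: mapping from cache_location to the stored data.
    """
    if not tag_vector:
        return None

    # Fast path: compare against the most recently used effective address.
    mru_ea, mru_location = tag_vector[0]
    if mru_ea == requested_ea:
        return cache_lines[mru_location]

    # Slow path: compare against the subsequent effective addresses.
    for ea, location in tag_vector[1:]:
        if ea == requested_ea:
            return cache_lines[location]

    return None  # miss; what happens next is outside this sketch
```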

    System and method for cache optimized data formatting
    5.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US07864187B2

    Publication date: 2011-01-04

    Application No.: US11840976

    Filing date: 2007-08-19

    IPC classification: G09G5/00 G06F15/16

    Abstract: A system and method for cache optimized data formatting is presented. A processor generates images by calculating a plurality of image point values using height data, color data, and normal data. Normal data is computed for a particular image point using pixel data adjacent to the image point. The computed normalized data, along with the corresponding height data and color data, is included in a limited space data stream and sent to a processor to generate an image. The normalized data may be computed from adjacent pixel data at any time before it is inserted into the limited space data stream.
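
    Illustrative sketch (not taken from the patent): the abstract says only that normal data is computed from adjacent pixel data. The version below assumes a height map stored as a list of rows and uses central differences, a common choice that the patent may or may not use. Function names and the record layout are hypothetical.

```python
import math


def normal_at(height, x, y):
    """Unit surface normal at (x, y) from the four adjacent height samples.

    Neighbour indices are clamped at the map border as a simplification.
    """
    rows, cols = len(height), len(height[0])
    left = height[y][max(x - 1, 0)]
    right = height[y][min(x + 1, cols - 1)]
    up = height[max(y - 1, 0)][x]
    down = height[min(y + 1, rows - 1)][x]

    # For a surface z = h(x, y), an (unnormalized) normal is (-dh/dx, -dh/dy, 1).
    nx, ny, nz = (left - right) / 2.0, (up - down) / 2.0, 1.0
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def pack_stream(height, color):
    """Interleave height, color, and precomputed normal for each point."""
    return [
        (height[y][x], color[y][x], normal_at(height, x, y))
        for y in range(len(height))
        for x in range(len(height[0]))
    ]
```

    Precomputing the normal and storing it next to the height and color means the consumer of the stream does not need neighbouring samples in its limited local memory.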

    Hiding memory latency
    6.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US07620951B2

    Publication date: 2009-11-17

    Application No.: US12049293

    Filing date: 2008-03-15

    IPC classification: G06F9/46

    CPC classification: G06F9/322 G06F8/41 G06F9/3851

    Abstract: An approach to hiding memory latency in a multi-thread environment is presented. Branch Indirect and Set Link (BISL) and/or Branch Indirect and Set Link if External Data (BISLED) instructions are placed in thread code during compilation at instances that correspond to a prolonged instruction. A prolonged instruction is an instruction that instigates latency in a computer system, such as a DMA instruction. When a first thread encounters a BISL or a BISLED instruction, the first thread passes control to a second thread while the first thread's prolonged instruction executes. In turn, the computer system masks the latency of the first thread's prolonged instruction. The system can be optimized based on the memory latency by creating more threads and further dividing a register pool amongst the threads, which further hides memory latency in operations that are highly memory-bound.
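
    Illustrative sketch (not taken from the patent): the cooperative hand-off can be imitated with Python generators, where each `yield` stands in for the point at which a compiler would place a BISL-style branch after issuing a prolonged (for example DMA) instruction. No real DMA or SPU instruction is involved; the worker and scheduler names are hypothetical.

```python
from collections import deque


def worker(name, blocks, log):
    """A software thread that yields control after starting each transfer."""
    for block in blocks:
        log.append(f"{name}: start DMA for block {block}")
        yield  # hand control to another thread while the transfer is in flight
        log.append(f"{name}: process block {block}")


def run(threads):
    """Round-robin scheduler: resume each thread until all have finished."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)
            ready.append(thread)  # thread yielded; it still has work to do
        except StopIteration:
            pass  # thread finished
```

    Running two workers shows the interleaving: the second thread issues its transfer while the first thread's transfer would still be outstanding, so neither waits idle.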

    Adaptive span computation during ray casting
    7.
    Type: Granted patent
    Status: In force

    Publication No.: US07538767B2

    Publication date: 2009-05-26

    Application No.: US12037372

    Filing date: 2008-02-26

    IPC classification: G06T15/50

    CPC classification: G06T15/20

    Abstract: Adaptive span computation during ray casting is presented. A processor uses start point fractional values during view screen segment computations so that each view screen segment's computations start a particular distance away from a down point. This prevents an excessive sampling density during image generation without wasting processor resources. The processor identifies a start point fractional value for each view screen segment based upon each view screen segment's identifier, and computes a view screen segment start point for each view screen segment using the start point fractional value. View screen segment start points are "tiered" and are a particular distance away from the down point. This stops the view screen segments from converging to a point of severe oversampling while, at the same time, providing a pseudo-uniform sampling density.

    Ray tracing with depth buffered display
    8.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US07439973B2

    Publication date: 2008-10-21

    Application No.: US11201651

    Filing date: 2005-08-11

    IPC classification: G06T15/40

    Abstract: An image that includes ray traced pixel data and rasterized pixel data is generated. A synergistic processing unit (SPU) uses a rendering algorithm to generate ray traced data for objects that require high-quality image rendering. The ray traced data is fragmented, whereby each fragment includes a ray traced pixel depth value and a ray traced pixel color value. A rasterizer compares ray traced pixel depth values to corresponding rasterized pixel depth values, and overwrites ray traced pixel data with rasterized pixel data when the corresponding rasterized fragment is "closer" to a viewing point, which results in composite data. A display subsystem uses the resultant composite data to generate an image on a user's display.
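
    Illustrative sketch (not taken from the patent): the per-pixel depth comparison described in the abstract, assuming each buffer is a list of `(depth, color)` fragments in the same pixel order and that a smaller depth value means closer to the viewing point.

```python
def composite(ray_traced, rasterized):
    """Merge two fragment buffers, keeping the closer fragment per pixel.

    Both inputs are lists of (depth, color) in the same pixel order.
    Assumption: a smaller depth value is closer to the viewing point.
    """
    result = []
    for (rt_depth, rt_color), (ra_depth, ra_color) in zip(ray_traced, rasterized):
        if ra_depth < rt_depth:
            # The rasterized fragment is closer: it overwrites the ray traced data.
            result.append((ra_depth, ra_color))
        else:
            result.append((rt_depth, rt_color))
    return result
```

    On equal depth this sketch keeps the ray traced fragment; the abstract does not say how ties are resolved.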

    Virtual Devices Using a Plurality of Processors
    9.
    Type: Patent application
    Status: Lapsed

    Publication No.: US20080168443A1

    Publication date: 2008-07-10

    Application No.: US12049179

    Filing date: 2008-03-14

    IPC classification: G06F15/76 G06F9/46 G06F9/30

    CPC classification: G06F9/4843 G06F9/544

    Abstract: An approach is provided to allow virtual devices that use a plurality of processors in a multiprocessor system, such as the BE environment. Using this method, a synergistic processing unit (SPU) can either be dedicated to performing a particular function (e.g., audio or video), or a single SPU can be programmed to perform several functions on behalf of the other processors in the system. The application, preferably running in one of the primary (PU) processors, issues IOCTL commands through device drivers that correspond to SPUs. The kernel managing the primary processors responds by sending an appropriate message to the SPU that is performing the dedicated function. Using this method, an SPU can be virtualized for swapping multiple tasks or dedicated to performing a particular task.

    System and method for terrain rendering using a limited memory footprint
    10.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US07212199B2

    Publication date: 2007-05-01

    Application No.: US10875946

    Filing date: 2004-06-24

    IPC classification: G06T15/00

    CPC classification: G06T17/05 G06T15/06

    Abstract: A system and method for terrain rendering using a limited memory footprint is presented. The system and method perform vertical ray terrain rendering by using a terrain data subset for image point value calculations. Terrain data is segmented into terrain data subsets, and the subsets are processed in parallel. A bottom view ray intersects the terrain data to provide a memory footprint starting point. In addition, environmental visibility settings provide a memory footprint ending point. The memory footprint starting point, the memory footprint ending point, and vertical ray adjacent data points define a terrain data subset that corresponds to a particular vertical ray. The terrain data subset includes height and color information, which is used for vertical ray coherence terrain rendering.
