Abstract:
A method and apparatus for enhancing/extending a serial point-to-point interconnect architecture, such as Peripheral Component Interconnect Express (PCIe), is herein described. Temporal and locality caching hints and prefetching hints are provided to improve system-wide caching and prefetching. Message codes for atomic operations to arbitrate ownership between system devices/resources are included to allow efficient access/ownership of shared data. Loose transaction ordering is provided for while maintaining corresponding transaction priority to memory locations to ensure data integrity and efficient memory access. Active power sub-states and the setting thereof are included to allow for more efficient power management. In addition, caching of device local memory in a host address space, as well as caching of system memory in a device local memory address space, is provided for to improve bandwidth and latency for memory accesses.
Abstract:
Embodiments of the invention provide language support for CPU-GPU platforms. In one embodiment, code can be flexibly executed on both the CPU and GPU. CPU code can offload a kernel to the GPU. That kernel may in turn call preexisting libraries on the CPU, or make other calls into CPU functions. This allows an application to be built without requiring the entire call chain to be recompiled. Additionally, in one embodiment data may be shared seamlessly between CPU and GPU. This includes sharing objects that may have virtual functions. Embodiments thus ensure the right virtual function gets invoked on the CPU or the GPU if a virtual function is called by either the CPU or GPU.
Abstract:
In one embodiment, the present invention includes a method including initiating a cleaning operation to clear a first processor core of a system of pending operations, and preventing injection of new events into a second processor core if the cleaning operation is not serviced in the first processor core. In this way, lock situations may be broken without their detection. Other embodiments are described and claimed.
Abstract:
Methods, apparatus, and articles of manufacture control a device or system that has an operational limit related to the rate or frequency of operation. The frequency of operation is controlled at a variable rate calculated to maximize the system or apparatus performance over a calculated period of time short enough that a controlling factor, such as power consumption, does not vary significantly during the period. Known system parameters, such as thermal resistance and capacitance of an integrated circuit (IC) and its package, and measured values, such as current junction temperature in an IC, are used to calculate a time-dependent frequency of operation for the upcoming time period that results in the best overall performance without exceeding the operational limit, such as the junction temperature.
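The calculation described above can be illustrated with a minimal sketch. It assumes a first-order thermal RC model and a frequency held constant over each period, which is a simplification of the time-dependent frequency the abstract mentions. It also assumes that power scales linearly with frequency. The function names, the `watts_per_hz` coefficient, and the clamping to `f_min`/`f_max` are hypothetical and are not taken from the abstract.

```python
import math


def max_frequency_for_period(t_now, t_limit, t_ambient, r_th, c_th, dt,
                             watts_per_hz, f_min, f_max):
    """Pick the highest constant frequency for the next period of length dt
    (dt > 0) whose end-of-period junction temperature stays at or below t_limit.

    Assumed model (not from the abstract): first-order thermal RC response,
    T(dt) = T_ss + (T_now - T_ss) * exp(-dt / (R * C)), with T_ss = T_amb + P * R,
    and power proportional to frequency, P = watts_per_hz * f.
    """
    decay = math.exp(-dt / (r_th * c_th))
    # Solve T(dt) = t_limit for the power P.
    p_max = (t_limit - t_ambient - (t_now - t_ambient) * decay) / (r_th * (1.0 - decay))
    f = p_max / watts_per_hz
    # Clamp to the supported range; if f_min wins, the limit may still be exceeded.
    return max(f_min, min(f_max, f))


def temperature_after(t_now, t_ambient, r_th, c_th, dt, power):
    """Junction temperature after dt seconds at constant power (same RC model)."""
    t_steady = t_ambient + power * r_th
    return t_steady + (t_now - t_steady) * math.exp(-dt / (r_th * c_th))
```

Calling `max_frequency_for_period` once per period with the latest measured junction temperature gives a frequency that rises while the part is cool and falls as it approaches the limit. A shorter `dt` keeps the constant-power assumption closer to reality.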
Abstract:
Embodiments of the invention provide a programming model for CPU-GPU platforms. In particular, embodiments of the invention provide a uniform programming model for both integrated and discrete devices. The model also works uniformly for multiple GPU cards and hybrid GPU systems (discrete and integrated). This allows software vendors to write a single application stack and target it to all the different platforms. Additionally, embodiments of the invention provide a shared memory model between the CPU and GPU. Instead of sharing the entire virtual address space, only a part of the virtual address space needs to be shared. This allows efficient implementation in both discrete and integrated settings.
Abstract:
A coprocessor performs an overhead function of a Java virtual machine executing in a main processor. The coprocessor includes memory access circuitry configured to access a memory also accessible by the main processor. Pointer receiving circuitry is configured to receive at least one pointer to data in the memory pertinent to the overhead function. Function performing circuitry is configured to perform the overhead function to operate on the data in the memory pointed to by the at least one pointer. Result passing circuitry is configured to pass a result back to the main processor. For example, overhead functions that may be performed by the coprocessor include bytecode verification, just-in-time compiling and garbage collection.
Abstract:
A processor and associated memory device are described, the processor including a fetcher for fetching instructions stored in the memory device. Each instruction constitutes either a value generating instruction or a non-value generating instruction. The processor further includes a decoder for decoding the instructions and an issue unit for routing decoded instructions to an execution unit. The processor further has a predictor responsive to a first set of instructions, from among the value generating instructions, for predicting, with respect to each instruction in said first set of instructions, a predicted value that is determined on the basis of a prediction criterion which includes: (i) a previous value generated by the instruction; and (ii) a stride.
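The prediction criterion above (previous value plus a stride) can be illustrated with a minimal software sketch. The table keyed by instruction address, the class names, and the rule that the stride is the difference between the two most recent values are all assumptions made for illustration. The abstract does not say how the stride is obtained or how the predictor is organized in hardware.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    last_value: int   # previous value generated by the instruction
    stride: int = 0   # assumed: difference between the two most recent values


class StridePredictor:
    """Hypothetical per-instruction value predictor: last value + stride."""

    def __init__(self):
        self.table = {}  # instruction address -> StrideEntry

    def predict(self, pc):
        """Return the predicted value, or None if the instruction is unseen."""
        entry = self.table.get(pc)
        if entry is None:
            return None
        return entry.last_value + entry.stride

    def update(self, pc, actual):
        """Record the value the instruction actually generated."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_value=actual)
        else:
            entry.stride = actual - entry.last_value
            entry.last_value = actual
```

For an instruction that generates 8, 12, 16 on successive executions, the sketch predicts 16 after seeing 8 and 12, and 20 after seeing 16.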
Abstract:
In one embodiment, the present invention includes a multicore processor having first and second cores to independently execute instructions, the first core visible to an operating system (OS) and the second core transparent to the OS and heterogeneous from the first core. A task controller, which may be included in or coupled to the multicore processor, can dynamically migrate a first process, scheduled by the OS to the first core, to the second core transparently to the OS. Other embodiments are described and claimed.
Abstract:
A system for delivering ultrasonic focused energy, the system comprising a transducer unit for delivering the ultrasonic focused energy, the transducer unit comprising an interface medium adapted to contact at least a portion of a treatment region in a treatment area, and further comprising an electromagnetic (EM) radiating element adapted to transmit EM radiation towards the interface medium, wherein a reflection of the EM radiation from the interface medium is indicative of an extent of acoustic contact; and an energy processing unit adapted to send electrical energy to the transducer unit.