Abstract:
Embodiments are generally directed to compression in machine learning and deep learning processing. An embodiment of an apparatus for compression of untyped data includes a graphics processing unit (GPU) including a data compression pipeline, the data compression pipeline including a data port coupled with one or more shader cores, wherein the data port is to allow transfer of untyped data without format conversion, and a 3D compression/decompression unit to provide for compression of untyped data to be stored to a memory subsystem and decompression of untyped data from the memory subsystem.
Abstract:
Some embodiments include apparatus and methods having an input to receive an input signal, additional inputs to receive clock signals having different phases to sample the input signal, and a decision feedback equalizer (DFE) having DFE slices. The DFE slices include a number of data comparators to provide data information based on the sampling of the input signal, and a number of phase error comparators to provide phase error information associated with the sampling of the input signal. The number of phase error comparators of the DFE slices is not greater than the number of data comparators of the DFE slices.
Abstract:
Implementations of the disclosure provide a processing device comprising a branch predictor circuit to obtain a branch history for an application. The branch history comprises references to branching instructions associated with the application and an outcome of executing each branch. Using the branch history, a neural network is trained to produce a weighted value for each branch of the branching instructions. Features of the branching instructions are identified based on the weighted values, each feature identifying predictive information regarding the outcome of at least one branch of correlated branches having corresponding outcomes. A feature vector is determined based on the features. The feature vector comprises a plurality of data fields that identify an occurrence of a corresponding feature of the correlated branches with respect to the branch history. Using the feature vector, a data model is produced to determine a predicted outcome associated with the correlated branches.
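The feature vector described in this abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a history entry is a (branch address, taken) pair and that a feature is one such pair already identified as predictive; the names `HistoryEntry` and `feature_vector` are hypothetical and do not come from the disclosure.

```python
from typing import List, Tuple

# Hypothetical representation: a history entry is (branch_address, taken).
HistoryEntry = Tuple[int, bool]


def feature_vector(history: List[HistoryEntry],
                   features: List[HistoryEntry]) -> List[int]:
    """Return one data field per feature.

    A field is 1 if that (branch, outcome) pair occurs anywhere in the
    branch history, and 0 otherwise.
    """
    seen = set(history)
    return [1 if feature in seen else 0 for feature in features]


# Illustrative data only: three executed branches and three candidate features.
history = [(0x400, True), (0x408, False), (0x410, True)]
features = [(0x400, True), (0x408, True), (0x410, True)]
vec = feature_vector(history, features)
```

A vector of this kind could then be passed to whatever data model produces the predicted outcome; the abstract does not specify that model, so none is sketched here.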
Abstract:
An embodiment of a semiconductor package apparatus may include technology to process one or more vectors with a sum of squares operation with a layer of a multi-layer neural network, and determine a fixed-point approximation for the sum of squares operation. Other embodiments are disclosed and claimed.
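As a rough illustration of a fixed-point approximation for a sum of squares, the following Python sketch quantizes each input to a signed fixed-point integer, accumulates the integer squares, and rescales the result. The Q-format with `frac_bits` fractional bits and the function names are assumptions made for this example, not details taken from the abstract.

```python
from typing import Iterable


def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize x to an integer with `frac_bits` fractional bits (assumed format)."""
    return int(round(x * (1 << frac_bits)))


def sum_of_squares_fixed(values: Iterable[float], frac_bits: int = 8) -> float:
    """Approximate sum(v * v) using integer arithmetic only in the inner loop."""
    acc = 0
    for v in values:
        q = to_fixed(v, frac_bits)
        acc += q * q  # each product carries 2 * frac_bits fractional bits
    # Convert the accumulator back to a real value for comparison.
    return acc / float(1 << (2 * frac_bits))


approx = sum_of_squares_fixed([0.1, 0.2, 0.3])
exact = sum(v * v for v in [0.1, 0.2, 0.3])
```

Inputs that are exactly representable in the chosen format, such as 0.5 or 1.5 with 8 fractional bits, give an exact result; other inputs incur a quantization error that shrinks as `frac_bits` grows.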
Abstract:
A processor includes a front end to decode an instruction, an allocator to pass the instruction to a nearest neighbor logic unit (NNLU) to execute the instruction, and a retirement unit to retire the instruction. The NNLU includes logic to determine input of the instruction for which nearest neighbors will be calculated, transform the input, retrieve candidate atoms for which the nearest neighbors will be calculated, compute distance between the candidate atoms and the input, and determine the nearest neighbors for the input based upon the computed distance.
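The steps attributed to the NNLU (transform the input, retrieve candidate atoms, compute distances, select the nearest) can be sketched in software as follows. This is a minimal sketch that assumes Euclidean distance and an optional caller-supplied transform; the abstract names neither, and the function `nearest_neighbors` is hypothetical.

```python
import heapq
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def nearest_neighbors(query: Vector,
                      atoms: List[Vector],
                      k: int = 2,
                      transform: Optional[Callable[[Vector], Vector]] = None
                      ) -> List[int]:
    """Return the indices of the k candidate atoms closest to the query."""
    # Step 1: transform the input (identity if no transform is given).
    q = transform(query) if transform else query
    # Step 2: compute the distance from the input to every candidate atom.
    scored = [(math.dist(q, atom), idx) for idx, atom in enumerate(atoms)]
    # Step 3: keep the k smallest distances, nearest first.
    return [idx for _, idx in heapq.nsmallest(k, scored)]


# Illustrative data only.
atoms = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
result = nearest_neighbors((0.9, 0.9), atoms, k=2)
```

A hardware unit would perform these steps in dedicated logic; the sketch only mirrors the order of operations stated in the abstract.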