-
Publication No.: US11100390B2
Publication Date: 2021-08-24
Application No.: US15950550
Filing Date: 2018-04-11
Inventors: Chad Balling McBride, Amol Ashok Ambardekar, Kent D. Cedola, George Petre, Larry Marvin Wall, Boris Bobrov
IPC Classes: G06N3/063, G06N3/04, G06N3/06, G06F9/30, G06F9/38, G06F12/0862, G06F9/46, G06F1/324, G06F3/06, G06F12/08, G06F12/10, G06F15/80, G06F17/15, G06N3/08, G06N3/10, H03M7/30, H04L12/715, H04L29/08, G06F13/16, G06F1/3234, G06F12/02, G06F13/28, H03M7/46, H04L12/723
Abstract: A deep neural network (DNN) processor is configured to execute layer descriptors in layer descriptor lists. The descriptors define instructions for performing a forward pass of a DNN by the DNN processor. The layer descriptors can also be utilized to manage the flow of descriptors through the DNN module. For example, layer descriptors can define dependencies upon other descriptors. Descriptors defining a dependency will not execute until the descriptors upon which they depend have completed. Layer descriptors can also define a “fence,” or barrier, function that can be used to prevent the processing of upstream layer descriptors until the processing of all downstream layer descriptors is complete. The fence bit guarantees that no other layer descriptors remain in the DNN processing pipeline before the layer descriptor with the fence bit asserted is processed.
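A minimal Python sketch of the scheduling behavior described above: descriptors are issued in list order, a descriptor with dependencies blocks until those descriptors complete, and a fenced descriptor drains the pipeline first. The LayerDescriptor fields and the start/wait_for callables are illustrative assumptions, not the patented hardware interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

@dataclass
class LayerDescriptor:
    layer_id: int
    depends_on: Set[int] = field(default_factory=set)  # descriptor ids to wait on
    fence: bool = False                                 # barrier: drain the pipeline first

def run_descriptor_list(descriptors: List[LayerDescriptor],
                        start: Callable[[LayerDescriptor], None],
                        wait_for: Callable[[Set[int]], None]) -> None:
    """Issue descriptors in list order, honoring dependencies and fences (sketch)."""
    in_flight: Set[int] = set()
    for desc in descriptors:
        if desc.fence:
            # Fence: no other descriptor may remain in the pipeline.
            wait_for(set(in_flight))
            in_flight.clear()
        pending = desc.depends_on & in_flight
        if pending:
            # Dependency: block until the named descriptors have completed.
            wait_for(pending)
            in_flight -= pending
        start(desc)
        in_flight.add(desc.layer_id)
```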
-
Publication No.: US09619488B2
Publication Date: 2017-04-11
Application No.: US14163999
Filing Date: 2014-01-24
Inventors: Amol Ashok Ambardekar, Christopher Leonard Huybregts, Larry Wall, Damoun Houshangi, Hrishikesh Pathak
CPC Classes: G06F17/30256, G06F17/30144, G06F17/30247, G06F17/3053, G06F17/30598, G06F17/3079, G06F17/3087, G06K9/00671, G06K9/6202, G06K9/6227, G06K2209/27
Abstract: A computing device having adaptable image search and methods for operating an image recognition program on the computing device are disclosed herein. An image recognition program may receive a query from a user and a target image within which a search based on the query is to be performed using one or more of a plurality of locally stored image recognition models that are determined to be able to perform the search with sufficiently high confidence. The query may comprise text that is typed or converted from speech. The image recognition program performs the search within the target image for a target region using at least one selected, locally stored image recognition model, and returns a search result to the user.
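One way the model-selection step could look in Python; the confidence_for and locate methods and the min_confidence threshold are hypothetical names used only for illustration, not an API from the patent.

```python
def search_image(query: str, target_image, models, min_confidence: float = 0.8):
    """Pick locally stored recognition models confident enough for the query,
    then search the target image for a matching target region (sketch)."""
    # Each model reports how confidently it can handle this query (hypothetical API).
    capable = [m for m in models if m.confidence_for(query) >= min_confidence]
    if not capable:
        return None  # no local model is confident enough for this query
    best = max(capable, key=lambda m: m.confidence_for(query))
    region = best.locate(query, target_image)  # bounding box of the target region
    return {"model": best.name, "region": region}
```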
-
Publication No.: US11182667B2
Publication Date: 2021-11-23
Application No.: US15813952
Filing Date: 2017-11-15
Inventors: George Petre, Chad Balling McBride, Amol Ashok Ambardekar, Kent D. Cedola, Larry Marvin Wall, Boris Bobrov
IPC Classes: G06N3/06, G06N3/10, G06N3/04, G06F9/38, G06N3/063, G06F12/0862, G06F9/46, G06F1/324, G06F3/06, G06F12/08, G06F12/10, G06F15/80, G06F17/15, G06N3/08, H03M7/30, H04L12/715, H04L29/08, G06F9/30, G06F13/16, G06F1/3234, G06F12/02, G06F13/28, H03M7/46, H04L12/723
Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as by the management of data among the various memory components of the NN/DNN. By inserting selected padding into the input data to align the input data in memory, data reads/writes can be optimized for processing by the NN/DNN, thereby enhancing the overall performance of the NN/DNN. Operatively, an operations controller/iterator can generate one or more instructions that insert the selected padding into the data. The data padding can be calculated using various characteristics of the input data and of the NN/DNN, as well as characteristics of the cooperating memory components. Padding of the output data can also be utilized to support data alignment at the memory components and the cooperating processing units of the NN/DNN.
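A small sketch of the padding computation the abstract describes: the amount of padding is derived from the input's row size and the memory line width so that each row starts on a line boundary. The 64-byte line width, the row-major layout, and the assumption that the element size divides the line size are all illustrative.

```python
import numpy as np

def pad_rows_for_alignment(data: np.ndarray, line_bytes: int = 64) -> np.ndarray:
    """Pad each row so the next row starts on a memory-line boundary, allowing
    aligned, full-line reads and writes by the iterator (sketch)."""
    elem_bytes = data.dtype.itemsize              # assumes elem_bytes divides line_bytes
    row_bytes = data.shape[1] * elem_bytes
    pad_bytes = (-row_bytes) % line_bytes         # bytes needed to reach the boundary
    pad_elems = pad_bytes // elem_bytes
    return np.pad(data, ((0, 0), (0, pad_elems)), mode="constant")
```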
-
Publication No.: US10996739B2
Publication Date: 2021-05-04
Application No.: US15847785
Filing Date: 2017-12-19
Inventors: Amol Ashok Ambardekar, Chad Balling McBride, George Petre, Kent D. Cedola, Larry Marvin Wall
Abstract: Techniques provide for improved (i.e., reduced) power consumption in an exemplary neural network (NN) and/or deep neural network (DNN) environment using data management. Improved power consumption in the NN/DNN may be achieved by reducing the number of bit flips needed to process operands associated with one or more storages. Reducing the number of bit flips associated with the NN/DNN may be achieved by multiplying an operand associated with a first storage with a plurality of individual operands associated with a plurality of kernels of the NN/DNN. The operand associated with the first storage may be neuron input data, and the plurality of individual operands associated with a second storage may be weight values for multiplication with the neuron input data. The plurality of kernels may be arranged or sorted and subsequently processed in a manner that improves power consumption in the NN/DNN.
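The sorting idea can be illustrated with a short Python sketch that greedily orders kernels so that consecutive weight words differ in as few bits as possible, which is one way bit flips on a weight path could be reduced. The integer-word representation of kernels and the greedy nearest-neighbor ordering are illustrative assumptions, not the ordering the patent specifies.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer weight words."""
    return bin(a ^ b).count("1")

def sort_kernels_for_low_bit_flips(kernels):
    """Greedily order kernels so consecutive weight words toggle few bits (sketch).
    Each kernel is assumed to be a list of integer weight words."""
    if not kernels:
        return []

    def transition_cost(prev, nxt):
        return sum(hamming(p, n) for p, n in zip(prev, nxt))

    remaining = list(kernels)
    ordered = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda k: transition_cost(ordered[-1], k))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered
```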
-
Publication No.: US10963403B2
Publication Date: 2021-03-30
Application No.: US15829832
Filing Date: 2017-12-01
Inventors: George Petre, Chad Balling McBride, Amol Ashok Ambardekar, Kent D. Cedola, Boris Bobrov, Larry Marvin Wall
IPC Classes: G06N3/063, G06N3/02, G06F13/16, G06N3/04, G06F12/0862, G06F9/46, G06F1/324, G06F3/06, G06F9/38, G06F12/08, G06F12/10, G06F15/80, G06F17/15, G06N3/06, G06N3/08, G06N3/10, H03M7/30, H04L12/715, H04L29/08, G06F1/3234, G06F12/02, G06F13/28, H03M7/46, H04L12/723
Abstract: The performance of a neural network (NN) can be limited by the number of operations being performed. Using a line buffer that is directed to shift a memory block by a selected shift stride for cooperating neurons, data operatively residing in memory that would otherwise require multiple write cycles into a cooperating line buffer can be processed in a single line buffer write cycle, thereby enhancing the performance of the NN/DNN. A controller and/or iterator can generate one or more instructions having the memory block shifting values for communication to the line buffer. The shifting values can be calculated using various characteristics of the input data as well as of the NN/DNN, inclusive of the data dimensions. The line buffer can read data for processing, shift the data of the memory block, and write the data in the line buffer for subsequent processing.
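A rough Python sketch of the shift-stride idea: one read of the memory block is turned into per-neuron slices offset by multiples of the shift stride, so the line buffer is populated in a single pass rather than one write cycle per neuron. The function name, the window parameter, and the list-of-slices representation are illustrative assumptions.

```python
def fill_line_buffer(memory_block, shift_stride: int, num_neurons: int, window: int):
    """Populate a line buffer in one pass: each cooperating neuron's slice is the
    memory block shifted by its own multiple of the shift stride (sketch)."""
    line_buffer = []
    for n in range(num_neurons):
        start = n * shift_stride
        line_buffer.append(memory_block[start:start + window])
    return line_buffer

# Example: a 1-D block, stride 2, window 4, for 3 cooperating neurons.
rows = fill_line_buffer(list(range(12)), shift_stride=2, num_neurons=3, window=4)
# rows == [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```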
-
Publication No.: US11526581B2
Publication Date: 2022-12-13
Application No.: US17085337
Filing Date: 2020-10-30
IPC Classes: G06F17/16
Abstract: A method of performing matrix computations includes receiving a compression-encoded matrix including a plurality of rows. Each row of the compression-encoded matrix has a plurality of defined element values and, for each such defined element value, a schedule tag indicating a schedule for using the defined element value in a scheduled matrix computation. The method further includes loading the plurality of rows of the compression-encoded matrix into a corresponding plurality of work memory banks, and providing decoded input data to a matrix computation module configured for performing the scheduled matrix computation. For each work memory bank, a next defined element value and a corresponding schedule tag are read. If the schedule tag meets a scheduling condition, the next defined element value is provided to the matrix computation module. Otherwise, a default element value is provided to the matrix computation module.
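A compact Python sketch of the decode loop described above: each bank holds (value, schedule_tag) pairs for one row, and at every schedule step a bank either supplies its next defined value (when the tag matches) or the default element. The step-number-equals-tag scheduling condition and the generator framing are simplifying assumptions.

```python
from collections import deque

def feed_matrix_computation(banks, num_steps, default=0.0):
    """Yield one decoded input vector per schedule step from compression-encoded
    rows stored as (value, schedule_tag) pairs in per-row work memory banks (sketch)."""
    queues = [deque(bank) for bank in banks]
    for step in range(num_steps):
        decoded = []
        for q in queues:
            if q and q[0][1] == step:        # schedule tag meets the scheduling condition
                value, _tag = q.popleft()
                decoded.append(value)
            else:
                decoded.append(default)      # skipped position: provide the default element
        yield decoded                        # fed to the matrix computation module
```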
-
Publication No.: US11256976B2
Publication Date: 2022-02-22
Application No.: US15719351
Filing Date: 2017-09-28
Inventors: Kent D. Cedola, Larry Marvin Wall, Boris Bobrov, George Petre, Chad Balling McBride, Amol Ashok Ambardekar
IPC Classes: G06N3/063, G06N3/04, G06F12/0862, G06F9/46, G06F1/324, G06F3/06, G06F9/38, G06F12/08, G06F12/10, G06F15/80, G06F17/15, G06N3/06, G06N3/08, G06N3/10, H03M7/30, H04L45/00, H04L67/02, H04L67/1001, G06F9/30, G06F13/16, G06F1/3234, G06F12/02, G06F13/28, H03M7/46, H04L45/50
Abstract: Optimized memory usage and management is crucial to the overall performance of a neural network (NN) or deep neural network (DNN) computing environment. Using various characteristics of the input data dimensions, an apportionment sequence is calculated for the input data to be processed by the NN or DNN that optimizes use of the local and external memory components. The apportionment sequence can describe how to parcel the input data (and its associated processing parameters, e.g., processing weights) into one or more portions, as well as how such portions of input data (and their associated processing parameters) are passed between the local memory, external memory, and processing unit components of the NN or DNN. Additionally, the apportionment sequence can include instructions to store generated output data in the local and/or external memory components so as to optimize their use.
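A simple Python sketch of what an apportionment sequence could look like as data: the input is parceled into portions sized to local memory, and each step records where its input, weights, and output live. The dictionary schema, the rows_per_portion parameter, and the output_fits_locally flag are illustrative choices, not the patent's representation.

```python
def apportion(input_rows: int, rows_per_portion: int, output_fits_locally: bool):
    """Describe an apportionment sequence: how each input portion (and its weights)
    moves between external memory, local memory, and the processing units, and
    where that portion's output is stored (sketch)."""
    schedule = []
    for start in range(0, input_rows, rows_per_portion):
        stop = min(start + rows_per_portion, input_rows)
        schedule.append({
            "load_input": ("external->local", (start, stop)),   # rows for this portion
            "load_weights": ("external->local", "all"),          # parameters reused each step
            "store_output": "local" if output_fits_locally else "external",
        })
    return schedule
```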
-
Publication No.: US11176448B2
Publication Date: 2021-11-16
Application No.: US16843800
Filing Date: 2020-04-08
Inventors: Chad Balling McBride, Timothy Hume Heil, Amol Ashok Ambardekar, George Petre, Kent D. Cedola, Larry Marvin Wall, Boris Bobrov
IPC Classes: G06F1/32, G06F9/46, G06F17/15, G06N3/04, G06N3/06, G06N3/08, G06N3/063, G06F12/0862, G06F1/324, G06F3/06, G06F9/38, G06F12/08, G06F12/10, G06F15/80, G06N3/10, H03M7/30, H04L12/715, H04L29/08, G06F9/30, G06F13/16, G06F1/3234, G06F12/02, G06F13/28, H03M7/46, H04L12/723
Abstract: An exemplary computing environment having a DNN module can maintain one or more bandwidth throttling mechanisms. Illustratively, a first throttling mechanism can specify the number of cycles to wait between transactions on a cooperating fabric component (e.g., a data bus). Illustratively, a second throttling mechanism can be a transaction count limiter that operatively sets a threshold on the number of transactions to be processed during a given transaction sequence and limits the number of transactions, such as multiple transactions in flight, so that it does not exceed the set threshold. In an illustrative operation, applying these two exemplary calculated throttling parameters limits both the average and the peak bandwidth usage. Operatively, with this fabric bandwidth control, the processing units of the DNN are optimized to process data across each transaction cycle, resulting in enhanced processing and lower power consumption.
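The two throttles can be pictured with a short Python sketch: a cooldown of wait cycles between issued transactions and a cap on transactions in flight. The class name, method names, and cycle-based simulation style are illustrative assumptions rather than the DNN module's actual control interface.

```python
class FabricThrottle:
    """Two throttling mechanisms: a minimum gap (in cycles) between fabric
    transactions and a cap on transactions in flight (sketch)."""

    def __init__(self, wait_cycles: int, max_in_flight: int):
        self.wait_cycles = wait_cycles
        self.max_in_flight = max_in_flight
        self.cooldown = 0        # cycles remaining before the next issue is allowed
        self.in_flight = 0       # outstanding transactions in the current sequence

    def tick(self) -> None:
        """Advance the throttle by one cycle."""
        if self.cooldown > 0:
            self.cooldown -= 1

    def try_issue(self) -> bool:
        """Issue a transaction only if both throttles allow it."""
        if self.cooldown > 0 or self.in_flight >= self.max_in_flight:
            return False
        self.cooldown = self.wait_cycles
        self.in_flight += 1
        return True

    def complete(self) -> None:
        """Mark one outstanding transaction as finished."""
        self.in_flight = max(0, self.in_flight - 1)
```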
-
Publication No.: US11528033B2
Publication Date: 2022-12-13
Application No.: US15953356
Filing Date: 2018-04-13
Inventors: Joseph Leon Corkery, Benjamin Eliot Lundell, Larry Marvin Wall, Chad Balling McBride, Amol Ashok Ambardekar, George Petre, Kent D. Cedola, Boris Bobrov
IPC Classes: H03M7/30, G06N3/04, G06N3/063, G06F12/0862, G06F9/46, G06F1/324, G06F3/06, G06F9/38, G06F12/08, G06F12/10, G06F15/80, G06F17/15, G06N3/06, G06N3/08, G06N3/10, H04L45/02, H04L67/02, G06F9/30, H04L67/1001, G06F9/48, G06F12/02, G06F13/16, G06F1/3234, G06F13/28, H03M7/46, H04L45/50
Abstract: A deep neural network (“DNN”) module compresses and decompresses neuron-generated activation data to reduce the utilization of memory bus bandwidth. The compression unit receives an uncompressed chunk of data generated by a neuron in the DNN module. The compression unit generates a mask portion and a data portion of a compressed output chunk. The mask portion encodes the presence and location of the zero and non-zero bytes in the uncompressed chunk of data. The data portion stores truncated non-zero bytes from the uncompressed chunk of data. A decompression unit receives a compressed chunk of data from memory in the DNN processor or memory of an application host. The decompression unit decompresses the compressed chunk of data using the mask portion and the data portion.
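The mask/data scheme round-trips cleanly in a few lines of Python: the mask carries one bit per byte marking non-zero positions, and the data portion carries only the non-zero bytes (the additional truncation of stored bytes mentioned in the abstract is omitted here for simplicity).

```python
def compress_chunk(chunk: bytes):
    """Split a chunk into a mask portion (1 bit per byte, set for non-zero bytes)
    and a data portion holding only the non-zero bytes (sketch)."""
    mask = bytearray((len(chunk) + 7) // 8)
    data = bytearray()
    for i, b in enumerate(chunk):
        if b != 0:
            mask[i // 8] |= 1 << (i % 8)
            data.append(b)
    return bytes(mask), bytes(data)

def decompress_chunk(mask: bytes, data: bytes, length: int) -> bytes:
    """Rebuild the chunk: non-zero bytes come from the data portion, all other
    positions are restored as zero (sketch)."""
    out = bytearray(length)
    values = iter(data)
    for i in range(length):
        if mask[i // 8] & (1 << (i % 8)):
            out[i] = next(values)
    return bytes(out)

# Round-trip example on a sparse activation chunk.
original = bytes([0, 7, 0, 0, 3, 0, 0, 9])
m, d = compress_chunk(original)
assert decompress_chunk(m, d, len(original)) == original
```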
-
Publication No.: US11514648B2
Publication Date: 2022-11-29
Application No.: US17133493
Filing Date: 2020-12-23
Abstract: An image data annotation system automatically annotates a physical object within individual image frames of an image sequence with relevant object annotations based on a three-dimensional (3D) model of the physical object. Annotating the individual image frames with object annotations includes updating individual image frames within image input data to generate annotated image data that is suitable for reliably training a DNN object detection architecture. Exemplary object annotations that the image data annotation system can automatically apply to individual image frames include, inter alia, object pose, image pose, object masks, 3D bounding boxes composited over the physical object, 2D bounding boxes composited over the physical object, and/or depth map information. Annotating the individual image frames may be accomplished by aligning the 3D model of the physical object with a multi-view reconstruction of the physical object that is generated by inputting an image sequence into a Structure-from-Motion and/or Multi-view Stereo pipeline.
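Once the 3D model is aligned with the reconstruction, one common ingredient of such annotations is projecting the model's 3D bounding box into each frame with the recovered pose and camera intrinsics. The sketch below shows that projection step only; the pose/intrinsics conventions and the function name are assumptions for illustration, not the system's actual pipeline.

```python
import numpy as np

def project_3d_bbox(corners_obj: np.ndarray, pose: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project the 8 corners of an object-space 3D bounding box into the image and
    return a 2D box [xmin, ymin, xmax, ymax] around the projections (sketch).
    corners_obj: (8, 3) corners; pose: 4x4 object-to-camera transform; K: 3x3 intrinsics."""
    homog = np.hstack([corners_obj, np.ones((corners_obj.shape[0], 1))])  # (8, 4)
    cam = (pose @ homog.T)[:3]            # object space -> camera space, (3, 8)
    pix = K @ cam                         # camera space -> homogeneous pixel coords
    pix = (pix[:2] / pix[2]).T            # perspective divide -> (8, 2) pixel coords
    xmin, ymin = pix.min(axis=0)
    xmax, ymax = pix.max(axis=0)
    return np.array([xmin, ymin, xmax, ymax])
```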
-