DEBUGGING COMMUNICATION AMONG UNITS ON PROCESSOR SIMULATOR

    Publication No.: US20230185694A1

    Publication Date: 2023-06-15

    Application No.: US17547757

    Filing Date: 2021-12-10

    Inventor: Hiroshi Inoue

    IPC Classification: G06F11/36 G06F11/34 G06F9/30

    Abstract: A method is provided for identifying a data transfer mismatch between a sender and a receiver among the units of a software simulator of a hardware processor. The simulator runs a plurality of units that communicate with each other via First-In First-Out queues (FIFOs). The method counts the amount of data the sender writes to the FIFOs and the amount the receiver reads from the FIFOs for a given data transfer. The method avoids blocking during FIFO read and write operations by (i) having the receiver read dummy data when it tries to read from the FIFOs, even if they are empty, and (ii) discarding written data when the sender tries to write to the FIFOs while they are full. The method identifies mismatches between the amount of data the sender writes to the FIFOs and the amount of data the receiver reads from the FIFOs for the given data transfer.
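    The non-blocking FIFO behavior described above can be sketched as follows. This is a minimal illustration, not the patented implementation. The class `NonBlockingFifo`, the helper `check_transfer`, the dummy value `0` and the one-FIFO-per-transfer setup are all hypothetical. Reads on an empty FIFO return dummy data, writes to a full FIFO are dropped, and every attempt is counted so the sender and receiver totals can be compared.

```python
from collections import deque


class NonBlockingFifo:
    """Bounded FIFO whose reads and writes never block (illustrative sketch).

    A read from an empty FIFO returns dummy data and a write to a full FIFO
    is discarded, but every attempt is counted so the amounts written and
    read for one transfer can be compared afterwards.
    """

    DUMMY = 0  # assumed placeholder returned by reads on an empty FIFO

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.words_written = 0  # write attempts by the sender unit
        self.words_read = 0     # read attempts by the receiver unit

    def write(self, value):
        self.words_written += 1
        if len(self.queue) < self.capacity:
            self.queue.append(value)
        # Otherwise the FIFO is full: drop the value instead of blocking.

    def read(self):
        self.words_read += 1
        if self.queue:
            return self.queue.popleft()
        return self.DUMMY  # FIFO is empty: return dummy data instead of blocking


def check_transfer(fifo):
    """Return None if the counts match, else a description of the mismatch."""
    if fifo.words_written == fifo.words_read:
        return None
    return (f"mismatch: sender wrote {fifo.words_written}, "
            f"receiver read {fifo.words_read}")


# Simulated transfer: the sender emits 4 words but the receiver expects 3.
fifo = NonBlockingFifo(capacity=2)
for v in (1, 2, 3, 4):  # the last two writes are dropped, not blocked on
    fifo.write(v)
for _ in range(3):      # the third read returns dummy data
    fifo.read()
print(check_transfer(fifo))  # mismatch: sender wrote 4, receiver read 3
```

    Because neither side ever blocks, the simulation runs to completion even when the two units disagree. The mismatch then surfaces as a count difference rather than as a deadlock.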

    Multi-sample dropout for faster deep neural network training

    Publication No.: US11630988B2

    Publication Date: 2023-04-18

    Application No.: US16686565

    Filing Date: 2019-11-18

    Inventor: Hiroshi Inoue

    IPC Classification: G06N3/04 G06N3/08 G06N3/082

    Abstract: A computer-implemented method, a computer program product, and a computer system are provided for multi-sample dropout in deep neural network training. A computer creates multiple dropout samples in a minibatch, starting from a dropout layer and ending at a loss function layer in a deep neural network. At the dropout layer, the computer applies a separate random mask to each of the dropout samples. At a fully connected layer, the computer applies shared parameters to all of the dropout samples. After the loss function layer, the computer calculates a final loss value by averaging the loss values of the individual dropout samples.
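    The flow above (one mask per dropout sample, shared fully connected parameters, averaged loss) can be sketched in plain Python for a single training example. This is an illustration under stated assumptions, not the patented implementation. The function names, the inverted-dropout scaling and the choice of softmax cross-entropy as the loss are hypothetical, and a real implementation would batch this inside a deep learning framework.

```python
import math
import random


def fully_connected(x, weights, bias):
    """FC layer; weights[j][i] connects input i to output j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def cross_entropy(logits, label):
    """Softmax cross-entropy, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multi_sample_dropout_loss(features, label, weights, bias,
                              num_samples=4, drop_rate=0.5, rng=None):
    rng = rng or random.Random(0)
    keep = 1.0 - drop_rate
    losses = []
    for _ in range(num_samples):
        # Each dropout sample draws its own random mask (inverted-dropout scaling).
        mask = [1.0 / keep if rng.random() < keep else 0.0 for _ in features]
        dropped = [f * m for f, m in zip(features, mask)]
        # Every dropout sample goes through the same (shared) FC parameters.
        logits = fully_connected(dropped, weights, bias)
        losses.append(cross_entropy(logits, label))
    # Final loss: the average of the per-sample losses.
    return sum(losses) / len(losses)


w = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]  # 3 features -> 2 classes
b = [0.0, 0.0]
print(multi_sample_dropout_loss([1.0, 2.0, 3.0], label=1, weights=w, bias=b))
```

    Only the layers between the dropout layer and the loss are duplicated per sample, so the extra cost is limited to that tail of the network.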

    Sorting an array consisting of a large number of elements

    Publication No.: US11372929B2

    Publication Date: 2022-06-28

    Application No.: US16655288

    Filing Date: 2019-10-17

    Inventor: Hiroshi Inoue

    Abstract: Sorting an array consisting of a large number of elements. The present invention provides an apparatus for executing a multiway merging process, which generates one output sequence from N input sequences, on an array consisting of a large number of elements. The apparatus includes: an execution unit configured to execute, based on a plurality of input sequences, the multiway merging process on the N input sequences without rearranging the elements; and a generation unit configured to rearrange the elements constituting the input sequences according to the output sequence generated by the multiway merging process in the execution unit, so as to generate a sorted array of elements.
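    One possible reading of this two-step scheme is sketched below: merge first, move elements afterwards. The merge only records positions, and the elements are moved once, in a separate pass, according to that recorded order. The heap-based merge and the names `merge_indices` and `rearrange` are assumptions for illustration and are not taken from the patent. Each input sequence is assumed to be an already-sorted half-open range of the array.

```python
import heapq


def merge_indices(array, runs):
    """Multiway merge of N sorted runs that only produces an index order.

    `runs` is a list of (start, end) half-open ranges into `array`, each
    already sorted. No element of `array` is moved here.
    """
    heap = [(array[start], run_id, start)
            for run_id, (start, end) in enumerate(runs) if start < end]
    heapq.heapify(heap)
    order = []
    while heap:
        _, run_id, pos = heapq.heappop(heap)
        order.append(pos)  # record where the next smallest element lives
        nxt = pos + 1
        if nxt < runs[run_id][1]:
            heapq.heappush(heap, (array[nxt], run_id, nxt))
    return order


def rearrange(array, order):
    """Generation step: move each element exactly once, in merged order."""
    return [array[i] for i in order]


array = [1, 4, 7, 2, 5, 8, 3, 6, 9]  # three sorted runs of three
runs = [(0, 3), (3, 6), (6, 9)]
order = merge_indices(array, runs)
print(order)                    # [0, 3, 6, 1, 4, 7, 2, 5, 8]
print(rearrange(array, order))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```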

    EFFICIENT GRAPH OPTIMIZATION

    Publication No.: US20200081918A1

    Publication Date: 2020-03-12

    Application No.: US16685397

    Filing Date: 2019-11-15

    Inventor: Hiroshi Inoue

    IPC Classification: G06F16/901 G06F16/23

    Abstract: A method includes generating, using a processor, a graph including a plurality of nodes and a plurality of paths between the nodes, the graph representing a system comprising an arrangement of elements, and a data structure in which, for each node pair of the plurality of nodes, a count of the number of paths of length S between the node pair is stored in association with the length S. The method includes modifying the graph, using the processor, to obtain a modification of the graph related to the count of the number of paths of length S, and estimating, based on the data structure and the modification, an objective function that quantitatively represents the connections between the nodes and is indicative of the performance of the system.
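    The per-pair, per-length count table can be illustrated with adjacency-matrix powers. Entry (i, j) of A^S counts the walks of exactly S edges from i to j. Walks may revisit nodes, so this is an assumption about what "paths" means here. The weighted-sum `objective` is a hypothetical stand-in for the patent's objective function. The sketch also simply recomputes the table after a modification, whereas the abstract describes estimating the objective from the stored data structure.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def path_counts(adj, max_len):
    """counts[(i, j)][s] = number of length-s walks from node i to node j."""
    n = len(adj)
    counts = {(i, j): {} for i in range(n) for j in range(n)}
    power = [row[:] for row in adj]  # A^1
    for s in range(1, max_len + 1):
        for i in range(n):
            for j in range(n):
                counts[(i, j)][s] = power[i][j]
        power = matmul(power, adj)   # advance to A^(s+1)
    return counts


def objective(counts, weights):
    """Hypothetical objective: weighted sum of counts over all node pairs."""
    return sum(weights.get(s, 0) * c
               for per_pair in counts.values()
               for s, c in per_pair.items())


adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # directed chain 0 -> 1 -> 2
before = objective(path_counts(adj, 2), {1: 1.0, 2: 0.5})
adj[0][2] = 1                            # modification: add edge 0 -> 2
after = objective(path_counts(adj, 2), {1: 1.0, 2: 0.5})
print(before, after)                     # 2.5 3.5
```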

    OPTIMIZING MEMORY FENCES BASED ON WORKLOAD

    Publication No.: US20190227844A1

    Publication Date: 2019-07-25

    Application No.: US15877737

    Filing Date: 2018-01-23

    IPC Classification: G06F9/50

    Abstract: A method, computer program product, and apparatus for optimizing memory fences based on workload are provided. The method includes determining whether to execute a target program on a single hardware thread or on a plurality of hardware threads. The method also includes assigning either a light-weight memory fence or a heavy-weight memory fence as a memory fence in the target program, based on whether the target program is to be executed on the single hardware thread or on the plurality of hardware threads. The method further includes assigning the light-weight memory fence in response to determining to execute the target program on the single hardware thread, and assigning the heavy-weight memory fence in response to determining to execute the target program on the plurality of hardware threads.
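    The selection rule can be shown as a toy code transformation, since Python cannot emit real memory fences. The program is modeled as a list of operation strings with hypothetical `FENCE` placeholders. The mnemonics `fence.light` and `fence.heavy` are invented labels, not real instructions. Each placeholder is replaced according to the number of hardware threads the program will run on.

```python
from enum import Enum


class Fence(Enum):
    LIGHT = "fence.light"  # hypothetical mnemonic for a light-weight fence
    HEAVY = "fence.heavy"  # hypothetical mnemonic for a heavy-weight fence


def choose_fence(num_hw_threads: int) -> Fence:
    """Light-weight fence for one hardware thread, heavy-weight otherwise."""
    if num_hw_threads < 1:
        raise ValueError("need at least one hardware thread")
    return Fence.LIGHT if num_hw_threads == 1 else Fence.HEAVY


def assign_fences(program, num_hw_threads):
    """Replace every FENCE placeholder in a toy program with the chosen fence."""
    fence = choose_fence(num_hw_threads)
    return [fence.value if op == "FENCE" else op for op in program]


program = ["store x", "FENCE", "load y"]
print(assign_fences(program, 1))  # ['store x', 'fence.light', 'load y']
print(assign_fences(program, 8))  # ['store x', 'fence.heavy', 'load y']
```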

    DATA AUGMENTATION FOR IMAGE CLASSIFICATION TASKS

    Publication No.: US20190087695A1

    Publication Date: 2019-03-21

    Application No.: US15843687

    Filing Date: 2017-12-15

    Inventor: Hiroshi Inoue

    Abstract: A computer-implemented method and systems are provided for performing machine learning for an image classification task. The method includes selecting, by a processor operatively coupled to one or more databases, a first and a second image from one or more training sets in the one or more databases. The method further includes overlaying, by the processor, the second image on the first image to form a mixed image, by averaging the intensities of each of a plurality of co-located pixel pairs in the first and second images. The method also includes training, by the processor, a machine learning process configured for the image classification task, using the mixed image to augment the data used by the machine learning process for the image classification task.
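    The overlay step averages each pair of co-located pixels. A minimal sketch is shown below for single-channel images stored as nested lists. The helper names are hypothetical. The abstract does not say which label a mixed image receives, so keeping the first image's label is an assumption made here.

```python
import random


def mix_images(first, second):
    """Overlay `second` on `first` by averaging each co-located pixel pair."""
    if len(first) != len(second) or any(
            len(r1) != len(r2) for r1, r2 in zip(first, second)):
        raise ValueError("images must have the same shape")
    return [[(a + b) / 2 for a, b in zip(r1, r2)]
            for r1, r2 in zip(first, second)]


def augmented_batch(dataset, batch_size, rng):
    """Build (mixed_image, label) training pairs from random image pairs.

    Assumption: the mixed image keeps the label of the first image.
    """
    batch = []
    for _ in range(batch_size):
        (img_a, label_a), (img_b, _) = rng.sample(dataset, 2)
        batch.append((mix_images(img_a, img_b), label_a))
    return batch


print(mix_images([[0, 100]], [[50, 200]]))  # [[25.0, 150.0]]
```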

    Controlling priority of dynamic compilation

    Publication No.: US10127061B2

    Publication Date: 2018-11-13

    Application No.: US14832726

    Filing Date: 2015-08-21

    Abstract: A method for controlling the priority of dynamic compilation by a computer system is disclosed. A task is executed by using a thread pooled in a thread pool. A metric related to the dynamic compilation is monitored, and it is determined whether the metric meets a predetermined criterion. When the thread is returned to the thread pool for its next execution, its priority is lowered if the metric is determined to meet the predetermined criterion. The priority of the thread may be lowered by causing the thread to sleep for a period of time before it is returned. The metric may be the length of a compilation queue for the dynamic compilation or the utilization rate of a compiler thread executing the dynamic compilation.
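    A sketch of the sleep-before-return idea, using the compilation-queue length as the metric, is shown below. The threshold of 8, the 2 ms sleep and the hand-rolled worker loop are all assumptions. Python's standard thread pools expose no hook at the point where a thread returns to the pool, so the loop models that point explicitly. The compiler-thread utilization metric mentioned in the abstract could replace `compile_queue.qsize()` here.

```python
import queue
import threading
import time


def backoff_before_return(compile_queue_length, threshold=8, backoff=0.002):
    """Seconds a pooled thread should sleep before taking its next task.

    Assumed criterion: the compilation queue is longer than `threshold`.
    Sleeping yields the CPU, which acts as a temporary priority drop.
    """
    return backoff if compile_queue_length > threshold else 0.0


def pool_worker(tasks, compile_queue):
    """One pooled thread: run a task, then maybe sleep before returning."""
    while True:
        task = tasks.get()
        try:
            if task is None:  # sentinel: shut this worker down
                return
            task()
        finally:
            tasks.task_done()
        # Returning to the pool: back off while compilation is congested.
        delay = backoff_before_return(compile_queue.qsize())
        if delay:
            time.sleep(delay)


tasks = queue.Queue()
compile_queue = queue.Queue()
for i in range(10):  # pretend 10 methods are waiting to be compiled
    compile_queue.put(f"method{i}")
results = []
worker = threading.Thread(target=pool_worker, args=(tasks, compile_queue))
worker.start()
for n in range(3):
    tasks.put(lambda n=n: results.append(n * n))
tasks.put(None)
worker.join()
print(results)  # [0, 1, 4]
```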

    DETERMINING A METHOD TO INLINE USING AN ACTUAL FOOTPRINT CALCULATION

    Publication No.: US20140245274A1

    Publication Date: 2014-08-28

    Application No.: US14186136

    Filing Date: 2014-02-21

    IPC Classification: G06F9/45

    CPC Classification: G06F8/4443

    Abstract: Techniques for calculating the actual footprint of a computer-implemented method are disclosed. An example computer-implemented method includes a computer creating a map that indicates the code method to which each instruction in compiled code belongs. This computer-implemented method also includes the computer sampling executed instructions using a hardware performance counter. It further includes the computer using the map to map the sampled instructions to the code methods to which they belong. Finally, it includes the computer calculating the actual footprint of each code method as the total number of instructions belonging to that code method that were sampled at least once.
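    The map-then-count procedure can be sketched as follows. Compiled methods are described by hypothetical `(name, start_addr, end_addr)` ranges. A plain list of addresses stands in for samples from a hardware performance counter. A binary search over start addresses plays the role of the instruction-to-method map. Each method's footprint is the number of distinct instruction addresses sampled at least once.

```python
import bisect


def build_method_map(methods):
    """methods: list of (name, start_addr, end_addr) for compiled code."""
    ordered = sorted(methods, key=lambda m: m[1])
    starts = [m[1] for m in ordered]
    return starts, ordered


def method_of(addr, starts, ordered):
    """Find the method whose [start, end) range contains `addr`, if any."""
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and addr < ordered[i][2]:
        return ordered[i][0]
    return None  # address lies outside every compiled method


def actual_footprints(samples, methods):
    """Count the distinct sampled instruction addresses of each method."""
    starts, ordered = build_method_map(methods)
    seen = {}
    for addr in samples:
        name = method_of(addr, starts, ordered)
        if name is not None:
            seen.setdefault(name, set()).add(addr)
    return {name: len(addrs) for name, addrs in seen.items()}


methods = [("foo", 0x100, 0x140), ("bar", 0x140, 0x200)]
samples = [0x100, 0x104, 0x100, 0x150, 0x300]  # 0x300 is outside both methods
print(actual_footprints(samples, methods))     # {'foo': 2, 'bar': 1}
```

    Counting distinct addresses, rather than raw samples, measures how much of a method's code is actually executed. A method whose hot loop is sampled many times can still have a small footprint.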
