Network switch with integrated gradient aggregation for distributed machine learning

    公开(公告)号:US12236323B1

    公开(公告)日:2025-02-25

    申请号:US18217483

    申请日:2023-06-30

    Applicant: Innovium, Inc.

    Abstract: Distributed machine learning systems and other distributed computing systems are improved by embedding compute logic at the network switch level to perform collective actions, such as reduction operations, on gradients or other data processed by the nodes of the system. The switch is configured to recognize data units that carry data associated with a collective action that needs to be performed by the distributed system, referred to herein as “compute data,” and process that data using a compute subsystem within the switch. The compute subsystem includes a compute engine that is configured to perform various operations on the compute data, such as “reduction” operations, and forward the results back to the compute nodes. The reduction operations may include, for instance, summation, averaging, bitwise operations, and so forth. In this manner, the network switch may take over some or all of the processing of the distributed system during the collective phase.

    Shared traffic manager
    2.
    发明授权

    公开(公告)号:US12068972B1

    公开(公告)日:2024-08-20

    申请号:US18208648

    申请日:2023-06-12

    Applicant: Innovium, Inc.

    CPC classification number: H04L47/6255 H04L49/901 H04L49/9084

    Abstract: A traffic manager is shared amongst two or more egress blocks of a network device, thereby allowing traffic management resources to be shared between the egress blocks. Schedulers within a traffic manager may generate and queue read instructions for reading buffered portions of data units that are ready to be sent to the egress blocks. The traffic manager may be configured to select a read instruction for a given buffer bank from the read instruction queues based on a scoring mechanism or other selection logic. To avoid sending too much data to an egress block during a given time slot, once a data unit portion has been read from the buffer, it may be temporarily stored in a shallow read data cache. Alternatively, a single, non-bank specific controller may determine all of the read instructions and write operations that should be executed in a given time slot.

    Distributed artificial intelligence extension modules for network switches

    公开(公告)号:US11516149B1

    公开(公告)日:2022-11-29

    申请号:US17367331

    申请日:2021-07-03

    Applicant: Innovium, Inc.

    Abstract: Distributed machine learning systems and other distributed computing systems are improved by compute logic embedded in extension modules coupled directly to network switches. The compute logic performs collective actions, such as reduction operations, on gradients or other compute data processed by the nodes of the system. The reduction operations may include, for instance, summation, averaging, bitwise operations, and so forth. In this manner, the extension modules may take over some or all of the processing of the distributed system during the collective phase. An inline version of the module sits between a switch and the network. Data units carrying compute data are intercepted and processed using the compute logic, while other data units pass through the module transparently to or from the switch. Multiple modules may be connected to the switch, each coupled to a different group of nodes, and sharing intermediate results. A sidecar version is also described.

    Distributed artificial intelligence extension modules for network switches

    公开(公告)号:US11057318B1

    公开(公告)日:2021-07-06

    申请号:US16552938

    申请日:2019-08-27

    Applicant: Innovium, Inc.

    Abstract: Distributed machine learning systems and other distributed computing systems are improved by compute logic embedded in extension modules coupled directly to network switches. The compute logic performs collective actions, such as reduction operations, on gradients or other compute data processed by the nodes of the system. The reduction operations may include, for instance, summation, averaging, bitwise operations, and so forth. In this manner, the extension modules may take over some or all of the processing of the distributed system during the collective phase. An inline version of the module sits between a switch and the network. Data units carrying compute data are intercepted and processed using the compute logic, while other data units pass through the module transparently to or from the switch. Multiple modules may be connected to the switch, each coupled to a different group of nodes, and sharing intermediate results. A sidecar version is also described.

    Visibility packets with inflated latency

    公开(公告)号:US11044204B1

    公开(公告)日:2021-06-22

    申请号:US16527278

    申请日:2019-07-31

    Applicant: Innovium, Inc.

    Abstract: Nodes within a network are configured to adapt to changing path states, due to congestion, node failures, and/or other factors. A node may selectively convey path information and/or other state information to another node by annotating the information into packets it receives from the other node. A node may selectively reflect these annotated packets back to the other node, or other nodes that subsequently receive these annotated packets may reflect them. A weighted cost multipathing selection technique is improved by dynamically adjusting weights of paths in response to feedback indicating the current state of the network topology, such as collected through these reflected packets. In an embodiment, certain packets that would have been dropped may instead be transformed into “special visibility” packets that may be stored and/or sent for analysis. In an embodiment, insight into the performance of a network device is enhanced through the use of programmable visibility engines.

    Intelligent packet queues with delay-based actions

    公开(公告)号:US10673770B1

    公开(公告)日:2020-06-02

    申请号:US16288165

    申请日:2019-02-28

    Applicant: Innovium, Inc.

    Abstract: A network device organizes packets into various queues, in which the packets await processing. Queue management logic tracks how long certain packet(s), such as a designated marker packet, remain in a queue. Based thereon, the logic produces a measure of delay for the queue, referred to herein as the “queue delay.” Based on a comparison of the current queue delay to one or more thresholds, various associated delay-based actions may be performed, such as tagging and/or dropping packets departing from the queue, or preventing addition enqueues to the queue. In an embodiment, a queue may be expired based on the queue delay, and all packets dropped. In other embodiments, when a packet is dropped prior to enqueue into an assigned queue, copies of some or all of the packets already within the queue at the time the packet was dropped may be forwarded to a visibility component for analysis.

    Traffic analyzer for autonomously configuring a network device

    公开(公告)号:US10652154B1

    公开(公告)日:2020-05-12

    申请号:US16186369

    申请日:2018-11-09

    Applicant: Innovium, Inc.

    Abstract: Approaches, techniques, and mechanisms facilitate actionable reporting of network state information and real-time, autonomous network engineering directly in-network at a switch or other network device. A data collector within the network device collects state information and/or data unit information from various device components, such as traffic managers and packet processors. The data collector, which may optionally generate additional state information by performing various calculations on the information it receives, is configured to then provide at least some of the state information to an analyzer device connected to an analyzer interface. The analyzer device, which may be a separate device, performs various analyses on the state information, depending on how it is configured. The analyzer device outputs reports that identify statuses, errors, misconfigurations, and/or suggested actions to take to improve operation of the network device. In an embodiment, some or all actions that may be suggested therein are executed automatically.

    Scalable ingress arbitration for merging control and payload

    公开(公告)号:US10554572B1

    公开(公告)日:2020-02-04

    申请号:US15433825

    申请日:2017-02-15

    Applicant: Innovium, Inc.

    Abstract: Approaches, techniques, and mechanisms are disclosed for improving the efficiency with which data units are handled within a device, such as a networking device. Received data units, or portions thereof, are temporarily stored within one or more memories of a merging component, while the merging component waits to receive control information for the data units. Once received, the merging component merges the control information with the associated data units. The merging component dispatches the merged data units, or portions thereof, to an interconnect component, which forwards the merged data units to destinations indicated by the control information. The device is configured to intelligently schedule the dispatching of merged data units to the interconnect component. To this end, the device includes a scheduler configured to select which merged data units to dispatch at which times based on a variety of factors described herein.

    SYSTEM AND METHOD FOR ENABLING HIGH READ RATES TO DATA ELEMENT LISTS

    公开(公告)号:US20180081577A1

    公开(公告)日:2018-03-22

    申请号:US15815854

    申请日:2017-11-17

    Applicant: Innovium, Inc.

    Abstract: A memory system for a network device is described. The memory system includes a main memory configured to store one or more data elements. Further, the memory system includes a link memory that is configured to maintain one or more pointers to interconnect the one or more data elements stored in the main memory. The memory system also includes a free-entry manager that is configured to generate an available bank set including one or more locations in the link memory. In addition, the memory system includes a context manager that is configured to maintain metadata for a list of the one or more data elements.

Patent Agency Ranking