Flow control technique for EOS system

    Publication No.: US10339132B2

    Publication date: 2019-07-02

    Application No.: US14795066

    Application date: 2015-07-09

    Applicant: NetApp, Inc.

    IPC classification: G06F16/23

    Abstract: A flow control technique prevents exhaustion of storage resources in an exactly once semantics (EOS) system of a storage input/output stack executing on a node of a cluster. An EOS server may service transactions sent by an EOS client and issue replies with results to the EOS client. In order to replay the transactions during normal operation after recovery from a crash, the EOS server persistently stores the transactions in the storage resources until an acknowledgement of completion is received from the EOS client for each pending transaction. The EOS client may issue a checkpoint acknowledgement, e.g., as a prune record, after a periodic interval that marks the completion of all pending transactions issued prior to the record. The EOS server need only log the prune record (rather than each pending transaction) to thereby prevent exhaustion of the storage resources, while also minimizing logging overhead of the server. In response to the crash and during replay of the transactions, the EOS server may employ the prune records to ignore those transactions that have already been acknowledged by the EOS client, thereby reducing time required for replay of the transactions.
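
The prune-record idea in the abstract above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Record` and `EosServerLog` names, the in-memory list standing in for persistent storage, and the assumption that transaction ids start at 1 and increase monotonically are all hypothetical choices made for illustration.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    kind: str      # "txn" or "prune"
    txn_id: int    # for "prune": highest transaction id acknowledged so far


class EosServerLog:
    """Append-only list standing in for the server's persistent log (assumption)."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def log_transaction(self, txn_id: int) -> None:
        self.records.append(Record("txn", txn_id))

    def log_prune(self, acked_up_to: int) -> None:
        # One record acknowledges every transaction with id <= acked_up_to,
        # so the server does not log a separate acknowledgement per transaction.
        self.records.append(Record("prune", acked_up_to))

    def prune_point(self) -> int:
        # Highest acknowledged id seen in any prune record (0 if none yet).
        return max((r.txn_id for r in self.records if r.kind == "prune"), default=0)

    def transactions_to_replay(self) -> List[int]:
        # After a crash, skip everything already covered by a prune record.
        point = self.prune_point()
        return [r.txn_id for r in self.records if r.kind == "txn" and r.txn_id > point]

    def reclaim(self) -> None:
        # Free log space: keep one prune record plus the unacknowledged transactions.
        point = self.prune_point()
        kept = [r for r in self.records if r.kind == "txn" and r.txn_id > point]
        self.records = [Record("prune", point)] + kept
```

In this sketch, logging transactions 1 through 5, then a prune record for id 3, then transaction 6 leaves only transactions 4, 5 and 6 to replay after a crash.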

    EXACTLY ONCE SEMANTICS
    2.
    Invention application
    Status: under examination (published)

    Publication No.: US20160246522A1

    Publication date: 2016-08-25

    Application No.: US14631408

    Application date: 2015-02-25

    Applicant: NetApp, Inc.

    IPC classification: G06F3/06 G06F11/14

    Abstract: An exactly once semantics (EOS) system of a storage input/output (I/O) stack implements a technique ensuring that non-idempotent operations occur exactly once in a storage system embodied as a node of a cluster. Illustratively, a first layer of the storage I/O stack may act as a client issuing a non-idempotent operation to a second layer of the stack, which may act as a server. According to the technique, the EOS system may wrap (i.e., encapsulate) the non-idempotent operation within a transaction embodied as an EOS transaction data structure having a transaction identifier that uniquely identifies the transaction. The server may complete the transaction and reply with a result to the client, which may acknowledge receipt of the reply. In response to a crash and subsequent recovery of the node, the EOS system may determine whether the transaction had completed prior to the crash. If so, the EOS system ensures that the transaction is not re-played (re-executed). Otherwise, the EOS system allows execution of the transaction such that the transaction occurs exactly once.
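
As a rough illustration of wrapping a non-idempotent operation in a uniquely identified transaction, consider the following Python sketch. The `EosTransaction` and `Server` names are hypothetical, the `completed` dictionary merely stands in for state that would have to survive a crash, and a real system would need to make "execute" and "record completion" atomic, which this sketch does not attempt.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class EosTransaction:
    """Wraps a non-idempotent operation with a unique transaction identifier."""
    op: Callable[[], int]
    txn_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class Server:
    def __init__(self) -> None:
        # txn_id -> result; assumed (for illustration) to be persistent.
        self.completed: Dict[str, int] = {}

    def submit(self, txn: EosTransaction) -> int:
        # A transaction that already completed is not re-executed on replay;
        # the stored result is returned instead.
        if txn.txn_id in self.completed:
            return self.completed[txn.txn_id]
        result = txn.op()
        self.completed[txn.txn_id] = result
        return result


# Usage: an increment is non-idempotent, yet resubmitting the same
# transaction (e.g., after a lost reply) applies it only once.
counter = {"value": 0}


def increment() -> int:
    counter["value"] += 1
    return counter["value"]


server = Server()
txn = EosTransaction(increment)
first = server.submit(txn)
second = server.submit(txn)  # replay of the same transaction
```

Here both `first` and `second` hold the same result, and the counter is incremented only once.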

    TECHNIQUE FOR REDUCING METADATA STORED IN A MEMORY OF A NODE
    3.
    Invention application
    Status: in force

    Publication No.: US20160357743A1

    Publication date: 2016-12-08

    Application No.: US14728482

    Application date: 2015-06-02

    Applicant: NetApp, Inc.

    IPC classification: G06F17/30

    Abstract: A technique reduces an amount of metadata stored in a memory of a node in a cluster. An extent store layer of a storage input/output (I/O) stack executing on the node stores key-value pairs in a plurality of data structures, e.g., cuckoo hash tables, resident in the memory. The cuckoo hash table embodies metadata that describes an extent and, as such, may be organized to associate a location on disk with a value that identifies the location on disk. The value may be embodied as a locator that includes a reference count used to support deduplication functionality of the extent store layer with respect to the extent. The reference count is divided into two portions: a delta count portion stored in memory for each slot of the hash table and an overflow count portion stored on disk in a header of each extent. One bit of the delta count portion is reserved as an overflow bit that indicates whether the in-memory reference count has overflowed. Another bit of the delta count portion is reserved as a sign bit that indicates whether the value of the remaining delta count portion, which stores the “delta” of the reference count, is positive or negative. Overflow updates to the overflow count portion on disk are postponed until all of the bits of the delta count portion are consumed as negative/positive transitions.
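
One possible reading of the split reference count described above is sketched below in Python. The 5-bit magnitude width, the `SplitRefCount` name, the bit layout in `packed()`, and the policy of folding the whole delta into the on-disk count on overflow are assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass

MAGNITUDE_BITS = 5                          # assumed width of the delta magnitude
MAX_MAGNITUDE = (1 << MAGNITUDE_BITS) - 1   # 31


@dataclass
class SplitRefCount:
    disk_overflow_count: int = 0   # stands in for the count in the on-disk extent header
    delta: int = 0                 # in-memory signed delta (sign bit + magnitude bits)
    overflowed: bool = False       # overflow bit: part of the count now lives on disk
    disk_writes: int = 0           # how many times the on-disk header was updated

    def adjust(self, change: int) -> None:
        """Apply a reference count change (typically +1 or -1)."""
        new_delta = self.delta + change
        if abs(new_delta) > MAX_MAGNITUDE:
            # Only when the in-memory bits are exhausted is the disk touched.
            self.disk_overflow_count += new_delta
            self.disk_writes += 1
            self.overflowed = True
            new_delta = 0
        self.delta = new_delta

    @property
    def total(self) -> int:
        return self.disk_overflow_count + self.delta

    def packed(self) -> int:
        """Assumed layout: [overflow bit][sign bit][magnitude bits]."""
        sign = 1 if self.delta < 0 else 0
        return (
            (int(self.overflowed) << (MAGNITUDE_BITS + 1))
            | (sign << MAGNITUDE_BITS)
            | abs(self.delta)
        )
```

Under these assumptions, 40 increments followed by 10 decrements cause a single on-disk update, because increments and decrements that stay within the in-memory delta range cancel out without touching the extent header.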

    RATE MATCHING TECHNIQUE FOR BALANCING SEGMENT CLEANING AND I/O WORKLOAD
    4.
    Invention application
    Status: in force

    Publication No.: US20160077745A1

    Publication date: 2016-03-17

    Application No.: US14484565

    Application date: 2014-09-12

    Applicant: NetApp, Inc.

    IPC classification: G06F3/06 G06F12/02

    Abstract: A rate matching technique may be configured to adjust a rate of cleaning of one or more selected segments of the storage array to accommodate a variable rate of incoming workload processed by a storage input/output (I/O) stack executing on one or more nodes of a cluster. An extent store layer of the storage I/O stack may clean a segment in accordance with segment cleaning which, illustratively, may be embodied as a segment cleaning process. The rate matching technique may be implemented as a feedback control mechanism configured to adjust the segment cleaning process based on the incoming workload. Components of the feedback control mechanism may include one or more weight schedulers and various accounting data structures, e.g., counters, configured to track the progress of segment cleaning and free space usage. The counters may also be used to balance the rates of segment cleaning and incoming I/O workload, which may change depending upon an incoming I/O rate. When the incoming I/O rate changes, the rate of segment cleaning may be adjusted accordingly to ensure that rates are substantially balanced.
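
The feedback loop described above can be approximated by a simple proportional controller, sketched here in Python. This swaps in a generic control scheme for illustration only: the `RateMatcher` name, the `gain` parameter, and the per-interval byte counters are hypothetical, and the weight schedulers mentioned in the abstract are not modeled.

```python
class RateMatcher:
    """Adjusts a target cleaning rate from per-interval accounting counters."""

    def __init__(self, gain: float = 0.5) -> None:
        self.gain = gain              # assumed proportional gain
        self.cleaning_rate = 0.0      # target rate of space reclaimed per second
        self.incoming_bytes = 0.0     # counter: space consumed by incoming I/O
        self.reclaimed_bytes = 0.0    # counter: space freed by segment cleaning

    def record_write(self, nbytes: float) -> None:
        self.incoming_bytes += nbytes

    def record_reclaim(self, nbytes: float) -> None:
        self.reclaimed_bytes += nbytes

    def end_interval(self, seconds: float) -> float:
        # Error term: how far cleaning lagged (or led) the incoming workload.
        incoming_rate = self.incoming_bytes / seconds
        reclaimed_rate = self.reclaimed_bytes / seconds
        error = incoming_rate - reclaimed_rate
        self.cleaning_rate = max(0.0, self.cleaning_rate + self.gain * error)
        # Reset the counters for the next interval.
        self.incoming_bytes = 0.0
        self.reclaimed_bytes = 0.0
        return self.cleaning_rate
```

If the cleaner is assumed to reclaim space at exactly the current target rate, the target converges toward the incoming rate and follows it when the incoming rate changes.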

    Perturb key technique
    5.
    Granted invention patent

    Publication No.: US10216966B2

    Publication date: 2019-02-26

    Application No.: US15052332

    Application date: 2016-02-24

    Applicant: NetApp, Inc.

    IPC classification: G06F21/78

    Abstract: A technique perturbs an extent key to compute a candidate extent key in the event of a collision with metadata (i.e., two extents having different data that yield identical hash values) stored in a memory of a node in a cluster. The perturbing technique may be used to compute a candidate extent key that is not previously stored in an extent store instance. The candidate extent key may be computed from a hash value of an extent using a perturbing algorithm, i.e., a hash collision computation, which illustratively adds a perturb value to the hash value. The perturb value is illustratively sufficient to ensure that the candidate extent key resolves to a same hash bucket and node (extent store instance) as the original extent key. In essence, the technique ensures that the original extent key is perturbed in a deterministic manner to generate the candidate extent key, so that the original extent key and candidate extent key “decode” to the same hash bucket and extent store instance.
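
A minimal Python sketch of one way to perturb a key while keeping its bucket is shown below. The 48-bit key width, the choice of the low 16 bits as the bucket and extent store selector, and the `candidate_key` and `bucket_of` names are assumptions for illustration; the abstract only states that a perturb value is added to the hash value so that the candidate resolves to the same bucket and node.

```python
from typing import Set

KEY_BITS = 48                 # assumed extent key width
BUCKET_BITS = 16              # assumed: low bits select hash bucket and store instance
PERTURB = 1 << BUCKET_BITS    # adding a multiple of this leaves the low bits unchanged


def bucket_of(key: int) -> int:
    return key & (PERTURB - 1)


def candidate_key(original_key: int, stored_keys: Set[int]) -> int:
    """Deterministically perturb original_key until an unused key is found."""
    key = original_key
    for _ in range(1 << (KEY_BITS - BUCKET_BITS)):
        if key not in stored_keys:
            return key
        # Wrap within the key width; the bucket bits are preserved because
        # 2**KEY_BITS is itself a multiple of PERTURB.
        key = (key + PERTURB) % (1 << KEY_BITS)
    raise RuntimeError("no unused key maps to this bucket")
```

Because the perturbation is deterministic, a later lookup can retrace the same sequence of candidate keys starting from the original hash value.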

    FLOW CONTROL TECHNIQUE FOR EOS SYSTEM
    7.
    Invention application
    Status: under examination (published)

    Publication No.: US20170011062A1

    Publication date: 2017-01-12

    Application No.: US14795066

    Application date: 2015-07-09

    Applicant: NetApp, Inc.

    IPC classification: G06F17/30

    CPC classification: G06F16/2379 G06F16/2358

    Abstract: A flow control technique prevents exhaustion of storage resources in an exactly once semantics (EOS) system of a storage input/output stack executing on a node of a cluster. An EOS server may service transactions sent by an EOS client and issue replies with results to the EOS client. In order to replay the transactions during normal operation after recovery from a crash, the EOS server persistently stores the transactions in the storage resources until an acknowledgement of completion is received from the EOS client for each pending transaction. The EOS client may issue a checkpoint acknowledgement, e.g., as a prune record, after a periodic interval that marks the completion of all pending transactions issued prior to the record. The EOS server need only log the prune record (rather than each pending transaction) to thereby prevent exhaustion of the storage resources, while also minimizing logging overhead of the server. In response to the crash and during replay of the transactions, the EOS server may employ the prune records to ignore those transactions that have already been acknowledged by the EOS client, thereby reducing time required for replay of the transactions.

    Optimized segment cleaning technique

    Publication No.: US10133511B2

    Publication date: 2018-11-20

    Application No.: US14484820

    Application date: 2014-09-12

    Applicant: NetApp, Inc.

    IPC classification: G06F3/06

    Abstract: An optimized segment cleaning technique is configured to efficiently clean one or more selected portions or segments of a storage array coupled to one or more nodes of a cluster. A bottom-up approach of the segment cleaning technique is configured to read all blocks of a segment to be cleaned (i.e., an “old” segment) to locate extents stored on the SSDs of the old segment and examine extent metadata to determine whether the extents are valid and, if so, relocate the valid extents to a segment being written (i.e., a “new” segment). A top-down approach of the segment cleaning technique obviates reading of the blocks of the old segment to locate the extents and, instead, examines the extent metadata to determine the valid extents of the old segment. A hybrid approach may extend the top-down approach to include only full stripe read operations needed for relocation and reconstruction of blocks as well as retrieval of valid extents from the stripes, while also avoiding any unnecessary read operations of the bottom-up approach.
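
The difference between the bottom-up and top-down approaches can be sketched in Python as follows. The data model is an assumption for illustration: a segment is a dictionary from offset to (extent key, data), the extent metadata is a dictionary from extent key to its current (segment id, offset), and an extent is treated as valid only if the metadata still points at it. The hybrid approach and relocation into a new segment are not modeled.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]               # (segment id, offset)
Segment = Dict[int, Tuple[str, bytes]]   # offset -> (extent key, data)
Metadata = Dict[str, Location]           # extent key -> current location


def valid_extents_bottom_up(
    seg_id: int, segment: Segment, metadata: Metadata
) -> Tuple[List[str], int]:
    """Read every block of the old segment, then check each extent's validity."""
    valid, reads = [], 0
    for offset, (key, _data) in sorted(segment.items()):
        reads += 1  # every stored extent is read, valid or not
        if metadata.get(key) == (seg_id, offset):
            valid.append(key)
    return valid, reads


def valid_extents_top_down(seg_id: int, metadata: Metadata) -> Tuple[List[str], int]:
    """Consult the extent metadata first; only valid extents would need reading."""
    valid = [key for key, (seg, _offset) in metadata.items() if seg == seg_id]
    return valid, len(valid)
```

In this model both functions identify the same valid extents, but the top-down variant counts a read only for extents that actually need relocating.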

    TECHNIQUE FOR REDUCING METADATA STORED IN A MEMORY OF A NODE

    Publication No.: US20180173703A1

    Publication date: 2018-06-21

    Application No.: US15895593

    Application date: 2018-02-13

    Applicant: NetApp, Inc.

    IPC classification: G06F17/30 G06F12/02

    Abstract: A technique reduces an amount of metadata stored in a memory of a node in a cluster. An extent store layer of a storage input/output (I/O) stack executing on the node stores key-value pairs in a plurality of data structures, e.g., cuckoo hash tables, resident in the memory. The cuckoo hash table embodies metadata that describes an extent and, as such, may be organized to associate a location on disk with a value that identifies the location on disk. The value may be embodied as a locator that includes a reference count used to support deduplication functionality of the extent store layer with respect to the extent. The reference count is divided into two portions: a delta count portion stored in memory for each slot of the hash table and an overflow count portion stored on disk in a header of each extent. One bit of the delta count portion is reserved as an overflow bit that indicates whether the in-memory reference count has overflowed. Another bit of the delta count portion is reserved as a sign bit that indicates whether the value of the remaining delta count portion, which stores the “delta” of the reference count, is positive or negative. Overflow updates to the overflow count portion on disk are postponed until all of the bits of the delta count portion are consumed as negative/positive transitions.

    PERTURB KEY TECHNIQUE
    10.
    Invention application
    Status: under examination (published)

    Publication No.: US20160248583A1

    Publication date: 2016-08-25

    Application No.: US15052332

    Application date: 2016-02-24

    Applicant: NetApp, Inc.

    IPC classification: H04L9/08 H04L9/16

    CPC classification: G06F21/78

    Abstract: A technique perturbs an extent key to compute a candidate extent key in the event of a collision with metadata (i.e., two extents having different data that yield identical hash values) stored in a memory of a node in a cluster. The perturbing technique may be used to compute a candidate extent key that is not previously stored in an extent store instance. The candidate extent key may be computed from a hash value of an extent using a perturbing algorithm, i.e., a hash collision computation, which illustratively adds a perturb value to the hash value. The perturb value is illustratively sufficient to ensure that the candidate extent key resolves to a same hash bucket and node (extent store instance) as the original extent key. In essence, the technique ensures that the original extent key is perturbed in a deterministic manner to generate the candidate extent key, so that the original extent key and candidate extent key “decode” to the same hash bucket and extent store instance.
