Self-adaptive control system for dynamic capacity management of latency-sensitive application servers

    Publication No.: US09667498B2

    Publication date: 2017-05-30

    Application No.: US14450148

    Application date: 2014-08-01

    Applicant: Facebook, Inc.

    Abstract: A self-adaptive control system based on proportional-integral (PI) control theory for dynamic capacity management of latency-sensitive application servers (e.g., application servers associated with a social networking application) is disclosed. A centralized controller of the system can adapt to changes in request rates, changes in application and/or system behaviors, underlying hardware upgrades, etc., by scaling the capacity of a cluster up or down so that just the right amount of capacity is maintained at any time. The centralized controller uses information relating to a current state of the cluster and historical information relating to past states of the cluster to predict a future state of the cluster, and uses that prediction to determine whether to scale the current capacity up or down to reduce latency and maximize energy savings. A load balancing system can then distribute traffic among the servers in the cluster using any load balancing method.
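
The abstract above describes a PI controller that combines the current cluster state with its history to decide whether to scale up or down. A minimal sketch of that idea follows. It is not the patented method: the abstract does not say which signal is controlled, so this example assumes the measured quantity is request latency, and the class name, gains, and server limits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PICapacityController:
    """Illustrative PI loop; all names and parameters are assumptions."""

    target_latency_ms: float  # assumed setpoint for the cluster
    kp: float                 # proportional gain (reacts to the current error)
    ki: float                 # integral gain (reacts to accumulated past error)
    min_servers: int = 1
    max_servers: int = 100
    integral: float = 0.0     # running sum of past errors (the "history")

    def step(self, observed_latency_ms: float, active_servers: int) -> int:
        """Return the number of servers to keep active for the next interval."""
        error = observed_latency_ms - self.target_latency_ms
        self.integral += error
        # Positive adjustment adds servers (latency too high),
        # negative adjustment removes servers (capacity to spare).
        adjustment = self.kp * error + self.ki * self.integral
        desired = active_servers + round(adjustment)
        return max(self.min_servers, min(self.max_servers, desired))


controller = PICapacityController(target_latency_ms=100.0, kp=0.5, ki=0.25,
                                  max_servers=20)
servers = controller.step(observed_latency_ms=104.0, active_servers=10)
```

In this sketch the proportional term plays the role of the "current state" and the integral term the role of the "historical information". A production controller would also need to handle integral windup when the server count is clamped, which is omitted here.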

    DATA RECOVERY IN MULTI-LEADER DISTRIBUTED SYSTEMS

    Type: Invention application

    Status: In force

    Publication No.: US20140195486A1

    Publication date: 2014-07-10

    Application No.: US13736861

    Application date: 2013-01-08

    Applicant: Facebook, Inc.

    CPC classification number: G06F17/30581 G06F11/1662 G06F11/2082 G06F11/2094

    Abstract: Disclosed are a method and system for recovering a distributed system from a failure of a data storage unit. The distributed system includes a plurality of computer systems, each having a read-write computer and a data storage unit. Data is replicated from a particular data storage unit to other data storage units using a publish-subscribe model. A read-write computer receives the replicated data, processes the data for any conflicts, and stores it in the data storage unit. If a data storage unit fails, another data storage unit that has the latest data corresponding to the failed data storage unit is determined, and the latest data is replicated to the other data storage units. Accordingly, the distributed system continues to have the data of the failed data storage unit. The failed data storage unit may be reconstructed using data from one of the other data storage units in the distributed system.

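
The recovery step in the abstract (find the surviving storage unit with the latest data from the failed unit, then replicate that data to the others) can be sketched as below. This is an illustration under stated assumptions, not the patented procedure: it assumes each unit keeps an ordered log per originating unit, that every survivor holds a prefix of the failed unit's log, and the function and variable names are hypothetical.

```python
from typing import Dict, List

# replica_logs[unit][origin] is the ordered list of updates that `unit`
# has received from `origin` (an assumed data layout for illustration).
ReplicaLogs = Dict[str, Dict[str, List[str]]]


def recover_from_failure(replica_logs: ReplicaLogs, failed: str) -> str:
    """Bring all survivors up to date with the failed unit's latest known data.

    Returns the name of the survivor that held the latest data.
    """
    survivors = [unit for unit in replica_logs if unit != failed]
    if not survivors:
        raise ValueError("no surviving storage unit to recover from")

    # The survivor with the longest log of the failed unit's updates
    # is treated as having the latest data.
    donor = max(survivors, key=lambda u: len(replica_logs[u].get(failed, [])))
    latest = replica_logs[donor].get(failed, [])

    # Replicate only the entries each survivor is missing.
    for unit in survivors:
        have = replica_logs[unit].setdefault(failed, [])
        have.extend(latest[len(have):])
    return donor


logs: ReplicaLogs = {
    "A": {"A": ["a1", "a2", "a3"]},  # the unit that fails
    "B": {"A": ["a1", "a2"]},        # most up-to-date survivor
    "C": {"A": ["a1"]},
}
donor = recover_from_failure(logs, failed="A")
```

After the call, every survivor holds the same view of unit A's data, so A could later be rebuilt from any of them. Updates that never left A before it failed (here `a3`) cannot be recovered by this sketch.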

    SELF-ADAPTIVE CONTROL SYSTEM FOR DYNAMIC CAPACITY MANAGEMENT OF LATENCY-SENSITIVE APPLICATION SERVERS

    Publication No.: US20170223100A1

    Publication date: 2017-08-03

    Application No.: US15493532

    Application date: 2017-04-21

    Applicant: Facebook, Inc.

    Abstract: A self-adaptive control system based on proportional-integral (PI) control theory for dynamic capacity management of latency-sensitive application servers (e.g., application servers associated with a social networking application) is disclosed. A centralized controller of the system can adapt to changes in request rates, changes in application and/or system behaviors, underlying hardware upgrades, etc., by scaling the capacity of a cluster up or down so that just the right amount of capacity is maintained at any time. The centralized controller uses information relating to a current state of the cluster and historical information relating to past states of the cluster to predict a future state of the cluster, and uses that prediction to determine whether to scale the current capacity up or down to reduce latency and maximize energy savings. A load balancing system can then distribute traffic among the servers in the cluster using any load balancing method.

    Self-adaptive control system for dynamic capacity management of latency-sensitive application servers

    Publication No.: US10212220B2

    Publication date: 2019-02-19

    Application No.: US15493532

    Application date: 2017-04-21

    Applicant: Facebook, Inc.

    Abstract: A self-adaptive control system based on proportional-integral (PI) control theory for dynamic capacity management of latency-sensitive application servers (e.g., application servers associated with a social networking application) is disclosed. A centralized controller of the system can adapt to changes in request rates, changes in application and/or system behaviors, underlying hardware upgrades, etc., by scaling the capacity of a cluster up or down so that just the right amount of capacity is maintained at any time. The centralized controller uses information relating to a current state of the cluster and historical information relating to past states of the cluster to predict a future state of the cluster, and uses that prediction to determine whether to scale the current capacity up or down to reduce latency and maximize energy savings. A load balancing system can then distribute traffic among the servers in the cluster using any load balancing method.

    Data recovery in multi-leader distributed systems

    Publication No.: US09824132B2

    Publication date: 2017-11-21

    Application No.: US13736861

    Application date: 2013-01-08

    Applicant: Facebook, Inc.

    CPC classification number: G06F17/30581 G06F11/1662 G06F11/2082 G06F11/2094

    Abstract: Disclosed are a method and system for recovering a distributed system from a failure of a data storage unit. The distributed system includes a plurality of computer systems, each having a read-write computer and a data storage unit. Data is replicated from a particular data storage unit to other data storage units using a publish-subscribe model. A read-write computer receives the replicated data, processes the data for any conflicts, and stores it in the data storage unit. If a data storage unit fails, another data storage unit that has the latest data corresponding to the failed data storage unit is determined, and the latest data is replicated to the other data storage units. Accordingly, the distributed system continues to have the data of the failed data storage unit. The failed data storage unit may be reconstructed using data from one of the other data storage units in the distributed system.

    Subscription groups in publish-subscribe system

    Type: Granted invention patent

    Status: In force

    Publication No.: US09344395B2

    Publication date: 2016-05-17

    Application No.: US14620085

    Application date: 2015-02-11

    Applicant: Facebook, Inc.

    Abstract: Disclosed is a technology for publishing subscriptions in a publish-subscribe system in accordance with subscription groups. The technology may include (i) determining a consumption characteristic by which each of multiple subscribers in a publish-subscribe system consumes a subscription published by a publisher; (ii) identifying the subscribers whose consumption characteristics satisfy a specified similarity criterion; (iii) defining multiple subscription groups, each of which includes subscriptions of those of the subscribers whose consumption characteristics satisfy the specified similarity criterion; (iv) storing the subscriptions in multiple logical partitions of a storage system where each of the logical partitions contains a separate non-overlapping subset of the subscriptions; and (v) transmitting the subscriptions to the subscribers in accordance with the subscription groups.

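
Steps (i) through (iv) of the abstract can be illustrated with a short sketch. The abstract does not define the consumption characteristic or the similarity criterion, so this example assumes the characteristic is a consumption rate in messages per second and treats subscribers as similar when their rates fall in the same fixed-width bucket. All function names and parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_subscriptions(consumption_rate: Dict[str, float],
                        bucket_width: float) -> Dict[int, List[str]]:
    """Group subscribers whose rates fall in the same bucket (assumed criterion)."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for subscriber, rate in consumption_rate.items():
        groups[int(rate // bucket_width)].append(subscriber)
    return dict(groups)


def assign_partitions(groups: Dict[int, List[str]],
                      num_partitions: int) -> List[List[str]]:
    """Place whole groups into logical partitions, round-robin.

    Each subscription lands in exactly one partition, so the
    partitions hold non-overlapping subsets.
    """
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    for index, key in enumerate(sorted(groups)):
        partitions[index % num_partitions].extend(groups[key])
    return partitions


rates = {"s1": 5, "s2": 7, "s3": 120, "s4": 130, "s5": 990}
groups = group_subscriptions(rates, bucket_width=100)
partitions = assign_partitions(groups, num_partitions=2)
```

Keeping a group together in one partition means a slow consumer shares storage only with similarly slow consumers, which is one plausible reason to group by consumption behavior; the abstract itself does not state the motivation.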

    SYSTEM AND METHOD FOR IMPLEMENTING CACHE CONSISTENT REGIONAL CLUSTERS

    Type: Invention application

    Status: In force

    Publication No.: US20150378894A1

    Publication date: 2015-12-31

    Application No.: US14846409

    Application date: 2015-09-04

    Applicant: Facebook, Inc.

    Abstract: When multiple regional data clusters are used to store data in a system, maintaining cache consistency across different regions is important for providing a desirable user experience. In one embodiment, there is a master data cluster where all data writes are performed, and the writes are replicated to each of the slave data clusters in the other regions. Appended to the replication statements are invalidations for cache values for the keys whose values have been changed in the master data cluster. An apparatus in the master data cluster logs replication statements sent to the slave databases. When a slave database fails, the apparatus extracts the invalidations intended for the failed database and publishes the invalidations to a subscriber in the region of the failed database. The subscriber sends the invalidations to the local caches to cause stale data for those keys to be deleted from the caches.

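
The failure path in the abstract (log replication statements, extract the invalidations meant for a failed database, publish them to a subscriber in that region, delete the stale keys from local caches) can be sketched as follows. This is an illustration only: the class names are hypothetical, caches are modeled as plain dictionaries, and the publish step is a direct method call rather than a real messaging system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReplicationStatement:
    target_region: str          # region of the database the statement was sent to
    sql: str                    # the replicated write
    invalidate_keys: List[str]  # cache keys appended to the statement


class ReplicationLog:
    """Log kept in the master cluster of statements sent to other regions."""

    def __init__(self) -> None:
        self.entries: List[ReplicationStatement] = []

    def record(self, statement: ReplicationStatement) -> None:
        self.entries.append(statement)

    def invalidations_for(self, region: str) -> List[str]:
        """Extract the keys intended for one region, de-duplicated in order."""
        keys = [key for entry in self.entries
                if entry.target_region == region
                for key in entry.invalidate_keys]
        return list(dict.fromkeys(keys))


class RegionSubscriber:
    """Receives published invalidations and applies them to local caches."""

    def __init__(self, caches: List[Dict[str, str]]) -> None:
        self.caches = caches

    def on_invalidations(self, keys: List[str]) -> None:
        for cache in self.caches:
            for key in keys:
                cache.pop(key, None)  # drop stale value if present


log = ReplicationLog()
log.record(ReplicationStatement("eu", "UPDATE users SET ...", ["user:1"]))
log.record(ReplicationStatement("asia", "UPDATE users SET ...", ["user:2"]))
log.record(ReplicationStatement("eu", "UPDATE users SET ...", ["user:3", "user:1"]))

eu_caches = [{"user:1": "old", "user:9": "ok"}, {"user:3": "old"}]
# The database in region "eu" has failed: publish its invalidations directly.
RegionSubscriber(eu_caches).on_invalidations(log.invalidations_for("eu"))
```

Keys that were not touched by any logged write (such as `user:9`) stay cached, so only data that may be stale is evicted.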
