Abstract:
Methods and apparatus are described for connection redistribution in load-balanced systems that include multiple load balancers, each serving multiple nodes. In the connection redistribution method, each node estimates a connection close rate, which may be based on an estimate of the percentage of the overall client traffic received by the respective load balancer that is being handled by the node. The node generates close requests for connections between the respective load balancer and clients according to the connection close rate. The node sends the close requests to its load balancer, which forwards the close requests to the appropriate clients. Upon receiving a close request, a client may close the connection(s) indicated by the request, obtain a public IP address for a load balancer, and initiate new connection(s) to the respective load balancer via the public IP address.
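The node-side estimate described above can be illustrated with a minimal sketch. The abstract does not say how the traffic share maps to a close rate, so the proportional formula here is an assumption, as are all names (`estimate_close_rate`, `pick_connections_to_close`, `lb_target_close_rate`) and the idea that the node knows or can estimate the load balancer's total traffic.

```python
def estimate_close_rate(node_requests, lb_total_requests, lb_target_close_rate):
    """Return the connections per second this node should ask to close.

    Assumption: the node takes on a slice of the load balancer's overall
    close rate proportional to its estimated share of that load balancer's
    client traffic.
    """
    if lb_total_requests <= 0:
        return 0.0
    share = node_requests / lb_total_requests
    return share * lb_target_close_rate


def pick_connections_to_close(connection_ids, close_rate, interval_seconds):
    """Select the connections to name in close requests for one interval."""
    count = min(len(connection_ids), int(close_rate * interval_seconds))
    # Hypothetical policy: take the first `count` connections in the list.
    return connection_ids[:count]


# Example: a node handling a quarter of the load balancer's traffic.
rate = estimate_close_rate(node_requests=250, lb_total_requests=1000,
                           lb_target_close_rate=8.0)
to_close = pick_connections_to_close(["a", "b", "c", "d", "e"], rate, 1.5)
```

In this sketch the close requests themselves would then be sent to the load balancer for forwarding to clients; that transport step is omitted because the abstract gives no details about it.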
Abstract:
A system that provides services to clients may receive and service requests, various ones of which may require different amounts of work. An admission control mechanism may manage requests based on tokens, each of which represents a fixed amount of work. The tokens may be added to a token bucket at a rate that is dependent on a target work throughput rate while the number of tokens in the bucket does not exceed its maximum capacity. If at least a pre-determined minimum number of tokens is present in the bucket when a service request is received, the request may be serviced. Servicing a request may include deducting an initial number of tokens from the bucket, determining that the amount of work performed in servicing the request is different from that represented by the initially deducted tokens, and deducting additional tokens from or replacing tokens in the bucket to reflect the difference.
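The admission flow described above (check a minimum, deduct an initial charge, then reconcile against the work actually done) might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the class name `WorkTokenBucket`, its parameters, and the choice to let the balance go negative after a large correction are all hypothetical.

```python
class WorkTokenBucket:
    """Token bucket in which each token stands for a fixed amount of work."""

    def __init__(self, capacity, refill_rate, min_tokens=1.0, initial_charge=1.0):
        self.capacity = capacity              # maximum tokens the bucket holds
        self.refill_rate = refill_rate        # tokens/second, assumed to be set from the target work throughput
        self.min_tokens = min_tokens          # minimum balance required to admit a request
        self.initial_charge = initial_charge  # tokens deducted up front per request
        self.tokens = capacity

    def refill(self, elapsed_seconds):
        # Add tokens for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + self.refill_rate * elapsed_seconds)

    def try_admit(self):
        # Admit only if the minimum number of tokens is present.
        if self.tokens < self.min_tokens:
            return False
        self.tokens -= self.initial_charge
        return True

    def settle(self, actual_work_tokens):
        # Reconcile once the real cost is known: deduct more tokens if the
        # request cost more than the initial charge, or return tokens if it
        # cost less. Assumption: the balance may go negative here.
        difference = actual_work_tokens - self.initial_charge
        self.tokens = min(self.capacity, self.tokens - difference)


bucket = WorkTokenBucket(capacity=10.0, refill_rate=2.0)
if bucket.try_admit():
    bucket.settle(actual_work_tokens=4.0)  # request turned out to cost 4 tokens
```

Allowing a negative balance in `settle` means an unexpectedly expensive request delays later admissions until refills catch up; a variant could clamp the balance at zero instead.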
Abstract:
A system that provides services to clients may receive and service requests, various ones of which may require different amounts of work. The system may determine whether it is operating in an overloaded or underloaded state based on a current work throughput rate, a target work throughput rate, a maximum request rate, or an actual request rate, and may dynamically adjust the maximum request rate in response. For example, if the actual request rate exceeds the maximum request rate, the maximum request rate may be raised or lowered, depending on the current work throughput rate. If the target or committed work throughput rate is being exceeded, but the maximum request rate is not being exceeded, a lower maximum request rate may be proposed. Adjustments to the maximum request rate may be made using multiple incremental adjustments. Service request tokens may be added to a leaky token bucket at the maximum request rate.
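The adjustment rules described above can be sketched as a small decision function plus an incremental stepper. The abstract does not give formulas, so the multiplicative `factor`, the fixed `max_step`, and the function names `propose_max_request_rate` and `step_toward` are assumptions chosen only for illustration.

```python
def propose_max_request_rate(max_rate, actual_rate, work_rate,
                             target_work_rate, factor=1.25):
    """Propose a new maximum request rate from the current measurements."""
    if actual_rate > max_rate:
        # Requests exceed the limit: raise the limit if work throughput is
        # below target, lower it if work throughput is above target.
        if work_rate < target_work_rate:
            return max_rate * factor
        if work_rate > target_work_rate:
            return max_rate / factor
        return max_rate
    if work_rate > target_work_rate:
        # The limit is not exceeded, yet too much work is being done, so
        # each request must be costly: propose a lower limit.
        return max_rate / factor
    return max_rate


def step_toward(current, proposed, max_step):
    """Move the limit toward the proposal by at most max_step per call."""
    delta = proposed - current
    if abs(delta) <= max_step:
        return proposed
    return current + max_step if delta > 0 else current - max_step


# Example: throttled but underloaded, so the limit is raised in increments.
limit = 100.0
proposed = propose_max_request_rate(limit, actual_rate=120.0,
                                    work_rate=40.0, target_work_rate=50.0)
steps = []
while limit != proposed:
    limit = step_toward(limit, proposed, max_step=10.0)
    steps.append(limit)
```

In a fuller version, the resulting limit would be the rate at which service request tokens are added to the leaky token bucket mentioned in the abstract; that wiring is left out of this sketch.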