Abstract:
An optimized computer architecture for training a neural network includes a system having multiple GPUs. The neural network may be divided into separate portions, and a different portion is assigned to each of the multiple GPUs. Within each GPU, its portion is further divided across multiple training worker threads in multiple processing cores, and each processing core has lock-free access to a local parameter memory. The local parameter memory of each GPU is separately and individually synchronized with a remote master parameter memory using lock-based memory access. Each GPU has a separate set of communication worker threads dedicated to data transfer between the GPU and the remote parameter memory so that the GPU's training worker threads are not involved with cross-GPU communications.
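A minimal sketch of this division of labor follows, using Python threads to stand in for GPU execution; the names MasterParams, GPU, training_worker, and comm_worker, and the averaging-based synchronization, are illustrative assumptions rather than the patented implementation.

import threading
import random
import time

# Sketch: each "GPU" holds a local parameter store that its training workers
# update without locks, while a dedicated communication worker synchronizes
# that store with a remote master copy under a lock.

class MasterParams:
    def __init__(self, size):
        self.values = [0.0] * size
        self.lock = threading.Lock()          # lock-based access to the master copy

    def sync(self, local_values):
        with self.lock:                       # locked read-modify-write
            for i, v in enumerate(local_values):
                self.values[i] = 0.5 * (self.values[i] + v)
            return list(self.values)

class GPU:
    def __init__(self, master, size, n_trainers):
        self.master = master
        self.local = [0.0] * size             # local parameter memory, lock-free for trainers
        self.n_trainers = n_trainers

    def training_worker(self, steps):
        for _ in range(steps):                # lock-free gradient-style updates
            i = random.randrange(len(self.local))
            self.local[i] += random.uniform(-0.01, 0.01)

    def comm_worker(self, rounds):
        for _ in range(rounds):               # only this thread talks to the master
            self.local = self.master.sync(self.local)
            time.sleep(0.01)

def run():
    master = MasterParams(size=8)
    gpus = [GPU(master, size=8, n_trainers=2) for _ in range(4)]
    threads = []
    for gpu in gpus:
        for _ in range(gpu.n_trainers):
            threads.append(threading.Thread(target=gpu.training_worker, args=(100,)))
        threads.append(threading.Thread(target=gpu.comm_worker, args=(10,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(master.values)

if __name__ == "__main__":
    run()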
Abstract:
A self-adaptive control system based on proportional-integral (PI) control theory for dynamic capacity management of latency-sensitive application servers (e.g., application servers associated with a social networking application) is disclosed. A centralized controller of the system can adapt to changes in request rates, changes in application and/or system behaviors, underlying hardware upgrades, etc., by scaling the capacity of a cluster up or down so that just the right amount of capacity is maintained at any time. The centralized controller uses information relating to a current state of the cluster and historical information relating to past states of the cluster to predict a future state of the cluster, and uses that prediction to determine whether to scale up or scale down the current capacity to reduce latency and maximize energy savings. A load balancing system can then distribute traffic among the servers in the cluster using any suitable load balancing method.
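The sketch below shows one way a PI loop of this kind can map measured latency onto a desired cluster size; the gains, latency setpoint, and server bounds are illustrative assumptions, not values from the disclosure.

# Hypothetical PI capacity controller: proportional term reacts to the current
# latency error, integral term accumulates historical error.

class PICapacityController:
    def __init__(self, kp, ki, target_latency_ms, min_servers, max_servers):
        self.kp = kp
        self.ki = ki
        self.target = target_latency_ms
        self.min_servers = min_servers
        self.max_servers = max_servers
        self.integral = 0.0                   # accumulated (historical) error

    def desired_capacity(self, current_servers, measured_latency_ms):
        error = measured_latency_ms - self.target
        self.integral += error
        adjustment = self.kp * error + self.ki * self.integral
        desired = current_servers + adjustment
        return int(max(self.min_servers, min(self.max_servers, round(desired))))

controller = PICapacityController(kp=0.05, ki=0.01, target_latency_ms=200,
                                  min_servers=10, max_servers=500)
print(controller.desired_capacity(current_servers=100, measured_latency_ms=260))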
Abstract:
Embodiments are described for dynamically responding to demand for server computing resources. The embodiments can monitor performance of each of multiple computing systems in a data center, identify a particular computing system of the multiple computing systems for allocation of additional computing power, determine availability of an additional power supply to allocate to the identified computing system, determine availability of a capacity on a power distribution line connected to the particular computing system to provide the additional power supply to the particular computing system, and allocate the additional computing power to the identified computing system as a function of the determined availability of the additional power supply and the determined availability of the capacity on the power distribution line. Computing systems can also be selected for reduced power consumption based on a priority order.
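A brief sketch of the allocation check described here: extra power is granted only when both the spare supply and the headroom on the server's power distribution line permit it. The data structures, field names, and numbers are illustrative assumptions.

# Hypothetical allocation check combining supply availability and line capacity.
def allocate_additional_power(server, requested_watts, spare_supply_watts, lines):
    line = lines[server["line_id"]]
    line_headroom = line["rated_watts"] - line["current_draw_watts"]
    if requested_watts <= spare_supply_watts and requested_watts <= line_headroom:
        line["current_draw_watts"] += requested_watts
        server["power_budget_watts"] += requested_watts
        return True                           # allocation succeeded
    return False                              # insufficient supply or line capacity

lines = {"pdu-1": {"rated_watts": 5000, "current_draw_watts": 4700}}
server = {"line_id": "pdu-1", "power_budget_watts": 300}
print(allocate_additional_power(server, 200, spare_supply_watts=400, lines=lines))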
Abstract:
Embodiments are described for dynamically responding to demand for server computing resources. The embodiments can monitor performance of each of multiple computing systems in a data center, identify a particular computing system of the multiple computing systems for allocation of additional computing power, determine availability of an additional power supply to allocate to the identified computing system, and selectively enable or disable a turbo mode of processors associated with the computing systems.
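As one possible illustration, the sketch below toggles processor turbo mode based on utilization and spare power. The intel_pstate "no_turbo" sysfs knob is one real Linux mechanism for this (writing requires root); the selection policy and thresholds are assumptions, not the disclosed method.

# Hypothetical turbo-mode policy using the Linux intel_pstate driver.
def set_turbo(enabled):
    # Path applies to the intel_pstate driver; writing requires root privileges.
    with open("/sys/devices/system/cpu/intel_pstate/no_turbo", "w") as f:
        f.write("0" if enabled else "1")

def manage_turbo(cpu_utilization, spare_supply_watts, turbo_cost_watts=80):
    # Enable turbo only for a busy system with enough spare power; back off when idle.
    if cpu_utilization > 0.85 and spare_supply_watts >= turbo_cost_watts:
        set_turbo(True)
    elif cpu_utilization < 0.5:
        set_turbo(False)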
Abstract:
A computing system operates according to a method including: processing representations of housing structures that have open locations for physically locating computing resources, a physical layout of the open locations, and characteristics of the structures and the resources, to generate designated locations for optimally placing or allocating the computing resources in the open locations. The designated locations are generated by analyzing multiple possible allocation or placement combinations of the computing resources into the open locations using an optimization function.
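The sketch below illustrates the combinatorial flavor of such an optimization: every assignment of resources to open locations is scored and the lowest-cost assignment is kept. The cost model and the data fields are illustrative assumptions; a real placement system would use richer constraints and a non-exhaustive search.

# Hypothetical exhaustive placement search over open locations.
from itertools import permutations

def best_placement(resources, open_locations, cost):
    best, best_cost = None, float("inf")
    for perm in permutations(open_locations, len(resources)):
        total = sum(cost(r, loc) for r, loc in zip(resources, perm))
        if total < best_cost:
            best, best_cost = list(zip(resources, perm)), total
    return best, best_cost

# Example cost: prefer locations with more spare power and fewer network hops.
def cost(resource, location):
    return resource["watts"] / location["spare_watts"] + location["hops"]

resources = [{"name": "db-1", "watts": 400}, {"name": "web-1", "watts": 200}]
locations = [{"id": "rackA-u10", "spare_watts": 600, "hops": 1},
             {"id": "rackB-u02", "spare_watts": 1200, "hops": 3},
             {"id": "rackC-u05", "spare_watts": 800, "hops": 2}]
print(best_placement(resources, locations, cost))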