Cost-Efficient Dynamic Management of Cloud Resources Via Supervised Learning
[Ph.D. Thesis]
Author: Wajahat, Muhammad
Advisor: Gandhi, Anshul
Institution: State University of New York at Stony Brook
Year: 2020
Pages: 200
In the past decade, cloud computing has disrupted the computing landscape by providing economical compute and storage resources as services to end-users. Many web applications, such as Airbnb, Pinterest, and Expedia, are now hosted on public clouds because of the cost-effectiveness and scalability that the cloud provides. However, for cloud providers such as Amazon and Microsoft, offering compute resources on demand, with high availability and elasticity, to latency-sensitive clients requires careful management of those resources to avoid client abandonment. Specifically, the allocation and reallocation of shared computing resources among clients must be done carefully, both to meet clients' performance requirements, specified via Service Level Agreements (SLAs), and to realize the full potential of cloud computing: higher resource utilization and lower costs.

Prior work either empirically tunes the amount of resources for a given application or develops analytical performance models with the goal of determining the optimal resource allocation for applications. Both approaches have shortcomings: the former is application-specific and does not generalize, while the latter requires expert application and system knowledge to build accurate models.

To solve the resource management problem for different application classes, we need to understand workload characteristics as well as the relationship between resource allocation and system performance. In this thesis, we use supervised learning techniques to accomplish both objectives. For the first objective, we use learning and optimization to analyze and model the request-level characteristics of production traces, and show how such models can answer questions about system performance under dynamic conditions. These models can also be used to optimally provision resources when a change in workload is known (or can be predicted) ahead of time. However, changes in workload are rarely known in advance, which makes dynamic resource allocation more challenging.

For the second objective, dynamic resource allocation under uncertain workload changes, we consider two scenarios: (1) the underlying physical servers that host the cloud are always available, for which we solve the problem of autoscaling, that is, the dynamic provisioning of cloud resources for applications that experience time-varying load; and (2) the underlying servers themselves might become unavailable, for which we solve the problem of rehoming, that is, the reallocation of resources across servers for cloud-hosted applications.

To analyze and understand workload characteristics, we apply distribution fitting to the request-level characteristics of production traces. More specifically, we fit the inter-arrival times and service times to well-known distributions (or mixtures thereof), which enables powerful predictive capabilities in the form of Markov chain models. These models not only predict system performance, but can also answer what-if questions with respect to changes in the workload or the system state. We present our analysis of more than 1600 production traces from storage and internet applications, and show that the hyperexponential distribution (H2) is well suited to such workloads; owing to its properties (it is Markovian, captures heavy-tailed behavior, and is flexible to fit), it can be leveraged to build powerful models.
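To make the distribution-fitting step concrete, the following minimal Python sketch fits a two-phase hyperexponential to observed inter-arrival times via expectation-maximization. The initialization, iteration count, and synthetic data are illustrative assumptions, not the thesis's actual fitting pipeline.

import numpy as np

def fit_h2(samples, iters=200):
    """EM for a 2-phase hyperexponential: p*Exp(l1) + (1-p)*Exp(l2)."""
    x = np.asarray(samples, dtype=float)
    # Crude initialization: split the two rates around the sample mean.
    p, l1, l2 = 0.5, 2.0 / x.mean(), 0.5 / x.mean()
    for _ in range(iters):
        # E-step: posterior probability that each sample came from phase 1.
        d1 = p * l1 * np.exp(-l1 * x)
        d2 = (1 - p) * l2 * np.exp(-l2 * x)
        r = d1 / (d1 + d2)
        # M-step: re-estimate the mixing weight and the two rates.
        p = r.mean()
        l1 = r.sum() / (r * x).sum()
        l2 = (1 - r).sum() / ((1 - r) * x).sum()
    return p, l1, l2

# Synthetic arrivals mixing short and long gaps (illustrative only).
rng = np.random.default_rng(0)
data = np.concatenate([rng.exponential(0.1, 8000),
                       rng.exponential(2.0, 2000)])
p, l1, l2 = fit_h2(data)
print(f"p={p:.3f}, lambda1={l1:.3f}, lambda2={l2:.3f}")

The fitted mixing weight and rates fully parameterize the H2 distribution, which can then feed Markov chain performance models of the kind described above.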
To address the autoscaling challenge, we propose MLscale, a machine-learning-based, application-agnostic autoscaler that dynamically allocates and deallocates computing resources for cloud applications in response to unpredictable changes in workload demand (a brief illustrative sketch of the idea appears at the end of this abstract). MLscale provides near-optimal resource usage and reduces SLA violations without requiring expert application knowledge or tuning. We systematically evaluate the autoscaling of multi-tier applications from downstream to upstream tiers, starting with the application tier and ending with the persistent storage tier.

To address rehoming, we propose MERIT, a model-driven rehoming solution for reallocating compute resources for existing services. In certain cloud environments, unpredictable events such as hotspots or system upgrades require the evacuation of physical servers, necessitating the timely resource reallocation of hosted applications through rehoming actions such as live migration or rebuilding. To optimize rehoming, we employ machine learning techniques, including regression and neural networks, to build cost models for the various rehoming options (also sketched below). We leverage these models to find the rehoming strategy for distributed applications that minimizes rehoming resource cost and application downtime.

It is our thesis that learning-based techniques strike the right balance between accurate but application-specific empirical approaches and general but possibly inaccurate analytical approaches to resource management in the cloud. Such learning-based resource management techniques can substantially improve a cloud provider's cost savings (by up to 40% in our implementation results) while providing the necessary SLA performance guarantees.
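The first sketch below illustrates the autoscaling idea in minimal, hypothetical form: a learned model predicts application latency from the request rate and server count, and the autoscaler picks the smallest cluster whose predicted latency meets the SLA. The features, training data, model choice, and latency target are illustrative assumptions, not MLscale's actual design.

import numpy as np
from sklearn.linear_model import LinearRegression

SLA_MS = 100.0  # assumed latency target

# Hypothetical monitoring history: (request rate, servers) -> latency (ms).
X = np.array([[100, 2], [200, 2], [200, 4], [400, 4], [400, 8], [800, 8]])
y = np.array([60.0, 140.0, 70.0, 150.0, 80.0, 160.0])

# Stand-in for the learned performance model.
model = LinearRegression().fit(X, y)

def servers_needed(predicted_rate, max_servers=32):
    """Smallest server count whose predicted latency satisfies the SLA."""
    for n in range(1, max_servers + 1):
        if model.predict([[predicted_rate, n]])[0] <= SLA_MS:
            return n
    return max_servers

print(servers_needed(600))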
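The second sketch illustrates model-driven rehoming selection in the same hypothetical spirit: one learned cost model per rehoming action, with the action of lowest predicted cost chosen for each application instance. The features, training data, and action set are illustrative assumptions rather than MERIT's actual models.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical history: (memory GB, dirty-page rate) -> observed cost (s).
features = np.array([[2, 0.1], [4, 0.2], [8, 0.5], [16, 0.8], [32, 0.9]])
costs = {
    "live-migrate": np.array([5.0, 12.0, 40.0, 120.0, 300.0]),
    "rebuild":      np.array([60.0, 65.0, 70.0, 80.0, 90.0]),
}

# One regression-based cost model per rehoming action.
models = {a: GradientBoostingRegressor(n_estimators=50).fit(features, c)
          for a, c in costs.items()}

def best_action(vm_features):
    """Pick the rehoming action with the lowest predicted cost for this VM."""
    preds = {a: m.predict([vm_features])[0] for a, m in models.items()}
    return min(preds, key=preds.get), preds

action, preds = best_action([16, 0.7])
print(action, preds)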