Abstract
Today’s data centers contain thousands of servers from multiple vendors, and these servers vary widely in technology, model, generation, and associated components such as firmware, software, and other hardware. Data center administrators find it challenging to keep track of the activities executed across these servers and of resource consumption at the various stages of deployment, delivery, installation, and upgrades. As a knock-on effect, administrators find it difficult to foresee the availability and/or consumption of the
GPU resources in their data centers. This poses a risk when resources must be available for a critical future activity: if the data center is overburdened, the resources are unavailable and the critical tasks fail. Conversely, if the data center remains persistently underutilized, the unused servers become a liability and reduce the return on investment (ROI). Both situations create risks that are difficult to mitigate, hurt the organization's profitability, and lead to poor resource planning because future needs are not visible. This paper focuses on
designing a forecasting solution for data center administrators. The solution involves exploring the data to build an understanding of the server metrics in the data center, such as CPU, I/O, and power, and developing univariate and multivariate forecasting models from the time series data sets of these metrics. This paper proposes to explore TDM as well as GAM models for predicting the behavior of various factors in data centers and CSPs. The solution would give data center administrators clearer visibility into the future load and availability of the GPU in their data centers. With that visibility, they would be better placed to understand the utilization metrics of their data centers and to plan future investments financially, based on a clear view of under- and over-utilized resources. This will also lead to better optimization and rationalization of data center resources.