Résumé

Reliability is one of the key performance factors in data centres. The out-of-scale energy costs of these facilities lead data centre operators to increase the ambient temperature of the data room to decrease cooling costs. However, increasing ambient temperature reduces the safety margins and can result in a higher number of anomalous events. Anomalies in the data centre need to be detected as soon as possible to optimize cooling efficiency and mitigate the harmful effects over servers. This article proposes the usage of clustering-based outlier detection techniques coupled with a trust and reputation system engine to detect anomalies in data centres. We show how self-organizing maps or growing neural gas can be applied to detect cooling and workload anomalies, respectively, in a real data centre scenario with very good detection and isolation rates, in a way that is robust to the malfunction of the sensors that gather server and environmental information.

Détails

Actions