A Multivariate KPIs Anomaly Detection Framework With Dynamic Balancing Loss Training


Abstract

Anomaly detection on multivariate KPIs (Key Performance Indicators, such as CPU utilization, socket status, and HTTP requests per second) is of utmost importance to system reliability. Unsupervised methods have attracted considerable interest and have progressed significantly owing to their superior effectiveness. However, state-of-the-art unsupervised anomaly detection methods still suffer from high false-alarm or missed-alarm rates. To this end, we propose MM, a practical Multivariate KPIs anomaly detection framework that follows the principles of Multi-task learning and is trained with the proposed dynamic balancing loss function. To capture the KPIs' characteristics as fully as possible, we simultaneously train multiple sequential autoencoders with different connections, based on a purpose-designed semi-Random Connection Recurrent Neural Network (sRC-RNN). These autoencoders can be treated as different reconstruction tasks during training. Furthermore, we propose a dynamic loss function that adaptively balances the tasks' weights. Extensive experiments show that MM outperforms state-of-the-art unsupervised multivariate KPIs anomaly detection algorithms, achieving an average F1-score of 0.95 on two public machine-level KPIs datasets and 0.96 on an internal container-level KPIs dataset.
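The abstract does not give the form of the dynamic balancing loss, so the following is only an illustrative sketch of the general idea of adaptively weighting several reconstruction tasks. It swaps in a Dynamic Weight Average style rule, which is not the paper's loss: each task's weight is a softmax over the ratio of its two most recent losses, so tasks that are improving more slowly receive more weight. The function names, the `temperature` parameter, and the example loss values are all assumptions made for illustration.

```python
import math


def dynamic_task_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Hypothetical weighting rule (Dynamic Weight Average style, not the paper's).

    A ratio near or above 1 means a task's loss is barely improving, so that
    task gets a larger weight. The weights sum to the number of tasks.
    """
    ratios = [p / pp for p, pp in zip(prev_losses, prev_prev_losses)]
    peak = max(ratios)  # subtract the maximum for numerical stability
    exps = [math.exp((r - peak) / temperature) for r in ratios]
    total = sum(exps)
    return [len(ratios) * e / total for e in exps]


def balanced_loss(task_losses, weights):
    """Weighted sum of the per-task reconstruction losses."""
    return sum(w * l for w, l in zip(weights, task_losses))


# Illustrative losses for three autoencoders over two consecutive epochs.
prev_prev = [1.0, 1.0, 1.0]
prev = [0.5, 0.9, 0.7]

weights = dynamic_task_weights(prev, prev_prev)
total_loss = balanced_loss(prev, weights)
print(weights, total_loss)
```

In this sketch the second autoencoder, whose loss fell the least, ends up with the largest weight, while the first, which improved the most, gets the smallest.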

Publication
IEEE Transactions on Network and Service Management (TNSM)
