This book investigates compressive sensing techniques to provide a robust and general framework for network data analytics. The goal is to introduce a compressive sensing framework for missing data interpolation, anomaly detection, data segmentation and activity recognition, and to demonstrate its benefits. Chapter 1 introduces compressive sensing, including its definition, its limitations, and how it supports different network analysis applications. Chapter 2 demonstrates the feasibility of compressive sensing in network analytics: the authors apply it to detect anomalies in a customer care call dataset from a Tier 1 ISP in the United States. A regression-based model is applied to find the relationship between calls and events. The authors illustrate that compressive sensing is effective in identifying important factors and can leverage low-rank structure and temporal stability to improve detection accuracy. Chapter 3 discusses several challenges in applying compressive sensing to real-world data. Understanding the reasons behind these challenges is important for designing methods that mitigate their impact. The authors analyze a wide range of real-world traces. The analysis demonstrates that several different factors contribute to the violation of the low-rank property in real data. In particular, the authors find that (1) noise, errors, and anomalies, and (2) asynchrony in the time and frequency domains lead to network-induced ambiguity and can easily cause low-rank matrices to become higher-rank. To address the problem of noise, errors and anomalies, the authors propose a robust compressive sensing technique in Chap. 4. It explicitly accounts for anomalies by decomposing real-world data represented in matrix form into a low-rank matrix, a sparse anomaly matrix, an error term and a small noise matrix.
Chapter 5 addresses the lack of synchronization: the authors propose a data-driven synchronization algorithm. It can eliminate misalignment while taking into account the heterogeneity of real-world data in both the time and frequency domains. The data-driven synchronization can be combined with any compressive sensing technique and is general enough to apply to any real-world data. The authors illustrate that combining the two techniques reduces the rank of real-world data, improves the effectiveness of compressive sensing, and supports a wide range of applications.
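As a rough illustration of the Chap. 4 decomposition, it can be written as a sum of four matrices. The symbols below are assumptions chosen for this sketch and are not taken from the book, which may use different notation and additional constraints.

```latex
% D: observed data matrix (assumed symbol)
% L: low-rank matrix, S: sparse anomaly matrix,
% E: error term, N: small noise matrix (all assumed symbols)
D = L + S + E + N,
\qquad \operatorname{rank}(L) \ll \min(m, n),
\qquad \|S\|_0 \ll mn,
\qquad \|N\|_F \ \text{small},
\qquad D \in \mathbb{R}^{m \times n}.
```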
Networks are constantly generating a wealth of rich and diverse information. This information creates exciting opportunities for network analysis and provides insight into the complex interactions between network entities. However, network analysis often faces one of two problems: (1) it is under-constrained, where there is too little data because collecting it is infeasible or costly, or (2) it is over-constrained, where there is so much data that the analysis becomes unscalable. Compressive sensing is an effective technique for solving both problems because it exploits the underlying structure of the data. Specifically, to solve the under-constrained problem, compressive sensing techniques can be applied to reconstruct missing elements or predict future data. To solve the over-constrained problem, compressive sensing techniques can be applied to identify the significant elements.
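The under-constrained case described above can be made concrete with a toy example. The following is a minimal sketch, not the authors' method: it assumes the data matrix is exactly rank 1 and fills in missing entries by alternating least squares over the observed entries only. The function name `complete_rank1` and the `None` convention for missing values are hypothetical choices made for this illustration.

```python
def complete_rank1(matrix, iters=50):
    """Fill None entries of `matrix`, assuming it is approximately rank 1.

    The matrix is modeled as the outer product u * v^T. The factors are
    fitted by alternating least squares using only the observed entries.
    """
    rows, cols = len(matrix), len(matrix[0])
    u = [1.0] * rows  # row factors (initial guess)
    v = [1.0] * cols  # column factors (initial guess)

    for _ in range(iters):
        # Fix u, solve for each column factor over that column's observed rows.
        for j in range(cols):
            obs = [i for i in range(rows) if matrix[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(u[i] * matrix[i][j] for i in obs) / den
        # Fix v, solve for each row factor over that row's observed columns.
        for i in range(rows):
            obs = [j for j in range(cols) if matrix[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(v[j] * matrix[i][j] for j in obs) / den

    # Keep observed values; use the rank-1 estimate only where data is missing.
    return [
        [matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
         for j in range(cols)]
        for i in range(rows)
    ]


if __name__ == "__main__":
    # A rank-1 matrix (outer product of [1, 2, 3] and [2, 4, 6]) with gaps.
    observed = [
        [2.0, 4.0, None],
        [4.0, None, 12.0],
        [None, 12.0, 18.0],
    ]
    for row in complete_rank1(observed):
        print([round(x, 3) for x in row])
```

Real traces are rarely exactly low-rank, which is why the later chapters deal with anomalies and misalignment before a reconstruction of this kind is attempted.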
To support compressive sensing in network data analysis, a robust and general framework is needed that can serve diverse applications. Yet this can be challenging for real-world data, where noise, anomalies and lack of synchronization are common. First, the number of unknowns in network analysis can be much larger than the number of measurements. For example, traffic engineering requires knowing the complete traffic matrix between all source and destination pairs in order to properly configure traffic and avoid congestion. However, measuring the flow between all source and destination pairs is very expensive or even infeasible, and reconstructing data from a small number of measurements is an under-constrained problem. Second, real-world data is complex and heterogeneous, and often violates the low-rank assumptions required by existing compressive sensing techniques. These violations significantly reduce the applicability and effectiveness of existing compressive sensing methods. Third, synchronizing network data reduces the rank of the data and increases spatial locality. However, periodic time series exhibit not only misalignment but also different frequencies, which makes it difficult to synchronize data in the time and frequency domains.
The primary audience for this book is data engineers, analysts and researchers who need to deal with big data that has missing values, anomalies and synchronization problems. Advanced-level students focused on compressive sensing techniques will also benefit from this book as a reference.