Identification of Mobile Applications via In-Network Machine Learning

Abstract

Machine-learning-based application identification has been studied intensively for use cases such as application-specific network slicing and zero-rating services. In machine learning, a trained model is evaluated on a test dataset, which is usually separated chronologically from the rest of the collected dataset.
We argue that, in mobile network analysis, this way of training and validation is often inappropriate, because the performance of a trained model may be overestimated when data from the same user is included in both the training and the test datasets. In this paper, we propose using the IMEI to identify users and to isolate the test set from the rest of the dataset. We observe that the conventional method overestimates accuracy by about 4% on average and by 10% in the worst case compared with our evaluation using the IMEI-based split method. In addition, our evaluation shows that the IMEI, rather than the source IP, is necessary for data isolation: a single UE may be assigned multiple source IPs over time, so the source IP cannot serve as a substitute for the IMEI.
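
The IMEI-based split described above can be illustrated with a minimal sketch. It is not the implementation used in this paper: the function name split_by_imei, the record fields (imei, src_ip, app), and the toy records are all hypothetical and chosen only for illustration. The idea is to assign whole devices, rather than individual flows, to either the training set or the test set.

```python
import random


def split_by_imei(records, test_fraction=0.2, seed=0):
    """Split records so that no IMEI appears in both train and test.

    `records` is a list of dicts with a hypothetical "imei" key.
    """
    # Collect the distinct devices and shuffle them reproducibly.
    imeis = sorted({r["imei"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(imeis)

    # Hold out a fraction of the devices (at least one) for testing.
    n_test = max(1, round(len(imeis) * test_fraction))
    test_imeis = set(imeis[:n_test])

    train = [r for r in records if r["imei"] not in test_imeis]
    test = [r for r in records if r["imei"] in test_imeis]
    return train, test


# Made-up toy records: device "A" appears under two source IPs, and
# 10.0.0.1 is used by two different devices at different times.
records = [
    {"imei": "A", "src_ip": "10.0.0.1", "app": "video"},
    {"imei": "A", "src_ip": "10.0.0.7", "app": "chat"},
    {"imei": "B", "src_ip": "10.0.0.2", "app": "video"},
    {"imei": "C", "src_ip": "10.0.0.3", "app": "chat"},
    {"imei": "C", "src_ip": "10.0.0.1", "app": "video"},
]

train, test = split_by_imei(records, test_fraction=0.34)
overlap = {r["imei"] for r in train} & {r["imei"] for r in test}
print("IMEIs shared by train and test:", overlap)
```

In this toy data, grouping by src_ip instead of imei could place records of device "A" on both sides of the split, which is the kind of leakage the IMEI-based split is meant to avoid.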