Machine-learning-based application identification has been studied intensively for use cases such as application-specific network slicing and zero-rating services. In machine learning, a trained model is usually evaluated on a test dataset that is split off chronologically from the collected dataset.
We argue that this way of training and validation is often inappropriate for mobile network analysis, because the performance of a trained model may be overestimated when data from the same user is included in both the training and the test dataset. In this paper, we propose using the IMEI to identify users and to isolate the test set from the rest of the dataset. We observe that the conventional method overestimates accuracy by about 4% on average and by 10% in the worst case compared with our evaluation using the IMEI-based split. In addition, our evaluation shows that the IMEI, rather than the source IP, is necessary for data isolation: a single UE may be assigned multiple source IPs over time, so the source IP may not be a substitute for the IMEI.
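The difference between the two evaluation splits can be illustrated with a minimal sketch. It assumes each flow record is a dictionary with the hypothetical fields `imei`, `src_ip`, and `timestamp`; the record layout, the function names, and the synthetic data are illustrative assumptions, not the dataset or code used in this paper.

```python
import random
from collections import defaultdict


def chronological_split(records, test_fraction=0.2):
    """Conventional split: the most recent records become the test set."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    cut = len(ordered) - int(len(ordered) * test_fraction)
    return ordered[:cut], ordered[cut:]


def group_split(records, key="imei", test_fraction=0.2, seed=0):
    """Group split: all records sharing the same `key` value land on one side."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    ids = sorted(groups)
    random.Random(seed).shuffle(ids)
    n_test = max(1, int(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])
    train = [r for r in records if r[key] not in test_ids]
    test = [r for r in records if r[key] in test_ids]
    return train, test


def shared_users(train, test):
    """IMEIs that appear in both the training and the test set."""
    return {r["imei"] for r in train} & {r["imei"] for r in test}


# Synthetic flows (assumption): 5 devices, 10 flows each, interleaved in time.
# Each device changes its source IP halfway through the capture.
records = [
    {"imei": f"imei-{u}", "src_ip": f"10.0.{u}.{t // 5}", "timestamp": t * 5 + u}
    for u in range(5)
    for t in range(10)
]

train_c, test_c = chronological_split(records)
print("chronological split, shared users:", len(shared_users(train_c, test_c)))

train_g, test_g = group_split(records, key="imei")
print("IMEI-based split, shared users:", len(shared_users(train_g, test_g)))

# Grouping by source IP does not guarantee user isolation, because one
# device may hold several IPs over time.
train_ip, test_ip = group_split(records, key="src_ip")
print("source-IP split, shared users:", len(shared_users(train_ip, test_ip)))
```

In this synthetic data the chronological split places every device on both sides, while the IMEI-based split keeps each device on exactly one side. Whether the source-IP split leaks users depends on which addresses are drawn for the test set, which is why it cannot be relied on for isolation.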

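Takamitsu Iwai, Akihiro Nakao. "Validation Methods and Issues in Mobile Application Identification Using Machine Learning" (original title: "機械学習によるモバイルアプリケーション判定の検証方法と問題点"). IEICE Technical Report, vol. 119, no. 92, NS2019-40, pp. 29-34, June 2019. Copyright ©2019 IEICE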