- Date: 19/07/2019
- Location: Inria de Paris - A215
- Speaker(s): Demetris Zeinalipour
- Organizer(s): EPI Mimove
Decaying Telco Big Data with Data Postdiction
Demetris Zeinalipour, University of Cyprus
A telecommunications company (Telco) is traditionally perceived only as an entity that provides telecommunication services, such as telephony and data communication access, to users. However, the radio and backbone infrastructure of such entities, which densely covers most urban spaces and broadly covers most rural areas, now provides a unique opportunity to collect immense amounts of data that capture a variety of natural phenomena on an ongoing basis, e.g., traffic, commerce, mobility patterns, and emergency response.
In this talk, I will present a novel decaying operator for Telco Big Data (TBD), coined TBD-DP (Data Postdiction). Unlike data prediction, which aims to make a statement about the future value of some tuple, data postdiction, the term we introduce, aims to make a statement about the past value of some tuple that no longer exists because it had to be deleted to free up disk space. TBD-DP relies on existing Machine Learning (ML) algorithms to abstract TBD into compact models that can be stored and queried when necessary.
Our proposed TBD-DP operator has the following two conceptual phases:
- (i) in an offline phase, it uses an LSTM-based hierarchical ML algorithm to learn a tree of models (coined TBD-DP tree) over time and space;
- (ii) in an online phase, it uses the TBD-DP tree to recover data within a certain accuracy.
In our experimental setup, we measure the efficiency of the proposed operator using a ~10 GB anonymized real telco network trace. Our experimental results in TensorFlow over HDFS are extremely encouraging: they show that TBD-DP reduces storage space by an order of magnitude while maintaining high accuracy on the recovered data.
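The two phases can be illustrated with a minimal Python sketch. It is not the TBD-DP implementation: the LSTM models are replaced by a simple least-squares line per time segment, the tree covers time only (not space), and the names `Node`, `fit_line`, `build_tree`, `postdict`, and `leaf_size` are hypothetical, introduced here purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Node:
    """One node of a toy postdiction tree covering time steps [start, end)."""
    start: int
    end: int
    slope: float
    intercept: float
    children: List["Node"] = field(default_factory=list)

    def estimate(self, t: int) -> float:
        # Evaluate this node's compact model at absolute time step t.
        return self.slope * (t - self.start) + self.intercept


def fit_line(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (0, v0), (1, v1), ...

    Assumption: a stand-in for the LSTM-based model named in the abstract.
    """
    n = len(values)
    if n == 1:
        return 0.0, float(values[0])
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def build_tree(series: Sequence[float], start: int, end: int, leaf_size: int) -> Node:
    """Offline phase: fit one model per segment, splitting recursively in time."""
    slope, intercept = fit_line(series[start:end])
    node = Node(start, end, slope, intercept)
    if end - start > leaf_size:
        mid = (start + end) // 2
        node.children = [
            build_tree(series, start, mid, leaf_size),
            build_tree(series, mid, end, leaf_size),
        ]
    return node


def postdict(node: Node, t: int) -> float:
    """Online phase: descend to the finest model covering t and evaluate it."""
    for child in node.children:
        if child.start <= t < child.end:
            return postdict(child, t)
    return node.estimate(t)


# Synthetic readings, one per time step (illustrative data, not a telco trace).
series = [2.0 * t + 1.0 for t in range(16)]
tree = build_tree(series, 0, len(series), leaf_size=4)

# The raw readings could now be deleted; past values are estimated from the tree.
recovered = postdict(tree, 5)
```

In this toy version the storage saving comes from keeping two coefficients per tree node instead of every raw reading, and the recovery error depends on how well each segment's model fits the data it replaced.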