Analysis of streaming data for security

Dr. Nikos Nikolaou, Project Manager

As a technology provider, ITML participates in the Data Management and Network Analysis activities of the ROXANNE project. In a multi-disciplinary technological field that extends the capabilities of law enforcement agencies in the fight against crime and terrorism, ITML leads the efforts on social media data ingestion and on the fusion of information from component technologies.


In an era of tremendous progress in Information and Communication Technology (ICT), societies seem to be more vulnerable than ever. The reason is that technological innovation and the accessibility of ICT infrastructures also bring threats, not only from the physical and natural space but also from cyberspace. In the modern world, where numerous applications and devices generate enormous amounts of data, resulting in the Big Data phenomenon, security and protection are essential at the international, regional and national levels, in both the private and public sectors. Security experts are working hard to defend people and organizations, because data streams are becoming ever more dynamic in terms of volume and velocity, and even value, volatility, variability and veracity. These streaming data are mostly created in real time, which means that data analysis and correlation must also be carried out in real time.

Over the last decade, data mining and analytics have become key players in security. Law Enforcement Agencies (LEAs), supported by ICT scientists, make use of audio, image and text analysis and automated database search in their investigations. Streaming analysis is an emerging trend in computing and, at the same time, is in high demand in the majority of organizations. In stream computing, data of arbitrary size originating from various sources are processed in real time. Streaming analysis serves as armor against physical attacks, cyber attacks and network intrusions, which are themselves non-static. Systems that analyze streams must monitor new information, gradually build models and finally detect anomalies, that is, data that deviate from the model's predictions. Data normalization processes, such as the removal of duplicates or outliers, are also part of the data analysis node.
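
The monitor, model and detect loop described above can be illustrated with a minimal sketch. It assumes a stream of plain numeric readings and uses a running mean and standard deviation (Welford's algorithm) as the "model"; the names `RunningStats` and `detect_anomalies`, the z-score threshold and the warm-up length are illustrative choices, not part of any ROXANNE component.

```python
import math
from typing import Iterable, Iterator, Tuple


class RunningStats:
    """Incrementally tracks mean and sample variance (Welford's algorithm)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self) -> float:
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


def detect_anomalies(
    stream: Iterable[float], threshold: float = 3.0, warmup: int = 5
) -> Iterator[Tuple[float, bool]]:
    """Yield (value, is_anomaly) for each reading as it arrives.

    A reading is flagged when it lies more than `threshold` standard
    deviations from the running mean. Nothing is flagged during warm-up.
    """
    stats = RunningStats()
    for value in stream:
        is_anomaly = False
        if stats.count >= warmup and stats.std > 0:
            is_anomaly = abs(value - stats.mean) / stats.std > threshold
        if not is_anomaly:
            # Only normal readings update the model, so a burst of
            # outliers does not shift the baseline.
            stats.update(value)
        yield value, is_anomaly


# Hypothetical readings: a stable signal with one spike.
readings = [10, 11, 9, 10, 12, 10, 11, 95, 10]
flagged = [value for value, is_anomaly in detect_anomalies(readings) if is_anomaly]
```

With these hypothetical readings, only the spike of 95 ends up in `flagged`. Keeping flagged values out of the model is a design choice that suits rare outliers; a stream whose baseline genuinely drifts would need a windowed or decaying model instead.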

If a LEA aims to react quickly to a detected criminal issue or, further, is interested in detecting trends in order to acquire knowledge, data streaming is a powerful tool. However, the characteristics of streams pose several challenges, which define a landscape for technological improvements. Scalability is one of the main issues: because data grow exponentially, frameworks and algorithms must keep pace with their growth and complexity. Architectures and platforms responsible for data handling are also extremely sensitive to the time factor. To tackle security threats and fraud efficiently, while detecting criminal networks and identifying their members, it is crucial to prevent outdated data and delays in the data flow. High fault tolerance together with high throughput are also required of a data stream environment. Finally, a data streaming environment must ensure the accuracy and protection of data streams, safeguarding privacy not only at the individual level.

The field of data streaming analytics as a weapon against organized crime is continuously growing, offering a large number of alternative tools and technologies. These analytics solutions for data streams are provided by both the open source community and enterprise technology vendors. Streaming platforms act at three levels, which together form a full cycle:

  • data stream in
  • data processing
  • data stream out
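
The three levels above can be sketched as a chain of Python generators. This is only an illustration under simple assumptions: events are dictionaries with hypothetical `id` and `text` fields, processing is limited to duplicate removal and text normalization, and the sink is any callable. Real streaming platforms add buffering, partitioning and fault tolerance on top of this shape.

```python
from typing import Callable, Dict, Iterable, Iterator

Event = Dict[str, object]


def stream_in(raw_events: Iterable[Event]) -> Iterator[Event]:
    """Data stream in: hand over events one at a time as they arrive."""
    for event in raw_events:
        yield event


def process(events: Iterable[Event]) -> Iterator[Event]:
    """Data processing: drop duplicate ids and normalize the text field."""
    seen_ids = set()  # unbounded here; a real system would expire old ids
    for event in events:
        event_id = event["id"]
        if event_id in seen_ids:
            continue
        seen_ids.add(event_id)
        yield {**event, "text": str(event["text"]).strip().lower()}


def stream_out(events: Iterable[Event], sink: Callable[[Event], None]) -> int:
    """Data stream out: deliver each event to the sink, return the count."""
    delivered = 0
    for event in events:
        sink(event)
        delivered += 1
    return delivered


# Hypothetical input containing one duplicate event.
raw = [
    {"id": 1, "text": " Hello "},
    {"id": 1, "text": " Hello "},
    {"id": 2, "text": "World"},
]
received = []
count = stream_out(process(stream_in(raw)), received.append)
```

Because each stage is a generator, an event flows through all three levels before the next one is read, which mirrors the record-at-a-time behaviour of stream processing rather than batch processing.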

However, to achieve maximum data management efficiency, a number of factors must be taken into consideration when adopting a tool or a technology. Different platforms are available for Big Data storage and analytics, and the application sector, the shape of the data and the workload profile all indicate which platform is most appropriate. Integration is also of great importance: how other applications inside a security organization (e.g., a LEA) fit with the streaming platform, and how users access the available data, must be examined very carefully. Weighing these factors and the significance of each one reveals the best solution for the security organization. Of course, no single streaming platform satisfies the full set of critical factors mentioned above, so the decision must be based on the problem to be addressed and the objectives of the research and analysis.

ROXANNE is a project that strengthens the fight against crime and terrorism. The ROXANNE platform, the cornerstone of the technical development, will provide criminal network analysis based on speech, language and video technologies. To that end, the data extracted by speech, text and video analysis will be the raw material for the data fusion processes. ITML is responsible for the alignment of data streams, which will be carried out by ITML’s Data Fusion Bus (DFB). DFB enables organizations to develop, deploy, operate and manage a big data environment with an emphasis on real-time applications. It combines the features and capabilities of several big data applications and utilities within a single platform.
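
To give a rough idea of what aligning data streams can mean in practice, the sketch below merges the outputs of three hypothetical analysis components into one timeline ordered by timestamp. It is not a description of how DFB works: the `Observation` type, the `align_streams` function and the sample data are assumptions made for illustration, and each input stream is assumed to be already sorted by time.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Observation(NamedTuple):
    timestamp: float  # seconds since the start of the recording (assumed)
    source: str       # which analysis component produced the observation
    payload: str      # extracted content, kept as plain text here


def align_streams(*streams: Iterable[Observation]) -> Iterator[Observation]:
    """Lazily merge time-sorted streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda obs: obs.timestamp)


# Hypothetical outputs of speech, text and video analysis.
speech = [Observation(1.0, "speech", "speaker A"), Observation(4.0, "speech", "speaker B")]
text = [Observation(2.0, "text", "entity: location")]
video = [Observation(3.0, "video", "face 17"), Observation(5.0, "video", "face 4")]

timeline = list(align_streams(speech, text, video))
```

`heapq.merge` reads only one pending item per input stream at a time, so the same approach also applies to long-running streams that do not fit in memory.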

The key capabilities of DFB are:

  • Smart Production Digitization and IoT (e.g., H2020 ECSEL projects Productive4.0, 2017-2020, and I-MECH, 2017-2020);
  • Data aggregation from heterogeneous data sources and data stores;
  • Real-time analytics offering ready-to-use Machine Learning algorithms for classification, clustering, regression, anomaly detection;
  • An extendable and highly customizable User Interface for Data Analytics, manipulation and filtering. The UI also includes functionality for managing the platform;
  • Web Services for exploiting the platform outputs for Decision Support.

 

Figure 1 - DFB: A Data Analytics solution.


The ROXANNE project has received funding from the European Union’s Horizon 2020 Work Programme for research and innovation 2018-2020, under grant agreement n°833635. The project started in September 2019, has a duration of 3 years and is coordinated by the Idiap Research Institute. To find out more about ROXANNE and its 24 partners (LEAs, SMEs, industry and academia), follow us on Twitter and connect with us on LinkedIn.