This blog post discusses the high-level methodology of AEGIS.
Step 1: Understanding the stakeholders and their data
One of the challenges we faced during the project’s early steps was identifying all the parties involved in the wide set of domains we refer to as Public Safety and Personal Security (PSPS). At first glance, PSPS might look like a public sector responsibility. However, a plethora of private enterprises and organisations are directly or indirectly involved, forming a strong market. Indeed, we identified 11 broad stakeholder groups, each of which can in turn be broken down into more specific sub-groups, an indicative set of which is presented in Table 1.
Table 1: AEGIS stakeholders
STAKEHOLDER GROUP | TYPES |
SG1 – Smart Insurance | Insurance Companies, Financial institutions, Insurance brokers |
SG2 – Smart home | Electronics, Smart home technology providers, Safety and security, Energy and Utilities |
SG3 – Smart Automotive | Car manufacturers, Car dealers, Electronics, GPS Navigation System Providers |
SG4 – Health | Nursing homes, Hospitals, Doctors |
SG5 – Public safety / law enforcement | Police, Emergency Medical Service, Fire Service, Search and Rescue, Military |
SG6 – Research communities | Students, Professors, Research institutes |
SG7 – Road Construction companies | |
SG8 – Public sector | Municipalities, Public Authorities |
SG9 – IT Industry | IT software companies, Data scientists, Data Industries |
SG10 – Smart City | Electronics, Smart City technology providers, Smart City planners |
SG11 – End Users | Citizens |
As a next step in outlining how AEGIS will connect these stakeholders through innovative data-driven solutions and enable the creation of new data value chains, we investigated the data produced and consumed by each group. We identified the data that each stakeholder produces or owns, examined how they currently leverage these data (or why they do not), and tried to get a glimpse of the needs and opportunities stemming from data that they would be interested in but that are either currently unavailable or lack an easy way to be harvested, processed and mined for insights. Detailed reports on this work can be found in our deliverables (see References). Stakeholders either play or could play a dual role, i.e. as data producers and data consumers, depending on the PSPS scenario to be implemented and the nature of the services to be provided.
More details about AEGIS cases and scenarios will be provided in upcoming blogposts, but an indicative value chain that emerges from simply looking at the available data is as follows. Insights on road conditions can be extracted by analysing real-time driving sensor data (coming from the Smart Automotive group, SG3), combined with data from road construction companies (SG7) and public data on car accidents provided by Traffic Police and Emergency Medical Services (SG5). These insights can be leveraged by (a) insurance companies (SG1), towards enhanced car insurance plans and personalised driver notifications on road hazards that also take into account current weather and other contextual information, and (b) smart city planners (SG10) collaborating with municipalities (SG8), towards implementing a smart lights network that aims to reduce car accidents caused by poor lighting conditions and road deficiencies inside the city.
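To make the road-conditions example more tangible, here is a minimal Python sketch of how such a value chain could be written down as data. The class names (`DataSource`, `ValueChain`) and the dataset labels are illustrative assumptions, not part of any AEGIS specification; only the stakeholder group IDs come from Table 1.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """A dataset contributed by one stakeholder group (labels are illustrative)."""
    name: str
    producer: str  # stakeholder group ID from Table 1, e.g. "SG3"


@dataclass(frozen=True)
class ValueChain:
    """Links the data sources behind an insight to the groups that consume it."""
    insight: str
    sources: tuple[DataSource, ...]
    consumers: tuple[str, ...]

    def producers(self) -> set[str]:
        """Stakeholder groups that contribute data to this chain."""
        return {source.producer for source in self.sources}

    def dual_role_groups(self) -> set[str]:
        """Groups that both supply data to this chain and consume its output."""
        return self.producers() & set(self.consumers)


# The road-conditions scenario described above, expressed as data.
road_conditions = ValueChain(
    insight="road conditions",
    sources=(
        DataSource("real-time driving sensor data", "SG3"),
        DataSource("road construction data", "SG7"),
        DataSource("public car accident data", "SG5"),
    ),
    consumers=("SG1", "SG10", "SG8"),
)

print(sorted(road_conditions.producers()))
print(road_conditions.dual_role_groups())
```

In this particular scenario no group appears on both sides, so `dual_role_groups()` is empty. Adding a second scenario in which, say, SG1 also contributes data would be one way to surface the dual producer/consumer role mentioned earlier.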
The potential for establishing new data value chains among stakeholders is truly immense, and the integrated AEGIS value chain is potentially so diverse that it may even extend to the complete Big Data Ecosystem [1] (Figure 1).
Step 2: Refining the Big Data Value Chain towards the AEGIS needs
As shown in Figure 1, at the centre of the Big Data Ecosystem, at the core of all stakeholder value chains, stands the Data Value Chain, which controls the way data from one stakeholder are transformed into value distributed to other stakeholders. The Big Data Value Chain, as defined by Edward Curry [1], comprises five main steps, which are adopted at a high level and customised to the AEGIS needs, as follows:
- Data Acquisition. AEGIS builds upon a large number of diverse data sources, which include real-time streaming data from home/automotive/city/wearable sensors, as well as satellites, proprietary SQL and NoSQL databases, and free text data from social media and information sites, resulting in many technical requirements to be further investigated.
- Data Analysis. In the scope of AEGIS, data analysis involves a variety of data mining methods, including but not limited to stream data mining and free text mining, which in turn entail time-series analysis, natural language processing, machine learning, etc. Each of these processes brings a number of challenges, such as time series breakout detection and stream frequent pattern mining (for sensor data), multilingualism and lack of structure (in free text), and the lack of agreed-upon schemas and data standards across almost all domains. Moreover, although PSPS applications require high accuracy levels, there are inherent data features, e.g. the presence of natural language text to be analysed, that render the required analysis not only more labour-intensive but also error-prone.
Another challenge AEGIS aims to tackle is that, often enough, the criteria used for the analysis of big data cannot (and, under certain circumstances, should not) be known a priori, but only at analysis time, in order to ensure that the extracted value is not limited by early erroneous decisions. Hence, exploratory analysis is at the core of the data analysis step. Exploratory analysis builds on the fact that, when analysis starts, the questions to be answered are not (always) known. Questions only emerge a posteriori, together with the extracted answers, which is the case in many of the AEGIS envisioned applications and services.
- Data Curation. In AEGIS this is an umbrella term for various processes regarding data organisation, validation, quality evaluation, provenance and multiple-purpose annotation. More details on this can be found in our deliverable, so we only mention three important points here:
- The importance of proper definition and measurement of data quality to avoid compromising the value of the final output.
- The need to employ traceable and repeatable curation processes so that data curation steps are verifiable against new versions of data and render the detection of new steps possible.
- The need to avoid irreversible data restructuring. This follows from the previously explained need to enable exploratory analysis, which by definition forbids lossy data transformations and compressions, since these may impede future analyses.
- Data Storage, i.e. “the persistence and management of data in a scalable way that satisfies the needs of applications”, which will be discussed in future blogposts presenting the AEGIS architecture.
- Data Usage. Inside AEGIS, this involves various data-driven business activities, the provision of smart decision support and analytics applications, visual analytics and real-time data exploration across all PSPS related fields, to be showcased through the three project demonstrators.
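The curation points listed above (traceable and repeatable steps, no irreversible restructuring) can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: the function names, the record fields (`speed_kmh`, `lat`, `lon`) and the two example steps are hypothetical and do not describe the actual AEGIS curation tools.

```python
import hashlib
import json


def fingerprint(records):
    """Stable short hash of a dataset version, tying a log entry to its data."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def curate(records, steps):
    """Apply named steps in order and return (curated records, provenance log).

    Each step receives a copy of the record, so the raw input is never
    modified, and steps only add fields rather than dropping or overwriting.
    """
    log = []
    current = records
    for name, step in steps:
        result = [step(dict(record)) for record in current]
        log.append({
            "step": name,
            "input": fingerprint(current),
            "output": fingerprint(result),
        })
        current = result
    return current, log


def add_speed_ms(record):
    # Hypothetical annotation: derive m/s but keep the original km/h value.
    record["speed_ms"] = round(record["speed_kmh"] / 3.6, 2)
    return record


def flag_location(record):
    # Hypothetical quality flag: mark records that carry usable coordinates.
    record["has_location"] = (
        record.get("lat") is not None and record.get("lon") is not None
    )
    return record


STEPS = [("add_speed_ms", add_speed_ms), ("flag_location", flag_location)]

raw = [
    {"speed_kmh": 72.0, "lat": 37.98, "lon": 23.73},
    {"speed_kmh": 36.0, "lat": None, "lon": None},
]

curated, log = curate(raw, STEPS)
rerun, rerun_log = curate(raw, STEPS)

print(log == rerun_log)        # same input and steps give the same log
print("speed_ms" in raw[0])    # the raw records are left untouched
```

Because every log entry records the fingerprint of its input and output, the same list of steps can be replayed against a new version of the data and the two logs compared, which is one simple way to make curation verifiable.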
Step 3: Outlining the AEGIS methodology towards data-driven innovation
In the first two steps we identified the AEGIS stakeholders and their data, as well as the actual big data tasks that need to be performed and the challenges to be addressed in order to accomplish the provision of smarter, innovative data-driven services in the PSPS domains. To make this process more concrete, we outlined the expected interactions of the users with AEGIS, in various settings and for various purposes, and collected a set of features and functionalities enabling them.
Naturally, users will interact with the AEGIS system differently, depending on their reason for using its offerings and their background. As the project progresses, specific roles will be designed, but for now the AEGIS users have been grouped under the following high-level categories:
Data provider: The user’s main objective is to make her/his data available for processing or consuming in the AEGIS system.
Service provider: The user’s main objective is to create a service on top of PSPS data that is available through the AEGIS system, leveraging the data processing, analysis, visualisation and other tools provided by the system. In this context, a service may be data (to be consumed as-a-service), visualisations, reports, dashboards, RESTful API endpoints, etc.
Service Consumer: The user’s main objective is to consume a service offered through AEGIS. In this context, this includes: accessing a visualisation through a link to AEGIS, downloading a report from AEGIS, performing requests to an AEGIS API endpoint etc.
Administrator: This user has advanced capabilities in the AEGIS system and may perform certain jobs that are not offered by the core platform (e.g. through advanced data curation tools that enable more fine-grained data manipulation and/or schema updates) or that require extensions of the current system (e.g. adding a new custom algorithm). This is an auxiliary role to highlight the need for non-automated functionalities in certain tasks.
Categories are not mutually exclusive, but are used to better separate and describe the various workflows enabled in the AEGIS system. In an end-to-end usage of the AEGIS system, a user may transparently transition among the categories of service provider, service consumer and data provider.
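Since the categories are not mutually exclusive, one natural way to picture them is as combinable flags rather than a single role per user. The Python sketch below is only an illustration: the `Role` enum, the workflow names and the mapping between them are assumptions made for this example, not the roles that will eventually be designed in the project.

```python
from enum import Flag, auto


class Role(Flag):
    """High-level user categories; a user may hold several at once."""
    DATA_PROVIDER = auto()
    SERVICE_PROVIDER = auto()
    SERVICE_CONSUMER = auto()
    ADMINISTRATOR = auto()


# Hypothetical workflows, loosely based on the category descriptions above.
REQUIRED_ROLE = {
    "upload_dataset": Role.DATA_PROVIDER,
    "publish_dashboard": Role.SERVICE_PROVIDER,
    "call_api_endpoint": Role.SERVICE_CONSUMER,
    "add_custom_algorithm": Role.ADMINISTRATOR,
}


def can_perform(user_roles: Role, workflow: str) -> bool:
    """True if any of the user's roles matches the one the workflow needs."""
    return bool(user_roles & REQUIRED_ROLE[workflow])


# A user who moves between providing data, building and consuming services.
user = Role.DATA_PROVIDER | Role.SERVICE_PROVIDER | Role.SERVICE_CONSUMER

print(can_perform(user, "upload_dataset"))
print(can_perform(user, "publish_dashboard"))
print(can_perform(user, "add_custom_algorithm"))
```

Here the same user can upload data and publish a dashboard without switching accounts, while the administrator-only workflow remains unavailable to them.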
Figure 3 presents the integrated AEGIS methodology diagram envisioning the high-level workflows of the above user categories and outlines the way AEGIS will materialize the big data value chain.
Remember to check the references for more details on the work performed so far and stay tuned for our upcoming blog posts!
References
- [1] E. Curry, “The Big Data Value Chain: Definitions, Concepts, and Theoretical Approaches,” in New Horizons for a Data-Driven Economy, Springer, 2016, pp. 29–37.
- AEGIS-D1.1 – Domain Landscape Review and Data Value Chain Definition
- AEGIS-D1.2 – The AEGIS Methodology and High Level Usage Scenarios-v1.0
Blog post authors: NTUA