SOURCE AND DATA PLATFORM FOR ECONOMIC AND FINANCIAL INFORMATION IN EUROPE
 

Introduction

The following sections explain the key components of the DOPA vision. First, they introduce the data pools, which form the foundation of the project. Data from these pools is processed by data supply chains, which are outlined next. Linking data across different data pools is an essential aspect of DOPA and is addressed in the subsequent section. The final sections outline the use cases, which serve as application scenarios for all DOPA technologies.

Data Pool: Large Scale Web and Social Media

The goal of this data pool is to provide a large-scale multilingual time series of economic and financial information extracted from the web and online social networks (OSN).

For DOPA, this data pool is a vital source of information that provides the foundation for all use-cases that seek to analyse unstructured data from the web.
The key challenges in creating such a data pool are deciding what to crawl, how to interpret the data, how to address privacy concerns and how to compensate the respective content producers.

The DOPA partner IMR maintains this data pool by running its application-aware crawler, which adapts to content management systems. This allows it to efficiently separate key information from the surrounding noise (formatting, boilerplate) and to qualify the content from a functional standpoint (for instance, post vs. comment). For the current use cases, the crawler concentrates on relevant and active sources such as RSS feeds, news sites, forums, blogs and OSNs.
To guarantee citizens' right to privacy, IMR will develop scalable methods for anonymising the extracted content, especially content from conversational sources. Additionally, usage guidelines that comply with EU regulation will be checked and continuously monitored as the legal framework and societal demands in this domain evolve.
To ensure fair exploitation of the data, IMR will develop a compensation model for content producers, inspired by the data trading frameworks currently being explored by some OSNs, such as Twitter.
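As a rough illustration of one building block for anonymising conversational content, the sketch below replaces @-handles and e-mail addresses with stable tokens derived from a keyed hash (HMAC-SHA256). The regular expressions, the key and the function names are assumptions made for this example, not IMR's method. Note that keyed hashing is pseudonymisation rather than full anonymisation: the rest of the text may still identify a person.

```python
import hashlib
import hmac
import re

# Hypothetical secret key; a real deployment would keep it out of the code and rotate it.
SECRET_KEY = b"example-key"

# Assumed patterns: e-mail addresses, and @-handles as used on Twitter-like networks.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
HANDLE_RE = re.compile(r"(?<![\w@])@\w{1,30}")


def pseudonym(value: str, prefix: str) -> str:
    """Map an identifier to a stable token via a keyed hash (case-insensitive)."""
    digest = hmac.new(SECRET_KEY, value.lower().encode("utf-8"), hashlib.sha256)
    return f"<{prefix}:{digest.hexdigest()[:12]}>"


def pseudonymise(text: str) -> str:
    """Replace e-mail addresses first, then @-handles, with stable pseudonyms."""
    text = EMAIL_RE.sub(lambda m: pseudonym(m.group(0), "email"), text)
    return HANDLE_RE.sub(lambda m: pseudonym(m.group(0), "user"), text)
```

Because the same handle always maps to the same token, per-author statistics (for example, how often one account posts about a company) remain possible after the identifying string has been removed.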

Data Pool: Preparation and Supply of Statistical Data Sets

Besides raw text data, DOPA also features a structured data pool containing around 30,000 data sets with 200 million time series and 1.5 billion fact values. This data has been amassed from both open and proprietary data sources.

The structured information in this pool is of very high quality and is meant to supplement the unstructured information in the other data pools.
In order to link data items correctly, each data set requires a correct semantic identity, which describes exactly what kind of information the data set represents.
Throughout the project the DOPA partner DataMarket will continue to add socio-economic and financial data, and enrich it with semantic information and entity linkage, to ensure maximum utility of the data pool.
Extending the DataMarket platform with linked-data annotations and endpoints will enable not only interlinking the data sets within the data pool but also linking them to external data pools, improving data search, discoverability and reuse.
Semantic enrichment of the data pool will also support intelligent visualization choices, guidance and tooling for data analysis.

Metadata will be exposed via linked data endpoints, enabling DOPA partners and end users to link to any data set, provider, time series or even individual data point on the DataMarket platform.
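One way to make every level addressable is a hierarchical URI scheme in which each provider, data set, time series and data point has its own URI, with the links between levels published as RDF triples. The sketch below emits such triples in N-Triples syntax using only the Python standard library. The base URI, the path layout and the `value` predicate are hypothetical and do not describe DataMarket's actual endpoints; only `dcterms:publisher`, `dcterms:isPartOf` and `xsd:decimal` are existing vocabulary terms.

```python
from typing import List
from urllib.parse import quote

# Hypothetical base URI; the real endpoint layout is not specified in this document.
BASE = "http://example.org/dopa/"
PUBLISHER = "http://purl.org/dc/terms/publisher"   # Dublin Core term
IS_PART_OF = "http://purl.org/dc/terms/isPartOf"   # Dublin Core term
VALUE = BASE + "vocab#value"                        # hypothetical predicate
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def uri(*segments: str) -> str:
    """Build a URI from path segments, percent-encoding each segment."""
    return BASE + "/".join(quote(s, safe="") for s in segments)


def describe_point(provider: str, dataset: str, series: str,
                   period: str, value: str) -> List[str]:
    """Return N-Triples linking a data point up to its series, data set and provider."""
    p = uri("provider", provider)
    d = uri("provider", provider, "dataset", dataset)
    s = uri("provider", provider, "dataset", dataset, "series", series)
    o = uri("provider", provider, "dataset", dataset, "series", series, "point", period)
    return [
        f"<{d}> <{PUBLISHER}> <{p}> .",
        f"<{s}> <{IS_PART_OF}> <{d}> .",
        f"<{o}> <{IS_PART_OF}> <{s}> .",
        f'<{o}> <{VALUE}> "{value}"^^<{XSD_DECIMAL}> .',
    ]
```

With a scheme like this, an external data set can cite a single observation simply by using its URI, and a client can walk upwards to the series, data set and provider that describe it.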

Data Supply Chain Technology and Platform

DOPA relies on data supply chains for creating data pools as well as for information access and processing.

A data supply chain is a data flow that may extract, aggregate, annotate and link relevant information from structured or unstructured data sources.
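To make the notion concrete, the sketch below models a data supply chain as an ordered list of stages applied to a stream of records: it extracts texts, annotates entity mentions, links them to identifiers and aggregates mention counts. The stage names, record fields and the small lookup table are hypothetical. This is not the DOPA specification language, only a minimal illustration of the kind of flow such a language would describe.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Stage = Callable[[Iterable[Record]], Iterator[Record]]

# Hypothetical lookup table from surface form to entity identifier.
ENTITY_TABLE = {"commerzbank": "ent:1", "lbbw": "ent:2"}


def extract(docs: Iterable[Record]) -> Iterator[Record]:
    """Extract: keep the source and text of documents that actually contain text."""
    for doc in docs:
        if doc.get("text"):
            yield {"source": doc["source"], "text": doc["text"]}


def annotate(records: Iterable[Record]) -> Iterator[Record]:
    """Annotate: attach the known entity names mentioned in the text."""
    for rec in records:
        words = {w.strip(".,!?").lower() for w in str(rec["text"]).split()}
        rec["mentions"] = sorted(w for w in words if w in ENTITY_TABLE)
        yield rec


def link(records: Iterable[Record]) -> Iterator[Record]:
    """Link: map each mention to its entity identifier."""
    for rec in records:
        rec["entities"] = [ENTITY_TABLE[m] for m in rec["mentions"]]
        yield rec


def run_chain(stages: List[Stage], source: Iterable[Record]) -> Iterator[Record]:
    """Compose the stages lazily; records stream through one at a time."""
    data: Iterable[Record] = source
    for stage in stages:
        data = stage(data)
    return iter(data)


def aggregate(records: Iterable[Record]) -> Counter:
    """Aggregate: count how often each entity is mentioned across all records."""
    counts: Counter = Counter()
    for rec in records:
        counts.update(rec["entities"])
    return counts
```

For example, running `aggregate(run_chain([extract, annotate, link], docs))` on three documents, one mentioning Commerzbank, one empty and one mentioning both LBBW and Commerzbank, yields `Counter({'ent:1': 2, 'ent:2': 1})`. Keeping each stage a self-contained function over a stream is what would let a compiler map the stages onto a distributed processing platform.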

The DOPA partner TU Berlin will design an extensible language for the specification of data supply chains. 
TU Berlin will also devise strategies for compiling this language to scalable data processing platforms and explore opportunities for improving supply chain execution through automatic scheduling heuristics and automatic optimization of the processing.

Data Linkage Service

Efficiently merging data from different data pools requires a measure for determining which data sets are related.

In the DOPA context, the most promising approach for such a measure is to provide unique entity identifiers: the same financial or economic entity always receives the same identifier.
The availability of such identifiers for entities will enable the integration of information extracted from the data pools.
The data linkage service of the DOPA partner OKKAM will assign so-called OKKAM-IDs to entities based on a description consisting of attribute-value pairs.
The entity name system provides a set of web services to query its index and to create, merge and update entities.
The description provided in a request is compared with the descriptions stored in the entity name system. In response, the entity name system returns either a matching entity or a list of candidate entity identifiers, along with a confidence value.
This allows data supply chain users to link their key entities to the related information in the data pools.
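The matching step can be sketched as follows: a request description of attribute-value pairs is compared with every stored description, and the service returns a single identifier when exactly one entity is a confident match, otherwise a ranked list of candidates with confidence values. The in-memory store, the identifiers, the use of Jaccard overlap as the confidence value and both thresholds are assumptions for illustration; this is not the OKKAM entity name system's actual matching algorithm or web-service API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical store standing in for the entity name system's index:
# identifier -> description made of attribute-value pairs.
STORE: Dict[str, Dict[str, str]] = {
    "ens:101": {"name": "Commerzbank AG", "country": "DE", "type": "bank"},
    "ens:102": {"name": "Commerzbank AG", "country": "DE", "type": "bank", "city": "Frankfurt"},
    "ens:203": {"name": "LBBW", "country": "DE", "type": "bank"},
}


@dataclass
class Resolution:
    match: Optional[str]  # set only when exactly one entity is a confident match
    candidates: List[Tuple[str, float]] = field(default_factory=list)  # (id, confidence), best first


def normalise(description: Dict[str, str]) -> Set[Pair]:
    """Lower-case and trim attributes and values so trivial differences do not matter."""
    return {(k.strip().lower(), str(v).strip().lower()) for k, v in description.items()}


def similarity(a: Set[Pair], b: Set[Pair]) -> float:
    """Jaccard overlap of attribute-value pairs, used here as the confidence value."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(description: Dict[str, str], accept: float = 0.8, keep: float = 0.3) -> Resolution:
    """Return a single match if exactly one entity scores >= accept, else candidates scoring >= keep."""
    query = normalise(description)
    scored = sorted(
        ((eid, similarity(query, normalise(stored))) for eid, stored in STORE.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    ranked = [(eid, score) for eid, score in scored if score >= keep]
    confident = [eid for eid, score in ranked if score >= accept]
    rounded = [(eid, round(score, 2)) for eid, score in ranked]
    if len(confident) == 1:
        return Resolution(confident[0], rounded[:1])
    return Resolution(None, rounded)
```

For instance, a request `{"name": "Commerzbank AG", "country": "DE"}` is ambiguous between the two stored Commerzbank descriptions, so no single match is returned; instead the caller receives both identifiers ranked by confidence and can decide which one to link.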

Use case: Risk Management

One of the target use cases of the DOPA project is the utilization of the data pools and their associated services for risk management business cases.

To ensure the applicability of the DOPA research results, it is crucial to have a business case that matches the needs of real-world economic entities. This way, the consortium verifies that the capabilities of the DOPA infrastructure components match the use case, and not the other way around.

Different kinds of financial risks, such as operational risk, market risk, equity capital risk and credit risk, affect economic undertakings.
To identify relevant risk management use cases, the DOPA partner VICO Research collaborates with Commerzbank, LBBW, TU-Dresden and PlanB to find measurable factors that influence these risks.
After evaluating the available data pools with regard to these factors, VICO will define a business case that serves as the target for the data supply chain language and the data linkage features.

As a final step, VICO Research will implement a risk management demonstrator based on this business case using DOPA’s data supply chain language.

Use case: Market Intelligence

Market Intelligence is a key topic for small and medium-sized enterprises (SMEs). Implementing market intelligence solutions, however, requires expert knowledge and technical resources. DOPA’s market intelligence use case demonstrates DOPA’s ability to lower this entry barrier.

Market Intelligence applications require collecting and analysing large amounts of structured data as well as textual data such as text messages, articles, posts and comments.

The DOPA partner AMI Software will implement a demonstrator that enables business end users to easily consume, explore and discover marketing and economic data and information provided by the DOPA platform. The demonstrator will generate data supply chains from users’ queries for market names, companies or products, and comprehensive, interactive dashboards will present the resulting output.

The demonstrator serves two purposes within the DOPA project. First, it will help ensure that the DOPA platform is targeted towards a real-world use case. Second, it will showcase how small and medium-sized enterprises can benefit from the DOPA platform, allowing them to gather strategic and competitive intelligence such as early warning signals, market opportunities and customer orientation.