For DOPA, this data pool is a vital source of information that provides the foundation for all use-cases that seek to analyse unstructured data from the web.
The key challenges in creating such a data pool are what to crawl, how to interpret the data, how to account for privacy matters and how to compensate the respective content producers.
The DOPA partner IMR maintains this data pool by running its application-aware crawler, which adapts to content management systems and can thus efficiently extract key information from the surrounding noise (formatting, boilerplate) and qualify the content from a functional standpoint (for instance, posts vs. comments). For our current use-cases, the crawler concentrates on relevant and active sources such as RSS feeds, news, forums, blogs or online social networks (OSNs).
To guarantee citizens' right to privacy, IMR will develop scalable methods for anonymising the extracted content, especially content from conversational sources. Additionally, usage guidelines that comply with EU regulations will be checked and continuously monitored as the legal framework and the societal demands in this domain evolve.
To ensure fair exploitation of the data, IMR will develop a compensation model for content producers, inspired by the trading frameworks currently being explored by some OSNs, such as Twitter.
The structured information in this pool is of very high quality and is meant to supplement the unstructured information that may be available in other data pools.
In order to link data items correctly, each data set requires a correct semantic identity that describes exactly what kind of information the data set represents.
Throughout the project the DOPA partner DataMarket will continue to add socio-economic and financial data, and enrich it with semantic information and entity linkage, to ensure maximum utility of the data pool.
Extending the DataMarket platform with linked-data annotations and endpoints will enable not only the interlinking of data sets within the data pool but also links to external data pools, improving data search, discoverability and reuse.
Semantically enabling the data pool will support intelligent visualization choices, guidance, and tooling for data analysis.
Metadata will be exposed via linked data endpoints, enabling DOPA partners and end users to link to any data set, provider, time series or even individual data point on the DataMarket platform.
A data supply chain is a data flow that may extract, aggregate, annotate and link relevant information from structured or unstructured data sources.
The DOPA partner TU Berlin will design an extensible language for specifying data supply chains. We will devise strategies for compiling this language to scalable data processing platforms and explore ways of improving supply chain execution through automatic scheduling heuristics and automatic optimization of the processing steps.
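The supply chain language itself is not specified here, so the following Python sketch only illustrates the general idea of a data supply chain as a composition of extract, annotate, link and aggregate stages. The `SupplyChain` class, the stage functions, the company names and the `ent:` identifiers are all hypothetical; a real implementation would compile such a chain to a scalable data processing platform rather than run it in a single process.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# A record is a plain dict; a stage maps a stream of records to a new stream.
Record = Dict[str, object]
Stage = Callable[[Iterable[Record]], Iterable[Record]]


@dataclass
class SupplyChain:
    """Hypothetical data supply chain: an ordered list of stages."""
    stages: List[Stage] = field(default_factory=list)

    def then(self, stage: Stage) -> "SupplyChain":
        # Return a new chain so that partial chains can be reused.
        return SupplyChain(self.stages + [stage])

    def run(self, source: Iterable[Record]) -> List[Record]:
        data: Iterable[Record] = source
        for stage in self.stages:
            data = stage(data)
        return list(data)


# Hypothetical name-to-identifier table standing in for an entity linkage service.
ENTITY_IDS = {"Acme": "ent:acme", "Globex": "ent:globex"}


def extract(docs: Iterable[Record]) -> Iterable[Record]:
    # Keep only documents with non-empty text.
    for d in docs:
        if d.get("text"):
            yield {"source": d["source"], "text": d["text"]}


def annotate(records: Iterable[Record]) -> Iterable[Record]:
    # Naive substring matching for known company names.
    for r in records:
        yield {**r, "mentions": [n for n in ENTITY_IDS if n in r["text"]]}


def link(records: Iterable[Record]) -> Iterable[Record]:
    # Replace surface names by identifiers.
    for r in records:
        yield {**r, "entity_ids": [ENTITY_IDS[n] for n in r["mentions"]]}


def aggregate(records: Iterable[Record]) -> List[Record]:
    # Count mentions per entity identifier.
    counts: Dict[str, int] = {}
    for r in records:
        for e in r["entity_ids"]:
            counts[e] = counts.get(e, 0) + 1
    return [{"entity_id": e, "mentions": n} for e, n in sorted(counts.items())]


chain = SupplyChain().then(extract).then(annotate).then(link).then(aggregate)
docs = [
    {"source": "news", "text": "Acme and Globex announce a merger"},
    {"source": "blog", "text": "Acme shares fall"},
    {"source": "forum", "text": ""},
]
result = chain.run(docs)
print(result)
# [{'entity_id': 'ent:acme', 'mentions': 2}, {'entity_id': 'ent:globex', 'mentions': 1}]
```

Because each stage is an independent function over a stream of records, an optimizer is free to reorder, fuse or parallelise stages, which is where scheduling heuristics and automatic optimization would come in.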
In the DOPA context, the most promising approach for such a measure is to provide unique entity identifiers, so that the same financial or economic entity always receives the same identifier.
The availability of such identifiers for entities will enable the integration of information extracted from the data pools.
The data linkage service of the DOPA partner OKKAM will provide so-called OKKAM-IDs for entities based on a description consisting of attribute-value pairs.
The entity name system provides a set of web services to query its index and to create, merge and update entities.
The description provided in a request is compared with the descriptions stored in the entity name system. In response, the entity name system returns either a matching entity or a list of candidate entity identifiers, each with a confidence value.
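To make this request/response behaviour concrete, the sketch below matches a query description of attribute-value pairs against stored descriptions and returns either a single match or a ranked list of candidates with confidence values. It assumes a simple Jaccard similarity over normalised attribute-value pairs and a fixed threshold; the index contents, identifiers, function names and threshold are hypothetical and do not reflect OKKAM's actual matching algorithm or web service API.

```python
from typing import Dict, List, Set, Tuple

Description = Dict[str, str]

# Hypothetical index of stored descriptions, keyed by entity identifier.
INDEX: Dict[str, Description] = {
    "e1": {"name": "Acme Corp", "country": "DE", "sector": "banking"},
    "e2": {"name": "Acme Corp", "country": "US", "sector": "retail"},
    "e3": {"name": "Globex", "country": "DE", "sector": "energy"},
}


def pairs(desc: Description) -> Set[Tuple[str, str]]:
    # Normalise attribute-value pairs so that case and surrounding spaces are ignored.
    return {(k.strip().lower(), v.strip().lower()) for k, v in desc.items()}


def similarity(a: Description, b: Description) -> float:
    # Jaccard similarity of the two sets of attribute-value pairs, in [0, 1].
    pa, pb = pairs(a), pairs(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


def resolve(query: Description, threshold: float = 0.9, top_k: int = 3):
    """Return ("match", id, confidence) or ("candidates", [(id, confidence), ...])."""
    scored: List[Tuple[str, float]] = sorted(
        ((eid, similarity(query, desc)) for eid, desc in INDEX.items()),
        key=lambda item: item[1],
        reverse=True,  # highest confidence first; ties keep index order
    )
    if scored and scored[0][1] >= threshold:
        return ("match", scored[0][0], scored[0][1])
    return ("candidates", [(eid, s) for eid, s in scored[:top_k] if s > 0])


print(resolve({"name": "ACME corp", "country": "DE", "sector": "Banking"}))
print(resolve({"name": "Acme Corp"}))
```

A full description resolves to a single identifier, while a name alone is ambiguous and yields two candidates with equal, lower confidence, which mirrors the two possible responses described above.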
Thus, data supply chain users can link their key entities to the related information in the data pools.
To ensure the applicability of the DOPA research results, it is crucial to have a business case that matches the needs of real-world economic entities. This way, the consortium verifies that the capabilities of the DOPA infrastructure components match the use-case, and not the other way around.
Different kinds of financial risks, such as operational risk, market risk, equity capital risk and credit risk, affect economic undertakings.
To find relevant risk management use cases, the DOPA partner VICO Research collaborates with Commerzbank, LBBW, TU Dresden, and PlanB to identify measurable factors that influence these risks.
After evaluating the available data pools with regard to these factors, VICO will define a business case that serves as the target for the data supply chain language and the data linkage features.
As a final step, VICO Research will implement a risk management demonstrator based on this business case, using DOPA’s data supply chain language.
Market intelligence applications require collecting and analysing large amounts of structured data as well as textual data such as text messages, articles, posts and comments.
The DOPA partner AMI Software will implement a demonstrator that enables business end-users to easily consume, explore and discover marketing and economic data and information provided by the DOPA platform. The demonstrator will generate data supply chains from users' queries for market names, companies or products, and comprehensive, interactive dashboards will present the resulting output.
The demonstrator serves two purposes within the DOPA project. First, it will help to ensure that the DOPA platform is targeted towards a real-world use case. Second, it will showcase how small and medium-sized enterprises can benefit from the DOPA platform, allowing them to gather strategic and competitive intelligence such as early warning signals, market opportunities and customer orientation.