Exploitation of data from an EU intellectual property and contract law perspective
A "three-step process"
06 August 2021
As we are all well aware, data and the many use cases for unlocking its hidden potential are becoming increasingly important for virtually all industries and sectors. Common buzzwords like "Oil of the 21st Century" or "Industry 4.0" refer to a global trend focusing on the exploitation and, in many cases, monetisation of data. While the potential legal challenges keep scholars and practitioners in a wide range of legal fields (e.g. data protection, cyber security, commercial and antitrust law) busy, we see room for improvement for data-driven businesses when it comes to a robust contractual basis for a clear allocation of ownership and use rights in such data. This is especially important because a comprehensive legal framework for rights in data is still lacking (with a "patchwork" of IP, know-how and database protection law in the EU), ultimately leaving it to stakeholders to ensure data rights allocation through appropriate (licence) agreements. In many cases, the exploitation of data follows a "three-step process" – collection and preparation of input data, analysis of such input data, generation of output data – resulting in certain "To Dos" for companies when it comes to drafting and negotiating the underlying contractual framework. In this article we would like to provide you with a brief overview of some important aspects on each of these three levels from an IP perspective.
Data exploitation as a "three-step process"
The variety of current and future digital business models relying on the exploitation of data is huge. In principle, two main groups of data exploitation models can be distinguished based on their purpose:
- Internal data: Internal business data (e.g. machine data, behavioural data, financial data) is collected and then analysed to improve internal processes within a company (e.g. for a more efficient production of goods).
- External data: External data (e.g. point-of-sale data) is collected, analysed and then used to better understand customers and to cater to their particular needs.
The second group (external data) is of particular interest for two reasons. First, apart from simply improving a company's product, it can also involve the provision of further services (e.g. consultancy services) to customers based on the data analysed, creating a further layer of potential monetisation and income.
Second, if the data is obtained from an external source, a company will want to ensure that it can use, and, if necessary, further prepare (e.g. clean, enrich, reduce) such external data permanently for its business purposes (including the right to license/sell any new data derived from such data). It will want to do so without running the risk that the data provider claims any rights to the analysis results (or parts thereof) at a later stage, in particular if the data provider suddenly realises that the data is considerably more valuable than it thought (e.g. because of novel use cases). As recent numbers show, IP and data claims are on the rise. By June 2021 over 2427 IP and data claims had been filed before English courts (a rise of 350% compared to 2020).
However, irrespective of the particular business model, we observe that data exploitation usually follows a three-step process:
1) Input Data --> 2) Data Analytics --> 3) Output Data
These three steps appear simple, but in fact they are not. Each of them presents a variety of legal challenges (with the question of IP ownership / licensing as discussed here being only one of them!) which should be addressed as early as possible in order to avoid subsequent legal uncertainties or even litigation.
Rights in data
With the onset of the era of digital transformation of businesses, (personal and non-personal) data has become a new type of asset and the existing legal framework must provide answers in terms of ownership and transfer of rights. However, it is important to understand that unsorted or "raw" (industrial) data itself (usually produced in huge quantities, e.g. by machines/sensors), in many cases, does not constitute an original work of authorship by a human, i.e. a copyrighted work, or such copyright protection is at least doubtful. While recent EU legislative efforts to create a new type of "ancillary copyright" for non-personal data have been halted due to various legal concerns regarding its scope and impact, existing IP protection regimes seek to fill the gap.
With respect to databases, their makers may claim the statutory sui generis database right if they made a substantial investment in collecting the data (e.g. Section 87a German Copyright Act and Section 102-bis Italian Copyright Act, both implementing EU Directive 96/9/EC of 1996 on the legal protection of databases). In exceptional cases, output data may even enjoy patent protection if its particular data structure is the result of patented software (German Federal Court of Justice ruling, published in GRUR 2012, 1230, MPEG-2-Videosignalcodierung). Where data constitutes a trade secret as defined in the Trade Secrets Directive (Directive (EU) 2016/943), the holder of the secret may invoke trade secret infringement and claim damages against the infringing party.
Due to these statutory uncertainties – and the fact that it is not entirely clear which IP protection regime applies to particular data or a particular database in any given case and whether such regime would sufficiently clarify the ownership situation for such data – it is strongly advisable to put a contractual framework in place which creates the basis for a transparent data rights allocation against the background of the above three-step process.
A robust contractual foundation is key
Let's break down the above three-step process of data exploitation and look at the legal challenges arising at each step:
1. Input Data
At the beginning of every data analysis process is the "original" or "raw" data concerning a certain matter or task ("Input Data"). The Input Data must be of a certain quality in order to achieve meaningful analysis results. Depending on the business model (see "Data exploitation as a 'three-step process'" above), Input Data can be generated by a company internally, by means of the automatic searching or collection of large quantities of publicly available data (e.g. scraping, data mining) and/or by obtaining the data from a third-party data provider ("Data Provider"). Especially in the latter case, it is a good idea to clearly stipulate in the data provision contract, inter alia, (i) the scope of the licence in the Input Data, i.e. the permitted use and potential modification (e.g. enrichment with other datasets if required) of the licensed data, and (ii) the allocation of any IP rights in the – valuable – data / databases created based on the Input Data ("Output Data") in order to minimise the risk of any subsequent disputes with the Data Provider.
This is particularly true of Output Data, as the Data Provider may wish to argue that it has acquired (co-)ownership of, or at least usage rights to, the Output Data because it is based, to some extent, on the Input Data owned by the Data Provider. Regardless of whether such a claim would hold up in court, the resulting dispute can be extremely time-consuming and cost-intensive and should be avoided wherever possible. Under most EU copyright laws, for example, the question of copyright ownership / infringement would need to be determined, if necessary with the aid of an expert opinion.
In this context, we often note that agreements with Data Providers lack clear IP provisions and frequently mix them with data protection rules (which relate to personal data only) and other provisions regarding data access in the particular case. IP and data protection language should, however, always be kept strictly separate!
Apart from Data Providers, expert Data Scientists specialised in the collection, curation and analysis of data often need to be engaged in order to obtain the best analysis results possible. As with Data Providers, agreements with Data Scientists should contain clear language on the allocation of any IP used and created in the context of their engagement.
2. Analysis method / software
The second step involves the actual analysis of the Input Data through suitable software solutions, commonly involving certain algorithms or even artificial intelligence ("AI"). Here, IP ownership and licensing need to be taken into account with respect to the Output Data as well as the analysis software itself (which may consist of different components such as the analysis technology/tool, a storage application as well as a frontend dashboard for the decision-making process of the user). If the software used for the analysis is provided by a third party, the company using the software will need to pay attention to whether the software provider expects to obtain any rights in the Output Data as a result of the software's use (which would of course not be desirable for the company seeking exclusive rights to exploit/monetise the Output Data for its own purposes).
If the analysis software is developed by the company itself, it should take the possibility of patent protection into consideration. While the threshold for patenting software solutions is generally still rather high, the large number of patent applications and grants in the field of AI demonstrates that potential patent protection should never be overlooked for these types of developments. Further legal challenges may arise if the software is run on external servers rather than on the company's own IT systems, e.g. in a cloud environment hosted by a third party (AWS, Google etc.), as another player (i.e. the provider of the cloud platform) may become part of the data analysis process.
3. Output Data
The third step encompasses the Output Data, which embodies the actual value generated in the data analysis process and must therefore be subject to a particular level of protection. Like Input Data, Output Data may hold value because it is known only to its creator and may therefore constitute a trade secret. In that case, it needs to be subject to appropriate security measures to maintain its protection as a trade secret under the EU Trade Secrets Directive (as implemented in the EU Member States). Other protection regimes may apply in parallel, in particular database rights (see "Rights in data" above).
The Output Data can be used in two ways: either strictly internally by the company (e.g. to improve its own processes or products), or by licensing it to a third party, which can be any customer (e.g. for the customer's further use) or in some cases even the Data Provider itself. Depending on the particular services provided, a licence agreement governs the scope of the licence in the Output Data granted to the customer and should include, inter alia, robust obligations to maintain the data's confidential nature. The scope of the licence may vary based on the particular case and the customer's needs. Of course, even such Output Data could, in theory, be subject to further analysis by the customer's own software or be combined with other data, which requires the licence granted in the Output Data to be tailored to the particular situation and business model.
Alternatively, exchanges of data may rely on data assignment schemes, whereby the data holder supplies data to the assignee. This contractual framework minimises the risk of the complex post-termination issues that are common to most licensing schemes (who retains rights in data?). However, the legal uncertainty as to whether ownership rights subsist in data at all (other than as a trade secret) creates the need for a clear definition of which rights are assigned (e.g. a right to access data, rather than a debatable ownership in data).
Whatever the legal scheme (licensing, assignment), data monetisation also triggers complex commercial issues, e.g. what is the economic value of data? The parties may wish to rely on different criteria (e.g. business / cost / economic / market value of data), but creating valuable data always requires a great deal of attention in structuring Output Data, for example: (i) ensuring data cleanliness, (ii) ensuring explainability, i.e. transparency and auditability of the AI generating the Output Data, (iii) ensuring adequate data governance (e.g. is personal data retained in accordance with data protection laws? Is the data unbiased?). The better the data governance, the higher the sale/licence price (and the lower the liability risk!). Consistent with this, recent court and regulatory decisions in the EU, together with the draft AI Regulation, set high transparency and explainability standards for businesses wishing to invest in AI-based data analytics.
Conclusion and outlook
While there is still some degree of legal uncertainty regarding the legal nature of and rights in (non-personal) data, the current legal framework in Germany and Italy (and most likely under EU law in general) allows stakeholders to allocate rights on the contractual level. While we expect legislators to further address these issues concerning data ownership at some point in the next few years, market actors may need to take matters into their own hands. They need to provide for a robust contractual basis to secure their IP rights at an early stage in order to keep control of their data (property) and prevent its improper use by other parties. In any event, implementing a company strategy regarding the use of data will become crucial given the EU Commission's expectations for 2025 of an increase in data volume of about 530 per cent (172 zettabytes) and a data economy worth EUR 829 billion.