Scraping the barrel?

Legal issues arising from data scraping

30 November 2018

Why do you sometimes have to click on a grid of images before you can log into a social media platform? Because the website wants to know you are a human – and not a bot looking to scratch its surface with a view to obtaining valuable data (a "data scraper"). Data scrapers extract, aggregate and combine data from various sources. Scraped data is then generally stored on a local system and used for different purposes such as recruitment, sentiment analysis, assessing credit risk, identifying trends, marketing and sales.

While 2018 has seen global organisations focus on doing the right thing when handling data, many are getting it wrong where data scraping is concerned.

On the one hand, organisations often fail to adequately protect themselves from unauthorised scraping. On the other hand, companies which scrape data often do so without realising the legal and brand implications – including whether the scraping is allowed at all. This article considers recent global enforcement trends and some key 'do's and don'ts'.

What is data scraping?

Data scraping is the process of using software to automatically harvest, or "scrape", publicly available data from online sources. The recent Brazilian elections have highlighted the potential for marketing companies to abuse data scraping software. Candidates and political parties allegedly engaged marketeers who used the software to gather phone numbers from Facebook. Using the phone numbers, the marketeers set up WhatsApp groups to distribute fake news to Brazilian citizens. The allegations have spurred Brazil's electoral court to investigate whether this affected the result of the election, with the scandal being dubbed "WhatsApp gate" on social media.

Scraped data can be stored, copied or analysed for various new purposes. Use cases for data scraping include: 

  • contact scraping – e.g. retrieving email addresses of businesses / individuals to compile a mailing list
  • web scraping – e.g. accessing the underlying code of a website and copying data from that code
  • competitor monitoring – e.g. retrieving price information of competitors, including for monitoring purposes / to help advance an organisation's competitive edge
  • reputation monitoring – e.g. scraping comments made on social media platforms or review sites to monitor business reputation
  • screen scraping – e.g. using scraping tools to emulate a human end-user to extract publicly available data on a large scale
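By way of illustration only, the contact scraping use case above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of any particular scraping tool: the sample page content, the `extract_emails` function name and the simplified email pattern are all hypothetical, and a real scraper would fetch pages over the network rather than read a fixed string.

```python
import re

# Hypothetical page content; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<html><body>
  <p>Sales: <a href="mailto:sales@example.com">sales@example.com</a></p>
  <p>Support: support@example.org</p>
</body></html>
"""

# Deliberately simple pattern (an assumption for illustration);
# it does not cover every valid email address format.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list[str]:
    """Return the unique email addresses found in the markup, sorted."""
    return sorted(set(EMAIL_PATTERN.findall(html)))


if __name__ == "__main__":
    for address in extract_emails(SAMPLE_HTML):
        print(address)
```

The point of the sketch is how little effort is involved: once personal data is publicly visible on a page, compiling it into a list takes only a handful of lines of code.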

By some estimates, data scraping accounts for nearly 50% of all website visits.

Who is at risk and from what?

Scraping is often invisible. The scraper knows what they are doing – the scraped site may not. The harvesting will often take place contrary to the terms of use and privacy statements of the sources that are scraped. Regulators want to limit this – and put individuals back in the driving seat.

Clearly scraping can have commercially beneficial uses, such as compiling the results which appear on search engines and aggregating prices of products or services for ease of comparison. However, as the events in Brazil illustrate, scraping can be misused, especially when it is used to access and then store personal data.  

Despite the pervasiveness of data scraping, the potential legal issues which arise from its use have received surprisingly little attention. Companies should be aware of the following risks:

  1. An organisation's content may be vulnerable to scraping by third parties.
  2. There is also potential enforcement risk for companies which use scraping technology as part of their business model, for example where the GDPR applies.
  3. Supply chain risk can arise from engaging third parties to carry out data analytics on an organisation's behalf and / or from purchasing data from third parties who may have obtained the data through scraping. Regulatory scrutiny in this area is increasing.

We briefly consider the potential issues from an EU and US perspective below, highlighting some key recent developments, with a particular focus on data protection law.  

Companies who scrape data

Data scrapers may find themselves on the receiving end of legal action by a scraped business under the following regimes:

Intellectual property: Scraped data may comprise copyrighted work and accordingly, misappropriation can amount to infringement. In the EU, it may also amount to an infringement of the standalone database right. Where the targeted data falls within the scope of a "trade secret" for the purposes of the new Trade Secrets Directive, this will provide the scraped business with another means of pursuit for misappropriation. In the US, a Federal Court in New York ruled in 2013 that a news aggregator that reproduced articles published by the Associated Press verbatim had violated Associated Press's copyright over its articles.

Contract: Website terms of use may explicitly forbid scraping or contain other content restrictions enabling the website owner to sue for breach of contract. There is no clear precedent on whether website terms form binding contracts in the UK. In the absence of authority, it is safer to assume that acting contrary to website terms could give rise to a cause of action. The Irish High Court, for example, ruled in a case involving Ryanair that website terms can be binding. In the US, if a website user is bound by the website's terms of service and causes damage by breaching those terms, the user may be liable for breach of contract.

The Computer Fraud and Abuse Act: In the US, the Computer Fraud and Abuse Act ("CFAA") provides a civil cause of action against anyone who accesses a computer without authorisation, as well as providing for criminal offences. The cause of action has received significant judicial attention and US courts in civil cases have grappled with whether scraping constitutes unauthorised access giving rise to liability under the CFAA.

Although courts have come to differing conclusions, there is a trend to require that scrapers take technical steps to circumvent protections around data before liability under the CFAA can attach.

An important case involving LinkedIn, which considers whether scraping publicly available information from social media websites constitutes a violation of the CFAA, is currently awaiting a ruling from the Ninth Circuit Court of Appeals in California. LinkedIn has appealed a lower court decision that allowed hiQ Labs to continue scraping publicly available data from its website.

Tort law: The US tort of trespass to chattels, which historically has applied to interferences with the property of another, has taken on new life in the digital age, and a number of cases allege liability for scraping under this cause of action.

A website owner whose site has been scraped may pursue claims using these causes of action in order to obtain an injunction or recover damages. However, where personal data is targeted, data scrapers are also at risk of regulatory enforcement action, particularly under the GDPR.

What is the EU position?

Frequently, one organisation collects data and another uses it – often for an entirely different purpose. The GDPR seeks to protect individuals from invisible processing. Headline point: it is very hard to prove that invisible scraping is fair and transparent.

The GDPR applies to the processing of personal data in the context of an organisation's EU operations / establishment. It will sometimes apply to businesses entirely outside of Europe – i.e. to the extent that they offer goods or services to, or monitor the behaviour of, individuals located within the EU. Key considerations include:

  • Legitimate interests: Organisations need a legal basis to process the relevant data. Many organisations will seek to rely on what is termed the "legitimate interests" basis for processing, since it will be difficult for a scraper to demonstrate that it has obtained an individual's consent. Legitimate interests is not a panacea. It entails balancing the interests of the individual against those of the business, taking into account the reasonable expectations of the individual.   
  • Fairness, proportionality and necessity: Processing of personal data must also be limited to that which is fair, proportionate and necessary. Generally, data scraping software bulk gathers data, so businesses intending to scrape data must give careful thought as to the amount of data which will be collected.
  • Purpose limitation: The purpose of the scraping is critical to identifying the relevant legal basis under the GDPR. For example, if a business scrapes data to compile a marketing list which is sold to third parties, it is unlikely that the individual would reasonably expect their personal data to be used in such a manner.

A key question is therefore whether the proposed scraping activity is permissible under the GDPR at all. This will not necessarily be clear cut – careful analysis is required.

Finally, it is worth noting that the UK has specific rules criminalising unauthorised access of computer systems / material under the Computer Misuse Act 1990. On 12 November 2018, the UK Information Commissioner's Office ("ICO") announced that it had brought its first prosecution under the Act, securing a conviction and a six-month jail sentence against an individual for obtaining and disclosing personal data without permission. However, the courts are yet to rule whether data scraping meets the requirements of the offence.

The US perspective

The US does not currently have comprehensive data privacy legislation at the federal level, although Congress is actively considering such a measure. State statutes do mandate certain privacy-related rights, but most do not broadly regulate the collection and use of personal data.  However, a trend toward more extensive regulation that may implicate scraping activities is developing.

For example, California recently passed a state law, the California Consumer Privacy Act, which regulates data privacy. The law, which comes into effect in 2020 and applies only to certain businesses, contains provisions that would require companies collecting personal data to disclose how such data will be used and allow consumers to opt out of data collection. As data scrapers generally do not make such disclosures or provide for an opt-out option, scraping may implicate California's new statute.

At the federal level, there is a risk of regulatory action when companies who use scraped data misrepresent the origin of that data. In 2014, the Federal Trade Commission, which regulates unfair and deceptive trade practices in the US, commenced an enforcement action in which it alleged that a social media company had engaged in deceptive practices by representing that data contained on its website was user generated, when in fact the vast majority of data on its website had been scraped from Facebook.

Companies whose data is scraped

Depending on the jurisdiction, companies who are victims of unscrupulous scraping may be able to avail themselves of the remedies provided by contract, tort, intellectual property law or the CFAA, as referred to above.   

However, organisations whose data is scraped should bear in mind that they could be subject to enforcement action by regulators for failing to take appropriate measures to protect the information they process (e.g. via their websites). For example, from a GDPR perspective, a regulator may take the view that a scraped business failed to take appropriate technical and organisational measures to protect the relevant personal data from unauthorised access, use or disclosure (e.g. if the scraped business did not put in place appropriate website protections given the nature of the processing and the state-of-the-art technology available in the circumstances).

In the UK, financial institutions that are victims of scraping may be at dual risk of enforcement action. While the ICO will examine potential breaches of data protection law, the FCA may also investigate whether the institution had adequate systems and controls in place. In the US, the SEC and federal banking regulators also require financial institutions and registered entities to maintain reasonable safeguards with respect to customer data.

Companies with data scraping in their supply chain

Where an organisation has data scraping in its supply chain, it should carefully consider the position under the GDPR. There are signs that the ICO, for example, is likely to take a keen interest in data supply chain due diligence.

In November 2018, the ICO issued a report to Parliament on the use of data analytics in political campaigns, highlighting the risk to businesses using third parties to compile marketing lists or undertake data analytics. The ICO was particularly concerned that political parties purchased marketing lists from data brokers and used third party data analytics companies without sufficient due diligence as to how the data had been gathered.

Although the ICO's findings were made in the context of political campaigns, it is likely that it would adopt the same approach in respect of businesses' commercial activities.

Reducing the risks – do's and don'ts

For organisations whose activities are affected by data scraping, here are six suggestions to help minimise legal risk and brand damage arising from data scraping:

  1. Know the law: Various local laws already regulate whether (and how) organisations can use scraping technology to analyse data. The GDPR, for example, may need to be carefully considered to ascertain whether the proposed scraping activity is permitted by law at all.
  2. Update website terms and privacy notices: Organisations should carefully check whether their website terms restrict use of and access to content uploaded to their sites – e.g. work product, or personal information. The enforceability of these terms will vary across jurisdictions. However, at the very least, robust contractual provisions help to set out expectations and intentions. Privacy notices should be clear about any proposed data sharing and scraping activities.
  3. Know your supply chain: No delegation of accountability. Regulators are increasingly keen to scrutinise organisations and political parties that purchase marketing lists and information from data brokers. They expect organisations using third party analytics companies to carry out sufficient checks and due diligence on their supply chain. The leaders will build practical checks to audit their suppliers' compliance with data rules.
  4. Technical safeguards: Website owners should check that they take appropriate measures to safeguard personal information they process, taking into account what technology is available in the market to protect against the risk, particularly to the extent that their processing is subject to EU data protection rules. 
  5. Soft regulation: Regulators globally are increasingly emphasising that complying with the law is not enough. Scraping may be legal – but is it ethical? Elizabeth Denham, the UK Information Commissioner, has called for an "ethical pause" when using data. Not asking the right questions creates risk from a legal and reputational perspective. It is critical to ask why and how you will use data scraping, and how you will minimise the societal, consumer protection and privacy-related issues that arise.
  6. Continuing audit: The GDPR requires organisations to carry out a detailed data protection impact assessment where their processing of personal data is likely to pose a high risk to individuals' rights.
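
As a purely illustrative sketch of the technical safeguards mentioned in point 4 above, one common measure is to limit how many requests a single client can make in a given period. The article does not prescribe any particular safeguard, so the technique chosen here (a sliding-window rate limiter), the `SlidingWindowLimiter` class name and the thresholds used are all assumptions for illustration; in practice such a control would sit alongside other measures such as the image challenges described at the start of this article.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-client rate limiter (illustrative only)."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each client identifier to the timestamps of its recent requests.
        self._hits = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> bool:
        """Return True if the request at time `now` is within the limit."""
        hits = self._hits[client_id]
        # Discard timestamps that have fallen outside the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # Too many recent requests: likely automated traffic.
        hits.append(now)
        return True


if __name__ == "__main__":
    # Assumed threshold: at most 3 requests per client in any 10-second window.
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0, 3.0, 10.0):
        print(t, limiter.allow("client-a", now=t))
```

A control of this kind will not stop a determined scraper, but it is the sort of proportionate measure an organisation could point to when demonstrating that it considered the technology available to protect the data it publishes.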