Who is selling data about researchers and why?

Published on 01 February 2023 | Updated on 04 April 2024

In the digital age, major academic publishers have been shifting their business models from traditional publishing to data analytics and brokering. Yet their customers, researchers, have not been sufficiently informed about these data tracking practices, and many are unaware of them.

How does data tracking work in academia?

Collecting the personal and behavioural data of researchers requires an ensemble of tracking tools, ranging from spyware and browser fingerprinting to tracking website visits via authentication systems.

The most frequently used publishing platforms track who reads their articles through JavaScript code from Google and/or Meta. This JavaScript can access the Document Object Model (DOM) interface of the publisher’s website and record any information on the page. In other words, Google and Meta can know what you were searching for. In addition, publisher platforms are bundled with hundreds of other third-party asset sources: simple trackers and audience tools such as Neustar, AddThis, Adobe Audience Manager and Oracle Marketing Cloud. For example, a person who opens a single Nature article is tracked by more than 70 tracking tools.
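To make the DOM point concrete, here is a rough Python simulation of what any script embedded in a page is able to read. It is a sketch under stated assumptions, not the code Google, Meta or any publisher actually runs: real trackers are JavaScript running in the browser, while this example parses static HTML. The collector URL `https://tracker.example/collect`, the function `build_beacon` and the parameter names `url`, `title` and `q` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class PageReader(HTMLParser):
    """Collects the page title and any search-field value from static HTML,
    mimicking what a third-party script on the page could read from the DOM."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.search_terms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        # Search boxes are part of the page, so their contents are readable too
        # (here only the static value attribute; in a browser, the live value).
        if tag == "input" and attrs.get("type") == "search":
            self.search_terms.append(attrs.get("value") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def build_beacon(page_html, page_url, collector="https://tracker.example/collect"):
    """Builds the kind of request a tracker might send to its own server
    (hypothetical collector endpoint and parameter names)."""
    reader = PageReader()
    reader.feed(page_html)
    params = {
        "url": page_url,
        "title": reader.title.strip(),
        "q": " ".join(reader.search_terms),
    }
    return f"{collector}?{urlencode(params)}"
```

Calling `build_beacon` on an article page that contains a search box returns a single URL carrying the page address, the article title and the search terms. Loading that URL (for instance as an invisible image) is enough to pass all three to a third party.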

User tracking on academic publisher platforms

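Browser fingerprinting is more invasive than ordinary cookie-based tracking. Fingerprinting tools collect characteristics of the user’s browser and device, and in some cases behavioural biometric data such as mouse movements and typing speed, and combine them into unique profiles of researchers. (The Electronic Frontier Foundation’s Panopticlick tool demonstrates how identifiable a single browser can be.)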
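The core idea behind fingerprinting can be shown in a few lines. The sketch below is a simplified illustration, not any vendor’s implementation. It assumes a handful of hypothetical attributes (user agent, screen size, time zone) plus a coarse typing-rhythm feature. The functions `browser_fingerprint` and `typing_rhythm_bucket` and the 50 ms bucket size are invented for this example. Real fingerprinting scripts run in JavaScript and gather many more signals.

```python
import hashlib
import json


def browser_fingerprint(attributes: dict) -> str:
    """Hashes a set of browser/device attributes into one stable identifier.
    The same attributes always yield the same ID, with no cookie required."""
    # Canonicalise: sort keys so attribute order does not change the result.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def typing_rhythm_bucket(key_times_ms):
    """Hypothetical behavioural feature: average gap between keystrokes,
    rounded down to 50 ms buckets so it stays stable across sessions."""
    if len(key_times_ms) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(key_times_ms, key_times_ms[1:])]
    average = sum(gaps) / len(gaps)
    return int(average // 50) * 50


# Example profile: device attributes plus a behavioural signal.
profile = {
    "user_agent": "Mozilla/5.0",
    "screen": "1920x1080",
    "timezone": "Europe/Berlin",
    "typing_ms": typing_rhythm_bucket([0, 120, 250, 370]),
}
visitor_id = browser_fingerprint(profile)
```

Because the identifier is recomputed from the browser itself on every visit, clearing cookies does not reset it. The more attributes are combined, the more likely the resulting hash is unique to one person.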

Elsevier, a large publishing group that describes itself as a ‘leader in information and analytics’, also tracks website visits via authentication systems.

Elsevier ‘About’ page

Elsevier has recently installed ThreatMetrix on its ScienceDirect platform, one of the largest bibliographic databases of scientific and medical publications. ThreatMetrix by LexisNexis helps its customers ‘understand the digital DNA of online users’ and claims to be able to identify 4.5 billion devices.

Traces of ThreatMetrix on the ScienceDirect platform (Elephant in the Lab)

Elsevier and Springer Nature recently proposed installing spyware on university networks, arguing that higher education needs to be protected from cybercriminals and from websites such as Sci-Hub. Additionally, the research information management system PURE enables Elsevier’s parent company, RELX, to gain information not just about published academic work, but about each phase of the research lifecycle: experiment planning, data acquisition, data analysis, etc.

Unfortunately, what we know about data tracking in academia is based on independent analyses of the code delivered to browsers, because publishers do not disclose the full scope of what they track. It remains to be explained how exactly data brokering adds to the already high profitability of publishers (an operating margin of up to 37.8% in 2021).

How are publishers using the collected data?

The new business model of publishers allows them to monetise and monopolise data about knowledge, as well as to expand their range of services. Data collected through research activities can be combined with other data from the internet and used to create user profiles that are sold to advertisers or, as in the case of Thomson Reuters and RELX, to the global security industry and law enforcement organisations. For example, it remains unclear whether a lawyer who researches immigration issues on specialised legal information platforms could thereby, without knowing it, help authorities locate their clients.

Joschka Selinger: Data Tracking in Research: Academic Freedom at Risk?

With its insights into all phases of the research process, RELX holds comprehensive, data-based information about research activities, which can make Elsevier and RELX indispensable to the governance of academic institutions and universities. If researchers are evaluated based on information stored in a publisher’s database, institutional decisions on promotions could be influenced by algorithmic assessments built on Elsevier’s resources. At the same time, data tracking can pose a threat to individual scientists.

Interestingly, Elsevier is one of the contractors of the EU’s Open Science Monitor, which tracks open science trends, although it has in the past supported actions against open science.

Globally, publishers could take a geopolitical approach and potentially cut off entire countries from the flow of scientific information that they control. The German Research Foundation (DFG) recently acknowledged that ‘risks could potentially arise from the major publishers presenting a censored programme on the Chinese market’ and ‘personalized data being generated about who uses and recommends the censored documents’.

What’s at risk? Digital and privacy implications

Data tracking by third parties in the context of academic research does not conform to data protection legislation, because the affected researchers are not sufficiently informed about data tracking practices. Even if a researcher consents, for example, to third-party cookies on a publisher’s website, that consent is in most cases not legally valid (at least in the EU). Under the EU’s General Data Protection Regulation (GDPR), consent must be freely given. Publishers’ monopoly over where and how a particular article can be accessed (an academic article can be published only once) forces researchers to consent to tracking. Such forced consent violates their fundamental right to informational self-determination: the right to decide independently and freely what happens to their personal data, and when and how it can be used.

Ms Alexandra Elbakyan, founder of Sci-Hub, a widely used pirate web service that provides access to nearly all scholarly literature through donated and/or leaked credentials, accused Elsevier of violating the right to science and culture under Article 27 of the Universal Declaration of Human Rights. Elsevier, in turn, sued Sci-Hub for violating US copyright law.

The shift towards data analytics in the commercial business model of academic publishers creates a risk that data about research content and trends will lie solely in the hands of private companies, including data on research carried out by publicly and philanthropically funded institutions and stakeholders. Additionally, science could become subject to external interference from algorithms and algorithmic optimisation that select which research is done and what researchers publish. Such outcomes would violate Article 13 of the EU Charter of Fundamental Rights (‘academic freedom shall be respected’) and Article 5 of the German Constitution (‘science, research and teaching shall be free’).

Lastly, unregulated or undetected data tracking can infringe competition law, as new publishers have little chance of entering the market.

What can and should researchers do?

As rights activist Mr Joschka Selinger stated: ‘Be aware. Request information about data. File data protection complaints if violations are detected. Request injunctive relief through the courts.’

Additionally, research institutions should ensure a high level of data protection by default and minimise tracking through contract design. In a recent paper, the DFG emphasised that ‘scholars and academic institutions must become aware of the problem and clarify the legal, technical and ethical framework conditions of their information supply’. If they do not take a position on these practices, research institutions share responsibility for the violation of the right to informational self-determination.

So please, can we Stop Tracking Science? Sign the petition.

Ms Inga Patarčić is a research data manager at the Max Delbrück Center for Molecular Medicine in Berlin. She holds a PhD in Bioinformatics (Humboldt University of Berlin) and an MSc in Molecular Biology (University of Zagreb). As an active science communicator, she (co)organises events such as the Long Night of Sciences in Berlin and Znanost u Prolazu in Croatia. She has recently completed Diplo’s Introduction to Internet Governance online course.

Browse through our alumni blog posts at Diplo Alumni Blog and Diplo Wisdom Circle (DWC) blog posts.
