There’s an online version of a document written in 1989 by Sir Tim Berners-Lee when he was trying to help researchers at the physics institute CERN keep track of their data while dealing with high researcher turnover.
In it, he states: “When two years is a typical length of stay, information is constantly being lost… The technical details of past projects are sometimes lost forever, or only recovered after a detective investigation in an emergency. Often, the information has been recorded, it just cannot be found.”
If that still sounds like the typical research institute of today, it does suggest Sir Tim wasn’t entirely successful in solving this issue. Can it be that the problem of managing research data is harder than inventing the World Wide Web – the ultimate outcome of the ‘hyperlinks’ solution he proposed?
FAIR enough?
If anything, the problem has become harder over time. Due to the rapid pace of change in information technology, most cancer researchers – indeed most scientists – are now data scientists in a way undreamt of 40 years ago.
Results and experimental details can be spread across handwritten notes, emails, presentations, spreadsheets, documents with file names ending in something like ‘FinalFINAL.doc’, R scripts and more. This data is often locked away in systems and formats that aren’t easily shared, can’t be accessed by others and are impossible for anyone but their creators to understand, and even then only while those creators still hold the important linking details in their memories.
These challenges are at the heart of the concept of FAIR data. You’ll likely know it’s an acronym for data that follows these four foundational principles – Findability, Accessibility, Interoperability, and Reusability (…see what they did there?). By adhering to FAIR principles, researchers can increase data reproducibility, transparency, and collaboration in science, helping ensure that research data is managed efficiently and in compliance with best practices. This is very much in line with the commitments in the Concordat to Support Research Integrity.
FAIR’s fair
What is FAIR data?
The FAIR (Findable, Accessible, Interoperable, Reusable) data principles emphasise machine-actionability: the ability of computer systems to find, access, interoperate and reuse data with little or no human intervention. The main objective of FAIR is to increase data reuse by researchers. The core concepts of the FAIR principles are intuitive and grounded in good scientific practice.
Why do we need it?
To ensure fairness, inclusivity, and transparency in research, promoting better insights and avoiding bias.
FAIR was first outlined in a 2016 paper that identifies a problem very similar to that faced at CERN in 1989:
“We often need several weeks (or months) of specialist technical effort to gather the data (because) we do not pay our valuable digital objects the careful attention they deserve when we create and preserve them.”
The FAIR principles were quickly adopted by funders like CRUK and research institutions as a framework for ensuring data is well-managed. Many researchers – particularly those who specialise in managing extremely large data sets – are well-versed in FAIRification (the process of making data FAIR).
So why are we still seeing so many of the same issues with managing data?
All’s FAIR
While researchers are often positive about the concepts of FAIRness, there are barriers to its application. The first barrier is learning about FAIR. Only a third of respondents to ‘The State of Open Data 2022’ survey were familiar with FAIR, with another third saying they hadn’t heard of it at all. Part of the reason for this article is to raise awareness. It can be easy for those well-acquainted with FAIR to skip over the basics, or rush into deeper complexities, when explaining FAIR and the many new terms and concepts associated with it. Finding the right starting point is key to avoiding being overwhelmed at the very start (see “What next with FAIR?” below for suggestions for beginners).
Another barrier is finding accessible ways to engage with FAIR principles when there are so many other demands on researchers’ time. If you’re just starting, don’t feel you have to read everything about FAIR data. Even a little knowledge can reveal something practical and applicable that can be useful to you right now. Taking steps to make your data more organised and well-annotated – perhaps using a template for recording your experiments that reminds you to record all the important details each time – is likely to benefit ‘future you’ when you access and reuse your own data, without having to rely on memory to find and understand everything.
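For those who already keep their notes alongside scripts, the template idea above could be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the `ExperimentRecord` class and the `save_record` helper are all hypothetical, not part of any FAIR standard, and a real template should list whatever details matter in your own lab.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class ExperimentRecord:
    """Hypothetical experiment template; field names are illustrative only."""
    title: str
    researcher: str
    performed_on: str        # ISO 8601 date, e.g. "2024-03-01"
    protocol: str            # name or location of the protocol followed
    samples: List[str]       # sample identifiers used in the experiment
    data_files: List[str]    # where the raw and processed data live
    notes: str = ""          # optional free-text context

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that were left empty."""
        required = ["title", "researcher", "performed_on",
                    "protocol", "samples", "data_files"]
        return [name for name in required if not getattr(self, name)]


def save_record(record: ExperimentRecord, path: str) -> None:
    """Write the record as JSON, refusing to save if details are missing."""
    missing = record.missing_fields()
    if missing:
        raise ValueError("Please fill in: " + ", ".join(missing))
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(asdict(record), handle, indent=2)
```

Calling `save_record` on a record with a blank `researcher` or an empty `data_files` list raises an error naming the gaps, which is the 'reminder' part of the template. Saving as plain JSON is a design choice made here so that both people and software can read the record later.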
For those more immersed in FAIRification, perfectionism can be a barrier. Some people start crafting plans to redesign whole systems, reformat workflows and retrain their teams in emerging best practice. While these are positive directions and make great long-term objectives, they also require lots of time and energy to fulfil. Breaking these down into more manageable elements with a mixture of short- and long-term goals – including sharing what you have learnt with others – can be helpful.