Ten years into the open data era, we have achieved something of genuine significance: we have won the argument.
In 2016, “open academic data” still required evangelism. The hardest part was persuading colleagues that data sharing mattered at all. Today, that conversation has fundamentally shifted. The majority of researchers now accept that sharing data is valuable, and this matters enormously, for it means the debate has moved from ideology to operations. The challenge is no longer one of belief. It is one of incentives, infrastructure, and quality.
If the last decade of open data was principally concerned with volume – getting more datasets into the world – then the next decade must concern itself with value: ensuring that shared data is usable, trusted, discoverable, and credited in ways that genuinely alter researcher behaviour. We now have more shared data than at any point in history, yet we remain far short of having sufficient reusable data, for humans and for machines.
We prioritised sharing over reuse
The figures tell a story of both progress and stagnation. Awareness of FAIR principles has risen markedly over the past decade. Yet the persistent difficulty has scarcely shifted: researchers continue to report receiving inadequate credit for sharing data, and that gap is closing at a glacial pace. This matters because credit is the engine of the system. Without it, data sharing becomes either altruism or compliance. And compliance has a predictable failure mode: it optimises for the minimum required output.
This is how we arrive at what some have termed “data dumping grounds” – datasets that technically satisfy a mandate but are poorly described, difficult to interpret, and effectively inert for anyone hoping to reuse them. If we measure success primarily by counting deposits, we ought not be surprised when we receive deposits optimised for counting.
More policy, less enthusiasm
Support for open data remains high, yet support for mandates has declined sharply in certain regions. The most plausible explanation is not that researchers have turned against openness, but that they have experienced the reality of implementation. Mandates without adequate time, funding, training, infrastructure, or recognition do not feel like progress. They feel like yet another unfunded administrative burden, one more task to complete after the actual research is done.
This is not an argument against mandates. It is an argument against mandates alone.
Mandates can create compliance. They rarely create quality by themselves.
FAIR is straightforward to understand and remarkably difficult to execute well.
We have spent years treating the gap between awareness and practice as an education problem: teach researchers what FAIR means and they will implement it. But the gap persists because it is fundamentally an engineering and workflow problem: a matter of tools, integration, staffing, and standards. Most researchers lack the time, and often the specialist knowledge, to produce machine-actionable metadata, select appropriate schemas, apply controlled vocabularies correctly, and anticipate downstream interoperability requirements. Nor should they be expected to do so unaided.
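To make that concrete: "machine-actionable metadata" means a structured, standards-based record rather than a free-text README. The sketch below is one hypothetical illustration, using schema.org's Dataset vocabulary serialised as JSON-LD from Python; the field values are placeholders, and real repositories will have their own required schemas.

```python
import json

# A minimal, hypothetical example of machine-actionable metadata:
# a schema.org/Dataset description serialised as JSON-LD. The property
# names follow schema.org; the values are illustrative placeholders only.
dataset_metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example soil moisture measurements, 2024",
    "description": "Daily soil moisture readings from three field sites.",
    "identifier": "https://doi.org/10.xxxx/example",  # placeholder DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": [{"@type": "Person", "name": "A. Researcher"}],
    "keywords": ["soil moisture", "agronomy"],
    "variableMeasured": "soil volumetric water content",
    "temporalCoverage": "2024-01-01/2024-12-31",
}

# A structured record like this can be harvested by aggregators and
# search engines without any human interpretation.
print(json.dumps(dataset_metadata, indent=2))
```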
The AI opportunity to make “Good” the easy path
It is here that the next decade becomes genuinely interesting. In barely a year, researchers’ adoption of AI tools for data-related work has increased notably, particularly in the two areas where compliance is low and value most significant – data processing and metadata creation. This is not a gradual cultural shift; it is the pattern one observes when tools begin solving real problems within real workflows.
AI will not magically render data FAIR. But it can alter the economics of FAIR:
- It can draft metadata so that researchers begin from seventy percent rather than zero.
- It can identify missing fields, inconsistent units, broken formats, and common interoperability errors (a minimal check of this kind is sketched after this list).
- It can recommend standards and vocabularies appropriate to discipline and repository requirements.
- It can reduce the box-ticking, afterthought approach that currently undermines quality.
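As a minimal sketch of the second point above, the snippet below shows the sort of pre-deposit check an AI-assisted or rule-based tool could run. The required fields and the controlled unit list are assumptions for illustration, not any repository's actual policy; a production tool would draw both from community standards.

```python
# Illustrative, assumed requirements; real repositories define their own.
REQUIRED_FIELDS = {"title", "description", "creator", "license", "keywords"}
ALLOWED_UNITS = {"m", "kg", "s", "degC"}  # hypothetical controlled list


def check_metadata(record: dict) -> list[str]:
    """Return a list of human-readable problems found in a metadata record."""
    problems = []

    # 1. Completeness: flag required fields that are missing or empty.
    for field in sorted(REQUIRED_FIELDS):
        if not record.get(field):
            problems.append(f"missing required field: {field}")

    # 2. Consistency: flag units outside the agreed controlled list.
    for column, unit in record.get("units", {}).items():
        if unit not in ALLOWED_UNITS:
            problems.append(f"column '{column}' uses unrecognised unit '{unit}'")

    return problems


# Example: a deposit with no licence and a non-standard temperature unit.
issues = check_metadata({
    "title": "Field trial results",
    "description": "Plot-level yields",
    "creator": "A. Researcher",
    "keywords": ["yield"],
    "units": {"temperature": "Fahrenheit"},
})
for issue in issues:
    print(issue)
```

The point is less the check itself than where it runs: inside the deposit workflow, before publication, so problems cost minutes to fix rather than surfacing months later for a would-be reuser.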
This represents the most significant shift available to us: moving FAIR from “best practice” to “path of least resistance.”
However, AI is only as useful as the standards it can target. It performs well with clear structure and shared rules; it is considerably less reliable amid ambiguity. The next decade of open data cannot therefore be premised on the notion that “AI will resolve matters.” It must instead be: AI combined with standards, stewardship, and incentives.
Value not volume: Redefining what we measure
If we wish the next ten years to differ meaningfully from the last, we must change what we reward and what we measure.
Volume metrics are seductive precisely because they are straightforward: number of datasets deposited, number of repositories, number of mandates, number of downloads.
Value metrics are more demanding, but they are what actually matter:
- Reuse: citations of datasets, documented downstream use, integration into subsequent studies
- Quality: completeness of metadata, adherence to community standards, interoperability assessments passed
- Time-to-share: how early data becomes available within the research lifecycle, not months after publication
- Trust: provenance, versioning, validation, and clear licensing
- Equity: whether infrastructure and support are genuinely available across regions and disciplines, not merely within well-resourced institutions
The open data movement will reach maturity when success is defined by impactful reuse, not merely successful deposit. If the first decade was characterised by advocacy, the second must be defined by operationalisation.
1. Make Credit Real
We already know that recognition constitutes a primary barrier, and it is not resolving itself.
Datasets must be treated as first-class research objects wherever it matters:
- Hiring and promotion criteria that explicitly acknowledge data contributions
- Funding applications that meaningfully evaluate dataset outputs and stewardship plans
- Consistent dataset citation norms, enforced across publishers and platforms
2. Build Workflows that Reduce Friction
When data deposition is integrated directly into researchers’ existing environments – submission systems, electronic laboratory notebooks, analysis platforms – behaviour changes. Reduce steps, reduce context-switching, reduce ambiguity. If sharing well is easy, it happens. If it is difficult, it becomes a mere checkbox.
3. Fund the Missing Layer: Data Stewardship and Training
Mandates create obligations. Stewardship creates quality. This requires sustained investment in:
- Data stewards and librarians
- Institutional support services
- Discipline-specific training that extends beyond “what FAIR stands for”
- Local infrastructure suited to local contexts, because one size emphatically does not fit all
4. Use AI to Make FAIR Easy
The objective should be:
- AI-assisted metadata creation with human review
- Automated validation checks integrated into repositories and workflows
- Clear provenance and versioning
- Automated metadata crosswalks and machine-readable exposure (a simple crosswalk is sketched below)
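As one hedged illustration of that last point, the sketch below maps a simple internal record onto DataCite-style field names. The mapping table is an assumption for illustration; real crosswalks are maintained by repositories and standards bodies and cover far more fields and structure.

```python
# A minimal sketch of an automated metadata crosswalk: translating a
# simple internal record into DataCite-style keys. The mapping here is
# illustrative and far from the full DataCite Metadata Schema.
CROSSWALK = {
    "title": "titles",
    "creator": "creators",
    "description": "descriptions",
    "license": "rightsList",
    "keywords": "subjects",
}


def crosswalk(record: dict) -> dict:
    """Translate an internal metadata record into DataCite-style keys."""
    out = {}
    for source_key, target_key in CROSSWALK.items():
        if source_key in record:
            out[target_key] = record[source_key]
    return out


internal = {
    "title": "Example soil moisture measurements, 2024",
    "creator": ["A. Researcher"],
    "license": "CC-BY-4.0",
    "keywords": ["soil moisture"],
}
print(crosswalk(internal))
```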
2035: Open FAIR data is just research done well
By 2035, sharing well-documented, reusable data should cease to be a special achievement. It should be unremarkable. Standard. Expected. Not because researchers have suddenly become more virtuous, but because the system has finally aligned incentives, tooling, and support.
The first decade constructed the moral case for open data. The next decade must construct the practical reality. Academic research stands at yet another technology-driven inflection point, and the institutions that embrace machine-first FAIR will see greater impact for their research and their researchers.
More reuse. More trust. More interoperability.
Value, not volume.
The State of Open Data report is published by Digital Science, Springer Nature and Figshare.