The APPF Central Office data team, in cooperation with the Adelaide and Canberra nodes, recently completed its contribution to the Australian Research Data Commons (ARDC) Data Retention Program.
The goal of the Data Retention Program was to help NCRIS facilities and research organisations to manage their data collections more effectively. It involved reviewing existing datasets and making sure they utilised consistent international metadata standards – making them more findable and accessible for future users.
The multi-year project began in March 2021 and was supported by the ARDC as part of its initative to build a sustainable investment model for data collections of national significance, and in doing so support leading edge research.
ARDC program manager Max Wilkinson says the project contributed to a more consistent and effective national research infrastructure.
“By encouraging the use of standardised protocols and metadata, the ARDC is helping the research community adopt common data management and storage practices that promote and enable FAIR (findable, accessible, interoperable and reusable) principles”, he says.
“This helps researchers recognise and leverage the value of data collections of national significance in future projects.”
Of course, the project will help APPF maximise the value of our accumulated data too. Maintaining universally FAIR datasets means researchers working with APPF can leverage existing data to avoid duplication, streamline their experiments, and make new discoveries more efficiently.
Having better oversight of our data collection also makes it easier to plan and implement data storage solutions that provide the levels of security and availability we require.
APPF data architect Rakesh David says the project was a good opportunity to review the legacy plant phenotyping datasets generated by the Adelaide and Canberra nodes over the past five to ten years.
“We identified over 200 primary plant phenotyping datasets that could be made publicly available,” he says.
“The next step was to collate all the necessary information so it could be packaged as individual plant phenotyping datasets which, in our case, usually involved camera images of plants taken in controlled environments or in natural ecosystems.”
The datasets were enriched with contextual information through the addition of 13 standardised metadata fields to help with assessment and governance of the data – including details of the primary researchers, a description of the project, location details for the resource and licensing information.
The final step was to integrate with the data repository solution used by each node’s host institution (the University of Adelaide’s Figshare for The Plant Accelerator® and ANU Data Commons for the ANU node) to publish the data and its associated metadata, and then assign a DOI to uniquely reference each dataset.
Completing APPF’s part of the Data Retention Program was a significant undertaking. Working in partnership with ARDC, the data team published:
- 13 standardised metadata fields to improve discoverability (including persistent identifiers like DOIs, ORCIDs and RORs to uniquely reference datasets, researchers and institutions).
- 223 separate plant phenotyping datasets in institutional and national repositories.
- 06 TB of plant phenotyping images.
The team is now in the process of applying FAIR principles as a consistent framework for publishing future datasets across the APPF network. This will help create federated data collections and establish our plant phenotyping datasets as a single discoverable resource for the research community.
APPF’s data team is also leading the ARDC Multi-scalar Crop Characterisation Network (MCCN) project, which will establish protocols for consistent and FAIR data storage across all Australian crop trial sites and research projects.
This project received investment from the Australian Research Data Commons (ARDC). The ARDC is funded by the National Collaborative Research Infrastructure Strategy (NCRIS).