The SciDataCon 2025 Programme is now published.

13–16 Oct 2025
Brisbane Convention & Exhibition Centre
Australia/Brisbane timezone

Integrating Machine Learning Standards in Disseminating Machine Learning Research

14 Oct 2025, 16:44
11m
Brisbane Convention & Exhibition Centre

Merivale St, South Brisbane QLD 410

Speaker

Scott Edmunds (GigaScience Press, BGI Hong Kong)

Description

The increasing use of AI-based approaches such as machine learning (ML) across diverse scientific fields presents challenges for reproducibly disseminating and assessing research. As ML becomes increasingly integral to clinical applications, there is also a critical need for transparent reporting methods to ensure both the comprehensibility and the reproducibility of the pre-clinical research and clinical trials supporting them. To address this, a growing number of standards, checklists and guidelines enable more standardized reporting of ML research, but their proliferation and complexity make them challenging to use when disseminating research. This is particularly true in the assessment and peer review of scientific papers, which has to date been an ad hoc process that has struggled to throw light on increasingly complicated computational methods that are otherwise unintelligible to other researchers. To take the publication process beyond these black boxes, GigaScience Press has experimented with integrating many of these ML standards into its assessment and publication workflows to make the outputs more FAIR. However, having a broad scope has necessitated going beyond the many field-specific standards to look for more generalist and automated approaches. In this talk we will introduce and map the landscape of field-specific (predominantly clinical) reporting guidelines alongside some newer, broader-scope generalist ML standards and checklists, and outline the rationale for our eventual adoption of the DOME recommendations for machine learning in biology. The DOME recommendations, whose name is an acronym for Data, Optimisation, Model and Evaluation, were initially formulated by the ELIXIR Machine Learning Focus Group. Our initial use of these guidelines has been to help screen which ML publications are suitable to send out for peer review.
Through further collaboration with the DOME community, we have now integrated their DOME Data Stewardship Wizard (DOME-DSW) and DOME Registry tools into our peer-review and publication process. At the end of the review process, a DOME Registry persistent identifier is included in the manuscript to make the annotations and the different ML components of the study (data, models, methods, etc.) more visible and discoverable to the research community. In carrying out these experiments, we have found that a more practical approach for assessing and sharing ML research is “Trust and Transparency” rather than trying to test de facto reproducibility. The efforts presented here provide a useful case study of approaches, workflows and strategies to more logically handle the peer review and dissemination of data-intensive ML research. We emphasise the need for continued dialogue and collaboration among various ML communities to create unified, comprehensive standards, ultimately enhancing the credibility and impact of ML-based scientific research. We hope that others will test this approach in outlets with different scopes and publication volumes to see whether it remains practical and can become a wider standard for sharing ML research in a rigorous, reproducible and FAIR manner.
