SciDataCon 2025

Name: SciDataCon 2025
Start: 2025-10-13T08:00:00+10:00
End: 2025-10-16T16:00:00+10:00
Location: Brisbane Convention & Exhibition Centre

13–16 Oct 2025

SciDataCon Organisers

scidatacon@codata.org

Data-Driven Risk Identification in Supervision Reports of the Ministry of Health

13 Oct 2025, 15:47

11m

P1 (Brisbane Convention & Exhibition Centre)

Presentation

Data Science and Data Analysis

Presentations Session 2: Data and Research & Data Science and Data Analysis

Speaker: Daphne Raban (University of Haifa, CODATA Israel NC)

In response to inefficiencies in governmental regulation, such as excessive regulation, an overabundance of laws and procedures, lack of flexibility, and disregard for costs, countries around the world began efforts to optimize regulation, in part through privatization. The trend toward privatizing social services required substantial development of governmental supervision practices. Risk management emerged as an efficient approach to supervising public sector services: it helps regulators decide how much intervention is necessary to prevent harm to the public interest and to keep service recipients protected and safe. Today, risk management in supervision is a critical component of decision-making under uncertainty, and the OECD recognizes it as one of eleven best-practice principles for supervision and enforcement.
This study explores the potential of artificial intelligence (AI) to identify and categorize risks in unstructured open text, using natural language processing (NLP) models such as Dicta and HeBERT. It aims to develop a methodology for analyzing supervision reports from the healthcare sector that detects risks and classifies them into predefined categories.
The results indicate high performance of the Dicta model in identifying and classifying risks from unstructured text, with an accuracy of 93.3%, a recall of 85.9%, and an F1 score of 92.3%. The HeBERT model scored lower on all metrics. In the multi-class classification task, Dicta also outperformed HeBERT, with an accuracy of 74.4% versus 65.1%. These differences were statistically significant (p < 0.05), underscoring the advantages of Hebrew-adapted models, particularly those tailored to the healthcare domain.
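For readers less familiar with the metrics reported here, the following minimal sketch shows how accuracy, precision, recall, and F1 are conventionally derived from the four cells of a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not taken from the study, and the abstract does not state how its own figures were averaged or computed.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return standard binary classification metrics from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for P.

    Only meaningful when F1 is the harmonic mean of precision and recall
    for a single positive class (an assumption, not stated in the abstract).
    """
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no valid precision for this recall/F1 pair")
    return f1 * recall / denom


# Hypothetical counts: 100 risk sentences and 100 non-risk sentences.
metrics = binary_metrics(tp=80, fp=5, fn=20, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because recall and F1 together determine precision under that assumption, `implied_precision` can serve as a quick consistency check on any reported pair of values.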
The study highlights the critical role of semantic features and keywords in risk identification. It also addresses the challenges posed by ambiguous sentences and overlapping categories, which call for future research on multi-category classification algorithms. While Dicta showed superior performance on key categories such as “Infrastructure, Equipment, and Logistics” and “Medical Services and Quality of Care,” HeBERT had difficulty distinguishing mid-range categories, resulting in higher error rates.
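One common way to handle sentences that fit more than one category is to assign every category whose score clears a threshold, rather than only the single top-scoring one. The sketch below illustrates that idea under stated assumptions: the scores, the 0.5 threshold, and the third category name are hypothetical, and nothing here is presented as the method used or proposed by the authors.

```python
def assign_categories(scores, threshold=0.5):
    """Return every category scoring at or above the threshold.

    Falls back to the single best category when none clears the threshold,
    so each sentence always receives at least one label.
    """
    chosen = [cat for cat, score in scores.items() if score >= threshold]
    if not chosen:
        chosen = [max(scores, key=scores.get)]
    # Highest-scoring category first.
    return sorted(chosen, key=scores.get, reverse=True)


# Hypothetical per-category scores for one ambiguous sentence.
scores = {
    "Infrastructure, Equipment, and Logistics": 0.62,
    "Medical Services and Quality of Care": 0.58,
    "Other (hypothetical category)": 0.10,
}
labels = assign_categories(scores)
print(labels)
```

With these illustrative scores the sentence receives both of the first two categories, which a single-label classifier would be forced to choose between.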
The findings suggest practical applications for regulatory bodies, such as optimizing resource allocation, enhancing decision-making through data-driven insights, and improving transparency and service quality. Although the results are promising, the study has limitations, including reliance on a single corpus of healthcare supervision reports and a limited sample size. Future research should expand the corpus and explore AI techniques for less structured texts.
This research provides a foundational framework for applying AI to risk detection in healthcare and other domains, offering valuable insights for improving supervision, monitoring, and service delivery.

Primary author: Dr Avital Zadok (University of Haifa)

Co-author: Daphne Raban (University of Haifa, CODATA Israel NC)
