Research software plays a critical role in modern science, yet finding software with similar functionality remains difficult. This work proposes a flexible methodology for automatically classifying research software using information extracted from repositories and associated publications.
The methodology enables the addition of new categories without retraining existing classifiers and was evaluated on community-curated datasets. The resulting classifier achieved an F1 score of 92% during cross-validation and 76% on an unseen test set, substantially outperforming previous approaches.
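One way to read "adding new categories without retraining existing classifiers" is as a set of independent binary classifiers, one per category. The sketch below illustrates that idea only. It is not the thesis implementation: the class names `BinaryCategoryClassifier` and `CategoryRegistry`, the bag-of-words cosine scoring, and the toy training texts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]


class BinaryCategoryClassifier:
    """Hypothetical per-category model: a text belongs to the category if it
    is closer (cosine similarity) to the positive examples than to the
    negative ones."""

    def __init__(self, positives, negatives):
        self.pos = Counter(t for doc in positives for t in tokenize(doc))
        self.neg = Counter(t for doc in negatives for t in tokenize(doc))

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[t] * b[t] for t in a if t in b)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def predict(self, text):
        vec = Counter(tokenize(text))
        return self._cosine(vec, self.pos) > self._cosine(vec, self.neg)


class CategoryRegistry:
    """Holds one independent binary classifier per category (assumed design)."""

    def __init__(self):
        self.models = {}

    def add_category(self, name, positives, negatives):
        # Only the new category's model is trained; existing models are untouched.
        self.models[name] = BinaryCategoryClassifier(positives, negatives)

    def classify(self, text):
        return sorted(name for name, model in self.models.items() if model.predict(text))


# Toy data, invented for this example.
registry = CategoryRegistry()
registry.add_category(
    "astronomy",
    positives=["telescope image calibration pipeline", "galaxy spectra analysis toolkit"],
    negatives=["protein folding simulation", "web server framework"],
)
astro_model = registry.models["astronomy"]

# Adding a second category later does not rebuild the first model.
registry.add_category(
    "bioinformatics",
    positives=["genome sequence alignment tool", "protein folding simulation"],
    negatives=["telescope image calibration pipeline", "web server framework"],
)

labels = registry.classify("pipeline for galaxy image calibration")
```

Because each category owns its own model, a text can receive zero, one, or several labels, and extending the label set costs only the training of the new category's classifier.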
The research was integrated into existing software metadata extraction workflows, helping researchers discover and organize software projects based on their functionality and scientific domain.
Citation
Ciuciu-Kiss, J. T. (2022). A Methodology for Research Software Classification [Master's thesis, Universidad Politécnica de Madrid].
BibTeX
@mastersthesis{ciuciukiss2022methodology,
  title={A Methodology for Research Software Classification},
  author={Ciuciu-Kiss, Jenifer Tabita},
  school={Universidad Politécnica de Madrid},
  year={2022}
}