Conference paper · October 26, 2025

Attributes, Taxonomies and Semantic Alignment for Automated Research Software Classification

Research software (RS) plays a critical role in computational science, yet remains poorly categorized and difficult to discover or reuse. This research explores RS classification by investigating how textual and metadata attributes can be leveraged to develop scalable, interpretable classification methodologies. Existing taxonomies are evaluated through alignment with scientific knowledge graphs to identify redundancies and structural gaps. Labeled datasets are constructed by linking publications to software repositories, and RS attributes, such as README files, abstracts, and source code features, are benchmarked using multiple machine learning models and embedding strategies. A methodology that integrates semantic enrichment and transformer-based models is proposed for robust RS classification. Preliminary findings highlight the informativeness of publication abstracts for classification tasks and expose limitations in current community-defined taxonomies.

This paper evaluates textual and metadata attributes for classifying research software. It connects publications to repositories, compares models and embedding strategies, and examines how existing software taxonomies align with scientific knowledge graphs.

The findings identify publication abstracts as a useful classification signal and highlight limitations in current community taxonomies.
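The idea of benchmarking attributes against each other can be illustrated with a minimal sketch. It is not the paper's pipeline: the records, labels, and function names below are hypothetical, and a bag-of-words nearest-centroid classifier stands in for the machine learning models and embedding strategies the paper compares. Each toy record links one publication abstract to one repository README, and each attribute is scored separately with leave-one-out accuracy.

```python
import math
from collections import Counter

# Hypothetical toy records (invented for illustration, not data from the paper).
# Each record links a publication abstract to the README of its repository.
RECORDS = [
    {"abstract": "genome sequence alignment tool for protein analysis",
     "readme": "install with pip and run the command line tool",
     "label": "bioinformatics"},
    {"abstract": "protein structure prediction from genome sequence data",
     "readme": "python package for sequence analysis",
     "label": "bioinformatics"},
    {"abstract": "galaxy simulation code for telescope survey data",
     "readme": "install with pip and run the simulation",
     "label": "astronomy"},
    {"abstract": "telescope image pipeline for galaxy survey",
     "readme": "python package for image processing",
     "label": "astronomy"},
]


def bag_of_words(text):
    """Lowercased whitespace tokens with counts (a stand-in for an embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def centroids(records, attribute):
    """Sum the token counts of one attribute per class label."""
    sums = {}
    for record in records:
        sums.setdefault(record["label"], Counter()).update(
            bag_of_words(record[attribute]))
    return sums


def predict(class_centroids, text):
    """Return the label whose centroid is most similar to the text."""
    vec = bag_of_words(text)
    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(class_centroids),
               key=lambda label: cosine(vec, class_centroids[label]))


def leave_one_out_accuracy(records, attribute):
    """Hold out each record once, train on the rest, and count correct labels."""
    hits = 0
    for i, held_out in enumerate(records):
        train = records[:i] + records[i + 1:]
        predicted = predict(centroids(train, attribute), held_out[attribute])
        hits += predicted == held_out["label"]
    return hits / len(records)


def benchmark(records, attributes=("abstract", "readme")):
    """Score each attribute independently so the scores can be compared."""
    return {attr: leave_one_out_accuracy(records, attr) for attr in attributes}


if __name__ == "__main__":
    for attr, accuracy in benchmark(RECORDS).items():
        print(f"{attr}: {accuracy:.2f}")
```

Scores on four invented records say nothing about real research software; the sketch only shows the shape of a per-attribute comparison. In a realistic setting, bag_of_words would be replaced by an actual embedding model and the classifier by the models under study.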

Citation

Ciuciu-Kiss, J. T. (2025). Attributes, Taxonomies and Semantic Alignment for Automated Research Software Classification. ISWC 2025 Companion Volume.

BibTeX
@inproceedings{ciuciukiss2025researchsoftware,
  title={Attributes, Taxonomies and Semantic Alignment for Automated Research Software Classification},
  author={Ciuciu-Kiss, Jenifer Tabita},
  booktitle={ISWC 2025 Companion Volume},
  year={2025}
}
