Workshop paper · June 16, 2025

A study of the categories used in ‘Papers with Code’

An increasing number of machine learning developers share research software online to support their scientific investigations. To improve software findability, the scientific community has developed domain-specific taxonomies. However, are these taxonomies appropriate for software classification? This paper explores this question through a case study on Papers with Code, a popular platform where authors share their publications together with their software implementations. We define and apply a comparative framework with state-of-the-art text similarity techniques (TF-IDF, Sentence-BERT, CLIP), and we assess the level of overlap between the software categories defined in the platform, based on the method descriptions they contain. Our results show significant category overlap, which may limit the effectiveness of classification algorithms. While community-defined categories provide a useful foundation, they may require refinement, such as subcategories or sharper definitions, to better capture interdisciplinary methods and improve classification accuracy.

This study examines the structure of Papers with Code categories, their value for classifying research software, and the gaps that require semantic alignment.
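To make the idea of category overlap concrete, the following is a minimal sketch of a TF-IDF-based comparison, written in plain Python. It is not the authors' implementation: the category names, the method descriptions, the smoothed IDF formula, and the choice to average pairwise cosine similarities across categories are all assumptions made for illustration. The Sentence-BERT and CLIP variants mentioned in the abstract would replace the vectorization step with learned embeddings.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed IDF (an assumption; other weighting schemes are equally valid).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def category_overlap(categories):
    """Mean pairwise similarity between descriptions of each category pair.

    `categories` maps a category name to a list of method descriptions.
    Averaging is a hypothetical aggregation chosen for this sketch.
    """
    names = sorted(categories)
    all_docs = [d for name in names for d in categories[name]]
    vectors = tfidf_vectors(all_docs)
    # Map each category to the slice of vectors that belongs to it.
    by_category, start = {}, 0
    for name in names:
        end = start + len(categories[name])
        by_category[name] = vectors[start:end]
        start = end
    overlap = {}
    for a, b in combinations(names, 2):
        sims = [cosine(u, v) for u in by_category[a] for v in by_category[b]]
        overlap[(a, b)] = sum(sims) / len(sims) if sims else 0.0
    return overlap


# Toy data invented for this example; not taken from Papers with Code.
toy_categories = {
    "Computer Vision": [
        "convolutional network for image classification",
        "attention module for image segmentation",
    ],
    "Natural Language Processing": [
        "attention mechanism for text translation",
        "transformer network for text classification",
    ],
    "Reinforcement Learning": [
        "policy gradient agent with reward shaping",
    ],
}

overlap = category_overlap(toy_categories)
for pair, score in overlap.items():
    print(pair, round(score, 3))
```

In this toy setup, a pair of categories whose descriptions share vocabulary (for example, attention-based methods appearing under two headings) receives a non-zero score, while pairs with disjoint vocabulary score zero. High scores between categories that are meant to be distinct are the kind of signal the study interprets as overlap.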

Citation

Ciuciu-Kiss, J. T., & Garijo, D. (2025). A Study of the Categories Used in 'Papers with Code'. Natural Scientific Language Processing at ESWC.

BibTeX
@inproceedings{ciuciukiss2025paperswithcode,
  title={A Study of the Categories Used in 'Papers with Code'},
  author={Ciuciu-Kiss, Jenifer Tabita and Garijo, Daniel},
  booktitle={Natural Scientific Language Processing at ESWC},
  year={2025}
}
