Research software is difficult to discover and compare because repositories often lack consistent metadata and scientific categories.
My PhD research examines which repository and publication attributes support classification, how community taxonomies align with scientific knowledge graphs, and which machine-learning methods produce interpretable results.
An earlier implementation reached an F1 score of 92% on the training data and 76% on the test set, compared with 36% for the prior baseline. The method was integrated into SOMEF, an open-source framework for extracting software metadata.