Description
Graphs are a fundamental tool for representing complex relationships within a dataset, with applications ranging from epidemiology (e.g., contact networks) to neuroscience (e.g., brain connectomes). In many real-world scenarios, however, analyzing a single graph is not enough: the goal is to identify structural differences between distinct groups of graphs, such as "healthy" versus "non-healthy" cohorts. This topic focuses on developing a computational pipeline that uses Variable Importance Measures (VIM) to determine which network metrics (e.g., node centrality, clustering coefficients, or global efficiency) are most effective at differentiating between groups of networks.
Tasks
Implement a pipeline to
- calculate a wide range of topological and spectral network metrics,
- train a classical machine learning model (e.g., Random Forest), and
- apply VIM techniques to rank the metrics (e.g., via Permutation Importance or Gini Importance).
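The three steps above can be illustrated with a minimal sketch. It is not a prescribed implementation: the choice of NetworkX and scikit-learn, the small metric set, the function names (`graph_features`, `build_toy_dataset`, `run_pipeline`), and the synthetic random graphs standing in for the two cohorts are all assumptions made for illustration.

```python
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical metric set; a real study would cover a much wider range.
METRIC_NAMES = [
    "density", "avg_clustering", "transitivity",
    "global_efficiency", "max_degree_centrality", "algebraic_connectivity",
]


def graph_features(G):
    """Return one feature vector of topological and spectral metrics for G."""
    # Laplacian eigenvalues in ascending order; the second-smallest one is the
    # algebraic connectivity (0 for a disconnected graph).
    laplacian = nx.laplacian_matrix(G).toarray().astype(float)
    eigenvalues = np.linalg.eigvalsh(laplacian)
    return [
        nx.density(G),
        nx.average_clustering(G),
        nx.transitivity(G),
        nx.global_efficiency(G),
        max(nx.degree_centrality(G).values()),
        float(eigenvalues[1]),
    ]


def build_toy_dataset(n_per_group=40, n_nodes=30, seed=0):
    """Synthetic stand-in for two cohorts (assumption, not real data):
    label 0 = Erdos-Renyi graphs, label 1 = Watts-Strogatz graphs."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(n_per_group):
        g0 = nx.erdos_renyi_graph(n_nodes, 4 / (n_nodes - 1),
                                  seed=int(rng.integers(1_000_000)))
        g1 = nx.watts_strogatz_graph(n_nodes, 4, 0.1,
                                     seed=int(rng.integers(1_000_000)))
        X.extend([graph_features(g0), graph_features(g1)])
        y.extend([0, 1])
    return np.array(X), np.array(y)


def run_pipeline(seed=0):
    """Extract metrics, train a Random Forest, and rank metrics by two VIMs."""
    X, y = build_toy_dataset(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)

    # Gini (impurity-based) importance, derived from the training data.
    gini = model.feature_importances_
    # Permutation importance, evaluated on the held-out test split.
    perm = permutation_importance(
        model, X_test, y_test, n_repeats=20, random_state=seed
    ).importances_mean

    def rank(values):
        return sorted(zip(METRIC_NAMES, values), key=lambda t: -t[1])

    return {
        "accuracy": model.score(X_test, y_test),
        "gini": rank(gini),
        "permutation": rank(perm),
    }


if __name__ == "__main__":
    results = run_pipeline()
    print("test accuracy:", results["accuracy"])
    for name, value in results["permutation"]:
        print(f"{name:>24s}  {value:.3f}")
```

The two rankings need not agree. Gini importance is computed from the training data, whereas permutation importance here is measured on held-out graphs, and strongly correlated metrics (such as average clustering and transitivity) can share or split importance between them.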
Requirements
- Knowledge of Python and willingness to learn an appropriate graph analysis library
- Basic knowledge of the classical machine learning pipeline (e.g., feature extraction, classification, evaluation)
Environment
The deliverable will be a standalone Python-based analysis pipeline.