• Home
  • About
  • People
  • Publications
  • Software
  • Partners
  • Contact

Leiden Computational Network Science

Network science research group at Leiden University

Publications

2026

Boekhout, H. D.; Takes, F. W.

Fast maximal clique enumeration in weighted temporal networks Journal Article

In: Social Network Analysis and Mining, vol. 16, no. 1, pp. 10, 2026, ISSN: 1869-5469.

@article{boekhout_fast_2026,
title = {Fast maximal clique enumeration in weighted temporal networks},
author = {H. D. Boekhout and F. W. Takes},
url = {https://doi.org/10.1007/s13278-025-01539-3},
doi = {10.1007/s13278-025-01539-3},
issn = {1869-5469},
year = {2026},
date = {2026-01-01},
urldate = {2026-01-27},
journal = {Social Network Analysis and Mining},
volume = {16},
number = {1},
pages = {10},
abstract = {Cliques, groups of fully connected nodes in a network, are often used to study group dynamics of complex systems. In real-world settings, group-dynamics often have a temporal component. For example, conference attendees moving from one group conversation to another. Recently, maximal clique enumeration methods have been introduced that add temporal (and frequency) constraints, to account for such phenomena. These methods enumerate so-called $(\delta,\gamma)$-maximal cliques. In this work, we introduce an efficient $(\delta,\gamma)$-maximal clique enumeration algorithm, that extends $\gamma$ from a frequency constraint to a more versatile weighting constraint. Additionally, we introduce a definition of $(\delta,\gamma)$-cliques, that resolves a problem of existing definitions in the temporal domain. Our approach, which was inspired by a state-of-the-art two-phase approach, introduces a more efficient initial (stretching) phase. Specifically, we reduce the time complexity of this phase to be linear with respect to the number of temporal edges. Furthermore, we introduce a new approach to the second (bulking) phase, which allows us to efficiently prune search tree branches. Consequently, in experiments we observe significant speed-ups, at times by several orders of magnitude, on various (large) real-world datasets. Our algorithm vastly outperforms the existing state-of-the-art methods for temporal networks, while also extending applicability to weighted networks.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://doi.org/10.1007/s13278-025-01539-3
  • doi:10.1007/s13278-025-01539-3

Stępień, S.; Janik, M.; Nurek, M.; Saxena, A.; Michalski, R.

Fairness in Opinion Dynamics Miscellaneous

2026, (arXiv:2601.03859 [cs]).

@misc{stepien_fairness_2026,
title = {Fairness in Opinion Dynamics},
author = {S. Stępień and M. Janik and M. Nurek and A. Saxena and R. Michalski},
url = {http://arxiv.org/abs/2601.03859},
doi = {10.48550/arXiv.2601.03859},
year = {2026},
date = {2026-01-01},
urldate = {2026-01-27},
publisher = {arXiv},
abstract = {Ways in which people's opinions change are, without a doubt, subject to a rich tapestry of differing influences. Factors that affect how one arrives at an opinion reflect how they have been shaped by their environment throughout their lives, education, material status, what belief systems they subscribe to, and what socio-economic minorities they are a part of. This already complex system is further expanded by the ever-changing nature of one's social network. It is therefore no surprise that many models have a tendency to perform best for the majority of the population while discriminating against people who are members of various marginalized groups. This bias and the study of how to counter it are the subject of the rapidly developing field of Fairness in Social Network Analysis (SNA). The focus of this work is to look into how a state-of-the-art model discriminates against certain minority groups and whether it is possible to reliably predict for whom it will perform worse. Moreover, is such prediction possible based solely on one's demographic or topological features? To this end, the NetSense dataset, together with a state-of-the-art CoDiNG model for opinion prediction, has been employed. Our work explores how three classifier models (Demography-Based, Topology-Based, and Hybrid) perform when assessing for whom this algorithm will provide inaccurate predictions. Finally, through a comprehensive analysis of these experimental results, we identify four key patterns of algorithmic bias. Our findings suggest that no single paradigm provides the best results and that there is a real need for context-aware strategies in fairness-oriented social network analysis. We conclude that a multi-faceted approach, incorporating both individual attributes and network structures, is essential for reducing algorithmic bias and promoting inclusive decision-making.},
note = {arXiv:2601.03859 [cs]},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

  • http://arxiv.org/abs/2601.03859
  • doi:10.48550/arXiv.2601.03859

Macedo, M.; Saxena, A.

Gender biases in online communication: A case study of soccer Journal Article

In: Applied Intelligence, vol. 56, no. 1, pp. 33, 2026, ISSN: 0924-669X, 1573-7497.

@article{macedo_gender_2026,
title = {Gender biases in online communication: A case study of soccer},
author = {M. Macedo and A. Saxena},
url = {https://link.springer.com/10.1007/s10489-025-06988-z},
doi = {10.1007/s10489-025-06988-z},
issn = {0924-669X, 1573-7497},
year = {2026},
date = {2026-01-01},
urldate = {2026-01-27},
journal = {Applied Intelligence},
volume = {56},
number = {1},
pages = {33},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://link.springer.com/10.1007/s10489-025-06988-z
  • doi:10.1007/s10489-025-06988-z

ten Bosch, O.; van der Loo, M. P. J.

Statistical open source software for official statistics: State of play and future directions Journal Article

In: Statistical Journal of the IAOS, pp. 18747655251411424, 2026, ISSN: 1874-7655, (Publisher: SAGE Publications).

@article{ten_bosch_statistical_2026,
title = {Statistical open source software for official statistics: State of play and future directions},
author = {O. ten Bosch and M. P. J. van der Loo},
url = {https://doi.org/10.1177/18747655251411424},
doi = {10.1177/18747655251411424},
issn = {1874-7655},
year = {2026},
date = {2026-01-01},
urldate = {2026-01-01},
journal = {Statistical Journal of the IAOS},
pages = {18747655251411424},
abstract = {Statistical organizations worldwide are increasingly adopting open source technologies for producing official statistics. This shift is motivated by the potential of open source tools to increase transparency, improve efficiency, and enhance reproducibility. Moreover, young professionals in statistics and data science enter the labour market with strong skills in open source tools. The adoption of open source software signifies a change in how statistical organizations operate and collaborate. This paper provides an overview of the state of open source adoption in official statistics. It details the open source movement among statistical organizations, the experiences of Statistics Netherlands with open source adoption and the creation of R-packages implementing common statistical methods. It also describes the development and use of the “awesome list of official statistics software” and discusses a set of principles for open source in official statistics, derived from best practices across various organizations. These principles have been endorsed (June 2025) by the Conference of European Statisticians (CES). Furthermore, it explores future directions for maturing this community, including metrics for assessing maturity, such as true independence of software modules, support for uncertainty propagation, and privacy by design. Moreover, it presents ideas on redesigning the statistical open source landscape.},
note = {Publisher: SAGE Publications},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://doi.org/10.1177/18747655251411424
  • doi:10.1177/18747655251411424

2025

Menyhért, M.; Bokányi, E.; Corten, R.; Heemskerk, E. M.; Kazmina, Y.; Takes, F. W.

Connectivity and community structure of online and register-based social networks Journal Article

In: EPJ Data Science, vol. 14, no. 1, pp. 8, 2025, ISSN: 2193-1127, (Publisher: Springer Berlin Heidelberg).

@article{menyhert_connectivity_2025,
title = {Connectivity and community structure of online and register-based social networks},
author = {M. Menyhért and E. Bokányi and R. Corten and E. M. Heemskerk and Y. Kazmina and F. W. Takes},
url = {https://epjds.epj.org/articles/epjdata/abs/2025/01/13688_2025_Article_522/13688_2025_Article_522.html},
doi = {10.1140/epjds/s13688-025-00522-4},
issn = {2193-1127},
year = {2025},
date = {2025-12-01},
urldate = {2026-01-27},
journal = {EPJ Data Science},
volume = {14},
number = {1},
pages = {8},
abstract = {The dominance of online social media data as a source for large-scale social network studies has recently been challenged by networks constructed from state-curated register data. In this paper focused on the cross-comparison of the network structures, we investigate the similarities and differences of the Dutch online social network (OSN) Hyves and a register-based social network (RSN) of the Netherlands. First and foremost, we find that node metrics and the connectivity of the two population-scale networks are similar, with more long-distance ties captured by the OSN, and with the OSN ties proving to be predictive of RSN ties. These results hold when correcting for population size and geographical distance, notwithstanding that these two factors appear to be the main drivers of connectivity. Second, we show using multiple algorithms that the community structure of the two networks is similar and that neither follows strict administrative geographical delineations (e.g., provinces). Instead, communities appear to either center around large metropolitan areas or, outside of the country’s most urbanized area, comprise large blocks of interdependent municipalities. Beyond population and distance-related patterns, communities also highlight the persistence of deeply rooted sociocultural communities such as the Dutch Bible belt. The findings presented in this work aid in interpreting results from future studies in which register-based social networks are used to obtain insights into the social network structure of an entire population.},
note = {Publisher: Springer Berlin Heidelberg},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://epjds.epj.org/articles/epjdata/abs/2025/01/13688_2025_Article_522/13688_[...]
  • doi:10.1140/epjds/s13688-025-00522-4

Pena, C. B.; O’Sullivan, D. J. P.; MacCarron, P.; Saxena, A.

Dynamics of temporal influence in polarised networks Journal Article

In: PLOS ONE, vol. 20, no. 12, pp. e0337753, 2025, ISSN: 1932-6203, (Publisher: Public Library of Science).

@article{pena_dynamics_2025,
title = {Dynamics of temporal influence in polarised networks},
author = {C. B. Pena and D. J. P. O’Sullivan and P. MacCarron and A. Saxena},
url = {https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0337753},
doi = {10.1371/journal.pone.0337753},
issn = {1932-6203},
year = {2025},
date = {2025-12-01},
urldate = {2026-01-27},
journal = {PLOS ONE},
volume = {20},
number = {12},
pages = {e0337753},
abstract = {In social networks, it is often of interest to identify the most influential users who can successfully spread information to others. This is particularly important for marketing (e.g., targeting influencers for a marketing campaign) and to understand the dynamics of information diffusion (e.g., who is the most central user in the spreading of a certain type of information). However, different opinions often split the audience and make the network polarised, with fragmented structure. In polarised networks, information becomes siloed within communities in the network, and the most influential user within a network might not be the most influential across all communities. Additionally, influential users and their influence may change over time as users may change their opinion or choose to decrease or halt their engagement on the subject. In this work, we aim to study the temporal dynamics of users’ influence in fragmented social networks. We compare the stability of influence ranking using temporal centrality measures, while extending them to account for community structure across a number of network evolution behaviours. We show that we can successfully aggregate nodes into influence bands, and how to aggregate centrality scores to analyse the influence of communities over time. A modified version of the temporal independent cascade model and the temporal degree centrality perform the best in this setting, as they are able to reliably isolate nodes into their bands.},
note = {Publisher: Public Library of Science},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0337753
  • doi:10.1371/journal.pone.0337753

Vink, E.; Takes, F. W.; Saxena, A.

Measuring group fairness in community detection Journal Article

In: PLOS ONE, vol. 20, no. 11, pp. e0336212, 2025, ISSN: 1932-6203, (Publisher: Public Library of Science).

@article{vink_measuring_2025,
title = {Measuring group fairness in community detection},
author = {E. Vink and F. W. Takes and A. Saxena},
url = {https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0336212},
doi = {10.1371/journal.pone.0336212},
issn = {1932-6203},
year = {2025},
date = {2025-11-01},
urldate = {2026-01-27},
journal = {PLOS ONE},
volume = {20},
number = {11},
pages = {e0336212},
abstract = {Understanding community structures is crucial for analyzing networks, as nodes join communities that collectively shape large-scale networks. In real-world settings, the formation of communities is often impacted by several social factors, such as ethnicity, gender, wealth, or other attributes. These factors may introduce structural inequalities; for instance, real-world networks can have a few majority groups and many minority groups. Community detection algorithms, which identify communities based on network topology, may generate unfair outcomes if they fail to account for existing structural inequalities, particularly affecting underrepresented groups. In this work, we propose a set of novel group fairness metrics to assess the fairness of community detection methods. Additionally, we conduct a comparative evaluation of the most common community detection methods, analyzing the trade-off between performance and fairness. Experiments are performed on synthetic networks generated using LFR, ABCD, and HICH-BA benchmark models, as well as on real-world networks. Our results demonstrate that the fairness-performance trade-off varies widely across methods, with no single class of approaches consistently excelling in both aspects. We observe that Infomap and Significance methods are high-performing and fair with respect to different types of communities across most networks. The proposed metrics and findings provide valuable insights for designing fair and effective community detection algorithms.},
note = {Publisher: Public Library of Science},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0336212
  • doi:10.1371/journal.pone.0336212

Saxena, A.; Yadav, H. K.; Rutten, B.; Jha, S. S.

DQ4FairIM: Fairness-aware Influence Maximization using Deep Reinforcement Learning Miscellaneous

2025, (arXiv:2512.00545 [cs]).

@misc{saxena_dq4fairim_2025,
title = {DQ4FairIM: Fairness-aware Influence Maximization using Deep Reinforcement Learning},
author = {A. Saxena and H. K. Yadav and B. Rutten and S. S. Jha},
url = {http://arxiv.org/abs/2512.00545},
doi = {10.48550/arXiv.2512.00545},
year = {2025},
date = {2025-11-01},
urldate = {2026-01-27},
publisher = {arXiv},
abstract = {The Influence Maximization (IM) problem aims to select a set of seed nodes within a given budget to maximize the spread of influence in a social network. However, real-world social networks have several structural inequalities, such as dominant majority groups and underrepresented minority groups. If these inequalities are not considered while designing IM algorithms, the outcomes might be biased, disproportionately benefiting majority groups while marginalizing minorities. In this work, we address this gap by designing a fairness-aware IM method using Reinforcement Learning (RL) that ensures equitable influence outreach across all communities, regardless of protected attributes. Fairness is incorporated using a maximin fairness objective, which prioritizes improving the outreach of the least-influenced group, pushing the solution toward an equitable influence distribution. We propose a novel fairness-aware deep RL method, called DQ4FairIM, that maximizes the expected number of influenced nodes by learning an RL policy. The learnt policy ensures that minority groups formulate the IM problem as a Markov Decision Process (MDP) and use deep Q-learning, combined with the Structure2Vec network embedding, to solve the MDP. We perform extensive experiments on synthetic benchmarks and real-world networks to compare our method with fairness-agnostic and fairness-aware baselines. The results show that our method achieves a higher level of fairness while maintaining a better fairness-performance trade-off than baselines. Additionally, our approach learns effective seeding policies that generalize across problem instances without retraining, such as varying the network size or the number of seed nodes.},
note = {arXiv:2512.00545 [cs]},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

  • http://arxiv.org/abs/2512.00545
  • doi:10.48550/arXiv.2512.00545

Kumar, G.; Saxena, A.; Meena, C.

Perplexity-Homophily Index: Homophily through Diversity in Hypergraphs Miscellaneous

2025, (arXiv:2511.19170 [cs]).

@misc{kumar_perplexity-homophily_2025,
title = {Perplexity-Homophily Index: Homophily through Diversity in Hypergraphs},
author = {G. Kumar and A. Saxena and C. Meena},
url = {http://arxiv.org/abs/2511.19170},
doi = {10.48550/arXiv.2511.19170},
year = {2025},
date = {2025-11-01},
urldate = {2026-01-27},
publisher = {arXiv},
abstract = {Real-world complex systems are often better modeled as hypergraphs, where edges represent group interactions involving multiple entities. Understanding and quantifying homophily (similarity-driven association) in such networks is essential for analyzing community formation and information flow. We propose a hyperedge-centric framework to quantify homophily in hypergraphs. Each interaction is represented as a hyperedge, and its interaction perplexity measures the effective number of distinct attributes it contains. Comparing this observed perplexity with a degree-preserving random baseline defines the diversity gap, which quantifies how much more or less diverse an interaction is than expected by chance. The global homophily score for a network, called the Perplexity-Homophily Index, is computed by averaging the normalized diversity gap across all hyperedges. Experiments on synthetic and real-world datasets show that the proposed index captures the full distribution of homophily and reveals how homophilic and heterophilic tendencies vary with interaction size in hypergraphs.},
note = {arXiv:2511.19170 [cs]},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

  • http://arxiv.org/abs/2511.19170
  • doi:10.48550/arXiv.2511.19170

Pisani, N.; Boekhout, H. D.; Heemskerk, E. M.; Takes, F. W.

China's rise as global scientific powerhouse: A trajectory of international collaboration and specialization in high-impact research Journal Article

In: Research Policy, vol. 54, no. 8, pp. 105288, 2025, ISSN: 0048-7333.

@article{pisani_chinas_2025,
title = {China's rise as global scientific powerhouse: A trajectory of international collaboration and specialization in high-impact research},
author = {N. Pisani and H. D. Boekhout and E. M. Heemskerk and F. W. Takes},
url = {https://www.sciencedirect.com/science/article/pii/S0048733325001179},
doi = {10.1016/j.respol.2025.105288},
issn = {0048-7333},
year = {2025},
date = {2025-10-01},
urldate = {2025-10-01},
journal = {Research Policy},
volume = {54},
number = {8},
pages = {105288},
abstract = {The recent and rapid ascent of China into today's scientific powerhouse is increasingly debated in circles of economic politics and policy making. Yet, we still know relatively little about how this rise has materialized in terms of Chinese scientists' openness to international collaborations and relative focus on high-impact research, particularly in relation to the U.S. Leveraging a unique, curated database of over 25 million scientific publications from 2008 until 2020, we aim to fill this gap and empirically investigate: (1) the extent to which collaboration of China-based researchers with scientists from other countries has materialized; (2) how competition in producing high-impact research has evolved for China, especially vis-à-vis the U.S., in the global production of science; and (3) whether specialization in well-defined fields has characterized China's ascent in science and, if so, in which areas. Our findings show that China's rise as a leading player in global science has importantly built on opening its knowledge production to collaboration, both domestically and internationally. This has been paired with a remarkable focus on high-impact research. Recently, China has entirely closed the gap with the U.S. in terms of contribution to the global top 1 % high-impact scientific production, specializing in four key fields – engineering/electrical/electronic, materials science, physics, and chemistry. Our study sheds new light on the changing landscape of global scientific production and opens several avenues for future research.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://www.sciencedirect.com/science/article/pii/S0048733325001179
  • doi:10.1016/j.respol.2025.105288

Fajardo, S.; Mohammadi, S.; Souza, J. G.; Ardila, C.; Baltar, A. Tapscott; Heidgen, S.; Hernández, M. I. Mayorga; de Oliveira, S. Mota; Montejo, F.; Moderato, M.; Peripato, V.; Puche, K.; Reina, C.; Vargas, J. C.; Takes, F. W.; Madella, M.

Ecological Legacies of Pre-Columbian Settlements Evident in Palm Clusters of Neotropical Mountain Forests Miscellaneous

2025, (arXiv:2507.06949 [cs]).

@misc{fajardo_ecological_2025,
title = {Ecological Legacies of Pre-Columbian Settlements Evident in Palm Clusters of Neotropical Mountain Forests},
author = {S. Fajardo and S. Mohammadi and J. G. Souza and C. Ardila and A. Tapscott Baltar and S. Heidgen and M. I. Mayorga Hernández and S. Mota de Oliveira and F. Montejo and M. Moderato and V. Peripato and K. Puche and C. Reina and J. C. Vargas and F. W. Takes and M. Madella},
url = {http://arxiv.org/abs/2507.06949},
doi = {10.48550/arXiv.2507.06949},
year = {2025},
date = {2025-09-01},
urldate = {2025-09-01},
publisher = {arXiv},
abstract = {Ancient populations markedly transformed Neotropical forests, yet the spatial extent of their ecological influence remains underexplored at high resolution. Here we present a deep learning and remote sensing based approach to estimate areas of pre-Columbian forest modification based on modern vegetation. We apply this method to high-resolution satellite imagery from the Sierra Nevada de Santa Marta, Colombia, as a demonstration of a scalable approach, to evaluate palm tree distributions in relation to archaeological infrastructure. Palms were significantly more abundant near archaeological sites with large infrastructure investment. The extent of the largest palm cluster indicates that ancient human-managed areas linked to major infrastructure sites may be up to two orders of magnitude bigger than indicated by current archaeological evidence alone. Our findings suggest that pre-Columbian populations influenced vegetation, fostering conditions conducive to palm proliferation, leaving a lasting ecological footprint. This may have lowered the logistical costs of establishing infrastructure-heavy settlements in less accessible locations.},
note = {arXiv:2507.06949 [cs]},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

  • http://arxiv.org/abs/2507.06949
  • doi:10.48550/arXiv.2507.06949

Saxena, A.; Kumar, G.; Meena, C.

Homophily in Complex Networks: Measures, Models, and Applications Miscellaneous

2025, (arXiv:2509.18289 [cs]).

@misc{saxena_homophily_2025,
title = {Homophily in Complex Networks: Measures, Models, and Applications},
author = {A. Saxena and G. Kumar and C. Meena},
url = {http://arxiv.org/abs/2509.18289},
doi = {10.48550/arXiv.2509.18289},
year = {2025},
date = {2025-09-01},
urldate = {2026-01-27},
publisher = {arXiv},
abstract = {Homophily, the tendency of individuals to connect with others who share similar attributes, is a defining feature of social networks. Understanding how groups interact, both within and across, is crucial for uncovering the dynamics of network evolution and the emergence of structural inequalities in these networks. This tutorial offers a comprehensive overview of homophily, covering its various definitions, key properties, and the limitations of widely used metrics. Extending beyond traditional pairwise interactions, we will discuss homophily in higher-order network structures such as hypergraphs and simplicial complexes. We will further discuss network generating models capable of producing different types of homophilic networks with tunable levels of homophily and highlight their relevance in real-world contexts. The tutorial concludes with a discussion of open challenges, emerging directions, and opportunities for further research in this area.},
note = {arXiv:2509.18289 [cs]},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

  • http://arxiv.org/abs/2509.18289
  • doi:10.48550/arXiv.2509.18289

Kazmina, Y.; Heemskerk, E. M.; Kooij, E.; Bokányi, E.; Takes, F. W.

Can social capital remedy structural inequality? Economic mobility in a longitudinal population-scale social network Miscellaneous

2025, (arXiv:2508.05275 [physics]).

@misc{kazmina_can_2025,
title = {Can social capital remedy structural inequality? Economic mobility in a longitudinal population-scale social network},
author = {Y. Kazmina and E. M. Heemskerk and E. Kooij and E. Bokányi and F. W. Takes},
url = {http://arxiv.org/abs/2508.05275},
doi = {10.48550/arXiv.2508.05275},
year = {2025},
date = {2025-08-01},
urldate = {2026-01-27},
publisher = {arXiv},
abstract = {The promise of equal opportunity is a cornerstone of modern societies, yet upward economic mobility remains out of reach for many. Using a decade of population-scale social network data from the Netherlands, covering over a billion family, school, workplace, and neighborhood ties, we examine how structural inequality and social capital jointly shape economic trajectories. Parental background is a strong early predictor of economic outcomes, but its influence fades over time. In contrast, bridging social capital is what positively predicts long-term mobility, particularly for economically disadvantaged groups. Reducing the dimensionality of an individual's network composition, we identify two key dimensions: exposure to affluent contacts and socioeconomic diversity of one's network. These are sufficient to capture the core aspects of social capital that matter for economic mobility. Overall, our findings demonstrate that while inherited advantage shapes the starting point of economic trajectory, social capital can powerfully reshape it, especially for the poor.},
note = {arXiv:2508.05275 [physics]},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

  • http://arxiv.org/abs/2508.05275
  • doi:10.48550/arXiv.2508.05275

Bonello, S.; de Jong, R. G.; Bäck, T. H. W.; Takes, F. W.

Utility-aware Social Network Anonymization using Genetic Algorithms Proceedings Article

In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 775–778, Association for Computing Machinery, New York, NY, USA, 2025, ISBN: 9798400714641.

@inproceedings{bonello_utility-aware_2025,
title = {Utility-aware Social Network Anonymization using Genetic Algorithms},
author = {S. Bonello and R. G. de Jong and T. H. W. Bäck and F. W. Takes},
url = {https://dl.acm.org/doi/10.1145/3712255.3726678},
doi = {10.1145/3712255.3726678},
isbn = {9798400714641},
year = {2025},
date = {2025-08-01},
urldate = {2025-08-01},
booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference Companion},
pages = {775--778},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
series = {GECCO '25 Companion},
abstract = {Social networks may contain privacy-sensitive information about individuals. The objective of the network anonymization problem is to alter a given social network dataset such that the number of anonymous nodes in the social graph is maximized. Here, a node is anonymous if it does not have a unique surrounding network structure. At the same time, the aim is to ensure data utility, i.e., preserve topological network properties and retain good performance on downstream network analysis tasks. We propose two versions of a genetic algorithm tailored to this problem: one generic GA and a uniqueness-aware GA (UGA). The latter aims to target edges more effectively during mutation by avoiding edges connected to already anonymous nodes. After hyperparameter tuning, we compare the proposed GAs against two existing baseline algorithms on several real-world network datasets. Results show that the proposed genetic algorithms manage to anonymize on average 14 times more nodes than the best baseline algorithm. Additionally, data utility experiments demonstrate how the UGA requires fewer edge deletions, and how our GAs and the baselines retain performance on downstream tasks equally well. Overall, our results suggest that genetic algorithms are a promising approach for finding solutions to the network anonymization problem.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

  • https://dl.acm.org/doi/10.1145/3712255.3726678
  • doi:10.1145/3712255.3726678

Ceria, A.; Takes, F. W.

The relevance of higher-order ties Journal Article

In: EPJ Data Science, vol. 14, no. 1, pp. 62, 2025, ISSN: 2193-1127.

@article{ceria_relevance_2025,
title = {The relevance of higher-order ties},
author = {A. Ceria and F. W. Takes},
url = {https://doi.org/10.1140/epjds/s13688-025-00577-3},
doi = {10.1140/epjds/s13688-025-00577-3},
issn = {2193-1127},
year = {2025},
date = {2025-08-01},
urldate = {2026-01-27},
journal = {EPJ Data Science},
volume = {14},
number = {1},
pages = {62},
abstract = {Higher-order networks effectively represent complex systems with group interactions. Existing methods usually overlook the relative contribution of group interactions (hyperedges) of different sizes to the overall network structure. Yet, this has many important applications, especially when the network has meaningful node labels. In this work, we propose a methodology to precisely measure the contribution of different orders to topological network properties. First, we propose the order contribution measure, which quantifies the contribution of hyperedges of different orders to the link weights (local scale), number of triangles (mesoscale) and size of the largest connected component (global scale) of the pairwise weighted network. Second, we propose the measure of order relevance, which gives insight into how hyperedges of different orders contribute to the considered network property. Most interestingly, it enables an assessment of whether this contribution is synergistic or redundant with respect to that of hyperedges of other orders. Third, to account for labels, we propose a metric of label group balance to assess how hyperedges of different orders connect label-induced groups of nodes. We applied these metrics to a large-scale board interlock network and scientific collaboration network, in which node labels correspond to geographical location of the nodes. Experiments including a comparison with randomized null models reveal how from the global level perspective, we observe synergistic contributions of orders in the board interlock network, whereas in the collaboration network orders contribute more redundantly. The findings shed new light on social scientific debates on the role of busy directors in global business networks and the connective effects of large author teams in scientific collaboration networks.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://doi.org/10.1140/epjds/s13688-025-00577-3
  • doi:10.1140/epjds/s13688-025-00577-3

Boekhout, H. D.; Heemskerk, E. M.; Pisani, N.; Takes, F. W.

Freshness, Persistence and Success of Scientific Teams Miscellaneous

2025, (arXiv:2507.12255 [cs]).

@misc{boekhout_freshness_2025,
title = {Freshness, Persistence and Success of Scientific Teams},
author = {H. D. Boekhout and E. M. Heemskerk and N. Pisani and F. W. Takes},
url = {http://arxiv.org/abs/2507.12255},
doi = {10.48550/arXiv.2507.12255},
year = {2025},
date = {2025-07-01},
urldate = {2026-01-27},
publisher = {arXiv},
abstract = {Team science dominates scientific knowledge production, but what makes academic teams successful? Using temporal data on 25.2 million publications and 31.8 million authors, we propose a novel network-driven approach to identify and study the success of persistent teams. Challenging the idea that persistence alone drives success, we find that team freshness - new collaborations built on prior experience - is key to success. High impact research tends to emerge early in a team's lifespan. Analyzing complex team overlap, we find that teams open to new collaborative ties consistently produce better science. Specifically, team re-combinations that introduce new freshness impulses sustain success, while persistence impulses from experienced teams are linked to earlier impact. Together, freshness and persistence shape team success across collaboration stages.},
note = {arXiv:2507.12255 [cs]},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

  • http://arxiv.org/abs/2507.12255
  • doi:10.48550/arXiv.2507.12255

de Jong, R. G.; Loo, M. P. J.; Takes, F. W.

A systematic comparison of measures for publishing k-anonymous social network data Miscellaneous

2025, (arXiv:2407.02290 [cs]).

@misc{jong_systematic_2025,
title = {A systematic comparison of measures for publishing k-anonymous social network data},
author = {R. G. de Jong and M. P. J. Loo and F. W. Takes},
url = {http://arxiv.org/abs/2407.02290},
doi = {10.48550/arXiv.2407.02290},
year = {2025},
date = {2025-06-01},
urldate = {2025-06-01},
publisher = {arXiv},
abstract = {Sharing or publishing social network data while accounting for privacy of individuals is a difficult task due to the interconnectedness of nodes in networks. A key question in k-anonymity, a widely studied notion of privacy, is how to measure the anonymity of an individual, as this determines the attacker scenarios one protects against. In this paper, we systematically compare the most prominent anonymity measures from the literature in terms of the completeness and reach of the structural information they take into account. We present a theoretical characterization and a distance-parametrized strictness ordering of the existing measures for k-anonymity in networks. In addition, we conduct empirical experiments on a wide range of real-world network datasets with up to millions of edges. Our findings reveal that the choice of the measure significantly impacts the measured level of anonymity and hence the effectiveness of the corresponding attacker scenario, the privacy vs. utility trade-off, and computational cost. Surprisingly, we find that the anonymity measure representing the most effective attacker scenario considers a greater node vicinity yet utilizes only limited structural information and therewith minimal computational resources. Overall, the insights provided in this work offer researchers and practitioners practical guidance for selecting appropriate anonymity measures when sharing or publishing social network data under privacy constraints.},
note = {arXiv:2407.02290 [cs]},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

  • http://arxiv.org/abs/2407.02290
  • doi:10.48550/arXiv.2407.02290

Vink, E.; Takes, F. W.; Saxena, A.

Quantifying Group Fairness in Community Detection Miscellaneous

2025, (arXiv:2504.11059 [cs]).

@misc{vink_quantifying_2025,
title = {Quantifying Group Fairness in Community Detection},
author = {E. Vink and F. W. Takes and A. Saxena},
url = {http://arxiv.org/abs/2504.11059},
doi = {10.48550/arXiv.2504.11059},
year = {2025},
date = {2025-04-01},
urldate = {2026-01-27},
publisher = {arXiv},
abstract = {Understanding community structures is crucial for analyzing networks, as nodes join communities that collectively shape large-scale networks. In real-world settings, the formation of communities is often impacted by several social factors, such as ethnicity, gender, wealth, or other attributes. These factors may introduce structural inequalities; for instance, real-world networks can have a few majority groups and many minority groups. Community detection algorithms, which identify communities based on network topology, may generate unfair outcomes if they fail to account for existing structural inequalities, particularly affecting underrepresented groups. In this work, we propose a set of novel group fairness metrics to assess the fairness of community detection methods. Additionally, we conduct a comparative evaluation of the most common community detection methods, analyzing the trade-off between performance and fairness. Experiments are performed on synthetic networks generated using LFR, ABCD, and HICH-BA benchmark models, as well as on real-world networks. Our results demonstrate that the fairness-performance trade-off varies widely across methods, with no single class of approaches consistently excelling in both aspects. We observe that Infomap and Significance methods are high-performing and fair with respect to different types of communities across most networks. The proposed metrics and findings provide valuable insights for designing fair and effective community detection algorithms.},
note = {arXiv:2504.11059 [cs]},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

  • http://arxiv.org/abs/2504.11059
  • doi:10.48550/arXiv.2504.11059

Panchendrarajan, R.; Saxena, H.; Saxena, A.

Social Media and Academia: How Gender Influences Online Scholarly Discourse Miscellaneous

2025, (arXiv:2505.03773 [cs]).

@misc{panchendrarajan_social_2025,
title = {Social Media and Academia: How Gender Influences Online Scholarly Discourse},
author = {R. Panchendrarajan and H. Saxena and A. Saxena},
url = {http://arxiv.org/abs/2505.03773},
doi = {10.48550/arXiv.2505.03773},
year = {2025},
date = {2025-04-01},
urldate = {2026-01-27},
publisher = {arXiv},
abstract = {This study investigates gender-based differences in online communication patterns of academics, focusing on how male and female academics represent themselves and how users interact with them on the social media platform X (formerly Twitter). We collect historical Twitter data of academics in computer science at the top 20 USA universities and analyze their tweets, retweets, and replies to uncover systematic patterns such as discussed topics, engagement disparities, and the prevalence of negative language or harassment. The findings indicate that while both genders discuss similar topics, men tend to post more tweets about AI innovation, current USA society, machine learning, and personal perspectives, whereas women post slightly more on engaging AI events and workshops. Women express stronger positive and negative sentiments about various events compared to men. However, the average emotional expression remains consistent across genders, with certain emotions being more strongly associated with specific topics. Writing-style analysis reveals that female academics show more empathy and are more likely to discuss personal problems and experiences, with no notable differences in other factors, such as self-praise, politeness, and stereotypical comments. Analyzing audience responses indicates that female academics are more frequently subjected to severe toxic and threatening replies. Our findings highlight the impact of gender in shaping the online communication of academics and emphasize the need for a more inclusive environment for scholarly engagement.},
note = {arXiv:2505.03773 [cs]},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

Lengyel, B.; Bokányi, E.; Juhász, S.

The geography of segregated online social networks in the largest US cities Book Section

In: Handbook on Big Data, Artificial Intelligence and Cities, pp. 92–109, Edward Elgar Publishing, 2025, ISBN: 978-1-80392-805-0, (Section: Handbook on Big Data, Artificial Intelligence and Cities).

@incollection{lengyel_geography_2025,
title = {The geography of segregated online social networks in the largest US cities},
author = {B. Lengyel and E. Bokányi and S. Juhász},
url = {https://www.elgaronline.com/edcollchap/book/9781803928050/chapter6.xml},
isbn = {978-1-80392-805-0},
year = {2025},
date = {2025-04-01},
urldate = {2026-01-27},
booktitle = {Handbook on Big Data, Artificial Intelligence and Cities},
pages = {92–109},
publisher = {Edward Elgar Publishing},
abstract = {Cities are known for high levels of segregation that manifest in both online and offline environments. Far-reaching consequences of urban segregation include rising disparities, political polarization, and exposure to economic crises. In this chapter, we review the wide amount of literature on the emerging urban science discussion to provide an overview of modern data sources that can reveal the nature of segregation in cities. Our focus is on social interaction in urban spaces through mobility and social networks. Using geolocated Twitter messages in the 50 largest metropolitan areas in the US, we demonstrate how the data can quantify social networks around home and as a function of commuting to work. We argue that geotagged social media data enable us to better understand the spatial scale of segregation and the potential mitigation of inequalities through mixing and inclusion. The chapter presents the benefits and limitations of these data sources and discusses potential future research.},
note = {Section: Handbook on Big Data, Artificial Intelligence and Cities},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}

Aiello, L. M.; Vybornova, A.; Juhász, S.; Szell, M.; Bokányi, E.

Urban highways are barriers to social ties Journal Article

In: Proceedings of the National Academy of Sciences, vol. 122, no. 10, pp. e2408937122, 2025, (Publisher: Proceedings of the National Academy of Sciences).

@article{aiello_urban_2025,
title = {Urban highways are barriers to social ties},
author = {L. M. Aiello and A. Vybornova and S. Juhász and M. Szell and E. Bokányi},
url = {https://www.pnas.org/doi/abs/10.1073/pnas.2408937122},
doi = {10.1073/pnas.2408937122},
year = {2025},
date = {2025-03-01},
urldate = {2026-01-27},
journal = {Proceedings of the National Academy of Sciences},
volume = {122},
number = {10},
pages = {e2408937122},
abstract = {Urban highways are common, especially in the United States, making cities more car-centric. They promise the annihilation of distance but obstruct pedestrian mobility, thus playing a key role in limiting social interactions locally. Although this limiting role is widely acknowledged in urban studies, the quantitative relationship between urban highways and social ties is barely tested. Here, we define a Barrier Score that relates massive, geolocated online social network data to highways in the 50 largest US cities. At the granularity of individual social ties, we show that urban highways are associated with decreased social connectivity. This barrier effect is especially strong for short distances and consistent with historical cases of highways that were built to purposefully disrupt or isolate Black neighborhoods. By combining spatial infrastructure with social tie data, our method adds a dimension to demographic studies of social segregation. Our study can inform reparative planning for an evidence-based reduction of spatial inequality, and more generally, support a better integration of the social fabric in urban planning.},
note = {Publisher: Proceedings of the National Academy of Sciences},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

Loo, M. P. J.

Split-Apply-Combine with Dynamic Grouping Journal Article

In: Journal of Statistical Software, vol. 112, pp. 1–21, 2025, ISSN: 1548-7660.

@article{loo_split-apply-combine_2025,
title = {Split-Apply-Combine with Dynamic Grouping},
author = {M. P. J. Loo},
url = {https://doi.org/10.18637/jss.v112.i04},
doi = {10.18637/jss.v112.i04},
issn = {1548-7660},
year = {2025},
date = {2025-03-01},
urldate = {2025-03-01},
journal = {Journal of Statistical Software},
volume = {112},
pages = {1–21},
abstract = {Partitioning a data set by one or more of its attributes and computing an aggregate for each part is one of the most common operations in data analyses. There are use cases where the partitioning is determined dynamically by collapsing smaller subsets into larger ones, to ensure sufficient support for the computed aggregate. These use cases are not supported by software implementing split-apply-combine types of operations. This paper presents the R package accumulate that offers convenient interfaces for defining grouped aggregation where the grouping itself is dynamically determined, based on user-defined conditions on subsets, and a user-defined subset collapsing scheme. The formal underlying algorithm is described and analyzed as well.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

Candogan, O.; König, M. D.; Marray, K.; Takes, F. W.

Network Rewiring and Spatial Targeting: Optimal Disease Mitigation in Multilayer Social Networks Miscellaneous

2025.

@misc{candogan_network_2025,
title = {Network Rewiring and Spatial Targeting: Optimal Disease Mitigation in Multilayer Social Networks},
author = {O. Candogan and M. D. König and K. Marray and F. W. Takes},
url = {https://papers.ssrn.com/abstract=5106505},
doi = {10.2139/ssrn.5106505},
year = {2025},
date = {2025-01-01},
urldate = {2026-01-27},
publisher = {Social Science Research Network},
address = {Rochester, NY},
abstract = {We study disease spread on a social network where individuals adjust contacts to avoid infection. Susceptible individuals rewire links from infectious individuals to other susceptibles, reducing infections and causing the disease to only become endemic at higher infection rates. We formulate the planner’s problem of implementing targeted lockdowns to control endemic disease as a semidefinite program that is computationally tractable even with many groups. Rewiring complements policy by allowing more intergroup contact as the rewiring rate increases. We apply our model to compute optimal spatially-targeted lockdowns for the Netherlands during Covid-19 using a population-level contact network for 17.26 million individuals. Our findings indicate that, with rewiring, a targeted lockdown policy permits 12% more contacts compared to one without rewiring, underscoring the significance of accounting for network endogeneity in effective policy design.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

Vink, E.; Saxena, A.

Group Fairness Metrics for Community Detection Methods in Social Networks Proceedings Article

In: Cherifi, H.; Donduran, M.; Rocha, L. M.; Cherifi, C.; Varol, O. (Ed.): Complex Networks & Their Applications XIII, pp. 43–56, Springer Nature Switzerland, Cham, 2025, ISBN: 978-3-031-82435-7.

@inproceedings{de_vink_group_2025,
title = {Group Fairness Metrics for Community Detection Methods in Social Networks},
author = {E. Vink and A. Saxena},
editor = {H. Cherifi and M. Donduran and L. M. Rocha and C. Cherifi and O. Varol},
doi = {10.1007/978-3-031-82435-7_4},
isbn = {978-3-031-82435-7},
year = {2025},
date = {2025-01-01},
booktitle = {Complex Networks & Their Applications XIII},
pages = {43–56},
publisher = {Springer Nature Switzerland},
address = {Cham},
abstract = {Understanding community structure has played an essential role in explaining network evolution, as nodes join communities which connect further to form large-scale complex networks. In real-world networks, nodes are often organized into communities based on ethnicity, gender, race, or wealth, leading to structural biases and inequalities. Community detection (CD) methods use network structure and nodes’ attributes to identify communities, and can produce biased outcomes if they fail to account for structural inequalities, especially affecting minority groups. In this work, we propose group fairness metrics ($\varPhi^{F*}_{p}$) to evaluate CD methods from a fairness perspective. We also conduct a comparative analysis of existing CD methods, focusing on the performance-fairness trade-off, to determine whether certain methods favor specific types of communities based on their size, density, or conductance. Our findings reveal that the trade-off varies significantly across methods, with no specific type of method consistently outperforming others. The proposed metrics and insights will help develop and evaluate fair and high performing CD methods.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

Bel, V.; Hank, K.; Leopold, T.; Bokányi, E.

A parallel kinship universe? A replication of Kolk et al. (2023) with Dutch register data on kinship networks Journal Article

In: Demographic Research, vol. 52, no. 28, pp. 915–938, 2025, (_eprint: https://www.demographic-research.org/volumes/vol52/28/52-28.pdf).

@article{de_bel_parallel_2025,
title = {A parallel kinship universe? A replication of Kolk et al. (2023) with Dutch register data on kinship networks},
author = {V. Bel and K. Hank and T. Leopold and E. Bokányi},
url = {https://www.demographic-research.org/volumes/vol52/28/},
doi = {10.4054/DemRes.2025.52.28},
year = {2025},
date = {2025-01-01},
journal = {Demographic Research},
volume = {52},
number = {28},
pages = {915–938},
abstract = {Background: Kolk et al. (2023) use Swedish register data to provide a detailed numerical account of biological kinship. Applying their approach in other countries is challenging due to high data requirements. Objective: We examine whether Kolk et al.’s (2023) findings generalize to another demographically advanced country, the Netherlands, and assess how differences in cohort fertility and divorce rates influence the prevalence of different kin types. Methods: We analyze kinship network data for the entire Dutch population in 2018, focusing on ties to grandchildren, children, nieces, nephews, siblings, cousins, parents, aunts, uncles, and grandparents. Results: First, we find strong similarities between Dutch and Swedish kinship structures, extending the picture drawn by Kolk et al. (2023) to another demographically advanced Western context. Second, we show how the Dutch baby boom has trickled down across generations, leading to larger numbers of aunts, uncles, and cousins. Third, we show how differences in other family-related behaviors – specifically divorce and separation – shape the composition of kinship networks and cross-national differences, evident in a substantially lower number of half-siblings in the Netherlands than in Sweden. Contribution: This replication underlines the benefits of empirically validating kinship statistics derived from microsimulations and aggregate demographic data.},
note = {_eprint: https://www.demographic-research.org/volumes/vol52/28/52-28.pdf},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

2024

Fajardo, S.; Argüello, P.

Sociopolitical evolution, population clustering, and technology among early sedentary communities in northeastern Andes, Colombia Journal Article

In: Journal of Anthropological Archaeology, vol. 76, no. December, 2024.

@article{fajardo_sociopolitical_2024,
title = {Sociopolitical evolution, population clustering, and technology among early sedentary communities in northeastern Andes, Colombia},
author = {S. Fajardo and P. Argüello},
url = {https://www.sciencedirect.com/science/article/pii/S027841652400059X},
doi = {10.1016/j.jaa.2024.101628},
year = {2024},
date = {2024-12-01},
urldate = {2024-12-01},
journal = {Journal of Anthropological Archaeology},
volume = {76},
number = {December},
abstract = {Several prehistoric societies did not develop robust hierarchical systems even after centuries of population clustering and advancements in constructing structural earthworks and crafting materials like ceramics and alloys. What social dynamics characterized these non-state complex societies and how did they influence technological production? Here we analyze population clustering and hierarchical structures through two regional settlement studies in the northeastern Andes of Colombia. Employing both a traditional Inverse Distance Weighting interpolation (IDW) approach and an unsupervised machine learning method, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), we identify settlement clusters within the pre-Columbian sedentary settlement sequence. Analyzing rank-size distribution and A-coefficients based on identified clusters, we discern differences in hierarchical systems between the two regions. Results reveal that these early sedentary communities did not establish strong settlement hierarchies over centuries of clustering. Our findings suggest that the lack of robust hierarchical systems in Muisca societies may be attributed to slow and non-linear settlement clustering and limited site specialization. We compare this with evidence for technologies in the Muisca area, arguing that the emergence of strong and permanent settlement clustering is a threshold for early communities before developing information-storage technologies, such as standardized representations for counting or writing.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

Dieles, T. J.; Mattsson, C. E. S.; Takes, F. W.

Identifying successful football teams in the European player transfer network Journal Article

In: Applied Network Science, vol. 9, iss. 65, 2024.

@article{Dieles2024,
title = {Identifying successful football teams in the European player transfer network},
author = {T. J. Dieles and C. E. S. Mattsson and F. W. Takes},
doi = {10.1007/s41109-024-00675-7},
year = {2024},
date = {2024-10-18},
journal = {Applied Network Science},
volume = {9},
issue = {65},
abstract = {This paper considers the European transfer market for professional football players as a network to study the relation between a team’s position in this network and performance in its domestic league. Our analysis is centered on eight top European leagues. The market in each season is represented as a weighted directed network capturing the transfers of players to or from the teams in these leagues, and we also consider the cumulative network over the past 28 years. We find that the overall structure of this transfer market network has properties commonly observed in real-world networks, such as a skewed degree distribution, high clustering, and small-world characteristics. To assess football teams we first construct a measure of within-league performance that is comparable across leagues. Regression analysis is used to relate league performance with both the network position and level of engagement of the team in the transfer market, under two complementary setups. Network position variables include, e.g., betweenness centrality, closeness centrality and node clustering coefficient, whereas market engagement variables capture a team’s activity in the transfer market, e.g., total number of player transfers and total paid for players. For the season snapshots, the number of transfers corresponds to weighted in- and out-degree. Our analysis first corroborates several recent findings relating aspects of market engagement with teams’ league performance. A higher number of incoming transfers indicates worse performance and better resourced teams perform better. Then, and across specifications, we find that network position variables remain salient even when engagement variables are already considered. This substantiates the notion in the existing literature that a high degree corresponds to better team performance and suggests that network aspects of trading strategy may affect a team’s success in their respective domestic league (or vice versa). In this sense, the approach and findings presented in this paper may in the future guide teams’ player acquisition policies.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

Groot, A.; Fletcher, G.; Manen, G.; Saxena, A.; Serebrenik, A.; Taylor, L. E. M.

A canon is a blunt force instrument: data science, canons, and generative frictions Book Section

In: Jarke, J.; Bates, J. (Ed.): Dialogues in Data Power Shifting Response-abilities in a Datafied World, Bristol University Press, 2024, ISBN: 1-5292-3830-7 978-1-5292-3830-3.

@incollection{de_groot_canon_2024,
title = {A canon is a blunt force instrument: data science, canons, and generative frictions},
author = {A. Groot and G. Fletcher and G. Manen and A. Saxena and A. Serebrenik and L. E. M. Taylor},
editor = {J. Jarke and J. Bates},
isbn = {1-5292-3830-7 978-1-5292-3830-3},
year = {2024},
date = {2024-09-01},
urldate = {2024-09-01},
booktitle = {Dialogues in Data Power Shifting Response-abilities in a Datafied World},
publisher = {Bristol University Press},
abstract = {Spatially close, though worlds apart. The contributors to this commentary - ‘we’; ‘us’ - conduct research and teach on data and technology-related issues at three Dutch universities. Some of us work at the same departments, and teach in the same programmes. We bump into one another during our daily commutes, and replenish our energy levels with the help of the same coffee machines after our lectures. We talk, and sometimes even discuss our research with one another. But do we also understand each other? What would that even mean? When we talk about ‘data’, do we talk about the same thing? Is that even necessary? What does ‘science’ for each of us entail? What does this mean for the education we collectively provide? What is the direction - scientifically, ethically, politically - the bachelor programmes we are all involved in head toward? National science policy in the Netherlands, as well as at the level of universities themselves, tends to prioritise in various ways computer and computational sciences over the social sciences and humanities (Taylor et al., 2023). We feel that the oppositions that are produced and reinforced through such policies are both false and unproductive, and this collective uneasiness motivated some of us to initiate a conversation about what it would mean to think and work together. How do our academic lives ‘hang together’ (Mol, 2014) beyond our encounters near coffee machines in the hallways, and our names on the timetables the students would find when logging in to their university pages?
When asking these and many other questions, we realised that we lacked the language, a common vocabulary, to not only answer the questions with which we started, but also ask them. Not only did many of key concepts used in our research and education - data, algorithm, ethics, ontology, law - mean and do different things for all of us, but concepts indispensable to some - e.g. justice -, would be nonexistent in the disciplinary universe of others.
We therefore needed to take a step back and reflect on how to have a conversation without sharing a common language. Our provisional solution was to take what we dubbed as ‘canonical objects’ as the focal points in our discussions. We borrow the notion of the canon from literary criticism, where it is used to mean a body of literature that over time comes to be taught as defining a particular culture (Bloom, 1994). For this reason, the canon has also been the focus of decolonial critics, who argue that we should critically interrogate the hegemonic discourses of Western culture (Spivak, 1990).
Based on this notion, we started to analyse concepts which each of us consider conceptually stable enough in our different disciplines that they might be taught on a bachelor’s-level course. We, in other words, took our disciplinary backgrounds and educational responsibilities as conversational starting points. Our roughly defined meta-question was how our disciplinary backgrounds produced different conceptions of the same terms, how these differences could be generative or problematic, and how our disciplines become invested in a particular interpretation?
What we called canonical objects is also strongly related to how some of us used and understood the notion of boundary objects. A classic definition of boundary objects is that these “have different meanings in different social worlds but their structure is common enough to more than one world to make them recognizable, a means of translation.” (Star & Griesemer, 1989). Boundary objects thus allow different ‘social worlds’ to work together without requiring them to be able to (completely) understand one another. If our canonical objects would indeed function like boundary objects, we would have to find out and explicate in what way we would be working together, and how these concepts help us do that.
As part of our exploration we also include answers from the generative large language model ChatGPT3.5. This LLM draws on internet content, and therefore offers a generalised and social version of the canon, replicating the most common tropes about our chosen objects of study available online.
Furthermore, interesting both conceptually and practically, was and still is, our attempt to create some level of mutual understanding (Gadamer, 2014), potentially with the help of boundary objects whose functioning depends on a lack of mutual understanding. How does our attempt to foster understanding about how we hang together or not, change our collaborations? What does this attempt do to the canonical objects that we used as conversational lubricants? How, to put that differently, does discussion and explicating our disciplinary divisions, change our capacities to e.g. teach together? And subsequently, what are generative but also less and non-generative ways of disagreeing with one another?
In this contribution we present the results of the conversation we have had so far about two canonical concepts: ‘AI’ and ‘trust’. Together we made a list of potential canonical concepts (see the Appendix) - so concepts that would be taught in a BSc/BA program/course - and from this list picked two of those with the most multifaceted disciplinary usage to discuss here. Each of us was asked to briefly explain how from their (disciplinary) point of view the concept was understood and taught in our undergraduate programmes. These brief reflections are accompanied by statements about our own positionality (Harding, 1989; Haraway, 1991) in which each of us situates him/herself in the academic tradition in which they were educated. We have included these because we presumed that academic disciplines (and what have been termed signature pedagogies, Poole, 2009) were and still are the key factors that influence the types of academic social worlds most of us live in. In the discussion we present some of the themes that emerged in our conversation, and that help to understand how our academic activities hang together - or not.},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}

Kazmina, Y.; Heemskerk, E. M.; Bokányi, E.; Takes, F. W.

Socio-economic segregation in a population-scale social network Journal Article

In: Social Networks, vol. 78, pp. 279–291, 2024, ISSN: 0378-8733.

@article{kazmina_socio-economic_2024,
title = {Socio-economic segregation in a population-scale social network},
author = {Y. Kazmina and E. M. Heemskerk and E. Bokányi and F. W. Takes},
url = {https://www.sciencedirect.com/science/article/pii/S0378873324000157},
doi = {10.1016/j.socnet.2024.02.005},
issn = {0378-8733},
year = {2024},
date = {2024-07-01},
urldate = {2024-05-13},
journal = {Social Networks},
volume = {78},
pages = {279--291},
abstract = {We propose a social network-aware approach to study socio-economic segregation. The key question that we address is whether patterns of segregation are more pronounced in social networks than in the common spatial neighborhood-focused manifestations of segregation. We, therefore, conduct a population-scale social network analysis to study socio-economic segregation at a comprehensive and highly granular social network level. For this, we utilize social network data from Statistics Netherlands on 17.2 million registered residents of the Netherlands that are connected through around 1.3 billion ties distributed over five distinct tie types. We take income assortativity as a measure of socio-economic segregation, compare a social network and spatial neighborhood approach, and find that the social network structure exhibits two times as much segregation. As such, this work complements the spatial perspective on segregation in both literature and policymaking. While at a widely used unit of spatial aggregation (e.g., the geographical neighborhood), patterns of socio-economic segregation may appear relatively minimal, they may in fact persist in the underlying social network structure. Furthermore, we discover higher social network segregation in larger cities, shedding a different light on the common view of cities as hubs for diverse socio-economic mixing. A population-scale social network perspective hence offers a way to uncover hitherto “hidden” segregation that extends beyond spatial neighborhoods and infiltrates multiple aspects of human life.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://www.sciencedirect.com/science/article/pii/S0378873324000157
  • doi:10.1016/j.socnet.2024.02.005

Boekhout, H. D.; Blokland, A. A. J.; Takes, F. W.

Early warning signals for predicting cryptomarket vendor success using dark net forum networks Journal Article

In: Scientific Reports, vol. 14, no. 1, pp. 16336, 2024, ISSN: 2045-2322.

@article{boekhout_early_2024,
title = {Early warning signals for predicting cryptomarket vendor success using dark net forum networks},
author = {H. D. Boekhout and A. A. J. Blokland and F. W. Takes},
url = {https://doi.org/10.1038/s41598-024-67115-5},
doi = {10.1038/s41598-024-67115-5},
issn = {2045-2322},
year = {2024},
date = {2024-07-01},
urldate = {2024-07-01},
journal = {Scientific Reports},
volume = {14},
number = {1},
pages = {16336},
abstract = {In this work we focus on identifying key players in dark net cryptomarkets that facilitate online trade of illegal goods. Law enforcement aims to disrupt criminal activity conducted through these markets by targeting key players vital to the market’s existence and success. We particularly focus on detecting successful vendors responsible for the majority of illegal trade. Our methodology aims to uncover whether the task of key player identification should center around plainly measuring user and forum activity, or that it requires leveraging specific patterns of user communication. We focus on a large-scale dataset from the Evolution cryptomarket, which we model as an evolving communication network. Results indicate that user and forum activity, measured through topic engagement, is best able to identify successful vendors. Interestingly, considering users with higher betweenness centrality in the communication network further improves performance, also identifying successful vendors with moderate activity on the forum. But more importantly, analyzing the forum data over time, we find evidence that attaining a high betweenness score comes before vendor success. This suggests that the proposed network-driven approach of modelling user communication might prove useful as an early warning signal for key player identification.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://doi.org/10.1038/s41598-024-67115-5
  • doi:10.1038/s41598-024-67115-5

Kazmina, Y.; Heemskerk, E. M.; Bokányi, E.; Takes, F. W.

From Contact to Threat: A Social Network Perspective on Perceptions of Immigration Miscellaneous

2024, (arXiv:2407.06820 [physics]).

@misc{kazmina_contact_2024,
title = {From Contact to Threat: A Social Network Perspective on Perceptions of Immigration},
author = {Y. Kazmina and E. M. Heemskerk and E. Bokányi and F. W. Takes},
url = {http://arxiv.org/abs/2407.06820},
doi = {10.48550/arXiv.2407.06820},
year = {2024},
date = {2024-07-01},
urldate = {2026-01-27},
publisher = {arXiv},
abstract = {Our perceptions are shaped by the social networks we are embedded in. Despite the acknowledged influence of close contacts on how we perceive the world, the role of the broader social environment remains opaque. Here, we leverage a unique combination of population-scale social network and survey data on perceptions of immigration. We find that both direct contacts and a wider social network exposure to migrants matter. Notably, for natives, network exposure shows a shift from positive to negative association with perceptions of immigration beyond a certain exposure threshold. The multi-layer nature of our data highlights this tipping point for next-door neighbors, with private social contexts exhibiting a positive relationship between exposure and immigration perceptions. Furthermore, it shows that contacts spanning multiple contexts also strengthen this relationship. The provided insights on the interplay between network composition and attitudes toward immigration highlight generic patterns shaping public opinion on pressing societal issues.},
note = {arXiv:2407.06820 [physics]},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

  • http://arxiv.org/abs/2407.06820
  • doi:10.48550/arXiv.2407.06820

Fajardo, S.; Zeekaf, J.; Andel, T.; Maombe, C.; Nyambe, T.; Mudenda, G.; Aleo, A.; Kayuni, M. N.; Langejans, G. H. J.

Traditional adhesive production systems in Zambia and their archaeological implications Journal Article

In: Journal of Anthropological Archaeology, vol. 74, pp. 101586, 2024, ISSN: 0278-4165.

@article{fajardo_traditional_2024,
title = {Traditional adhesive production systems in Zambia and their archaeological implications},
author = {S. Fajardo and J. Zeekaf and T. Andel and C. Maombe and T. Nyambe and G. Mudenda and A. Aleo and M. N. Kayuni and G. H. J. Langejans},
url = {https://www.sciencedirect.com/science/article/pii/S0278416524000175},
doi = {10.1016/j.jaa.2024.101586},
issn = {0278-4165},
year = {2024},
date = {2024-06-01},
urldate = {2024-05-13},
journal = {Journal of Anthropological Archaeology},
volume = {74},
pages = {101586},
abstract = {This study explores traditional adhesives using an ethnobiological approach within a multisocioecological context in Zambia. Through semi-structured interviews, videotaped demonstrations, and herbarium collections, we investigated the traditional adhesives people know and use, the flexibility of production processes, resource usage, and knowledge transmission in adhesive production. Our findings reveal flexibility in adhesive production systems. People use a wide range of organic and inorganic materials in their adhesive recipes. Recipes are flexible, demonstrating the ability to adapt to changes and substitute materials as needed to achieve the desired end product. Additionally, our study reveals a variety of redundant pathways for knowledge transmission typically confined within individual population groups. These include same-sex vertical transmission and distinct learning spaces and processes. Also, we identified material procurement zones showing that people are prepared to travel 70 km for ingredients. We use our findings to review the archaeology and we discuss the identification of archaeological adhesives, the functional roles of adhesive materials, adhesive storage, and the sustained human interaction with species from families such as Euphorbiaceae and Apiade. Our findings underscore the diversity and adaptability of traditional adhesive production and suggest that further research on adhesives would reveal similar diversity within the archaeological record.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://www.sciencedirect.com/science/article/pii/S0278416524000175
  • doi:10.1016/j.jaa.2024.101586

Fajri, R. M.; Saxena, A.; Pei, Y.; Pechenizkiy, M.

FAL-CUR: Fair Active Learning using Uncertainty and Representativeness on Fair Clustering Journal Article

In: Expert Systems with Applications, vol. 242, pp. 122842, 2024, ISSN: 0957-4174.

@article{fajri_fal-cur_2024,
title = {FAL-CUR: Fair Active Learning using Uncertainty and Representativeness on Fair Clustering},
author = {R. M. Fajri and A. Saxena and Y. Pei and M. Pechenizkiy},
url = {https://www.sciencedirect.com/science/article/pii/S0957417423033444},
doi = {10.1016/j.eswa.2023.122842},
issn = {0957-4174},
year = {2024},
date = {2024-05-01},
urldate = {2024-05-13},
journal = {Expert Systems with Applications},
volume = {242},
pages = {122842},
abstract = {Active Learning (AL) techniques have proven to be highly effective in reducing data labeling costs across a range of machine learning tasks. Nevertheless, one known challenge of these methods is their potential to introduce unfairness towards sensitive attributes. Although recent approaches have focused on enhancing fairness in AL, they tend to reduce the model’s accuracy. To address this issue, we propose a novel strategy, named Fair Active Learning using fair Clustering, Uncertainty, and Representativeness (FAL-CUR), to improve fairness in AL. FAL-CUR tackles the fairness problem in AL by combining fair clustering with an acquisition function that determines which samples to query based on their uncertainty and representativeness scores. We evaluate the performance of FAL-CUR on four real-world datasets, and the results demonstrate that FAL-CUR achieves a 15%–20% improvement in fairness compared to the best state-of-the-art method in terms of equalized odds while maintaining stable accuracy scores. Furthermore, an ablation study highlights the crucial roles of fair clustering in preserving fairness and the acquisition function in stabilizing the accuracy performance.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://www.sciencedirect.com/science/article/pii/S0957417423033444
  • doi:10.1016/j.eswa.2023.122842

Fajardo, S.; Kozowyk, P. R. B.; Langejans, G. H. J.

Reply to: Problems with two recent Petri net analyses of Neanderthal adhesive technology Journal Article

In: Scientific Reports, vol. 14, no. 1, pp. 10489, 2024, ISSN: 2045-2322, (Publisher: Nature Publishing Group).

@article{fajardo_reply_2024,
title = {Reply to: Problems with two recent Petri net analyses of Neanderthal adhesive technology},
author = {S. Fajardo and P. R. B. Kozowyk and G. H. J. Langejans},
url = {https://www.nature.com/articles/s41598-024-60674-7},
doi = {10.1038/s41598-024-60674-7},
issn = {2045-2322},
year = {2024},
date = {2024-05-01},
urldate = {2024-05-13},
journal = {Scientific Reports},
volume = {14},
number = {1},
pages = {10489},
note = {Publisher: Nature Publishing Group},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://www.nature.com/articles/s41598-024-60674-7
  • doi:10.1038/s41598-024-60674-7

Saxena, A.; Fletcher, G.; Pechenizkiy, M.

FairSNA: Algorithmic Fairness in Social Network Analysis Journal Article

In: ACM Computing Surveys, vol. 56, no. 8, pp. 213:1–213:45, 2024, ISSN: 0360-0300.

@article{saxena_fairsna_2024,
title = {FairSNA: Algorithmic Fairness in Social Network Analysis},
author = {A. Saxena and G. Fletcher and M. Pechenizkiy},
url = {https://dl.acm.org/doi/10.1145/3653711},
doi = {10.1145/3653711},
issn = {0360-0300},
year = {2024},
date = {2024-04-01},
urldate = {2024-04-01},
journal = {ACM Computing Surveys},
volume = {56},
number = {8},
pages = {213:1--213:45},
abstract = {In recent years, designing fairness-aware methods has received much attention in various domains, including machine learning, natural language processing, and information retrieval. However, in social network analysis (SNA), designing fairness-aware methods for various research problems by considering structural bias and inequalities of large-scale social networks has not received much attention. In this work, we highlight how the structural bias of social networks impacts the fairness of different SNA methods. We further discuss fairness aspects that should be considered while proposing network structure-based solutions for different SNA problems, such as link prediction, influence maximization, centrality ranking, and community detection. This survey-cum-vision clearly highlights that very few works have considered fairness and bias while proposing solutions; even these works are mainly focused on some research topics, such as link prediction, influence maximization, and PageRank. However, fairness has not yet been addressed for other research topics, such as influence blocking and community detection. We review the state of the art for different research topics in SNA, including the considered fairness constraints, their limitations, and our vision. This survey also covers evaluation metrics, available datasets and synthetic network generating models used in such studies. Finally, we highlight various open research directions that require researchers’ attention to bridge the gap between fairness and SNA.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://dl.acm.org/doi/10.1145/3653711
  • doi:10.1145/3653711

Mannion, S.; MacCarron, P.; Saxena, A.; Takes, F. W.

Fast degree-preserving rewiring of complex networks Miscellaneous

2024, (arXiv:2401.12047 [physics]).

@misc{mannion_fast_2024,
title = {Fast degree-preserving rewiring of complex networks},
author = {S. Mannion and P. MacCarron and A. Saxena and F. W. Takes},
url = {http://arxiv.org/abs/2401.12047},
doi = {10.48550/arXiv.2401.12047},
year = {2024},
date = {2024-04-01},
urldate = {2024-05-13},
publisher = {arXiv},
abstract = {In this paper we introduce a new, fast, degree-preserving rewiring algorithm for altering the assortativity of complex networks, which we call \textit{Fast total link (FTL)} rewiring algorithm. Commonly used existing algorithms require a large number of iterations, in particular in the case of large dense networks. This can especially be problematic when we wish to study ensembles of networks. In this work we aim to overcome aforementioned scalability problems by performing a rewiring of all edges at once to achieve a very high assortativity value before rewiring samples of edges at once to reduce this high assortativity value to the target value. The proposed method performs better than existing methods by several orders of magnitude for a range of structurally diverse complex networks, both in terms of the number of iterations taken, and time taken to reach a given assortativity value. Here we test our proposed algorithm on networks with up to $100,000$ nodes and around $750,000$ edges and find that the relative improvements in speed remain, showing that the algorithm is both efficient and scalable.},
note = {arXiv:2401.12047 [physics]},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

  • http://arxiv.org/abs/2401.12047
  • doi:10.48550/arXiv.2401.12047

Aiello, L. M.; Vybornova, A.; Juhász, S.; Szell, M.; Bokányi, E.

Urban highways are barriers to social ties Miscellaneous

2024, (arXiv:2404.11596 [physics]).

@misc{aiello_urban_2024,
title = {Urban highways are barriers to social ties},
author = {L. M. Aiello and A. Vybornova and S. Juhász and M. Szell and E. Bokányi},
url = {http://arxiv.org/abs/2404.11596},
doi = {10.48550/arXiv.2404.11596},
year = {2024},
date = {2024-04-01},
urldate = {2024-04-01},
publisher = {arXiv},
abstract = {Urban highways are common, especially in the US, making cities more car-centric. They promise the annihilation of distance but obstruct pedestrian mobility, thus playing a key role in limiting social interactions locally. Although this limiting role is widely acknowledged in urban studies, the quantitative relationship between urban highways and social ties is barely tested. Here we define a Barrier Score that relates massive, geolocated online social network data to highways in the 50 largest US cities. At the unprecedented granularity of individual social ties, we show that urban highways are associated with decreased social connectivity. This barrier effect is especially strong for short distances and consistent with historical cases of highways that were built to purposefully disrupt or isolate Black neighborhoods. By combining spatial infrastructure with social tie data, our method adds a new dimension to demographic studies of social segregation. Our study can inform reparative planning for an evidence-based reduction of spatial inequality, and more generally, support a better integration of the social fabric in urban planning.},
note = {arXiv:2404.11596 [physics]},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

  • http://arxiv.org/abs/2404.11596
  • doi:10.48550/arXiv.2404.11596

Setia, S.; Chhabra, A.; Arjun Verma, A.; Saxena, A.

Mediating effects of NLP-based parameters on the readability of crowdsourced wikipedia articles Journal Article

In: Applied Intelligence, vol. 54, no. 5, pp. 4370–4391, 2024, ISSN: 1573-7497.

@article{setia_mediating_2024,
title = {Mediating effects of NLP-based parameters on the readability of crowdsourced wikipedia articles},
author = {S. Setia and A. Chhabra and A. Arjun Verma and A. Saxena},
url = {https://doi.org/10.1007/s10489-024-05399-w},
doi = {10.1007/s10489-024-05399-w},
issn = {1573-7497},
year = {2024},
date = {2024-03-01},
urldate = {2024-05-13},
journal = {Applied Intelligence},
volume = {54},
number = {5},
pages = {4370–4391},
abstract = {In this era of information and communication technology, a large population relies on the Internet to gather information. One of the most popular information sources on the Internet is Wikipedia. Wikipedia is a free encyclopedia that provides a wide range of information to its users. However, there have been concerns about the readability of information on Wikipedia time and again. The readability of the text is defined as the ease of understanding the underlying text. Past studies have analyzed the readability of Wikipedia articles with the help of conventional readability metrics, such as the Flesch-Kincaid readability score and the Automatic Readability Index (ARI). Such metrics only consider the surface-level parameters, such as the number of words, sentences, and paragraphs in the text, to quantify the readability. However, the readability of the text must also take into account the quality of the text. In this study, we consider many new NLP-based parameters capturing the quality of the text, such as lexical diversity, semantic diversity, lexical complexity, and semantic complexity and analyze their impact on the readability of Wikipedia articles using artificial neural networks. Besides NLP parameters, the crowdsourced parameters also affect the readability, and therefore, we also analyze the impact of crowdsourced parameters and observe that the crowdsourced parameters not only influence the readability scores but also affect the NLP parameters of the text. Additionally, we investigate the mediating effect of NLP parameters that connect the crowdsourced parameters to the readability of the text. The results show that the impact of crowdsourced parameters on readability is partially due to the profound effect of NLP-based parameters.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://doi.org/10.1007/s10489-024-05399-w
  • doi:10.1007/s10489-024-05399-w

Macedo, M.; Saxena, A.

Gender differences in online communication: A case study of Soccer Miscellaneous

2024, (arXiv:2403.11051 [cs]).

@misc{macedo_gender_2024,
title = {Gender differences in online communication: A case study of Soccer},
author = {M. Macedo and A. Saxena},
url = {http://arxiv.org/abs/2403.11051},
doi = {10.48550/arXiv.2403.11051},
year = {2024},
date = {2024-03-01},
urldate = {2024-05-13},
publisher = {arXiv},
abstract = {Social media and digital platforms allow us to express our opinions freely and easily to a vast number of people. In this study, we examine whether there are gender-based differences in how communication happens via Twitter in regard to soccer. Soccer is one of the most popular sports, and therefore, on social media, it engages a diverse audience regardless of their technical knowledge. We collected Twitter data for three months (March-June) for English and Portuguese that contains 9.5 million Tweets related to soccer, and only 18.38% tweets were identified as belonging to women, highlighting a possible gender gap already in the number of people who participated actively in this topic. We then conduct a fine-grained text-level and network-level analysis to identify the gender differences that might exist while communicating on Twitter. Our results show that women express their emotions more intensely than men, regardless of the differences in volume. The network generated from Portuguese has lower homophily than English. However, this difference in homophily does not impact how females express their emotions and sentiments, suggesting that these aspects are inherent norms or characteristics of genders. Our study unveils more gaps through qualitative and quantitative analyses, highlighting the importance of examining and reporting gender gaps in online communication to create a more inclusive space where people can openly share their opinions.},
note = {arXiv:2403.11051 [cs]},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

  • http://arxiv.org/abs/2403.11051
  • doi:10.48550/arXiv.2403.11051

Pandey, P. K.; Arya, A.; Saxena, A.

X-distribution: Retraceable Power-law Exponent of Complex Networks Journal Article

In: ACM Transactions on Knowledge Discovery from Data, vol. 18, no. 5, pp. 117:1–117:12, 2024, ISSN: 1556-4681.

@article{pandey_x-distribution_2024,
title = {X-distribution: Retraceable Power-law Exponent of Complex Networks},
author = {P. K. Pandey and A. Arya and A. Saxena},
url = {https://dl.acm.org/doi/10.1145/3639413},
doi = {10.1145/3639413},
issn = {1556-4681},
year = {2024},
date = {2024-02-01},
urldate = {2024-05-13},
journal = {ACM Transactions on Knowledge Discovery from Data},
volume = {18},
number = {5},
pages = {117:1–117:12},
abstract = {Network modeling has been explored extensively by means of theoretical analysis as well as numerical simulations for Network Reconstruction (NR). The network reconstruction problem requires the estimation of the power-law exponent (γ) of a given input network. Thus, the effectiveness of the NR solution depends on the accuracy of the calculation of γ. In this article, we re-examine the degree distribution-based estimation of γ, which is not very accurate due to approximations. We propose X-distribution, which is more accurate than degree distribution. Various state-of-the-art network models, including CPM, NRM, RefOrCite2, BA, CDPAM, and DMS, are considered for simulation purposes, and simulated results support the proposed claim. Further, we apply X-distribution over several real-world networks to calculate their power-law exponents, which differ from those calculated using respective degree distributions. It is observed that X-distributions exhibit more linearity (straight line) on the log-log scale than degree distributions. Thus, X-distribution is more suitable for the evaluation of power-law exponent using linear fitting (on the log-log scale). The MATLAB implementation of power-law exponent (γ) calculation using X-distribution for different network models and the real-world datasets used in our experiments are available at https://github.com/Aikta-Arya/X-distribution-Retraceable-Power-Law-Exponent-of-Complex-Networks.git.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://dl.acm.org/doi/10.1145/3639413
  • doi:10.1145/3639413

de Jong, R. G.; Loo, M. P. J.; Takes, F. W.

The effect of distant connections on node anonymity in complex networks Journal Article

In: Scientific Reports, vol. 14, no. 1, pp. 1156, 2024, ISSN: 2045-2322.

@article{de_jong_effect_2024,
title = {The effect of distant connections on node anonymity in complex networks},
author = {R. G. de Jong and M. P. J. Loo and F. W. Takes},
url = {https://doi.org/10.1038/s41598-023-50617-z},
doi = {10.1038/s41598-023-50617-z},
issn = {2045-2322},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
journal = {Scientific Reports},
volume = {14},
number = {1},
pages = {1156},
abstract = {Ensuring privacy of individuals is of paramount importance to social network analysis research. Previous work assessed anonymity in a network based on the non-uniqueness of a node’s ego network. In this work, we show that this approach does not adequately account for the strong de-anonymizing effect of distant connections. We first propose the use of d-k-anonymity, a novel measure that takes knowledge up to distance d of a considered node into account. Second, we introduce anonymity-cascade, which exploits the so-called infectiousness of uniqueness: mere information about being connected to another unique node can make a given node uniquely identifiable. These two approaches, together with relevant “twin node” processing steps in the underlying graph structure, offer practitioners flexible solutions, tunable in precision and computation time. This enables the assessment of anonymity in large-scale networks with up to millions of nodes and edges. Experiments on graph models and a wide range of real-world networks show drastic decreases in anonymity when connections at distance 2 are considered. Moreover, extending the knowledge beyond the ego network with just one extra link often already decreases overall anonymity by over 50%. These findings have important implications for privacy-aware sharing of sensitive network data.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://doi.org/10.1038/s41598-023-50617-z
  • doi:10.1038/s41598-023-50617-z

Sánchez-Olivares, E.; Boekhout, H. D.; Saxena, A.; Takes, F. W.

A Framework for Empirically Evaluating Pretrained Link Prediction Models Proceedings Article

In: Cherifi, H.; Rocha, L. M.; Cherifi, C.; Donduran, M. (Ed.): Complex Networks & Their Applications XII. Proceedings of the 12th International Conference on Complex Networks (Complex Networks 2023), pp. 150–161, Springer Nature Switzerland, Cham, 2024, ISBN: 978-3-031-53468-3.

@inproceedings{sanchez_olivares_framework_2024,
title = {A Framework for Empirically Evaluating Pretrained Link Prediction Models},
author = {E. Sánchez-Olivares and H. D. Boekhout and A. Saxena and F. W. Takes},
editor = {H. Cherifi and L. M. Rocha and C. Cherifi and M. Donduran},
doi = {10.1007/978-3-031-53468-3_13},
isbn = {978-3-031-53468-3},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
booktitle = {Complex Networks & Their Applications XII. Proceedings of the 12th International Conference on Complex Networks (Complex Networks 2023)},
pages = {150–161},
publisher = {Springer Nature Switzerland},
address = {Cham},
abstract = {This paper proposes a novel framework for empirically assessing the effect of network characteristics on the performance of pretrained link prediction models. In link prediction, the task is to predict missing or future links in a given network dataset. We focus on the pretrained setting, in which such a predictive model is trained on one dataset, and employed on another dataset. The framework allows one to overcome a number of nontrivial challenges in adequately testing the performance of such a pretrained model in a proper cross-validated setting. Experiments are performed on a corpus of 49 structurally diverse real-world complex network datasets from various domains with up to hundreds of thousands of nodes and edges. Overall results indicate that the extent to which a network is clustered is strongly related to whether this network is a suitable candidate to create a pretrained model on. Moreover, we systematically assessed the relationship between topological similarity and performance difference of pretrained models and a model trained on the same data. We find that similar network pairs in terms of clustering coefficient, and to a lesser extent degree assortativity and gini coefficient, yield minimal performance difference. The findings presented in this work pave the way for automated model selection based on topological similarity of the networks, as well as larger-scale deployment of pretrained link prediction models for transfer learning.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

  • doi:10.1007/978-3-031-53468-3_13

Liang, Z.; Li, Y.; Huang, T.; Saxena, A.; Pei, Y.; Pechenizkiy, M.

Heterophily-Based Graph Neural Network for Imbalanced Classification Proceedings Article

In: Cherifi, H.; Rocha, L. M.; Cherifi, C.; Donduran, M. (Ed.): Complex Networks & Their Applications XII, pp. 74–86, Springer Nature Switzerland, Cham, 2024, ISBN: 978-3-031-53468-3.

@inproceedings{liang_heterophily-based_2024,
title = {Heterophily-Based Graph Neural Network for Imbalanced Classification},
author = {Z. Liang and Y. Li and T. Huang and A. Saxena and Y. Pei and M. Pechenizkiy},
editor = {H. Cherifi and L. M. Rocha and C. Cherifi and M. Donduran},
doi = {10.1007/978-3-031-53468-3_7},
isbn = {978-3-031-53468-3},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
booktitle = {Complex Networks & Their Applications XII},
pages = {74–86},
publisher = {Springer Nature Switzerland},
address = {Cham},
abstract = {Graph neural networks (GNNs) have shown promise in addressing graph-related problems, including node classification. However, in real-world scenarios, data often exhibits an imbalanced, sometimes highly-skewed, distribution with dominant classes representing the majority, where certain classes are severely underrepresented. This leads to a suboptimal performance of standard GNNs on imbalanced graphs. In this paper, we introduce a unique approach that tackles imbalanced classification on graphs by considering graph heterophily. We investigate the intricate relationship between class imbalance and graph heterophily, revealing that minority classes not only exhibit a scarcity of samples but also manifest lower levels of homophily, facilitating the propagation of erroneous information among neighboring nodes. Drawing upon this insight, we propose an efficient method, called Fast Im-GBK, which integrates an imbalance classification strategy with heterophily-aware GNNs to effectively address the class imbalance problem while significantly reducing training time. Our experiments on real-world graphs demonstrate our model’s superiority in classification performance and efficiency for node classification tasks compared to existing baselines.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

  • doi:10.1007/978-3-031-53468-3_7

Jung-Muller, M.; Ceria, A.; Wang, H.

Higher-Order Temporal Network Prediction Proceedings Article

In: Cherifi, H.; Rocha, L. M.; Cherifi, C.; Donduran, M. (Ed.): Complex Networks & Their Applications XII, pp. 461–472, Springer Nature Switzerland, Cham, 2024, ISBN: 978-3-031-53503-1.

@inproceedings{jung-muller_higher-order_2024,
title = {Higher-Order Temporal Network Prediction},
author = {M. Jung-Muller and A. Ceria and H. Wang},
editor = {H. Cherifi and L. M. Rocha and C. Cherifi and M. Donduran},
doi = {10.1007/978-3-031-53503-1_38},
isbn = {978-3-031-53503-1},
year = {2024},
date = {2024-01-01},
booktitle = {Complex Networks & Their Applications XII},
pages = {461–472},
publisher = {Springer Nature Switzerland},
address = {Cham},
abstract = {A social interaction (so-called higher-order event/interaction) can be regarded as the activation of the hyperlink among the corresponding individuals. Social interactions can be, thus, represented as higher-order temporal networks, that record the higher-order events occurring at each time step over time. The prediction of higher-order interactions is usually overlooked in traditional temporal network prediction methods, where a higher-order interaction is regarded as a set of pairwise interactions. We propose a memory-based model that predicts the higher-order temporal network (or events) one step ahead, based on the network observed in the past and a baseline utilizing pairwise temporal network prediction method. In eight real-world networks, we find that our model consistently outperforms the baseline. Importantly, our model reveals how past interactions of the target hyperlink and different types of hyperlinks that overlap with the target hyperlinks contribute to the prediction of the activation of the target link in the future.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

  • doi:10.1007/978-3-031-53503-1_38

Close

de Jong, R. G.; Loo, M. P. J.; Takes, F. W.

The anonymization problem in social networks Miscellaneous

2024.

@misc{dejong2024anonymizationproblemsocialnetworks,
title = {The anonymization problem in social networks},
author = {R. G. de Jong and M. P. J. Loo and F. W. Takes},
url = {https://arxiv.org/abs/2409.16163},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}

  • https://arxiv.org/abs/2409.16163

2023

Arya, A.; Pandey, P. K.; Saxena, A.

Balanced and Unbalanced Triangle Count in Signed Networks Journal Article

In: IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 12, pp. 12491–12496, 2023, ISSN: 1558-2191.

@article{arya_balanced_2023,
title = {Balanced and Unbalanced Triangle Count in Signed Networks},
author = {A. Arya and P. K. Pandey and A. Saxena},
url = {https://ieeexplore.ieee.org/abstract/document/10115002},
doi = {10.1109/TKDE.2023.3272657},
issn = {1558-2191},
year = {2023},
date = {2023-12-01},
urldate = {2024-05-13},
journal = {IEEE Transactions on Knowledge and Data Engineering},
volume = {35},
number = {12},
pages = {12491--12496},
abstract = {Triangle count is a frequently used network statistic, possessing high computational cost. Moreover, this task gets even more complex in the case of signed networks which consist of unbalanced and balanced triangles. In this work, we propose a fast Incremental Triangle Counting (ITC) algorithm for counting all types of triangles, including balanced and unbalanced. The proposed algorithm updates the count of different types of triangles for newly added nodes and edges only instead of recalculating the same triangle multiple times for the entire network repeatedly. Thus, the proposed ITC algorithm also works for dynamic networks. The experimental results show that the proposed method is practically efficient having run time complexity of $O(m k_{\max})$, where $m$ represents the number of edges and $k_{\max}$ represents the maximum degree of the given signed network.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://ieeexplore.ieee.org/abstract/document/10115002
  • doi:10.1109/TKDE.2023.3272657

Bokányi, E.; Vizi, Z.; Koltai, J.; Röst, G.; Karsai, M.

Real-time estimation of the effective reproduction number of COVID-19 from behavioral data Journal Article

In: Scientific Reports, vol. 13, no. 1, pp. 21452, 2023, ISSN: 2045-2322, (Publisher: Nature Publishing Group).

@article{bokanyi_real-time_2023,
title = {Real-time estimation of the effective reproduction number of COVID-19 from behavioral data},
author = {E. Bokányi and Z. Vizi and J. Koltai and G. Röst and M. Karsai},
url = {https://www.nature.com/articles/s41598-023-46418-z},
doi = {10.1038/s41598-023-46418-z},
issn = {2045-2322},
year = {2023},
date = {2023-12-01},
urldate = {2024-05-13},
journal = {Scientific Reports},
volume = {13},
number = {1},
pages = {21452},
abstract = {Monitoring the effective reproduction number $R_t$ of a rapidly unfolding pandemic in real-time is key to successful mitigation and prevention strategies. However, existing methods based on case numbers, hospital admissions or fatalities suffer from multiple measurement biases and temporal lags due to high test positivity rates or delays in symptom development or administrative reporting. Alternative methods such as web search and social media tracking are less directly indicating epidemic prevalence over time. We instead record age-stratified anonymous contact matrices at a daily resolution using a longitudinal online-offline survey in Hungary during the first two waves of the COVID-19 pandemic. This approach is innovative, cheap, and provides information in near real-time for estimating $R_t$ at a daily resolution. Moreover, it allows to complement traditional surveillance systems by signaling periods when official monitoring infrastructures are unreliable due to observational biases.},
note = {Publisher: Nature Publishing Group},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://www.nature.com/articles/s41598-023-46418-z
  • doi:10.1038/s41598-023-46418-z

Saxena, A.; Bierbooms, C. Gutiérrez; Pechenizkiy, M.

Fairness-aware fake news mitigation using counter information propagation Journal Article

In: Applied Intelligence, vol. 53, no. 22, pp. 27483–27504, 2023, ISSN: 1573-7497.

@article{saxena_fairness-aware_2023,
title = {Fairness-aware fake news mitigation using counter information propagation},
author = {A. Saxena and C. Gutiérrez Bierbooms and M. Pechenizkiy},
url = {https://doi.org/10.1007/s10489-023-04928-3},
doi = {10.1007/s10489-023-04928-3},
issn = {1573-7497},
year = {2023},
date = {2023-11-01},
urldate = {2024-05-13},
journal = {Applied Intelligence},
volume = {53},
number = {22},
pages = {27483--27504},
abstract = {Given the adverse impact of fake news propagation on social media, fake news mitigation has been one of the main research directions. However, existing approaches neglect fairness towards each community while minimizing the adverse impact of fake news propagation. This results in the exclusion of some minor and underrepresented communities from the benefits of the intervention, which can have important societal repercussions. This research proposes a fairness-aware truth-campaigning method, called FWRRS (Fairness-aware Weighted Reversible Reachable System), which focuses on blocking the influence propagation of a competing entity, in this case, with the use case of fake news mitigation. The proposed method employs weighted reversible reachable trees and maximin fairness to achieve its goals. Experimental analysis shows that FWRRS outperforms fairness-oblivious and fairness-aware methods in terms of both total outreach and fairness. The results show that in the proposed approach, such fairness does not come at a cost in efficiency, and in fact, in most cases, it works as a catalyst for achieving better effectiveness in the future. In real-world networks, we observe up to $\sim$10% improvement in the saved nodes and $\sim$57% improvement in maximin fairness as compared to the second best-performing baseline, which varies for each network.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://doi.org/10.1007/s10489-023-04928-3
  • doi:10.1007/s10489-023-04928-3

Zou, L.; Ceria, A.; Wang, H.

Short- and long-term temporal network prediction based on network memory Journal Article

In: Applied Network Science, vol. 8, no. 1, pp. 76, 2023, ISSN: 2364-8228.

@article{zou_short-_2023,
title = {Short- and long-term temporal network prediction based on network memory},
author = {L. Zou and A. Ceria and H. Wang},
url = {https://doi.org/10.1007/s41109-023-00597-w},
doi = {10.1007/s41109-023-00597-w},
issn = {2364-8228},
year = {2023},
date = {2023-11-01},
urldate = {2024-05-13},
journal = {Applied Network Science},
volume = {8},
number = {1},
pages = {76},
abstract = {Temporal networks are networks whose topology changes over time. Two nodes in a temporal network are connected at a discrete time step only if they have a contact/interaction at that time. The classic temporal network prediction problem aims to predict the temporal network one time step ahead based on the network observed in the past of a given duration. This problem has been addressed mostly via machine learning algorithms, at the expense of high computational costs and limited interpretation of the underlying mechanisms that form the networks. Hence, we propose to predict the connection of each node pair one step ahead based on the connections of this node pair itself and of node pairs that share a common node with this target node pair in the past. The concrete design of our two prediction models is based on the analysis of the memory property of real-world physical networks, i.e., to what extent two snapshots of a network at different times are similar in topology (or overlap). State-of-the-art prediction methods that allow interpretation are considered as baseline models. In seven real-world physical contact networks, our methods are shown to outperform the baselines in both prediction accuracy and computational complexity. They perform better in networks with stronger memory. Importantly, our models reveal how the connections of different types of node pairs in the past contribute to the connection estimation of a target node pair. Predicting temporal networks like physical contact networks in the long-term future, beyond the short term (i.e., one step ahead), is crucial to forecast and mitigate the spread of epidemics and misinformation on the network. This long-term prediction problem has been seldom explored. Therefore, we propose basic methods that adapt each aforementioned prediction model, designed for the classic short-term network prediction problem, to the long-term network prediction task.
The prediction quality of all adapted models is evaluated via the accuracy in predicting each network snapshot and in reproducing key network properties. The prediction based on one of our models tends to have the highest accuracy and lowest computational complexity.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://doi.org/10.1007/s41109-023-00597-w
  • doi:10.1007/s41109-023-00597-w

Fajardo, S.; Kozowyk, P. R. B.; Langejans, G. H. J.

Measuring ancient technological complexity and its cognitive implications using Petri nets Journal Article

In: Scientific Reports, vol. 13, no. 1, pp. 14961, 2023, ISSN: 2045-2322, (Publisher: Nature Publishing Group).

@article{fajardo_measuring_2023,
title = {Measuring ancient technological complexity and its cognitive implications using Petri nets},
author = {S. Fajardo and P. R. B. Kozowyk and G. H. J. Langejans},
url = {https://www.nature.com/articles/s41598-023-42078-1},
doi = {10.1038/s41598-023-42078-1},
issn = {2045-2322},
year = {2023},
date = {2023-09-01},
urldate = {2024-05-13},
journal = {Scientific Reports},
volume = {13},
number = {1},
pages = {14961},
abstract = {We implement a method from computer sciences to address a challenge in Paleolithic archaeology: how to infer cognition differences from material culture. Archaeological material culture is linked to cognition, and more complex ancient technologies are assumed to have required complex cognition. We present an application of Petri net analysis to compare Neanderthal tar production technologies and tie the results to cognitive requirements. We applied three complexity metrics, each relying on their own unique definitions of complexity, to the modeled production processes. Based on the results, we propose that Neanderthal technical cognition may have been analogous to that of contemporary modern humans. This method also enables us to distinguish the high-order cognitive functions combining traits like planning, inhibitory control, and learning that were likely required by different ancient technological processes. The Petri net approach can contribute to our understanding of technology and cognitive evolution as it can be used on different materials and technologies, across time and species.},
note = {Publisher: Nature Publishing Group},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

  • https://www.nature.com/articles/s41598-023-42078-1
  • doi:10.1038/s41598-023-42078-1

Publications from 2020 and earlier can be found here.

© Leiden Computational Network Science Group