TY - GEN
T1 - A Comparative Study of Variational and Vector Encoders in Graph User Matching
AU - Winckelmans, Joeri
AU - De Clerck, Bart
AU - Steckel, Jan
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
PY - 2026
Y1 - 2026
N2 - Cross-Platform User Identification (CPUI) aims to identify social media accounts belonging to the same real-world user across different platforms. This task is vital for combating cybercrime, where malicious users create multiple accounts, and for enhancing user modeling in fields such as sociology, economics, and epidemiology. Prior research suggests that vector-based encoding of a local network graph may fall short when faced with real-world inconsistencies such as platform dependency and data sparsity. In response, variational encoding, which explicitly models the data as normal distributions, has been proposed as a more robust alternative. In this paper, we present a comparative study of vector and variational encoding approaches in the context of a binary CPUI classification task. To this end, we constructed a synthetic heterogeneous graph derived from 277 research papers authored within an engineering department. Using vector embeddings of the papers' textual context as features, we trained various models to evaluate whether variational encoding offers an advantage in CPUI. Experimental results show that standard vector encoding consistently outperforms the variational models in terms of accuracy, F1-score, and AUC-ROC. While all models performed well (accuracy of around 90%), variational encoding offered no empirical advantage in our experiments. These findings suggest that the benefits of variational encoding may depend on the presence of real-world data inconsistencies that our synthetic dataset lacks.
AB - Cross-Platform User Identification (CPUI) aims to identify social media accounts belonging to the same real-world user across different platforms. This task is vital for combating cybercrime, where malicious users create multiple accounts, and for enhancing user modeling in fields such as sociology, economics, and epidemiology. Prior research suggests that vector-based encoding of a local network graph may fall short when faced with real-world inconsistencies such as platform dependency and data sparsity. In response, variational encoding, which explicitly models the data as normal distributions, has been proposed as a more robust alternative. In this paper, we present a comparative study of vector and variational encoding approaches in the context of a binary CPUI classification task. To this end, we constructed a synthetic heterogeneous graph derived from 277 research papers authored within an engineering department. Using vector embeddings of the papers' textual context as features, we trained various models to evaluate whether variational encoding offers an advantage in CPUI. Experimental results show that standard vector encoding consistently outperforms the variational models in terms of accuracy, F1-score, and AUC-ROC. While all models performed well (accuracy of around 90%), variational encoding offered no empirical advantage in our experiments. These findings suggest that the benefits of variational encoding may depend on the presence of real-world data inconsistencies that our synthetic dataset lacks.
UR - https://www.scopus.com/pages/publications/105035298730
U2 - 10.1007/978-981-95-7075-1_45
DO - 10.1007/978-981-95-7075-1_45
M3 - Conference contribution
AN - SCOPUS:105035298730
SN - 9789819570744
T3 - Lecture Notes in Computer Science
SP - 671
EP - 685
BT - PRICAI 2025
A2 - Mei, Yi
A2 - Xue, Bing
A2 - Qian, Chao
A2 - Bai, Quan
A2 - Khanna, Sankalp
PB - Springer Science and Business Media Deutschland GmbH
T2 - 22nd Pacific Rim International Conference on Artificial Intelligence, PRICAI 2025
Y2 - 17 November 2025 through 21 November 2025
ER -