Enikő Csaba – Solving the alignment problem of word embedding vector spaces with Procrustes transformations

2023, Survey Statistics and Data Analytics MSc, Supervisor: Márton Rakovics

This thesis compares two corpora of articles from online news portals with different social perspectives by aligning their word embedding vector spaces, in order to identify the differences that arise from the different contexts. A further aim is to determine whether the Procrustes transformation is a suitable tool for mapping vector representations into a common space. Several word embeddings are created first and the model most suitable for the task is selected; the Procrustes transformations are then implemented and evaluated. After the transformation with the lowest approximation error is selected, the aligned vector space is analysed. The results confirm that the Procrustes transformation is suitable for handling the alignment problem caused by the mismatch between the embeddings, and they identify topic-specific words that appear in different contexts in the two media.
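
To illustrate the kind of alignment the abstract describes, the following is a minimal sketch of an orthogonal Procrustes fit using NumPy. It is not the thesis's implementation: the abstract does not say which Procrustes variants were evaluated, so the orthogonal variant, the function names `orthogonal_procrustes_align` and `approximation_error`, and the synthetic data are all assumptions made for illustration. The approximation error is taken here to be the Frobenius norm of the residual, which is also an assumption.

```python
import numpy as np


def orthogonal_procrustes_align(source, target):
    """Return the orthogonal matrix R that minimises ||source @ R - target||_F.

    Rows of `source` and `target` are assumed to be vectors of the same
    words, in the same order, taken from two separately trained embeddings.
    """
    # The minimiser is U @ Vt, where U, S, Vt is the SVD of source.T @ target.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def approximation_error(source, target, rotation):
    """Frobenius norm of the residual after mapping source into target space."""
    return np.linalg.norm(source @ rotation - target, ord="fro")


# Synthetic stand-in for two embedding spaces (hypothetical data):
# `source` is an exactly rotated copy of `target`.
rng = np.random.default_rng(0)
target = rng.normal(size=(50, 8))                       # 50 words, 8 dimensions
true_rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
source = target @ true_rotation.T

rotation = orthogonal_procrustes_align(source, target)
error = approximation_error(source, target, rotation)
```

With real embeddings the two spaces differ by more than a rotation, so the residual does not vanish; words whose vectors remain far apart after the fit are candidates for the context differences the thesis looks for.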
