A forgotten task in modern NLP
Word alignment is the task of uncovering word correspondences between translated text pairs. While it might be an unfamiliar concept to the newer generation of NLP practitioners, those with more experience will recognize it as a component of earlier machine translation systems. Although state-of-the-art machine translation models have largely moved away from explicitly using word alignment, the task remains relevant to modern applications such as translation model evaluation and cross-lingual label propagation.
Leveraging Word Alignment to enhance a generative AI templated text system
At Ubisoft La Forge, we used word alignment in a tool for Templated Text, a narrative technique where text templates ("Deliver the <THING> to the <PLACE>") are combined with concept groundings (THING = medkit, PLACE = hospital) to procedurally create text based on gameplay. One challenge with templated text in AAA games is that templates are notoriously difficult to localize into other languages, so we devised a simple but effective method to translate templates, one that requires cross-lingual label propagation of the template slots.
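To make the setup concrete, here is a minimal sketch of how a template and a concept grounding combine into player-facing text; the slot syntax and the fill_template helper are purely illustrative, not the actual tool.

```python
# Minimal sketch of templated text: a template plus a concept grounding
# produces the final string. Slot syntax and helper names are illustrative only.

TEMPLATE_EN = "Deliver the <THING> to the <PLACE>"

def fill_template(template: str, grounding: dict) -> str:
    """Replace each <SLOT> in the template with its grounded concept."""
    text = template
    for slot, value in grounding.items():
        text = text.replace(f"<{slot}>", value)
    return text

print(fill_template(TEMPLATE_EN, {"THING": "medkit", "PLACE": "hospital"}))
# -> "Deliver the medkit to the hospital"

# Localizing this template means translating the surrounding text while
# knowing where <THING> and <PLACE> end up in the translated sentence,
# which is exactly the cross-lingual label propagation problem.
```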
The ability to automatically infer word alignments makes label propagation straightforward, and in building out this narrative tool we managed to improve the state of the art on this long-standing and well-known NLP task. In this article, we explore our research on word alignment and share insights from our recent paper, 'BinaryAlign: Word Alignment as Binary Sequence Labeling,' published at ACL 2024.
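As an illustration of why word alignments make propagation straightforward, the sketch below projects slot labels from an English template onto a French translation using a set of alignment pairs. The sentences, alignments, and propagate_labels helper are hypothetical examples, not output from our system.

```python
# Hypothetical example: propagate slot labels from an English template to
# its French translation using word alignments (source_idx, target_idx).

src_tokens = ["Deliver", "the", "medkit", "to", "the", "hospital"]
tgt_tokens = ["Apportez", "le", "kit", "de", "soins", "à", "l'", "hôpital"]

# Word alignments, e.g. produced by an alignment model: (src index, tgt index).
alignments = {(0, 0), (1, 1), (2, 2), (2, 3), (2, 4), (3, 5), (4, 6), (5, 7)}

# Slot labels on the source side: token 2 is <THING>, token 5 is <PLACE>.
src_labels = {2: "THING", 5: "PLACE"}

def propagate_labels(alignments, src_labels):
    """Map each source slot label onto the aligned target token indices."""
    tgt_labels = {}
    for src_idx, tgt_idx in alignments:
        if src_idx in src_labels:
            tgt_labels.setdefault(src_labels[src_idx], set()).add(tgt_idx)
    return tgt_labels

print(propagate_labels(alignments, src_labels))
# e.g. {'THING': {2, 3, 4}, 'PLACE': {7}} (key order may vary)
```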
Challenges in deploying SOTA Word Alignment techniques
When we set out to deploy SOTA word alignment methods, we encountered two challenges.
First, from a complexity perspective, the deployment process was far from straightforward. SOTA word alignment methods recommend a different model class depending on the availability of gold alignment training data for a particular language pair. Since our application needed to support both high- and low-resource languages, this variation in preferred methods introduced additional complexity into our deployment pipeline.
Second, from a performance standpoint, we found that SOTA word alignment methods are suboptimal. Because they treat word alignment as a single-label classification problem, these approaches struggle with complex alignment scenarios such as non-contiguous alignments or untranslated words. This limitation is especially problematic for multilingual applications, as it can result in unexpectedly poor performance for certain language pairs, particularly when such scenarios are prevalent.
These challenges led us to seek a more unified and efficient solution that could perform consistently across a wide range of languages and alignment situations.
Reformulating Word Alignment as a set of binary sequence labeling tasks
We hypothesized that word alignment is better viewed as a series of binary classifications applied to each possible pair of words, rather than as the single-label classification used by previous SOTA methods.
Building on this idea, we introduce BinaryAlign, a novel word alignment approach leveraging a binary sequence labeling model. BinaryAlign reformulates word alignment as a set of binary classification tasks.
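Concretely, each source word defines its own labeling task: every target token receives a 0/1 label indicating whether it aligns with that reference word. The sketch below builds those per-word binary labels from a gold alignment set; the toy sentences and the binary_labels_for helper are illustrative assumptions, not our training code.

```python
# Illustrative sketch: turn a gold word alignment into one binary sequence
# labeling task per source (reference) word.

src_tokens = ["He", "has", "a", "sofa"]
tgt_tokens = ["Il", "a", "un", "canapé"]
gold_alignments = {(0, 0), (1, 1), (2, 2), (3, 3)}  # (src index, tgt index)

def binary_labels_for(reference_idx, gold_alignments, tgt_len):
    """0/1 label for each target token: is it aligned to the reference word?"""
    return [1 if (reference_idx, t) in gold_alignments else 0
            for t in range(tgt_len)]

for s, word in enumerate(src_tokens):
    print(word, binary_labels_for(s, gold_alignments, len(tgt_tokens)))
# "sofa" -> [0, 0, 0, 1]: only "canapé" is aligned to the reference word.
```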
To illustrate how our method works, the figure below demonstrates the word alignment for the reference word 'sofa,' given the source sentence 'He has a sofa' and the target sentence 'Il a un canapé' using our approach.
BinaryAlign consists of two key components:
- A Multilingual Pretrained Language Model (BERT-style): it takes as input the concatenation of the source and target sentences and returns contextualized token embeddings for each input token.
- A Token Classifier: it takes as input the contextualized token embeddings returned by the multilingual pretrained language model. For each token in the target sentence, the classifier computes the probability that it aligns with the specific reference word (in our case, "sofa").
In this example, we designate the word "sofa" as the reference word by surrounding it with special trainable tokens ("[*]" in the figure) in the input of the pretrained language model.
This process is repeated for each word in the source sentence to generate the word alignment for the entire sentence pair. For more details on our method, please refer to section 3 of our paper.
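For readers who want to see the moving parts in code, below is a simplified sketch of a BinaryAlign-style forward pass for a single reference word. It is not our released implementation: the mBERT checkpoint, the "[*]" marker token, and the freshly initialized (untrained) linear head are assumptions standing in for the actual encoder and trained token classifier.

```python
# Simplified sketch of a BinaryAlign-style forward pass for one reference word.
# Assumptions (not the released implementation): mBERT as the multilingual
# encoder, "[*]" as the trainable marker token, and an untrained linear head
# standing in for the token classifier.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "bert-base-multilingual-cased"  # assumed encoder choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)

# Register the marker token and resize embeddings so it becomes trainable.
tokenizer.add_special_tokens({"additional_special_tokens": ["[*]"]})
encoder.resize_token_embeddings(len(tokenizer))

src = "He has a [*] sofa [*]"   # reference word "sofa" wrapped with markers
tgt = "Il a un canapé"

inputs = tokenizer(src, tgt, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state        # (1, seq_len, hidden)

# Token classifier: one logit per token, i.e. probability of aligning with "sofa".
classifier = torch.nn.Linear(encoder.config.hidden_size, 1)
probs = torch.sigmoid(classifier(hidden)).squeeze(-1)   # (1, seq_len)

# Keep only positions belonging to the target sentence (token_type_id == 1).
target_mask = inputs["token_type_ids"].bool()
target_tokens = tokenizer.convert_ids_to_tokens(
    inputs["input_ids"][0][target_mask[0]].tolist())
for token, p in zip(target_tokens, probs[0][target_mask[0]]):
    print(token, round(p.item(), 3))

# In the real model the classifier is trained so that only "canapé" receives a
# high probability; the pass is repeated once per source word to align a sentence.
```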
BinaryAlign sets new state-of-the-art
We evaluate BinaryAlign against previous SOTAs under different degrees of supervision and on five different language pairs.
As shown in the table above, BinaryAlign reaches state-of-the-art performance across all tested language pairs regardless of the degree of supervision. This success means we reduced the complexity of deploying word alignment models in our templated text application by creating a unified approach that delivers top performance across all language pairs.
A greater ability to handle complex word alignment situations
To verify that BinaryAlign handles complex word alignment situations better than previous methods, we compare it to previous SOTAs in three complex situations: (1) words that are untranslated, (2) words that are aligned to multiple words, and (3) words that are aligned to multiple non-contiguous words.
Our results indicate that our method handles these situations better than previous SOTAs. This is particularly interesting as it means that the prevalence of these situations in each language pair will modulate the performance gain of our method over the others.
Conclusion
By reformulating word alignment as a series of binary sequence labeling tasks, BinaryAlign provides a unified and efficient solution that outperforms existing state-of-the-art methods across a diverse range of languages and alignment scenarios. Most importantly, our research made the word alignment task practical for real-world settings by using a single model for both high- and low-resource languages.
However, one limitation of our approach is its high inference cost, as it requires a forward pass for each word in both the source and target sentences. This can lead to significant computational overhead, especially when dealing with long sentences at scale. To address this, we have initiated follow-up research focused on developing a more efficient method that maintains BinaryAlign's enhanced performance while requiring only a single forward pass to generate word alignments. This new approach leverages knowledge distillation, using BinaryAlign models as teachers, and incorporates a more efficient architecture that directly uses the contextualized embeddings from multilingual pretrained language models.