Sentence Transformers: Meanings in Disguise

Transformers have wholly rebuilt the landscape of natural language processing (NLP). Before transformers, we had okay translation and language classification thanks to recurrent neural nets (RNNs), but their language comprehension was limited, leading to many minor mistakes, and coherence over larger chunks of text was practically impossible.

Since the introduction of the first transformer model in the 2017 paper 'Attention Is All You Need', NLP has moved from RNNs to models like BERT and GPT. These new models can answer questions, write articles (maybe GPT-3 wrote this), enable incredibly intuitive semantic search, and much more.

The funny thing is, for many tasks, the latter parts of these models are the same as those in RNNs - often a couple of feedforward NNs that output model predictions. It's the input to these layers that changed. The dense embeddings created by transformer models are so much richer in information that we get massive performance benefits despite using the same final outward layers.

These increasingly rich sentence embeddings can be used to quickly compare sentence similarity for various use cases, such as:

- Semantic textual similarity (STS) - comparison of sentence pairs (see the sketch after this list). We may want to identify patterns in datasets, but this is most often used for benchmarking.
- Semantic search - information retrieval (IR) using semantic meaning. Given a set of sentences, we can search using a 'query' sentence and identify the most similar records. This enables search to be performed on concepts (rather than specific words).
- Clustering - we can cluster our sentences, useful for topic modeling.
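To make the comparison concrete, here is a minimal sketch using the sentence-transformers library. The checkpoint name 'all-MiniLM-L6-v2' is an assumed example choice, not one named by this article: each sentence is encoded into a dense vector, and cosine similarity between vectors approximates semantic similarity.

```python
from sentence_transformers import SentenceTransformer, util

# Load a pretrained sentence transformer (assumed example checkpoint).
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sat outside in the sun.",
    "A feline relaxed in the sunshine.",
    "Quarterly earnings rose by three percent.",
]

# Encode each sentence into a dense embedding vector.
embeddings = model.encode(sentences, convert_to_tensor=True)

# Pairwise cosine similarity: higher scores mean closer meaning.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```

Note that the first two sentences share almost no words, yet a good model scores them as highly similar, which is exactly what lets search operate on concepts rather than specific words.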
In this article, we will explore how these embeddings have been adapted and applied to a range of semantic similarity applications by using a new breed of transformers called 'sentence transformers'.

Before we dive into sentence transformers, it might help to piece together why transformer embeddings are so much richer, and where the difference lies between a vanilla transformer and a sentence transformer.

Transformers are indirect descendants of the previous RNN models. These old recurrent models were typically built from many recurrent units like LSTMs or GRUs. In machine translation, we would find encoder-decoder networks built from these units.
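As a point of reference for that older architecture, here is a minimal sketch of a recurrent encoder-decoder, written in PyTorch as an assumed illustration of the general pattern rather than code from the article. The LSTM encoder compresses the source sentence into a fixed-size context, and the LSTM decoder generates the target sentence from that context.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    # Hypothetical sizes for illustration; real systems tune these.
    def __init__(self, src_vocab, tgt_vocab, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sequence; (h, c) is the fixed-size context.
        _, (h, c) = self.encoder(self.src_emb(src_ids))
        # Decode the target sequence conditioned on that context.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), (h, c))
        return self.out(dec_out)  # per-token scores over the target vocab

model = EncoderDecoder(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))  # batch of 2 source sentences
tgt = torch.randint(0, 1000, (2, 9))  # batch of 2 target sentences
logits = model(src, tgt)              # shape: (2, 9, 1000)
```

The key limitation the article alludes to is visible here: the entire source sentence must squeeze through the single fixed-size (h, c) context, which is part of why coherence over larger chunks of text was so hard for these models.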