This article explains what word embeddings are and how they are used to find information in documents.
We use word embeddings to help computers better understand words and their meanings. Think of it like this: word embeddings put words into a special numeric code that computers can work with.
In the game “Snakes and Ladders,” the board has numbered squares, and you roll the dice to know how many squares to move. The numbers help you keep track of the game’s progress and ultimately determine the winner.
Just as numbers drive the game, computers use numbers to understand words and their relationships with each other.
This opens up many possibilities in natural language processing. Common NLP tasks, such as language translation and information retrieval, are made possible by word embeddings.
Put simply, word embedding is a natural language processing (NLP) technique used to represent words as vectors of numbers. This representation allows computers to understand words and perform various language-related tasks.
Now that you know the definition of word embeddings, let’s dive deeper into the technical aspect of this concept.
Word embeddings are dense, continuous-valued vectors representing words in a high-dimensional space. The vectors come from a model trained on large corpora of text, designed to capture the semantic meaning of words, including their relationships with other words.
They are typically learned from large amounts of text data using algorithms such as word2vec, GloVe, or fastText. We can also use Sentence Transformers to generate embeddings; an example is “sentence-transformers/all-MiniLM-L6-v2.”
Consider the following words – “king,” “queen,” “man,” and “woman,” with each word represented as a vector in a high-dimensional space.
For simplicity, let’s say that a word embedding model might assign these words the following four-dimensional vectors (the same toy values used in the calculations below):

“king”: [0.1, 0.2, 0.3, 0.4]
“queen”: [0.2, 0.3, 0.4, 0.5]
“man”: [0.3, 0.4, 0.5, 0.6]

(“woman” would be assigned a similar vector of its own.)
Each dimension of a vector represents a different semantic aspect of the word. For example, one dimension might capture gender, another might capture social status, and so on.
In the next section, we will see how to measure the similarity between embeddings using similarity functions.
Let’s calculate a similarity score for the words above. Cosine similarity is a common choice for this.
What is Cosine Similarity?
Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space, which measures the cosine of the angle between them. In NLP, cosine similarity is often used to compare the similarity of documents.
Mathematically, the cosine similarity between vectors A and B is the dot product of the two vectors divided by the product of their Euclidean lengths (or magnitudes). The formula for cosine similarity is expressed as:

cosine_similarity(A, B) = (A · B) / (‖A‖ × ‖B‖)
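As a quick illustration, the formula can be written in plain Python; the vectors below are the toy “king” and “queen” embeddings from the example:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: (A . B) / (||A|| * ||B||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings for "king" and "queen" from the example
king = [0.1, 0.2, 0.3, 0.4]
queen = [0.2, 0.3, 0.4, 0.5]
print(round(cosine_similarity(king, queen), 2))  # → 0.99
```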
Let’s calculate the cosine similarity between “king” and “queen”:
cosine_similarity([0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5]) = 0.99
A cosine similarity of 0.99 indicates a high degree of similarity between the words “king” and “queen.”
Let’s calculate the cosine similarity between “king” and “man”:
cosine_similarity([0.1, 0.2, 0.3, 0.4], [0.3, 0.4, 0.5, 0.6]) = 0.98
A cosine similarity of 0.98 is still high, though slightly lower than the score for “king” and “queen.” (These toy vectors all point in similar directions; real embeddings would show much larger differences between unrelated words.)
With these similarity scores, we can quantify the meanings of words and their relationships with each other mathematically.
Now that we understand word embeddings and the concept of similarity, we can discuss their application in retrieving information (search and question answering).
For simplicity, here is a basic example of how word embeddings help find information in documents:
Say I ask the question: “What is the capital of France?”
A question-answering system might first convert this question into vector representations (one per word), and then aggregate those word embeddings into a single vector representing the whole question.
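This aggregation step can be sketched with simple mean pooling. The tiny four-dimensional word vectors below are invented for illustration; a real system would look them up in a trained embedding model:

```python
# Mean-pooling word vectors into one question vector.
# These 4-dimensional vectors are made up for illustration only.
word_vectors = {
    "what":    [0.9, 0.1, 0.0, 0.2],
    "is":      [0.8, 0.2, 0.1, 0.1],
    "the":     [0.7, 0.1, 0.1, 0.0],
    "capital": [0.1, 0.9, 0.3, 0.2],
    "of":      [0.6, 0.2, 0.1, 0.1],
    "france":  [0.2, 0.3, 0.9, 0.1],
}

def question_vector(question):
    """Average the word vectors of the tokens in the question."""
    tokens = question.lower().rstrip("?").split()
    vectors = [word_vectors[t] for t in tokens]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

qv = question_vector("What is the capital of France?")
```

Mean pooling is the simplest aggregation scheme; weighted variants (for example, TF-IDF weighting) are also common.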
The question is now represented as a single numeric vector, just like the word vectors shown earlier.
The QA system would search the corpus of text for relevant information, such as the following passage:
“The capital of France is Paris, which is known for its iconic landmarks such as the Eiffel Tower and the Louvre Museum. Paris, often referred to as the ‘City of Love,’ is not only the capital of France but also one of the most popular tourist destinations in the world.”
Note that this corpus of text also goes through the same embedding process as the question above.
The system would then calculate the similarity between the question vector and the text passage, using a similarity measure such as cosine similarity that we described above:
similarity = cosine(question_vector, passage_vector)
If the similarity is high, the system would extract the answer from the passage, in this case, “Paris”, and return it as the answer to the question.
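Putting the pieces together, the matching step looks roughly like this. The vectors are hypothetical stand-ins for embeddings of the question and the candidate passages, produced by the same model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical precomputed vectors; a real system would produce these by
# embedding the question and each passage with the same model.
question_vec = [0.2, 0.8, 0.6, 0.1]
passages = {
    "The capital of France is Paris ...":     [0.25, 0.75, 0.55, 0.15],
    "Snakes and Ladders is a board game ...": [0.9, 0.1, 0.2, 0.8],
    "Word embeddings represent words as ...": [0.4, 0.3, 0.2, 0.6],
}

# Pick the passage whose vector is most similar to the question vector
best = max(passages, key=lambda p: cosine(question_vec, passages[p]))
print(best)  # → The capital of France is Paris ...
```

The answer-extraction step (pulling “Paris” out of the best passage) would then run on the selected text.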
It is more common these days to use sentence embeddings in information retrieval, but for this article, we considered word embeddings in IR. (Watch out for our next article on Sentence Embeddings for IR).
With word embeddings, each word in a sentence has its own embedding, and an aggregation step (often simple averaging) combines them into one sentence embedding. Dedicated sentence-embedding models generally improve on this averaging technique for many tasks.
A popular framework for generating sentence embeddings is the Sentence Transformers framework. These embeddings can be compared using cosine similarity to find sentences with similar meanings. By calculating the similarity between a query and the sentences in a document, we can determine which documents are most relevant to the query and return them to the user.
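That ranking step can be sketched as follows. The embeddings here are hypothetical placeholders; in practice they would come from a model such as “sentence-transformers/all-MiniLM-L6-v2” via the Sentence Transformers library:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical sentence embeddings; in practice, produce them with something like
# SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(sentences)
query_vec = [0.1, 0.9, 0.4]
doc_vecs = {
    "doc_a": [0.2, 0.8, 0.5],
    "doc_b": [0.9, 0.2, 0.1],
    "doc_c": [0.3, 0.7, 0.2],
}

# Rank documents from most to least similar to the query
ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
print(ranked)  # → ['doc_a', 'doc_c', 'doc_b']
```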
The takeaway from this article is the understanding of word embeddings and their application in retrieving information.
To easily access information, we need all the data from multiple knowledge management systems (data sources) in one place; we call this “Step One” here at Mantium.
This fragmentation is a problem for most organizations: employees might have to search several databases and review several systems to find information.
You can read our article here, where we called out the challenges of finding information.
At Mantium, we recognize how hard it can be for employees to find information quickly within their organization, and we are working on a solution.
Mantium’s solution aims to streamline internal communication and decision-making by consolidating data into a single location, eliminating the need for employees to navigate multiple systems to access information. This can lead to increased productivity and help businesses fully leverage their internal knowledge.