What is cosine similarity and how is it used in RAG retrieval?
Cosine similarity is a metric used to measure how similar two non-zero vectors are. In the context of Retrieval-Augmented Generation (RAG), it plays a crucial role in the retrieval phase by helping to identify and rank document chunks that are most semantically similar to a given user query.
What is Cosine Similarity?
Cosine similarity measures the cosine of the angle between two non-zero vectors in a multi-dimensional space. It determines similarity by comparing the orientation of the vectors, not their magnitude. The output value ranges from -1 to 1, where:
- 1 indicates that the vectors are identical in direction (maximum similarity).
- 0 indicates that the vectors are orthogonal, meaning there is no directional relationship between them (no similarity).
- -1 indicates that the vectors are diametrically opposed in direction (maximum dissimilarity).
Mathematically, it is calculated as the dot product of the vectors divided by the product of their magnitudes: cos(θ) = (A · B) / (||A|| ||B||). This normalization makes the score invariant to vector length, so it reflects only the angle between the vectors.
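The formula above can be implemented directly in a few lines of NumPy; this minimal sketch uses hand-picked 2-D vectors so the expected scores are obvious:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])  # orthogonal to a -> score 0
c = np.array([2.0, 0.0])  # same direction as a, larger magnitude -> score 1

print(cosine_similarity(a, b))  # 0.0
print(cosine_similarity(a, c))  # 1.0 (magnitude is ignored)
```

Note that scaling a vector (c is just 2·a) leaves the score unchanged, which is exactly the length-invariance described above.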
How is Cosine Similarity Used in RAG Retrieval?
In a RAG system, the goal of the retrieval component is to find relevant pieces of information from a knowledge base that can help an LLM answer a user's query. Cosine similarity is widely used for this purpose, typically with vector embeddings.
Steps in RAG Retrieval with Cosine Similarity:
- Embedding Generation: Both the user's query and all document chunks (from the knowledge base) are converted into dense numerical vectors, called embeddings. This is typically done using pre-trained deep learning models (e.g., BERT, Sentence-BERT, OpenAI Embeddings). These embeddings capture the semantic meaning of the text.
- Similarity Calculation: When a user submits a query, its embedding is compared against the embeddings of all pre-indexed document chunks. Cosine similarity is computed for each query-chunk pair.
- Ranking and Selection: The document chunks are then ranked by their cosine similarity scores to the query. Chunks with higher scores are considered more semantically relevant. A predefined number of top-ranking chunks are selected and passed to the large language model (LLM) as context.
- Contextual Augmentation: The selected relevant chunks, along with the original user query, form the augmented prompt that is fed to the LLM. This provides the LLM with up-to-date, factual, and specific information to generate a more accurate and grounded response.
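The steps above can be sketched end to end. The embed() function here is a stand-in (a seeded random unit vector, not a real model); in practice it would call an embedding model such as Sentence-BERT or an embeddings API:

```python
import numpy as np

# Stand-in embedding function: deterministic per text within a run,
# but NOT semantically meaningful -- a real system would use a model.
def embed(text: str, dim: int = 8) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)  # unit-normalize up front

# Step 1: embed all document chunks once, at indexing time
chunks = [
    "Cosine similarity compares vector directions.",
    "Databases store rows in tables.",
    "Embeddings map text to dense vectors.",
]
chunk_vecs = np.stack([embed(c) for c in chunks])

# Step 2: embed the query, score it against every chunk.
# With unit vectors, the dot product IS the cosine similarity.
query_vec = embed("How does cosine similarity work?")
scores = chunk_vecs @ query_vec

# Step 3: rank chunks and keep the top N as context
top_k = np.argsort(scores)[::-1][:2]

# Step 4: build the augmented prompt for the LLM
context = "\n".join(chunks[i] for i in top_k)
```

Pre-normalizing the embeddings is a common optimization: it turns every similarity computation into a plain dot product, which vector databases can execute very efficiently.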
By using cosine similarity on embeddings, RAG systems can effectively identify document segments that are semantically related to a query, even if they don't share exact keywords. This enables more intelligent and context-aware retrieval than traditional keyword-based search.
Conceptual Example (Python)
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Example embeddings (simplified, 4-dimensional for illustration)
query_embedding = np.array([[0.8, 0.2, 0.1, 0.9]])
doc_chunk_embeddings = np.array([
    [0.7, 0.3, 0.15, 0.85],   # chunk 1: very similar
    [-0.1, 0.9, 0.2, -0.7],   # chunk 2: dissimilar
    [0.9, 0.1, 0.05, 0.92],   # chunk 3: highly similar
])

# One call scores the query against every chunk at once
similarities = cosine_similarity(query_embedding, doc_chunk_embeddings)[0]
for i, score in enumerate(similarities, start=1):
    print(f"Similarity (Query vs Chunk {i}): {score:.4f}")

# In a RAG system, chunks would be sorted by similarity
# and the top N taken.
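That final sorting step can be sketched with NumPy's argsort; the scores below are made-up numbers standing in for real query-chunk similarities:

```python
import numpy as np

# Hypothetical similarity scores for five chunks (illustrative values)
similarities = np.array([0.91, 0.12, 0.78, 0.35, 0.88])
top_n = 3

# argsort is ascending; reverse it, then keep the first top_n indices
ranked = np.argsort(similarities)[::-1][:top_n]
print(ranked)  # [0 4 2] -- the three most similar chunks
```

For large knowledge bases, exact sorting over every chunk is replaced by approximate nearest-neighbor indexes, but the ranking logic is the same.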