Securing Vector Databases: Introducing Chirps and Knowledge Vaults – Mantium

By Ryan Sevey

July 27, 2023   ·   8 min read


As technology continues to evolve, one development is having a significant impact on how we store, analyze, and interpret high-dimensional data: vector databases (VectorDBs). These databases are designed to store and search vectors efficiently. In practice, those vectors are usually embeddings, numeric representations of unstructured content such as audio, text, and images, which makes VectorDBs useful across a wide range of industry applications.

Vector databases have found their niche in a multitude of fields, ranging from recommendation engines and artificial intelligence (AI) and machine learning (ML) platforms to bioinformatics and autonomous vehicles. Their ability to retrieve high-dimensional vector data quickly and with high precision is the primary reason behind their increasing adoption.

Prominent players in the field of vector databases such as Redis Vector Similarity Search (VSS), Pinecone, Milvus, and Qdrant are setting new benchmarks in the industry. These platforms offer enhanced capabilities for handling vector data, providing developers and data scientists the tools they need to generate more sophisticated insights from complex, multidimensional data.

The primary catalyst for vector database adoption has been Retrieval-Augmented Generation (RAG). RAG is a natural language processing technique in which users bring their own information, the system retrieves the pieces most relevant to a query, and the language model uses them as context when generating its answer. It is a game-changer, enabling AI systems to deliver more accurate and contextual answers to user queries. RAG relies heavily on vector databases to store and retrieve the high-dimensional data needed for the augmentation step.

However, while the growth and importance of vector databases are undeniable, they do come with their fair share of information security challenges. As these databases become increasingly ingrained in our systems, understanding and tackling these challenges will be crucial to ensuring the secure usage of VectorDBs. As we delve deeper into this topic in the forthcoming sections, we will explore these challenges and potential solutions in greater detail.

Current State

Turning to the current state of affairs, LangChain, an open-source framework for building applications on top of large language models, stands out. Using LangChain, an individual can load data from various sources, including Software-as-a-Service (SaaS) applications like Notion. A Notion workspace holds a wealth of data that can be fed into the vector database and then used by the language model to generate more contextual and precise responses.

To put this into perspective, let’s consider a scenario where a user wants to load their Notion data into a vector database. They would use LangChain to import their files, notes, tasks, and other content. The imported content is converted into high-dimensional vectors (embeddings) and stored, ready for retrieval whenever the AI system needs it to answer a query.
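To make the import-and-retrieve flow concrete, here is a minimal, self-contained sketch in Python. It does not use LangChain or any real vector database: the `toy_embed` function, the `ToyVectorStore` class, and the sample pages are hypothetical stand-ins, and the word-count "embedding" is only a placeholder for a real embedding model.

```python
import math
from dataclasses import dataclass


def build_vocab(texts):
    """Map every distinct lowercase token to a vector index."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def toy_embed(text, vocab):
    """Stand-in for an embedding model: L2-normalized word counts."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:  # words outside the vocabulary are ignored
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class Record:
    source: str
    text: str
    vector: list


class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self, vocab):
        self.vocab = vocab
        self.records = []

    def add(self, source, text):
        self.records.append(Record(source, text, toy_embed(text, self.vocab)))

    def search(self, query, k=1):
        q = toy_embed(query, self.vocab)
        # Vectors are unit length, so the dot product is the cosine similarity.
        ranked = sorted(
            self.records,
            key=lambda r: sum(a * b for a, b in zip(q, r.vector)),
            reverse=True,
        )
        return ranked[:k]


# Hypothetical pages standing in for a Notion export.
pages = {
    "roadmap": "Q3 roadmap ship the search feature and the billing redesign",
    "onboarding": "New hire onboarding checklist laptop accounts and badge",
    "payroll": "Payroll export with employee salary and bank account numbers",
}

vocab = build_vocab(pages.values())
store = ToyVectorStore(vocab)
for name, body in pages.items():
    store.add(name, body)

top = store.search("employee salary", k=1)[0]
print(top.source, "->", top.text)
```

Note that the payroll page was imported along with everything else, and a simple query is enough to surface it.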

The Problem

While this seems efficient and straightforward, a rather concerning question arises: what exactly does the user have stored in their vector database?

  1. Do you know everything you have access to? – Not all users are fully aware of the extent of the data they have access to within their SaaS applications. In our scenario, the user might unknowingly import sensitive or private information into the vector database.
  2. Are you sure everything now in the VectorDB is okay to share? – The import process might involve data that the user is not supposed to share or expose due to privacy or security constraints. Yet, this data now resides within the vector database, potentially available to any system or person with access to the database.
  3. Are you sure you know what’s now in your VectorDB? – Given the high-dimensional nature of vector data and the significant volumes of data we deal with today, it’s challenging for users to keep track of all the data stored in the VectorDB. This obscurity can lead to security breaches, loss of privacy, and potential misuse of data.

These challenges pose serious information security concerns that demand immediate attention and solutions. In the next sections, we’ll look at potential strategies for addressing these concerns, ensuring the secure and responsible use of vector databases.

Chirps – The Hero

Enter Chirps, an innovative solution to the information security concerns around vector databases. This open-source project is designed to help users maintain transparency and control over their data stored in vector databases.

Chirps offers a simple but effective way to deal with the problem of unintended and unnoticed sensitive data inclusion in vector databases. It provides an interface for users to select their vector database and thoroughly scan it for any sensitive or private data.

The primary function of Chirps is to identify and flag potential information security risks within vector databases. It works by cross-referencing the data in the vector database with a set of predefined sensitive data patterns. This includes personally identifiable information, confidential company data, secure access tokens, and more.
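The idea of matching stored text against predefined patterns can be sketched in a few lines of Python. This is an illustration of the general technique only, not Chirps' actual implementation: the rule names, the three regular expressions, and the sample chunks are all assumptions made for the example.

```python
import re

# Illustrative rules only; a real scanner would need a broader, tuned rule set.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_chunks(chunks):
    """Return (chunk_id, rule_name, matched_text) for every pattern hit."""
    findings = []
    for chunk_id, text in chunks.items():
        for rule, pattern in SENSITIVE_PATTERNS.items():
            for match in pattern.finditer(text):
                findings.append((chunk_id, rule, match.group()))
    return findings


# Hypothetical text payloads stored alongside the vectors in an index.
chunks = {
    "doc-1": "Contact jane.doe@example.com about the renewal.",
    "doc-2": "Quarterly planning notes, nothing sensitive here.",
    "doc-3": "Temp credentials: AKIAABCDEFGHIJKLMNOP, SSN 123-45-6789.",
}

findings = scan_chunks(chunks)
for chunk_id, rule, matched in findings:
    print(f"{chunk_id}: {rule} -> {matched}")
```

Pattern matching of this kind works on the text stored next to each vector, not on the vectors themselves, which is why a scan has to read the stored payloads back out of the database.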

By enabling users to actively scan their vector databases for sensitive data, Chirps brings a new level of transparency and control over the data stored in vector databases. It also helps users to maintain their compliance with data protection laws and regulations.

With Chirps, users now have a powerful tool to ensure they’re not unwittingly sharing sensitive data, giving them the confidence to leverage the power of vector databases without compromising their information security. In the next sections, we will explore the specifics of how Chirps works and how to effectively use it for your data security needs.


Discovering sensitive data in your vector database with Chirps is just the first step. The real question is – what do you do next?

While uncovering sensitive data is cause for concern, it’s essential to note that such information isn’t necessarily misplaced. In many professions, access to sensitive or confidential information is a necessary part of the job. For example, an insurance underwriter may need access to certain health data to accurately assess risk and issue life insurance policies.

The challenge is to protect sensitive information without hindering necessary access to it. This is where Mantium’s Knowledge Vaults come into play.

Knowledge Vaults is a solution that provides fine-grained access control over vector databases. It enables organizations to regulate access to both entire indexes and specific pieces of information within an index. This way, it ensures that sensitive information in vector databases is only accessible to authorized individuals.

Consider a scenario where a vector database holds stock information from Carta. The CEO, who needs an overview of the company’s stocks, should have access to all stock information. However, an employee might only need to access information specific to their shares. Knowledge Vaults can manage this, allowing for differential access levels based on roles and responsibilities.
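The CEO-versus-employee scenario can be sketched as a filter applied to search results before they reach the language model. This is a hedged illustration of record-level access control in general, not how Knowledge Vaults is implemented: the `User` and `StockRecord` types, the role names, and the policy in `can_read` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    role: str  # hypothetical roles: "ceo" or "employee"


@dataclass(frozen=True)
class StockRecord:
    owner: str  # whose shares the record describes
    text: str   # payload stored next to the vector


def can_read(user, record):
    """Assumed policy: the CEO reads everything, employees only their own records."""
    if user.role == "ceo":
        return True
    return record.owner == user.name


def authorized_results(user, candidates):
    """Drop records the user may not see before passing results to the model."""
    return [r for r in candidates if can_read(user, r)]


# Hypothetical similarity-search results for the query "how many options are held?"
candidates = [
    StockRecord("alice", "Alice holds 1,200 options"),
    StockRecord("bob", "Bob holds 800 options"),
]

ceo = User("carol", "ceo")
bob = User("bob", "employee")

print([r.owner for r in authorized_results(ceo, candidates)])
print([r.owner for r in authorized_results(bob, candidates)])
```

Filtering after retrieval is the simplest design to show. A production system would more likely push the same rule into the database query itself, so that unauthorized records are never returned in the first place.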

Therefore, when Chirps identifies sensitive data, Knowledge Vaults offers a way to manage this data effectively, balancing the need for access with the need for security. This approach ensures that sensitive information is handled responsibly, paving the way for the secure use of vector databases. 


As we navigate the complexities of vector databases and the inherent information security challenges, it’s clear that responsible management of sensitive data is not merely optional but an absolute necessity. The increasing adoption of VectorDBs, fuelled by the powerful capabilities of Retrieval-Augmented Generation, calls for robust solutions like Chirps and Knowledge Vaults to address the pressing security concerns.

Chirps serves as a vigilant guard, scanning vector databases to identify sensitive data so that users maintain transparency and control over what they have stored. Knowledge Vaults then takes the baton from Chirps, providing a way to manage the flagged data effectively and striking a balance between necessary access and security.

To learn more about Knowledge Vaults and how it can bolster your vector database security strategy, we encourage you to visit MantiumAI’s website. Explore how Knowledge Vaults can be tailored to fit your specific needs, ensuring that your vector databases remain a secure asset in your data management arsenal.

For those interested in exploring Chirps and possibly contributing to the open-source project, head over to its GitHub page. Participate in the ongoing development and enhancement of this crucial tool for managing the information security of vector databases.

In conclusion, while the road to secure vector database usage might seem challenging, tools like Chirps and Knowledge Vaults have paved the way towards a more secure future. Let’s continue to innovate responsibly, ensuring the benefits of these powerful technologies are harnessed without compromising information security.


Ryan Sevey
CEO & Founder Mantium
