You've built a search feature for your internal knowledge base. Users type in questions, the system scans document titles and keywords, and returns results that are technically correct but practically useless. Someone asks "what's our policy on remote work flexibility?" and gets back a document titled "HR Policy 2023 - Section 4B" rather than the paragraph that actually answers the question. The problem isn't your search logic. The problem is that SQL and traditional keyword search were never designed to understand meaning - only to match text.
This is the gap that vector databases fill, and it's a gap that matters enormously once you start building AI applications at any serious scale.
What a Vector Database Actually Does
A vector database stores data as high-dimensional numerical arrays called embeddings. When you pass a piece of text - or an image, audio clip, or any other content - through an embedding model, that model converts it into a vector: a list of numbers, typically with hundreds or thousands of dimensions, that represents the semantic meaning of that content.
Two sentences that mean the same thing but use different words will produce vectors that sit close together in this high-dimensional space. Two sentences that look similar on the surface but mean different things will sit further apart. This is fundamentally different from keyword matching, where "remote work flexibility" and "working from home arrangements" would be treated as completely unrelated queries.
When you run a similarity search, the database compares your query vector against the stored vectors and returns the closest matches. The most common metric is cosine similarity, though Euclidean distance and dot product are also used, depending on the embedding model and use case.
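A minimal sketch of these metrics in plain Python. The vectors here are invented for illustration - real embeddings have hundreds or thousands of dimensions:

```python
import math

def dot(a, b):
    # Dot product: the raw sum of element-wise products.
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    # Straight-line distance; smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Angle-based similarity in [-1, 1]; 1 means identical direction.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy 4-dimensional vectors.
query = [0.82, -0.14, 0.67, 0.31]
doc_a = [0.80, -0.10, 0.70, 0.30]   # nearly the same direction as the query
doc_b = [-0.50, 0.90, -0.20, 0.10]  # pointing somewhere else entirely

print(cosine_similarity(query, doc_a))  # close to 1.0
print(cosine_similarity(query, doc_b))  # much lower
```

Which metric to use usually follows from the embedding model: many models produce normalised vectors, in which case cosine similarity and dot product rank results identically.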
A practical example: an embedding for "car" might look like this in simplified form:
```
[0.82, -0.14, 0.67, 0.31, ...]  # 1536 dimensions in OpenAI's text-embedding-ada-002
```
The actual numbers aren't interpretable by humans - they encode relationships learned during model training. What matters is that "car" and "vehicle" will have vectors that are geometrically close, while "car" and "carrot" will not, despite sharing three letters.
Why Traditional Databases Fall Short for AI Workloads
SQL databases are excellent at what they were designed to do: storing structured data, enforcing relationships, and running exact-match or range queries. If you want every customer whose account was created after 1 January 2024, SQL is the right tool. If you want to find documents that are conceptually similar to a user's query, SQL becomes awkward at best and unusable at worst.
You can store vectors as JSON arrays or binary columns in PostgreSQL, but running similarity searches across millions of rows that way means calculating distances row by row. At small scale, this works. At production scale with real-time latency requirements, it falls apart quickly.
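That row-by-row approach looks something like the following sketch - a pure-Python linear scan standing in for what a relational database would effectively do (the data is illustrative):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def brute_force_search(query, rows, k=3):
    # Score every stored vector against the query - O(n * d) work per query.
    # This is exactly the cost that specialised vector indexes exist to avoid.
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in rows]
    scored.sort(reverse=True)
    return scored[:k]

rows = [
    ("doc-1", [0.9, 0.1, 0.0]),
    ("doc-2", [0.1, 0.9, 0.1]),
    ("doc-3", [0.8, 0.2, 0.1]),
]
print(brute_force_search([1.0, 0.0, 0.0], rows, k=2))
```

With three rows this is instant; with fifty million rows and a latency budget of tens of milliseconds, it isn't.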
Vector databases solve this with specialised indexing algorithms - most commonly HNSW (Hierarchical Navigable Small World) or IVF (Inverted File Index) - that allow approximate nearest-neighbour searches across billions of vectors in milliseconds. These indexes trade a small amount of accuracy (you might miss 1-2% of the truly closest vectors) for enormous gains in query speed.
This is the technical foundation that makes retrieval-augmented generation (RAG) practical. When your language model needs to answer a question using your organisation's internal documents, it can't load all those documents into its context window. Instead, you embed the documents, store them in a vector database, embed the user's query, and retrieve only the most relevant chunks - typically the top 5-20 - to include in the prompt.
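The retrieval half of that RAG loop can be sketched in a few lines. The `embed` function below is a crude word-hashing stand-in for a real embedding model, and the corpus is invented - the point is the shape of the flow, not the quality of the embeddings:

```python
import math

def embed(text, dim=32):
    # Stand-in for a real embedding model: hash each word into a bucket.
    # A production system would call an actual embedding model here.
    vec = [0.0] * dim
    for word in text.lower().split():
        word = word.strip(".,?!")
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, store, k=2):
    # Embed the query with the SAME model used at indexing time,
    # then keep only the k most similar chunks.
    qv = embed(query)
    ranked = sorted(store, key=lambda c: cosine(qv, c["vector"]), reverse=True)
    return ranked[:k]

def build_prompt(question, retrieved):
    # Only the retrieved chunks go into the prompt, never the whole corpus.
    context = "\n\n".join(c["text"] for c in retrieved)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"

chunks = [
    "Remote work requires manager approval.",
    "Annual leave accrues monthly.",
    "Printers are on level three.",
]
store = [{"text": t, "vector": embed(t)} for t in chunks]
top = retrieve("What is the remote work policy?", store, k=2)
print(build_prompt("What is the remote work policy?", top))
```

Swapping the stub `embed` for a real embedding model and the in-memory list for a vector database gives you the skeleton of a production RAG retriever.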
Choosing the Right Vector Database for Your Use Case
The vector database market has expanded rapidly, and the options vary significantly in their architecture, operational model, and suitability for different workloads.
Pinecone is a fully managed cloud service that's easy to get started with and handles infrastructure entirely on your behalf. It's a reasonable choice for teams that want to move quickly and don't have strong opinions about where their data lives. The trade-off is cost at scale and less control over the underlying infrastructure.
Weaviate is open-source and can be self-hosted or used as a managed service. It supports hybrid search - combining vector similarity with traditional keyword filtering - which is useful when you need to restrict results by metadata (for example, only searching documents from a specific department or date range).
Qdrant is another open-source option with a strong focus on performance and filtering capabilities. It's written in Rust, which gives it a lean resource footprint, and it handles complex payload filtering efficiently.
pgvector is a PostgreSQL extension that adds vector storage and similarity search to an existing Postgres database. If your organisation already runs Postgres and your vector workload is moderate - say, under 10 million vectors with non-critical latency requirements - pgvector lets you avoid introducing a new system entirely. It's worth considering before committing to a dedicated vector database.
Chroma is lightweight and designed primarily for development and smaller production workloads. It's a common starting point for prototyping RAG applications.
The right choice depends on your data volume, latency requirements, existing infrastructure, data residency obligations, and whether you need hybrid search. Australian organisations with strict data sovereignty requirements should pay particular attention to where managed services physically host data.
A Practical Scenario: Internal Knowledge Search at a Professional Services Firm
Consider a mid-sized accounting firm with 12 years of client advisory documents, policy manuals, and technical guidance notes stored across SharePoint and a shared drive. Staff regularly spend 20-30 minutes hunting for precedents or checking current guidance before client calls.
The firm implements a RAG-based search tool using the following architecture:
- Document ingestion - PDFs and Word documents are extracted and split into overlapping chunks of roughly 500 tokens each
- Embedding - each chunk is passed through an embedding model (in this case, OpenAI's text-embedding-3-small) to produce a 1536-dimension vector
- Storage - vectors and their associated metadata (document name, date, practice area, author) are stored in a Weaviate instance running on the firm's Azure tenancy
- Query - when a staff member asks a question, the query is embedded using the same model, and the top 8 most similar chunks are retrieved
- Generation - those chunks are passed to a language model with the user's question, and the model synthesises an answer with source citations
The result is that staff can ask "what's the current treatment for trust distributions to adult children under the new rules?" and get a synthesised answer pointing to the specific guidance notes that contain the relevant information - rather than a list of documents to manually search.
The vector database in this scenario isn't doing anything exotic. It's doing one job well: finding semantically relevant content quickly, so the language model has something useful to work with.
Operational Considerations You Shouldn't Ignore
Running vector databases in production introduces some operational realities that are worth planning for before you're dealing with them under pressure.
Embedding consistency matters more than most people expect. If you embed your documents with one model and then switch models, your stored vectors become incompatible with new queries. Migrating means re-embedding your entire corpus. Choose your embedding model carefully and treat it as a dependency you'll live with for a while.
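One cheap safeguard is to record the embedding model's name alongside the stored vectors and fail fast on a mismatch. This is a hypothetical sketch - the class and model names are illustrative, not any particular database's API:

```python
class VectorStore:
    """Toy store that refuses to mix vectors from different embedding models."""

    def __init__(self, embedding_model):
        self.embedding_model = embedding_model  # treat this as a pinned dependency
        self.vectors = []

    def add(self, doc_id, vector, model):
        if model != self.embedding_model:
            raise ValueError(f"store expects {self.embedding_model}, got {model}")
        self.vectors.append((doc_id, vector))

    def query(self, vector, model, k=5):
        if model != self.embedding_model:
            raise ValueError(f"store expects {self.embedding_model}, got {model}")
        # ... similarity search over self.vectors would go here ...
        return self.vectors[:k]

store = VectorStore("text-embedding-3-small")
store.add("doc-1", [0.1, 0.2], model="text-embedding-3-small")
```

Most real vector databases let you attach this as metadata at the collection level; the important thing is that the check exists somewhere, so a model swap surfaces as a loud error rather than silently degraded retrieval.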
Index tuning affects both speed and recall. HNSW indexes have parameters (ef_construction, M) that control the trade-off between build time, memory usage, and query accuracy. The defaults are reasonable starting points, but production workloads often benefit from tuning these based on your specific data distribution and latency targets.
Chunking strategy significantly affects retrieval quality. Splitting documents into chunks that are too small loses context; chunks that are too large dilute relevance signals. Overlapping chunks (where each chunk shares some content with the previous one) helps prevent answers from falling across chunk boundaries. There's no universal right answer here - it requires experimentation with your specific content.
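The overlap idea is simple enough to show directly. A sketch of sliding-window chunking over a token list - the 500/50 numbers are illustrative defaults, and a real pipeline would use the embedding model's own tokenizer:

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    # Split a token list into overlapping chunks; each chunk repeats the
    # last `overlap` tokens of the previous one, so an answer that spans a
    # boundary appears intact in at least one chunk.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Toy "tokens" - integers standing in for real token IDs.
tokens = list(range(1200))
chunks = chunk_tokens(tokens, chunk_size=500, overlap=50)
print([len(c) for c in chunks])  # [500, 500, 300]
```

Chunking on semantic boundaries (headings, paragraphs) rather than raw token counts often works better still, but this fixed-window version is the usual baseline to beat.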
Metadata filtering is often essential. Pure vector similarity search returns the most semantically similar results globally. In practice, you usually need to combine this with metadata filters - restricting results to documents from a certain time period, business unit, or security classification. Make sure your chosen database handles filtered vector search efficiently, not by filtering after retrieval.
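The difference matters because post-filtering can leave you with fewer than k results, or none. A pure-Python sketch of the pre-filter approach (real databases push this into the index itself; the records here are invented):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(query_vec, records, metadata_filter, k=5):
    # Apply the metadata filter FIRST, then rank only the survivors.
    # Filtering after retrieval risks discarding all k global matches.
    candidates = [r for r in records if metadata_filter(r["meta"])]
    candidates.sort(key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return candidates[:k]

records = [
    {"id": 1, "vector": [1.0, 0.0], "meta": {"dept": "tax", "year": 2024}},
    {"id": 2, "vector": [0.9, 0.1], "meta": {"dept": "audit", "year": 2024}},
    {"id": 3, "vector": [0.2, 0.9], "meta": {"dept": "tax", "year": 2022}},
]
hits = filtered_search([1.0, 0.0], records,
                       lambda m: m["dept"] == "tax" and m["year"] >= 2023, k=2)
print([r["id"] for r in hits])  # [1]
```

Note that record 2 is the second-closest vector globally but is correctly excluded by the department filter - the behaviour you want, and the behaviour post-filtering only approximates.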
Monitor for embedding drift. As your documents and queries evolve, the distribution of your data changes. Periodically review whether your retrieval quality is holding up, particularly if you've added significant new content or your users' query patterns have shifted.
What to Do Next
If you're building an AI application that involves retrieving information - whether that's a chatbot, a document search tool, a recommendation system, or something more specialised - start by being honest about whether your current data infrastructure can support semantic search. If you're relying on keyword matching or SQL full-text search, you're likely leaving significant quality improvements on the table.
A practical starting point:
- Prototype with pgvector if you already run Postgres and your dataset is under a few million vectors. It removes one system from your stack while you validate the approach.
- Evaluate Weaviate or Qdrant if you need hybrid search, complex filtering, or anticipate significant scale.
- Audit your chunking and embedding strategy before optimising anything else - retrieval quality problems are usually upstream of the database itself.
- Test retrieval quality explicitly using a set of representative queries with known correct answers before moving to production. Don't rely on subjective impressions.
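The last point - explicit retrieval testing - can start as a simple hit-rate check over queries with known correct documents. A sketch with a stand-in retriever (the document IDs and index are invented; a real version would call your vector database):

```python
def hit_rate_at_k(test_cases, retrieve, k=5):
    # Fraction of test queries whose known-correct document appears in the top k.
    hits = sum(1 for query, expected in test_cases if expected in retrieve(query, k))
    return hits / len(test_cases)

# Stand-in retriever keyed on exact queries.
fake_index = {
    "remote work policy": ["hr-04", "hr-01"],
    "trust distributions": ["tax-17", "tax-02"],
    "printer setup": ["it-99"],
}
def retrieve(query, k):
    return fake_index.get(query, [])[:k]

cases = [
    ("remote work policy", "hr-04"),
    ("trust distributions", "tax-02"),
    ("printer setup", "it-03"),   # deliberately failing case
]
print(hit_rate_at_k(cases, retrieve, k=5))  # 2/3
```

Run this against a few dozen representative queries before and after any change to chunking, embedding model, or index parameters, and regressions stop being a matter of opinion.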
Vector databases aren't a silver bullet, and they introduce real operational complexity. But for AI applications that need to work with unstructured content at scale, they're not optional infrastructure - they're the foundation the rest of the system depends on. Getting this layer right early is substantially cheaper than retrofitting it later.