AI-Powered Internal Search: Replacing Legacy Intranets with Intelligent Retrieval

The Search Problem Nobody Talks About

Your organisation spent six figures on an intranet. It launched with fanfare, a training session, and a SharePoint site nobody bookmarked. Three years later, your staff are still emailing "does anyone know where the procurement policy lives?" to a distribution list of 400 people.

This is not a people problem. It is a retrieval problem.

Legacy intranets were built around folder structures and keyword matching - tools designed for a world where documents were few and staff had time to browse. Neither condition applies today. The average knowledge worker spends roughly 20% of their week searching for information they cannot find, according to McKinsey research. For a team of 50 people, that is the equivalent of 10 full-time employees doing nothing but hunting through SharePoint.

AI-powered internal search changes this equation fundamentally. Not by adding a better search bar, but by changing what "search" means inside an organisation.


What RAG Actually Is (Without the Jargon)

Retrieval-Augmented Generation - RAG - is the technical approach behind most serious AI-powered internal search deployments. The name sounds complex, but the mechanism is straightforward.

When a staff member asks a question, the system does two things in sequence:

  1. Retrieves the most relevant chunks of content from your document corpus - policies, procedures, past proposals, meeting notes, product specs
  2. Generates a direct, coherent answer using a large language model (LLM), grounded in what it just retrieved
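
The two steps above can be sketched in a few lines. The retriever here is a deliberately naive word-overlap ranker, and the answer function just builds the grounding prompt rather than calling a real LLM - both are illustrative stand-ins, not a production API.

```python
# Sketch of the retrieve-then-generate flow. The retriever and the
# "LLM call" are hypothetical stand-ins for illustration only.

def retrieve(query: str, corpus: dict[str, str], top_k: int = 2) -> list[str]:
    """Naive retriever: rank documents by shared words with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda kv: len(q_words & set(kv[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:top_k]]

def answer(query: str, chunks: list[str]) -> str:
    """Stand-in for the LLM call: the prompt grounds the model in the
    retrieved chunks only."""
    context = "\n---\n".join(chunks)
    return (
        "Answer using ONLY the context below. Cite the source.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

corpus = {
    "procurement_policy": "Purchases above $10k require two quotes.",
    "leave_policy": "Annual leave accrues at 4 weeks per year.",
}
chunks = retrieve("what do purchases above $10k require?", corpus)
print(chunks[0])  # the procurement policy text ranks first
```

A production retriever uses vector similarity rather than word overlap (covered below), but the two-stage shape is identical.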

The critical difference from a standard LLM chat like ChatGPT is that the model is not answering from its training data alone. It is reading your documents at query time and synthesising an answer from them. This means the system can answer questions about your specific context - your contracts, your internal processes, your client history - without any of that information needing to be baked into the model itself.

A concrete example: a project manager at an engineering firm asks "what are our standard liability clauses for infrastructure contracts above $2M?" A keyword search returns 47 documents. A RAG system reads the relevant contract templates and procurement policy, then responds: "For infrastructure contracts above $2M, your standard terms require a minimum 10-year liability period and professional indemnity cover of at least $5M. See sections 4.2 and 7.1 of the Master Services Agreement template." It also cites the source documents so the user can verify.

That is the difference between retrieval and intelligent retrieval.


Why Legacy Intranets Fail at This

Traditional intranet search relies on keyword indexing. The system looks for documents containing the words you typed. This works adequately when you know exactly what you are looking for and how it is labelled. It fails in three common situations:

  • Conceptual queries - "what is our approach to subcontractor risk?" does not match a document titled "Third Party Vendor Management Policy v3_FINAL_revised.docx"
  • Cross-document synthesis - answering a question that requires reading three different policy documents and combining the relevant parts
  • Natural language questions - staff ask questions the way they think, not the way documents are titled

Beyond search mechanics, legacy intranets suffer from structural problems that compound over time. Content governance decays. Duplicate documents accumulate. Version control breaks down. A keyword search that surfaces the wrong version of a policy is worse than no search at all.

AI-powered internal search does not fix governance problems automatically - stale data produces stale answers - but it is significantly more tolerant of messy naming conventions and inconsistent metadata because it reads content semantically, not literally.


Building a RAG System for Internal Use: The Practical Components

A production-ready internal RAG deployment involves several components working together. Understanding what each does helps you evaluate vendor claims and make sensible build-versus-buy decisions.

Document Ingestion and Chunking

Your documents need to be converted into a format the system can search efficiently. This involves breaking documents into chunks - typically 200 to 500 tokens each - with enough overlap that context is not lost at boundaries. PDF extraction is often the messiest part of this process, particularly for scanned documents or files with complex tables.
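
A minimal sketch of fixed-size chunking with overlap looks like this. Real pipelines count model tokens with a tokeniser; here whitespace-split words stand in for tokens to keep the example self-contained.

```python
# Fixed-size chunking with overlap. Word lists stand in for token lists.

def chunk(words: list[str], size: int = 400, overlap: int = 50) -> list[list[str]]:
    """Split into windows of `size`, each sharing `overlap` words with
    the previous window so context is not lost at boundaries."""
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break
    return chunks

words = [f"w{i}" for i in range(1000)]
pieces = chunk(words)
print(len(pieces))  # 3 windows: w0-w399, w350-w749, w700-w999
```

Note that the last 50 words of each chunk reappear as the first 50 of the next - that shared region is what preserves sentences that straddle a boundary.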

Vector Embeddings and a Vector Database

Each chunk is converted into a numerical representation (an embedding) that captures its semantic meaning. Similar concepts end up close together in this mathematical space, even if they use different words. These embeddings are stored in a vector database - common options include Pinecone, Weaviate, pgvector (PostgreSQL extension), and Azure AI Search.

When a query comes in, it is also converted to an embedding, and the system retrieves the chunks that are mathematically closest to the query.
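
The "mathematically closest" comparison is usually cosine similarity. The 3-dimensional vectors below are toy embeddings for illustration - a real embedding model produces vectors with hundreds or thousands of dimensions - but the retrieval logic is the same.

```python
# Nearest-chunk retrieval by cosine similarity over toy 3-d embeddings.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

chunk_vectors = {
    "liability clause": [0.9, 0.1, 0.0],
    "leave policy":     [0.0, 0.8, 0.6],
}
query_vec = [0.8, 0.2, 0.1]  # embedding of the user's question

best = max(chunk_vectors, key=lambda k: cosine(query_vec, chunk_vectors[k]))
print(best)  # "liability clause" - closest in direction to the query
```

A vector database does exactly this comparison, but with approximate nearest-neighbour indexes so it scales to millions of chunks.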

The Generation Layer

Retrieved chunks are passed to an LLM along with the original query. The model synthesises a response using only what it has been given. The prompt engineering here matters considerably - it determines whether the model stays grounded in the retrieved content or starts hallucinating.
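
One common way to structure that grounding prompt is shown below. This is an illustrative template, not a canonical one - the key moves are restricting the model to the supplied sources, requiring citations, and giving it explicit permission to say "not found".

```python
# Illustrative grounding prompt. Source labels become citations the
# user can verify.

def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    context = "\n\n".join(f"[{src}]\n{text}" for src, text in chunks)
    return (
        "Answer the question using ONLY the sources below.\n"
        "If the sources do not contain the answer, say so explicitly.\n"
        "Cite sources by their bracketed name.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What is the liability period for contracts above $2M?",
    [("MSA_template_s4.2", "Contracts above $2M carry a 10-year liability period.")],
)
print(prompt)
```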

Access Control Integration

This is non-negotiable for enterprise deployments. Your RAG system must respect existing document permissions. A junior staff member should not be able to query their way into executive remuneration data or confidential client files. This typically means integrating with your existing identity provider (Azure AD, Okta) and filtering retrieved chunks based on the authenticated user's permissions before they reach the LLM.
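
The filtering step can be sketched as follows: each chunk carries the access-control list inherited from its source document, and chunks are dropped before generation unless the user's groups overlap it. Group names and the chunk shape here are assumptions for illustration.

```python
# Permission filtering before generation: a chunk reaches the LLM only
# if the authenticated user's groups intersect the chunk's ACL.
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    allowed_groups: set[str] = field(default_factory=set)

def filter_by_permission(chunks: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    return [c for c in chunks if c.allowed_groups & user_groups]

retrieved = [
    Chunk("Standard leave policy...", {"all-staff"}),
    Chunk("Executive remuneration...", {"exec-committee"}),
]
visible = filter_by_permission(retrieved, {"all-staff", "engineering"})
print(len(visible))  # only the all-staff chunk survives
```

The crucial design point is that filtering happens on the retrieval side, before anything reaches the model - a permission check applied after generation can leak information through the answer itself.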


A Real Deployment Pattern: Professional Services Firm

To make this concrete, consider a mid-sized Australian professional services firm with around 200 staff across three offices. Their knowledge base includes proposal templates, engagement letters, methodology documents, regulatory guidance notes, and roughly eight years of project documentation - approximately 40,000 files across SharePoint and a legacy document management system.

Their specific problems were:

  • New staff taking 3-6 months to become independently productive because they could not find relevant precedents
  • Senior staff spending 2-3 hours per week answering questions that were documented somewhere
  • Inconsistent client deliverables because teams were not finding and reusing existing approved content

The RAG implementation they deployed had the following architecture:

  • Ingestion pipeline built in Python using LangChain, running nightly to pick up new and modified documents
  • Chunking strategy tuned to 400 tokens with 50-token overlap, with document metadata (author, date, document type, client sector) attached to each chunk
  • Vector store using Azure AI Search, chosen because it integrated cleanly with their existing Microsoft infrastructure and handled permission inheritance from SharePoint
  • LLM layer using GPT-4o via Azure OpenAI Service, keeping data within Australian data residency boundaries
  • Front end as a Teams bot, so staff could query without leaving their existing workflow

After three months in production, onboarding time for new staff dropped noticeably, and senior staff reported a significant reduction in interruptions for knowledge queries. The system was not perfect - it struggled with highly formatted Excel-based templates and occasionally retrieved outdated project notes before the governance cleanup was completed - but the productivity gains were clear and measurable within the first quarter.


What Good Evaluation Looks Like

Before deploying, and continuously after, you need a way to measure whether the system is actually working. "It feels better" is not a metric.

Useful evaluation approaches for AI-powered internal search include:

  • Retrieval precision - of the chunks retrieved for a test query, what proportion were actually relevant? You need a human-annotated test set to measure this.
  • Answer faithfulness - does the generated answer accurately reflect what is in the retrieved documents, or is the model adding information not present in the source? Tools like RAGAS can automate parts of this evaluation.
  • Answer relevance - does the response actually address what the user asked?
  • Citation accuracy - are the source documents cited correctly and do they genuinely support the answer?

Build a test set of 50-100 representative queries before you go live. Include edge cases: questions with no good answer in the corpus (the system should say so, not hallucinate), questions that span multiple documents, and questions where the correct answer changed over time and old versions exist.
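
Retrieval precision, the first metric above, is simple to compute once annotators have marked which chunks are relevant for each test query:

```python
# Retrieval precision for one annotated test query: the fraction of
# retrieved chunk IDs that annotators marked as relevant.

def retrieval_precision(retrieved: list[str], relevant: set[str]) -> float:
    if not retrieved:
        return 0.0
    hits = sum(1 for chunk_id in retrieved if chunk_id in relevant)
    return hits / len(retrieved)

# One test case: the system retrieved 4 chunks, annotators marked 3 relevant.
p = retrieval_precision(
    ["policy_v3#c2", "policy_v3#c5", "msa#c1", "old_notes#c9"],
    {"policy_v3#c2", "policy_v3#c5", "msa#c1"},
)
print(p)  # 0.75
```

Averaging this across the full 50-100 query test set gives a single number you can track across prompt changes, model upgrades, and new document batches.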

Re-run this evaluation after any significant change to the system - new document batches, prompt changes, model upgrades.


Common Failure Modes to Avoid

Most RAG deployments that underperform do so for predictable reasons:

  • Garbage in, garbage out - if your document corpus is full of duplicates, outdated content, and poorly structured files, the system will retrieve and synthesise from bad material. A content audit before ingestion is not optional.
  • Chunk size mismatches - chunks that are too small lose context; chunks that are too large dilute relevance. Tune this to your document types.
  • No hybrid search - pure vector search misses exact matches (product codes, names, specific regulatory references). Combining vector search with traditional keyword search (BM25) typically outperforms either alone.
  • Ignoring latency - staff will abandon a tool that takes 8 seconds to respond. Optimise your retrieval pipeline and consider caching for common queries.
  • No feedback loop - without a mechanism for users to flag bad answers, you cannot improve the system over time.
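
The hybrid-search point is commonly implemented with reciprocal rank fusion (RRF): each retriever contributes a ranked list, and documents are scored by their rank positions across lists. The document IDs below are illustrative; k=60 is the constant typically used in RRF.

```python
# Reciprocal rank fusion: merge a vector-search ranking with a keyword
# (BM25-style) ranking into one combined ordering.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["doc_a", "doc_c", "doc_b"]   # semantic matches
keyword_hits = ["doc_b", "doc_a", "doc_d"]   # exact-term matches, e.g. a product code
fused = rrf([vector_hits, keyword_hits])
print(fused[0])  # doc_a: strong in both lists, so it ranks first
```

A document that only one retriever finds (like doc_d, surfaced by an exact product-code match) still makes the fused list, which is exactly the failure mode pure vector search misses.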

What to Do Next

If your organisation is spending meaningful time hunting for information that should be findable in seconds, the path forward is practical rather than complicated.

Start with a scoped audit. Pick one high-value document corpus - your HR policies, your project templates, your technical standards library - and assess its current state. How many documents? How much duplication? How current is the content? This audit tells you both the effort required and the likely return.

Run a proof of concept before committing to production architecture. A RAG prototype over a single SharePoint library can be built in a few days using tools like LangChain or LlamaIndex. It will not be production-ready, but it will tell you whether the retrieval quality is sufficient to justify the investment.

Define your evaluation criteria upfront. Decide what "good" looks like before you build, not after. Time-to-answer, retrieval accuracy, and user adoption are all measurable.

Consider data residency from the start. For Australian organisations handling sensitive data, ensure your chosen LLM provider offers Australian or at minimum Asia-Pacific data residency. Azure OpenAI Service and AWS Bedrock both offer relevant regional options.

If you want to talk through what an AI-powered internal search deployment would look like for your specific environment, the team at Exponential Tech works with Australian organisations on exactly this kind of implementation. We focus on solutions that are operationally sound, not just technically interesting.
