AI Site Navigation Bot
Challenge
Build an AI system that knows the website and company inside and out, and can answer questions or navigate users anywhere on the site.
Framing the Unknowns
At the outset, this problem is underspecified by design.
Key unknowns include:
- What users actually mean when they ask questions.
- Which sources are authoritative versus incidental.
- How much trust users will place in the system.
- Where failures are acceptable versus where escalation is required.
The initial task is to reduce uncertainty before building an AI feature, avoiding premature assumptions. The system must surface gaps, fail safely, and improve iteratively.
1. Objective and Success Definition
Objective
Provide users with a trustworthy interface that can:
- Answer questions about the company and website.
- Guide users to the correct page or resource.
- Reduce friction in discovery and support.
This is a retrieval and navigation system with an AI interface focused on the site, owned and governed like any other critical piece of product infrastructure. The system’s primary responsibility is correctness and safe guidance, with conversational breadth secondary.
Success Criteria
- Users reach the correct destination or answer within 1-2 interactions.
- Responses are grounded in authoritative sources.
- The system clearly signals uncertainty and limitations.
- Measurable reduction in bounce rate or support inquiries.
- Incorrect answers are surfaced and corrected through a defined, owned review process.
2. Primary Risks and Tradeoffs (Before Implementation)
Technical Risks
- Hallucinated authority: Users may trust incorrect answers if uncertainty is not explicit.
- Stale or fragmented knowledge: Websites change faster than AI systems are typically re-indexed.
- Overconfidence in ambiguous queries: Routing users incorrectly is worse than asking a clarifying question.
Organizational Risks
- Misaligned scope creep: A navigation assistant can quietly become an unsupported support channel.
Key Tradeoffs
- Accuracy vs. Coverage: Prefer fewer, correct answers over broad but unreliable responses.
- Latency vs. Transparency: Citations and explanations increase response time but build trust.
- AI Capability vs. Determinism: Use AI for interpretation and synthesis while authoritative sources supply truth.
These risks and tradeoffs drive the feature sequencing that follows.
3. Sprint-Oriented Phased Implementation Plan
Each phase intentionally exposes new unknowns and prepares the system for the next phase. Every phase stands on its own, with later phases earned by evidence.
Phase 1. Establish Ground Truth and Failure Safety
Purpose
Before navigation or intelligence, the system must know what it is allowed to know.
Capabilities
- Index all public website pages.
- Store canonical URLs, update timestamps, and content ownership.
- Retrieval-augmented answers with mandatory citations.
- Explicit “I don’t know” and fallback responses.
- When the bot cannot answer, generate a short, copyable handoff note for the contact form that includes the user’s question, any clarifications asked, and which sources were checked so support receives full context.
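As a sketch of the handoff behavior described above, the note could be assembled from the conversation state. The interface and function names here (`HandoffContext`, `buildHandoffNote`) are illustrative assumptions, not an existing API:

```typescript
// Illustrative sketch: assemble a copyable handoff note for the contact form.
// Names and structure are assumptions, not part of any existing codebase.
interface HandoffContext {
  question: string;         // the user's original question
  clarifications: string[]; // clarifying questions the bot asked
  sourcesChecked: string[]; // canonical URLs consulted before giving up
}

function buildHandoffNote(ctx: HandoffContext): string {
  const lines = [
    `Question: ${ctx.question}`,
    ctx.clarifications.length
      ? `Clarifications asked: ${ctx.clarifications.join("; ")}`
      : "Clarifications asked: none",
    ctx.sourcesChecked.length
      ? `Sources checked: ${ctx.sourcesChecked.join(", ")}`
      : "Sources checked: none",
    "Outcome: no confident answer found; escalating to support.",
  ];
  return lines.join("\n");
}
```

Because the note is plain text, it can be pasted directly into the contact form, so support starts with full context instead of restarting the conversation.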
Stack and packages (Phase 1)
- Next.js App Router UI with streaming.
- Postgres + pgvector (or SQLite + vector extension) accessed via a thin SQL client (pg or equivalent).
- OpenAI SDK (embeddings + GPT-4.1/4o) or local Ollama equivalent.
- Sitemap-driven crawler with node-fetch and cheerio for HTML extraction; cron/scheduled job for recrawls.
- Lightweight logging/metrics via Postgres tables and structured logs (e.g., pino).
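One piece of this pipeline can be sketched without any external dependency: splitting extracted page text into overlapping chunks before embedding. The chunk size and overlap values are illustrative defaults, not tuned recommendations:

```typescript
// Sketch: split extracted page text into overlapping chunks for embedding.
// Size/overlap defaults are assumptions for illustration, not recommendations.
interface Chunk {
  url: string;      // canonical URL of the source page
  position: number; // chunk index within the page
  text: string;
}

function chunkPage(url: string, text: string, size = 800, overlap = 100): Chunk[] {
  const chunks: Chunk[] = [];
  let position = 0;
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push({ url, position: position++, text: text.slice(start, start + size) });
    if (start + size >= text.length) break; // last chunk reached end of page
  }
  return chunks;
}
```

Keeping the canonical URL on every chunk is what makes mandatory citations cheap later: any retrieved chunk already carries its source.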
What This Unlocks
- Confidence that answers are grounded.
- Visibility into content gaps and ambiguity.
- Early signal on where users struggle.
- Repeated unanswered, low-confidence, or escalated questions are logged as signals for future content improvement.
Risks Addressed
- Hallucinations.
- False authority.
- Silent failures.
Phase 2. Introduce Intent Awareness and Basic Guidance
Purpose
Once answers are reliable, begin guiding users without overcommitting.
Capabilities
- Intent classification is advisory; access stays open and content remains visible.
- High-level intent categories such as “learn,” “compare,” “fix,” and “contact.”
- Ranked page suggestions instead of single answers.
- Clarifying questions when confidence is low.
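The clarify-when-uncertain behavior above can be reduced to a small gating function over retrieval confidence. The thresholds here are assumptions; in practice they would be calibrated from logged outcomes:

```typescript
// Sketch: gate the response mode on retrieval confidence.
// Threshold values are illustrative assumptions, to be calibrated from logs.
type ResponseMode = "answer" | "clarify" | "fallback";

function decideResponseMode(topScore: number, margin: number): ResponseMode {
  // margin = gap between the best and second-best candidate scores
  if (topScore >= 0.8 && margin >= 0.1) return "answer"; // confident and unambiguous
  if (topScore >= 0.5) return "clarify";                 // plausible but ambiguous
  return "fallback";                                     // admit "I don't know"
}
```

Note that a high top score with a small margin still triggers a clarifying question: two near-tied destinations mean the query is ambiguous, and routing users incorrectly is worse than asking.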
Stack and packages (Phase 2)
- Lightweight intent classifier prompt (same LLM) or small hosted model; store intent labels in Postgres.
- Navigation graph tables in Postgres (nav_nodes, nav_edges) linked to pages.
- Ranking logic using pgvector scores plus navigation graph boosts (SQL-side).
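A sketch of the ranking logic, shown in application code for readability even though the plan keeps it SQL-side. The weight on priority paths and the field names are assumptions:

```typescript
// Sketch: combine semantic similarity with navigation-graph boosts.
// The boost weight (0.15) and field names are illustrative assumptions.
interface PageCandidate {
  url: string;
  similarity: number;      // pgvector similarity score, 0..1
  isPriorityPath: boolean; // flagged as a priority path in the navigation graph
  deprecated: boolean;     // deprecated pages are never suggested
}

function rankCandidates(candidates: PageCandidate[], limit = 3): PageCandidate[] {
  const score = (c: PageCandidate) =>
    c.similarity + (c.isPriorityPath ? 0.15 : 0);
  return candidates
    .filter((c) => !c.deprecated)
    .sort((a, b) => score(b) - score(a))
    .slice(0, limit);
}
```

Returning a short ranked list rather than a single winner matches the "navigation without hard routing" goal: the AI suggests, the user chooses.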
What This Unlocks
- Navigation without hard routing.
- Insight into user goals rather than just questions.
- Safer exploration of ambiguity.
Risks Introduced
- Misclassification.
- User confusion if suggestions feel arbitrary.
Mitigations
- Present multiple options.
- Defer to user choice when uncertain.
Phase 3. Trust, Explainability, and UX Hardening
Purpose
Make the system safe to rely on repeatedly.
Capabilities
- Explanations for why pages or answers were suggested.
- Visible source links and last-updated indicators.
- Clear escalation paths to human support or static navigation.
Auto-Surfaced FAQ Candidates (Human-Gated)
Purpose
- Convert repeated user questions and escalations into durable, reviewed knowledge.
- Improve site clarity without expanding AI authority.
Capabilities
- Detect recurring unanswered, low-confidence, or escalated questions.
- Cluster similar questions by intent and topic.
- Generate draft FAQ candidates including:
- Proposed question wording.
- Sources consulted.
- Identified content gaps.
- Confidence or recurrence signal.
- Queue all draft FAQs for human review and approval.
Constraints
- No automatic publishing.
- No AI-generated content is user-visible without human sign-off.
- Approved FAQs become authoritative sources for future retrieval.
What This Unlocks
- Faster feedback loops between users and content owners.
- Reduction of repeated failure modes.
- Clear prioritization signals for documentation and product teams.
Risks
- Overproduction of low-value FAQs.
- Review burden on content teams.
Mitigations
- Minimum repetition and impact thresholds before surfacing.
- Explicit ownership assignment before review.
- Prioritization based on user impact, not volume.
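The thresholds in these mitigations can be expressed as a single gating check on a question cluster. The cutoff values and field names are assumptions; real cutoffs would be set with content owners:

```typescript
// Sketch: decide whether a clustered question becomes an FAQ candidate.
// Threshold values are illustrative assumptions, to be set with content owners.
interface QuestionCluster {
  representative: string; // proposed question wording for the draft FAQ
  occurrences: number;    // how many times users asked a variant
  escalations: number;    // how many ended in a support handoff
  ownerAssigned: boolean; // explicit ownership assigned before review
}

function shouldSurfaceFaqCandidate(c: QuestionCluster): boolean {
  const recurring = c.occurrences >= 5; // minimum repetition threshold
  const impactful = c.escalations >= 2; // minimum impact threshold
  return c.ownerAssigned && recurring && impactful;
}
```

Requiring an assigned owner before surfacing anything enforces the "explicit ownership assignment before review" mitigation in code rather than in process documents.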
Stack and packages (Phase 3)
- UI components in Next.js for explanations, freshness badges (page timestamps), and confidence visualization.
- Retrieval metadata surfaced from Postgres/pgvector queries (scores, sources, timestamps).
- Optional feature flagging for gradual rollout (e.g., simple Postgres-backed flags).
Organizational Impact
- User trust calibration.
- Reduced support burden.
- Organizational confidence in the system.
Tradeoffs
- Slower responses.
- Less perceived “magic.”
Phase 4. Operationalization and Organizational Integration
Purpose
Turn the system into maintained infrastructure rather than a novelty.
Capabilities
- Content ownership signals per indexed domain.
- Alerts when answers degrade or sources change.
- Feedback loops to content and product teams.
- Review queues and ownership workflows for surfaced FAQ candidates.
Stack and packages (Phase 4)
- Ownership metadata tables in Postgres with linked owners/teams.
- Alerting via scheduled jobs comparing current retrieval quality to baselines; notify through email/webhook.
- Feedback ingestion endpoint writing to Postgres for content/product triage.
- Optional observability stack: OpenTelemetry traces + export to existing APM; structured logs with pino.
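The baseline-comparison alerting can be sketched as a pure check over two quality snapshots; the metric names and tolerance are illustrative assumptions:

```typescript
// Sketch: flag retrieval-quality degradation against a stored baseline.
// Metric names and the tolerance value are illustrative assumptions.
interface QualitySnapshot {
  citationCoverage: number;      // fraction of answers with sources
  navigationSuccessRate: number; // fraction of sessions reaching a destination
}

function degradationAlerts(
  baseline: QualitySnapshot,
  current: QualitySnapshot,
  tolerance = 0.05
): string[] {
  const alerts: string[] = [];
  if (baseline.citationCoverage - current.citationCoverage > tolerance)
    alerts.push("citation coverage dropped below baseline");
  if (baseline.navigationSuccessRate - current.navigationSuccessRate > tolerance)
    alerts.push("navigation success rate dropped below baseline");
  return alerts;
}
```

A scheduled job would compute the current snapshot from logged metrics, compare it to the stored baseline, and route any alert strings to the email/webhook channel.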
What This Unlocks
- Sustainable accuracy.
- Shared accountability.
- Alignment with business metrics.
Risks
- Ownership ambiguity.
- System abandonment.
Mitigations
- Explicit owners.
- KPIs tied to real outcomes.
4. Technology Stack
Below is a tech stack chosen to match the risks, tradeoffs, and phased rollout above. Each choice reduces a specific unknown.
Stack at a glance
- UI: Next.js App Router with streaming.
- Data: Postgres + pgvector (SQLite early) for pages, chunks, ownership, metrics.
- Retrieval: Sitemap crawler, RAG with citations, hard source allowlists.
- Intent/Navigation: Lightweight classifier, navigation graph, ranked suggestions.
- Observability: Postgres-backed metrics/logs, optional OTel export.
Why This Stack
- Minimizes irreversible decisions.
- Makes uncertainty visible.
- Supports gradual capability growth.
- Stands up to executive scrutiny while solving the specific navigation problem with focused intent.
1) Design Principles Driving the Stack
- Deterministic over clever.
- Grounding over generation.
- Replaceable components over vendor lock-in.
- Observable failure over silent success.
2) High-Level Architecture
User
↓
Web UI (Chat + Navigation)
↓
AI Interface Layer
↓
Retrieval Orchestrator
↓
Authoritative Knowledge Sources
Each layer is independently swappable.
3) Frontend
- UI Framework: Next.js (App Router) for existing hosting, Server Components, streaming support, and MDX-friendly docs integration.
- Interaction Layer: Chat-style UI with explicit system states (Answering, Clarifying, Uncertain, Handoff ready). No typing indicators or personality fluff.
4) Content Ingestion and Ground Truth
- Crawling: Sitemap-driven crawler; fetch HTML to extract main content, canonical URL, headings, last modified, and content owner.
- Storage: Postgres (or SQLite early) with tables for pages, content_chunks, ownership, and index_runs to stay human-readable, queryable, and auditable without AI.
5) Knowledge Store and Retrieval
- Embeddings: OpenAI text-embedding-3-large or a local equivalent; stable, strong recall, replaceable.
- Vector Store: Postgres + pgvector (or SQLite + vector extension) to keep one database with transactional updates and simple metadata joins.
- Retrieval Strategy: Hybrid keyword filters (URL, section, freshness), semantic similarity, and a hard source allowlist. No unconstrained free-text RAG.
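The hard allowlist can be enforced as a filter before ranking ever sees a candidate. The prefix-matching scheme and field names here are assumptions:

```typescript
// Sketch: enforce a hard source allowlist before semantic ranking.
// Prefix matching and field names are illustrative assumptions.
interface RetrievedChunk {
  url: string;
  similarity: number;
}

function filterAndRank(
  chunks: RetrievedChunk[],
  allowlistPrefixes: string[],
  limit = 5
): RetrievedChunk[] {
  return chunks
    .filter((c) => allowlistPrefixes.some((p) => c.url.startsWith(p)))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}
```

Applying the allowlist before ranking (rather than after generation) is what makes this "no unconstrained free-text RAG": a disallowed source can never influence the answer, regardless of its similarity score.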
6) AI Interface Layer
- LLM: Start with GPT-4.1/GPT-4o or a local Ollama model when privacy is critical. Use only for query interpretation, answer synthesis, and clarifying question generation. It is never the source of truth.
- Prompt Discipline: System prompt enforces citations, uncertainty signaling, and no speculation. Responses without sources are rejected.
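Prompt instructions alone cannot guarantee citations, so the rejection rule is best enforced as a post-generation check. This sketch assumes a hypothetical draft-answer shape; it is not an existing API:

```typescript
// Sketch: reject any drafted answer that lacks a citation to a retrieved source.
// The DraftAnswer shape is a hypothetical assumption; this runs after generation,
// as a check the prompt alone cannot guarantee.
interface DraftAnswer {
  text: string;
  citedUrls: string[];
}

function enforceCitations(draft: DraftAnswer, retrievedUrls: string[]): DraftAnswer | null {
  // Keep only citations that point at sources actually retrieved for this query.
  const grounded = draft.citedUrls.filter((u) => retrievedUrls.includes(u));
  if (grounded.length === 0) return null;   // reject: no verifiable source
  return { ...draft, citedUrls: grounded }; // drop citations outside the retrieval set
}
```

A `null` result routes the turn into the fallback path ("I don't know" plus handoff note) instead of showing an ungrounded answer.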
7) Navigation and Intent Awareness
- Intent Classification: Small classifier prompt or lightweight fine-tune with intents: Learn, Compare, Fix, Contact, Unknown. Adjusts confidence and presentation, not hard routing.
- Navigation Graph: Explicit parent/child pages, priority paths, and deprecated flags. AI suggests; users choose.
8) Failure Handling and Handoff
- “I don’t know” as a feature: When confidence is low, generate a copyable, structured handoff summary (user question, clarifications asked, sources checked, gaps identified) so support does not restart the conversation.
9) Observability and Trust
- Metrics: Retrieval confidence, citation coverage, clarification frequency, navigation success rate, handoff rate.
- Logging: Store user intent, retrieved sources, and final answer confidence while excluding raw chat logs by default.
- Review Loops: Periodic review of incorrect or escalated answers by a human owner.
- Feedback Integration: Reviews feed back into indexing rules, prompts, or content ownership.
10) Phase-Aligned Stack Justification
- Phase 1: Postgres + pgvector, sitemap crawler, RAG with citations (minimal, safe, auditable).
- Phase 2: Add intent classifier, navigation graph, and ranked suggestions (still deterministic).
- Phase 3: Explanation UI, source freshness indicators, and confidence visualization (trust over speed).
- Phase 4: Ownership metadata, alerting, and feedback ingestion (infrastructure maturity).
11) Explicit Non-Choices
- Avoid autonomous agents, early fine-tuning, multi-vector stores, premature knowledge graphs, and generic chatbot platforms that expand surface area before failure modes are understood.
5. How I Would Classify This Challenge
This challenge evaluates:
- Comfort operating under uncertainty.
- Ability to sequence work based on risk, not excitement.
- Systems thinking across UX, data, and organizational boundaries.
- Discipline around trust, failure modes, and long-term ownership.
Classification and ownership
- Classification: Staff+/Principal-level challenge (or a seasoned Tech Lead) because it spans product, data, infra, and org process with explicit failure-mode management.
- Why: Success depends on risk sequencing, governance, and operational durability, not just building a chatbot.
- Lowest level to own without burdening leadership: a strong Senior/Staff engineer or tech lead who can drive cross-functional alignment, with leadership setting guardrails but not micromanaging execution.
This system is designed to resist pressure to over-promise or silently degrade.