AI Site Navigation Bot
Challenge
Build an AI system that knows the website and company inside and out, and can answer questions or navigate users anywhere on the site.
Framing the Unknowns
At the outset, this problem is underspecified by design.
Key unknowns include:
- What users actually mean when they ask questions.
- Which sources are authoritative versus incidental.
- How much trust users will place in the system.
- Where failures are acceptable versus where escalation is required.
The initial task is to reduce uncertainty before building an AI feature, avoiding premature assumptions. The system must surface gaps, fail safely, and improve iteratively.
1. Objective and Success Definition
Objective
Provide users with a trustworthy interface that can:
- Answer questions about the company and website.
- Guide users to the correct page or resource.
- Reduce friction in discovery and support.
This is a retrieval and navigation system with an AI interface focused on the site, owned and governed like any other critical piece of product infrastructure. The system’s primary responsibility is correctness and safe guidance, with conversational breadth secondary.
Success Criteria
- Users reach the correct destination or answer within 1-2 interactions.
- Responses are grounded in authoritative sources.
- The system clearly signals uncertainty and limitations.
- Measurable reduction in bounce rate or support inquiries.
- Incorrect answers are surfaced and corrected through a defined, owned review process.
2. Primary Risks and Tradeoffs (Before Implementation)
Technical Risks
- Hallucinated authority: Users may trust incorrect answers if uncertainty is not explicit.
- Stale or fragmented knowledge: Websites change faster than AI systems are typically re-indexed.
- Overconfidence in ambiguous queries: Routing users incorrectly is worse than asking a clarifying question.
Organizational Risks
- Misaligned scope creep: A navigation assistant can quietly become an unsupported support channel.
Key Tradeoffs
- Accuracy vs. Coverage: Prefer fewer, correct answers over broad but unreliable responses.
- Latency vs. Transparency: Citations and explanations increase response time but build trust.
- AI Capability vs. Determinism: Use AI for interpretation and synthesis while authoritative sources supply truth.
These risks and tradeoffs drive the feature sequencing that follows.
3. Sprint-Oriented Phased Implementation Plan
Each phase intentionally exposes new unknowns and prepares the system for the next phase. Every phase stands on its own, with later phases earned by evidence.
Phase 1. Establish Ground Truth and Failure Safety
Purpose
Before navigation or intelligence, the system must know what it is allowed to know.
Capabilities
- Index all public website pages.
- Store canonical URLs, update timestamps, and content ownership.
- Retrieval-augmented answers with mandatory citations.
- Explicit “I don’t know” and fallback responses.
- When the bot cannot answer, generate a short, copyable handoff note for the contact form that includes the user’s question, any clarifications asked, and which sources were checked so support receives full context.
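As a sketch of the handoff behavior described above, the note could be assembled from the conversation state. The interface and function names here (`HandoffContext`, `buildHandoffNote`) are illustrative assumptions, not an existing API:

```typescript
// Illustrative sketch: assemble a copyable handoff note for the contact form.
// Names and structure are assumptions, not part of any existing codebase.
interface HandoffContext {
  question: string;         // the user's original question
  clarifications: string[]; // clarifying questions the bot asked
  sourcesChecked: string[]; // canonical URLs consulted before giving up
}

function buildHandoffNote(ctx: HandoffContext): string {
  const lines = [
    `Question: ${ctx.question}`,
    ctx.clarifications.length
      ? `Clarifications asked: ${ctx.clarifications.join("; ")}`
      : "Clarifications asked: none",
    ctx.sourcesChecked.length
      ? `Sources checked: ${ctx.sourcesChecked.join(", ")}`
      : "Sources checked: none",
    "Outcome: no confident answer found; escalating to support.",
  ];
  return lines.join("\n");
}
```

Because the note is plain text, it can be pasted directly into the contact form, so support starts with full context instead of restarting the conversation.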
Stack and packages (Phase 1)
- Next.js App Router UI with streaming.
- Postgres + pgvector (or SQLite + vector extension) accessed via a thin SQL client (pg or equivalent).
- OpenAI SDK (embeddings + GPT-4.1/4o) or local Ollama equivalent.
- Sitemap-driven crawler with node-fetch and cheerio for HTML extraction; cron/scheduled job for recrawls.
- Lightweight logging/metrics via Postgres tables and structured logs (e.g., pino).
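One piece of this pipeline can be sketched without any external dependency: splitting extracted page text into overlapping chunks before embedding. The chunk size and overlap values are illustrative defaults, not tuned recommendations:

```typescript
// Sketch: split extracted page text into overlapping chunks for embedding.
// Size/overlap defaults are assumptions for illustration, not recommendations.
interface Chunk {
  url: string;      // canonical URL of the source page
  position: number; // chunk index within the page
  text: string;
}

function chunkPage(url: string, text: string, size = 800, overlap = 100): Chunk[] {
  const chunks: Chunk[] = [];
  let position = 0;
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push({ url, position: position++, text: text.slice(start, start + size) });
    if (start + size >= text.length) break; // last chunk reached end of page
  }
  return chunks;
}
```

Keeping the canonical URL on every chunk is what makes mandatory citations cheap later: any retrieved chunk already carries its source.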
What This Unlocks
- Confidence that answers are grounded.
- Visibility into content gaps and ambiguity.
- Early signal on where users struggle.
- Repeated unanswered, low-confidence, or escalated questions are logged as signals for future content improvement.
Risks Addressed
- Hallucinations.
- False authority.
- Silent failures.
Phase 2. Introduce Intent Awareness and Basic Guidance
Purpose
Once answers are reliable, begin guiding users without overcommitting.
Capabilities
- Intent classification is advisory; access stays open and content remains visible.
- High-level intent categories such as “learn,” “compare,” “fix,” and “contact.”
- Ranked page suggestions instead of single answers.
- Clarifying questions when confidence is low.
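The clarify-when-uncertain behavior above can be reduced to a small gating function over retrieval confidence. The thresholds here are assumptions; in practice they would be calibrated from logged outcomes:

```typescript
// Sketch: gate the response mode on retrieval confidence.
// Threshold values are illustrative assumptions, to be calibrated from logs.
type ResponseMode = "answer" | "clarify" | "fallback";

function decideResponseMode(topScore: number, margin: number): ResponseMode {
  // margin = gap between the best and second-best candidate scores
  if (topScore >= 0.8 && margin >= 0.1) return "answer"; // confident and unambiguous
  if (topScore >= 0.5) return "clarify";                 // plausible but ambiguous
  return "fallback";                                     // admit "I don't know"
}
```

Note that a high top score with a small margin still triggers a clarifying question: two near-tied destinations mean the query is ambiguous, and routing users incorrectly is worse than asking.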
Stack and packages (Phase 2)
- Lightweight intent classifier prompt (same LLM) or small hosted model; store intent labels in Postgres.
- Navigation graph tables in Postgres (nav_nodes, nav_edges) linked to pages.
- Ranking logic using pgvector scores plus navigation graph boosts (SQL-side).
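A sketch of the ranking logic, shown in application code for readability even though the plan keeps it SQL-side. The weight on priority paths and the field names are assumptions:

```typescript
// Sketch: combine semantic similarity with navigation-graph boosts.
// The boost weight (0.15) and field names are illustrative assumptions.
interface PageCandidate {
  url: string;
  similarity: number;      // pgvector similarity score, 0..1
  isPriorityPath: boolean; // flagged as a priority path in the navigation graph
  deprecated: boolean;     // deprecated pages are never suggested
}

function rankCandidates(candidates: PageCandidate[], limit = 3): PageCandidate[] {
  const score = (c: PageCandidate) =>
    c.similarity + (c.isPriorityPath ? 0.15 : 0);
  return candidates
    .filter((c) => !c.deprecated)
    .sort((a, b) => score(b) - score(a))
    .slice(0, limit);
}
```

Returning a short ranked list rather than a single winner matches the "navigation without hard routing" goal: the AI suggests, the user chooses.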
What This Unlocks
- Navigation without hard routing.
- Insight into user goals rather than just questions.
- Safer exploration of ambiguity.
Risks Introduced
- Misclassification.
- User confusion if suggestions feel arbitrary.
Mitigations
- Present multiple options.
- Defer to user choice when uncertain.
Phase 3. Trust, Explainability, and UX Hardening
Purpose
Make the system safe to rely on repeatedly.
Capabilities
- Explanations for why pages or answers were suggested.
- Visible source links and last-updated indicators.
- Clear escalation paths to human support or static navigation.
Auto-Surfaced FAQ Candidates (Human-Gated)
Purpose
- Convert repeated user questions and escalations into durable, reviewed knowledge.
- Improve site clarity without expanding AI authority.
Capabilities
- Detect recurring unanswered, low-confidence, or escalated questions.
- Cluster similar questions by intent and topic.
- Generate draft FAQ candidates including:
- Proposed question wording.
- Sources consulted.
- Identified content gaps.
- Confidence or recurrence signal.
- Queue all draft FAQs for human review and approval.
Constraints
- No automatic publishing.
- No AI-generated content is user-visible without human sign-off.
- Approved FAQs become authoritative sources for future retrieval.
What This Unlocks
- Faster feedback loops between users and content owners.
- Reduction of repeated failure modes.
- Clear prioritization signals for documentation and product teams.
Risks
- Overproduction of low-value FAQs.
- Review burden on content teams.
Mitigations
- Minimum repetition and impact thresholds before surfacing.
- Explicit ownership assignment before review.
- Prioritization based on user impact, not volume.
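The thresholds in these mitigations can be expressed as a single gating check on a question cluster. The cutoff values and field names are assumptions; real cutoffs would be set with content owners:

```typescript
// Sketch: decide whether a clustered question becomes an FAQ candidate.
// Threshold values are illustrative assumptions, to be set with content owners.
interface QuestionCluster {
  representative: string; // proposed question wording for the draft FAQ
  occurrences: number;    // how many times users asked a variant
  escalations: number;    // how many ended in a support handoff
  ownerAssigned: boolean; // explicit ownership assigned before review
}

function shouldSurfaceFaqCandidate(c: QuestionCluster): boolean {
  const recurring = c.occurrences >= 5; // minimum repetition threshold
  const impactful = c.escalations >= 2; // minimum impact threshold
  return c.ownerAssigned && recurring && impactful;
}
```

Requiring an assigned owner before surfacing anything enforces the "explicit ownership assignment before review" mitigation in code rather than in process documents.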
Stack and packages (Phase 3)
- UI components in Next.js for explanations, freshness badges (page timestamps), and confidence visualization.
- Retrieval metadata surfaced from Postgres/pgvector queries (scores, sources, timestamps).
- Optional feature flagging for gradual rollout (e.g., simple Postgres-backed flags).
Organizational Impact
- User trust calibration.
- Reduced support burden.
- Organizational confidence in the system.
Tradeoffs
- Slower responses.
- Less perceived “magic.”
Phase 4. Operationalization and Organizational Integration
Purpose
Turn the system into maintained infrastructure rather than a novelty.
Capabilities
- Content ownership signals per indexed domain.
- Alerts when answers degrade or sources change.
- Feedback loops to content and product teams.
- Review queues and ownership workflows for surfaced FAQ candidates.
Stack and packages (Phase 4)
- Ownership metadata tables in Postgres with linked owners/teams.
- Alerting via scheduled jobs comparing current retrieval quality to baselines; notify through email/webhook.
- Feedback ingestion endpoint writing to Postgres for content/product triage.
- Optional observability stack: OpenTelemetry traces + export to existing APM; structured logs with pino.
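The baseline-comparison alerting can be sketched as a pure check over two quality snapshots; the metric names and tolerance are illustrative assumptions:

```typescript
// Sketch: flag retrieval-quality degradation against a stored baseline.
// Metric names and the tolerance value are illustrative assumptions.
interface QualitySnapshot {
  citationCoverage: number;      // fraction of answers with sources
  navigationSuccessRate: number; // fraction of sessions reaching a destination
}

function degradationAlerts(
  baseline: QualitySnapshot,
  current: QualitySnapshot,
  tolerance = 0.05
): string[] {
  const alerts: string[] = [];
  if (baseline.citationCoverage - current.citationCoverage > tolerance)
    alerts.push("citation coverage dropped below baseline");
  if (baseline.navigationSuccessRate - current.navigationSuccessRate > tolerance)
    alerts.push("navigation success rate dropped below baseline");
  return alerts;
}
```

A scheduled job would compute the current snapshot from logged metrics, compare it to the stored baseline, and route any alert strings to the email/webhook channel.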
What This Unlocks
- Sustainable accuracy.
- Shared accountability.
- Alignment with business metrics.
Risks
- Ownership ambiguity.
- System abandonment.
Mitigations
- Explicit owners.
- KPIs tied to real outcomes.
4. Technology Stack
Below is a tech stack chosen to match the risks, tradeoffs, and phased rollout above. Each choice reduces a specific unknown.
Stack at a glance
- UI: Next.js App Router with streaming.
- Data: Postgres + pgvector (SQLite early) for pages, chunks, ownership, metrics.
- Retrieval: Sitemap crawler, RAG with citations, hard source allowlists.
- Intent/Navigation: Lightweight classifier, navigation graph, ranked suggestions.
- Observability: Postgres-backed metrics/logs, optional OTel export.
Why This Stack
- Minimizes irreversible decisions.
- Makes uncertainty visible.
- Supports gradual capability growth.
- Stands up to executive scrutiny while solving the specific navigation problem with focused intent.
1) Design Principles Driving the Stack
- Deterministic over clever.
- Grounding over generation.
- Replaceable components over vendor lock-in.
- Observable failure over silent success.
2) High-Level Architecture
User
↓
Web UI (Chat + Navigation)
↓
AI Interface Layer
↓
Retrieval Orchestrator
↓
Authoritative Knowledge Sources
Each layer is independently swappable.
3) Frontend
- UI Framework: Next.js (App Router) for existing hosting, Server Components, streaming support, and MDX-friendly docs integration.
- Interaction Layer: Chat-style UI with explicit system states (Answering, Clarifying, Uncertain, Handoff ready). No typing indicators or personality fluff.
4) Content Ingestion and Ground Truth
- Crawling: Sitemap-driven crawler; fetch HTML to extract main content, canonical URL, headings, last modified, and content owner.
- Storage: Postgres (or SQLite early) with tables for pages, content_chunks, ownership, and index_runs to stay human-readable, queryable, and auditable without AI.
5) Knowledge Store and Retrieval
- Embeddings: OpenAI text-embedding-3-large or a local equivalent; stable, strong recall, replaceable.
- Vector Store: Postgres + pgvector (or SQLite + vector extension) to keep one database with transactional updates and simple metadata joins.
- Retrieval Strategy: Hybrid keyword filters (URL, section, freshness), semantic similarity, and a hard source allowlist. No unconstrained free-text RAG.
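The hard allowlist can be enforced as a filter before ranking ever sees a candidate. The prefix-matching scheme and field names here are assumptions:

```typescript
// Sketch: enforce a hard source allowlist before semantic ranking.
// Prefix matching and field names are illustrative assumptions.
interface RetrievedChunk {
  url: string;
  similarity: number;
}

function filterAndRank(
  chunks: RetrievedChunk[],
  allowlistPrefixes: string[],
  limit = 5
): RetrievedChunk[] {
  return chunks
    .filter((c) => allowlistPrefixes.some((p) => c.url.startsWith(p)))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}
```

Applying the allowlist before ranking (rather than after generation) is what makes this "no unconstrained free-text RAG": a disallowed source can never influence the answer, regardless of its similarity score.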
6) AI Interface Layer
- LLM: Start with GPT-4.1/GPT-4o or a local Ollama model when privacy is critical. Use only for query interpretation, answer synthesis, and clarifying question generation. It is never the source of truth.
- Prompt Discipline: System prompt enforces citations, uncertainty signaling, and no speculation. Responses without sources are rejected.
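Prompt instructions alone cannot guarantee citations, so the rejection rule is best enforced as a post-generation check. This sketch assumes a hypothetical draft-answer shape; it is not an existing API:

```typescript
// Sketch: reject any drafted answer that lacks a citation to a retrieved source.
// The DraftAnswer shape is a hypothetical assumption; this runs after generation,
// as a check the prompt alone cannot guarantee.
interface DraftAnswer {
  text: string;
  citedUrls: string[];
}

function enforceCitations(draft: DraftAnswer, retrievedUrls: string[]): DraftAnswer | null {
  // Keep only citations that point at sources actually retrieved for this query.
  const grounded = draft.citedUrls.filter((u) => retrievedUrls.includes(u));
  if (grounded.length === 0) return null;   // reject: no verifiable source
  return { ...draft, citedUrls: grounded }; // drop citations outside the retrieval set
}
```

A `null` result routes the turn into the fallback path ("I don't know" plus handoff note) instead of showing an ungrounded answer.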
7) Navigation and Intent Awareness
- Intent Classification: Small classifier prompt or lightweight fine-tune with intents: Learn, Compare, Fix, Contact, Unknown. Adjusts confidence and presentation, not hard routing.
- Navigation Graph: Explicit parent/child pages, priority paths, and deprecated flags. AI suggests; users choose.
8) Failure Handling and Handoff
- “I don’t know” as a feature: When confidence is low, generate a copyable, structured handoff summary (user question, clarifications asked, sources checked, gaps identified) so support does not restart the conversation.
9) Observability and Trust
- Metrics: Retrieval confidence, citation coverage, clarification frequency, navigation success rate, handoff rate.
- Logging: Store user intent, retrieved sources, and final answer confidence while excluding raw chat logs by default.
- Review Loops: Periodic review of incorrect or escalated answers by a human owner.
- Feedback Integration: Reviews feed back into indexing rules, prompts, or content ownership.
10) Phase-Aligned Stack Justification
- Phase 1: Postgres + pgvector, sitemap crawler, RAG with citations (minimal, safe, auditable).
- Phase 2: Add intent classifier, navigation graph, and ranked suggestions (still deterministic).
- Phase 3: Explanation UI, source freshness indicators, and confidence visualization (trust over speed).
- Phase 4: Ownership metadata, alerting, and feedback ingestion (infrastructure maturity).
11) Explicit Non-Choices
- Avoid autonomous agents, early fine-tuning, multi-vector stores, premature knowledge graphs, and generic chatbot platforms that expand surface area before failure modes are understood.
5. How I Would Classify This Challenge
This challenge evaluates:
- Comfort operating under uncertainty.
- Ability to sequence work based on risk, not excitement.
- Systems thinking across UX, data, and organizational boundaries.
- Discipline around trust, failure modes, and long-term ownership.
Classification and ownership
- Classification: Staff+/Principal-level challenge (or a seasoned Tech Lead) because it spans product, data, infra, and org process with explicit failure-mode management.
- Why: Success depends on risk sequencing, governance, and operational durability, not just building a chatbot.
- Lowest level to own without burdening leadership: a strong Senior/Staff engineer or tech lead who can drive cross-functional alignment, with leadership setting guardrails but not micromanaging execution.
This system is designed to resist pressure to over-promise or silently degrade.