Babbleborg Build Guide - Architecture Reference

Purpose of This Document

This document is a reference for AI coding agents building this system. It describes the architectural patterns, system components, and design decisions needed to create a student AI chat interface with complete pedagogical control.

If you're a human reader, this document will help you understand the system design, but it's optimized to provide context to command-line AI coding assistants during implementation.

Implementation note: This document describes architectural patterns conceptually. The build stages in this guide implement these patterns using PHP on shared hosting, but the patterns themselves are platform-agnostic. If you're working with Node.js, Python, or another backend technology, the same architectural decisions and component relationships apply; you'll simply implement them using your platform's idioms and libraries.

System Overview

This is a web-based AI chat application designed for educational use. It allows educators to:

Customize system instructions that shape AI behavior for learning
Configure conversation frameworks and modular learning supports
Deploy without authentication systems (privacy-first, simple deployment)
Store student conversations locally in their browsers (data ownership)
Pay only for API usage (no per-student licensing)

The architecture prioritizes educator autonomy and student privacy.

Core Architectural Decisions

No Authentication System

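Why: Privacy, simplicity, and faster deployment. Each installation serves a single class or use case, and students access it via its URL. No user accounts, passwords, or session management are required, and the server stores no chat data. Whether the AI provider retains or trains on submitted messages is governed by that provider's API terms, so check them when choosing a provider.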

Trade-off: Multiple classes need multiple installations (different URLs or paths).

Local Browser Storage

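Why: Student data ownership. Conversations are stored only on the student's device. Each request necessarily sends the current conversation through the proxy to the AI provider, but the application keeps no copy: there is no cloud storage and no server-side conversation database, and a stored conversation leaves the device only if the student explicitly exports it.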

Implementation: IndexedDB, accessed through the LocalForage library.

Server-Side Instruction Assembly

Why: Avoid ModSecurity WAF (Web Application Firewall) restrictions that block large POST payloads. Many shared hosting environments reject requests with very long system instructions.

Solution: Store only IDs in the configuration. Server-side PHP assembles the full instruction text from multiple sources before sending it to the AI provider.

API-Agnostic Design

Why: Let educators choose the AI provider that best fits their needs and budget. Providers have different pricing, capabilities, and policies.

Implementation: The core architecture works with any REST-based LLM API; switching providers requires adapting the API proxy to that provider's request and response formats.
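A minimal sketch of what adapting the proxy can look like, written in TypeScript rather than the PHP used by the build stages. The ProviderAdapter interface and its method names are hypothetical. The JSON field names follow the Gemini generateContent and OpenAI Chat Completions formats as publicly documented at the time of writing; verify them against each provider's current API reference before use.

```typescript
// Provider-neutral message shape used by the client (roles as in the storage layer).
type Role = "user" | "model";
interface Message { role: Role; content: string; }

interface ProviderReply { text: string; inputTokens: number; outputTokens: number; }

// Hypothetical adapter interface: one implementation per provider's REST format.
interface ProviderAdapter {
  buildBody(model: string, systemInstruction: string, history: Message[]): unknown;
  parseReply(json: any): ProviderReply;
}

// Gemini generateContent-style body: system text kept separate, roles "user"/"model".
const geminiAdapter: ProviderAdapter = {
  buildBody(_model, systemInstruction, history) {
    // Gemini's REST API takes the model name in the URL path, not the body.
    return {
      systemInstruction: { parts: [{ text: systemInstruction }] },
      contents: history.map((m) => ({ role: m.role, parts: [{ text: m.content }] })),
    };
  },
  parseReply(json) {
    return {
      text: json.candidates?.[0]?.content?.parts?.[0]?.text ?? "",
      inputTokens: json.usageMetadata?.promptTokenCount ?? 0,
      outputTokens: json.usageMetadata?.candidatesTokenCount ?? 0,
    };
  },
};

// OpenAI Chat Completions-style body: system message first, "model" becomes "assistant".
const openAiChatAdapter: ProviderAdapter = {
  buildBody(model, systemInstruction, history) {
    return {
      model,
      messages: [
        { role: "system", content: systemInstruction },
        ...history.map((m) => ({ role: m.role === "model" ? "assistant" : "user", content: m.content })),
      ],
    };
  },
  parseReply(json) {
    return {
      text: json.choices?.[0]?.message?.content ?? "",
      inputTokens: json.usage?.prompt_tokens ?? 0,
      outputTokens: json.usage?.completion_tokens ?? 0,
    };
  },
};
```

Everything around the adapter (rate limiting, retries, instruction assembly) stays provider-neutral; only buildBody and parseReply change per provider. The endpoint URL and authentication header, which also differ per provider, are omitted here.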

Conversation Instruction Snapshots

Why: Enables modular, curriculum-aligned learning. Teachers can change the system configuration (switch conversation frameworks or learning supports) for new conversations without affecting existing student work.

Implementation: Each conversation stores a complete copy of the system instruction that was active when it was created.

Pedagogical benefit: In Week 1, students create "brainstorming" conversations; in Week 2, the teacher switches the mode to "project planning". Students can still open and continue their Week 1 brainstorming conversations, while new conversations use the planning framework.
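The snapshot rule can be illustrated with a toy example in TypeScript (rather than the PHP of the build stages). The types and function names are hypothetical and strip the conversation object down to the fields that matter here.

```typescript
// Hypothetical, stripped-down types for illustrating instruction snapshots.
interface ActiveConfig { frameworkTitle: string; instruction: string; }
interface Conversation { id: string; instructionType: string; systemInstruction: string; messages: string[]; }

let nextId = 1;

// A new conversation copies the instruction that is active at creation time.
function startConversation(config: ActiveConfig): Conversation {
  return {
    id: `conv-${nextId++}`,
    instructionType: config.frameworkTitle,
    systemInstruction: config.instruction, // frozen snapshot; never re-read from config
    messages: [],
  };
}

// Continuing a conversation uses its own snapshot, whatever the current config says.
function instructionFor(conv: Conversation): string {
  return conv.systemInstruction;
}

// Week 1: brainstorming is the active framework.
let active: ActiveConfig = { frameworkTitle: "Brainstorming", instruction: "Help the student generate many ideas." };
const week1 = startConversation(active);

// Week 2: the teacher switches frameworks; only new conversations pick it up.
active = { frameworkTitle: "Project Planning", instruction: "Help the student plan milestones." };
const week2 = startConversation(active);

console.log(instructionFor(week1)); // prints: Help the student generate many ideas.
console.log(instructionFor(week2)); // prints: Help the student plan milestones.
```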

System Components

Configuration System

Three JSON files work together to provide flexible, modular configuration:

1. Main Configuration File

Application settings (title, subtitle, version)
Master enable/disable switch
AI model identifier
Core educational framework (base system instruction)
Reference to selected conversation framework (by ID, not full text)
Array of selected learning support IDs (not full text)
Student-facing instructions and warnings
UI customization options

2. Conversation Frameworks Library File

Array of complete conversation framework definitions
Each includes: title, description, full prompt text, example interactions
Referenced by numeric ID (array position)
Allows teachers to build a library of frameworks and switch between them

3. Learning Supports Library File

Array of modular pedagogical strategies
Each includes: unique ID, title, prompt text, usage notes
Can be combined (multiple supports selected at once)
Examples: scaffolding techniques, questioning strategies, feedback approaches

Why this structure:

Modular: Mix and match supports without rewriting prompts
Manageable: Change active configuration without editing large text blocks
Avoids WAF issues: Full text assembled server-side, only IDs sent from client
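One way to pin down the three files is to describe their shapes as TypeScript types, shown here with a check that every ID in the main configuration resolves against the libraries. All field names are assumptions for illustration (UI customization options are omitted); the actual JSON files would contain objects of these shapes.

```typescript
// Hypothetical shape of the main configuration file.
interface MainConfig {
  title: string;
  subtitle: string;
  version: string;
  enabled: boolean;             // master enable/disable switch
  model: string;                // AI model identifier
  coreFramework: string;        // base system instruction text
  selectedFrameworkId: number;  // array position in the frameworks library
  selectedSupportIds: string[]; // unique IDs from the supports library
  studentInstructions: string;
}

// Hypothetical entry in the conversation frameworks library (an array).
// Because IDs are array positions, reordering the array changes what an ID points to.
interface ConversationFramework {
  title: string;
  description: string;
  prompt: string;
  examples: string[];
}

// Hypothetical entry in the learning supports library.
interface LearningSupport {
  id: string;
  title: string;
  prompt: string;
  notes: string;
}

// List every reference in the main config that does not resolve.
function findBrokenReferences(
  main: MainConfig,
  frameworks: ConversationFramework[],
  supports: LearningSupport[],
): string[] {
  const problems: string[] = [];
  if (!Number.isInteger(main.selectedFrameworkId) || frameworks[main.selectedFrameworkId] === undefined) {
    problems.push(`framework index ${main.selectedFrameworkId} not found`);
  }
  const known = new Set(supports.map((s) => s.id));
  for (const id of main.selectedSupportIds) {
    if (!known.has(id)) problems.push(`learning support "${id}" not found`);
  }
  return problems;
}
```

Running a check like this whenever a configuration editor saves would catch a dangling ID before it reaches the assembly step.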

System Instruction Assembly Component

A server-side PHP component that dynamically builds complete system instructions.

Process:

Receives configuration (or the conversation-specific instruction when resuming a saved conversation)
If building a fresh instruction:
  Reads the main config file for the core framework
  Looks up the selected conversation framework by ID in the frameworks library
  Looks up the selected learning supports by ID in the supports library
  Retrieves any framework-specific customization notes
  Combines all components into a single complete instruction
Returns the assembled instruction for use in the API request

Output includes:

Core educational framework
Conversation framework prompt and examples
All selected learning support strategies
Customization notes
Framework title markers (for UI display)

Why server-side:

Keeps large text assembly off the client
Avoids WAF restrictions on POST payload size
Central logic for instruction composition
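The process above can be sketched as a single function, in TypeScript rather than the PHP the build stages use. Two details are assumptions rather than part of this design: the bracketed title markers (one possible format for the markers the UI reads back) and storing customization notes on the framework entry. The function throws on an unresolved ID instead of silently dropping it, so a misconfigured installation fails loudly.

```typescript
// Hypothetical minimal shapes; only the fields the assembly step reads.
interface Framework { title: string; prompt: string; examples: string[]; customizationNotes?: string; }
interface Support { id: string; title: string; prompt: string; }
interface AssemblyConfig { coreFramework: string; selectedFrameworkId: number; selectedSupportIds: string[]; }

// Build the full system instruction from IDs; throws if any ID does not resolve.
function assembleInstruction(main: AssemblyConfig, frameworks: Framework[], supports: Support[]): string {
  const framework = frameworks[main.selectedFrameworkId];
  if (!framework) throw new Error(`Unknown framework index ${main.selectedFrameworkId}`);

  const byId = new Map(supports.map((s): [string, Support] => [s.id, s]));
  const selected = main.selectedSupportIds.map((id) => {
    const s = byId.get(id);
    if (!s) throw new Error(`Unknown learning support "${id}"`);
    return s;
  });

  const sections: string[] = [
    main.coreFramework,
    `[FRAMEWORK: ${framework.title}]`, // title marker for UI display (format assumed)
    framework.prompt,
  ];
  if (framework.examples.length > 0) {
    sections.push("Example interactions:\n" + framework.examples.join("\n\n"));
  }
  for (const s of selected) sections.push(`[SUPPORT: ${s.title}]\n${s.prompt}`);
  if (framework.customizationNotes) sections.push("Teacher notes:\n" + framework.customizationNotes);

  return sections.join("\n\n");
}
```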

API Proxy Layer

A PHP backend component that mediates between the client and the AI provider API.

Responsibilities:

Request Handling:

Receives conversation history from client
Retrieves or assembles system instruction
Formats request for AI provider's API specification
Includes API key from environment variable
Sends request to AI provider

Rate Limiting:

Tracks request timestamps per client IP address
Enforces minimum interval between requests (prevents abuse)
Uses filesystem-based locks for tracking
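A minimal sketch of the per-IP interval check, in TypeScript for Node.js rather than the PHP used by the build stages. Assumptions: one timestamp file per client in a writable directory, file names derived from a SHA-256 hash of the IP, and an exclusive-create lock file guarding the read-and-update step. The allowRequest and tryLock names and parameters are hypothetical.

```typescript
import { createHash } from "node:crypto";
import { closeSync, existsSync, mkdtempSync, openSync, readFileSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Try to take an exclusive lock by creating the lock file; "wx" fails if it already exists.
function tryLock(lockFile: string): number | null {
  try {
    return openSync(lockFile, "wx");
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return null;
    throw err; // e.g. the directory is missing or not writable
  }
}

// Returns true if this IP may make a request now, and records the time if so.
function allowRequest(ip: string, dir: string, minIntervalMs = 1000, now = Date.now()): boolean {
  // Hash the IP so raw addresses are not used as file names.
  const key = createHash("sha256").update(ip).digest("hex");
  const stampFile = join(dir, `${key}.stamp`);
  const lockFile = join(dir, `${key}.lock`);

  const fd = tryLock(lockFile);
  if (fd === null) return false; // a concurrent request from this IP holds the lock

  try {
    const last = existsSync(stampFile) ? Number(readFileSync(stampFile, "utf8")) : 0;
    if (now - last < minIntervalMs) return false; // too soon; the stored time is unchanged
    writeFileSync(stampFile, String(now));
    return true;
  } finally {
    closeSync(fd);
    unlinkSync(lockFile); // release the lock
  }
}

// Demo in a throwaway directory: the second call comes 500 ms after the first.
const demoDir = mkdtempSync(join(tmpdir(), "ratelimit-"));
console.log(allowRequest("203.0.113.7", demoDir, 1000, 10000)); // true
console.log(allowRequest("203.0.113.7", demoDir, 1000, 10500)); // false
console.log(allowRequest("203.0.113.7", demoDir, 1000, 11500)); // true
```

Design note: PHP's flock() releases its lock automatically when the process exits, whereas the lock file in this sketch can be left behind if the process crashes mid-request, so a production version would need stale-lock cleanup. Also, students on a school network often share one public IP address through NAT, in which case a per-IP interval effectively applies to the whole class; the interval should be chosen with that in mind.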

Error Handling & Retry Logic:

Implements exponential backoff for rate limit responses (HTTP 429)
Retries failed requests with increasing wait times
Logs errors for debugging
Returns meaningful error messages to client
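Exponential backoff on HTTP 429 can be sketched as follows, in TypeScript rather than PHP. The base delay (1 s), cap (16 s), and retry count (3) are assumptions, as are the HttpResult shape and the function names. The wait function is injectable so the schedule can be tested without real delays.

```typescript
// Hypothetical shape of one provider call's result.
interface HttpResult { status: number; body: string; }

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 16000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call `send`, retrying only on HTTP 429 and waiting longer before each retry.
async function sendWithRetry(
  send: () => Promise<HttpResult>,
  maxRetries = 3,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<HttpResult> {
  for (let attempt = 0; ; attempt++) {
    const result = await send();
    if (result.status !== 429 || attempt >= maxRetries) return result;
    const delay = backoffDelayMs(attempt);
    console.error(`429 from provider; retry ${attempt + 1}/${maxRetries} in ${delay} ms`);
    await wait(delay);
  }
}
```

After the last retry the 429 result is returned unchanged, leaving the caller to turn it into a student-facing message. If the provider sends a Retry-After header, honoring it in place of the computed delay is a reasonable refinement.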

Response Handling:

Extracts AI response from provider's format
Includes token usage metadata (for cost tracking)
Returns formatted response to client

Security:

API key stored in environment variable (never in code or config files)
SSL verification enforced for all external requests
Security headers (CSP, XSS protection, frame options)
Input validation for conversation history format
HTML escaping for user content before storage

Why a proxy:

Keeps API keys server-side (not exposed to client)
Centralizes rate limiting and error handling
Allows logging and monitoring
Abstracts provider-specific API details from frontend

Storage Layer

Browser-based storage using IndexedDB via the LocalForage library.

Storage Model:

Each conversation is a complete object stored in IndexedDB
Path-based namespacing allows multiple installations on the same domain
Maximum 15 conversations per namespace

Conversation Object Structure:

Unique ID
Title (user-editable)
Creation and last-updated timestamps
Array of message objects (role: user/model, content: text)
Complete system instruction (frozen snapshot from creation time)
Instruction type label (e.g., "Biology Study Assistant")
Metadata: model name, token counts, namespace

Why this structure:

Frozen instruction: Ensures pedagogical consistency within each conversation even if global config changes
Complete messages array: Enables continuing multi-turn conversations
Metadata: Allows cost tracking and conversation management
Namespace: Enables multiple installations without conflicts
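The object described above might be typed as follows (TypeScript; every field name is an assumption). The helper shows how one exchange updates the messages, token totals, and last-updated timestamp while leaving the instruction snapshot untouched. With LocalForage, the resulting object could be saved as a whole with setItem(conversation.id, conversation).

```typescript
type Role = "user" | "model";

interface ChatMessage { role: Role; content: string; }

// Hypothetical field names for the stored conversation object.
interface StoredConversation {
  id: string;
  title: string;             // user-editable
  createdAt: number;         // ms since epoch
  updatedAt: number;         // ms since epoch; used for sorting and the 15-item limit
  messages: ChatMessage[];
  systemInstruction: string; // frozen snapshot from creation time
  instructionType: string;   // e.g. "Biology Study Assistant"
  metadata: { model: string; inputTokens: number; outputTokens: number; namespace: string };
}

// Return an updated copy with one exchange appended; the input object is not mutated.
function recordExchange(
  conv: StoredConversation,
  userText: string,
  reply: string,
  usage: { inputTokens: number; outputTokens: number },
  now = Date.now(),
): StoredConversation {
  const added: ChatMessage[] = [
    { role: "user", content: userText },
    { role: "model", content: reply },
  ];
  return {
    ...conv, // systemInstruction is carried over unchanged
    updatedAt: now,
    messages: [...conv.messages, ...added],
    metadata: {
      ...conv.metadata,
      inputTokens: conv.metadata.inputTokens + usage.inputTokens,
      outputTokens: conv.metadata.outputTokens + usage.outputTokens,
    },
  };
}
```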

Path-Based Namespacing:

Namespace derived from URL path
/math-class/ and /history-class/ maintain separate conversation stores
Enables a single domain to host multiple class installations
No cross-contamination of student data
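Deriving the namespace can be as simple as taking the directory part of the page's path. This TypeScript sketch and its function name are hypothetical; the commented line shows one way to hand the result to LocalForage's createInstance so that each path gets its own database.

```typescript
// Derive a storage namespace from a URL path by keeping its directory part.
// "/math-class/" and "/math-class/index.html" both map to "/math-class/".
function namespaceFromPath(pathname: string): string {
  const dir = pathname.slice(0, pathname.lastIndexOf("/") + 1);
  return dir === "" ? "/" : dir;
}

// In the browser, with LocalForage loaded from the CDN (not executed here):
// const store = localforage.createInstance({ name: "babbleborg:" + namespaceFromPath(location.pathname) });
```

A path without a trailing slash ("/math-class") maps to "/" here. Apache's mod_dir redirects such directory URLs to the slashed form by default (DirectorySlash On), so in practice students land on "/math-class/".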

Automatic Limit Management:

Storage checks the count before saving a new conversation
If 15 conversations already exist, identifies the oldest by last-updated timestamp
Asks the user to delete a conversation to make room for the new one (suggesting the oldest)
User can manually delete conversations at any time
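The limit check reduces to a small pure function, sketched here in TypeScript with hypothetical names. It assumes the caller has already gathered a summary of each stored conversation (for example with LocalForage's iterate()) and runs the check only when saving a new conversation, since updating an existing one adds no entry.

```typescript
interface ConversationSummary { id: string; title: string; updatedAt: number; }

const MAX_CONVERSATIONS = 15;

type CapacityCheck =
  | { ok: true }
  | { ok: false; suggestedForDeletion: ConversationSummary };

// Decide whether a new conversation fits; if not, suggest the oldest for deletion.
function checkCapacity(existing: ConversationSummary[], limit = MAX_CONVERSATIONS): CapacityCheck {
  if (existing.length < limit) return { ok: true };
  // Oldest = smallest last-updated timestamp; the UI offers it as the default choice.
  const oldest = existing.reduce((a, b) => (b.updatedAt < a.updatedAt ? b : a));
  return { ok: false, suggestedForDeletion: oldest };
}
```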

Why browser storage:

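Conversation history is stored only on the student's device (each request still sends the current conversation through the proxy to the AI provider)
No server-side database needed (simpler deployment)
Saved conversations can be viewed without any API request (fully offline viewing also requires the page and its CDN-loaded scripts to be cached)
Student owns and controls their data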

UI Components

Chat Interface:

Message input field
Send button and keyboard shortcuts
Message display area with role distinction (user vs AI)
Markdown rendering for AI responses
Token usage display
Loading states during API requests

Conversation Management:

List of saved conversations with titles and timestamps
Create new conversation
Save current conversation (manual or auto-save)
Load saved conversation (restores full context)
Delete conversations
Sort options (recent activity, newest, oldest)
Active conversation indicator
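The three sort options map naturally onto comparators over the stored timestamps. A TypeScript sketch with hypothetical names, assuming "recent activity" means last-updated time and "newest"/"oldest" refer to creation time:

```typescript
interface ListedConversation { id: string; title: string; createdAt: number; updatedAt: number; }

type SortMode = "recent" | "newest" | "oldest";

const comparators: Record<SortMode, (a: ListedConversation, b: ListedConversation) => number> = {
  recent: (a, b) => b.updatedAt - a.updatedAt, // most recent activity first
  newest: (a, b) => b.createdAt - a.createdAt, // most recently created first
  oldest: (a, b) => a.createdAt - b.createdAt, // earliest created first
};

// Return a sorted copy so the stored list keeps its original order.
function sortConversations(list: ListedConversation[], mode: SortMode): ListedConversation[] {
  return [...list].sort(comparators[mode]);
}
```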

Configuration Editors (Optional):

Two editor modes are possible:

Module editor: Lightweight interface to select conversation frameworks and learning supports
Full editor: Full administrative access to all configuration settings

Both editors:

Read current configuration
Provide selection interfaces (dropdowns, checkboxes)
Save updated configuration back to file
Use the same backend handler, with different permission levels

Why separate editors:

Teachers maintain control over the core framework and the available options
Teachers can opt for the simpler module editor and leave the full editor to technology staff
A single backend simplifies maintenance
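Since both editors share one backend handler, the permission difference can be enforced as a per-mode allowlist of editable fields. This TypeScript sketch is hypothetical throughout: the field names, the applyEdits function, and the assumption that the handler already knows which editor submitted the request (how it knows is not covered by this design) are all illustrative. Writing the merged object back to the JSON file is left out.

```typescript
type EditorMode = "module" | "full";

// Fields the module editor may change (hypothetical names).
const MODULE_EDITOR_FIELDS = new Set(["selectedFrameworkId", "selectedSupportIds"]);

// Merge submitted changes into the current config, enforcing the editor's mode.
function applyEdits(
  current: Record<string, unknown>,
  submitted: Record<string, unknown>,
  mode: EditorMode,
): { config: Record<string, unknown>; rejected: string[] } {
  const config = { ...current };
  const rejected: string[] = [];
  for (const [key, value] of Object.entries(submitted)) {
    // Full mode may change any existing key; own-property check avoids "__proto__" tricks.
    const allowed = mode === "full"
      ? Object.prototype.hasOwnProperty.call(current, key)
      : MODULE_EDITOR_FIELDS.has(key);
    if (allowed) config[key] = value;
    else rejected.push(key);
  }
  return { config, rejected };
}
```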

Data Flow Diagrams

Request Flow

sequenceDiagram
    participant User
    participant UI
    participant Storage
    participant API Proxy
    participant AI Provider

    User->>UI: Enter message
    UI->>Storage: Check for saved conversation
    Storage-->>UI: Return conversation + instruction (if exists)
    UI->>API Proxy: Send message + history + instruction
    API Proxy->>API Proxy: Assemble or use provided instruction
    API Proxy->>AI Provider: Format & send request
    AI Provider-->>API Proxy: Return response + token usage
    API Proxy-->>UI: Return formatted response
    UI->>UI: Display response with markdown
    UI->>Storage: Save conversation (if requested)
    Storage-->>UI: Confirm saved

Configuration Assembly

flowchart TD
    A[New Conversation Request] --> B{Has saved instruction?}
    B -->|Yes| C[Use conversation snapshot]
    B -->|No| D[Read main config]
    D --> E[Get core framework]
    D --> F[Get selected framework ID]
    D --> G[Get selected support IDs]
    F --> H[Look up framework in library]
    G --> I[Look up supports in library]
    E --> J[Combine all components]
    H --> J
    I --> J
    J --> K[Complete system instruction]
    C --> L[Send to AI provider]
    K --> L

Storage Architecture

flowchart TD
    A[Browser IndexedDB] --> B[Namespace: /class-path/]
    B --> C[Conversation 1]
    B --> D[Conversation 2]
    B --> E[... up to 15]
    C --> F[ID, Title, Timestamps]
    C --> G[Messages Array]
    C --> H[System Instruction Snapshot]
    C --> I[Metadata]

    J[New Conversation Save] --> K{At 15 limit?}
    K -->|Yes| L[Find oldest by timestamp]
    K -->|No| M[Save directly]
    L --> N[Prompt user to Delete]
    N --> M

Security Considerations

API Key Protection

Store in environment variable, never in code or configuration files
Access via server-side code only (PHP getenv())
Never expose to client-side JavaScript
Rotate periodically, and immediately if the key may have been exposed

Input Validation

Validate conversation history format before sending to API
Ensure message structure matches expected schema
Reject malformed requests
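A sketch of the schema check in TypeScript (the PHP build would perform the equivalent checks on the decoded JSON). The length limits and the rule that the final message must come from the user are assumptions, not requirements stated in this design. Rebuilding each message from only role and content also drops any unexpected fields.

```typescript
type Role = "user" | "model";
interface Message { role: Role; content: string; }

const MAX_MESSAGES = 200;        // assumption: tune to the model's context window
const MAX_MESSAGE_CHARS = 20000; // assumption

// Validate untrusted history from the client; throws with a readable reason if malformed.
function validateHistory(input: unknown): Message[] {
  if (!Array.isArray(input)) throw new Error("history must be an array");
  if (input.length === 0 || input.length > MAX_MESSAGES) throw new Error("history length out of range");

  const messages = input.map((item, i): Message => {
    if (typeof item !== "object" || item === null) throw new Error(`message ${i} is not an object`);
    const { role, content } = item as { role?: unknown; content?: unknown };
    if (role !== "user" && role !== "model") throw new Error(`message ${i} has an invalid role`);
    if (typeof content !== "string" || content.length === 0 || content.length > MAX_MESSAGE_CHARS) {
      throw new Error(`message ${i} has invalid content`);
    }
    return { role: role as Role, content }; // keep only the expected fields
  });

  if (messages[messages.length - 1].role !== "user") throw new Error("last message must come from the user");
  return messages;
}
```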

Input Sanitization

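HTML-escape user content before storing it in the browser
Prevent XSS attacks via stored conversation data
Sanitize the Markdown renderer's HTML output before inserting it into the page (Showdown does not sanitize its output by default, so pair it with an HTML sanitizer such as DOMPurify)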
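Escaping user text means replacing the characters HTML treats specially with entity references. A minimal TypeScript sketch (the function name is hypothetical):

```typescript
// Escape the five characters that are significant in HTML text and attribute values.
// "&" is replaced first so the entities produced afterwards are not escaped again.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

When the client inserts user messages with textContent instead of innerHTML, the browser handles this automatically. AI responses rendered as Markdown are better protected by sanitizing the renderer's HTML output than by escaping their source text, because pre-escaping would make characters such as < inside Markdown code blocks display as entity text.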

Security Headers

Content Security Policy (CSP) to restrict resource loading
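X-XSS-Protection header (legacy: most current browsers ignore it, and current guidance is to send a value of 0 and rely on CSP)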
X-Frame-Options to prevent clickjacking
HTTPS enforcement where possible
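The headers listed above could be produced by a helper like this TypeScript sketch; in PHP each entry would become a header() call. The policy values are assumptions to adapt: the CSP allows scripts only from the site itself plus the CDN origins passed in (for example https://cdn.jsdelivr.net, if that is where LocalForage and the Markdown library are loaded from), and it blocks inline scripts and styles, so the page must avoid them or the policy must be loosened. X-XSS-Protection is set to 0, following current guidance to disable the legacy browser filter and rely on CSP, and HSTS should only be sent once the site is served over HTTPS.

```typescript
// Build security response headers; the function name and policy values are assumptions.
function securityHeaders(scriptCdnOrigins: string[]): Record<string, string> {
  const scriptSrc = ["'self'", ...scriptCdnOrigins].join(" ");
  return {
    // Only same-origin resources, plus scripts from the listed CDNs; the client talks
    // only to the same-origin proxy, so connect-src can stay 'self'.
    "Content-Security-Policy":
      `default-src 'self'; script-src ${scriptSrc}; connect-src 'self'; frame-ancestors 'none'; object-src 'none'`,
    // Clickjacking protection for browsers that ignore CSP frame-ancestors.
    "X-Frame-Options": "DENY",
    // Legacy header: disable the old browser XSS filter and rely on CSP instead.
    "X-XSS-Protection": "0",
    // Ask browsers to use HTTPS only (one year); send only on an HTTPS-served site.
    "Strict-Transport-Security": "max-age=31536000",
  };
}
```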

Rate Limiting

IP-based request tracking
Minimum interval between requests (typically 1 second)
Prevents abuse and runaway API costs
Filesystem-based lock mechanism for tracking

SSL/TLS

Enforce SSL verification on all API requests to AI providers
Use HTTPS for production deployments
Protect API keys and conversation data in transit

Trade-offs and Alternatives

No Authentication

Trade-off: Anyone with the URL can access. Suitable for classroom environments where URL is shared only with enrolled students.

Alternative considered: User accounts with authentication. Rejected due to complexity, privacy concerns (storing student data), and deployment overhead.

Browser Storage Limit (15 conversations)

Trade-off: Students must delete old conversations to create new ones.

Alternative considered: Unlimited storage. Rejected due to browser storage quotas and UX complexity of managing large conversation lists.

Mitigation: Students can export conversations before deletion.

Server-Side Configuration Files

Trade-off: Requires server access to change configuration. Not suitable for non-technical users without file access.

Alternative considered: Database-backed configuration. Rejected due to deployment complexity and shared hosting limitations.

Mitigation: Configuration editors provide a UI for editing the files.

Single Installation Per Use Case

Trade-off: Multiple classes need multiple installations (different paths or subdomains).

Alternative considered: Multi-tenant system with class/student management. Rejected due to authentication requirements and complexity.

Benefit: Simplicity, privacy, isolation between classes.

Deployment Considerations

Hosting Requirements

PHP 7.4+ support
Apache web server with mod_rewrite (for clean URLs)
Ability to set environment variables
File write permissions for logs and rate limiting
HTTPS recommended for production

External Dependencies

LocalForage: Browser storage abstraction (loaded via CDN)
Showdown or similar: Markdown rendering (loaded via CDN)
AI Provider API: Internet connectivity required for API requests

No Database Required

All configuration stored in JSON files
All conversation data stored client-side in browser
Simplifies deployment and reduces hosting requirements