Core Testing Principles
Test One Thing at a Time
Isolate components during testing. Don't test multiple features simultaneously. When a test fails, it should be immediately clear which component caused the failure.
Avoid comprehensive tests that exercise many components at once during initial development. Test each component independently before testing integration.
Common problem: Creating a test that exercises the full user flow (UI interaction → API call → storage → display update) before verifying each piece works independently.
Better approach: Test API proxy independently, test storage independently, test UI independently, then test integration.
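As a rough sketch of what isolated tests can look like (assuming a Vitest-style runner; `sendToApiProxy`, `saveMessage`, and `loadMessages` are hypothetical names, not project code):

```typescript
// Isolated tests per component, assuming a Vitest-style runner.
// Module paths and function names below are placeholders, not real project code.
import { describe, it, expect, vi, afterEach } from "vitest";
import { sendToApiProxy } from "./apiProxy";            // hypothetical proxy client
import { saveMessage, loadMessages } from "./storage";  // hypothetical storage layer

afterEach(() => vi.unstubAllGlobals());

describe("API proxy (isolated)", () => {
  it("returns the parsed body for a successful request", async () => {
    // Stub fetch so this test exercises only the proxy logic, not the network.
    vi.stubGlobal("fetch", vi.fn().mockResolvedValue(
      new Response(JSON.stringify({ reply: "hello" }), { status: 200 })
    ));
    const result = await sendToApiProxy("hi");
    expect(result.reply).toBe("hello");
  });
});

describe("storage (isolated)", () => {
  it("round-trips a saved message", async () => {
    await saveMessage({ id: "1", text: "hi" });
    expect(await loadMessages()).toContainEqual({ id: "1", text: "hi" });
  });
});
```

Because each `describe` block touches a single component, a failing test points directly at that component rather than at the whole flow.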
Make Results Visible
Include diagnostic output so humans can observe what's happening. Don't assume "no error = working."
For each test provide:
- Console output showing key operations
- Clear pass/fail indicators
- Logged data flow where relevant
- Visible UI feedback where appropriate
The human should be able to see exactly what happened and whether it matches expectations.
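A throwaway verification script is often enough. The sketch below assumes a hypothetical `sendToApiProxy` function and a response object with a `reply` field:

```typescript
// Throwaway verification script with visible diagnostics.
// sendToApiProxy and its response shape are hypothetical placeholders.
import { sendToApiProxy } from "./apiProxy";

async function verifyProxyResponds(): Promise<void> {
  console.log("[test] sending request to API proxy...");
  try {
    const response = await sendToApiProxy("ping");
    console.log("[test] received response:", response); // show the actual data flow
    const passed = response != null
      && typeof (response as { reply?: unknown }).reply === "string";
    console.log(passed
      ? "✅ PASS: proxy returned a reply string"
      : "❌ FAIL: response is missing a reply string");
  } catch (error) {
    console.log("❌ FAIL: request threw:", error); // make the failure obvious, not silent
  }
}

verifyProxyResponds();
```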
Wait for Human Verification
After implementing and testing a component, stop and wait for human confirmation before building the next piece.
Provide specific verification instructions:
- Exact steps (which DevTools tab to open, which button to click)
- What they should see if the test passes
- What they might see if it fails
- Troubleshooting steps for common issues
Example:
Verification: API Proxy Responds to Requests
1. Open browser DevTools Console (F12)
2. Click "Send Message" button
3. Expected (passing): Console shows "Received response: [object]" and UI displays AI response
4. If failing: Check Network tab for status codes, verify API key environment variable
Confirm this passes before proceeding.
Common problem: Building multiple components without stopping for verification, then discovering earlier components don't actually work.
Lock Working Code
Once a component passes its tests, don't modify it unless explicitly fixing a bug in that component or the human requests changes.
This is critical. AI coding agents frequently refactor, reorganize, or "improve" working code while adding new features, breaking functionality that previously worked.
Rules:
- ✅ Modify if: Fixing a bug in this component
- ✅ Modify if: Human explicitly requests changes
- ❌ Don't modify if: Adding a new feature that doesn't require changes here
- ❌ Don't modify if: "Improving" code style, organization, or efficiency
New features should be additive. Build on top of working code rather than rewriting it.
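For example, if an already-tested `sendToApiProxy` needs retry behavior, one additive approach is to wrap it in a new function instead of editing it (a sketch with hypothetical names):

```typescript
// The locked, tested component is imported and reused, never edited.
import { sendToApiProxy } from "./apiProxy"; // hypothetical existing module

// New feature added alongside the working code: retry transient failures.
export async function sendWithRetry(message: string, attempts = 3): Promise<unknown> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await sendToApiProxy(message); // existing behavior, untouched
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError; // all attempts failed; surface the last error
}
```

If the retry logic misbehaves, the original proxy tests still pass and the problem is confined to the new function.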
Commit immediately after each component passes testing. This creates a rollback point if later changes break things.
Test Realistic Scenarios
Test actual usage patterns, not just ideal conditions.
Test both success and failure scenarios:
- Happy path with valid inputs
- Network errors - what happens when API requests fail?
- Invalid responses - what if API returns unexpected data?
- Edge cases - empty inputs, very long inputs, special characters
- Timing issues - slow network conditions
Verify error handling provides meaningful feedback to users.
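The sketch below shows one way to cover these cases without real network traffic, assuming Vitest and a hypothetical `sendToApiProxy` wrapper; the expected errors express the contract you want rather than an existing implementation:

```typescript
// Success- and failure-path tests for a hypothetical sendToApiProxy wrapper
// around fetch, using stubbed responses so no real requests are made.
import { describe, it, expect, vi, afterEach } from "vitest";
import { sendToApiProxy } from "./apiProxy"; // hypothetical

afterEach(() => vi.unstubAllGlobals());

describe("sendToApiProxy", () => {
  it("returns a reply on the happy path", async () => {
    vi.stubGlobal("fetch", vi.fn().mockResolvedValue(
      new Response(JSON.stringify({ reply: "ok" }), { status: 200 })
    ));
    await expect(sendToApiProxy("hello")).resolves.toMatchObject({ reply: "ok" });
  });

  it("surfaces a readable error when the network fails", async () => {
    vi.stubGlobal("fetch", vi.fn().mockRejectedValue(new TypeError("fetch failed")));
    await expect(sendToApiProxy("hello")).rejects.toThrow(/network/i);
  });

  it("rejects unexpected response bodies instead of rendering garbage", async () => {
    vi.stubGlobal("fetch", vi.fn().mockResolvedValue(
      new Response("not json", { status: 200 })
    ));
    await expect(sendToApiProxy("hello")).rejects.toThrow();
  });

  it("refuses empty input without making a request", async () => {
    const fetchSpy = vi.fn();
    vi.stubGlobal("fetch", fetchSpy);
    await expect(sendToApiProxy("")).rejects.toThrow();
    expect(fetchSpy).not.toHaveBeenCalled();
  });
});
```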
Rollback Strategy
When to Rollback vs. Debug Forward
Rollback if:
- Multiple things changed since last working state and it's unclear which change broke functionality
- You've been debugging for 10-15 minutes without progress
- The working version is recent and little progress would be lost
Debug forward if:
- Single focused change with clear error message
- Error message points to obvious problem
- Significant work would be lost by rolling back
Best practice: Commit working code before making changes. Frequent commits make rollback low-cost.
Performance and Cost Considerations
Rate Limiting Testing
Verify rate limiting works without creating infinite loops or request storms during development.
Test that rate limiting provides clear feedback to users when activated.
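Fake timers make this practical: the rate-limit window can elapse instantly and no real requests are sent. The sketch below assumes Vitest and a hypothetical `RateLimiter` whose `tryAcquire()` returns `{ allowed, retryAfterMs }`:

```typescript
// Rate-limiter test that sends no real traffic, using Vitest fake timers.
// RateLimiter and its options are hypothetical placeholders.
import { describe, it, expect, vi, afterEach } from "vitest";
import { RateLimiter } from "./rateLimiter"; // hypothetical

afterEach(() => vi.useRealTimers());

describe("RateLimiter", () => {
  it("rejects the extra request with user-facing feedback, then recovers", () => {
    vi.useFakeTimers();
    const limiter = new RateLimiter({ maxRequests: 2, windowMs: 60_000 });

    expect(limiter.tryAcquire().allowed).toBe(true);
    expect(limiter.tryAcquire().allowed).toBe(true);

    // Third request inside the window is blocked, not silently retried in a loop.
    const denied = limiter.tryAcquire();
    expect(denied.allowed).toBe(false);
    expect(denied.retryAfterMs).toBeGreaterThan(0); // enough info to tell the user

    // Advance the fake clock instead of waiting a real minute.
    vi.advanceTimersByTime(60_000);
    expect(limiter.tryAcquire().allowed).toBe(true);
  });
});
```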
API Call Management During Testing
Be mindful that each API call costs money during development.
- Use minimal test conversations (short messages)
- Consider using mock responses for repeated tests of the same scenario when practical (see the sketch after this list)
- Test error handling logic without making real API calls when possible
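One lightweight approach is an environment-gated mock, sketched below; `USE_MOCK_API` and the module names are assumptions, not project conventions:

```typescript
// Mock toggle so repeated manual tests don't spend API credits.
// USE_MOCK_API, the module path, and the response shape are assumptions,
// and process.env assumes a Node/bundler environment.
import { sendToApiProxy } from "./apiProxy"; // hypothetical real client

const MOCK_REPLY = { reply: "Canned response for development testing." };

export async function sendForTest(message: string) {
  if (process.env.USE_MOCK_API === "1") {
    // Exercises the surrounding UI, storage, and error-handling code paths
    // without a billable API call.
    return MOCK_REPLY;
  }
  return sendToApiProxy(message);
}
```

Run the app with `USE_MOCK_API=1` during repeated UI or storage tests, and unset it for the occasional end-to-end check against the real API.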
Diagnostic Output Guidelines
During Development
Include generous diagnostic output:
- API requests and responses (sanitize API keys; a redaction sketch follows this list)
- State changes and data flow
- Configuration loading and assembly
- Storage operations
- User interactions and results
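A small redaction helper keeps keys out of the console. The sketch below assumes common header names and is not tied to any particular project:

```typescript
// Request logging with the API key redacted before it reaches the console.
// Header names and the log shape here are assumptions, not a fixed convention.
function sanitizeHeaders(headers: Record<string, string>): Record<string, string> {
  const redacted: Record<string, string> = { ...headers };
  for (const key of Object.keys(redacted)) {
    if (["authorization", "x-api-key"].includes(key.toLowerCase())) {
      redacted[key] = "[REDACTED]"; // never print the real key
    }
  }
  return redacted;
}

export function logRequest(url: string, headers: Record<string, string>, body: unknown): void {
  console.debug("[api] request", { url, headers: sanitizeHeaders(headers), body });
}
```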
At Stage Completion
Determine which diagnostics should remain in production (one way to gate them is sketched after the lists below):
Keep:
- Error logging for troubleshooting
- User-facing error messages
- Essential monitoring (rate limiting status, token usage)
Remove:
- Verbose debugging output
- Development-only logging
- Sensitive data logging
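One way to make this split mechanical is to gate verbose output behind a flag from the start, as in this sketch (flag name and environment assumed):

```typescript
// Gate verbose diagnostics behind a flag so only error logging and essential
// monitoring remain in production. DEBUG_LOGGING is an assumed flag name, and
// process.env assumes a Node/bundler environment.
const DEBUG = process.env.DEBUG_LOGGING === "1";

export function debugLog(...args: unknown[]): void {
  if (DEBUG) console.debug("[debug]", ...args); // development-only output
}

export function logError(message: string, error: unknown): void {
  console.error("[error]", message, error); // always kept for troubleshooting
}
```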
Testing Workflow
For Each Component
1. Build the component
2. Create a test with diagnostic output
3. Provide specific verification instructions to the human
4. Wait for human confirmation
5. Commit the working code
6. Lock it in - don't modify unless fixing a bug
7. Move to the next component
At Stage Completion
1. Run integration test (all components working together)
2. Review and clean up diagnostic output
3. Verify stage completion criteria from stage document
4. Commit final stage code
5. Move to next stage
Key Behavioral Reminders
Don't modify working code. This is the most common failure pattern. Resist refactoring code that passes tests.
Stop and wait for verification. Don't assume tests passed and continue building.
Test small pieces. Don't create comprehensive tests that obscure which component failed.
Commit frequently. Create rollback points after each working component.
Make failures obvious. Humans need clear indicators when tests fail.