Core Testing Principles

Test One Thing at a Time

Isolate components during testing. Don't test multiple features simultaneously. When a test fails, it should be immediately clear which component caused the failure.

Avoid comprehensive tests that exercise many components at once during initial development. Test each component independently before testing integration.

Common problem: Creating a test that exercises the full user flow (UI interaction → API call → storage → display update) before verifying each piece works independently.

Better approach: Test API proxy independently, test storage independently, test UI independently, then test integration.
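
For example, the storage layer can be exercised on its own before any UI or API work exists. The sketch below assumes a hypothetical storage module exposing saveMessage, loadMessages, and clearMessages; a failure here can only implicate storage, not the UI or the API proxy.

  // storage.test.ts - exercises only the storage layer (hypothetical module and names)
  import { saveMessage, loadMessages, clearMessages } from "./storage";

  async function testStorageRoundTrip(): Promise<void> {
    await clearMessages();
    await saveMessage({ role: "user", text: "hello" });

    const messages = await loadMessages();
    console.log("Loaded messages:", messages);    // visible diagnostic output

    const pass = messages.length === 1 && messages[0].text === "hello";
    console.log(pass ? "PASS: storage round trip" : "FAIL: storage round trip");
  }

  testStorageRoundTrip();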

Make Results Visible

Include diagnostic output so humans can observe what's happening. Don't assume "no error = working."

For each test, show the input used, the output produced, and an explicit pass/fail result.

The human should be able to see exactly what happened and whether it matches expectations.
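
One way to do this, sketched below with a hypothetical check helper, is to print the expected value, the actual value, and an unambiguous PASS/FAIL line for every assertion.

  // Minimal assertion helper that prints results instead of failing silently
  function check<T>(label: string, actual: T, expected: T): void {
    const pass = JSON.stringify(actual) === JSON.stringify(expected);
    console.log(`${pass ? "PASS" : "FAIL"}: ${label}`);
    console.log("  expected:", expected);
    console.log("  actual:  ", actual);
  }

  check("input is trimmed", "  hi  ".trim(), "hi");
  check("one message stored", [{ role: "user", text: "hi" }].length, 1);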

Wait for Human Verification

After implementing and testing a component, stop and wait for human confirmation before building the next piece.

Provide specific verification instructions:

Example:

Verification: API Proxy Responds to Requests

1. Open browser DevTools Console (F12)
2. Click "Send Message" button
3. Expected (passing): Console shows "Received response: [object]" and UI displays AI response
4. If failing: Check Network tab for status codes, verify API key environment variable

Confirm this passes before proceeding.
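
The diagnostic line that verification step looks for has to exist in the code. Below is a sketch of an instrumented handler, assuming a hypothetical /api/chat proxy endpoint and a #reply element; the logged response is what the human checks in step 3.

  // Hypothetical click handler instrumented so the human can verify it in the console
  async function onSendClick(text: string): Promise<void> {
    console.log("Sending request:", text);
    const response = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message: text }),
    });
    const data = await response.json();
    console.log("Received response:", data);      // the line checked in step 3
    document.querySelector("#reply")!.textContent = JSON.stringify(data);
  }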

Common problem: Building multiple components without stopping for verification, then discovering earlier components don't actually work.

Lock Working Code

Once a component passes its tests, don't modify it unless explicitly fixing a bug in that component or the human requests changes.

This is critical. AI coding agents frequently refactor, reorganize, or "improve" working code while adding new features, breaking functionality that previously worked.

Rules:

  1. New features should be additive. Build on top of working code rather than rewriting it.
  2. Commit immediately after each component passes testing. This creates a rollback point if later changes break things.

Test Realistic Scenarios

Test actual usage patterns, not just ideal conditions.

Test both success and failure scenarios: the expected path with valid input, and failure paths such as invalid input, network errors, or missing configuration.

Verify that error handling provides meaningful feedback to users.
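
For instance, a failure path can be forced deliberately and the resulting feedback inspected. This sketch assumes the same hypothetical /api/chat proxy and sends an intentionally invalid payload.

  // Force a failure case on purpose and confirm the error is visible and meaningful
  async function testErrorFeedback(): Promise<void> {
    const response = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({}),                   // "message" field omitted on purpose
    });

    console.log("Status:", response.status);
    if (!response.ok) {
      const body = await response.text();
      console.log("Error body:", body);
      console.log(body.length > 0 ? "PASS: error includes a message" : "FAIL: empty error body");
    } else {
      console.log("FAIL: expected an error status for an invalid request");
    }
  }

  testErrorFeedback();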

Rollback Strategy

When to Rollback vs. Debug Forward

Rollback if changes have broken functionality that previously worked, the cause is not quickly obvious, or several things changed at once and reverting is cheaper than untangling them.

Debug forward if the failure is isolated to the code just added and the cause is understood.

Best practice: Commit working code before making changes. Frequent commits make rollback low-cost.

Performance and Cost Considerations

Rate Limiting Testing

Verify rate limiting works without creating infinite loops or request storms during development.

Test that rate limiting provides clear feedback to users when activated.
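
One low-risk way to do this, sketched below against the hypothetical /api/chat endpoint, is a small fixed burst that stops as soon as the limit responds, rather than any open-ended loop.

  // Bounded probe: a fixed number of sequential requests, never an unbounded loop
  async function testRateLimitFeedback(maxProbes = 5): Promise<void> {
    for (let i = 0; i < maxProbes; i++) {
      const response = await fetch("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ message: `probe ${i + 1}` }),
      });
      console.log(`Request ${i + 1}: status ${response.status}`);
      if (response.status === 429) {
        console.log("PASS: rate limit triggered and reported with a 429 status");
        return;                                   // stop immediately once throttled
      }
    }
    console.log("Rate limit not reached within the bounded probe");
  }

  testRateLimitFeedback();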

API Call Management During Testing

Be mindful that each API call costs money during development.
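
A common way to keep costs down, sketched here as one option rather than a required pattern, is to stub the network call while iterating and only hit the real API at explicit verification points. This assumes the code calls fetch directly.

  // Swap in a canned response while iterating; restore the real fetch for verification
  const realFetch = globalThis.fetch;

  function useMockApi(): void {
    globalThis.fetch = async () =>
      new Response(JSON.stringify({ reply: "mocked response" }), {
        status: 200,
        headers: { "Content-Type": "application/json" },
      });
  }

  function useRealApi(): void {
    globalThis.fetch = realFetch;                 // real, billable calls only when needed
  }

The call sites do not change; swapping the transport keeps working code locked while tests run cheaply.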

Diagnostic Output Guidelines

During Development

Include generous diagnostic output: log requests and responses, state changes, and errors so it is always clear what the code is actually doing.

At Stage Completion

Determine which diagnostics remain for production.

Keep: error reporting and user-facing status messages that explain what is happening.

Remove: verbose development traces, raw request and response dumps, and temporary test output.
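
One simple way to make this cleanup easy, sketched below with a hypothetical DEBUG constant, is to route verbose tracing through a single switch while real error reporting stays on its own path.

  const DEBUG = true;                             // flip to false at stage completion

  // Verbose tracing goes through debugLog so it is trivial to silence or strip later;
  // genuine error reporting continues to use console.error and stays in production.
  function debugLog(...args: unknown[]): void {
    if (DEBUG) console.log("[debug]", ...args);
  }

  debugLog("request payload", { message: "hello" });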

Testing Workflow

For Each Component

  1. Build the component
  2. Create a test with diagnostic output
  3. Provide specific verification instructions to human
  4. Wait for human confirmation
  5. Commit working code
  6. Lock it in - don't modify unless fixing a bug
  7. Move to next component

At Stage Completion

  1. Run integration test (all components working together)
  2. Review and clean up diagnostic output
  3. Verify stage completion criteria from stage document
  4. Commit final stage code
  5. Move to next stage

Key Behavioral Reminders

Don't modify working code. This is the most common failure pattern. Resist refactoring code that passes tests.

Stop and wait for verification. Don't assume tests passed and continue building.

Test small pieces. Don't create comprehensive tests that obscure which component failed.

Commit frequently. Create rollback points after each working component.

Make failures obvious. Humans need clear indicators when tests fail.