Core Testing Principles
Test One Thing at a Time
Isolate components during testing. Don't test multiple features simultaneously. When a test fails, it should be immediately clear which component caused the failure.
Avoid comprehensive tests that exercise many components at once during initial development. Test each component independently before testing integration.
Common problem: Creating a test that exercises the full user flow (UI interaction → API call → storage → display update) before verifying each piece works independently.
Better approach: Test API proxy independently, test storage independently, test UI independently, then test integration.
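As a rough sketch of what isolated tests can look like (assuming a Vitest-style runner; `sendToApiProxy`, `saveMessage`, and `loadMessages` are hypothetical names, not project code):

```typescript
// Isolated tests per component, assuming a Vitest-style runner.
// Module paths and function names below are placeholders, not real project code.
import { describe, it, expect, vi, afterEach } from "vitest";
import { sendToApiProxy } from "./apiProxy";            // hypothetical proxy client
import { saveMessage, loadMessages } from "./storage";  // hypothetical storage layer

afterEach(() => vi.unstubAllGlobals());

describe("API proxy (isolated)", () => {
  it("returns the parsed body for a successful request", async () => {
    // Stub fetch so this test exercises only the proxy logic, not the network.
    vi.stubGlobal("fetch", vi.fn().mockResolvedValue(
      new Response(JSON.stringify({ reply: "hello" }), { status: 200 })
    ));
    const result = await sendToApiProxy("hi");
    expect(result.reply).toBe("hello");
  });
});

describe("storage (isolated)", () => {
  it("round-trips a saved message", async () => {
    await saveMessage({ id: "1", text: "hi" });
    expect(await loadMessages()).toContainEqual({ id: "1", text: "hi" });
  });
});
```

Because each `describe` block touches a single component, a failing test points directly at that component rather than at the whole flow.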
Make Results Visible
Include diagnostic output so humans can observe what's happening. Don't assume "no error = working."
For each test provide:
- Console output showing key operations
- Clear pass/fail indicators
- Logged data flow where relevant
- Visible UI feedback where appropriate
The human should be able to see exactly what happened and whether it matches expectations.
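A throwaway verification script is often enough. The sketch below assumes a hypothetical `sendToApiProxy` function and a response object with a `reply` field:

```typescript
// Throwaway verification script with visible diagnostics.
// sendToApiProxy and its response shape are hypothetical placeholders.
import { sendToApiProxy } from "./apiProxy";

async function verifyProxyResponds(): Promise<void> {
  console.log("[test] sending request to API proxy...");
  try {
    const response = await sendToApiProxy("ping");
    console.log("[test] received response:", response); // show the actual data flow
    const passed = response != null
      && typeof (response as { reply?: unknown }).reply === "string";
    console.log(passed
      ? "✅ PASS: proxy returned a reply string"
      : "❌ FAIL: response is missing a reply string");
  } catch (error) {
    console.log("❌ FAIL: request threw:", error); // make the failure obvious, not silent
  }
}

verifyProxyResponds();
```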
Wait for Human Verification
After implementing and testing a component, stop and wait for human confirmation before building the next piece.
Provide specific verification instructions:
- Exact steps (which DevTools tab to open, which button to click)
- What they should see if the test passes
- What they might see if it fails
- Troubleshooting steps for common issues
Example:
Verification: API Proxy Responds to Requests
1. Open browser DevTools Console (F12)
2. Click "Send Message" button
3. Expected (passing): Console shows "Received response: [object]" and UI displays AI response
4. If failing: Check Network tab for status codes, verify API key environment variable
Confirm this passes before proceeding.
Common problem: Building multiple components without stopping for verification, then discovering earlier components don't actually work.
Lock Working Code
Once a component passes its tests, don't modify it unless explicitly fixing a bug in that component or the human requests changes.
This is critical. AI coding agents frequently refactor, reorganize, or "improve" working code while adding new features, breaking functionality that previously worked.
Rules:
- ✅ Modify if: Fixing a bug in this component
- ✅ Modify if: Human explicitly requests changes
- ❌ Don't modify if: Adding a new feature that doesn't require changes here
- ❌ Don't modify if: "Improving" code style, organization, or efficiency
New features should be additive. Build on top of working code rather than rewriting it.
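For example, if an already-tested `sendToApiProxy` needs retry behavior, one additive approach is to wrap it in a new function instead of editing it (a sketch with hypothetical names):

```typescript
// The locked, tested component is imported and reused, never edited.
import { sendToApiProxy } from "./apiProxy"; // hypothetical existing module

// New feature added alongside the working code: retry transient failures.
export async function sendWithRetry(message: string, attempts = 3): Promise<unknown> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await sendToApiProxy(message); // existing behavior, untouched
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError; // all attempts failed; surface the last error
}
```

If the retry logic misbehaves, the original proxy tests still pass and the problem is confined to the new function.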
Commit immediately after each component passes testing. This creates a rollback point if later changes break things.
Test Realistic Scenarios
Test actual usage patterns, not just ideal conditions.
Test both success and failure scenarios:
- Happy path with valid inputs
- Network errors - what happens when API requests fail?
- Invalid responses - what if API returns unexpected data?
- Edge cases - empty inputs, very long inputs, special characters
- Timing issues - slow network conditions
Verify error handling provides meaningful feedback to users.
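The sketch below shows one way to cover these cases without real network traffic, assuming Vitest and a hypothetical `sendToApiProxy` wrapper; the expected errors express the contract you want rather than an existing implementation:

```typescript
// Success- and failure-path tests for a hypothetical sendToApiProxy wrapper
// around fetch, using stubbed responses so no real requests are made.
import { describe, it, expect, vi, afterEach } from "vitest";
import { sendToApiProxy } from "./apiProxy"; // hypothetical

afterEach(() => vi.unstubAllGlobals());

describe("sendToApiProxy", () => {
  it("returns a reply on the happy path", async () => {
    vi.stubGlobal("fetch", vi.fn().mockResolvedValue(
      new Response(JSON.stringify({ reply: "ok" }), { status: 200 })
    ));
    await expect(sendToApiProxy("hello")).resolves.toMatchObject({ reply: "ok" });
  });

  it("surfaces a readable error when the network fails", async () => {
    vi.stubGlobal("fetch", vi.fn().mockRejectedValue(new TypeError("fetch failed")));
    await expect(sendToApiProxy("hello")).rejects.toThrow(/network/i);
  });

  it("rejects unexpected response bodies instead of rendering garbage", async () => {
    vi.stubGlobal("fetch", vi.fn().mockResolvedValue(
      new Response("not json", { status: 200 })
    ));
    await expect(sendToApiProxy("hello")).rejects.toThrow();
  });

  it("refuses empty input without making a request", async () => {
    const fetchSpy = vi.fn();
    vi.stubGlobal("fetch", fetchSpy);
    await expect(sendToApiProxy("")).rejects.toThrow();
    expect(fetchSpy).not.toHaveBeenCalled();
  });
});
```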
Rollback Strategy
When to Rollback vs. Debug Forward
Rollback if:
- Multiple things changed since last working state and it's unclear which change broke functionality
- You've been debugging for 10-15 minutes without progress
- The working version is recent and little progress would be lost
Debug forward if:
- Single focused change with clear error message
- Error message points to obvious problem
- Significant work would be lost by rolling back
Best practice: Commit working code before making changes. Frequent commits make rollback low-cost.
Performance and Cost Considerations
Rate Limiting Testing
Verify rate limiting works without creating infinite loops or request storms during development.
Test that rate limiting provides clear feedback to users when activated.
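Fake timers make this practical: the rate-limit window can elapse instantly and no real requests are sent. The sketch below assumes Vitest and a hypothetical `RateLimiter` whose `tryAcquire()` returns `{ allowed, retryAfterMs }`:

```typescript
// Rate-limiter test that sends no real traffic, using Vitest fake timers.
// RateLimiter and its options are hypothetical placeholders.
import { describe, it, expect, vi, afterEach } from "vitest";
import { RateLimiter } from "./rateLimiter"; // hypothetical

afterEach(() => vi.useRealTimers());

describe("RateLimiter", () => {
  it("rejects the extra request with user-facing feedback, then recovers", () => {
    vi.useFakeTimers();
    const limiter = new RateLimiter({ maxRequests: 2, windowMs: 60_000 });

    expect(limiter.tryAcquire().allowed).toBe(true);
    expect(limiter.tryAcquire().allowed).toBe(true);

    // Third request inside the window is blocked, not silently retried in a loop.
    const denied = limiter.tryAcquire();
    expect(denied.allowed).toBe(false);
    expect(denied.retryAfterMs).toBeGreaterThan(0); // enough info to tell the user

    // Advance the fake clock instead of waiting a real minute.
    vi.advanceTimersByTime(60_000);
    expect(limiter.tryAcquire().allowed).toBe(true);
  });
});
```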
API Call Management During Testing
Be mindful that each API call costs money during development.
- Use minimal test conversations (short messages)
- Consider using mock responses for repeated tests of the same scenario when practical (see the sketch after this list)
- Test error handling logic without making real API calls when possible
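One lightweight approach is an environment-gated mock, sketched below; `USE_MOCK_API` and the module names are assumptions, not project conventions:

```typescript
// Mock toggle so repeated manual tests don't spend API credits.
// USE_MOCK_API, the module path, and the response shape are assumptions,
// and process.env assumes a Node/bundler environment.
import { sendToApiProxy } from "./apiProxy"; // hypothetical real client

const MOCK_REPLY = { reply: "Canned response for development testing." };

export async function sendForTest(message: string) {
  if (process.env.USE_MOCK_API === "1") {
    // Exercises the surrounding UI, storage, and error-handling code paths
    // without a billable API call.
    return MOCK_REPLY;
  }
  return sendToApiProxy(message);
}
```

Run the app with `USE_MOCK_API=1` during repeated UI or storage tests, and unset it for the occasional end-to-end check against the real API.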
Diagnostic Output Guidelines
During Development
Include generous diagnostic output:
- API requests and responses (sanitize API keys; a redaction sketch follows this list)
- State changes and data flow
- Configuration loading and assembly
- Storage operations
- User interactions and results
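A small redaction helper keeps keys out of the console. The sketch below assumes common header names and is not tied to any particular project:

```typescript
// Request logging with the API key redacted before it reaches the console.
// Header names and the log shape here are assumptions, not a fixed convention.
function sanitizeHeaders(headers: Record<string, string>): Record<string, string> {
  const redacted: Record<string, string> = { ...headers };
  for (const key of Object.keys(redacted)) {
    if (["authorization", "x-api-key"].includes(key.toLowerCase())) {
      redacted[key] = "[REDACTED]"; // never print the real key
    }
  }
  return redacted;
}

export function logRequest(url: string, headers: Record<string, string>, body: unknown): void {
  console.debug("[api] request", { url, headers: sanitizeHeaders(headers), body });
}
```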
At Stage Completion
Determine which diagnostics should remain in production (one way to gate them is sketched after the lists below):
Keep:
- Error logging for troubleshooting
- User-facing error messages
- Essential monitoring (rate limiting status, token usage)
Remove:
- Verbose debugging output
- Development-only logging
- Sensitive data logging
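One way to make this split mechanical is to gate verbose output behind a flag from the start, as in this sketch (flag name and environment assumed):

```typescript
// Gate verbose diagnostics behind a flag so only error logging and essential
// monitoring remain in production. DEBUG_LOGGING is an assumed flag name, and
// process.env assumes a Node/bundler environment.
const DEBUG = process.env.DEBUG_LOGGING === "1";

export function debugLog(...args: unknown[]): void {
  if (DEBUG) console.debug("[debug]", ...args); // development-only output
}

export function logError(message: string, error: unknown): void {
  console.error("[error]", message, error); // always kept for troubleshooting
}
```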
Testing Workflow
For Each Component
1. Build the component
2. Create a test with diagnostic output
3. Provide specific verification instructions to the human
4. Wait for human confirmation
5. Commit the working code
6. Lock it in - don't modify unless fixing a bug
7. Move to the next component
At Stage Completion
1. Run integration test (all components working together)
2. Review and clean up diagnostic output
3. Verify stage completion criteria from stage document
4. Commit final stage code
5. Move to next stage
Key Behavioral Reminders
Don't modify working code. This is the most common failure pattern. Resist refactoring code that passes tests.
Stop and wait for verification. Don't assume tests passed and continue building.
Test small pieces. Don't create comprehensive tests that obscure which component failed.
Commit frequently. Create rollback points after each working component.
Make failures obvious. Humans need clear indicators when tests fail.