I Tested Every User Flow on My Web App with AI — Here's What I Found
How I built a 22-test regression suite using plain English instead of CSS selectors — and what broke.
The Problem with Testing My Own App
I built a full-stack portfolio website with an integrated CMS admin panel at protik.eu.org. Like most indie developers, I skipped writing end-to-end tests. Why? Because Playwright tests are painful to write and even more painful to maintain:
// Traditional Playwright — brittle, selector-dependent
await page.locator('[data-testid="email-input"]').fill('test@example.com');
await page.locator('[data-testid="password-input"]').fill('password');
await page.locator('[data-testid="submit-button"]').click();
await expect(page.locator('.error-message')).toBeVisible();
Every time I changed a CSS class, renamed a component, or restructured my form, the tests broke. I'd spend more time fixing tests than building features.
Then I found Passmark — an open-source AI testing library that lets you write tests in plain English. No selectors. No page objects. Just describe what to test.
What is Passmark?
Passmark is built by the team behind Bug0 and Hashnode. The idea is simple:
- Describe your test steps in plain English
- AI executes them via Playwright on the first run
- Actions are cached to Redis for subsequent runs at native Playwright speed
- Auto-heals when UI changes break a cached step
Here's what a test looks like:
import { test, expect } from "@playwright/test";
import { runSteps } from "passmark";
test("Failed login shows error", async ({ page }) => {
  test.setTimeout(90_000);
  await runSteps({
    page,
    userFlow: "Wrong credentials login attempt",
    steps: [
      { description: "Navigate to https://protik.eu.org/auth" },
      { description: "Dismiss any cookie consent banner if visible" },
      { description: 'Type "admin@protik.eu.org" into the email input field' },
      { description: 'Type "wrongpassword123" into the password input field' },
      { description: "Click the Sign In button" },
    ],
    assertions: [
      { assertion: "The page displays an error message indicating login failure or invalid credentials, and the user stays on the auth page" },
    ],
    test,
    expect,
  });
});
That's it. No data-testid, no waitForSelector, no hardcoded CSS selectors. Just English.
Building the Test Suite
I entered the Breaking Apps Hackathon with one goal: test every user-facing flow on my portfolio + CMS app. Here's what I built.
Setup — 5 Minutes Flat
npm init playwright@latest breaking-apps-hackathon
cd breaking-apps-hackathon
npm install passmark dotenv
Created a .env file with the free OpenRouter API key from the hackathon:
OPENROUTER_API_KEY=sk-or-v1-...
Configured Passmark in playwright.config.ts:
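import { configure } from "passmark";
import dotenv from "dotenv";
import path from "path";
// Load OPENROUTER_API_KEY from the .env file next to this config
dotenv.config({ path: path.resolve(__dirname, ".env") });
configure({
  ai: {
    gateway: "openrouter", // Routes AI calls through OpenRouter
  },
});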
Done. No separate Anthropic or Google API keys needed.
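One small guard may be worth adding next to this config. If the .env file is missing or the key name is misspelled, the first sign of trouble is otherwise an API error in the middle of a test run. The sketch below is plain TypeScript; requireEnv is a hypothetical helper name, not part of Passmark or dotenv.

```typescript
// Hypothetical helper (not part of Passmark or dotenv): fail fast when a
// required environment variable is missing or blank.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In playwright.config.ts this would run right after dotenv.config(), e.g.:
// requireEnv(process.env, "OPENROUTER_API_KEY");
```

Calling it before configure() turns a confusing mid-run failure into a clear message at startup.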
What I Tested — 22 Tests Across 3 Categories
🔐 Auth & Navigation (12 tests)
This is the heart of the suite. My app has an admin login portal at /auth that gates a CMS dashboard. I tested the ways a login can fail, along with the navigation, routing, and access gating around it:
| Test | What it checks |
|---|---|
| Homepage loads and displays portfolio content | Basic page rendering |
| Navigation menu links work | SPA routing |
| Auth portal displays login form | Form element visibility |
| Failed login — invalid email format | Input validation |
| Failed login — wrong credentials | Auth error handling |
| Failed login — empty fields | Required field validation |
| Failed login — email but empty password | Partial validation |
| Auth portal UI elements and branding | Labels, placeholders, headings |
| Nonexistent route (404) handling | Fallback routing |
| Cookie consent banner dismissal | Consent UX |
| Dashboard blocks unauthenticated access | Auth gating |
| Back to home link from auth page | Navigation flow |
The key insight? Assertion quality matters. Compare these two assertions:
// ❌ Weak — could pass even if the page is broken
{ assertion: "An error appears" }
// ✅ Strong — specific, multi-layered, falsifiable
{ assertion: "The page displays an error message indicating login failure or invalid credentials — the user stays on the auth page" }
Passmark uses consensus assertions — evaluated by multiple AI models independently. A third model breaks ties. So your assertions need to be precise enough that independent models agree on the result.
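To make the tie-break idea concrete, here is a minimal sketch of two independent verdicts plus an arbiter. It illustrates the general pattern only and is not Passmark's actual implementation; the Verdict type and the consensus function are hypothetical names.

```typescript
// Illustrative sketch of consensus with a tie-breaker. Not Passmark's code.
type Verdict = "pass" | "fail";

interface ConsensusResult {
  verdict: Verdict;
  usedTieBreaker: boolean;
}

// `first` and `second` are the two independent verdicts. `tieBreaker` is only
// called when they disagree, so the third opinion costs nothing on agreement.
function consensus(
  first: Verdict,
  second: Verdict,
  tieBreaker: () => Verdict,
): ConsensusResult {
  if (first === second) {
    return { verdict: first, usedTieBreaker: false };
  }
  return { verdict: tieBreaker(), usedTieBreaker: true };
}

// Both judges agree: the tie-breaker is never consulted.
const agreed = consensus("pass", "pass", () => "fail");
// The judges disagree: the tie-breaker decides.
const split = consensus("pass", "fail", () => "fail");
```

A vague assertion is more likely to end up in the split case, which is one reason precise wording helps.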
♿ Accessibility (5 tests)
Traditional Playwright tests rarely check accessibility. With Passmark, it's just as easy as testing anything else:
await runSteps({
  page,
  userFlow: "Auth form accessibility — input labels",
  steps: [
    { description: "Navigate to https://protik.eu.org/auth" },
    { description: "Dismiss any cookie consent or notification banner if visible" },
  ],
  assertions: [
    { assertion: "The email input has an associated label, aria-label, or placeholder text that identifies it as an email field" },
    { assertion: "The password input has an associated label, aria-label, or placeholder text that identifies it as a password field" },
  ],
  test,
  expect,
});
I tested:
- Form inputs have accessible labels or placeholders
- Page has correct lang attribute for screen readers
- Login button has descriptive accessible text
- Auth page text has sufficient color contrast
- Auth form has visible focus indicators for keyboard navigation
What I found: My auth form had proper placeholder attributes but was missing explicit <label> elements. The AI caught this because it checked for both labels and placeholders — a human tester might have assumed placeholders were "good enough."
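The distinction behind this finding can be written down as a small classification. The sketch below is my own illustration, not something Passmark exposes: FieldNaming and namingSource are hypothetical names, and the priority order (label, then aria-label, then placeholder) is an assumption that mirrors the order used in the assertion text. Placeholder text disappears once the user starts typing, so a field named only by its placeholder is flagged separately rather than treated as labelled.

```typescript
// Hypothetical model of the attributes that matter for naming a form field.
interface FieldNaming {
  hasLabelElement: boolean; // a <label for="..."> or a wrapping <label>
  ariaLabel?: string;
  placeholder?: string;
}

type NamingSource = "label" | "aria-label" | "placeholder-only" | "none";

// Report the strongest naming source present on the field.
function namingSource(field: FieldNaming): NamingSource {
  if (field.hasLabelElement) return "label";
  if (field.ariaLabel?.trim()) return "aria-label";
  if (field.placeholder?.trim()) return "placeholder-only";
  return "none";
}

// The situation described above: a placeholder exists, but no <label>.
const emailField: FieldNaming = { hasLabelElement: false, placeholder: "Email" };
const emailNaming = namingSource(emailField); // "placeholder-only"
```

An assertion that accepts a label, an aria-label, or a placeholder would pass for this field, so a stricter assertion that asks specifically for a label element may be worth adding.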
⚡ Performance & Rendering (5 tests)
- Homepage loads without significant delay
- Auth page loads and form is interactive
- No broken images on homepage
- Navigation between pages is smooth
- Mobile viewport rendering
What I found: The image test caught a lazy-loaded image that showed a blank space before scrolling into view. Not a bug per se, but something I wouldn't have noticed without an AI agent scrolling through the entire page.
What Surprised Me
1. The AI Reads the DOM — Not Just Screenshots
When Passmark executes a step, it takes an accessibility snapshot of the page — the same tree that screen readers use. This means it understands semantic structure, not just visual layout. When I asked it to verify that inputs have labels, it checked for <label> elements, aria-label attributes, AND placeholder text — all three.
2. First Run is Slow, But That's the Point
The first run of each test takes 30-60 seconds per step because the AI is navigating the page, taking snapshots, and caching actions. Subsequent runs replay the cached Playwright actions at native speed. The trade-off: invest time on the first run, then get fast replays for as long as the cached actions stay valid.
3. Auto-Healing is Real (When Redis is Configured)
I didn't have Redis configured during my initial runs (hence the warning in the logs). But the docs say that when a cached step fails because the UI changed, Passmark re-engages AI to discover the new element and updates the cache. No manual test maintenance.
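The caching and healing behaviour the docs describe can be pictured as a small loop. The sketch below is an illustration under assumptions, not Passmark's implementation: a Map stands in for Redis, and runStep, replay, and discover are hypothetical names.

```typescript
// Illustrative cache-then-heal loop. Not Passmark's code; a Map stands in for Redis.
type Action = string; // e.g. a serialized Playwright action

interface StepRunner {
  // Replays a cached action; resolves to false if it no longer matches the UI.
  replay: (action: Action) => Promise<boolean>;
  // Slow path: the AI works out (and performs) the action for a description.
  discover: (description: string) => Promise<Action>;
}

type StepOutcome = "cached" | "discovered" | "healed";

async function runStep(
  cache: Map<string, Action>,
  description: string,
  runner: StepRunner,
): Promise<StepOutcome> {
  const cached = cache.get(description);
  if (cached !== undefined && (await runner.replay(cached))) {
    return "cached"; // fast path: no AI call needed
  }
  // Either nothing was cached, or the cached action is stale.
  const fresh = await runner.discover(description);
  cache.set(description, fresh);
  return cached === undefined ? "discovered" : "healed";
}
```

Without a persistent cache, every run starts with an empty Map, so every step takes the slow discover path.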
4. Specificity in Assertions Pays Off
Vague assertions like "an error is shown" sometimes pass when they shouldn't — the AI might interpret a console error or a minor UI change as "an error." But specific assertions like "the page displays an error message about invalid credentials AND the user stays on the auth page" give the AI clear criteria to evaluate. The multi-model consensus system makes this even more reliable.
5. OpenRouter Makes It Zero-Friction
The hackathon provides free OpenRouter API credits. OpenRouter acts as a gateway — you don't need separate Anthropic, Google, or OpenAI API keys. One key, all models. This lowered the barrier to entry significantly.
What Didn't Work
API Rate Limits on Long Test Suites
Running all 22 tests sequentially takes a while because each step requires an API call on the first run. I hit rate limits when running too many tests too quickly. The solution: run with --workers 1 and set generous timeouts (90-120 seconds per test).
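The same workaround can live in the config file instead of on the command line. The sketch below uses Playwright's workers and timeout option names in a plain object; the variable name rateLimitFriendlyConfig and the 120-second value are illustrative choices, and how it is merged with the rest of playwright.config.ts is left as an assumption.

```typescript
// Plain object using Playwright's `workers` and `timeout` option names.
// The variable name and the exact timeout value are illustrative.
const rateLimitFriendlyConfig = {
  workers: 1, // run tests one at a time so AI calls are not issued in parallel
  timeout: 120_000, // per-test timeout in milliseconds (120 seconds)
};

// In playwright.config.ts this could be spread into the exported config, e.g.:
// export default defineConfig({ ...rateLimitFriendlyConfig, /* other options */ });
```

A test.setTimeout() call inside a test still overrides the config-level timeout for that test.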
Step Caching Requires Redis
Without a Redis instance, every test run hits the AI API. This is fine for development but gets expensive for CI. I used a free Upstash Redis instance to enable caching. The setting is listed in .env.example, but I haven't configured it in my CI pipeline yet.
Not a Replacement for Unit Tests
Passmark is great for end-to-end regression testing, but it won't replace your unit tests. It tests user flows, not individual functions. Use it alongside your existing test pyramid, not as a replacement.
The Full Setup
Repo: github.com/MNDL-27/breaking-apps-hackathon
breaking-apps-hackathon/
├── .env.example                      # Template for API keys
├── .github/workflows/playwright.yml
├── .gitignore
├── README.md
├── package.json
├── playwright.config.ts              # Passmark + OpenRouter config
└── tests/
    ├── dockerdash.spec.ts            # 12 auth & navigation tests
    ├── accessibility.spec.ts         # 5 accessibility tests
    └── performance.spec.ts           # 5 performance & rendering tests
To run it yourself:
git clone https://github.com/MNDL-27/breaking-apps-hackathon.git
cd breaking-apps-hackathon
npm install
npx playwright install --with-deps
# Add your OpenRouter API key
echo "OPENROUTER_API_KEY=sk-or-v1-..." > .env
# Run the tests
npx playwright test --project chromium
# View the report
npx playwright show-report
Lessons Learned
- Plain English tests are more maintainable: when my UI changes, I update the English description, not a selector chain
- AI assertions catch subtle issues: missing <label> elements, poor contrast, and focus indicators, things traditional e2e tests never check
- Assertion quality is the differentiator: specific, multi-layered assertions produce reliable results; vague ones produce flaky ones
- Redis caching is essential for CI: without it, every run is a first run
- Start with critical flows: auth, navigation, and form validation first; accessibility and performance next
Try It Yourself
If you're building web apps and not testing them, Passmark removes the biggest excuse: "writing tests takes too long." With plain English test steps, you can cover your critical user flows in an afternoon.
- ⭐ Star and fork: github.com/bug0inc/passmark
- 📖 Docs: passmark.dev
- 🏆 Join the hackathon: hashnode.com/hackathons/breaking-things
This project was built for the Breaking Apps Hackathon by Bug0 and Hashnode. #BreakingAppsHackathon
If you found this useful, follow me on GitHub and check out my portfolio at protik.eu.org.