𝗘𝘃𝗲𝗿𝘆 𝗧𝗲𝘀𝘁 𝗣𝗮𝘀𝘀𝗲𝗱. 𝗧𝗵𝗲 𝗨𝘀𝗲𝗿 𝗦𝘁𝗶𝗹𝗹 𝗖𝗼𝘂𝗹𝗱𝗻'𝘁 𝗣𝗹𝗮𝘆 𝘁𝗵𝗲 𝗚𝗮𝗺𝗲

"The API returns 200 OK!"

At my first engineering job, I saw a major problem. My seniors loved dashboards. They loved high code coverage. They assumed that if the tests were green, the product worked.

They were wrong.

Working code and a human getting what they need are two different things. A button can return a success code while leaving the user stuck on a broken screen.

I built a way to find these UX dead ends without running the app. I call it the two-agent static walkthrough: two AI agents talking to each other in a loop.

  • Agent A plays the user. It has a specific goal, and it is stubborn. It does not quit after one failure. It keeps trying different paths.
  • Agent B plays the app. It has read access to the actual source code and traces the code path of every user action. It reports exactly what the code does and cites the file and line number. It is not allowed to invent behavior that does not exist in the code.
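The loop can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the author's implementation: `ScriptedUser` and `ScriptedApp` are scripted stand-ins for the two LLM agents (the post does not name models, prompts, or an API), the outcomes paraphrase the four turns described next, and every citation is a `<file>:<line>` placeholder rather than a real path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """What the app agent reports for one user action."""
    outcome: str               # what the code actually does
    citation: str              # "file:line" that backs the claim
    goal_reached: bool = False


class ScriptedUser:
    """Hypothetical stand-in for Agent A: a goal plus an ordered list of paths to try."""

    def __init__(self, goal: str, attempts: list[str]):
        self.goal = goal
        self.attempts = list(attempts)

    def next_action(self, history: list) -> Optional[str]:
        # A real LLM-backed user would read `history` to choose its next path.
        # Stubborn: after a failure it moves on to the next path instead of quitting.
        return self.attempts.pop(0) if self.attempts else None


class ScriptedApp:
    """Hypothetical stand-in for Agent B: returns a cited outcome per traced action."""

    def __init__(self, traces: dict[str, Observation]):
        self.traces = traces

    def trace(self, action: str) -> Observation:
        # Grounding rule: with no code path to cite, say so instead of guessing.
        return self.traces.get(action, Observation("no code path found", "none"))


def walkthrough(user: ScriptedUser, app: ScriptedApp, max_turns: int = 10) -> list:
    """Alternate user actions and app traces until success, give-up, or max_turns."""
    history = []
    for _ in range(max_turns):
        action = user.next_action(history)
        if action is None:         # the user has run out of paths: a dead end
            break
        obs = app.trace(action)
        history.append((action, obs))
        if obs.goal_reached:
            break
    return history


# Illustrative run; all citations are placeholders, not real files.
app = ScriptedApp({
    "click Generate": Observation("request goes to the old endpoint", "<file>:<line>"),
    "click result": Observation("text box has no click handler", "<file>:<line>"),
    "retry fix": Observation("backend fails on missing ID, UI shows success", "<file>:<line>"),
    "copy code": Observation("response is cut off halfway", "<file>:<line>"),
})
user = ScriptedUser("play the generated game",
                    ["click Generate", "click result", "retry fix", "copy code"])

history = walkthrough(user, app)
for turn, (action, obs) in enumerate(history, start=1):
    print(f"Turn {turn}: {action} -> {obs.outcome} [{obs.citation}]")
print("Goal reached:", any(obs.goal_reached for _, obs in history))
```

The transcript, not a pass/fail flag, is the useful output: each turn pairs a user intent with a cited code path, which is what makes the result read like a bug report.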

I tested it on a broken AI mini-game generator. Here is what happened:

Turn 1: The button failed. The user clicked "Generate." The code sent the request to an old, deprecated endpoint instead of the new one. The tests passed because the old endpoint still responded.

Turn 2: The unclickable void. The user tried to click the result. The code put the text in a plain box with no click handler. Nothing happened.

Turn 3: The false blessing. The user tried to fix the error. The backend failed because of a missing ID, yet the screen showed a green success message.

Turn 4: Truncated hope. The user tried to copy the code manually. The API response cut the text off halfway through, so the copied code was broken.

The user quit.

Many tests only check whether an endpoint returns 200. They do not check whether the user actually reaches their goal.
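Turn 4 shows the gap in miniature. The Python sketch below assumes, purely for illustration, that the generator returns Python source in a hypothetical `game_code` field; the real generator's response format is not described in the post. A status-only check passes, while a check closer to the user's goal (does the returned code even parse?) fails.

```python
import ast

# Hypothetical response: status is 200, but the generated code was cut off
# halfway, as in Turn 4. The field names are assumptions for illustration.
response = {
    "status": 200,
    "game_code": "def play():\n    score = 0\n    for move in range(3):\n        score +=",
}


def status_check(resp: dict) -> bool:
    """What a status-only test asserts."""
    return resp["status"] == 200


def goal_check(resp: dict) -> bool:
    """Closer to what the user needs: code that is at least syntactically complete."""
    if resp["status"] != 200:
        return False
    try:
        ast.parse(resp["game_code"])
    except SyntaxError:
        return False
    return True


print("status check passes:", status_check(response))  # True
print("goal check passes:", goal_check(response))      # False
```

Parsing is only a weak proxy for "the user can play the game"; it catches truncation, but not a missing click handler or a misleading success message.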

How to use this:

  • Make the user agent stubborn. Real bugs hide behind the first mistake.
  • Ground the app agent in real code. This turns role-play into a real bug report.
  • Use this as a complement to your tests. It finds the gaps where your logic meets reality.
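One way to enforce the grounding rule is to reject any app-agent claim whose citation does not point at a real line of the repository. This is an assumption on my part, not something the post describes; `citation_is_grounded` is a hypothetical helper, and the `client.py` file and URL in the usage example are made up.

```python
import re
import tempfile
from pathlib import Path

# Citations are assumed to look like "relative/path.py:42".
CITATION = re.compile(r"^(?P<path>[^:]+):(?P<line>\d+)$")


def citation_is_grounded(citation: str, repo_root: Path, must_contain: str = "") -> bool:
    """Accept a 'path:line' citation only if that line exists inside repo_root
    and, optionally, contains a given snippet."""
    m = CITATION.match(citation)
    if not m:
        return False
    root = repo_root.resolve()
    path = (root / m["path"]).resolve()
    # Reject missing files and paths that escape the repository.
    if not path.is_file() or root not in path.parents:
        return False
    lines = path.read_text(encoding="utf-8").splitlines()
    n = int(m["line"])
    if not 1 <= n <= len(lines):
        return False
    return must_contain in lines[n - 1]


# Usage with a throwaway, hypothetical repository.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "client.py").write_text('URL = "/api/v1/generate"\n', encoding="utf-8")
    results = [
        citation_is_grounded("client.py:1", root, "/api/v1/generate"),  # real line
        citation_is_grounded("client.py:99", root),                     # no such line
        citation_is_grounded("missing.py:1", root),                     # no such file
    ]
print(results)
```

Claims that fail this check can be sent back to the app agent instead of being shown to the user agent, which keeps the walkthrough from drifting into role-play.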

This method is static and cheap. It runs before you even write a single test fixture. It turns "the code works" into "the user succeeds."

Source: https://dev.to/terum/every-test-passed-the-user-still-couldnt-play-the-game-388o
