Diagnose & debug red tests
A guide to diagnosing and debugging test report failures
You already know the most common reasons why Octomind tests fail. You’ve learned how to fix a prompt that causes the AI agent to get stuck, and how to debug test steps using the visual editor.
This is where we’ll show you how to diagnose and debug active test case failures: the red tests produced by a test run and summarized in a test report.
To recap: a ‘red’, failed test is an extremely important signal in testing. A test fails for one of three reasons:
- You have a bug in your app: This is why these tests exist. They catch bugs before your users do.
- The test is broken: End-to-end tests tend to break when the app changes. This is not just annoying, it also takes a major time investment to maintain them.
- The test is flaky: Flaky tests produce different results when running the exact same test case. They can’t be prevented 100% of the time just yet.
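One way to tell a flaky test from a consistently failing one is to rerun the exact same test case and compare the outcomes. The sketch below is illustrative only: `classifyRuns` and the `Verdict` type are hypothetical names, not part of Octomind or Playwright, and it assumes you have collected the pass/fail results of the repeated runs yourself.

```typescript
// Hypothetical helper: names and types are illustrative, not an Octomind API.
type Verdict = "passing" | "failing" | "flaky" | "unknown";

// Each entry is the outcome of one run of the same, unchanged test case:
// true = passed (green), false = failed (red).
function classifyRuns(outcomes: boolean[]): Verdict {
  if (outcomes.length === 0) return "unknown"; // nothing to judge from
  const passed = outcomes.filter((ok) => ok).length;
  if (passed === outcomes.length) return "passing";
  if (passed === 0) return "failing"; // consistently red: likely a bug or a broken test
  return "flaky"; // mixed results on identical runs
}

console.log(classifyRuns([false, false, false]));
console.log(classifyRuns([true, false, true]));
```

A consistently red result does not say whether the app or the test is at fault. That still needs a look at the failing step.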
Diagnosing methods
Octomind provides two methods for diagnosing and debugging active test failures:
- Test timeline
- Playwright Trace Viewer
Test timeline
The test timeline provides a screenshot of every step executed during a test run. A selector marker on the screenshot indicates that the locator successfully found the targeted element.
Timeline of a test run in test report detail, 12/2024
Timeline carousel of a test run in test report detail, 12/2024
In the step below, however, the test failed because the expected button text “Sign up today” wasn’t found, which is indicated by the absence of the selector marker.
Failed step in a timeline - selector marker is missing, 12/2024
Here once more in a sequence:
The timeline view is sufficient for identifying most test failures.
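The way to read a timeline (the first step without a selector marker is where the locator found nothing) can be sketched in code. This is a minimal model under assumed names: `TimelineStep`, `locatorMatched`, and `firstFailedStep` are hypothetical and do not reflect Octomind’s actual data format. The sample step descriptions are made up, apart from the “Sign up today” button text.

```typescript
// Hypothetical model of a timeline entry; not Octomind's real data format.
interface TimelineStep {
  description: string;
  // true when the selector marker is shown, i.e. the locator found its element
  locatorMatched: boolean;
}

// Return the first step whose locator did not match, or undefined if all matched.
function firstFailedStep(steps: TimelineStep[]): TimelineStep | undefined {
  return steps.find((step) => !step.locatorMatched);
}

// Made-up sample run for illustration.
const sampleRun: TimelineStep[] = [
  { description: "Open the landing page", locatorMatched: true },
  { description: 'Expect button with text "Sign up today"', locatorMatched: false },
];

const failedStep = firstFailedStep(sampleRun);
console.log(failedStep ? `Failed at: ${failedStep.description}` : "All steps matched");
```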
Playwright Trace Viewer
If you need to go deeper, click the debug tab to see the Playwright code executed during the test and to open the Playwright Trace Viewer, an advanced visual debugging UI.
Link to Playwright Trace Viewer in the debug tab of the test report detail, 12/2024
Playwright’s Trace Viewer is a GUI tool for debugging end-to-end tests. It offers a granular timeline and detailed diagnostic information generated during a test run, such as actions (clicks, navigation), events, console messages, network requests, and more.
Screenshot of Playwright Trace Viewer with failed steps, 12/2024
Developers will be familiar with the tabbed interface at the bottom of the screen, which functions similarly to in-browser developer tools.
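Octomind records the trace for you, so no setup is needed to use the link in the debug tab. If you also run Playwright code on your own machine, the configuration fragment below is a minimal sketch of how trace recording can be enabled there. It assumes a standard `@playwright/test` project with a `playwright.config.ts` file, which is an assumption about your local setup and not something Octomind requires.

```typescript
// playwright.config.ts (assumed local Playwright project, not an Octomind file)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Record a trace for every test, but keep it only when the test fails.
    trace: 'retain-on-failure',
  },
});
```

A recorded trace archive can then be opened locally with `npx playwright show-trace path/to/trace.zip`, where the path is a placeholder for wherever your run stored the file.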
Correct the broken test steps
If the cause of the failure is a broken test, this is how you fix it by editing the test case.
The Hack: How we fix our own tests
Since we dogfood our own end-to-end tests, this is how we usually proceed:
- go to the test case
- check the prompt - if it’s not matching the desired flow anymore, adjust it
- click regenerate steps
- select the last step you wish to keep
- confirm generate steps
The Agent will regenerate the steps of your test and validate them. If the result looks right, click save & run, or adjust the steps if needed.
Runs and their snapshots and traces become outdated after a while. Start another run to see an up-to-date state and tell which step broke. Refresh it by hitting the run only button under save & run.
'run only' to validate the test before saving it, screenshot 08/2024
Work in progress: Test auto-maintenance
Our goal at Octomind is to make end-to-end testing simple and intuitive while also providing access to low-level code and diagnostic tools such as the locator code editor and Playwright’s Trace Viewer.
We are aware that maintenance of automated end-to-end tests - the fixing of broken tests - is one of the biggest productivity killers in testing. We are working on an auto-maintenance feature right now that will make QA orders of magnitude more efficient.
We’ll keep you posted.
In the meantime, please reach out to our team on Discord if you have feedback on how we can improve the debugging experience, or if you need help debugging a tricky test.