AI agents discovering and generating test steps

A web app is typically composed of user flows. A user flow lets a user accomplish a specific goal; “Sign-in with email & password” is one example. Completing it requires a specific sequence of steps, or interactions.

Our AI agents mimic human users (e.g., clicking input fields or signing up for a newsletter) to navigate apps, interpret app intent, and identify all relevant user flows. We record each test case’s interaction chain and store it in an intermediate representation.
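To make the idea of an interaction chain more concrete, here is a minimal sketch of what such an intermediate representation could look like for the sign-in flow. The type names (`Interaction`, `TestCase`, `Locator`), the step kinds, and the example URL are hypothetical; the actual Octomind format is not described here.

```typescript
// Hypothetical intermediate representation (IR) of one test case's interaction chain.
// Each step records *what* the agent did, independent of any test framework.
type Locator = { role: string; name: string };

type Interaction =
  | { kind: "goto"; url: string }
  | { kind: "fill"; target: Locator; value: string }
  | { kind: "click"; target: Locator }
  | { kind: "expectVisible"; target: Locator };

interface TestCase {
  goal: string; // the user flow this chain accomplishes
  interactions: Interaction[]; // ordered steps recorded by the agent
}

// Example chain for "Sign-in with email & password" (illustrative values only).
const signIn: TestCase = {
  goal: "Sign-in with email & password",
  interactions: [
    { kind: "goto", url: "https://example.com/login" },
    { kind: "fill", target: { role: "textbox", name: "Email" }, value: "user@example.com" },
    { kind: "fill", target: { role: "textbox", name: "Password" }, value: "secret" },
    { kind: "click", target: { role: "button", name: "Sign in" } },
    { kind: "expectVisible", target: { role: "heading", name: "Dashboard" } },
  ],
};

// Render a step as human-readable text, e.g. for a test detail view.
// Typed values are deliberately not shown (they may be secrets).
function describeStep(step: Interaction): string {
  switch (step.kind) {
    case "goto":
      return `Open ${step.url}`;
    case "fill":
      return `Type into ${step.target.role} "${step.target.name}"`;
    case "click":
      return `Click ${step.target.role} "${step.target.name}"`;
    case "expectVisible":
      return `Check that ${step.target.role} "${step.target.name}" is visible`;
    default: {
      // Exhaustiveness check: the compiler flags any unhandled step kind.
      const unreachable: never = step;
      throw new Error(`Unknown step: ${JSON.stringify(unreachable)}`);
    }
  }
}

const steps = signIn.interactions.map(describeStep);
steps.forEach((s, i) => console.log(`${i + 1}. ${s}`));
```

Keeping the chain framework-neutral is what allows test code to be regenerated from it later rather than stored as a fixed script.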

Playwright generates test code

After we record and store each test case’s interaction chain, we generate the corresponding Playwright code deterministically, on the fly, immediately before test execution.
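The generation step can be sketched as a pure function from an interaction chain to Playwright Test source: the same chain always yields byte-identical code. The IR shape and the helper names (`toStatement`, `generateTest`) are hypothetical; only the emitted calls (`page.goto`, `page.getByRole(...).fill/.click`, `expect(...).toBeVisible()`) are real Playwright APIs. The generated code is returned as a string, so running this sketch needs no Playwright installation.

```typescript
// Hypothetical IR, as assumed for this sketch.
type Locator = { role: string; name: string };

type Interaction =
  | { kind: "goto"; url: string }
  | { kind: "fill"; target: Locator; value: string }
  | { kind: "click"; target: Locator }
  | { kind: "expectVisible"; target: Locator };

// JSON.stringify produces a safely escaped, double-quoted string literal.
const q = (s: string): string => JSON.stringify(s);

// Map an IR locator onto Playwright's role-based locator API.
function locator(t: Locator): string {
  return `page.getByRole(${q(t.role)}, { name: ${q(t.name)} })`;
}

// Translate one interaction into one Playwright statement. No randomness or
// timestamps are involved, so the output depends only on the input.
function toStatement(step: Interaction): string {
  switch (step.kind) {
    case "goto":
      return `await page.goto(${q(step.url)});`;
    case "fill":
      return `await ${locator(step.target)}.fill(${q(step.value)});`;
    case "click":
      return `await ${locator(step.target)}.click();`;
    case "expectVisible":
      return `await expect(${locator(step.target)}).toBeVisible();`;
    default: {
      const unreachable: never = step;
      throw new Error(`Unknown step: ${JSON.stringify(unreachable)}`);
    }
  }
}

// Wrap the statements in a complete Playwright Test file.
function generateTest(title: string, chain: Interaction[]): string {
  const body = chain.map((s) => `  ${toStatement(s)}`).join("\n");
  return [
    `import { test, expect } from "@playwright/test";`,
    ``,
    `test(${q(title)}, async ({ page }) => {`,
    body,
    `});`,
    ``,
  ].join("\n");
}

const source = generateTest("sign in", [
  { kind: "goto", url: "https://example.com/login" },
  { kind: "click", target: { role: "button", name: "Sign in" } },
]);
console.log(source);
```

Generating code just before execution, instead of storing it, means a fix to the generator benefits every existing test without touching the recorded chains.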

The interaction chain can be examined in the test detail view and the Playwright trace viewer.


Use of AI agents and Playwright in Octomind end-to-end tests

Work in progress: AI auto-maintenance

We will follow a playbook to determine whether a test failure is caused by a behavioral change in your user flows, by the test code itself, or by a bug in your code. In the case of a behavioral change, we pinpoint the failing interactions and deploy the AI agent to discover the new interactions needed to achieve the test case’s goal.
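The playbook itself is not published, but the three-way distinction can be illustrated with a small decision function. Everything here is a hypothetical sketch: the signal names (`appErrorObserved`, `goalReachable`, `failedStepTargetFound`) and the order of the rules are assumptions chosen for illustration, not Octomind’s actual logic.

```typescript
// Hypothetical triage of a failed run. Signals and rule order are assumptions.
type FailureCause = "user-flow-changed" | "test-code-issue" | "app-bug";

interface FailureSignals {
  failedStep: number; // index of the failing interaction in the chain
  appErrorObserved: boolean; // e.g. an HTTP 5xx or uncaught page error during the run
  goalReachable: boolean; // could an agent still reach the test goal by any path?
  failedStepTargetFound: boolean; // does the failing step's target element still exist?
}

interface Diagnosis {
  cause: FailureCause;
  failedStep: number; // the pinpointed interaction to inspect or rediscover
}

function triage(s: FailureSignals): Diagnosis {
  let cause: FailureCause;
  if (s.appErrorObserved || !s.goalReachable) {
    // The app errors, or the goal cannot be reached at all: likely a bug in the app.
    cause = "app-bug";
  } else if (!s.failedStepTargetFound) {
    // Goal still reachable, but the recorded step no longer matches the UI:
    // the user flow changed, so the agent should rediscover this interaction.
    cause = "user-flow-changed";
  } else {
    // Target exists and goal is reachable, yet the step failed: suspect the test.
    cause = "test-code-issue";
  }
  return { cause, failedStep: s.failedStep };
}

console.log(
  triage({ failedStep: 3, appErrorObserved: false, goalReachable: true, failedStepTargetFound: false }),
);
```

Only the `user-flow-changed` branch would hand the pinpointed step back to an AI agent; the other two outcomes call for a human or a code fix.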

This feature is under active development and not yet publicly accessible.

Issue pinpointing

When a test case fails, we help you quickly understand what went wrong. We provide a set of tools for investigating the issue:

  • Snapshots at the time of test failure
  • Traces via the Playwright trace viewer
  • Open source Debugtopus tooling, which lets you run our tests locally so you can set breakpoints and step through the code

See more details in Debug your code.


Debugtopus interaction diagram

Parallel test runs for shorter runtime

Browser tests are relatively slow because they simulate a real user. To deliver test results as quickly as possible, we parallelize test execution as far as we can.
To do so, we run fully in the cloud and scale instances up and down as needed. We are also working on techniques to isolate test executions from one another, which avoids side effects and allows better scaling.

Octomind tests are run in parallel, so your test suite will complete in 20 minutes or less, regardless of size.
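Why parallelism bounds the runtime can be seen with a back-of-the-envelope estimate. The sketch below spreads test durations across parallel workers using greedy longest-processing-time-first (LPT) scheduling and searches for the smallest worker count that fits a time budget. The scheduling heuristic, the function names, and the 200-test example suite are assumptions for illustration; they do not describe Octomind’s actual scheduler.

```typescript
// Estimated wall-clock time ("makespan") when tests run on `workers` parallel
// browsers, using greedy longest-processing-time-first (LPT) scheduling.
function makespan(durationsSec: number[], workers: number): number {
  if (!Number.isInteger(workers) || workers < 1) {
    throw new Error("workers must be a positive integer");
  }
  const loads: number[] = new Array<number>(workers).fill(0);
  // Longest tests first, each assigned to the currently least-loaded worker.
  for (const d of [...durationsSec].sort((a, b) => b - a)) {
    let min = 0;
    for (let i = 1; i < workers; i++) {
      if (loads[i] < loads[min]) min = i;
    }
    loads[min] += d;
  }
  return Math.max(0, ...loads);
}

// Smallest worker count whose estimated makespan fits within the budget.
function workersNeeded(durationsSec: number[], budgetSec: number): number {
  if (durationsSec.length === 0) return 0;
  if (Math.max(...durationsSec) > budgetSec) {
    // No amount of parallelism helps if one test alone exceeds the budget.
    throw new Error("a single test is longer than the budget");
  }
  for (let w = 1; w < durationsSec.length; w++) {
    if (makespan(durationsSec, w) <= budgetSec) return w;
  }
  // One test per worker: wall time equals the longest single test.
  return durationsSec.length;
}

// Hypothetical suite: 200 tests of 60 s each, with a 20-minute budget.
const suite = new Array<number>(200).fill(60);
console.log(workersNeeded(suite, 20 * 60)); // → 10
```

The estimate also shows the limit: once every worker runs a single test, total runtime can never drop below the duration of the longest test.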

Flakiness

Test flakiness is the biggest problem with browser tests, and fighting it is an active research focus for us. Some of the strategies we follow are:

| Measure | Status |
|---|---|
| Active interaction timing to handle varying response times | Work in progress |
| Smart, learning-based retries to understand the level of flakiness | Open |
| AI-based analysis of unexpected circumstances, to handle temporary pop-ups, toasts, and similar elements that would otherwise break the test | Open |
| AI-based rediscovery in case of permanent user flow changes | Open |
| Coding rules and best practices for writing tests that avoid common pitfalls | Work in progress |
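Retries only help understand flakiness if their outcomes are recorded rather than silently absorbed. The sketch below shows the simplest, non-learning form of that idea: retry up to a fixed budget, classify each run as passed, flaky (passed only after a failure), or failed, and compute a flakiness rate over a run history. The function names and the fixed budget are assumptions; a learning-based strategy would adapt the budget from this history, which is not shown. Attempts are synchronous here for brevity, whereas real browser tests are asynchronous.

```typescript
type Verdict = "passed" | "flaky" | "failed";

interface RunResult {
  verdict: Verdict;
  attempts: boolean[]; // outcome of each attempt, in order
}

// Retry a test until it passes or the attempt budget is used up.
function runWithRetries(attempt: () => boolean, maxAttempts: number): RunResult {
  if (!Number.isInteger(maxAttempts) || maxAttempts < 1) {
    throw new Error("maxAttempts must be a positive integer");
  }
  const attempts: boolean[] = [];
  for (let i = 0; i < maxAttempts; i++) {
    const ok = attempt();
    attempts.push(ok);
    if (ok) break; // stop at the first success
  }
  const passedEventually = attempts[attempts.length - 1];
  let verdict: Verdict;
  if (!passedEventually) verdict = "failed";
  else if (attempts.length > 1) verdict = "flaky"; // needed at least one retry
  else verdict = "passed";
  return { verdict, attempts };
}

// Share of historical runs that only passed after a retry.
function flakinessRate(history: RunResult[]): number {
  if (history.length === 0) return 0;
  const flaky = history.filter((r) => r.verdict === "flaky").length;
  return flaky / history.length;
}

// Usage: a simulated test that fails on its first attempt only.
let calls = 0;
const result = runWithRetries(() => ++calls > 1, 3);
console.log(result.verdict, result.attempts);
```

Reporting "flaky" separately from "passed" keeps retries from hiding instability, which is what makes a flakiness level measurable in the first place.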