Why the AI agent fails

Octomind is building an AI agent that understands instructions provided in natural language. It navigates your web app to fulfill these instructions, behaving like a real user and finding its way through your app. You can compare it to the computer use research experiment from OpenAI that was presented recently, with the distinction that navigating the page is only the first step. We are also recording the actions and translating them into a step list that can be reproduced deterministically, quickly, and inexpensively for testing purposes. That being said, there are a few reasons why the AI agent might fail to fully automatically generate a test case from a description and subsequently ask for help. Let me explain.

The prompt

The prompt must be sufficiently precise and encompass the steps needed to achieve the goal, similar to instructions given to a manual tester. Occasionally, a brief description or even a single word (e.g., “Log out”) can suffice to compose comprehensive test cases, particularly if the application is designed intuitively. However, this is often not the case. If the prompt is too vague, deviates significantly, or requests interactions not present on the page, the agent will go wild. Learn more about this in prompt debugging.

Missing Agent features

The agent can compose test cases that require a common set of interactions and assertions. However, if something more exotic is required that the agent does not currently support, manual intervention will be necessary. We are constantly building new features to provide a more complete and automated experience. This is an up-to-date list of interaction and assertions types available to the Agent.

Low accessibility of the application

Our Agent looks at images and the DOM of the website and extracts relevant information. Sometimes our extraction mechanisms misses important information or the extracted website elements are not very informative (missing element descriptions, e.g. ARIA labels). In such situations, the Agent gets stuck and needs human intervention. You can significantly assist the Agent by using clear and descriptive names, descriptions, and labels for interactive elements. This practice will also improve your accessibility scores.

Timing and hydration issues

This is by far the trickiest issue on this list. A hydration issue can manifest in hundreds of different ways. Most commonly, text is wiped from an input field shortly after it has been inserted. More extreme cases involve login credentials being erased, preventing the agent from signing in. Hydration issues are rooted in the fact that part of the page is presented to the user while other parts are still loading. Once these other parts finally arrive, they cause nasty side effects, such as deleting text from input boxes. Real users are less likely to notice them since they are typically slow enough to not run into these issues, especially if they have a fast internet connection. This changes, of course, as soon as your infrastructure is less than optimal. We try to mitigate the problem as much as possible. However, depending on the severity of the problem, we might not be able to work around it. In that case, the agent will fail, and you will have to insert the right assertions to make sure the system waits long enough to overcome the issues.

Bot protection and rate limiting

When generating test cases our AI Agent starts by navigating and crawling the application under test using the Octomind bot. Pages with strict bot protection can block it. Here, you’ll find all necessary information you’ll need to allow the Agent to interact with your application. In some cases, the Agent triggers rate limits protecting your app from DDOS attacks. Most of the time it interacts with the website slightly faster and sends more requests than a human would when generating or running tests. If your application is protected with a rate limiter it is very likely that the Agent will fail. This is how you configure an exception for our IP addresses to avoid rate limiting.

How to help the Agent

As you can see, there are a few reasons why the AI agent might fail or cause frustration. You can it dramatically increase the chances of its success by:

Providing a good, comprehensive prompt
Avoid asking the Agent for interaction or assertion types it can’t identify yet
Improve accessibility of your application
Address hydration issues of your application
Grant the Agent an exception in your bot protection and rate limiter setup

Test creation

Test generation failure

Failed test runs

Test orchestration

Troubleshooting

The prompt

Missing Agent features

Low accessibility of the application

Timing and hydration issues

Bot protection and rate limiting

How to help the Agent

Test creation

Test generation failure

Failed test runs

Test orchestration

Troubleshooting

​The prompt

​Missing Agent features

​Low accessibility of the application

​Timing and hydration issues

​Bot protection and rate limiting

​How to help the Agent

The prompt

Missing Agent features

Low accessibility of the application

Timing and hydration issues

Bot protection and rate limiting

How to help the Agent