Pulsing buttons, shadow DOMs and GPT-4o

Octomind Product Update, May 30, 2024

Here I am to spill some beans on what happened at Octomind in the last 2 weeks. Let's start with what went well.

The Good

  • Canceling AI agent runs. We’ve all been there - a sudden mouse slip sends the AI agent on a lengthy generative frenzy. You can now cancel all agent and validation runs. 🛑
  • Seamless test steps editing. A work in progress, but we are getting better. Drag and drop for changing order. Add and delete steps more easily. 👣
  • Discovery state persistence. Don’t remember in which browser tab you started your AI discovery? Just use any of them. The state syncs over all of your tabs. 📑
  • Reduced test flakiness. We put the money where our (blog)mouth is. We reduced flakiness through improved timeout logic. ❄️  
  • The AI agent can now handle invisible form inputs, like password fields, checkboxes or radio buttons. 🔘
  • Never-ending animations and pulsing buttons are all the rage at the moment. We can handle most of them, and we’ve even proposed handling them as a feature for Playwright. ✴️
  • We moved away from LangChain! To what, you ask? There is a blog post in the works explaining everything. Cliffhanger 🧗
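For the curious, here is a rough idea of why pulsing buttons are tricky and one way to cope with them. Playwright waits for an element to keep the same bounding box across consecutive animation frames before it acts, and a button that pulses forever never gets there. The sketch below is not our actual implementation, just a minimal illustration: the `waitForStableBox` helper, the `Box` type and all option names are made up for this example. It polls a bounding box until it stops changing or a time budget runs out, and tells the caller which of the two happened so it can decide how to proceed.

```typescript
// Hypothetical helper, for illustration only (not Octomind's or Playwright's code).
type Box = { x: number; y: number; width: number; height: number };

interface StableOptions {
  timeoutMs: number;      // total time budget for waiting
  intervalMs: number;     // pause between two samples
  requiredStable: number; // identical consecutive samples needed
}

function sameBox(a: Box, b: Box): boolean {
  return a.x === b.x && a.y === b.y && a.width === b.width && a.height === b.height;
}

// Samples the box until it has been identical `requiredStable` times in a row,
// or until the deadline passes. It never throws on timeout; it reports
// `stable: false` together with the last box it saw.
async function waitForStableBox(
  sample: () => Promise<Box>,
  opts: StableOptions,
): Promise<{ stable: boolean; box: Box }> {
  const deadline = Date.now() + opts.timeoutMs;
  let last = await sample();
  let stableCount = 1;
  while (stableCount < opts.requiredStable && Date.now() < deadline) {
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
    const next = await sample();
    stableCount = sameBox(next, last) ? stableCount + 1 : 1;
    last = next;
  }
  return { stable: stableCount >= opts.requiredStable, box: last };
}
```

With a real page, `sample` could wrap something like Playwright's `locator.boundingBox()`. When the result comes back with `stable: false`, the caller knows the element is probably animating endlessly and can choose a fallback instead of failing the whole step.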

The Bad

Our AI agent often fails, and the AI part is not always the culprit. Often the agent simply doesn’t have all the information it needs because it doesn’t see all web elements. Stuff like shadow DOMs and deeply nested iframes is really tricky.
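To make the shadow DOM part more concrete: elements inside a shadow root are not children of their host in the regular DOM tree, so a plain walk over `children` never reaches them. Below is a minimal sketch of a walk that also descends into open shadow roots. It is an illustration under assumptions, not how our agent works. `NodeLike` and `collectElements` are names invented for this example, and the interface only mirrors the few DOM properties (`tagName`, `children`, `shadowRoot`) the walk needs.

```typescript
// Hypothetical minimal shape of a DOM element, for illustration only.
interface NodeLike {
  tagName: string;
  children: ArrayLike<NodeLike>;
  // In a browser, `shadowRoot` is null for closed shadow roots,
  // so those stay invisible to this kind of walk.
  shadowRoot?: { children: ArrayLike<NodeLike> } | null;
}

// Depth-first walk that visits regular children first and then
// the children of an open shadow root, if the element hosts one.
function collectElements(root: NodeLike): NodeLike[] {
  const found: NodeLike[] = [root];
  for (const child of Array.from(root.children)) {
    found.push(...collectElements(child));
  }
  if (root.shadowRoot) {
    for (const child of Array.from(root.shadowRoot.children)) {
      found.push(...collectElements(child));
    }
  }
  return found;
}

// Usage: the <input> lives only inside the shadow root of <my-field>.
const input: NodeLike = { tagName: "INPUT", children: [] };
const field: NodeLike = { tagName: "MY-FIELD", children: [], shadowRoot: { children: [input] } };
const body: NodeLike = { tagName: "BODY", children: [field] };

console.log(collectElements(body).map((n) => n.tagName).join(","));
// prints "BODY,MY-FIELD,INPUT"
```

Without the `shadowRoot` branch the walk would stop at `MY-FIELD` and the input would be missing from whatever the agent gets to see. Nested iframes are a similar story, except that each frame brings its own separate document.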

It dawned on us that many edge cases are so complex that they can only be handled through precise multimodality - combining screenshot and public DOM analysis. And that requires much faster iterations on the edge cases we encounter. We hope to get better at that after dropping LangChain.

The Complicated

Not all LLM progress is good news for us. We had high hopes for the new GPT-4o, but our experiments showed mixed results over the last 2 weeks. It is fast, but tends to loop and make more mistakes when reasoning over larger context (e.g. DOM code). It performs worse in complex tasks like our test discovery.

We are sticking with the slower GPT-turbo for all the complex stuff. We'll use GPT-4o where it shines - in precise settings with well-defined guardrails, like proposing the next test steps, to speed up the process.

The Highlight

We've amped up the agent's output:

We gave the AI assertions some love. We made them faster and more consistent.

AI discovery serves more meaningful test cases. The discovered tests are now:

  • More functional and less reachability-focused. You’ll get less of the “this link works”.
  • More distinct. The AI agent will suggest fewer functionally similar tests.

All these improvements are based on our own benchmarking data. Try it yourself and throw a new edge case in the AI agent's way 😊


Thanks for all the feedback on my last update. Happy you liked it. As for the improvement suggestions, please keep them coming.

Daniel Rödler
Co-founder and CTO/CPO at Octomind