OpenAI rolls out Operator, an AI agent for autonomous web-based tasks
Using its built-in browser, Operator can navigate websites, interact with content by typing, clicking, and scrolling, and execute tasks based on user instructions.
ChatGPT-maker OpenAI has introduced a research preview of Operator, an AI agent that can independently perform tasks on the web.
Still in its early stages, Operator represents the Sam Altman-led firm’s first step towards creating AI agents capable of independently managing tasks.
“Today we’re releasing Operator, an agent that can go to the web to perform tasks for you. It is currently a research preview, meaning it has limitations and will evolve based on user feedback. Operator is one of our first agents, which are AIs capable of doing work for you independently—you give it a task and it will execute it,” read OpenAI’s blog.
The feature is currently available to users in the United States on the $200-a-month Pro subscription plan. OpenAI plans to extend access to users on its Plus, Team, and Enterprise tiers.
Operator is built on Computer-Using Agent (CUA), a new model that combines GPT-4o's vision capabilities with advanced reasoning skills developed through reinforcement learning. Reinforcement learning is a machine learning technique that teaches software how to make decisions to achieve the best possible outcome.
CUA is specifically trained to interact with graphical user interfaces (GUIs), such as buttons, menus, and text fields, imitating how a user would engage with on-screen elements.
It can 'see' (through screenshots) and 'interact' (using all the actions a mouse and keyboard allow) with a browser, enabling it to take action on the web without requiring custom API integrations.
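The screenshot-and-action cycle described above can be pictured as a simple loop. The following is a minimal sketch in Python, not OpenAI's implementation: `Action`, `FakeBrowser`, `run_agent`, and `scripted_policy` are hypothetical names invented for illustration, and the scripted policy stands in for the CUA model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One GUI action; kind is 'click', 'type', 'scroll', or 'done' (hypothetical)."""
    kind: str
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeBrowser:
    """Stand-in for a real browser: records actions instead of rendering pages."""
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> bytes:
        # A real browser would return image pixels; here we encode the step count.
        return f"frame-{len(self.log)}".encode()

    def apply(self, action: Action) -> None:
        self.log.append(action)


def run_agent(browser: FakeBrowser,
              policy: Callable[[str, bytes], Action],
              task: str,
              max_steps: int = 10) -> bool:
    """Look at the screen, pick an action, perform it; True if the policy says 'done'."""
    for _ in range(max_steps):
        frame = browser.screenshot()
        action = policy(task, frame)
        if action.kind == "done":
            return True
        browser.apply(action)
    return False  # step budget exhausted without finishing


def scripted_policy(task: str, frame: bytes) -> Action:
    """Toy policy standing in for the model: click, type the task, then stop."""
    step = int(frame.decode().split("-")[1])
    script = [Action("click", x=120, y=40), Action("type", text=task)]
    return script[step] if step < len(script) else Action("done")


browser = FakeBrowser()
finished = run_agent(browser, scripted_policy, "book a campsite")
print(finished, [a.kind for a in browser.log])
```

Because the loop only needs a screenshot and a way to send mouse and keyboard input, nothing in it depends on a site-specific API, which is the point the company makes about CUA.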
“To get started, simply describe the task you’d like done and Operator can handle the rest. Users can choose to take over control of the remote browser at any point, and Operator is trained to proactively ask the user to take over for tasks that require login, payment details, or when solving CAPTCHAs,” said the company.
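The handover rule in that statement can be sketched as a small check. This is an illustrative Python sketch only: the `Step` enum, the `USER_ONLY` set, and `who_acts` are hypothetical names, and the article does not describe how Operator actually detects these situations.

```python
from enum import Enum


class Step(Enum):
    BROWSE = "browse"
    LOGIN = "login"
    PAYMENT = "payment"
    CAPTCHA = "captcha"


# Step types the company says Operator asks the user to handle.
USER_ONLY = {Step.LOGIN, Step.PAYMENT, Step.CAPTCHA}


def who_acts(step: Step, user_has_taken_over: bool = False) -> str:
    """Return 'user' or 'agent' for the given step (assumed policy)."""
    # The user can take control at any point, and always handles sensitive steps.
    if user_has_taken_over or step in USER_ONLY:
        return "user"
    return "agent"


for step in (Step.BROWSE, Step.LOGIN, Step.PAYMENT):
    print(step.value, "->", who_acts(step))
```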
Operator allows users to customise their workflows by adding instructions for specific sites or all websites.
For instance, users can set preferred airlines on Booking.com. Users can also save prompts on the homepage for quick access, which streamlines repeated tasks such as restocking groceries on Instacart.
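One way to picture instructions that apply to specific sites or to all websites is a lookup keyed by hostname. The Python sketch below is an assumption made for illustration: the `rules` layout, the `"*"` wildcard key, the sample instruction text, and `instructions_for` are all hypothetical and do not reflect how Operator stores user preferences.

```python
from typing import Dict, List
from urllib.parse import urlparse

ALL_SITES = "*"  # hypothetical key for instructions that apply everywhere


def instructions_for(url: str, rules: Dict[str, List[str]]) -> List[str]:
    """Return global instructions followed by any for the URL's site."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.example.com and example.com alike
    return rules.get(ALL_SITES, []) + rules.get(host, [])


# Sample rules; the instruction wording is made up for the example.
rules = {
    "*": ["Ask before submitting any form."],
    "booking.com": ["Prefer my usual airline."],
}

print(instructions_for("https://www.booking.com/flights", rules))
print(instructions_for("https://www.etsy.com/", rules))
```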
Additionally, Operator supports multitasking. Users can start multiple conversations to handle simultaneous tasks, such as ordering a personalised enamel mug on Etsy while booking a campsite on Hipcamp.
Operator is designed to handle challenges and mistakes with its reasoning capabilities, allowing it to self-correct in real time. If it encounters a task it cannot complete, it hands control back to the user.
The CUA model has set new state-of-the-art results on WebArena and WebVoyager, two key benchmarks for browser-based task performance, the company said.
Edited by Swetha Kannan