Getting Started with “Computer Use”: Anthropic’s new AI Capability


Setting Up Your System and Hands-On with Anthropic’s “Computer Use”

Image Created by DALL-E

Introduction

Anthropic recently announced an upgraded Claude 3.5 Sonnet alongside a new model, Claude 3.5 Haiku. While Sonnet boasts significant improvements in coding, Haiku focuses on efficiency. Perhaps the most intriguing development, however, is the introduction of a “computer use” capability. Currently in beta, this feature allows Claude to interact with computers in a way that mimics human behavior — using a mouse, clicking, and typing. In this article, we’ll take a closer look at this capability.

Diving into “Computer Use”

With this new “computer use” capability, Claude can now interact with computers just as a person would. Imagine Claude sitting in front of a screen, moving the mouse, clicking on icons, and typing on the keyboard — that’s essentially what it can do now. It’s like having a digital assistant who can navigate and perform actions in the digital world on your behalf.
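Concretely, the model drives the screen by emitting small, structured actions through an Anthropic-defined “computer” tool. As an illustration (a sketch based on the beta’s documented action set; exact payloads may vary), a few such actions look like this:

# Illustrative tool-use inputs the model emits for the "computer" tool.
# In this beta, clicks happen at the current cursor position, so the model
# first moves the mouse and then clicks.
example_actions = [
    {"action": "screenshot"},                            # capture the screen
    {"action": "mouse_move", "coordinate": [640, 360]},  # move the cursor
    {"action": "left_click"},                            # click at the cursor
    {"action": "type", "text": "Generative AI trends"},  # type a string
]

Each action is executed on the machine, a fresh screenshot is captured, and the result is fed back to the model so it can decide on its next step.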

How to Run “Computer Use” Locally

Anthropic also released a reference implementation, which contains:

  1. A web interface
  2. A Docker container running Ubuntu, with a number of apps pre-installed
  3. Example tool implementations
  4. The agent loop

You can get started with this implementation using this repo; in essence, it takes just two steps:

  1. Get an Anthropic API key from the Anthropic Console.
  2. Run the following commands in a terminal:
export ANTHROPIC_API_KEY=%your_api_key%
docker run \
    -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
    -v $HOME/.anthropic:/home/computeruse/.anthropic \
    -p 5900:5900 \
    -p 8501:8501 \
    -p 6080:6080 \
    -p 8080:8080 \
    -it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

Once the image is downloaded and the Docker container starts, the demo can be accessed on port 8080, which serves the combined interface (ports 8501, 6080, and 5900 expose the Streamlit chat, the browser-based desktop view, and a direct VNC connection individually). It appears as shown below:

How Does It Work?

Here’s how it works: screenshots are sent to the model, and it responds with tool-use actions (mouse movements, clicks, and keystrokes) that let it operate elements on the screen. It appears to have precise and accurate knowledge of the screen elements; in my experiments, I didn’t find it missing elements or clicking in random places.

In the screenshot above, I asked it to find trending topics related to Generative AI.
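To make this loop concrete, here is a minimal sketch using the Anthropic Python SDK (simplified from the reference implementation; execute_action and take_screenshot are hypothetical helpers standing in for the demo’s tool code, and the model and beta identifiers are those of the October 2024 beta):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
messages = [{"role": "user", "content": "Find trending Generative AI topics"}]

while True:
    response = client.beta.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=[{
            "type": "computer_20241022",  # Anthropic-defined computer tool
            "name": "computer",
            "display_width_px": 1024,
            "display_height_px": 768,
        }],
        betas=["computer-use-2024-10-22"],  # opt into the beta
        messages=messages,
    )
    messages.append({"role": "assistant", "content": response.content})

    tool_uses = [b for b in response.content if b.type == "tool_use"]
    if not tool_uses:
        break  # the model answered in plain text; the task is finished

    results = []
    for tool_use in tool_uses:
        execute_action(tool_use.input)  # e.g. move the mouse, click, type
        screenshot = take_screenshot()  # base64-encoded PNG of the screen
        results.append({
            "type": "tool_result",
            "tool_use_id": tool_use.id,
            "content": [{
                "type": "image",
                "source": {"type": "base64",
                           "media_type": "image/png",
                           "data": screenshot},
            }],
        })
    messages.append({"role": "user", "content": results})

The loop simply alternates between asking the model what to do next and showing it the updated screen; all of the intelligence lives on the model side.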

Experimental Insights and Observations

I conducted three experiments: 

  1. Find the trending Generative AI topics → This was my initial request, and it was able to fulfill it. Once I observed it moving in the right direction, I decided to abandon this task to attempt slightly more complex (and riskier) tasks.

2. Add a Trustpilot review → This one is interesting. I asked it to submit a Trustpilot review for Scotiabank with a specific text and a 4-star rating. Initially, it bailed out as soon as it encountered a login screen, so I updated the prompt to close the login screen and submit the review as a guest. During the process, I noticed that due to UI lag, it didn’t realize that Scotiabank already existed, so it started creating a new company named Scotiabank, filling in all the details, and then proceeded with adding a review for this new company. This might seem like a trivial example, but similar mistakes could have unwanted (or dangerous) implications when we leave decisions entirely to AI. This raises the question: can we design AI systems that are transparent and accountable in their decision-making processes?

Prompt for Trustpilot (Image by Author)
Claude Added a new Company (Image by Author)

Can we design AI systems that are transparent and accountable in their decision-making processes?

3. Find mechanical keyboards on Amazon and add the information to an Excel sheet → For this task, I wanted to observe how it handles interactions across multiple tools. It started well but eventually took an interesting (and potentially risky) turn. At one point, a CAPTCHA appeared on the screen, and it asked me to solve it, which made sense, as the website was verifying whether a human was performing the action. However, when a second CAPTCHA appeared, the model was able to solve it independently. This raises a concerning question: will CAPTCHA verification become unreliable if this technology becomes mainstream?

Finally, it was able to finish the task.

Asking the human to fill in the CAPTCHA (Image by Author)
Claude Filled the CAPTCHA (Image by Author)
Finished the task (Image by Author)

Will CAPTCHA verification become unreliable if this technology becomes mainstream?

Future Scope

The potential applications for this capability are vast. Here are a few that come to mind:

1. UI Testing → Although UI testing frameworks have been around for a while, they rely on conditions defined by human testers. It would be fascinating to see apps fully tested by an AI acting like a human. Who knows? We might even come up with a new term for AI-driven “Monkey Testing” — perhaps “FunkyBot Testing”?

2. Web Scraping → Imagine no longer needing to update your scraping scripts when a website’s structure changes. With this advanced AI, real-time improvisation could make scraping more adaptable and maintenance-free.

3. Anything a SaaS Provides → This could encompass any task requiring mouse and keyboard use, especially when scripting isn’t an option. Potential applications could include creating BI reports using tools like Tableau or Power BI, or even using creative tools like Adobe Photoshop for image editing.

Security Considerations

A few potential security implications include:

1. Unauthorized Access → I’m not sure if it’s currently enabled, but if Claude could use stored logins and passwords to access systems, there’s a risk of credential misuse. For example, if malicious instructions were given, Claude might log into accounts or post on websites on my behalf, potentially wreaking havoc by accessing sensitive areas or taking unauthorized actions.

2. Data Leakage → When interacting with computer systems, some data could inadvertently be shared, leading to potential leaks of sensitive information.

3. Over-Reliance → This is a common concern with new technology, but it feels more prominent here, as the AI acts on your behalf. If it performs a task that wasn’t your intent, the repercussions could be significant.

Ethical Concerns

Being a GenAI product, it brings the familiar issues of job displacement, bias, and discrimination. However, additional concerns are particularly prominent with this capability:

1. Misinformation and Manipulation → If this technology expands unchecked and gains the ability to access the internet independently, there’s a risk that bad actors could exploit it to spread misinformation or manipulate individuals. By creating fake accounts or generating misleading content, the AI could be used to distort reality on a large scale.

2. Autonomous Decision-Making → As the AI becomes more capable, there’s a growing concern about delegating decision-making authority to it, especially in sensitive contexts. For example, in my experiments, it autonomously decided to create a new company profile on Trustpilot instead of attempting to search again. Such actions underscore the potential challenges in ensuring the AI makes ethically and contextually appropriate decisions.

Pricing

Anthropic states in its documentation that computer use is billed at the underlying model’s (Claude 3.5 Sonnet) rates: $3 per million input tokens and $15 per million output tokens (Source). On top of that, Anthropic-defined tools add input tokens as per the table below:

Tool Use Tokens (Image by Author)

In reality, this translates to (drum roll!!) a substantial number of input tokens, which in turn means significant costs. For approximately 5–6 requests, I was charged as follows:

Apparently, sending screenshots adds a large number of input tokens, which spikes the cost.
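As a rough back-of-envelope estimate (assuming Anthropic’s published image-token approximation of width × height ÷ 750, one 1024×768 screenshot per agent turn, and the full history re-sent on every turn, as in a naive loop):

WIDTH, HEIGHT = 1024, 768
tokens_per_screenshot = WIDTH * HEIGHT // 750  # ~1,048 tokens per image
turns = 20                                     # a typical multi-step task
# The conversation history grows each turn, so earlier screenshots
# are billed again on every subsequent request:
total_input = sum(t * tokens_per_screenshot for t in range(1, turns + 1))
cost = total_input / 1_000_000 * 3.00          # $3 per million input tokens
print(f"~{total_input:,} input tokens -> ~${cost:.2f} from screenshots alone")

Twenty turns already burn through roughly 220,000 input tokens before counting any text, which is why even a handful of requests shows up noticeably on the bill.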

Conclusion

This technology is incredible and holds immense potential, yet I believe there needs to be oversight in its application. It’s thrilling to see how Anthropic is pushing the boundaries with Claude 3.5 and groundbreaking features like “computer use.” As this technology develops, it will be fascinating to observe its impact across industries and applications. I’m especially interested in exploring its capabilities for tasks like autofilling CAPTCHAs, which could have major implications for bot detection and online security.

However, it’s essential to proceed with caution and address the potential security and ethical concerns to ensure its responsible use. We’re stepping into a brave new world, and with thoughtful oversight, we can harness the power of AI like Claude to build a more efficient, accessible, and secure future.

What are your thoughts on this new technology? How do you see it impacting industries, workflows, or even daily life? What would you most like to use it for? Share your use cases, predictions, and concerns in the comments below — I’d love to hear your insights!


🌟 Stay Connected! 🌟

I love sharing ideas and stories here, but the conversation doesn’t have to end when the last paragraph does. Let’s keep it going!

🔹Website : https://madhavarora.net

🔹 LinkedIn for professional insights and networking: https://www.linkedin.com/in/madhav-arora-0730a718/

🔹 Twitter for daily thoughts and interactions: https://twitter.com/MadhavAror

🔹 YouTube for engaging videos and deeper dives into topics: https://www.youtube.com/@aidiscoverylab

Got questions or want to say hello? Feel free to reach out to me at madhavarorabusiness@gmail.com. I’m always open to discussions, opportunities, or just a friendly chat. Let’s make the digital world a little more connected!
