Computer Use & Automation

Let your agent see your screen, click buttons, type text, and automate desktop workflows.

What is Computer Use?

Computer use lets your AI agent interact with your Mac like a human would — taking screenshots, seeing what's on screen, clicking buttons, and typing text.

This unlocks powerful automation workflows:

• Fill out forms: Your agent completes repetitive web forms automatically
• Test applications: Automated UI testing with visual verification
• Guided tours: Walk users through software step by step
• Monitor dashboards: Check metrics, detect issues, send alerts
• Control legacy apps: Automate software that has no API

✨

Visual Understanding

Your agent doesn't just click coordinates. It sees and interprets what's on screen using Claude's vision capabilities, so it can usually adapt when a UI changes.

How It Works

When you ask your agent to interact with the desktop, here's what happens:

Screenshot

Your agent captures a screenshot of your screen or a specific window.

Visual Analysis

Claude analyzes the image — identifying buttons, text fields, menus, and UI elements.

Decision

Your agent decides what action to take: click a button, type text, scroll, etc.

Execution

Pinchr sends the mouse/keyboard command to macOS to perform the action.

Verification

Your agent takes another screenshot to verify the action succeeded, then continues.

This loop repeats until the task is complete. Your agent narrates each step so you know what's happening.
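
The loop above can be sketched in a few lines of Python. This is only an illustration of the screenshot, decide, execute, verify cycle, not Pinchr's actual implementation: `Action`, `run_task`, and the three callables (`capture`, `decide`, `execute`) are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One step the agent wants to perform (illustrative fields, not a real Pinchr type)."""
    kind: str       # e.g. "click", "type", "scroll"
    target: str     # human-readable description of the UI element
    text: str = ""  # text to type, if any


def run_task(
    capture: Callable[[], bytes],
    decide: Callable[[bytes], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat screenshot -> decide -> execute until decide() returns None."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()        # Screenshot
        action = decide(screenshot)   # Visual analysis + decision
        if action is None:            # nothing left to do: task complete
            break
        print(f"Next: {action.kind} {action.target}")  # narration
        execute(action)               # Execution
        history.append(action)
        # Verification: the next iteration's screenshot shows the result
    return history
```

The `max_steps` cap is a design choice for the sketch: it keeps a confused agent from looping forever when the screen never reaches the expected state.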

Setting Up Permissions on macOS

To enable computer use, Pinchr needs two macOS permissions:

👁️

Screen Recording

Lets Pinchr capture screenshots. Everything stays local — screenshots are never uploaded unless you explicitly share them.

🖱️

Accessibility

Lets Pinchr control mouse and keyboard input to click buttons and type text.

If you skipped these during onboarding, you can grant them anytime:

1. Open Pinchr → Settings → Security
2. Click Request Permissions
3. macOS will prompt you to allow Screen Recording and Accessibility
4. Grant both permissions, then restart Pinchr

💡

You can revoke these permissions anytime from System Settings → Privacy & Security. Your agent will notify you if it needs these permissions to complete a task.

Common Use Cases

Here are some examples of what you can automate with computer use:

📝

Form Filling

"Fill out this expense report with data from my receipts folder."

Your agent opens the form, reads the receipt files, enters amounts, dates, and categories, then submits the report.

🧪

Automated Testing

"Test the checkout flow in our web app and verify the confirmation page appears."

Your agent navigates the UI, adds items to the cart, completes checkout, and visually verifies that the confirmation page appears.

📊

Dashboard Monitoring

"Check our analytics dashboard every hour and alert me if traffic drops below 1000 visitors."

Your agent takes a screenshot, reads the metrics visually, and sends you a Slack alert if needed.
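
The threshold check in this example comes down to reading a number off the screen and comparing it. Here is a minimal sketch, assuming the agent has already turned the screenshot into text; `parse_visitor_count` and `should_alert` are hypothetical helpers, not part of Pinchr.

```python
import re
from typing import Optional


def parse_visitor_count(screen_text: str) -> Optional[int]:
    """Return the first integer found in the text, allowing thousands separators."""
    match = re.search(r"\d[\d,]*", screen_text)
    if match is None:
        return None  # no number was readable
    return int(match.group().replace(",", ""))


def should_alert(screen_text: str, threshold: int = 1000) -> bool:
    """True only when a count was read and it is below the threshold."""
    count = parse_visitor_count(screen_text)
    return count is not None and count < threshold
```

In this sketch an unreadable dashboard does not trigger an alert. A stricter setup might treat "could not read the metric" as its own warning.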

🎓

Guided Tours

"Walk me through setting up a new Slack workspace."

Your agent highlights UI elements, explains each step, and can even perform actions for you.

🔧

Legacy Software Automation

"Export all customer records from the old CRM to CSV."

Your agent navigates the legacy app's UI, clicks through its menus, and exports the data. No API is required.

Safety & Control

Pinchr includes safeguards to prevent unintended actions:

• Visible narration: Your agent describes each action before taking it
• Pause anytime: Press ⌘. to pause automation instantly
• Audit log: Every screenshot and action is logged in Settings → Security
• Scoped windows: Restrict automation to specific apps (e.g., only Chrome, not Finder)
• Kill switch: Disable computer use entirely from Settings if needed
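
To make the scoped-windows and kill-switch ideas concrete, here is a small sketch of the kind of check that could run before each action. `AutomationPolicy` and its fields are hypothetical names for illustration only. Pinchr's real settings are managed in the app, and their internal format is not documented here.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class AutomationPolicy:
    """Illustrative policy: a global on/off switch plus an app allow-list."""
    enabled: bool = True  # False models the kill switch
    # Assumption for this sketch: an empty set means "no app restriction"
    allowed_apps: Set[str] = field(default_factory=set)

    def permits(self, app_name: str) -> bool:
        if not self.enabled:
            return False
        if not self.allowed_apps:
            return True
        return app_name in self.allowed_apps


policy = AutomationPolicy(allowed_apps={"Google Chrome"})
print(policy.permits("Google Chrome"))  # allowed app
print(policy.permits("Finder"))         # outside the allow-list
```

Checking the kill switch first means a disabled policy rejects every action, regardless of the allow-list.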

⚠️

Watch the first run. The first time you try computer use, keep an eye on what your agent is doing. Once you're comfortable, you can let it run unattended.

Current Limitations

Computer use is powerful but has some limitations:

• Slower than APIs: Visual automation takes 3 to 10 seconds per action, versus near-instant API calls
• Requires visual stability: Flashing animations or rapidly changing UIs can confuse the agent
• Mouse precision: Tiny UI elements (small icons, dropdowns) may require multiple attempts
• macOS only: Windows and Linux support is planned but not yet available

For tasks with API access, use MCP servers instead — they're faster and more reliable. Computer use shines when no API exists or visual verification is needed.
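
To get a feel for the speed trade-off, you can estimate how long a workflow will take from the 3 to 10 seconds per action quoted above. The helper below is a back-of-the-envelope sketch. `estimate_duration_seconds` is a hypothetical name, and real timings will vary with the app and the task.

```python
from typing import Tuple


def estimate_duration_seconds(
    actions: int, low: float = 3.0, high: float = 10.0
) -> Tuple[float, float]:
    """Return (best_case, worst_case) total seconds for `actions` visual actions."""
    if actions < 0:
        raise ValueError("actions must be non-negative")
    return actions * low, actions * high


best, worst = estimate_duration_seconds(40)
print(f"40 actions: about {best / 60:.0f} to {worst / 60:.0f} minutes")
```

By this estimate a 40-action workflow takes roughly 2 to 7 minutes, which is fine for an occasional export but a poor fit for anything an API could do in one call.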

Advanced: Recording & Scripting

You can record computer use workflows and replay them as scripts:

1. Ask your agent: "Record this workflow"
2. Perform the actions you want to automate
3. Your agent saves the sequence as a reusable script
4. Later: "Run the expense report workflow"

Scripts are stored in ~/.pinchr/workflows/ as JSON. You can edit them manually for fine-tuning.
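
Before editing a script by hand, it can help to confirm that every file still parses as valid JSON. The sketch below does only that. It makes no assumptions about the fields inside a workflow file, since the schema isn't described here, and `load_workflows` is a hypothetical helper rather than a Pinchr API.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_workflows(directory: Path) -> Dict[str, Any]:
    """Parse every *.json file in `directory`, keyed by file name without extension."""
    workflows: Dict[str, Any] = {}
    if not directory.is_dir():
        return workflows  # nothing recorded yet, or wrong path
    for path in sorted(directory.glob("*.json")):
        try:
            workflows[path.stem] = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            # Report the broken file instead of failing the whole scan
            print(f"Skipping {path.name}: invalid JSON ({err})")
    return workflows
```

Calling `load_workflows(Path.home() / ".pinchr" / "workflows")` returns the parsed scripts and prints a message for any file that a manual edit has left malformed.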