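The Complete Guide to the OpenClaw Browser Tool: Browser Automation from Beginner to Expert

OpenClaw's Browser tool is a browser automation and control tool built on Playwright and CDP (Chrome DevTools Protocol). It lets you drive a browser through natural language or code to handle web page interaction, data collection, automated testing, and similar tasks.

This article walks through how the Browser tool works, its API reference, and best practices to help you get started quickly.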

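Core Capabilities

The Browser tool supports the following core capabilities:

🖥️ Multiple browser instances - supports multiple independent browser profiles
🎯 Precise element targeting - locates elements through AI snapshots, the ARIA tree, and CSS selectors
⚡ A broad set of actions - click, type, navigate, screenshot, file upload/download, and more
🔌 Multiple connection modes - locally managed browser, the user's browser, Chrome extension relay, and remote CDP
🌐 Node proxy support - can run a browser proxy on a remote node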

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Agent / LLM                              │
│                  (calls the browser tool)                   │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                  Gateway / Server                            │
│   - Parses tool parameters                                  │
│   - Routes to the browser instance (profile/target/node)    │
│   - Calls the browser.request RPC                           │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│              Browser Control Server                          │
│   - HTTP API: /act, /snapshot, /navigate, /tabs, ...        │
│   - Manages the browser lifecycle (start/stop/profiles)     │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                  Playwright + CDP                            │
│   - Controls Chrome/Chromium over CDP                       │
│   - Performs snapshots, interactions, navigation, etc.      │
└─────────────────────────────────────────────────────────────┘

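Workflow:

1. The Agent calls the browser tool.
2. The Gateway parses the parameters and routes the call to the right browser instance.
3. The Browser Control Server performs the operation through Playwright + CDP.
4. The result is returned to the Agent.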
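
The four workflow steps can be modeled as a small pipeline. The sketch below is purely illustrative: `ToolCall`, `FakeControlServer`, and `route` are hypothetical names rather than OpenClaw classes, and the real Gateway talks to the Browser Control Server over RPC/HTTP instead of calling Python objects directly.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A browser tool call as the Agent might emit it (hypothetical shape)."""
    action: str
    profile: str = "openclaw"
    params: dict = field(default_factory=dict)


class FakeControlServer:
    """Stand-in for a Browser Control Server bound to one profile."""

    def __init__(self, profile: str):
        self.profile = profile

    def handle(self, call: ToolCall) -> dict:
        # A real server would drive Playwright + CDP here.
        return {"ok": True, "profile": self.profile, "action": call.action}


def route(call: ToolCall, servers: dict) -> dict:
    """Gateway step: pick the server for the requested profile and forward."""
    try:
        server = servers[call.profile]
    except KeyError:
        return {"ok": False, "error": f"unknown profile: {call.profile}"}
    return server.handle(call)


servers = {name: FakeControlServer(name) for name in ("openclaw", "user")}
result = route(ToolCall(action="navigate", params={"url": "https://example.com"}), servers)
print(result)
```

The point of the sketch is the separation of concerns: the caller only names a profile, and the routing layer decides which browser instance receives the request.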

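Supported Action Types

Management Actions

Action	Description
`status`	Show the browser status
`start`	Start the browser
`stop`	Stop the browser
`profiles`	List all profiles
`tabs`	List all tabs
`open`	Open a new tab
`focus`	Focus a specific tab
`close`	Close a tab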

Interaction Actions (act)

kind	Description	Key parameters
`click`	Click an element	`ref`, `doubleClick`, `button`, `modifiers`
`type`	Type text	`ref`, `text`, `submit`, `slowly`
`press`	Press a key	`key`
`hover`	Hover over an element	`ref`
`drag`	Drag and drop	`startRef`, `endRef`
`select`	Select options	`ref`, `values`
`fill`	Fill in a form	`fields`
`resize`	Resize the viewport	`width`, `height`
`wait`	Wait for a condition	`timeMs`, `selector`, `text`, `url`, `loadState`, `fn`
`evaluate`	Run JS	`fn`, `ref`
`close`	Close the page	-
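
To catch typos in `kind` or parameter names before a request is sent, the table can be turned into a small lookup. This is a minimal sketch, not part of OpenClaw: `build_act` is a hypothetical helper, and `COMMON_PARAMS` assumes that `targetId` and `timeoutMs` are accepted by every kind because they appear in this article's examples even though the table lists only key parameters.

```python
# Parameter names per kind, copied from the table above.
ACT_PARAMS = {
    "click": {"ref", "doubleClick", "button", "modifiers"},
    "type": {"ref", "text", "submit", "slowly"},
    "press": {"key"},
    "hover": {"ref"},
    "drag": {"startRef", "endRef"},
    "select": {"ref", "values"},
    "fill": {"fields"},
    "resize": {"width", "height"},
    "wait": {"timeMs", "selector", "text", "url", "loadState", "fn"},
    "evaluate": {"fn", "ref"},
    "close": set(),
}

# Assumption: these appear in the article's examples but not in the table.
COMMON_PARAMS = {"targetId", "timeoutMs"}


def build_act(kind: str, profile: str = "openclaw", **params) -> dict:
    """Build an act request dict, rejecting unknown kinds and parameter names."""
    if kind not in ACT_PARAMS:
        raise ValueError(f"unknown act kind: {kind!r}")
    unknown = set(params) - ACT_PARAMS[kind] - COMMON_PARAMS
    if unknown:
        raise ValueError(f"unexpected parameters for {kind!r}: {sorted(unknown)}")
    return {"tool": "browser", "action": "act", "kind": kind, "profile": profile, **params}


request = build_act("click", ref="e12", targetId="T1")
print(request)
```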

Observation Actions

Action	Description
`snapshot`	Capture a page snapshot (AI/ARIA format)
`screenshot`	Take a screenshot
`console`	Get console messages
`pdf`	Save as PDF

File Actions

Action	Description
`upload`	Arm a file upload
`waitfordownload`	Wait for a download to finish
`download`	Click to download and save
`dialog`	Handle a dialog

Quick Start

1. Launch the Browser and Navigate

# Start the browser
- tool: browser
  action: start
  profile: openclaw

# Navigate to a page
- tool: browser
  action: navigate
  url: https://example.com
  profile: openclaw

2. Capture a Snapshot and Click

# Capture an AI-format snapshot (recommended)
- tool: browser
  action: snapshot
  snapshotFormat: ai
  refs: aria
  profile: openclaw

# Suppose the snapshot returns ref="e12" for the login button
- tool: browser
  action: act
  kind: click
  ref: e12
  targetId: <taken from the snapshot response>
  profile: openclaw

3. Fill In a Form and Submit

# Fill several fields at once with fill
- tool: browser
  action: act
  kind: fill
  fields:
    - ref: "e5"
      value: "username"
    - ref: "e8"
      value: "password123"
  profile: openclaw

# Click the submit button
- tool: browser
  action: act
  kind: click
  ref: "e15"
  profile: openclaw

4. Wait and Take a Screenshot

# Wait for the page to finish loading
- tool: browser
  action: act
  kind: wait
  loadState: networkidle
  timeoutMs: 30000
  profile: openclaw

# Full-page screenshot
- tool: browser
  action: screenshot
  fullPage: true
  type: png
  profile: openclaw

Advanced Usage

Using the Chrome Extension Relay (User Browser)

# Use the user's already logged-in browser
- tool: browser
  action: snapshot
  profile: user
  snapshotFormat: ai

# Or use Chrome extension relay mode
- tool: browser
  action: act
  kind: click
  ref: e20
  profile: chrome-relay
  targetId: <obtained from tabs>

⚠️ Note: when using chrome-relay, the user must click the Chrome extension's toolbar icon and make sure the badge shows ON.

Managing Multiple Tabs

# List all tabs
- tool: browser
  action: tabs
  profile: openclaw

# Open a new tab
- tool: browser
  action: open
  url: https://another-site.com
  profile: openclaw

# Switch to a specific tab
- tool: browser
  action: focus
  targetId: <targetId from tabs>
  profile: openclaw

File Upload and Download

# Prepare a file upload (arm the file chooser first)
- tool: browser
  action: upload
  paths:
    - /tmp/openclaw/uploads/document.pdf
  ref: e10  # ref of the file chooser
  profile: openclaw

# Wait for the download and save it
- tool: browser
  action: waitfordownload
  path: report.pdf
  profile: openclaw

Handling Dialogs

# Arm the dialog handler (run this before the action that triggers the dialog)
- tool: browser
  action: dialog
  accept: true
  # or dismiss: true
  # or promptText: "text to enter"
  profile: openclaw

# Then run the action that triggers the dialog
- tool: browser
  action: act
  kind: click
  ref: e25
  profile: openclaw

Running Custom JS

# Get an element's text content
- tool: browser
  action: act
  kind: evaluate
  fn: "(el) => el.textContent"
  ref: e10
  profile: openclaw

# Get the page title
- tool: browser
  action: act
  kind: evaluate
  fn: "() => document.title"
  profile: openclaw

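Profile Selection Guide

Profile	When to use
`openclaw` (default)	Isolated, managed browser; suited to automation
`user`	Uses the browser the user is already logged in to; requires user confirmation
`chrome-relay`	Chrome extension relay mode; requires the user to click the toolbar icon
Custom profile	Remote CDP or special configurations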

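Best Practices

1. Snapshot Strategy

Scenario	Recommended configuration
General UI automation	`snapshotFormat: ai`, `refs: aria`
Complex pages	`mode: efficient`, `limit: 200`
When a visual reference is needed	`labels: true` (returns an annotated screenshot)
Accessibility testing	`snapshotFormat: aria`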

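2. Element Locator Priority

1. aria refs (refs="aria") - the most stable; valid across sessions
2. role refs (refs="role") - based on role and name
3. CSS selector - precise but brittle
4. Text content - last resort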
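
The priority order can be expressed as a simple fallback lookup. This is an illustrative sketch only: `pick_locator` and the keys `aria_ref`, `role_ref`, `css`, and `text` are hypothetical names chosen for the example, not fields of the Browser tool.

```python
# Priority order taken from the list above; earlier entries win.
LOCATOR_PRIORITY = ("aria_ref", "role_ref", "css", "text")


def pick_locator(candidates: dict) -> tuple:
    """Return (kind, value) for the highest-priority locator that is available."""
    for kind in LOCATOR_PRIORITY:
        value = candidates.get(kind)
        if value:
            return kind, value
    raise LookupError("no usable locator for this element")


# Only a CSS selector and visible text are known, so the CSS selector is chosen.
choice = pick_locator({"css": "#login", "text": "Log in"})
print(choice)
```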

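3. Performance Optimization

Use mode: efficient to reduce snapshot size
Set reasonable limit and maxChars values
Avoid calling snapshot frequently; capture again only when the page state changes
Reuse tabs via targetId instead of opening new ones repeatedly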

FAQ

Q1: “No Chrome tabs are attached”

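Cause: with the chrome-relay profile, the user has not clicked the extension icon.

Fix:

1. Make sure the Chrome extension is installed and enabled.
2. Click the OpenClaw Browser Relay icon in the toolbar.
3. Make sure the badge shows ON.
4. Run the operation again.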

Q2: “tab not found” or a 404 Error

Cause: the targetId has expired (the tab was closed or refreshed).

Fix:

# Fetch the tab list again
- tool: browser
  action: tabs
  profile: chrome-relay

# Retry the operation with the new targetId
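
The refresh-and-retry recovery can be wrapped in a helper. The sketch below is an assumption-laden illustration: `TabNotFound`, `act_with_fresh_target`, and `fake_call_browser` are hypothetical, and the shape of the `tabs` response (a list of dicts with `targetId` and `url`) is assumed for the demo rather than taken from OpenClaw.

```python
class TabNotFound(Exception):
    """Hypothetical error raised when a targetId no longer exists."""


def act_with_fresh_target(call_browser, request: dict, pick_tab) -> dict:
    """Try the request once; on a stale targetId, re-list tabs and retry once."""
    try:
        return call_browser(request)
    except TabNotFound:
        tabs = call_browser({"action": "tabs", "profile": request["profile"]})
        fresh = dict(request, targetId=pick_tab(tabs))
        return call_browser(fresh)


def fake_call_browser(request: dict):
    """In-memory stand-in for the browser tool with a single live tab."""
    live = [{"targetId": "T2", "url": "https://example.com"}]
    if request["action"] == "tabs":
        return live
    if request.get("targetId") not in {tab["targetId"] for tab in live}:
        raise TabNotFound(request.get("targetId"))
    return {"ok": True, "targetId": request["targetId"]}


stale = {"action": "act", "kind": "click", "ref": "e20",
         "profile": "chrome-relay", "targetId": "T1"}
result = act_with_fresh_target(fake_call_browser, stale, lambda tabs: tabs[0]["targetId"])
print(result)
```

Retrying only once keeps the helper from looping forever when the tab is really gone. How to pick the replacement tab (first tab, URL match, and so on) is left to the caller through `pick_tab`.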

Q3: Element Not Found (Invalid ref)

Cause: the page state has changed and the snapshot is stale.

Fix:

# Capture a fresh snapshot
- tool: browser
  action: snapshot
  snapshotFormat: ai
  refs: aria

# Run the operation with the new ref

Q4: Chinese Text Input Problems

Fix: use slowly: true to simulate human typing.

- tool: browser
  action: act
  kind: type
  ref: e5
  text: "中文内容"
  slowly: true

Configuration Reference

~/.openclaw/openclaw.json:

{
  "browser": {
    "enabled": true,
    "evaluateEnabled": true,
    "headless": false,
    "color": "#FF4500",
    "cdpPortRangeStart": 9220,
    "defaultProfile": "openclaw",
    "profiles": {
      "openclaw": {
        "cdpPort": 9220,
        "driver": "openclaw"
      },
      "user": {
        "driver": "existing-session"
      },
      "remote": {
        "cdpUrl": "http://remote-host:9222",
        "driver": "existing-session"
      }
    },
    "snapshotDefaults": {
      "mode": "efficient"
    },
    "extraArgs": [
      "--window-size=1920,1080"
    ]
  }
}
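
A quick sanity check can catch simple mistakes in this file before the browser starts. The sketch below is not OpenClaw's own validation: `check_browser_config` is a hypothetical helper, and its two rules (the default profile must be defined, and no two profiles may share a `cdpPort`) are assumptions about what a sensible configuration looks like.

```python
import json

# Trimmed copy of the configuration shown above.
SAMPLE = """
{
  "browser": {
    "defaultProfile": "openclaw",
    "profiles": {
      "openclaw": {"cdpPort": 9220, "driver": "openclaw"},
      "user": {"driver": "existing-session"}
    }
  }
}
"""


def check_browser_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    browser = config.get("browser", {})
    profiles = browser.get("profiles", {})
    default = browser.get("defaultProfile")
    if default is not None and default not in profiles:
        problems.append(f"defaultProfile {default!r} is not defined under profiles")
    seen_ports = {}
    for name, profile in profiles.items():
        port = profile.get("cdpPort")
        if port is None:
            continue
        if port in seen_ports:
            problems.append(f"profiles {seen_ports[port]!r} and {name!r} share cdpPort {port}")
        else:
            seen_ports[port] = name
    return problems


problems = check_browser_config(json.loads(SAMPLE))
print(problems)
```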

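Summary

The OpenClaw Browser tool is a full-featured browser automation solution:

✅ Easy to use - control the browser through natural language or YAML configuration
✅ Comprehensive - supports clicking, typing, navigation, screenshots, file operations, and more
✅ Flexible configuration - multiple profile modes for different scenarios
✅ Stable foundation - built on Playwright and CDP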

Use cases:

Web data collection
Automated testing
Batch operations
UI automation
Cross-site workflows

References:

OpenClaw official documentation: https://docs.openclaw.ai/tools/browser
Playwright documentation: https://playwright.dev/
Chrome DevTools Protocol: https://chromedevtools.github.io/devtools-protocol/

This article is based on an analysis of the OpenClaw source code; implementation details may change between versions.
