DOMShell

Productivity & Workflow

by apireno

Browse the web with filesystem commands. Provides 38 MCP tools for running ls, cd, grep, click, type, and similar operations through Chrome.

code
           | |
        ___|_|___
       |___|_|___|
       |   | |   |
       |___|_|___|
        /  | |  \
       /   | |   \
      |____|_|____|
      |           |
      |  DOMSHELL |
      |           |
      |___________|
      |###########|
      |###########|
       \#########/
        \_______/
 
  ██   ██ ██ ███████
  ██   ██ ██   ███
  ███████ ██   ██
  ██░░░██ ██   ██
  ██   ██ ██   ██
  ░░   ░░ ░░   ░░
   ███████ ██   ██ ███████
     ███   ███████ ██░░░░░
     ███   ██░░░██ █████
     ███   ██   ██ ██░░░
     ███   ██   ██ ███████
     ░░░   ░░   ░░ ░░░░░░░
   ██████   ██████  ███    ███  ██
   ██   ██ ██    ██ ████  ████  ██
   ██   ██ ██    ██ ██ ████ ██  ██
   ██   ██ ██    ██ ██  ██  ██  ░░
   ██████   ██████  ██      ██  ██
   ░░░░░░   ░░░░░░  ░░      ░░  ░░

The browser is your filesystem. A Chrome Extension that lets AI agents (and humans) browse the web using standard Linux commands — ls, cd, cat, grep, click — via a terminal in the Chrome Side Panel.

Install from Chrome Web Store | npm package | Read the blog post | Project home

DOMShell maps the browser into a virtual filesystem. Windows and tabs become top-level directories (~). Each tab's Accessibility Tree becomes a nested filesystem where container elements are directories and buttons, links, and inputs are files. Navigate Chrome the same way you'd navigate /usr/local/bin.

Why

AI agents that interact with websites typically rely on screenshots, pixel coordinates, or brittle CSS selectors. DOMShell takes a different approach: it exposes the browser's own Accessibility Tree as a familiar filesystem metaphor.

This means an agent can:

  • Browse tabs with ls ~/tabs/ and switch with cd ~/tabs/123 instead of guessing which tab is active
  • Explore a page with ls and tree instead of parsing screenshots
  • Navigate into sections with cd navigation/ instead of guessing coordinates
  • Act on elements with click submit_btn instead of fragile DOM queries
  • Read content with cat or bulk-extract with text instead of scraping innerHTML
  • Search for elements with find --type combobox instead of writing selectors

The filesystem abstraction is deterministic, semantic, and works on any website — no site-specific adapters needed.

Installation

Chrome Web Store (Recommended)

Install DOMShell directly from the Chrome Web Store. No build step required.

From Source

bash
git clone https://github.com/apireno/DOMShell.git
cd DOMShell
npm install
npm run build

Load into Chrome

  1. Open chrome://extensions/
  2. Enable Developer mode (toggle in top right)
  3. Click Load unpacked
  4. Select the dist/ folder
  5. Click the DOMShell icon in your toolbar — the side panel opens

Usage

Getting Started

Open any webpage, then open the DOMShell side panel. You'll see a terminal:

code
╔══════════════════════════════════════╗
║   DOMShell v1.1.0                    ║
║   The browser is your filesystem.    ║
╚══════════════════════════════════════╝

Type 'help' to see available commands.
Type 'tabs' to see open browser tabs, then 'cd tabs/<id>' to enter one.

dom@shell:~$

You start at ~ (the browser root). Jump straight to the active tab with here, or explore:

code
dom@shell:~$ ls
  windows/       (2 windows)
  tabs/          (5 tabs)

dom@shell:~$ here
✓ Entered tab 123
  Title: Google
  URL:   https://google.com
  AX Nodes: 247

Browsing Tabs and Windows

bash
# List all open tabs
dom@shell:~$ tabs
  ID     TITLE                       URL                        WIN
  123    Google                       google.com                 1
  124    GitHub - apireno             github.com/apireno         1
  125    Wikipedia                    en.wikipedia.org           2

# Switch to a tab by ID
dom@shell:~$ cd tabs/125
✓ Entered tab 125
  Title: Wikipedia
  URL:   https://en.wikipedia.org
  AX Nodes: 312

# You're now inside the tab's DOM tree
dom@shell:~$ pwd
~/tabs/125

# Go back to browser level
dom@shell:~$ cd ~
dom@shell:~$

# Or use substring matching
dom@shell:~$ cd tabs/github
✓ Entered tab 124 (GitHub - apireno)

# List windows (shows tabs grouped under each window)
dom@shell:~$ windows
Window 1 (focused)
├── *123   Google                        google.com
├──  124   GitHub - apireno              github.com/apireno
└──  125   Wikipedia                     en.wikipedia.org

Window 2
├── *126   Stack Overflow                stackoverflow.com
└──  127   MDN Web Docs                  developer.mozilla.org

# Browse a specific window's tabs
dom@shell:~$ cd windows/2
dom@shell:~/windows/2$ ls
  ID     TITLE                       URL
  126    Stack Overflow               stackoverflow.com
  127    MDN Web Docs                 developer.mozilla.org

You can also navigate or open new tabs:

bash
# Navigate the current tab to a URL (requires being inside a tab)
dom@shell:~$ navigate https://example.com

# Open a URL in a new tab (works from anywhere)
dom@shell:~$ open https://github.com
✓ Opened new tab
  URL:   https://github.com
  Title: GitHub
  AX Nodes: 412

Tab Groups (isolation)

By default DOMShell operates on your general browser — shared mode, exactly as before. The group command puts a session in its own isolated Chrome tab group, so the agent works in a clearly-marked lane while you keep browsing freely in other tabs:

bash
# Create an isolated tab group and work inside it
dom@shell:~$ group new research
✓ Created isolated group '🐚 research'  [id 4]
  Working tab: 312

# While isolated, every command is confined to the group's tabs —
# entering a tab outside the group is rejected:
dom@shell:~$ cd tabs/126
cd: tab 126 is outside the session group (id 4). ...

# Show the current mode and group
dom@shell:~$ group
Group mode: isolated
  Group: 🐚 research  [id 4]
  Tabs:  1

# Leave the group (it stays open) — back to shared mode
dom@shell:~$ group detach

# Close the group's DOMShell tabs (your own tabs are kept)
dom@shell:~$ group close

Subcommands: group (status), group new [name], group attach <id>, group detach, group close, group list. Isolated mode keeps the agent out of your other tabs; shared mode is the default and unchanged.

When an MCP client connects, DOMShell automatically gives that session its own fresh 🐚 agent group. The group is left open when the session disconnects (non-destructive) — the agent is instructed to ask whether you'd like it closed before it wraps up, and you can always clear leftovers yourself with group close.

Multi-session. Every DOMShell client gets its own session lane — each side-panel window, each MCP connection, separately isolated. Two side panels in two Chrome windows hold independent positions; multiple concurrent MCP agents each work in their own 🐚 agent group with their own cursor. Run group list anytime to see every active lane; group close <id> to close one.

Multiple agents on one MCP connection. Some MCP clients (e.g. Claude Desktop) share one connection across every chat — so by default two chats in the same client would land in one lane. Each chat can carve out its own lane by passing the group_id parameter to domshell_execute: pass "new" to create a fresh one (its id is returned at the end of the reply as [lane: <id>]), then pass that id on every later call. Two chats → two lanes → no collision. Agents can also use this for handoff — one agent reports its lane id, the next agent passes it as group_id and continues in the same state. Agents are instructed to close any lane they created when the task is done.
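
As a rough illustration of the handoff described above, a client could read the lane id from a reply and pass it back as group_id on later calls. The sketch relies only on what the text states, namely that the id appears at the end of the reply as [lane: <id>]. The helper name extractLaneId and its whitespace tolerance are hypothetical and not part of DOMShell.

```typescript
// Hypothetical helper (not part of DOMShell): read the lane id from a reply.
// Assumes the reply ends with a "[lane: <id>]" marker, as described above.
function extractLaneId(reply: string): string | null {
  // Anchor at the end of the reply and tolerate trailing whitespace.
  const match = /\[lane:\s*([^\]\s]+)\s*\]\s*$/.exec(reply);
  const id = match ? match[1] : undefined;
  return id === undefined ? null : id;
}

// Usage idea: call domshell_execute once with group_id "new", then reuse
// extractLaneId(firstReply) as group_id on every later call in that chat.
```

Returning null when no marker is found lets the caller decide whether to retry or to request a fresh lane, rather than silently continuing in a shared one.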

Navigating the DOM

Once you're inside a tab, the Accessibility Tree appears as a filesystem:

bash
# List children of the current node
dom@shell:~$ ls
navigation/
main/
complementary/
contentinfo/
skip_to_content_link
logo_link

# Long format shows type prefixes and roles
dom@shell:~$ ls -l
[d] navigation     navigation/
[d] main           main/
[x] link           skip_to_content_link
[x] link           logo_link

# Filter by type
dom@shell:~$ ls --type link
skip_to_content_link
logo_link

# Show DOM metadata (href, src, id) inline — great for finding URLs
dom@shell:~$ ls --meta --type link
[x] link           skip_to_content_link  href=https://example.com/#content <a>
[x] link           logo_link             href=https://example.com/ <a>

# Paginate large directories
dom@shell:~$ ls -n 10              # First 10 items
dom@shell:~$ ls -n 10 --offset 10  # Items 11-20

# Count children by type
dom@shell:~$ ls --count
45 total (12 [d], 28 [x], 5 [-])

# Enter a directory (container element)
dom@shell:~$ cd navigation

# See where you are
dom@shell:~$ pwd
~/tabs/125/navigation

# Go back up
dom@shell:~$ cd ..

# Jump to browser root
dom@shell:~$ cd ~

# Multi-level paths work too
dom@shell:~$ cd main/article/form

# Path variable: %here% expands to the focused tab (via its window)
dom@shell:~$ cd %here%           # Enter the active tab
dom@shell:~$ cd %here%/..        # Go to the window containing the active tab
dom@shell:~$ cd %here%/main      # Enter the active tab and cd into main

Type Prefixes

Every node has a type prefix that communicates metadata without relying on color alone:

| Prefix | Meaning | Examples |
|--------|---------|----------|
| [d] | Directory (container, cd-able) | navigation/, form/, main/ |
| [x] | Interactive (clickable/focusable) | buttons, links, inputs, checkboxes |
| [-] | Static (read-only) | headings, images, text |

Reading Content

bash
# Inspect an element — cat shows full AX + DOM metadata
dom@shell:~$ cat submit_btn
--- submit_btn ---
  Role:  button
  Type:  [x] interactive
  AXID:  42
  DOM:   backend#187
  Tag:   <button>
  ID:    submit-form
  Class: btn btn-primary
  Text:  Submit Form
  HTML:  <button id="submit-form" class="btn btn-primary">Submit Form</button>

# cat on a link reveals the href URL
dom@shell:~$ cat Read_more
--- Read_more ---
  Role:  link
  Type:  [x] interactive
  AXID:  98
  DOM:   backend#312
  Tag:   <a>
  URL:   https://en.wikipedia.org/wiki/Article_Title
  Text:  Read more
  HTML:  <a href="https://en.wikipedia.org/wiki/Article_Title">Read more</a>

# Navigate to parent to find its properties (e.g. span inside a link)
dom@shell:~$ cd ..
dom@shell:~$ cat parent_link

# Bulk extract ALL text from a section (one call instead of 50+ cat calls)
dom@shell:~/main$ text
[textContent of /main — 4,821 chars]
Heading: Welcome to Our Site
Today we announce the launch of our new product...
(full article text continues)

# Extract text from a specific child
dom@shell:~$ text main
[textContent of main — 4,821 chars]

# Limit output length
dom@shell:~$ text main -n 500

# Include link URLs inline as markdown [text](url)
dom@shell:~$ text --links main/article/paragraph_2978
--- Text (with links): paragraph_2978 ---
Artificial intelligence (AI) is the capability of [computational systems](https://en.wikipedia.org/wiki/Computer)
to perform tasks typically associated with [human intelligence](https://en.wikipedia.org/wiki/Human_intelligence),
such as [learning](https://en.wikipedia.org/wiki/Learning), [reasoning](https://en.wikipedia.org/wiki/Reason)...
(text + link URLs in a single call)

# Get a tree view (default depth: 2)
dom@shell:~$ tree
navigation/
├── [x] home_link
├── [x] about_link
├── [x] products_link
└── [x] contact_link

# Deeper tree
dom@shell:~$ tree 4

Searching

bash
# Search current directory
dom@shell:~$ grep login
[x] login_btn (button)
[d] login_form (form)
[x] login_link (link)

# Recursive search across all descendants
dom@shell:~$ grep -r search
[x] search_search (combobox)
[x] search_btn (button)

# Limit results
dom@shell:~$ grep -r -n 5 link

# Deep search with full paths (like Unix find)
dom@shell:~$ find search
[x] /search_2/search_search (combobox)
[x] /search_2/search_btn (button)

# Find by role type
dom@shell:~$ find --type combobox
[x] /search_2/search_search (combobox)

dom@shell:~$ find --type textbox
[x] /main/form/email_input (textbox)
[x] /main/form/name_input (textbox)

# Limit results
dom@shell:~$ find --type link -n 5

# Find all links with their URLs (great for content extraction)
dom@shell:~$ find --type link --meta
[x] /nav/home_link (link)  href=https://example.com/ <a>
[x] /main/Read_more (link)  href=https://example.com/article <a>

Command Chaining (Bash-Style Composition)

DOMShell works like a filesystem — use the same mental model as searching files on disk. grep discovers where content lives (like grep -r in bash), cd scopes your context, and text/cat/find reads content (like cat/head/less). The pipe operator (|) filters output, just like bash.

The pattern is: grep (locate) → cd (scope) → extract (read).

bash
# Workflow 1: Find and read an article section
dom@shell:~$ grep -r article
[d] article (article)  →  ./main/article/
dom@shell:~$ cd main/article
dom@shell:~/main/article$ text
[full article content in one call]

# Workflow 2: Find a section and extract its links
dom@shell:~$ grep -r references
[d] references (region)  →  ./main/article/references/
dom@shell:~$ cd main/article/references
dom@shell:~/main/article/references$ find --type link --meta
[x] /wiki_link (link)  href=https://en.wikipedia.org/... <a>
[x] /paper_link (link)  href=https://arxiv.org/... <a>

# Workflow 3: Find a table and extract structured data
dom@shell:~$ grep -r table
[d] table_4091 (table)  →  ./main/section/table_4091/
dom@shell:~$ extract_table table_4091
| Name   | Value  | Date       |
|--------|--------|------------|
| Alpha  | 42     | 2025-01-15 |
| Beta   | 87     | 2025-02-20 |

# Workflow 4: Discover sections, then drill into one
dom@shell:~$ grep -r heading
[-] Introduction_heading (heading)  →  ./main/article/Introduction_heading
[-] Methods_heading (heading)  →  ./main/article/Methods_heading
[-] Results_heading (heading)  →  ./main/article/Results_heading
dom@shell:~$ cd main/article/Results_heading
dom@shell:~/main/article/Results_heading$ text
[text content of the Results section]

# Workflow 5: Find elements by visible text (not just name)
dom@shell:~$ grep -r --content "sign up"
[x] get_started_btn (button)  →  ./main/hero/get_started_btn
# The button's NAME is "get_started_btn" but its displayed text says "Sign Up Free"
dom@shell:~$ click get_started_btn

Pipe Operator

The pipe operator (|) lets you filter command output, just like bash:

bash
# Filter find results to only GitHub links
dom@shell:~$ find --type link --meta | grep github
[x] /main/repo_link (link)  href=https://github.com/example <a>

# Filter ls output to elements mentioning "login"
dom@shell:~$ ls --text | grep login
[x] login_btn  "Log in to your account"

# Limit results with head
dom@shell:~$ find --type heading | head -n 3
[-] /main/intro_heading (heading)
[-] /main/features_heading (heading)
[-] /main/pricing_heading (heading)

# Chain multiple pipes
dom@shell:~$ find --type link --meta | grep docs | head -n 5
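
The pipe behaviour shown above can be pictured as folding line filters over the first command's output. The following is a minimal sketch under stated assumptions, not DOMShell's actual parser: the names applyFilter and runPipeline are hypothetical, grep is modelled as a case-insensitive substring match, head defaults to 10 lines as in Unix, and splitting on "|" ignores quoting.

```typescript
// Sketch of pipe-style filtering over command output lines.
// Only the two filters used in the examples above are modelled: grep and head -n.
function applyFilter(lines: string[], stage: string): string[] {
  const parts = stage.trim().split(/\s+/);
  const cmd = parts[0];
  if (cmd === "grep") {
    // Assumption: plain case-insensitive substring match on each line.
    const needle = (parts[1] || "").toLowerCase();
    return lines.filter(function (line) {
      return line.toLowerCase().indexOf(needle) !== -1;
    });
  }
  if (cmd === "head") {
    // Accepts "head -n N"; the fallback of 10 lines is an assumption.
    const flagIndex = parts.indexOf("-n");
    const count = flagIndex !== -1 ? parseInt(parts[flagIndex + 1] || "", 10) : 10;
    return lines.slice(0, isNaN(count) ? 10 : count);
  }
  throw new Error("unsupported filter: " + stage);
}

// Run the first stage with a caller-supplied function, then apply each filter in order.
// Note: the naive split on "|" does not handle pipes inside quoted arguments.
function runPipeline(commandLine: string, runCommand: (cmd: string) => string[]): string[] {
  const stages = commandLine.split("|").map(function (s) { return s.trim(); });
  const first = stages[0] || "";
  return stages.slice(1).reduce(applyFilter, runCommand(first));
}
```

Passing the command runner in as a function keeps the filter logic independent of how the first stage produces its lines.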

Path Resolution

All commands accept relative paths, eliminating the need to cd first:

bash
# Read text from a nested element directly
dom@shell:~$ text main/article/paragraph_2971

# Click a button inside a form without cd'ing
dom@shell:~$ click main/form/submit_btn

# Inspect a link in the nav
dom@shell:~$ cat navigation/home_link

Sibling Navigation

Use --after and --before flags on ls to find content relative to a landmark:

bash
# Show the 3 elements after a heading
dom@shell:~$ ls --after See_also_heading -n 3 --text
[d] related_topics_list  "Machine Learning, Deep Learning, Neural..."
[-] paragraph_4512       "For more information on these topics..."
[x] Read_more_link       "Read more on Wikipedia"

# Find links after a specific section heading
dom@shell:~$ ls --after References_heading --type link --meta
[x] source_1_link (link)  href=https://arxiv.org/... <a>
[x] source_2_link (link)  href=https://doi.org/... <a>

The key insight: grep output feeds cd, and cd scopes everything else. When you don't know where content lives on a page, always grep first, then scope, then extract.
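
The --after flag from the Sibling Navigation examples amounts to slicing the sibling list just past a landmark. A minimal sketch, assuming siblings are available as an ordered list of names; the function name siblingsAfter and the choice to return an empty list for an unknown landmark are hypothetical.

```typescript
// Sketch: pick the siblings that follow a named landmark, like "ls --after NAME -n N".
function siblingsAfter(names: string[], landmark: string, limit?: number): string[] {
  const index = names.indexOf(landmark);
  if (index === -1) {
    // Assumption: an unknown landmark yields nothing rather than an error.
    return [];
  }
  const rest = names.slice(index + 1);
  return limit === undefined ? rest : rest.slice(0, limit);
}
```

A --before variant would slice the other side of the same index.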

Interacting with Elements

bash
# Click a button or link
dom@shell:~$ click submit_btn
✓ Clicked: submit_btn (button)
(tree will auto-refresh on next command)

# Focus an input field
dom@shell:~$ focus email_input
✓ Focused: email_input

# Type into the focused field
dom@shell:~$ type hello@example.com
✓ Typed 17 characters

# Navigate to a URL (current tab)
dom@shell:~$ navigate https://example.com
✓ Navigated to https://example.com

# Open a URL in a new tab
dom@shell:~$ open https://github.com
✓ Opened new tab → https://github.com

Auto-Refresh on DOM Changes

DOMShell automatically detects when the page changes — navigation, DOM mutations, or content updates from clicks. You no longer need to manually run refresh:

bash
dom@shell:~$ click search_btn
✓ Clicked: search_btn (button)
(tree will auto-refresh on next command)

dom@shell:~$ ls
(page changed — tree refreshed, 312 nodes, path reset to tab root)
main/
navigation/
search_results/
...

If the page navigated, CWD is reset to the tab root. If the DOM just updated in place, your CWD is preserved. You can still force a manual refresh:

bash
dom@shell:~$ refresh
✓ Refreshed. 312 AX nodes loaded.
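
The working-directory rule described in this section (reset on navigation, keep on an in-place update) can be written down compactly. This is only a sketch: the type names PageChange and ShellPosition and the function cwdAfterChange are hypothetical and say nothing about how DOMShell detects each kind of change.

```typescript
// Sketch of the rule above: navigation resets the working directory to the tab root,
// while an in-place DOM update keeps it.
type PageChange = "navigated" | "dom-updated" | "none";

interface ShellPosition {
  tabRoot: string; // e.g. "~/tabs/125"
  cwd: string;     // e.g. "~/tabs/125/main/form"
}

function cwdAfterChange(position: ShellPosition, change: PageChange): string {
  return change === "navigated" ? position.tabRoot : position.cwd;
}
```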

Tab Completion

Press Tab to auto-complete commands and element names — works like bash:

bash
dom@shell:~$ ta<Tab>
# completes to: tabs

dom@shell:~$ cd nav<Tab>
# completes to: cd navigation/

dom@shell:~$ click sub<Tab>
# if multiple matches, shows options:
#   submit_btn
#   subscribe_link
  • Single match: auto-completes inline
  • Multiple matches: shows options below, fills the longest common prefix
  • cd only completes directories; other commands complete all elements
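
The completion rules listed above (single match completes inline, multiple matches fill the longest common prefix) can be sketched as follows. The names longestCommonPrefix, Completion, and completeName are hypothetical, and case-sensitive prefix matching is an assumption.

```typescript
// Longest prefix shared by every word in the list.
function longestCommonPrefix(words: string[]): string {
  if (words.length === 0) {
    return "";
  }
  let prefix = words[0] || "";
  for (let i = 1; i < words.length; i++) {
    const word = words[i] || "";
    let j = 0;
    while (j < prefix.length && j < word.length && prefix.charAt(j) === word.charAt(j)) {
      j++;
    }
    prefix = prefix.slice(0, j);
  }
  return prefix;
}

interface Completion {
  fill: string;      // text to place on the command line
  options: string[]; // candidates to list below the prompt (empty unless ambiguous)
}

// Assumption: candidates match when they start with the typed text, case-sensitively.
function completeName(partial: string, candidates: string[]): Completion {
  const matches = candidates.filter(function (c) { return c.indexOf(partial) === 0; });
  if (matches.length === 0) {
    return { fill: partial, options: [] };
  }
  if (matches.length === 1) {
    return { fill: matches[0] || partial, options: [] };
  }
  return { fill: longestCommonPrefix(matches), options: matches };
}
```

For cd, the caller would pass only directory names as candidates, which gives the directories-only behaviour without changing the function.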

Paste Support

Cmd+V (Mac) / Ctrl+V (Windows/Linux) pastes text directly into the terminal. Multi-line pastes are flattened to a single line.

System Commands

bash
# Check if you're authenticated (reads cookies)
dom@shell:~$ whoami
URL: https://example.com
Status: Authenticated
Via: session_id
Expires: 2025-12-31T00:00:00.000Z
Total cookies: 12

# Environment variables
dom@shell:~$ env
SHELL=/bin/domshell
TERM=xterm-256color

# Set a variable
dom@shell:~$ export API_KEY=sk-abc123

# Debug the raw AX tree
dom@shell:~$ debug stats
--- Debug Stats ---
  Total AX nodes:   247
  Ignored nodes:    83
  Generic nodes:    41
  With children:    62
  Iframes:          2

Getting Help

Every command supports --help:

bash
dom@shell:~$ ls --help
ls — List children of the current node

Usage: ls [options]

Options:
  -l, --long      Long format: type prefix, role, and name
  -r, --recursive Show nested children (one level deep)
  -n N            Limit output to first N entries
  --offset N      Skip first N entries (for pagination)
  --type ROLE     Filter by AX role (e.g. --type button)
  --count         Show count of children only
...

Command Reference

Browser Level

| Command | Description |
|---------|-------------|
| `tabs` | List all open tabs (shortcut for ls ~/tabs/) |
| `windows` | List all windows with their tabs grouped underneath |
| `here` | Jump to the active tab in the focused window |
| `cd ~` | Go to browser root |
| `cd ~/tabs/<id>` | Switch to a tab by ID (enters automatically) |
| `cd ~/tabs/<pattern>` | Switch to a tab by title/URL substring match |
| `cd ~/windows/<id>` | Browse a window's tabs |
| `navigate <url>` | Navigate the current tab to a URL |
| `open <url>` | Open a URL in a new tab and enter it |
| `back` | Go back in browser history (like the back button) |
| `forward` | Go forward in browser history |
| `close [tab-id]` | Close the current tab (or a specific tab by ID) |

DOM Tree

| Command | Description |
|---------|-------------|
| `ls [options]` | List children (-l, --meta, --text, -r, -n N, --offset N, --type ROLE, --count, --after NAME, --before NAME, --json) |
| `cd <path>` | Navigate (.., ~ or / for browser root, %here% for focused tab, main/form for multi-level) |
| `pwd` | Print current path (DOM path or browser path) |
| `tree [depth]` | Tree view of current node (default depth: 2) |
| `cat <name> [--json]` | Full element metadata: AX info + DOM properties (tag, href, src, id, class, outerHTML) |
| `text [name] [-n N] [--links]` | Bulk extract all text from a section (--links inlines URLs as [text](url)) |
| `read [name] [opts]` | Structured subtree extraction (--meta, --text, -d N depth): tree + content in one call |
| `grep [opts] <pattern>` | Search by name/role/value (-r recursive, --content match visible text, -n N limit) |
| `find [opts] <pattern>` | Deep recursive search (--type ROLE with fuzzy aliases: input, dropdown, nav, toggle, modal, image, etc.; --meta, --text, --content, -n N, --json) |
| `extract_links [name]` | Extract all links as [text](url) format (-n N limit) |
| `extract_table <name>` | Extract table as markdown or CSV (--format csv, -n N row limit) |
| `click <name>` | Click an element (falls back to coordinate-based click) |
| `focus <name>` | Focus an input element |
| `type <text>` | Type text into the focused element |
| `submit <input> <val>` | Atomic form fill: focus + clear + type + submit (--submit btn or Enter) |
| `scroll [down\|up] [N]` | Scroll page by N viewport heights (default: 1). Returns scroll position %. |
| `scroll <name>` | Scroll a specific element into the center of the viewport |
| `js <code>` | Execute JavaScript in the tab context. Returns JSON-serialized result. Supports async/await. |
| `screenshot` | Capture a PNG screenshot of the current tab (returns image via MCP, base64 in shell) |
| `select <name> <value>` | Select a dropdown option by value or visible text (dispatches change/input events) |
| `wait <pattern> [--type ROLE] [--timeout N]` | Wait for an element matching pattern to appear (polls AX tree, default 5s timeout, max 30s) |
| `eval <expr>` | Evaluate a JS expression (read-only, no --allow-write needed). Same as js but Read tier. |
| `diff [--json]` | Compare AX tree against pre-action snapshot. Shows added/removed/changed elements after click/submit/navigate. |
| `refresh` | Force re-fetch the Accessibility Tree |
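
The wait command's description (polling with a 5s default and a 30s cap) suggests a clamped timeout around a polling loop. The sketch below is illustrative only: it assumes --timeout N is given in seconds, which the table does not state, and the names effectiveTimeoutMs and pollUntil are hypothetical. The clock and sleep functions are injected so the loop can be exercised without real delays.

```typescript
const DEFAULT_TIMEOUT_MS = 5000; // "default 5s timeout"
const MAX_TIMEOUT_MS = 30000;    // "max 30s"

// Assumption: the requested timeout is expressed in seconds.
function effectiveTimeoutMs(requestedSeconds?: number): number {
  if (requestedSeconds === undefined || isNaN(requestedSeconds) || requestedSeconds <= 0) {
    return DEFAULT_TIMEOUT_MS;
  }
  return Math.min(requestedSeconds * 1000, MAX_TIMEOUT_MS);
}

// Poll until check() succeeds or the deadline passes.
function pollUntil(
  check: () => boolean,
  timeoutMs: number,
  intervalMs: number,
  now: () => number,
  sleep: (ms: number) => void
): boolean {
  const deadline = now() + timeoutMs;
  while (true) {
    if (check()) {
      return true;
    }
    if (now() >= deadline) {
      return false;
    }
    sleep(intervalMs);
  }
}
```

In a real extension the loop would be asynchronous; the synchronous form here only shows the clamping and deadline logic.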

Automation

| Command | Description |
|---------|-------------|
| `watch <cmd> [--interval N] [--times N] [--until-change]` | Re-run a command periodically. --until-change stops when output differs. Capped at 28s. |
| `for <source-cmd> : <action-tpl>` | Iterate over output lines. {} is replaced with each line. Capped at 50 items / 28s. |
| `script list\|save\|show\|run\|delete` | Save and run multi-command scripts. script run name arg1 replaces $1 in saved commands. Persisted. |
| `each [--pattern FILTER] <cmd>` | Run a command across all matching tabs. Restores original tab afterward. |
| `functions [pattern] [--json]` | List callable global JS functions on the page with name, arity, params. |
| `call <funcName> [arg1] [arg2] ...` | Call a global JS function by name. Args auto-parsed (JSON or string). Write-tier. |
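
Two of the substitutions described in this table, {} in for templates and $1 in saved scripts, are simple string expansions. The sketch below shows one way they could work. The function names are hypothetical, skipping blank lines and leaving unmatched $N placeholders untouched are assumptions, and the 28s time cap is not modelled.

```typescript
const MAX_FOR_ITEMS = 50; // "Capped at 50 items"

// Expand "for <source-cmd> : <action-tpl>": one command per output line.
function expandForTemplate(sourceLines: string[], template: string): string[] {
  return sourceLines
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line !== ""; }) // assumption: blank lines are skipped
    .slice(0, MAX_FOR_ITEMS)
    .map(function (line) {
      // split/join replaces every "{}" without treating "$" in the line specially.
      return template.split("{}").join(line);
    });
}

// Expand "$1", "$2", ... in a saved script command with the supplied arguments.
function substituteScriptArgs(command: string, args: string[]): string {
  return command.replace(/\$(\d+)/g, function (whole: string, digits: string) {
    const value = args[parseInt(digits, 10) - 1];
    // Assumption: placeholders without a matching argument are left as written.
    return value === undefined ? whole : value;
  });
}
```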

System

| Command | Description |
|---------|-------------|
| `whoami` | Check session/auth cookies for the current page |
| `env` | Show environment variables |
| `export K=V` | Set an environment variable |
| `history [-n N]` | Show command history. history clear to reset. !N to recall command N. |
| `bookmark [name] [path]` | Save/list named paths. bookmark inbox saves current path. cd @inbox jumps back. bookmark --delete name removes. |
| `debug [sub]` | Inspect raw AX tree (stats, raw, node <id>) |
| `connect <token>` | Connect to an MCP server via WebSocket bridge |
| `disconnect` | Disconnect from the MCP server, clear token |
| `help` | Show all available commands |
| `clear` | Clear the terminal |

How the Filesystem Mapping Works

DOMShell maps the browser into a two-level virtual filesystem:

Browser Level (~)

The browser itself becomes the top of the filesystem hierarchy:

code
~                              (browser root)
├── windows/                   (all Chrome windows)
│   ├── <window-id>/           (tabs in that window)
│   │   ├── <tab-id>           (cd into = enter AX tree)
│   │   └── ...
│   └── ...
└── tabs/                      (flat listing of ALL tabs)
    ├── <tab-id>               (cd into = enter AX tree)
    └── ...

cd-ing into a tab transparently attaches CDP and drops you into its DOM tree.

DOM Level (inside a tab)

Each tab's Accessibility Tree (AXTree) is read via the Chrome DevTools Protocol. Each AX node gets mapped to a virtual file or directory:

Directories (container roles): navigation/, main/, form/, search/, list/, region/, dialog/, menu/, table/, Iframe/, etc.

Files (interactive/leaf roles): submit_btn, home_link, email_input, agree_chk, theme_switch, etc.

cd .. from the DOM root exits back to the tab listing. cd ~ returns to browser root from anywhere.
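
The path rules above (.. steps up, ~ returns to the browser root, multi-level paths are followed segment by segment) can be sketched as a small resolver. This is an illustration rather than DOMShell's implementation: resolveShellPath is a hypothetical name, and %here%, bookmarks, and substring matching of tab names are deliberately left out.

```typescript
// Resolve a cd target against the current path using a segment stack.
function resolveShellPath(cwd: string, target: string): string {
  // "~" (alone or as a prefix) and a bare "/" both mean the browser root.
  const fromRoot = target === "/" || target === "~" || target.indexOf("~/") === 0;
  const keep = function (segment: string): boolean {
    return segment !== "" && segment !== "~" && segment !== ".";
  };
  const stack: string[] = fromRoot ? [] : cwd.split("/").filter(keep);
  const steps = target.split("/").filter(keep);
  for (const step of steps) {
    if (step === "..") {
      stack.pop(); // popping at the browser root is a no-op
    } else {
      stack.push(step);
    }
  }
  return stack.length === 0 ? "~" : "~/" + stack.join("/");
}
```

With this model, cd .. from ~/tabs/125 lands on ~/tabs, which matches the statement that leaving the DOM root exits back to the tab listing.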

Naming Heuristic

Names are generated from the node's accessible name and role:

| AX Node | Generated Name |
|---------|----------------|
| role=button, name="Submit" | submit_btn |
| role=link, name="Contact Us" | contact_us_link |
| role=textbox, name="Email" | email_input |
| role=checkbox, name="I agree" | i_agree_chk |
| role=navigation | navigation/ |
| role=generic, no name, 1 child | (flattened — child promoted up) |

Duplicate names are automatically disambiguated with _2, _3, etc.
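
The naming table and the _2, _3 disambiguation rule can be approximated with a slug, a role suffix, and a per-directory counter. This is a sketch, not DOMShell's code: only the four suffixes in the table come from the source, and the fallback to the raw role name, the lowercase slug rule, and all identifiers (ROLE_SUFFIXES, slugify, baseName, assignNames) are assumptions.

```typescript
// Suffixes taken from the table above; any other role falls back to its own name (assumption).
const ROLE_SUFFIXES: { [role: string]: string } = {
  button: "btn",
  link: "link",
  textbox: "input",
  checkbox: "chk",
};

// Lowercase the accessible name and join alphanumeric runs with underscores.
function slugify(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "_").replace(/^_+|_+$/g, "");
}

function baseName(role: string, label: string): string {
  const slug = slugify(label);
  const known = Object.prototype.hasOwnProperty.call(ROLE_SUFFIXES, role);
  const suffix = known ? String(ROLE_SUFFIXES[role]) : role;
  // Unnamed nodes are called after their role, e.g. "navigation".
  return slug === "" ? role : slug + "_" + suffix;
}

interface NamedAxNode {
  role: string;
  label: string;      // accessible name, "" when absent
  container: boolean; // containers become directories with a trailing "/"
}

// Name siblings in order, appending _2, _3, ... to repeated names.
function assignNames(nodes: NamedAxNode[]): string[] {
  const seen: { [base: string]: number } = {};
  return nodes.map(function (node) {
    const base = baseName(node.role, node.label);
    const previous = Object.prototype.hasOwnProperty.call(seen, base) ? Number(seen[base]) : 0;
    const count = previous + 1;
    seen[base] = count;
    const unique = count === 1 ? base : base + "_" + count;
    return node.container ? unique + "/" : unique;
  });
}
```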

Node Flattening

The AX tree contains many "wrapper" nodes — ignored nodes, unnamed generics, and role=none elements that add structural noise without semantic meaning. DOMShell recursively flattens through these, promoting their children up so you see the meaningful elements without navigating through layers of invisible divs.
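
The recursive flattening described above can be sketched as a tree walk that splices a wrapper's children into its parent's child list. The node shape RawAxNode and the function names are hypothetical; the wrapper test follows the three cases named in the text (ignored nodes, unnamed generics, role=none).

```typescript
interface RawAxNode {
  role: string;
  label: string;    // accessible name, "" when absent
  ignored: boolean;
  children: RawAxNode[];
}

// A wrapper adds structure but no meaning: ignored, unnamed generic, or role=none.
function isWrapper(node: RawAxNode): boolean {
  const unnamedGeneric = node.role === "generic" && node.label === "";
  return node.ignored || unnamedGeneric || node.role === "none";
}

// Return the meaningful children of a node, promoting children of wrappers upward.
function flattenChildren(node: RawAxNode): RawAxNode[] {
  let result: RawAxNode[] = [];
  for (const child of node.children) {
    if (isWrapper(child)) {
      // Skip the wrapper itself and splice in whatever survives beneath it.
      result = result.concat(flattenChildren(child));
    } else {
      result.push({
        role: child.role,
        label: child.label,
        ignored: false,
        children: flattenChildren(child),
      });
    }
  }
  return result;
}
```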

Iframe Support

DOMShell discovers iframes via Page.getFrameTree and fetches each iframe's AX tree separately. Iframe nodes are merged into the main tree with prefixed IDs to avoid collisions, so elements inside iframes appear naturally in the filesystem.
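
Prefixing iframe node IDs before merging, as described above, keeps IDs unique even when two frames number their nodes from the same starting point. A minimal sketch, assuming a "frameId:nodeId" prefix format; the format, the FrameAxNode shape, and the function names are all hypothetical, and attaching each iframe's root under its host element is not shown.

```typescript
interface FrameAxNode {
  id: string;
  childIds: string[];
  role: string;
}

// Rewrite every id and child reference in one frame's tree with the frame's prefix.
function prefixFrameNodes(frameId: string, nodes: FrameAxNode[]): FrameAxNode[] {
  const tag = function (id: string): string {
    return frameId + ":" + id; // assumed prefix format
  };
  return nodes.map(function (node) {
    return {
      id: tag(node.id),
      childIds: node.childIds.map(function (childId) { return tag(childId); }),
      role: node.role,
    };
  });
}

// Merge the main tree with each iframe's prefixed tree into one flat node list.
function mergeFrameTrees(
  mainNodes: FrameAxNode[],
  frames: { frameId: string; nodes: FrameAxNode[] }[]
): FrameAxNode[] {
  let merged = mainNodes.slice();
  for (const frame of frames) {
    merged = merged.concat(prefixFrameNodes(frame.frameId, frame.nodes));
  }
  return merged;
}
```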

Color Coding

| Color | Meaning |
|-------|---------|
| Blue (bold) | Directories (containers) |
| Green (bold) | Buttons |
| Magenta (bold) | Links |
| Yellow (bold) | Text inputs / search boxes |
| Cyan (bold) | Checkboxes / radio / switches |
| White | Other elements |
| Gray | Images, metadata |

Architecture

code
┌────────────────────┐
│  Claude Desktop    │──┐
└────────────────────┘  │
┌────────────────────┐  │  HTTP POST/GET/DELETE              ┌─────────────────────┐
│  Claude CLI        │──┼─ localhost:3001/mcp ──┐            │  Side Panel (UI)    │
└────────────────────┘  │  (Bearer token auth)  │            │                     │
┌────────────────────┐  │                       │            │  React + Xterm.js   │
│  Cursor / Other    │──┘                       │            │  - Paste support    │
└────────────────────┘                          ▼            │  - Tab completion   │
                               ┌─────────────────────┐      │  - Command history  │
                               │  MCP Server          │      └─────────┬───────────┘
                               │  (mcp-server/)       │                │
                               │                      │       chrome.runtime
                               │  Express HTTP server  │       .connect()
                               │  Per-session MCP      │                │
                               │  Security layer:     │      ┌─────────▼───────────┐
                               │  - Auth token        │      │  Background Worker  │
                               │  - Command tiers     │      │   (Shell Kernel)    │
                               │  - Domain allowlist  │      │                     │
                               │  - Audit log         │      │  Browser hierarchy  │
                               └──────────┬───────────┘      │  (~, tabs, windows) │
                                          │                  │  Command parser     │
                               WebSocket (localhost:9876)     │  Shell state (CWD)  │
                               + auth token                  │  VFS mapper         │
                               + alarm keepalive             │  CDP client         │
                                          │                  │  DOM change detect  │
                                          └─────────────────►│  WebSocket bridge   │
                                                             └─────────┬───────────┘
                                                                       │
                                                              chrome.debugger
                                                              (CDP 1.3)
                                                                       │
                                                             ┌─────────▼───────────┐
                                                             │   Active Tab        │
                                                             │   Accessibility     │
                                                             │   Tree + iframes    │
                                                             │                     │
                                                             │   DOM events ──────►│
                                                             │   (auto-refresh)    │
                                                             └─────────────────────┘

The MCP server runs as a standalone HTTP service that any number of MCP clients can connect to simultaneously. It exposes two ports: an HTTP endpoint for MCP clients (default 3001) and a WebSocket bridge for the Chrome extension (default 9876).

The extension follows a Thin Client / Fat Host model. The side panel is a dumb terminal — it captures keystrokes, handles paste, and renders ANSI-colored text. All logic lives in the background service worker: command parsing, AX tree traversal, filesystem mapping, CDP interaction, browser hierarchy navigation, and DOM change detection.

Source Layout

code
src/
  background/
    index.ts        # Shell kernel — commands, state, message router, auto-refresh, WS bridge
    cdp_client.ts   # Promise-wrapped chrome.debugger API + iframe discovery
    vfs_mapper.ts   # Accessibility Tree → virtual filesystem mapping
  sidepanel/
    index.html      # Side panel entry HTML
    index.tsx        # React entry point
    Terminal.tsx     # Xterm.js terminal (paste, tab completion, history)
  shared/
    types.ts        # Message types, AXNode interfaces, role constants
public/
  manifest.json     # Chrome Manifest V3
  options.html      # Extension settings page (MCP bridge config)
mcp-server/
  index.ts          # MCP server — standalone Express HTTP + StreamableHTTP, WebSocket bridge, security
  proxy.ts          # Stdio↔HTTP bridge for clients that require command/args (e.g. Claude Desktop)
  package.json      # MCP server dependencies
  tsconfig.json     # MCP server TypeScript config

Tech Stack

  • React + TypeScript — Side panel UI
  • Xterm.js (@xterm/xterm) — Terminal emulator with Tokyo Night color scheme
  • Vite — Build tooling with multi-entry Chrome Extension support
  • Chrome DevTools Protocol (CDP 1.3) via chrome.debugger — AX tree access, element interaction, iframe discovery, DOM mutation events
  • Chrome Manifest V3 — sidePanel, debugger, activeTab, cookies, storage, alarms permissions

Development

bash
# Watch mode (rebuilds on file changes)
npm run dev

# One-time production build
npm run build

# Type checking
npm run typecheck

After building, reload the extension on chrome://extensions/ and reopen the side panel to pick up changes.

Connecting MCP Clients (Claude Desktop, CLI, Cursor, etc.)

DOMShell includes a hardened MCP server that lets any MCP-compatible client control the browser through DOMShell commands. The server runs as a standalone HTTP service — multiple clients can connect simultaneously.

Three install paths

DOMShell's MCP server supports three install paths — pick whichever matches your setup. Path 1 is the documented default and what most users want. Paths 2 and 3 are optional and exist for users who want container isolation or lifecycle management.

| Path | What you run | Reboot behavior | When to pick it |
|------|--------------|-----------------|-----------------|
| 1. Native (npx) | npx @apireno/domshell --allow-write --token <token> | Survives naturally — MCP client (Claude Desktop, Cursor, …) spawns it on demand | You want the simplest install — no Docker, no extra tooling |
| 2. Dockerized (compose) | docker compose up -d from mcp-server/ with a .env file | Survives with Docker Desktop's "Start at login" toggle (restart: unless-stopped) | You want container isolation but don't need a multi-MCP supervisor |
| 3. ToolHive-managed (thv) | thv run + a one-time launchd autostart agent | Survives via launchd → thv restart --all | You're running multiple MCP servers and want one place to thv list / thv logs them all |

Full Path 2 / Path 3 instructions (build, .env install pattern, launchd autostart template, reboot recovery): docs/deploy/container-and-toolhive.md. The rest of this README covers Path 1 — the simplest and recommended default.

Install via npm (Path 1 — default)

bash
npm install -g @apireno/domshell

Or run directly without installing:

bash
npx @apireno/domshell --allow-write --token my-secret-token

Architecture

code
User starts independently:
  npx @apireno/domshell --allow-write --token xyz
    → HTTP on :3001/mcp  (MCP clients)
    → WebSocket on :9876  (Chrome extension)

Claude Desktop spawns (stdio proxy):                    ┐
  npx domshell-proxy --port 3001 --token xyz            ├─► HTTP :3001/mcp
Claude CLI connects directly:                           │
  url: http://localhost:3001/mcp?token=xyz              │
Gemini CLI connects directly:                           │
  url: http://localhost:3001/mcp?token=xyz              ┘

The MCP server is a standalone HTTP service — you start it independently, and any number of MCP clients connect to it. No single client "owns" the server process. For clients that require stdio (like Claude Desktop), a tiny proxy bridges stdio to the running HTTP server.

Setup

Quick setup (recommended):

bash
npx @apireno/domshell init

The wizard detects installed MCP clients (Claude Desktop, Cursor, Windsurf), generates a shared token, and writes each client's config. You then start the server once in a terminal — all clients connect to it.

Use --yes for non-interactive mode with sensible defaults:

bash
npx @apireno/domshell init --yes

Manual setup:

1. Start the MCP server:

bash
npx @apireno/domshell --allow-write --token my-secret-token

The server starts two listeners:

  • HTTP on http://127.0.0.1:3001/mcp — MCP client endpoint
  • WebSocket on ws://127.0.0.1:9876 — Chrome extension bridge

Tip: Use --token to set a known token so you can pre-configure clients. If omitted, a random token is generated and printed on startup.

2. Connect MCP clients:

Claude CLI / Gemini CLI / Cursor (direct HTTP — recommended):

code
http://localhost:3001/mcp?token=my-secret-token
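
If you generate client configs from a script, the endpoint above can be assembled from its parts. This is a minimal sketch that assumes the defaults shown in this README (localhost, port 3001, the /mcp path, the token as a query parameter) and assumes the server parses the query string in the standard way. buildMcpUrl is a hypothetical helper, not part of the DOMShell package.

```typescript
// Hypothetical helper (not part of @apireno/domshell): builds the direct-HTTP
// MCP endpoint URL in the form shown above.
function buildMcpUrl(token: string, port: number = 3001, host: string = "localhost"): string {
  // Percent-encode the token so characters such as "&" or "#" cannot break the query string.
  return `http://${host}:${port}/mcp?token=${encodeURIComponent(token)}`;
}
```

Calling buildMcpUrl("my-secret-token") yields the URL shown above; pass a second argument if you started the server with a different --mcp-port.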

Claude Desktop (requires stdio — use the proxy):

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

json
{
  "mcpServers": {
    "domshell": {
      "command": "npx",
      "args": ["-y", "@apireno/domshell", "--allow-write", "--token", "my-secret-token"]
    }
  }
}

Restart Claude Desktop. DOMShell tools will appear.

3. Connect the extension (Options Page):

  1. Go to chrome://extensions/
  2. Find DOMShell and click Options (or right-click the extension icon → Options)
  3. Enable the MCP Bridge toggle
  4. Paste the same token you used in the Claude Desktop config (my-secret-token)
  5. Click Save — the status indicator turns green when connected

The options page shows live connection status: Disabled, Connecting, Connected, or Disconnected.

Alternative: Connect via terminal

You can also connect from the DOMShell terminal instead of the options page:

bash
dom@shell:$ connect my-secret-token

4. Test it:

Ask Claude: "List my open tabs and tell me what's on the first one."

Security

The MCP server is hardened with multiple layers of security. By default, it's read-only — Claude can browse but not click or type.

Command Tiers

| Tier | Commands | Default | Enable With |
| --- | --- | --- | --- |
| Read | ls, cd, pwd, cat, text, grep, find, tree, refresh, tabs, windows, here, screenshot, wait, eval, diff, history, bookmark, functions, watch, for, script, each | Enabled | (always on) |
| Navigate | navigate, goto, open, back, forward | Disabled | --allow-write |
| Write | click, focus, type, scroll, js, select, close, call | Disabled | --allow-write |
| Sensitive | whoami (exposes cookies) | Disabled | --allow-sensitive |

The Navigate tier is separate from Write because navigation is equivalent to typing a URL — it requires --allow-write but skips the interactive confirmation prompt. This is important for Claude Desktop where /dev/tty is unavailable.
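
The tier rules above amount to a lookup from command name to tier, followed by a flag check. The sketch below illustrates that idea only; it is not the server's actual code. TierFlags, TIER_BY_COMMAND, and isAllowed are hypothetical names, and denying unknown commands is an assumption.

```typescript
type Tier = "read" | "navigate" | "write" | "sensitive";

// Mirrors the two server flags that unlock tiers.
interface TierFlags {
  allowWrite: boolean;     // --allow-write
  allowSensitive: boolean; // --allow-sensitive
}

// Command lists copied from the Command Tiers table.
const TIERS: Array<[Tier, string[]]> = [
  ["read", ["ls", "cd", "pwd", "cat", "text", "grep", "find", "tree", "refresh",
            "tabs", "windows", "here", "screenshot", "wait", "eval", "diff",
            "history", "bookmark", "functions", "watch", "for", "script", "each"]],
  ["navigate", ["navigate", "goto", "open", "back", "forward"]],
  ["write", ["click", "focus", "type", "scroll", "js", "select", "close", "call"]],
  ["sensitive", ["whoami"]],
];

const TIER_BY_COMMAND = new Map<string, Tier>();
for (const [tier, commands] of TIERS) {
  for (const command of commands) {
    TIER_BY_COMMAND.set(command, tier);
  }
}

// Decides whether a command line may run under the given flags.
// Only the first word is inspected; unknown commands are denied (assumption).
function isAllowed(commandLine: string, flags: TierFlags): boolean {
  const commandName = commandLine.trim().split(/\s+/)[0] ?? "";
  const tier = TIER_BY_COMMAND.get(commandName);
  if (tier === undefined) return false;
  if (tier === "read") return true;
  if (tier === "sensitive") return flags.allowSensitive;
  return flags.allowWrite; // navigate and write both require --allow-write
}
```

This sketch looks only at the first word of the line, so it does not inspect commands nested inside wrappers such as for, watch, or each.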

Security Flags

| Flag | Description |
| --- | --- |
| --allow-write | Enable click/focus/type/scroll/js/select/close/navigate/back/forward commands |
| --allow-sensitive | Enable whoami (cookie access) |
| --allow-all | Shorthand for --allow-write plus --allow-sensitive |
| --confirm | Opt in to per-action y/n prompts in the server terminal before each write. Off by default. |
| --no-confirm | No-op (kept for backward compatibility; per-action prompts are off by default). |
| --domains example.com,app.example.com | Restrict commands to specific domains |
| --expose-cookies | Show full cookie values (default: redacted) |
| --mcp-port N | MCP HTTP endpoint port (default: 3001) |
| --port N | WebSocket bridge port (default: 9876) |
| --log-file PATH | Audit log file (default: audit.log) |

Per-Action Confirmation (opt-in)

Per-action terminal prompts are off by default — the MCP server's terminal is detached from where the agent and side panel actually run, so the prompt is awkward to answer in any GUI-spawned setup (Claude Desktop, Cursor, CLI-Anything's harness). The audit log captures every command, and the tier flags (--allow-write, --allow-sensitive) plus --domains remain the actual security boundaries.

If you start the server in your own terminal and want a y/n prompt before every write, add --confirm. The server terminal then asks before each write command, for example:

code
[DOMShell] Claude wants to: click submit_btn
Allow? (y/n):

--no-confirm is preserved as a no-op (it matches the default), so any existing config that passes it keeps working unchanged.

Auth Token

  • Use --token to set a known token in the MCP server config, or let the server generate a random one on startup
  • The extension must present this token (via the options page or connect <token>) before the bridge works
  • WebSocket connections without a valid token are rejected
  • Token is stored in chrome.storage.local — survives service worker restarts

Domain Allowlist

With --domains, commands are only executed when the active tab's URL matches:

bash
npx tsx index.ts --allow-write --domains "github.com,docs.google.com"
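
To make the allowlist check concrete, here is a minimal sketch of matching a tab URL against the --domains value. It assumes an exact, case-insensitive hostname match; this README does not say whether the real server also accepts subdomains. parseDomains and isDomainAllowed are hypothetical names.

```typescript
// Parses the comma-separated value passed to --domains.
function parseDomains(flagValue: string): string[] {
  return flagValue
    .split(",")
    .map((domain) => domain.trim().toLowerCase())
    .filter((domain) => domain.length > 0);
}

// Returns true when the tab URL's hostname is on the allowlist.
// ASSUMPTION: exact hostname match; the real server may treat subdomains differently.
function isDomainAllowed(tabUrl: string, allowlist: string[]): boolean {
  let hostname: string;
  try {
    hostname = new URL(tabUrl).hostname.toLowerCase();
  } catch {
    return false; // not a parseable URL: deny
  }
  return allowlist.includes(hostname);
}
```

Parsing with URL instead of searching the raw string matters here: a URL such as https://github.com@evil.example/ contains "github.com" but its hostname is evil.example, so it is rejected.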

Audit Log

Every command is logged with timestamps to audit.log (or --log-file):

code
[2026-02-07T12:00:00.000Z] EXECUTE: ls -l
[2026-02-07T12:00:01.000Z] RESULT: 12 items
[2026-02-07T12:00:05.000Z] [WRITE] EXECUTE: click submit_btn
[2026-02-07T12:00:05.500Z] [WRITE] RESULT: ✓ Clicked: submit_btn (button)
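
If you want to filter or summarize the audit log, lines in the shape shown above can be parsed with one regular expression. This is a sketch based only on the four sample lines; the real log may contain other line shapes, which this parser returns as null. AuditEntry and parseAuditLine are hypothetical names.

```typescript
interface AuditEntry {
  timestamp: string;            // ISO 8601 string taken from the brackets
  write: boolean;               // true when the line carries the [WRITE] tag
  kind: "EXECUTE" | "RESULT";
  text: string;                 // command string or result summary
}

// Matches lines shaped like the sample above.
const AUDIT_LINE = /^\[([^\]]+)\] (\[WRITE\] )?(EXECUTE|RESULT): (.*)$/;

function parseAuditLine(line: string): AuditEntry | null {
  const match = AUDIT_LINE.exec(line);
  if (match === null) return null;
  return {
    timestamp: match[1],
    write: match[2] !== undefined,
    kind: match[3] as "EXECUTE" | "RESULT",
    text: match[4],
  };
}
```

Mapping every line of the file through parseAuditLine and keeping the entries where write is true gives a quick list of the write-tier actions an agent performed.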

Disconnecting

Disable the MCP Bridge toggle in the extension options page, or run disconnect in the DOMShell terminal:

bash
dom@shell:$ disconnect
✓ Disconnected from MCP server.

The domshell_execute interface

DOMShell's MCP server exposes a single tool by default — domshell_execute — the recommended way to drive DOMShell. You pass a command string, exactly as you would type it in the DOMShell terminal:

code
domshell_execute("ls")
domshell_execute("cd tabs/4815")
domshell_execute("find --type link --meta")

Multi-command calls. Pass several commands separated by newlines and they run in sequence, with the combined output returned — a whole workflow in one tool call:

code
domshell_execute("open https://example.com
cd main
text")

One round-trip instead of three, and fewer tool-call cycles.

Multi-line semantics. Each line runs in order in the same MCP session and lane, so cwd, env, and history persist between lines (the second line's cd main is relative to the first line's freshly-opened tab). An error on any single line does not halt the rest — its error message is included in the combined output and subsequent lines still run. That's the right shape for cleanup-line idioms like "cd path\ngrep pattern\ncd back" where the trailing restore must run even if the middle step errors. Implementation: mcp-server/index.ts:1115-1136.
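
The continue-on-error behavior described above can be sketched as a loop that records each failure and moves on. This is an illustration under assumptions, not the code at mcp-server/index.ts: the Executor type is a hypothetical stand-in for whatever runs a single command, the sketch is synchronous for brevity, and skipping blank lines and the "Error: " prefix are choices made here.

```typescript
// Hypothetical executor: runs one command and returns its output,
// or throws on failure. The real server's internals may differ.
type Executor = (command: string) => string;

function runLines(input: string, execute: Executor): string {
  const outputs: string[] = [];
  for (const rawLine of input.split("\n")) {
    const line = rawLine.trim();
    if (line.length === 0) continue; // skip blank lines (assumption)
    try {
      outputs.push(execute(line));
    } catch (error) {
      // Record the failure and keep going so later lines still run.
      const message = error instanceof Error ? error.message : String(error);
      outputs.push(`Error: ${message}`);
    }
  }
  return outputs.join("\n");
}
```

With an executor that fails on the middle line of "cd path\ngrep pattern\ncd back", the combined output still contains the result of the trailing cd back, which is the cleanup-line behavior the paragraph describes.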

Two modes:

| Mode | Tools exposed | Use when |
| --- | --- | --- |
| Single-tool (default) | domshell_execute only | Normal use. One approval covers the whole session, with no per-command prompts. |
| Granular (--granular) | 38 per-command tools (domshell_ls, domshell_click, …) | You want your MCP client to prompt for approval per operation type, for finer human oversight at the client layer. |

Start the server with --granular for the per-command tools:

code
npx @apireno/domshell --granular

Security is identical in both modes. DOMShell's server-side tiers (write / sensitive — set by --allow-write, --allow-sensitive, with optional --confirm for per-action server-terminal prompts) gate risky operations regardless of which tool issued the command. Granular mode does not add security — it adds an extra approval prompt in your MCP client per operation type. That's more human oversight, not more protection.

MCP Tools Reference (--granular mode)

The table below lists the per-command tools exposed when the server runs with --granular. In the default single-tool mode, run the same commands through domshell_execute — the Maps To column shows the command string.

| MCP Tool | Maps To | Tier |
| --- | --- | --- |
| domshell_tabs | tabs (list all tabs) | Read |
| domshell_here | here (jump to active tab) | Read |
| domshell_ls | ls [options] (DOM or browser level) | Read |
| domshell_cd | cd <path> (~, ~/tabs/, /, ..) | Read |
| domshell_pwd | pwd | Read |
| domshell_cat | cat <name> | Read |
| domshell_text | text [name] [-n N] [--links] (bulk text; links=true inlines URLs) | Read |
| domshell_read | read [name] [--meta] [--text] [-d N] (structured subtree) | Read |
| domshell_find | find [pattern] [--type ROLE/alias] [--meta] [--text] [-n N] (type accepts fuzzy aliases: input, dropdown, nav, etc.) | Read |
| domshell_grep | grep [-r] [-n N] [--content] <pattern> (section discovery) | Read |
| domshell_tree | tree [depth] | Read |
| domshell_extract_links | extract_links [name] [-n N] (all links as [text](url)) | Read |
| domshell_extract_table | extract_table <name> [--format csv] (table → markdown/CSV) | Read |
| domshell_refresh | refresh | Read |
| domshell_navigate | navigate <url> (current tab) | Navigate |
| domshell_open | open <url> (new tab) | Navigate |
| domshell_click | click <name> | Write |
| domshell_focus | focus <name> | Write |
| domshell_scroll | scroll [down\|up] [N] or scroll <target> | Write |
| domshell_js | js <code> (arbitrary JavaScript execution) | Write |
| domshell_type | type <text> | Write |
| domshell_submit | submit <input> <value> [--submit btn] (atomic form fill) | Write |
| domshell_back | back (browser history back) | Navigate |
| domshell_forward | forward (browser history forward) | Navigate |
| domshell_close | close [tab-id] (close a tab) | Write |
| domshell_screenshot | screenshot (capture tab as PNG image) | Read |
| domshell_select | select <name> <value> (dropdown selection) | Write |
| domshell_wait | wait <pattern> [--type ROLE] [--timeout N] (wait for element) | Read |
| domshell_eval | eval <expression> (read-only JS evaluation, no --allow-write needed) | Read |
| domshell_diff | diff [--json] (compare tree against pre-action snapshot) | Read |
| domshell_whoami | whoami | Sensitive |
| domshell_functions | functions [pattern] [--json] (list callable page functions) | Read |
| domshell_call | call <funcName> [args] (call a global JS function) | Write |
| domshell_watch | watch <cmd> [--interval N] [--times N] [--until-change] (periodic re-execution) | Read |
| domshell_for | for <source> : <template> (iterate over output lines, {} replaced) | Read |
| domshell_script | script list\|save\|show\|run\|delete (scripts with $1 substitution) | Read |
| domshell_each | each [--pattern FILTER] <cmd> (cross-tab operations) | Read |
| domshell_execute | (any command) | Varies |
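
Because every granular tool in the table is named domshell_<command>, a granular call can be translated into the equivalent domshell_execute command string by stripping the prefix. This sketch relies only on that naming pattern. It assumes arguments arrive as one pre-formatted string, which may not match the real tool schemas, and toCommandString is a hypothetical helper.

```typescript
const TOOL_PREFIX = "domshell_";

// Converts a granular tool name plus an argument string into the command
// string you would pass to domshell_execute. Returns null for names that do
// not follow the domshell_<command> pattern, and for domshell_execute itself,
// which already takes the command string directly.
function toCommandString(toolName: string, args: string = ""): string | null {
  if (!toolName.startsWith(TOOL_PREFIX)) return null;
  const command = toolName.slice(TOOL_PREFIX.length);
  if (command.length === 0 || command === "execute") return null;
  const trimmedArgs = args.trim();
  return trimmedArgs.length > 0 ? `${command} ${trimmedArgs}` : command;
}
```

For example, toCommandString("domshell_find", "--type link --meta") produces the same string used in the domshell_execute examples earlier in this section.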

Roadmap

Distribution & Setup

  • Chrome Web Store listing: live on the Chrome Web Store
  • GitHub release with .crx: v1.1.1 release with extension zip
  • MCP setup wizard: npx @apireno/domshell init detects installed MCP clients, generates a token, and writes the config automatically
  • Support for other MCP clients — Gemini Desktop, OpenAI ChatGPT desktop, Cursor, Windsurf, and other MCP-compatible hosts

New Commands

  • watch — periodic re-execution of a command (e.g. watch ls --times 3 --interval 1 to poll for DOM changes)
  • history — command history with recall (history, !n to re-run)
  • back / forward — browser-style history navigation within the current tab
  • close — close the current tab (close or close <tab-id>)
  • screenshot — capture a screenshot of the current tab (useful for visual verification alongside AX tree inspection)
  • pipe / | — pipe output between commands (e.g. find --type link | grep login)
  • select <name> — select an option from a <select> dropdown by value or visible text
  • scroll — scroll the page or a specific element (scroll down, scroll up, scroll <name>)
  • wait — wait for a specific element to appear (e.g. wait submit_btn blocks until it exists in the tree)
  • for loop — iterate over command output lines (e.g. for "find --type heading -n 3" : text {}) — replaces manual iteration
  • script command — save and run multi-command scripts (e.g. script save scrape open url ; cd main ; text) for repeatable workflows
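
The for loop entry in the list above boils down to template expansion: one command per output line of the source command, with {} replaced by that line. The sketch below shows that expansion step only. It assumes lines are trimmed, blank lines are skipped, and every {} in the template is replaced; expandFor is a hypothetical name.

```typescript
// Expands a `for` template: one command per non-empty output line of the
// source command, with every "{}" replaced by that line.
function expandFor(sourceOutput: string, template: string): string[] {
  return sourceOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    // split/join replaces all occurrences and treats the line as literal text.
    .map((line) => template.split("{}").join(line));
}
```

Given a source output of three heading names and the template text {}, this returns three text commands, one per heading.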

JavaScript Layer

  • js command — execute arbitrary JavaScript in the tab context and return the result
  • functions + call: functions [pattern] lists callable global JS functions with name/arity/params; call funcName arg1 arg2 invokes them. call is write-tier.
  • eval <expr> — quick expression evaluation (e.g. eval document.title, eval window.location.href)

Agent Ergonomics

  • --text flag — show visible text previews inline with ls and find using .innerText (rendered text only, respects CSS visibility); configurable length via --textlen N; cat also shows VisibleText separately from textContent
  • --meta flag — show DOM properties (href, src, id, tag) inline with ls, find, and read output — essential for extracting URLs without separate cat calls
  • --content matching — search by visible text content with grep --content and find --content (or find --text "pattern") — finds elements by what they display, not just their AX name
  • Path resolution — all commands accept relative paths (e.g. text main/article/paragraph, click form/submit_btn) — eliminates unnecessary cd round-trips
  • Sibling navigation: --after/--before flags on ls to slice children relative to a landmark element (e.g. ls --after heading --type link --meta)
  • --links flag on text — include hyperlink URLs inline as markdown [text](url) in text output; extracts both content and link destinations in a single call (e.g. text --links main/paragraph)
  • Fuzzy type aliases for find: find --type accepts natural-language aliases (input, dropdown, nav, toggle, modal, image, btn, sidebar, etc.) that expand to matching AX roles, which eliminates wasted tool calls from guessing exact role names
  • Visible text cache — lazy cache for innerText results, keyed by backendDOMNodeId, cleared on tree rebuild — eliminates redundant CDP calls during --content matching in grep/find
  • bookmark / alias — save named paths for quick navigation (e.g. bookmark inbox ~/tabs/gmail/main/inbox_list, cd @inbox)
  • each (multi-tab) — run a command across multiple tabs (e.g. each --pattern wiki text to extract text from every Wikipedia tab)
  • Structured output mode: --json flag on commands for machine-parseable output (e.g. ls --json, cat --json, find --json, diff --json)
  • Session persistence — save and restore shell state (path, env vars, bookmarks, history) across service worker restarts via chrome.storage.local
  • diff — compare AX tree snapshots to see what changed after an action (auto-snapshots before click/submit/navigate)
  • Session tab-group isolation — each session's tabs are placed in a labeled Chrome tab group and every command is confined to that group, so the agent works in its own lane while you keep browsing freely in other tabs of the same window (#32)
  • Multi-session DOMShell — every side-panel console and every MCP connection gets its own independent shell session (its own current directory, tab, and DOM cursor), so multiple human consoles and concurrent agents each work in an isolated lane instead of sharing one global cursor (#33)
  • Agent-declared sessions — an optional group_id on domshell_execute lets an agent address a specific lane: omit it for the current lane, "new" for a fresh one, or pass an id to join an existing lane — so two agent chats sharing one MCP connection stay isolated, and one agent can hand a session off to another (#34)
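
The visible text cache entry in the list above describes a common pattern: compute a value on first request, key it by node id, and drop everything when the tree is rebuilt. Here is a minimal sketch of that pattern. The class name and the fetchText callback are hypothetical; fetchText stands in for the CDP round-trip and is synchronous here for brevity, whereas a real CDP call would be asynchronous.

```typescript
// Lazy cache for visible text, keyed by backendDOMNodeId.
class VisibleTextCache {
  private readonly entries = new Map<number, string>();
  private readonly fetchText: (backendDOMNodeId: number) => string;

  // fetchText is a hypothetical stand-in for the CDP call that reads innerText.
  constructor(fetchText: (backendDOMNodeId: number) => string) {
    this.fetchText = fetchText;
  }

  // Returns the cached text, fetching it only on the first request for a node.
  get(backendDOMNodeId: number): string {
    const cached = this.entries.get(backendDOMNodeId);
    if (cached !== undefined) return cached;
    const text = this.fetchText(backendDOMNodeId);
    this.entries.set(backendDOMNodeId, text);
    return text;
  }

  // Mirrors "cleared on tree rebuild": call this whenever the AX tree is rebuilt.
  clear(): void {
    this.entries.clear();
  }
}
```

Repeated --content matches against the same node then cost one fetch instead of one per match, until the next clear().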

Platform

  • Standalone headless browser — ship DOMShell as a self-contained headless Chromium process (via Chrome for Testing or embedded Chromium) that agents launch directly — no extension install, no user Chrome profile; just npx @apireno/domshell --headless and connect via MCP. Ideal for CI pipelines, server-side automation, and agent-in-a-loop workflows where a visible browser isn't needed
  • Firefox extension — port to Firefox using WebExtensions API + remote debugging protocol
  • Playwright/Puppeteer backend — alternative to Chrome extension for headless agent workflows
  • REST API mode — expose DOMShell commands over HTTP for non-MCP integrations
  • WASM build — compile DOMShell to WebAssembly so it can be embedded directly on a website for interactive demos without requiring a Chrome extension install

Experiments

  • Nexa: DOMShell vs Raw HTML — same model (Qwen3-4B), same tasks: compare DOMShell's text/AX-tree interface against raw HTML scraping. Tests on both nexa serve and Ollama backends. Found a crossover interaction: Ollama+DOMShell and Nexa+HTML are equally best (1.20 avg). See experiments/nexa_ollama/.
  • Nexa vs Claude (model size) — compared Qwen3-1.7B/4B on progressive tasks. Capability cliff at T3 (paragraph extraction). 4B shows better error recovery. See experiments/nexa_claude/.
  • Model shootout — compared Qwen3-4B, Hermes3-3B, Granite4-Tiny, Llama3.2-3B on Ollama+DOMShell. Qwen3-4B remains best (8/15), only model to break the T3 cliff. Llama3.2-3B close second (7/15, zero hallucinations). See experiments/model_shootout/.
  • Token cost benchmark — measure total input/output tokens per task across DOMShell vs screenshot-based browsing (CiC). Extend existing experiments/claude_domshell_vs_cic/ with token counting. Hypothesis: structured text (2-3KB per response) vs base64 screenshots (500KB+) should show >2x token savings on top of the 2x call-count reduction already measured

Integrations

Nexa AI (Local LLM)

Run DOMShell with local models via nexa-sdk — fully on-device browser automation with no cloud API needed. Uses the same MCP protocol as Claude Desktop but powered by local inference (Granite-4-Micro, Qwen3, etc.).

bash
python integrations/nexa/agent.py --task "Open wikipedia.org/wiki/AI and extract the first paragraph" --verbose

See integrations/nexa/ for setup and usage.

How This Project Was Built

The technical specification for DOMShell was authored by Google Gemini, designed as a comprehensive prompt that could be handed directly to a coding agent to scaffold and build the entire project from scratch. The full original specification is preserved in intitial_project_prompt.md.

The implementation was then built by Claude (Anthropic) via Claude Code, working from that specification.

An AI-designed project, built by another AI, intended for AI agents to use. It's agents all the way down.

License

MIT
