...

2026年7月3日 05:55

重點摘要

WebBrain is a free, open-source browser agent for Chrome and Firefox. It reads pages, extracts data, and automates multi-step tasks.

站內 AI 整理稿

WebBrain is a free, open-source browser agent for Chrome and Firefox. It reads pages, extracts data, and automates multi-step tasks. Unlike most browser AI plugins, it can also run entirely on a local model. It is built by Emre Sokullu and licensed under MIT. The full source lives on GitHub. Run the agent against a local model, and no page data leaves your machine. Connect a cloud API when you want more capability. What is WebBrain? WebBrain lives in your browser’s side panel. In Chrome it uses Manifest V3 and the sidePanel API. In Firefox it uses Manifest V2 and sidebar_action. Each tab keeps its own conversation history. The extension operates inside your existing authenticated session. It sees your logged-in accounts exactly as you do. It stores no data externally and adds no telemetry or accounts. The plugin ships in English, Español, Français, Türkçe, and 中文. It auto-detects your browser language on first launch. Ask Mode, Act Mode, and How Actions Actually Fire WebBrain has two modes: Ask mode is read-only and cannot change the page. Act mode can click, type, scroll, navigate, and run workflows. Ask mode reads pages through ordinary content scripts. Act mode is different. It drives the page through the Chrome DevTools Protocol via the chrome.debugger API. That produces trusted input events that modern sites actually honor. It also reaches cross-origin iframes and shadow DOM that content scripts cannot see. That power is scoped deliberately. WebBrain attaches the debugger only when an action needs it, per tab. Chrome surfaces its standard ‘WebBrain started debugging this browser’ banner while attached. Firefox has no CDP equivalent, so its Act mode is meaningfully weaker. Temperatures are fixed for predictability. Act mode uses temperature 0.15. Ask mode uses 0.3. Dedicated vision screenshot descriptions use 0. The Security Model Browser agents run on an adversarial surface. Web pages can hide prompt injections that hijack an agent’s behavior. WebBrain’s design addresses this directly. The agent starts in read-only Ask mode. It asks before consequential actions. You can disable those prompts in the Permissions settings. They are on by default. There is also a UI-first rule for mutations. For anything that creates, sends, submits, or buys, WebBrain uses the visible UI. It refuses to call REST or GraphQL endpoints directly for mutations. A per-conversation /allow-api override exists when the UI genuinely fails. Reading is treated separately. Fetching a README or comparing prices uses background HTTP through the fetch_url and research_url tools. Reading changes nothing remotely, so the strict rules do not apply. Use Cases, With Concrete Examples Data extraction is the obvious one: Open a catalog and ask: ‘Extract all product names and prices from this page.’ The agent reads the structure and returns rows. It also works with PDFs. Research summaries are another: Ask ‘Summarize this article,’ then follow up with a specific question. WebBrain detects paywalls honestly and does not try to bypass them. It also dismisses common cookie-consent banners before reading. Form filling suits repetitive signups: An optional Profile auto-fill stores a short bio in local plaintext. That text is sent to your configured LLM to complete low-stakes forms. Keep important passwords out of it. Automation spans multiple steps: Try ‘Navigate to github.com and find trending repositories.’ In Act mode, the agent chains navigation, reads, and clicks. Keeping Token Costs Down Cloud tokens add up on long sessions. WebBrain bounds the cost in three ways. Screenshots are resized and iteratively JPEG-compressed before they leave your machine. That keeps image tokens small. Conversation history and tool outputs are trimmed oldest-first as the context window fills. You can also pair a cheap text model for planning with a separate vision model for screenshots. How It Compares WebBrain sits between browser AI plugins and full agent frameworks. Here is the plugin comparison, drawn from the project’s own documentation. FeatureWebBrainClaude in ChromeOpen sourceMIT LicenseProprietaryPriceFree foreverRequires Claude Pro ($20/mo)Local LLM supportllama.cpp, OllamaNo — Claude onlyMulti-providerAll OpenAI-compatible endpointsClaude onlyChromeYes (MV3)YesFirefoxYes (MV2)NoSide panel UIYesYesAsk / Act modesYesSimilarFully offlineYes (with local LLM)No — cloud requiredSelf-hostableYesNo Frameworks like OpenClaw or Browser-Use are a different category. Those are developer SDKs for headless pipelines. WebBrain is an end-user extension you drive from a chat panel. You can use both. Running It: Providers and Setup WebBrain supports local and cloud models through one interface. Local options include llama.cpp, Ollama, LM Studio, Jan, vLLM, and SGLang. Cloud options include OpenAI, Anthropic Claude, Gemini, Mistral, DeepSeek, and xAI Grok. It also supports Groq, MiniMax, Alibaba Cloud (Qwen), Nvidia NIM, and OpenRouter. A built-in managed option, WebBrain Cloud, needs no local setup. It costs $5 per month per device profile under a fair-use policy. For local use, llama.cpp needs no API key. Starting a local server takes one command: Copy CodeCopiedUse a different Browser# llama.cpp — load at least a 16k-token context window llama-server -m your-model.gguf -c 16384 --port 8080 # Ollama (OpenAI-compatible) — set the extension-origin env var OLLAMA_ORIGINS="*" ollama serve # then set the base URL to http://localhost:11434/v1 in settings Point WebBrain at the endpoint in settings. For a cross-machine vLLM server, enable CORS with –allowed-origins ‘[“*”]’. The recommended model is Qwen 3.6 35B (Qwen3.6-35B-A3B). It beat Gemma 4 on the project’s screenshot benchmark. An RTX 5090 is ideal; an RTX 4090 works with INT4 AutoRound quantization. Each provider is a class that extends BaseLLMProvider. It normalizes to one response shape: Copy CodeCopiedUse a different Browser{ content: string, toolCalls: Array|null, usage: Object|null } Key Takeaways WebBrain is a free, MIT-licensed AI browser agent for Chrome and Firefox, built by Emre Sokullu. It runs on local models (llama.cpp, Ollama; Qwen 3.6 35B recommended) or any cloud API — no page data leaves your machine when local. Ask mode reads pages read-only; Act mode clicks and types via the Chrome DevTools Protocol for trusted input events. Security-first by design: starts read-only, approves consequential actions, and uses the UI instead of direct API calls for mutations. Free forever self-hosted, or $5/month per device profile for the managed WebBrain Cloud under fair use. Interactive Explainer with Demo Demo-1 </div> <div> <h2>WebBrain &mdash; Interactive Demo</h2> <p class="wb-sub">Pick a task, choose Ask or Act, and watch the agent work.</p> </div> <span class="wb-badge">Simulated &middot; no real LLM calls</span> </div> <div class="wb-controls"> <div class="wb-seg" id="wb-modeseg"> <button data-mode="ask" class="on">Ask mode</button> <button data-mode="act">Act mode</button> </div> <div class="wb-chips" id="wb-chips"></div> </div> <div class="wb-grid"> <div class="wb-pane"> <div class="wb-bar"> <div class="wb-dots"><span class="wb-dot" style="background:#f87171"></span> <span class="wb-dot" style="background:#fbbf24"></span> <span class="wb-dot" style="background:#34d399"></span></div> <div class="wb-url" id="wb-url"></div> </div> <div class="wb-page" id="wb-page"></div> </div> <div class="wb-pane"> <div class="wb-panehead"><img src="https://s.w.org/images/core/em

原始來源：MarkTechPost AI ↗

查看原始來源

MarkTechPost AI模型更新

RAG-Anything Tutorial: Build a Multimodal Retrieval Pipeline for Text, Tables, Equations, and Images in Colab

In this tutorial, we build a RAG-Anything workflow and use it to explore how multimodal retrieval works across text, tables, equations, and images.

10 小時前閱讀分析

36氪模型更新

美團發佈「零英偉達」萬億大模型，「國芯+國模」徹底跑通了？

這篇消息聚焦「美團發佈「零英偉達」萬億大模型，「國芯+國模」徹底跑通了？」。原始導語提到：中國AI企業繞開“英偉達”，是否將成為一種常態？從 AI 情報角度來看，這類內容值得關注其背後的技術進展、產品落地、產業競爭與後續市場影響。

12 小時前閱讀分析

量子位模型更新

全球首個英偉達含量為0的萬億模型，成了海外開發者的搶手貨

這篇消息聚焦「全球首個英偉達含量為0的萬億模型，成了海外開發者的搶手貨」。原始導語提到：霸榜OpenR ou 從 AI 情報角度來看，這類內容值得關注其背後的技術進展、產品落地、產業競爭與後續市場影響。

12 小時前閱讀分析

IT之家模型更新

葡萄牙發佈首個歐洲葡語開源大語言模型 AMALIA

這篇消息聚焦「葡萄牙發佈首個歐洲葡語開源大語言模型 AMALIA」。原始導語提到：AMALIA 模型由來自葡萄牙多家學術機構的 60 餘位研究人員歷時 18 個月開發而成，目前提供具備多模態能力的 9B 版本，後續還將新增 22B 版本。從 AI 情報角度來看，這類內容值得關注其背後的技術進展、產品落地、產業競爭與後續市場影響。

22 小時前閱讀分析

MarkTechPost AI模型更新

Google Health API 有了 CLI：ghealth 是專為 Fitbit 資料設計的開源工具

Google Health API 是 Fitbit Web API 的官方後繼者，它鎖定 Google Health API v4，並讓開發者遷移至 Google OAuth 2.0。現在，一款名為 ghealth 的開源 CLI 命令列工具將該 API 包裝起來，適用於終端機與 AI 代理。該工具是單一的 Go 二進位檔，採用 Apache 2.0 授權。它將 40 種經過驗證的資料類型以結構化 JSON 形式呈現，讓你能將睡眠、心率與步數資料直接導入代理的上下文。什麼是 ghealth？ghealth 是 Google Health API v4 的包裝工具。你可以透過 go build -o ghealth . 從原始碼建置，產出一個自包含的二進位檔。該工具明確以代理為優先，每個指令都會回傳形狀穩定的簡化 JSON。此外，它還提供確定性錯誤碼、--dry-run 旗標與 --raw 旗標。

22 小時前閱讀分析

AIBase模型更新

支付寶“阿寶”公測開啟：告別菜單跳轉，進入“對話式”辦事新時代

支付寶旗下AI助手“螞蟻阿寶”開啟公測，用戶可通過搜索“阿寶”或右滑進入對話界面體驗。作為支付寶從傳統陳列式交互向對話式服務升級的核心，阿寶以極簡對話框提供直觀高效的智能服務。

1 天前7400閱讀分析

相關文章