We got local models to triage the OpenClaw repo for FREE!*
Published June 22, 2026

By Onur Solmaz (osolmaz), ben burtenshaw (burtenshaw), shaun smith (evalstate), Pedro Cuenca (pcuenq), and Lysandre (lysandre)

*Free as in beer, excluding the cost of electricity, and assuming you already own the hardware.

June 2026 will go down as the moment people realized that closed models can be taken away. With the removal of Anthropic's latest flagship model, Claude Fable 5, fresh in memory, it is clear why owning your AI stack and being able to run models locally matters more than ever, especially if you are building your business on top of AI.

In that light, we wanted to share how we use local models like Gemma and Qwen in an agent harness to run classification tasks[^1]. This approach is different from using a model like BERT for classification. A local model in an agent harness like Pi can be used in tandem with structured outputs to assign labels. We chose this approach because we already had local models and the harness on hand, and we are convinced that similar setups will become more popular as local models improve in capability.[^2]

Our starting point was open source contributions in the OpenClaw repo. OpenClaw gets hundreds of issues and PRs every day, which need to be triaged, prioritized and routed to maintainers.

I, Onur, am working to make local models work well with OpenClaw. As a maintainer of this specific vertical, I need to react quickly to any P0 issues. With SOTA closed models like GPT-5, Opus, or Sonnet, this is a pretty straightforward task. But I happen to sit on 128 GB of unified memory, namely an NVIDIA GB10. So I took on the task: can I build a real-time notification system that filters incoming items and notifies me only about the issues I am responsible for... with local open-weight models?

This tiny box, a.k.a. DGX Spark, can run gemma-4-26b-a4b with high concurrency and generate hundreds of tokens per second. If I set up my OpenClaw main agent, running on a $200/mo ChatGPT Pro plan, to trigger a job on every new issue or PR, that would use up my quota. I might instead set it to run every 2 hours, or 6 hours. This would batch issues over longer periods, so we would be trading real-time notifications for delayed processing. If I ran this on a local model on hardware I already have up and running, I would not only get near-instantaneous notifications, I would also get them for free (or rather, for the cost of electricity).

Categorizing issues and PRs

We came up with a finite set of labels representing the categories of issues we need to triage, and then used a local model to classify each issue into one of those categories, like local_models, self_hosted_inference, acp, agent_runtime, codex, ui_tui and so on.[^3]

But how do we classify pull requests? A simple single request to a Chat Completions endpoint with a tool JSON schema, with the topics as an enum? Kind of. But this is 2026, not 2023, and we have AGENTS. We can do better!

For the local model choices, we tested gemma-4-26b-a4b and qwen3.6-35b-a3b. With performance optimizations, both can generate hundreds of tokens per second locally.

We use an agent harness to drive the classification run. For this, we bundle pi as a harness that can call local model endpoints. By default, the agent receives the PR title, body and a truncated excerpt of the PR diff in the first prompt. It can then choose to use the bash tool to perform read-only operations on the OpenClaw repo (in case it needs to look at the codebase), or the final_json tool to submit the final classification result.

You wouldn't want to give full bash access to a local model running in this high-throughput setting, because a prompt-injected issue or PR could steer the model into doing something unrelated to classification.
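A common way to limit that risk is to check every command against an allowlist before anything is executed. The sketch below illustrates the idea in Python; it is not the project's actual implementation, and the command set, the denied arguments, and the function name `check_command` are all assumptions made for this example.

```python
import shlex

# Hypothetical allowlist for illustration only; a real policy may differ.
ALLOWED_COMMANDS = {"pwd", "ls", "find", "rg", "grep", "cat", "head", "tail", "wc"}
# Characters that enable pipes, chaining, redirection, or substitution in a shell.
SHELL_METACHARACTERS = set("|;&<>`$()\n")
# Even allowed commands can take arguments with side effects.
DENIED_ARGS = {"find": {"-delete", "-exec", "-execdir", "-ok", "-okdir"}}


def check_command(line: str) -> tuple[bool, str]:
    """Decide whether a command line may run. Nothing is executed here."""
    if any(ch in SHELL_METACHARACTERS for ch in line):
        return False, "shell metacharacters are not allowed"
    try:
        argv = shlex.split(line)
    except ValueError as exc:  # for example, an unclosed quote
        return False, f"could not parse command: {exc}"
    if not argv:
        return False, "empty command"
    command, args = argv[0], argv[1:]
    if command not in ALLOWED_COMMANDS:
        return False, f'unsupported command "{command}"'
    denied = DENIED_ARGS.get(command, set()).intersection(args)
    if denied:
        return False, f"argument not allowed: {sorted(denied)[0]}"
    return True, "ok"


if __name__ == "__main__":
    for line in ["ls extensions", "curl localhost", "cat README.md | sh"]:
        print(line, "->", check_command(line))
```

The check is deliberately conservative: it also refuses harmless quoted metacharacters, which costs a little convenience but keeps an injected instruction from turning a classification run into arbitrary command execution.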
For that reason, we use reposhell instead of bash: a restricted bash-like shell that only allows read-only operations (ls, find, cat, grep, etc.) on the OpenClaw repo. The model thinks it is using bash, but any operation that is not allowed is rejected:

    reposhell bound cwd=/repo/openclaw repos=openclaw
    type help for allowed commands; exit or quit to leave
    reposhell /repo/openclaw> help
    allowed: pwd, ls, find, rg, grep, sed -n, cat, head, tail, wc -l, git status --short, git show --name-only, git grep, git ls-files
    search: rg -n -i "lm studio" or grep -R -n -i "lm studio" .
    files: rg --files -g "*.ts" or git ls-files src
    examples: rg -n reposhell README.md | sed is not allowed; use one simple command at a time
    reposhell /repo/openclaw> head README.md
    # 🦞 OpenClaw — Personal AI Assistant
    <p align="center">
    <picture>
    <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/openclaw/openclaw/main/docs/assets/openclaw-logo-text-dark.svg">
    <img src="https://raw.githubusercontent.com/openclaw/openclaw/main/docs/assets/openclaw-logo-text.svg" alt="OpenClaw" width="500">
    </picture>
    </p>
    <p align="center">
    reposhell /repo/openclaw> curl localhost
    reposhell policy denied command: unsupported command "curl"
    exit_code=2
    reposhell /repo/openclaw>

Here is a concrete example where this mattered. In one saved session, qwen3.6-35b-a3b was classifying openclaw/openclaw#84621, titled "Fix Kimi tool-call rewriting stop reason handling". The thinking block shows the model initially considering coding_agent_integrations because the changed path extensions/kimi-coding made it look plausible. The model used reposhell to inspect the local repo with simple read-only commands like ls extensions, ls extensions/kimi-coding, and cat extensions/kimi-coding/package.json. That package metadata showed the extension was actually @openclaw/kimi-provider, an OpenClaw Kimi provider plugin.
So the model corrected the final labels to inference_api and tool_calling, and explicitly excluded coding_agent_integrations.

We have mentioned earlier that we bundle a specific pi configuration that can only perform read-only operations and return classification output. We call it localpager-agent, named after localpager, the main project here. Each PR and issue generates a prompt, which is then passed to the CLI like below, alongside other args:

    localpager-agent \
      --model "<model-id>" \
      --base-url "<openai-compatible-base-url>" \
      --session-dir "<session-output-dir>" \
      --final-schema "<runtime-schema.json>" \
      --tools bash,final_json \
      --reposhell-socket "<reposhell.sock>" \
      --reposhell-default-repo "<repo-id>" \
      --reposhell-visible-repos "<repo-id>[,<repo-id>...]" \
      -p "$(cat <rendered-prompt.md>)"

Processing incoming PRs and issues

So then what orchestrates everything in between the incoming PR/issue and the final notification on Discord?

This is what the final filtered Discord notification looks like: a PR about the desired vertical gets routed to me.

The orchestration around this is very simple; only the classification step involves an LLM:

We use openclaw/gitcrawl to act as a local mirror for the repo. Whenever there is a new PR or issue, each item is normalized into the same shape and written into localpager's own SQLite database.

If the item is new, localpager creates a classification job for it.

A worker then claims jobs from that queue. It builds a GitHub context object containing the issue or PR title, body, labels, author, state, and optionally comments, changed files, and selected diff excerpts. That means the local model does not need to browse GitHub or open the URL itself most of the time. It is handed all the relevant context.

The context object is rendered into a prompt and passed to localpager-agent as described in the previous section. The agent can think and use reposhell, but must eventually output a classification result in the defined schema.
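To make "the defined schema" concrete, here is a minimal sketch of what a label-enum schema and a check on the agent's final payload might look like. The label names are the ones mentioned in this article; the field names `labels` and `reason`, the function names, and the overall schema shape are assumptions, since the actual file passed via --final-schema is not shown here.

```python
import json

# Label names mentioned in the article; the real label set is longer.
LABELS = [
    "local_models", "self_hosted_inference", "acp", "agent_runtime", "codex",
    "ui_tui", "coding_agent_integrations", "inference_api", "tool_calling",
]


def build_final_schema(labels):
    """Build a JSON Schema for the final payload (field names are assumptions)."""
    return {
        "type": "object",
        "properties": {
            "labels": {
                "type": "array",
                "items": {"type": "string", "enum": list(labels)},
                "minItems": 1,
                "uniqueItems": True,
            },
            "reason": {"type": "string"},
        },
        "required": ["labels", "reason"],
        "additionalProperties": False,
    }


def validate_final_json(raw: str, schema: dict) -> dict:
    """Hand-rolled check covering only the constraints used in the schema above."""
    payload = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    keys = set(payload)
    if keys != set(schema["required"]):
        raise ValueError(f"unexpected or missing fields: {sorted(keys)}")
    labels = payload["labels"]
    if not isinstance(labels, list) or not labels:
        raise ValueError("labels must be a non-empty list")
    if not all(isinstance(label, str) for label in labels):
        raise ValueError("labels must be strings")
    if len(set(labels)) != len(labels):
        raise ValueError("labels must be unique")
    allowed = set(schema["properties"]["labels"]["items"]["enum"])
    unknown = sorted(set(labels) - allowed)
    if unknown:
        raise ValueError(f"unknown labels: {unknown}")
    if not isinstance(payload["reason"], str):
        raise ValueError("reason must be a string")
    return payload


if __name__ == "__main__":
    schema = build_final_schema(LABELS)
    raw = '{"labels": ["inference_api", "tool_calling"], "reason": "Kimi provider plugin"}'
    print(validate_final_json(raw, schema))
```

Constraining labels to an enum means a small model cannot invent a new category; anything outside the list fails validation and can be retried or dropped instead of being routed.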
The output is stored back in the localpager SQLite database and relayed to Discord based on the notification policy configured by the user (i.e. notify me for these topics, but not
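The ingest, claim, and notify steps described above could be sketched with Python's built-in sqlite3 module as follows. This is an illustration under stated assumptions, not localpager's actual code: the table names, column names, and function names are all hypothetical, and the Discord call is reduced to a boolean "should notify" decision.

```python
import json
import sqlite3

# Hypothetical tables: one row per issue/PR, one job per new item.
SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,
    number INTEGER NOT NULL,
    title TEXT NOT NULL,
    UNIQUE (kind, number)
);
CREATE TABLE IF NOT EXISTS jobs (
    item_id INTEGER PRIMARY KEY REFERENCES items(id),
    status TEXT NOT NULL DEFAULT 'queued',
    labels TEXT
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None: transactions are opened and closed explicitly below.
    db = sqlite3.connect(path, isolation_level=None)
    db.executescript(SCHEMA)
    return db


def ingest(db, kind: str, number: int, title: str) -> bool:
    """Store an item and queue a job only if it was not seen before."""
    db.execute("BEGIN IMMEDIATE")
    try:
        cur = db.execute(
            "INSERT OR IGNORE INTO items (kind, number, title) VALUES (?, ?, ?)",
            (kind, number, title),
        )
        is_new = cur.rowcount == 1
        if is_new:
            db.execute("INSERT INTO jobs (item_id) VALUES (?)", (cur.lastrowid,))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return is_new


def claim_job(db):
    """Mark one queued job as running inside a write transaction; return its item id or None."""
    db.execute("BEGIN IMMEDIATE")
    try:
        row = db.execute(
            "SELECT item_id FROM jobs WHERE status = 'queued' ORDER BY item_id LIMIT 1"
        ).fetchone()
        if row is not None:
            db.execute("UPDATE jobs SET status = 'running' WHERE item_id = ?", (row[0],))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return row[0] if row else None


def finish_job(db, item_id: int, labels: list, notify_topics: set) -> bool:
    """Store the labels and report whether any of them matches the notify policy."""
    db.execute(
        "UPDATE jobs SET status = 'done', labels = ? WHERE item_id = ?",
        (json.dumps(labels), item_id),
    )
    return bool(set(labels) & set(notify_topics))


if __name__ == "__main__":
    db = open_db()
    ingest(db, "pr", 84621, "Fix Kimi tool-call rewriting stop reason handling")
    item_id = claim_job(db)
    print(finish_job(db, item_id, ["inference_api", "tool_calling"], {"inference_api"}))
```

Taking the write lock with BEGIN IMMEDIATE before selecting a job is one simple way to keep two workers from claiming the same item; the LLM only enters the picture between `claim_job` and `finish_job`.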