MarkTechPost AI | AI Agent

A Coding Implementation on Microsoft SkillOpt: Instrumented Prompt Optimization, Skill Evolution Analysis, and Baseline Comparison

June 10, 2026, 22:07

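Key Takeaways

This tutorial implements an instrumented workflow on Microsoft SkillOpt. It sets up the SkillOpt repository, connects it to OpenAI-compatible model access, configures the optimizer and target models, and runs the SearchQA optimization pipeline under a controlled sample limit to keep costs down. The original seed skill is evaluated first as a baseline; a real optimization loop then improves the skill step by step through rollout, reflection, aggregation, selection, updating, and validation-based gating. Along the way, the tutorial inspects the training history, visualizes accuracy changes, reviews edit-budget behavior, monitors cumulative token usage, and compares the evolved skill with the original baseline.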


In this tutorial, we implement an instrumented workflow for Microsoft SkillOpt. We set up the SkillOpt repository, connect it to OpenAI-compatible model access, configure the optimizer and target models, and run the SearchQA optimization pipeline with a controlled sample limit to keep costs manageable. We first evaluate the original seed skill as a baseline, then run a real optimization loop in which SkillOpt improves the skill through rollout, reflection, aggregation, selection, updating, and validation-based gating. Along the way, we inspect the training history, visualize changes in accuracy, review edit-budget behavior, monitor cumulative token usage, and compare the evolved skill with the original baseline.

SkillOpt Environment Setup

    import os, re, json, glob, subprocess, pathlib, difflib

    # Prefer Colab Secrets; fall back to the environment when not running in Colab.
    try:
        from google.colab import userdata
        OPENAI_KEY = userdata.get("OPENAI_API_KEY")
    except Exception:
        OPENAI_KEY = os.environ.get("OPENAI_API_KEY", "")
    OPENAI_KEY = OPENAI_KEY or "sk-PASTE-YOUR-KEY-HERE"
    # Note: the placeholder above also starts with "sk-", so replace it before running.
    assert OPENAI_KEY.startswith("sk-"), "Set a real OpenAI key (Colab Secrets -> OPENAI_API_KEY)."

    OPTIMIZER_MODEL = "gpt-4o"
    TARGET_MODEL = "gpt-4o-mini"
    RUN = "outputs/searchqa_adv"
    LIMIT = 24  # sample cap that keeps API cost under control
    RUN_KNOBS = dict(num_epochs=2, batch_size=8, minibatch=4, merge_batch=4,
                     workers=2, lr=4, lr_sched="cosine", limit=LIMIT)

    # Clone and install SkillOpt only once per Colab session.
    if not pathlib.Path("/content/SkillOpt/scripts/train.py").exists():
        subprocess.run("git clone --depth 1 https://github.com/microsoft/SkillOpt.git",
                       shell=True, cwd="/content")
        subprocess.run('pip -q install -e . && pip -q install "openai>=1.0" pandas matplotlib',
                       shell=True, cwd="/content/SkillOpt")
    os.chdir("/content/SkillOpt")

    # Point the Azure-style backend at the OpenAI-compatible endpoint.
    os.environ["AZURE_OPENAI_ENDPOINT"] = "https://api.openai.com/v1"
    os.environ["AZURE_OPENAI_API_KEY"] = OPENAI_KEY
    os.environ["AZURE_OPENAI_AUTH_MODE"] = "openai_compatible"

    SPLIT = "data/searchqa_id_split"
    CFG = "configs/searchqa/default.yaml"
    COMMON = ["--azure_openai_endpoint", "https://api.openai.com/v1",
              "--cfg-options", "model.backend=azure_openai",
              "model.azure_openai_auth_mode=openai_compatible"]

We prepare the full Colab environment for running SkillOpt. We load the OpenAI API key, define the optimizer and target models, clone the SkillOpt repository, and install the required dependencies. We also configure the OpenAI-compatible backend so the SkillOpt scripts can communicate with the selected models.

Baseline Skill Evaluation

    def run_cli(args, tag):
        """Run a command, stream its output live, and return the full text."""
        print("\n" + "#"*80 + f"\n# {tag}\n# $ " + " ".join(args) + "\n" + "#"*80)
        p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
        buf = []
        for line in p.stdout:
            print(line, end="")
            buf.append(line)
        p.wait()
        return "".join(buf)

    def parse_acc(txt):
        """Extract hard/soft accuracy; fall back to the last 'hard=' value found."""
        m = re.search(r"Results:\s*hard=([\d.]+)\s+soft=([\d.]+)", txt)
        if m:
            return {"hard": float(m.group(1)), "soft": float(m.group(2))}
        g = re.findall(r"hard=([\d.]+)", txt)
        return {"hard": float(g[-1]), "soft": None} if g else None

    seed = "skillopt/envs/searchqa/skills/initial.md"
    if not pathlib.Path(seed).exists():
        # Fall back to a one-line skill if the environment's seed skill is missing.
        seed = "baseline_skill.md"
        pathlib.Path(seed).write_text("You answer questions from the given context.\n")

    base_out = run_cli(["python", "scripts/eval_only.py", "--config", CFG,
                        "--skill", seed, "--split", "valid_unseen", "--split_dir", SPLIT,
                        "--target_model", TARGET_MODEL, *COMMON,
                        "env.workers=1", f"env.limit={LIMIT}"],
                       "BASELINE EVAL (env seed skill, no training)")
    base = parse_acc(base_out)

We define helper functions to run SkillOpt commands and extract evaluation accuracy from the output. We then locate the initial seed skill used by the SearchQA environment and evaluate it on the unseen validation split. This gives us a baseline result before any optimization or training takes place.

Training And Visualization

    k = RUN_KNOBS
    train_out = run_cli(["python", "scripts/train.py", "--config", CFG, "--split_dir", SPLIT,
                         "--optimizer_model", OPTIMIZER_MODEL, "--target_model", TARGET_MODEL,
                         "--out_root", RUN, *COMMON,
                         "train.train_size=0",
                         f"train.num_epochs={k['num_epochs']}",
                         f"train.batch_size={k['batch_size']}",
                         f"gradient.minibatch_size={k['minibatch']}",
                         f"gradient.merge_batch_size={k['merge_batch']}",
                         f"gradient.analyst_workers={k['workers']}",
                         f"optimizer.learning_rate={k['lr']}",
                         f"optimizer.lr_scheduler={k['lr_sched']}",
                         "optimizer.use_slow_update=true",
                         "optimizer.use_meta_skill=true",
                         f"env.workers={k['workers']}",
                         f"env.limit={k['limit']}"],
                        "TRAIN (rollout->reflect->aggregate->select->update->gate; slow-update + meta-skill)")

    import pandas as pd, matplotlib.pyplot as plt

    hist = json.loads(pathlib.Path(f"{RUN}/history.json").read_text())
    df = pd.json_normalize(hist)
    print("\nhistory.json columns:", list(df.columns))

    def col(*cands):
        """Return the first column whose lowercased name contains any candidate substring."""
        for c in cands:
            for actual in df.columns:
                if c in actual.lower():
                    return actual
        return None

    c_step = col("step")
    x = df[c_step] if c_step else range(len(df))
    c_tr, c_va = col("train_acc", "train_hard", "train"), col("val_acc", "val_hard", "valid", "val")
    c_lr, c_tok = col("edit_budget", "lr", "learning_rate", "budget"), col("token", "cost")

    fig, ax = plt.subplots(1, 3, figsize=(16, 4))
    if c_tr: ax[0].plot(x, df[c_tr], "o-", label="train acc")
    if c_va: ax[0].plot(x, df[c_va], "s-", label="val acc (gate)")
    if base and base["hard"] is not None:
        ax[0].axhline(base["hard"], ls="--", c="grey", label="baseline (seed)")
    ax[0].set_title("Skill accuracy over steps"); ax[0].set_xlabel("step"); ax[0].legend(); ax[0].grid(alpha=.3)
    if c_lr: ax[1].plot(x, df[c_lr], "d-", c="purple")
    ax[1].set_title("Edit-budget / LR schedule (cosine)"); ax[1].set_xlabel("step"); ax[1].grid(alpha=.3)
    if c_tok: ax[2].plot(x, pd.to_numeric(df[c_tok], errors="coerce").cumsum(), c="darkorange")
    ax[2].set_title("Cumulative token usage"); ax[2].set_xlabel("step"); ax[2].grid(alpha=.3)
    plt.tight_layout(); plt.savefig(f"{RUN}/training_dashboard.png", dpi=120); plt.show()

We run the main SkillOpt training loop with the selected optimizer and target models. We configure the key training settings: epochs, batch size, minibatch size, learning rate, slow update, meta-skill, and data limit. We then read the training history and plot accuracy, edit-budget behavior, and cumulative token usage on a three-panel dashboard.

Inspecting Skill Evolution

    snaps = sorted(glob.glob(f"{RUN}/skills/skill_v*.md"))
    best = pathlib.Path(f"{RUN}/best_skill.md").read_text()
    print("\n" + "="*80 + f"\nSKILL EVOLUTION: {len(snaps)} snapshots; diff v0 -> best_skill\n" + "="*80)
    if snaps:
        # Unified diff between the first snapshot and the final best skill.
        diff = difflib.unified_diff(pathlib.Path(snaps[0]).read_text().splitlines(),
                                    best.splitlines(),
                                    snaps[0].split('/')[-1], "best_skill.md", lineterm="")
        print("\n".join(list(diff)[:120]) or "(no textual diff captured)")

    # Everything from the first SLOW_UPDATE marker to the end of the skill file.
    prot = re.search(r"(SLOW_UPDATE.*?)$", best, re.S)
    print("\n--- protected SLOW_UPDATE block ---\n",
          prot.group(1)[:1500] if prot else "(none - appears after an epoch boundary)")

    patch = (sorted(glob.glob(f"{RUN}/steps/step_*/patches/*.json")) or [None])[0]
    analy = (sorted(glob.glob(f"{RUN}/steps/step_*/analysis/*")) or [None])[0]
    print("\n" + "="*80 + "\nTEXTUAL GRADIENT — one aggregated patch (clipped to edit budget):\n" + "="*80)
    print(pathlib.Path(patch).read_text()[:1500] if patch else "(no patch files)")
    print("\n--- one raw Reflect-stage analysis ---\n",
          pathlib.Path(analy).read_text()[:1000] if analy else "(no analysis files)")

    for name in ("slow_update", "meta_skill"):
        files = sorted(glob.glob(f"{RUN}/{name}/epoch_*/*"))
        print(f"\n[{name}] {len(files)} artifact(s):", [pathlib.Path(f).name for f in files[:6]])

We inspect how the skill evolves during the optimization process. We compare the first saved skill snapshot with the final best skill, check whether a protected slow-update block appears, and review one generated patch and one reflection analysis. We also list the slow-update and meta-skill artifacts created during epoch-level training.

Final Evaluation Comparison
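
A minimal sketch of how a baseline-versus-evolved comparison could be computed, assuming the post-training evaluation prints the same `Results: hard=… soft=…` line that `parse_acc` reads in the baseline step. The `compare` helper and both sample log strings are hypothetical and invented for illustration; they are not SkillOpt output and not the tutorial's own final-evaluation code.

```python
import re

def parse_acc(txt):
    """Extract hard/soft accuracy (same regex as the tutorial's helper)."""
    m = re.search(r"Results:\s*hard=([\d.]+)\s+soft=([\d.]+)", txt)
    if m:
        return {"hard": float(m.group(1)), "soft": float(m.group(2))}
    g = re.findall(r"hard=([\d.]+)", txt)
    return {"hard": float(g[-1]), "soft": None} if g else None

def compare(base, evolved):
    """Hypothetical helper: per-metric delta (evolved - base), None if a value is missing."""
    deltas = {}
    for key in ("hard", "soft"):
        b = base.get(key) if base else None
        e = evolved.get(key) if evolved else None
        deltas[key] = None if b is None or e is None else round(e - b, 4)
    return deltas

# Invented log lines for illustration only.
base_out = "Results: hard=0.50 soft=0.62"
evolved_out = "step 3 hard=0.40\nResults: hard=0.75 soft=0.80"

base, evolved = parse_acc(base_out), parse_acc(evolved_out)
deltas = compare(base, evolved)
for key, d in deltas.items():
    print(f"{key}: {base[key]} -> {evolved[key]} (delta {d})")
```

With `LIMIT = 24` evaluation samples, a single question changes hard accuracy by roughly four percentage points, so small deltas from a run like this are better read as indicative than conclusive.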
