MarkTechPost AI · AI Agent

A Coding Implementation on Microsoft SkillOpt: Instrumented Prompt Optimization, Skill Evolution Analysis, and Baseline Comparison

June 10, 2026, 22:07

Key Takeaways

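In this tutorial, we implement an instrumented workflow for Microsoft SkillOpt. We first set up the SkillOpt repository, connect it to OpenAI-compatible model access, configure the optimizer and target models, and run the SearchQA optimization pipeline under a controlled sample limit to contain costs. We evaluate the original seed skill as a baseline, then run the real optimization loop, in which SkillOpt incrementally improves the skill through rollout, reflection, aggregation, selection, update, and validation-based gating. Along the way, we inspect the training history, visualize accuracy changes, review edit-budget behavior, monitor cumulative token usage, and compare the evolved skill against the original baseline.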

On-site AI-compiled draft

In this tutorial, we implement an instrumented workflow for Microsoft SkillOpt. We set up the SkillOpt repository, connect it to OpenAI-compatible model access, configure the optimizer and target models, and run the SearchQA optimization pipeline with a controlled sample limit to keep costs manageable. We first evaluate the original seed skill as a baseline, then run a real optimization loop in which SkillOpt improves the skill through rollout, reflection, aggregation, selection, updating, and validation-based gating. Along the way, we inspect the training history, visualize changes in accuracy, review edit-budget behavior, monitor cumulative token usage, and compare the evolved skill with the original baseline.

SkillOpt Environment Setup

    import os, re, json, glob, subprocess, pathlib, difflib

    # Read the key from Colab Secrets when available, otherwise from the environment.
    try:
        from google.colab import userdata
        OPENAI_KEY = userdata.get("OPENAI_API_KEY")
    except Exception:
        OPENAI_KEY = os.environ.get("OPENAI_API_KEY", "")
    OPENAI_KEY = OPENAI_KEY or "sk-PASTE-YOUR-KEY-HERE"
    # Reject the placeholder too; it also starts with "sk-".
    assert OPENAI_KEY.startswith("sk-") and "PASTE" not in OPENAI_KEY, \
        "Set a real OpenAI key (Colab Secrets -> OPENAI_API_KEY)."

    OPTIMIZER_MODEL = "gpt-4o"
    TARGET_MODEL = "gpt-4o-mini"
    RUN = "outputs/searchqa_adv"
    LIMIT = 24
    RUN_KNOBS = dict(num_epochs=2, batch_size=8, minibatch=4, merge_batch=4,
                     workers=2, lr=4, lr_sched="cosine", limit=LIMIT)

    # Clone and install SkillOpt only once per Colab session.
    if not pathlib.Path("/content/SkillOpt/scripts/train.py").exists():
        subprocess.run("git clone --depth 1 https://github.com/microsoft/SkillOpt.git",
                       shell=True, cwd="/content")
        subprocess.run('pip -q install -e . && pip -q install "openai>=1.0" pandas matplotlib',
                       shell=True, cwd="/content/SkillOpt")
    os.chdir("/content/SkillOpt")

    # Point the Azure-style backend at the OpenAI-compatible endpoint.
    os.environ["AZURE_OPENAI_ENDPOINT"] = "https://api.openai.com/v1"
    os.environ["AZURE_OPENAI_API_KEY"] = OPENAI_KEY
    os.environ["AZURE_OPENAI_AUTH_MODE"] = "openai_compatible"

    SPLIT = "data/searchqa_id_split"
    CFG = "configs/searchqa/default.yaml"
    COMMON = ["--azure_openai_endpoint", "https://api.openai.com/v1",
              "--cfg-options", "model.backend=azure_openai",
              "model.azure_openai_auth_mode=openai_compatible"]

We prepare the full Colab environment for running SkillOpt. We load the OpenAI API key, define the optimizer and target models, clone the SkillOpt repository, and install the required dependencies. We also configure the OpenAI-compatible backend so the SkillOpt scripts can communicate with the selected models.

Baseline Skill Evaluation

    def run_cli(args, tag):
        # Run a command, stream its output live, and return the full text.
        print("\n" + "#"*80 + f"\n# {tag}\n# $ " + " ".join(args) + "\n" + "#"*80)
        p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
        buf = []
        for line in p.stdout:
            print(line, end="")
            buf.append(line)
        p.wait()
        return "".join(buf)

    def parse_acc(txt):
        # Prefer the "Results: hard=... soft=..." line; fall back to the last "hard=" value.
        m = re.search(r"Results:\s*hard=([\d.]+)\s+soft=([\d.]+)", txt)
        if m:
            return {"hard": float(m.group(1)), "soft": float(m.group(2))}
        g = re.findall(r"hard=([\d.]+)", txt)
        return {"hard": float(g[-1]), "soft": None} if g else None

    seed = "skillopt/envs/searchqa/skills/initial.md"
    if not pathlib.Path(seed).exists():
        # Fall back to a minimal one-line skill if the environment seed is missing.
        seed = "baseline_skill.md"
        pathlib.Path(seed).write_text("You answer questions from the given context.\n")

    base_out = run_cli(["python", "scripts/eval_only.py", "--config", CFG,
                        "--skill", seed, "--split", "valid_unseen", "--split_dir", SPLIT,
                        "--target_model", TARGET_MODEL, *COMMON,
                        "env.workers=1", f"env.limit={LIMIT}"],
                       "BASELINE EVAL (env seed skill, no training)")
    base = parse_acc(base_out)

We define helper functions to run SkillOpt commands and extract evaluation accuracy from the output. We then locate the initial seed skill used by the SearchQA environment and evaluate it on the unseen validation split. This gives us a baseline result before any optimization or training takes place.

Training And Visualization

    k = RUN_KNOBS
    train_out = run_cli(["python", "scripts/train.py", "--config", CFG, "--split_dir", SPLIT,
                         "--optimizer_model", OPTIMIZER_MODEL, "--target_model", TARGET_MODEL,
                         "--out_root", RUN, *COMMON,
                         "train.train_size=0",
                         f"train.num_epochs={k['num_epochs']}",
                         f"train.batch_size={k['batch_size']}",
                         f"gradient.minibatch_size={k['minibatch']}",
                         f"gradient.merge_batch_size={k['merge_batch']}",
                         f"gradient.analyst_workers={k['workers']}",
                         f"optimizer.learning_rate={k['lr']}",
                         f"optimizer.lr_scheduler={k['lr_sched']}",
                         "optimizer.use_slow_update=true",
                         "optimizer.use_meta_skill=true",
                         f"env.workers={k['workers']}",
                         f"env.limit={k['limit']}"],
                        "TRAIN (rollout->reflect->aggregate->select->update->gate; slow-update + meta-skill)")

    import pandas as pd, matplotlib.pyplot as plt

    hist = json.loads(pathlib.Path(f"{RUN}/history.json").read_text())
    df = pd.json_normalize(hist)
    print("\nhistory.json columns:", list(df.columns))

    def col(*cands):
        # Return the first column whose lowercased name contains one of the candidates.
        for c in cands:
            for actual in df.columns:
                if c in actual.lower():
                    return actual
        return None

    c_step = col("step")
    x = df[c_step] if c_step else range(len(df))
    c_tr, c_va = col("train_acc", "train_hard", "train"), col("val_acc", "val_hard", "valid", "val")
    c_lr, c_tok = col("edit_budget", "lr", "learning_rate", "budget"), col("token", "cost")

    fig, ax = plt.subplots(1, 3, figsize=(16, 4))
    if c_tr: ax[0].plot(x, df[c_tr], "o-", label="train acc")
    if c_va: ax[0].plot(x, df[c_va], "s-", label="val acc (gate)")
    if base and base["hard"] is not None:
        ax[0].axhline(base["hard"], ls="--", c="grey", label="baseline (seed)")
    ax[0].set_title("Skill accuracy over steps"); ax[0].set_xlabel("step"); ax[0].legend(); ax[0].grid(alpha=.3)
    if c_lr: ax[1].plot(x, df[c_lr], "d-", c="purple")
    ax[1].set_title("Edit-budget / LR schedule (cosine)"); ax[1].set_xlabel("step"); ax[1].grid(alpha=.3)
    if c_tok: ax[2].plot(x, pd.to_numeric(df[c_tok], errors="coerce").cumsum(), c="darkorange")
    ax[2].set_title("Cumulative token usage"); ax[2].set_xlabel("step"); ax[2].grid(alpha=.3)
    plt.tight_layout(); plt.savefig(f"{RUN}/training_dashboard.png", dpi=120); plt.show()

We run the main SkillOpt training loop with the selected optimizer and target models. We configure the key training settings: epochs, batch size, minibatch size, learning rate, slow update, meta-skill, and data limit. We then read the training history and plot accuracy, edit-budget behavior, and cumulative token usage on a three-panel dashboard.

Inspecting Skill Evolution

    snaps = sorted(glob.glob(f"{RUN}/skills/skill_v*.md"))
    best = pathlib.Path(f"{RUN}/best_skill.md").read_text()
    print("\n" + "="*80 + f"\nSKILL EVOLUTION: {len(snaps)} snapshots; diff v0 -> best_skill\n" + "="*80)
    if snaps:
        # Unified diff between the first snapshot and the final best skill.
        diff = difflib.unified_diff(pathlib.Path(snaps[0]).read_text().splitlines(),
                                    best.splitlines(),
                                    snaps[0].split('/')[-1], "best_skill.md", lineterm="")
        print("\n".join(list(diff)[:120]) or "(no textual diff captured)")

    # Everything from the first SLOW_UPDATE marker to the end of the skill file.
    prot = re.search(r"(SLOW_UPDATE.*?)$", best, re.S)
    print("\n--- protected SLOW_UPDATE block ---\n",
          prot.group(1)[:1500] if prot else "(none: appears after an epoch boundary)")

    patch = (sorted(glob.glob(f"{RUN}/steps/step_*/patches/*.json")) or [None])[0]
    analy = (sorted(glob.glob(f"{RUN}/steps/step_*/analysis/*")) or [None])[0]
    print("\n" + "="*80 + "\nTEXTUAL GRADIENT: one aggregated patch (clipped to edit budget):\n" + "="*80)
    print(pathlib.Path(patch).read_text()[:1500] if patch else "(no patch files)")
    print("\n--- one raw Reflect-stage analysis ---\n",
          pathlib.Path(analy).read_text()[:1000] if analy else "(no analysis files)")

    for name in ("slow_update", "meta_skill"):
        files = sorted(glob.glob(f"{RUN}/{name}/epoch_*/*"))
        print(f"\n[{name}] {len(files)} artifact(s):", [pathlib.Path(f).name for f in files[:6]])

We inspect how the skill evolves during the optimization process. We compare the first saved skill snapshot with the final best skill, check whether a protected slow-update block appears, and review one generated patch and one reflection analysis. We also list the slow-update and meta-skill artifacts created during epoch-level training.

Final Evaluation Comparison
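The code for this last step is cut off in this copy, so what follows is not the tutorial's own comparison. As a rough sketch only, assuming two accuracy dictionaries shaped like the output of the parse_acc helper defined earlier ({"hard": float, "soft": float or None}), a baseline-versus-evolved comparison could be tabulated as below. The compare_runs helper and both log lines are hypothetical illustrations; they are not part of SkillOpt, and real SkillOpt output may be formatted differently.

```python
import re

def parse_acc(txt):
    # Same regex logic as the tutorial's parse_acc helper.
    m = re.search(r"Results:\s*hard=([\d.]+)\s+soft=([\d.]+)", txt)
    if m:
        return {"hard": float(m.group(1)), "soft": float(m.group(2))}
    g = re.findall(r"hard=([\d.]+)", txt)
    return {"hard": float(g[-1]), "soft": None} if g else None

def compare_runs(base, final):
    # Hypothetical helper: per-metric delta (evolved minus baseline).
    # The delta is None whenever either side is missing.
    rows = {}
    for metric in ("hard", "soft"):
        b = base.get(metric) if base else None
        f = final.get(metric) if final else None
        delta = round(f - b, 4) if b is not None and f is not None else None
        rows[metric] = {"baseline": b, "evolved": f, "delta": delta}
    return rows

# Made-up log lines for illustration only; these are not measured results.
base = parse_acc("Results: hard=0.50 soft=0.62")
final = parse_acc("step 3 hard=0.75")  # fallback path: no soft score in the line
table = compare_runs(base, final)
for metric, row in table.items():
    print(f"{metric:>4}  baseline={row['baseline']}  evolved={row['evolved']}  delta={row['delta']}")
```

Returning None for a missing metric, rather than 0, keeps an unparsed score from being mistaken for "no change" when the two evaluations log different fields.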

Original source: MarkTechPost AI ↗


TechWeb · AI Agent

NetEase Youdao Pivots Fully to AI, Unveils Full-Scenario Agent Lineup at the Book Fair (圖博會)


Just now · Read analysis

Hugging Face Blog · AI Agent

MosaicLeaks: Can your research agent keep a secret?

Enterprise Article · Published June 18, 2026 · Alexander Gurung (agurung, ServiceNow), Rafael Pardinas (rafapi-snow, ServiceNow)

TL;DR: Deep research agents increasingly combine private local documents with external tools like web retrieval, creating a privacy risk: an agent's external queries may leak sensitive information. MosaicLeaks proposes a new deep-research task with multi-hop questions that interleave public and private information. Across the models we tested, agents frequently leaked private information, and training only for task performance made it worse. We propose a mosaic-leakage-aware RL training method, Privacy-Aware Deep Research (PA-DR), which raises strict chain success (the share of chains

16 hours ago · Read analysis