MarkTechPost AI | AI Agent

A New Study from Harvard and Perplexity Finds AI Agents Perform 26 Minutes of Autonomous Work per Session vs 33 Seconds for Search

June 9, 2026 05:53

Key Points

AI-compiled summary from this site

A new study from Perplexity and Harvard offers field evidence on what AI agents do to knowledge work. It draws on production data from two Perplexity products: Search and Computer. The setup is a natural comparison. Search is a conversational answer engine. Computer is an agent that plans and executes tasks end to end. The same users touch both products, so the team can hold the task roughly constant.

What the Study Actually Measures

The study covers a 90-day window, February 27 through May 27, 2026. Computer launched two days before that window opened. The core method matches near-identical query pairs across the two products. The research team found 10,000 session pairs with cosine similarity above 0.99. Each pair is effectively the same task attempted both ways.

Computer pairs are gated to sessions that invoke an execution tool. These ‘do’ tools include code execution, browser actions, file writes, and connector calls. That gate ensures every Computer session does real autonomous work.

Adoption rose over the window. Cumulative Computer queries reached 84× their first-week total. A matched analysis found Computer adoption also raised users’ daily Search queries by 1.05. The positive effect points to complementarity, not substitution.

https://research.perplexity.ai/articles/how-ai-agents-reshape-knowledge-work

The Cost-Structure Framework

The study grounds its data in a simple task-based model. Each task has a step count, and longer tasks carry weakly higher value. Agents change the cost structure. They charge a higher fixed cost per task, for delegation and review. But they charge a lower marginal cost per step, since the system executes. This produces a breakeven step count. Below it, the conversational mode is cheaper. Above it, the agent mode wins. Short lookups stay manual; long workflows move to the agent.

Autonomy: 26 Minutes vs 33 Seconds

The first autonomy measure is execution time. Computer runs 26 minutes of machine work per session. Search runs 33 seconds. That is a 48× gap. Medians show the same pattern: 9 minutes versus 14 seconds. The gap varies by domain. Local tasks show 75×; Science shows 26×, since plain answers often suffice.

Higher autonomy did not lower quality here. The research team scored next-turn dissatisfaction from what users do next. Computer’s meaningful dissatisfaction rate was 1.3%, against 2.9% for Search (a 55% reduction). Follow-up turns also shift toward review and extension on Computer, though the changes are small.

Connector usage rose more clearly. Computer invoked at least one connector in 7.9% of sessions, versus 1.8% for Search. Computer chains external tools that Search users would otherwise run by hand.

Efficiency: Where the Savings Come From

The efficiency section estimates a Search + Human counterfactual. A human with Search alone takes 269 minutes per matched task. Computer + Human takes 36 minutes. That is 87% less time and 94% less cost overall. Cost savings exceed time savings because domain wages amplify the effect. Computer’s model cost runs $4–10 per task; Search runs about $0.05.

The marginal numbers support the framework. Computer + Human costs $0.16 per step, versus $2.05 for Search + Human. Matched Computer sessions also ran longer prompts, 652 versus 448 characters at the median. That supports the higher fixed-cost assumption for agents. Breakeven analysis says a professional must finish all manual steps in under 20 minutes to match Computer.

The research team cross-checked with an independent LLM estimate and user interviews. The LLM method found 84% time and 93% cost savings. Interviewees reported speedups from 5× to 300×.

Horizontal and Vertical Expansion

Scope is where this study extends past prior work. Autonomy does not just speed up tasks. It changes which tasks users attempt.

Horizontally, Computer queries cross occupational lines more often. Cross-occupation share averaged 59% on Computer, versus 50% on Search. Management and Entrepreneurship showed the largest gap, at 19 points.

Vertically, Computer queries are more demanding. On Bloom’s Revised Taxonomy, 76% required higher-order cognition, versus 55% for Search. Create-level work was 50% of Computer queries, against 26% for Search.

Computer tasks also span more knowledge domains. Each Computer query touched 2.40 O*NET Knowledge domains on average, versus 1.74 for Search. Computer queries were nearly three times as likely to need three or more domains. Composability climbs as the O*NET hierarchy gets finer. At the Task Statement level, Computer engaged 60% more activities. About 23% of Computer queries hit a Task Statement that the same users never sent to Search.

Comparison Table: Search vs Computer

Dimension | Perplexity Search | Perplexity Computer
--- | --- | ---
Mode in the framework | Conversational answer engine | Agent orchestrator
Machine time per session | 33 seconds (median 14s) | 26 minutes (median 9m)
Queries per session | 2.8 | 5.3
Meaningful (mid+high) dissatisfaction | 2.9% | 1.3%
Sessions with a connector call | 1.8% | 7.9%
Counterfactual task time | 269 min (Search + Human) | 36 min (Computer + Human)
Cost per step | $2.05 | $0.16
Model cost per task | ~$0.05 | $4–10
Cross-occupation query share | 50% | 59%
Higher-order Bloom cognition | 55% | 76%
O*NET Knowledge domains per query | 1.74 | 2.40

Key Takeaways

- Computer runs 26 minutes of autonomous work per session versus 33 seconds for Search, a 48× gap.
- On matched tasks, Computer + Human cuts estimated time 87% and cost 94% versus Search + Human.
- Computer’s meaningful dissatisfaction rate is 1.3% versus 2.9% for Search, a 55% reduction.
- Computer queries cross occupations more (59% vs 50%) and demand more higher-order cognition (76% vs 55%).
- About 23% of Computer queries hit a Task Statement the same users never sent to Search.
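
The cost-structure framework summarized above (higher fixed cost, lower marginal cost, hence a breakeven step count) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual model: it assumes task cost is linear in the number of steps, it treats the per-step figures quoted in the article ($2.05 and $0.16) as the marginal costs, and the fixed costs are hypothetical placeholders because the article does not report them. The names `Mode`, `task_cost`, and `breakeven_steps` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """Cost structure of one way of completing a task."""
    name: str
    fixed_cost: float     # paid once per task (e.g. delegation and review)
    marginal_cost: float  # paid for every step of the task

    def task_cost(self, steps: int) -> float:
        # Assumption: cost grows linearly with the step count.
        return self.fixed_cost + self.marginal_cost * steps


def breakeven_steps(conversational: Mode, agent: Mode) -> float:
    """Step count at which both modes cost the same.

    Solves F_c + m_c * n = F_a + m_a * n for n.
    """
    fixed_gap = agent.fixed_cost - conversational.fixed_cost
    marginal_gap = conversational.marginal_cost - agent.marginal_cost
    if marginal_gap <= 0:
        raise ValueError("agent mode needs the lower marginal cost")
    return fixed_gap / marginal_gap


# Marginal costs: per-step figures quoted in the article.
# Fixed costs: HYPOTHETICAL values chosen for illustration only.
search = Mode("Search + Human", fixed_cost=0.50, marginal_cost=2.05)
computer = Mode("Computer + Human", fixed_cost=8.00, marginal_cost=0.16)

n_star = breakeven_steps(search, computer)
print(f"breakeven at about {n_star:.2f} steps")

for steps in (2, 4, 20):
    cheaper = min((search, computer), key=lambda m: m.task_cost(steps))
    print(f"{steps:>2} steps: {cheaper.name} is cheaper")
```

With these placeholder fixed costs, tasks shorter than the breakeven point come out cheaper in the conversational mode and longer ones in the agent mode, which is the qualitative pattern the framework describes. Different fixed costs would move the breakeven point but not the shape of the comparison.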

Original source: MarkTechPost AI ↗


TechWeb | AI Agent

NetEase Youdao Pivots Fully to AI, Showcases All-Scenario Agent Matrix at Book Expo


Just now

Hugging Face Blog | AI Agent

MosaicLeaks: Can your research agent keep a secret?

Enterprise Article, published June 18, 2026

Authors: Alexander Gurung (agurung, ServiceNow), Rafael Pardinas (rafapi-snow, ServiceNow)

TL;DR: Deep research agents increasingly combine private local documents with external tools like web retrieval, creating a privacy risk: an agent's external queries may leak sensitive information. MosaicLeaks proposes a new deep-research task with multi-hop questions that interleave public and private information. Across the models we tested, agents frequently leaked private information, and training only for task performance made it worse. We propose a mosaic-leakage-aware RL training method, Privacy-Aware Deep Research (PA-DR), which raises strict chain success (the share of chains

17 hours ago