Hugging Face Blog生成式AI

Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic

2026年6月1日 13:51

重點摘要

站內 AI 整理稿

Back to Articles Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic Enterprise Article Published June 1, 2026 Upvote - Fuller nfuller Follow ibm-research Guides have aided humanity throughout history. Prehistoric civilizations understood that the sun and the moon could be used to navigate vast distances on land and the high seas. Over time, various journeys facilitated the production of maps for better planning and faster travel time to repeat destinations. Centuries later, the introduction of the compass enabled seagoers to achieve greater accuracy in seeking unexplored destinations. And today, GPS navigation apps guide our every journey. In today’s world of agentic AI, AI agents, admittedly, have the potential to enable scalable AI adoption, transforming industries as we know them. However, an intelligent guide, agentic logic, is needed to realize this potential by fueling high agent quality, cost-effectiveness, and consequent end-user trust. Enterprise Workflows & Use Cases Numerous studies have cited the overwhelming failure of AI pilots, while others have also highlighted the need for AI to operate at the core of enterprise workflows to enable scalable adoption. [1] [2] To better understand this phenomenon and the associated assertion, some analysis of enterprise workflows is required. These workflows are: A. Dynamic and long-running B. Possess a plethora of APIs, databases and services C. Oftentimes are constrained by business policies and/or regulations For an agent to function effectively, given these above characteristics, naturally demands an expanded model context, which state-of-the-art frontier LLMs certainly possess, but at what tradeoff? Increased hallucinations, token consumption? Further, can LLMs be equipped with an intelligent guide, GPS, to enable agentic AI execution at the core of the workflow, driving more desirable outcomes? We tested these hypotheses by designing and building agents, equipped with pertinent agent logic, for IBM offerings fully considering the above characteristics. These offerings pertain to some of the most challenging tasks confronting subject matter experts who own various stages of the enterprise software delivery lifecycle for mission critical workloads including: Understanding applications written in legacy code (Cobol / PL/1) Expediting test generation for developers Proactively responding to incidents and enabling shift-left app resiliency Automating compliance modernization for critical environments Before examining each of these domains in detail, let us define what characterizes agent logic. Agent logic is software primitives, such as knowledge graphs, algorithms, program analysis libraries, which operate at the agentic layer (within an agent harness) and can intentionally steer the LLM in the direction of the enterprise workflow, reducing the context space. In so doing, have strong tendency to drive more performant outcomes in a more cost-effective manner. Let us now examine how agent logic is able to achieve such outcomes in each of the above four domains. Understanding applications written in legacy code (Cobol / PL/1) - program analysis.[3] IBM watsonx Code assistant for Z (WCA4Z), used to accelerate mainframe application development and modernization with AI and automation, is equipped with an App Insights agent for application understanding - one of the primary focus areas of enterprise clients running mission critical workloads on IBM mainframe. This agent leverages deep static analysis across the application and stores a pre-indexed representation in a database schema that spans hundreds of interrelated tables with complex semantics, allowing the agent to retrieve precise, structured already available information; thereby improving answer accuracy, reducing token usage, and minimizing back-and-forth interactions with the language model (Mistral Medium 250B in this instance). This approach when applied to multiple mission-critical legacy systems (up to 1M lines of code and 1K programs) maintains marginally superior app understanding performance with ~30× lower token consumption than a baseline frontier LLM-only approach. Expediting test generation for developers with Aster - program analysis. [4], [5] Aster is an IBM proprietary program analysis and data pre- and post-processing-based library utilized for agent-based generation of unit, integration, API and change-based tests; which from analysis of multiple developer communities achieves higher developer ratings compared with various open-sourced tools or developer-written tests. Based on the latter and superior line, branch and method coverage benchmarks compared with similar open-sourced tools (integration tests) and zero-shot LLMs and coding agents (unit tests), all tested on open-sourced applications, we have been running Aster in pre-production mode on 75+ java IBM CIO applications (up to 560+ classes and 67K+ lines of code) with Devstral 24B model. Steady-state results to date yield +20% - 45% improvement in line, branch and method coverage coupled with superior performance on a subset of these apps compared with state-of-the-art coding agent with orders of magnitude lower token consumption (up to 15×). The rationale for these results is that the program analysis output (used to prompt and “focus” the LLM) coupled with sub-agents for augmenting coverage and remediating runtime and compilation errors enable a more performant outcome with significant cost reduction. Proactively responding to incidents and enabling shift-left app resiliency - knowledge graphs, program analysis libraries and investigation (observability) - driven orchestration. [6],[7] While LLM context for app-related use cases as described in 1 and 2 are “restricted” to the app source code, for runtime management of apps on deployed infra, the underlying IT full stack comes into play. Here we define a knowledge graph (KG) encompassing entities (microservices, database/middleware services, MELT etc.) coupled with embedded (“tribal”) knowledge from domain experts. With such a graph and bounding the LLM to local bound reasoning for non-deterministic outcomes, an observability-driven approach is used to achieve reduced context space spanning the IT stack and underlying app source code (if relevant) for incident root cause analysis (and other use cases). With this approach, leveraging the equivalent Instana data model, we have seen the proprietary Instana “I3” (intelligent incident investigation [8]) agent achieve up to 4.0× improvement over ReAct agent with GPT-5.1 as measured using ITBench [9]. With Gemini 3 Flash the ReAct agent performance improves to within 17% lower than the I3 agent while consuming 1.6× more tokens, We have extended this approach to source code with agents for code analysis (leveraging program dependency graphs) and bug remediation (leveraging inference scaling), also tested on ITBench, illustrating superior performance for the source code analysis and bug remediation agents (Gemini 2.5 Flash) over state-of-the-art coding agent both for finding the culpable microservice (3.0×) and bug repair (1.6×) while consuming respectively 3.7× and 5.9× less tokens. This multi-agent system was announced at IBM Think as part of the newly unveiled IBM Concert Platform for shift-left IT Operations and is also being piloted internally with IBM CIO. [10] Automating IT compliance modernization for critical environments - algorithms and adaptive planning and orchestration. [11] Enterprises face increasingly complex and fragmented compliance requirements, forcing teams to spend considerable time manually creating controls, assessments and remediation plans. No centralized knowledge exists and fixes are written manually, which introduces a risk of errors and security gaps. Because compliance work is complex and multi-step, it requires coordinated policy-driven automation across specialized agents rather than manual effort or simple AI prompts. Our mu

原始來源：Hugging Face Blog ↗

查看原始來源

鈦媒體生成式AI

Edge AI Daily 早報（6月19日）

AI Engineer World's Fair 2026規模再創新高，標誌AI工程從幕後走向舞臺中央。行業面臨結構性調整：楊立昆警示OpenAI年虧210億美元揭示商業模式脆弱性，Transformer之父轉投OpenAI反映人才爭奪白熱化。Anthropic多線佈局——語音支持七種語言、加入碳清除聯盟、落子首爾辦事處，展現生態擴張野心。監管壓力加劇，意大利依據DMA調查蘋果iCloud，巴西開放iOS側載佣金降至5%，蘋果圍牆花園持續崩塌。

2 小時前閱讀分析

36氪生成式AI

今天起，Claude Design要把設計師和程序員變成同一種人了

猝不及防！Anthropic深夜甩出Claude Design大更新，設計系統一鍵導入，代碼雙向同步，9大平臺一鍵導出。Anthropic設計師親自下場錄屏：AI跑了八輪自查，才敢把設計稿給你看。

15 小時前閱讀分析

IT之家生成式AI

OpenAI 成為 Rust 基金會白金會員，合計贊助 60 萬美元

OpenAI 正式成為 Rust 基金會白金會員，將提供總計 60 萬美元資金，用於支持 Rust 開源項目維護者及 Rust 創新實驗室等計劃。這標誌著 AI 巨頭對安全、高效系統編程語言的重視。 #OpenAI #Rust #開源

18 小時前閱讀分析

IT之家生成式AI

Claude Design 上線首周用戶破百萬，和 Claude Code 共享 AI 配額

Anthropic 今天（6 月 18 日）發佈公告，在宣佈 Claude Design 上線首周用戶規模突破 100 萬後，進一步強化和 Claude Code 的雙向聯動，實現從設計到編程的無縫工作流。

19 小時前閱讀分析

智東西生成式AI

谷歌時隔6年再發智能音箱，Gemini上桌，售價不到700元

智東西編譯 | 劉煜編輯 | 陳駿達智東西6月18日消息，谷歌昨日宣佈，其首款搭載居家版Gemini語音助手的智能音箱（Google Home Speaker）已開啟預售，將於當地時間6月25日正式上市，售價為99.99美元（約合人民幣677.03元）。在此之前，谷歌已有6年沒有推出過獨立智能音箱產品。谷歌這款智能音箱外觀近似球形，風格類似亞馬遜新一代Echo音箱與蘋果舊款音箱HomePod Mini。 ▲谷歌智能音箱（圖源：谷歌官網）使用音箱時，用戶只需通過口令“Hey Google”或“OK Google”喚醒Gemini，就可以繼續下達相應指令。這與谷歌舊款音箱、智能顯示屏等喚醒語音助手的方式相同。此外，用戶只要按照日常說話習慣下達命令，Gemini便能理解用戶意圖，相比之前大大提升溝通效率。一、加強短時對話記憶，會員可與Gemini不限次數對話谷歌此次推出的全新音箱升級諸多功能。其中，音箱搭載的Gemini語音助手擁有10款全新擬人化語音音色，用戶可以根據喜好自行選擇聲線。音箱還可支持用戶一次性下達多條語音指令，即使指令未能說對、說完整，用戶中途改口Gemini也能識別。 Gemini還具備多鏈路推理能力，落地到實際生活場景中比較實用。例如，用戶問：“我支持的足球隊下場比賽天氣如何？”Gemini收到指令後，會自動查詢賽事時間、舉辦地點，同時匹配相應時段天氣，再給出答覆。同時，Gemini加強了短時對話記憶，能承接上下文實現連續對話功能。即使用戶連續追問、甚至串聯多項任務、不重複交代前置條件，該語音助手也能實現來回連貫交流。 ▲谷歌Gemini對話場景（圖源：谷歌官網）不僅如此，Gemini搭配的連續對話功能，能讓應答後的音箱麥克風保持短暫收音，用戶無需重複喊“OK Google”就能繼續提問。該功能現已全面支持所有Gemini原生適配的語言，包括

22 小時前閱讀分析