Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference
Liquid AI has shipped LFM2.5-230M, the company’s smallest model to date. The release targets a specific job: running agentic tasks on phones, robots, and automation devices. Both the base and instruction-tuned checkpoints are open-weight on Hugging Face. The pitch is narrow on purpose. This is not a general reasoning model. It is built for data extraction and tool use on edge hardware.

TL;DR

- Liquid AI’s LFM2.5-230M is its smallest model yet: 230M params, open-weight, built on LFM2.
- Runs on-device at 213 tok/s on a Galaxy S25 Ultra and 42 tok/s on a Raspberry Pi 5.
- Beats larger models (Qwen3.5-0.8B, Gemma 3 1B) on instruction following and data extraction.
- Tuned for tool use and extraction; not for math, code generation, or creative writing.
- Day-one support across llama.cpp, MLX, vLLM, SGLang, and ONNX, with a 293–375 MB footprint.

What is LFM2.5-230M?

LFM2.5-230M is a 230-million-parameter, text-only model built on the LFM2 architecture. The model has 14 layers in total. Eight are double-gated LIV convolution blocks. The remaining six are grouped-query attention (GQA) blocks. The hybrid layout targets fast CPU inference. The context length is 32,768 tokens. The vocabulary size is 65,536. The knowledge cutoff is mid-2024. It supports ten languages, including English, Chinese, Arabic, and Japanese.

Liquid AI ships two checkpoints. LFM2.5-230M-Base is the pre-trained model for fine-tuning. LFM2.5-230M is the general-purpose instruction-tuned version. The license is lfm1.0.

Training and Post-Training

The model was pre-trained on 19 trillion tokens. That total includes a 32K context extension phase. The post-training recipe then runs in three stages. First comes supervised fine-tuning with distillation from the larger LFM2.5-350M. Second is direct preference optimization (DPO). Third is multi-domain reinforcement learning. This preserves flexibility for downstream specialization. The distillation step is what keeps a 230M model competitive with larger checkpoints.
It inherits behavior from the bigger LFM2.5-350M on targeted tasks.

Benchmarks

Liquid AI evaluated LFM2.5-230M across ten benchmarks. They span knowledge, instruction following, data extraction, and tool use. The instruction-following results stand out. On IFEval, LFM2.5-230M scores 71.71. That beats Qwen3.5-0.8B (59.94) and Gemma 3 1B IT (63.49). On IFBench it scores 38.40, ahead of both. On CaseReportBench, a clinical data-extraction test, it scores 22.51.

| Model | Params | IFEval | IFBench | CaseReportBench | BFCLv4 | MMLU-Pro |
|---|---|---|---|---|---|---|
| LFM2.5-230M | 230M | 71.71 | 38.40 | 22.51 | 21.03 | 20.25 |
| LFM2.5-350M | 350M | 76.96 | 40.69 | 32.45 | 21.86 | 20.01 |
| Granite 4.0-H-350M | 350M | 61.27 | 17.22 | 12.44 | 13.28 | 13.14 |
| Qwen3.5-0.8B (Instruct) | 800M | 59.94 | 22.87 | 13.83 | 18.70 | 37.42 |
| Gemma 3 1B IT | 1B | 63.49 | 20.33 | 2.28 | 7.17 | 14.04 |

LFM2.5-230M leads every non-Liquid model in the table on instruction following and data extraction; only its larger sibling, LFM2.5-350M, scores higher. It trails on broad knowledge: MMLU-Pro is 20.25, behind Qwen3.5-0.8B’s 37.42. It is also weak on some agentic tool use. On τ²-Bench Telecom it scores just 5.26. Liquid AI is direct about the limits. It does not recommend the model for reasoning-heavy workloads. That means advanced math, code generation, and creative writing.

Use Cases With Examples

The model fits two jobs well. The first is large-scale data extraction pipelines. Picture a pipeline parsing 100,000 clinical reports into structured fields. A 4-bit build with a 293–375 MB memory footprint runs that on commodity CPUs. You extract locally, with no per-token API bill.

The second job is lightweight on-device agentic workloads. Think of a home automation hub that turns speech into tool calls, or a phone assistant that routes a request to the right function.

As an early signal, Liquid AI deployed the model on a Unitree G1 humanoid robot. It ran entirely on the robot’s onboard NVIDIA Jetson Orin. There the model acted as a skill-selection layer. It turned one natural-language instruction into a sequence of tool calls. Those calls invoked low-level skills from NVIDIA’s SONIC framework.
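
To put the 100,000-report extraction example in rough numbers, here is a back-of-envelope sketch. It treats the quoted 213 tok/s and 42 tok/s figures as sustained generation throughput and ignores prompt-processing time, both of which are assumptions. The 150 output tokens per report is a hypothetical value chosen only for illustration, and `decode_hours` is a helper name introduced here.

```python
def decode_hours(num_docs: int, output_tokens_per_doc: int, tokens_per_second: float) -> float:
    """Hours spent generating output tokens on one device (prompt processing ignored)."""
    total_tokens = num_docs * output_tokens_per_doc
    return total_tokens / tokens_per_second / 3600


NUM_REPORTS = 100_000            # from the pipeline example in the article
OUTPUT_TOKENS_PER_REPORT = 150   # hypothetical: size of one structured record

# Throughput figures quoted in the article's TL;DR.
THROUGHPUT_TOK_PER_S = {
    "Galaxy S25 Ultra": 213.0,
    "Raspberry Pi 5": 42.0,
}

for device, tps in THROUGHPUT_TOK_PER_S.items():
    hours = decode_hours(NUM_REPORTS, OUTPUT_TOKENS_PER_REPORT, tps)
    print(f"{device}: about {hours:.1f} hours of generation on a single device")
```

Under these assumptions the job is hours to days on one device, so a real pipeline would likely shard reports across several workers. Actual numbers depend on prompt length, quantization, and hardware.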

Tool Use: How It Works

LFM2.5 supports function calling in four steps. You define tools as JSON in the system prompt. The model writes a Pythonic function call between special tokens. You execute the call and return the result. The model then writes a plain-text answer.

By default the call is a Python list. It sits between the <|tool_call_start|> and <|tool_call_end|> tokens. Here is the documented pattern, with the tool JSON abbreviated:

```
<|im_start|>system
List of tools: [{"name": "get_candidate_status", "parameters": {"candidate_id": {"type": "string"}}}]<|im_end|>
<|im_start|>user
What is the current status of candidate ID 12345?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>
```

You can also force JSON-formatted calls through the system prompt.

Running It: A Minimal Example

The model works with Transformers 5.0.0 and up. The recommended generation settings are temperature 0.1, top_k 50, and repetition_penalty 1.05. Note the do_sample=True flag, which is required for those sampling settings to apply.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2.5-230M"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the chat-formatted prompt and move it to the model's device.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is C. elegans?"}],
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    repetition_penalty=1.05,
    max_new_tokens=512,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Liquid AI also publishes fine-tuning recipes. They cover SFT, DPO, and GRPO with LoRA, via Unsloth and TRL.
Each ships as a Colab notebook.

Check out the model weights on Hugging Face, the technical details, and the docs.

The post Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference appeared first on MarkTechPost.
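
The "execute the call" step of the tool-use flow described in "Tool Use: How It Works" can be sketched in a few lines. This is a minimal illustration, not Liquid AI’s reference parser: it assumes the decoded text still contains the <|tool_call_start|> and <|tool_call_end|> markers, that the payload is a Python list of keyword-only calls with literal arguments, and it uses a stub `get_candidate_status` function and a `TOOLS` registry that are hypothetical names introduced here.

```python
import ast
import re

# Matches the payload between the tool-call markers quoted in the article.
TOOL_CALL_RE = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)


def parse_tool_calls(text: str) -> list[tuple[str, dict]]:
    """Extract (function_name, kwargs) pairs without executing any model output."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        tree = ast.parse(payload.strip(), mode="eval")
        if not isinstance(tree.body, ast.List):
            raise ValueError("expected a Python list of calls")
        for node in tree.body.elts:
            if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
                raise ValueError("expected plain function calls")
            if node.args or any(kw.arg is None for kw in node.keywords):
                raise ValueError("this sketch only handles keyword arguments")
            # literal_eval accepts AST nodes and only allows literal values.
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append((node.func.id, kwargs))
    return calls


def get_candidate_status(candidate_id: str) -> dict:
    """Hypothetical tool implementation; a real one would query a backend."""
    return {"candidate_id": candidate_id, "status": "unknown"}


# Only functions listed here can be invoked by the model.
TOOLS = {"get_candidate_status": get_candidate_status}

reply = (
    '<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>'
    "Checking the current status of candidate ID 12345."
)
calls = parse_tool_calls(reply)
results = [TOOLS[name](**kwargs) for name, kwargs in calls]
print(calls, results)
```

Parsing with `ast` rather than `eval` means the model’s output is never executed directly, and the explicit registry keeps it from calling anything you did not expose. The results would then be returned to the model so it can write its plain-text answer.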