Experimenting with the proposed Cross-Origin Storage API in Transformers.js
重點摘要
Back to Articles Experimenting with the proposed Cross-Origin Storage API in Transformers.js Published June 23, 2026 Update on GitHub Upvote 1 Thomas Steiner tomayac Follow google (This is a guest post by Developer Relations Engineer Thomas Steiner from the Chrome team at Google.) Transformers.js provides Web developers with a simple way to use the power of transformers in their Web apps through task-specific pipelines. To run inference in the browser, developers create an instance of pipeline() and specify a task they want to use the pipeline for. As a concrete example, the following snippet shows how to set up an automatic speech recognition (ASR) pipeline. import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/[email protected]'; const asr = await pipeline( 'automatic-spee
Back to Articles Experimenting with the proposed Cross-Origin Storage API in Transformers.js Published June 23, 2026 Update on GitHub Upvote 1 Thomas Steiner tomayac Follow google (This is a guest post by Developer Relations Engineer Thomas Steiner from the Chrome team at Google.) Transformers.js provides Web developers with a simple way to use the power of transformers in their Web apps through task-specific pipelines. To run inference in the browser, developers create an instance of pipeline() and specify a task they want to use the pipeline for. As a concrete example, the following snippet shows how to set up an automatic speech recognition (ASR) pipeline. import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/[email protected]'; const asr = await pipeline( 'automatic-speech-recognition', 'Xenova/whisper-tiny.en', { device: 'webgpu' }, ); const result = await asr('jfk.wav'); console.log(result); The cache challenge You will notice in the source code that I specified Xenova/whisper-tiny.en as the model, which is a very decent choice for common English automatic speech recognition tasks. In fact, it's even the default model according to the Transformers.js default model resolution, as per the linked excerpt. Model resources When you run this example in the browser, Transformers.js automatically takes care of downloading and caching the relevant model resources and Wasm files. The following screenshot shows the Chrome DevTools Cache storage section after visiting the app. When you reload the page, the resources are served from the Cache API, and the model returns results almost instantly. However, Xenova/whisper-tiny.en being a popular model (and, as mentioned before, even being the ASR default model in Transformers.js), you can well imagine that more than just one app that you visit would use it. To simulate this situation, here's the same example app from before, but served from a different origin. When you visit this different origin app, rather than being usable almost instantly, the browser instead has to download and cache all the model resources again, even if they're byte-by-byte the same as before. Even in this toy example, this adds up to 177 MB of duplicate download and storage, as you can examine in the Storage section of the Chrome DevTools Application panel. You can imagine that this quickly adds up. Wasm runtime resources But it gets worse. Let's add a second pipeline to the toy example: sentiment analysis. Sentiment analysis by default uses the Xenova/distilbert-base-uncased-finetuned-sst-2-english model. By not specifying the model, Transformers.js' default model resolution automatically picks it for you. const classifier = await pipeline('sentiment-analysis'); const sentiment = await classifier(result.text); pre.append('\n\n' + JSON.stringify(sentiment, null, 2)); Two entirely different AI models, but they depend on the same 4,733 kB ort-wasm-simd-threaded.asyncify.wasm WebAssembly (Wasm) runtime file from the underlying ONNX Runtime library that Transformers.js is built on top of. Open the extended demo on a different origin, and you will notice in the Network tab how also the Wasm runtime gets downloaded and cached again. So even if you run apps that don't share the same AI models, your browser still makes redundant requests for shared Wasm resources you already have, and on top of that also caches them again, which consumes space on your hard disk. Cache isolation AI model resources serving By default, AI model resources come from the Hugging Face Hub, and ultimately the Hugging Face CDN. The browser makes a request for a resource like https://huggingface.co/Xenova/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/config.json which then gets redirected to the final CDN URL like https://huggingface.co/api/resolve-cache/models/Xenova/distilbert-base-uncased-finetuned-sst-2-english/0b6928efcb76139cae2c6881d49cda67fe119f42/config.json?%2FXenova%2Fdistilbert-base-uncased-finetuned-sst-2-english%2Fresolve%2Fmain%2Fconfig.json=&etag=%223c36342ef1f74de2797d667c68c6b7b988d0b87c%22 in this case. Wasm runtime resources serving The Wasm runtime resources are served from the jsDelivr CDN by default. For example, ort-wasm-simd-threaded.asyncify.wasm comes from https://cdn.jsdelivr.net/npm/[email protected]/dist/ort-wasm-simd-threaded.asyncify.wasm at the time of this writing. Now you may say that if different apps, even though running on different origins, in the end serve their resources from the same CDN URLs, caching shouldn't be a problem, as long as the final URLs are the same. Unfortunately, this is not how caching works in browsers for a long time. The article Gaining security and privacy by partitioning the cache goes into all the details, but essentially, caches are isolated by origin to prevent timing attacks: the time a website takes to respond to HTTP requests can reveal that the browser has accessed the same resource in the past, which makes the browser vulnerable to security and privacy leaks. Chrome's implementation The concrete implementation may vary by browser, but in Chrome, cached resources are keyed using a Network Isolation Key in addition to the resource URL. The Network Isolation Key is composed of the top-level site and the current-frame site. Take the previous toy examples hosted on the origins https://googlechrome.github.io and https://rawcdn.rawgit.net. If they both use the Wasm runtime from https://cdn.jsdelivr.net/npm/[email protected]/dist/ort-wasm-simd-threaded.asyncify.wasm, their cache keys will look like in the following table. Network Isolation Key Resource URL Top-level site Current-frame site https://googlechrome.github.io https://googlechrome.github.io https://cdn.jsdelivr.net/npm/[email protected]/dist/ort-wasm-simd-threaded.asyncify.wasm https://rawcdn.rawgit.net https://rawcdn.rawgit.net https://cdn.jsdelivr.net/npm/[email protected]/dist/ort-wasm-simd-threaded.asyncify.wasm So even if the resource URLs are exactly the same, since the Network Isolation Keys don't match, there's no cache hit, which means duplicate download and duplicate storage. This is the challenge that the Cross-Origin Storage proposal aims to tackle. Enter the Cross-Origin Storage API 💡 Note: The Cross-Origin Storage API is an early-stage proposal that isn't final. While the proposed API is not yet natively implemented in any browser, you don't have to wait to experiment with it. Install the Cross-Origin Storage extension to inject the navigator.crossOriginStorage polyfill on all pages and test the complete flow. The proposed Cross-Origin Storage (COS) API introduces a dedicated navigator.crossOriginStorage interface through which web apps can store and retrieve large files across origin boundaries, identified not by a URL, but by a cryptographic hash. That last point about cryptographic hashes is key. Because COS identifies files by their hash rather than by their URL or origin, the same ort-wasm-simd-threaded.asyncify.wasm Wasm runtime you downloaded while visiting https://googlechrome.github.io is recognized as identical to the one https://rawcdn.rawgit.net is about to request, no matter where either of the two origins fetched it from. See the following code snippet that illustrates the basic flow. const hash = { algorithm: 'SHA-256', value: '8f434346648f6b96df89dda901c5176b10a6d83961dd3c1ac88b59b2dc327aa4', }; try { const handle = await navigator.crossOriginStorage.requestFileHandle(hash); // Cache hit! Get the file as a Blob and use it directly. const fileBlob = await handle.getFile(); } catch (err) { // Cache miss. Download from network, then store for next time. const fileBlob = await fetch('https://cdn.jsdelivr.net/.../ort-wasm-simd-threaded.asyncify.wasm') .then(r => r.blob()); const handle = await navigator.crossOriginStorage.requestFileHandle( hash, { create: true, origins: '*' },
Related
相關文章

微信上線高考 AI 志願助手,可在搜一搜直接語音提問
微信近期在「搜一搜」功能中推出「高考 AI 志願助手」,考生可透過文字或語音輸入分數、大學名稱等問題,系統會整合歷年錄取數據,自動給出「衝、穩、保」三個梯度的高校建議。這項功能旨在降低志願填報的資訊落差,但考生仍需注意隱私保護與數據準確性問題,AI 建議僅供參考。

微信推了AI助手「小微」,它會成為AI大模型的戰場嗎?
這篇消息聚焦「微信推了AI助手「小微」,它會成為AI大模型的戰場嗎?」。原始導語提到:別再叫「小微」,叫它「阿信」! 從 AI 情報角度來看,這類內容值得關注其背後的技術進展、產品落地、產業競爭與後續市場影響。

日本 Sakana AI 推出 Fugu:智能調用最佳模型,部分場景優於 Fable 5
這篇消息聚焦「日本 Sakana AI 推出 Fugu:智能調用最佳模型,部分場景優於 Fable 5」。原始導語提到:**日本 Sakana AI 推出 Fugu:智能調用最佳模型,部分場景優於 Fable 5** 日本人工智慧新創公司 Sakana AI 近日發表了一項名為「Fugu」的新系統,引發業界關注。根據目前已揭露的資訊,Fugu 的核心特色在於「智能調用」(intelligent routing),能夠根據不同任務的特性,自動選擇最適合的大型語言模型來執行,並且在部分特定場景中,其表現甚至超越了... 從 AI 情報角度來看,這類內容值得關注其背後的技術進展、產品落地、產業競爭與後續市場影響。
騰訊QQ郵箱放大招:AI從此有了自己的"專屬郵箱",還能幫你談業務
騰訊QQ郵箱正式上線面向AI智能體的“Agently Mail”郵箱服務並啟動內測。該服務獨立於用戶個人郵箱,為AI提供安全隔離、具備獨立身份的數字通信空間及“數字名片”,讓AI不再借用用戶郵箱,有效規避隱私洩露、誤刪郵件等風險。
亞馬遜的“雙標”算盤:在 ChatGPT 投廣告引流,卻嚴防 AI 抓取數據
亞馬遜在AI零售浪潮中展現雙標:一方面在ChatGPT上砸錢投廣告,借其海量用戶精準獲客;另一方面嚴防其他AI抓取其商品頁數據。據分析師,它已入駐OpenAI廣告體系,為零售業中最積極的巨頭。當用戶用ChatGPT詢問購物建議時,系統會優先顯示亞馬遜的贊助產品。
從文本摘要到知識網格:谷歌 NotebookLM 新功能將重塑學術與深度閱讀工作流
谷歌為NotebookLM秘密測試“文獻綜述”矩陣工具,推動從散文報告向結構化多維表格轉型。該功能可一鍵映射大規模文獻,自動生成涵蓋主題、論點、方法論的對比網格,專為學術研究與複雜長文本分析優化。