RAG-Anything Tutorial: Building a Multimodal Retrieval Pipeline for Text, Tables, Equations, and Images in Colab

July 2, 2026, 21:38

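Key Points

This tutorial builds a RAG-Anything workflow to explore multimodal retrieval over text, tables, equations, and images. It first prepares the Colab environment, installs the required packages, and takes the OpenAI API key through a hidden prompt at runtime so that the key is never stored in the notebook. It then creates a synthetic multimodal report, generates a chart and a PDF, converts the content into RAG-Anything's direct content_list format, and inserts it into the retrieval system. Along the way it configures OpenAI-based chat, vision, and embedding functions, initializes RAG-Anything, and tests the naive, local, global, and hybrid retrieval modes.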

On-site AI-compiled version

In this tutorial, we build a RAG-Anything workflow and use it to explore how multimodal retrieval works across text, tables, equations, and images. We start by preparing the Colab environment, installing the required packages, and securely entering our OpenAI API key at runtime to keep the notebook practical and safe to run. We then create a synthetic multimodal report, generate a chart and PDF, convert the content into RAG-Anything's direct content_list format, and insert it into the retrieval system. As we move through the tutorial, we configure clean OpenAI-based chat, vision, and embedding functions, initialize RAG-Anything, and test different retrieval modes such as naive, local, global, and hybrid.

Installing RAG-Anything Dependencies

    import os
    import re
    import sys
    import json
    import time
    import shutil
    import hashlib
    import asyncio
    import inspect
    import getpass
    import subprocess
    import importlib
    import importlib.metadata
    from pathlib import Path
    from typing import List, Dict, Any

    def run_shell(cmd, check=True):
        # Echo the command, run it through the shell, and fail loudly on a non-zero exit.
        print(f"\n$ {cmd}")
        result = subprocess.run(cmd, shell=True, text=True)
        if check and result.returncode != 0:
            raise RuntimeError(f"Command failed: {cmd}")
        return result.returncode

    print("=" * 80)
    print("RAG-Anything Advanced Colab Tutorial")
    print("=" * 80)

    print("\n[1/10] Installing dependencies...")

    # Drop any already-imported Pillow modules so the reinstalled version is picked up.
    for module_name in list(sys.modules):
        if module_name == "PIL" or module_name.startswith("PIL."):
            del sys.modules[module_name]

    run_shell(
        'pip -q install -U '
        '"raganything[image,text]" '
        '"openai>=1.0.0" '
        '"python-dotenv" '
        '"reportlab" '
        '"pandas" '
        '"matplotlib" '
        '"tabulate"'
    )
    run_shell('pip -q install --no-cache-dir --force-reinstall "pillow==11.3.0"')

    for module_name in list(sys.modules):
        if module_name == "PIL" or module_name.startswith("PIL."):
            del sys.modules[module_name]
    importlib.invalidate_caches()

    try:
        print("Pillow version:", importlib.metadata.version("Pillow"))
    except Exception as e:
        print("Could not read Pillow version:", repr(e))

    print("\n[2/10] Importing libraries...")
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from IPython.display import display
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas
    from reportlab.lib.units import inch
    from openai import AsyncOpenAI
    from raganything import RAGAnything, RAGAnythingConfig
    from lightrag.utils import EmbeddingFunc
    print("Imports successful.")

We begin by setting up the complete Colab environment for the RAG-Anything workflow. We install the required libraries, repair the Pillow dependency, and import all the modules needed for plotting, PDF creation, OpenAI access, and RAG-Anything. We also define a reusable shell helper so the setup remains clear and easy to rerun.

Configuring Directories, Runtime Variables

    print("\n[3/10] Preparing directories and runtime settings...")

    # Use /content on Colab; fall back to the current working directory elsewhere.
    BASE_DIR = (
        Path("/content/raganything_advanced_tutorial")
        if Path("/content").exists()
        else Path.cwd() / "raganything_advanced_tutorial"
    )
    ASSET_DIR = BASE_DIR / "assets"
    OUTPUT_DIR = BASE_DIR / "output"
    WORKING_DIR = BASE_DIR / "rag_storage"
    LOG_DIR = BASE_DIR / "logs"

    RESET_STORAGE = True
    RUN_FULL_DOCUMENT_PARSE = False
    PARSER_FOR_FULL_PARSE = "mineru"
    PARSE_METHOD = "auto"

    for d in [BASE_DIR, ASSET_DIR, OUTPUT_DIR, WORKING_DIR, LOG_DIR]:
        d.mkdir(parents=True, exist_ok=True)

    # Start from an empty storage directory so reruns do not mix old and new data.
    if RESET_STORAGE and WORKING_DIR.exists():
        shutil.rmtree(WORKING_DIR)
        WORKING_DIR.mkdir(parents=True, exist_ok=True)

    os.environ["LOG_DIR"] = str(LOG_DIR)
    os.environ["SUMMARY_LANGUAGE"] = "English"
    os.environ["ENABLE_LLM_CACHE"] = "false"
    os.environ["ENABLE_LLM_CACHE_FOR_EXTRACT"] = "false"
    os.environ["MAX_ASYNC"] = "2"
    os.environ["CHUNK_SIZE"] = "900"
    os.environ["CHUNK_OVERLAP_SIZE"] = "120"
    os.environ["TIMEOUT"] = "240"

    # Clear any pre-existing credentials so only the key entered below is used.
    for var in [
        "OPENAI_API_KEY",
        "OPENAI_ORG_ID",
        "OPENAI_ORGANIZATION",
        "OPENAI_PROJECT",
        "OPENAI_DEFAULT_HEADERS",
        "LLM_BINDING_API_KEY",
        "LLM_BINDING_HOST",
    ]:
        os.environ.pop(var, None)

    print(f"Base directory: {BASE_DIR}")
    print(f"Assets directory: {ASSET_DIR}")
    print(f"Storage directory: {WORKING_DIR}")

    print("\n[4/10] Entering OpenAI API key securely...")

    def clean_api_key(raw_value: str) -> str:
        # Normalize a pasted key: drop "Bearer ", quotes, a leading NAME= part,
        # whitespace, and non-ASCII characters.
        raw_value = str(raw_value or "").strip()
        raw_value = raw_value.replace("Bearer ", "").replace("bearer ", "").strip()
        raw_value = raw_value.strip("'").strip('"').strip("`").strip()
        if "=" in raw_value:
            raw_value = raw_value.split("=", 1)[1].strip().strip("'").strip('"').strip("`")
        raw_value = re.sub(r"\s+", "", raw_value)
        raw_value = raw_value.encode("ascii", errors="ignore").decode("ascii").strip()
        return raw_value

    OPENAI_API_KEY_RAW = getpass.getpass("Paste your OpenAI API key here. Input is hidden: ")
    OPENAI_API_KEY = clean_api_key(OPENAI_API_KEY_RAW)

    if not OPENAI_API_KEY:
        raise ValueError(
            "No API key was captured. Paste the key into the hidden input box and press Enter."
        )

    print("Captured key length:", len(OPENAI_API_KEY))
    print("Captured key prefix:", OPENAI_API_KEY[:12] + "...")
    print("Captured key suffix:", "..." + OPENAI_API_KEY[-6:])

    LLM_MODEL = "gpt-4o-mini"
    VISION_MODEL = "gpt-4o-mini"
    EMBEDDING_MODEL = "text-embedding-3-small"
    EMBEDDING_DIM = 1536

    openai_client = AsyncOpenAI(api_key=OPENAI_API_KEY)

    os.environ["LLM_MODEL"] = LLM_MODEL
    os.environ["VISION_MODEL"] = VISION_MODEL
    os.environ["EMBEDDING_MODEL"] = EMBEDDING_MODEL
    os.environ["EMBEDDING_DIM"] = str(EMBEDDING_DIM)

    # Top-level await works in Colab and Jupyter cells.
    print("Testing OpenAI chat API with the captured key...")
    try:
        test_response = await openai_client.chat.completions.create(
            model=LLM_MODEL,
            messages=[{"role": "user", "content": "Reply with exactly: ok"}],
            temperature=0,
        )
        print("Chat API test response:", test_response.choices[0].message.content)
    except Exception as e:
        raise RuntimeError(
            "The key was captured, but OpenAI rejected the request or the account/model access failed. "
            "Check billing, project permissions, and make sure this is an OpenAI Platform API key."
        ) from e

    print("\nTesting OpenAI embedding API...")
    try:
        test_embedding = await openai_client.embeddings.create(
            model=EMBEDDING_MODEL,
            input=["RAG-Anything embedding test"],
        )
        print("Embedding vector length:", len(test_embedding.data[0].embedding))
    except Exception as e:
        raise RuntimeError(
            "Chat worked, but embeddings failed. Make sure your API key has permission for embeddings."
        ) from e

    print("OpenAI API key is working.")
    print(f"Chat model: {LLM_MODEL}")
    print(f"Vision model: {VISION_MODEL}")
    print(f"Embedding model: {EMBEDDING_MODEL}")
    print(f"Embedding dimension: {EMBEDDING_DIM}")

We prepare the working directories, output folders, logs, and runtime environment variables that RAG-Anything uses during execution. We securely capture the OpenAI API key via a hidden input, clean the pasted value, and verify that both the chat and embedding calls work correctly. We also define the models and embedding dimensions that power the rest of the tutorial.

Generating a Synthetic Multimodal Report

    print("\n[5/10] Creating a synthetic multimodal report...")

    monthly_data = pd.DataFrame(
        {
            "Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
            "Query Volume": [1200, 1700, 2100, 2600, 3300, 4100],
            "Hybrid Accuracy": [0.71, 0.74, 0.79, 0.83, 0.87, 0.91],
            "Average Latency ms": [980, 920, 850, 790, 760, 730],
        }
    )
    table_md = monthly_data.to_markdown(index=False)

    plt.figure(figsize=(8, 4.8))
    plt.plot(monthly_data["Month"], monthly_data["Query Volume"], marker="o", label="Query Volume")
    # Accuracy is multiplied by 4000 so it is visible on the same axis as query volume.
    plt.plot(monthly_data["Month"], monthly_data["Hybrid Accuracy"] * 4000, marker="s", label="Hybrid Accuracy scaled")
    plt.title("Multimodal RAG Usage and Quality Trend")
    plt.xlabel("Month")
    plt.ylabel("Volume / Scaled Accuracy")
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.text(
        0.02,
        0.95,
        "Synthetic figure: usage rises while latency falls",
        transform=plt.gca().transAxes,
        fontsize=9,
        verticalalignment="top",
        bbox=dict(boxstyle="round", alpha=0.15),
    )

    chart_path = ASSET_DIR / "raganything_quality_trend.png"
    # pl... (the excerpt is truncated here in the source)
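
The introduction mentions converting the report into RAG-Anything's direct content_list format. As a rough sketch of what such a list might look like, the block below assembles one from plain Python dictionaries without calling the library. The field names (`type`, `text`, `table_body`, `table_caption`, `latex`, `img_path`, `image_caption`, `page_idx`) follow the RAG-Anything README as understood here and should be checked against the installed version. The helpers `rows_to_markdown` and `build_content_list`, the sample sentences, the growth formula, and the relative image path are all hypothetical.

```python
from pathlib import Path
from typing import Any, Dict, List


def rows_to_markdown(header: List[str], rows: List[List[Any]]) -> str:
    """Render a small table as Markdown text (hypothetical helper)."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(value) for value in row) + " |")
    return "\n".join(lines)


def build_content_list(image_path: Path) -> List[Dict[str, Any]]:
    """Assemble one text, table, equation, and image item (assumed schema)."""
    table_md = rows_to_markdown(
        ["Month", "Query Volume", "Hybrid Accuracy"],
        [["Jan", 1200, 0.71], ["Jun", 4100, 0.91]],
    )
    return [
        {
            "type": "text",
            "text": "Query volume grew from January to June.",
            "page_idx": 0,
        },
        {
            "type": "table",
            "table_body": table_md,
            "table_caption": ["Monthly usage and accuracy (synthetic)"],
            "table_footnote": [],
            "page_idx": 1,
        },
        {
            "type": "equation",
            # Illustrative formula only; it does not come from the tutorial.
            "latex": r"g = \frac{V_{\mathrm{Jun}} - V_{\mathrm{Jan}}}{V_{\mathrm{Jan}}}",
            "text": "Relative growth in query volume",
            "page_idx": 2,
        },
        {
            "type": "image",
            # An absolute path is used here as a cautious assumption.
            "img_path": str(image_path.resolve()),
            "image_caption": ["Usage and quality trend (synthetic)"],
            "image_footnote": [],
            "page_idx": 3,
        },
    ]


content_list = build_content_list(Path("assets/raganything_quality_trend.png"))
for item in content_list:
    print(item["type"], "-> page", item["page_idx"])
```

A list like this would then be handed to the library's insertion call (the project README describes an `insert_content_list` method). That step is left out because it needs an initialized RAGAnything instance and API access, and its exact signature should be verified against the installed release.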

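
The key-cleaning step in the tutorial can be checked in isolation, without contacting OpenAI. The sketch below repeats the `clean_api_key` helper from the tutorial and runs it on a few made-up inputs. The sample strings are hypothetical placeholders, not real keys.

```python
import re


def clean_api_key(raw_value: str) -> str:
    """Same logic as the tutorial's helper, repeated so this block runs alone."""
    raw_value = str(raw_value or "").strip()
    raw_value = raw_value.replace("Bearer ", "").replace("bearer ", "").strip()
    raw_value = raw_value.strip("'").strip('"').strip("`").strip()
    if "=" in raw_value:
        # Keep only the part after the first "=", e.g. NAME="value".
        raw_value = raw_value.split("=", 1)[1].strip().strip("'").strip('"').strip("`")
    raw_value = re.sub(r"\s+", "", raw_value)
    raw_value = raw_value.encode("ascii", errors="ignore").decode("ascii").strip()
    return raw_value


# Hypothetical pasted values covering common copy-and-paste mistakes.
samples = [
    "  sk-demo-123\n",                  # surrounding whitespace
    "Bearer sk-demo-123",               # copied from an Authorization header
    'OPENAI_API_KEY="sk-demo-123"',     # copied from a .env file
    "sk-demo-123\u200b",                # trailing zero-width space
]

results = [clean_api_key(sample) for sample in samples]
for sample, cleaned in zip(samples, results):
    print(repr(sample), "->", repr(cleaned))
```

Because the helper removes all internal whitespace and keeps only the text after the first `=`, it assumes that a valid key contains neither spaces nor an equals sign.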
Original source: MarkTechPost AI


