The argument in one line.
RAG Anything lifts the plain-text ceiling of most knowledge graph systems by running a free local parser that converts PDFs, images, and charts into the same vector database and entity graph LightRAG already uses.
Read if.
- You already have a self-hosted LightRAG server running and want to feed it scanned PDFs, spreadsheets with charts, or image-heavy documents.
- You use Claude Code to query a local knowledge graph and keep hitting walls when source material is not plain text.
- You are building a personal or agency knowledge base and need multi-modal document ingestion without a cloud RAG subscription.
- You want to understand the architecture of a production RAG pipeline well enough to debug it when ingestion breaks.
Skip if.
- You have not set up LightRAG yet -- this video explicitly assumes the prior episode as a prerequisite.
- You need a turnkey hosted solution; every step here requires running local Python scripts and Docker.
The full version, fast.
Most RAG systems can only ingest plain text, which breaks the moment you feed them scanned PDFs, charts, or images. RAG Anything -- from the same team that built LightRAG -- fixes this with a local document parser called MinerU that classifies every element in a PDF (text, image, chart, LaTeX) and routes each through a specialized pipeline. The text bucket and image bucket each produce a vector database and a knowledge graph, which are merged together, then merged with your existing LightRAG instance. The output is structurally identical to a text-only setup, and querying through Claude Code works exactly the same way as before.
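The sketch below is a minimal Python illustration of the routing idea. It assumes MinerU's output has already been reduced to a list of dicts with a "type" field. The element schema and the type names are illustrative assumptions, and so is the choice to send LaTeX equations down the text path. None of this is MinerU's or RAG Anything's actual API.

```python
from collections import defaultdict
from typing import Iterable

# Assumed type labels; MinerU's real output schema may name these differently.
TEXT_TYPES = {"header", "text", "equation"}  # assumption: LaTeX equations travel with text
IMAGE_TYPES = {"image", "chart"}


def route_elements(elements: Iterable[dict]) -> dict[str, list[dict]]:
    """Split parsed document elements into a text bucket and an image bucket."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for element in elements:
        kind = element.get("type")
        if kind in TEXT_TYPES:
            buckets["text"].append(element)
        elif kind in IMAGE_TYPES:
            buckets["image"].append(element)
        else:
            # Keep unrecognized elements visible instead of silently dropping them.
            buckets["unknown"].append(element)
    return dict(buckets)
```

An "unknown" bucket that is not empty is a quick signal that a document contains element types the pipeline was not set up for.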
Where the time goes.

01 · Intro
Hook on the plain-text limitation of RAG systems. RAG Anything introduced as the fix. Prerequisites established: assumes LightRAG is already running.

02 · RAG Anything Overview
High-level: RAG Anything is a multimodal wrapper for LightRAG from the same HKUDS team. Handles PDFs, images, charts. Brief sponsor mention (Chase AI+ masterclass). All output routes to the same knowledge graph.

03 · How It Works
Architecture walkthrough: MinerU runs locally and classifies PDF elements into text, images, charts, LaTeX. Text bucket -> PaddleOCR -> LLM -> embeddings + entities. Image bucket -> screenshot -> LLM (OCR + vision) -> embeddings + entities. Each path produces a vector DB and knowledge graph. All four are merged into one pair, then merged with the existing LightRAG instance.
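The per-bucket flow above can be sketched as one generic function that takes its stages as parameters. `run_bucket`, `Artifacts`, and the stub stages in the test are hypothetical. In the real system, `to_text` would be OCR for the text bucket or a vision-LLM call for the image bucket, and `extract`/`embed` would call the configured LLM and embedding model.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

Relation = tuple[str, str, str]  # (source entity, relation, target entity)


@dataclass
class Artifacts:
    """One vector store plus one knowledge graph, as produced by a single bucket."""
    vectors: dict[str, list[float]] = field(default_factory=dict)
    entities: set[str] = field(default_factory=set)
    relations: set[Relation] = field(default_factory=set)


def run_bucket(
    prefix: str,
    chunks: Iterable,
    to_text: Callable[[object], str],
    extract: Callable[[str], tuple[set[str], set[Relation]]],
    embed: Callable[[str], list[float]],
) -> Artifacts:
    """Run one bucket: chunk -> text -> (embedding, entities, relations)."""
    out = Artifacts()
    for i, chunk in enumerate(chunks):
        text = to_text(chunk)  # OCR (text bucket) or vision-LLM description (image bucket)
        out.vectors[f"{prefix}-{i}"] = embed(text)  # prefixed ids keep buckets mergeable
        entities, relations = extract(text)  # LLM entity/relation extraction
        out.entities |= entities
        out.relations |= relations
    return out
```

Calling it once with prefix "text" and once with prefix "image" yields the two vector stores and two graphs that the walkthrough then merges.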

04 · Install and Demo
One-shot Claude Code prompt installs RAG Anything, updates storage paths, swaps models to GPT-5.4 Nano, and patches the embedding double-wrap bug. Non-text ingestion requires a Python script wrapped into a Claude Code skill. Demo: querying a fake NovaTech SaaS PDF with a bar chart -- Claude Code returns monthly revenue data Jan-Sep 2025.
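The sketch below shows what a double wrap can do to an embedding function. It uses a hypothetical batching wrapper, not RAG Anything's real example code, which is not reproduced here. `naive_batch`, `safe_batch`, and `toy_embed` are illustrative names. When the unguarded wrapper is applied twice, each input string is iterated character by character, so every "vector" comes back as a list of vectors. That is a dimension mismatch of the kind attributed to this bug.

```python
import functools
from typing import Callable, List

Vector = List[float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedder that always returns a 3-dimensional vector."""
    return [float(len(text)), float(text.count("a")), 1.0]


def naive_batch(fn: Callable[[str], Vector]) -> Callable[[List[str]], List[Vector]]:
    """Hypothetical wrapper: turns a single-text embedder into a batch embedder."""
    @functools.wraps(fn)
    def wrapper(texts):
        # If fn is itself a batch wrapper, each string t is treated as a "batch"
        # and iterated character by character: the double-wrap failure mode.
        return [fn(t) for t in texts]
    return wrapper


def safe_batch(fn):
    """Same wrapper, but a no-op when fn has already been wrapped."""
    if getattr(fn, "_is_batched", False):
        return fn
    wrapped = naive_batch(fn)
    wrapped._is_batched = True  # marker checked on any later wrap attempt
    return wrapped
```

Making the wrapper idempotent is one possible fix. Removing the redundant wrap from the script, as the one-shot prompt does, is another.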

05 · Final Thoughts
MinerU runs locally on CPU or GPU -- API costs come only from the LLM and embedding calls. The free Skool community hosts the one-shot prompt and the Claude Code skill.
Lines worth screenshotting.
- RAG systems that only handle text are broken for real-world use -- most business documents are PDFs with charts, not markdown files.
- MinerU is a free, locally-running document parser that classifies every PDF element (header, text, chart, image, LaTeX equation) before any LLM sees it.
- Separating text from images before anything reaches an LLM is dramatically cheaper than sending every page as a screenshot -- a scalpel, not a shotgun.
- From one non-text document, RAG Anything creates four intermediate artifacts (two vector DBs, two knowledge graphs) then merges all four into one.
- The LightRAG + RAG Anything merge produces a single unified knowledge graph indistinguishable from a text-only setup at query time.
- The only user-facing change after setup is invoking a Claude Code skill instead of dragging files into the LightRAG web UI.
- Running MinerU on CPU is slow but free; switching to GPU PyTorch cuts processing time and Claude Code can configure it automatically.
- The RAG Anything GitHub repo ships with an embedding double-wrap bug in its example scripts -- a one-shot Claude Code prompt fixes it.
- Querying a chart-heavy PDF returns correct structured data values that LightRAG alone would have missed or hallucinated.
- Architectural knowledge of the two-path system (text bucket vs. image bucket) is what lets you debug ingestion failures without guessing.
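Here is a sketch of the CPU-or-GPU check, assuming PyTorch is the backend MinerU uses on your install. `pick_device` is a hypothetical helper. How the result is passed to MinerU depends on MinerU's own configuration, which is not shown here. A CPU-only PyTorch build reports `torch.cuda.is_available()` as False even on a machine with an NVIDIA card, so the GPU build has to be installed before this returns "cuda".

```python
import importlib.util


def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when a CUDA-enabled PyTorch build is usable, else "cpu"."""
    if not prefer_gpu or importlib.util.find_spec("torch") is None:
        return "cpu"  # no torch installed, or GPU explicitly not wanted
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"
```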
How to lift the document-type ceiling on any RAG system.
The plain-text wall breaks most self-hosted RAG pipelines -- and the fix is a local parsing layer that is completely invisible at query time.
- The plain-text limitation of RAG systems is not an edge case -- scanned PDFs and chart-heavy documents are the norm in most business contexts.
- MinerU classifies every PDF element into typed buckets (text, image, chart, LaTeX) before any LLM sees them -- this local pre-sort is what keeps API costs manageable at scale.
- Sending images and text through separate LLM prompts rather than one giant screenshot prompt is the core cost-control insight: a scalpel beats a shotgun when processing thousands of documents.
- From one non-text document, the system creates four intermediate artifacts (two vector DBs, two knowledge graphs) merged down to one -- structurally identical to a text-only LightRAG output.
- The user-facing workflow change is exactly one step: replace the drag-and-drop upload with a Claude Code skill invocation; querying syntax stays the same.
- MinerU runs on CPU by default and can be upgraded to GPU by asking Claude Code to reconfigure PyTorch -- no manual dependency management required.
- Understanding the two-path architecture (text bucket vs. image bucket) is what lets you debug ingestion failures; treating it as a black box leaves you helpless when a document comes back empty.
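The scalpel-versus-shotgun argument is arithmetic, so it can be written as a small cost model. Every parameter is a placeholder to replace with your own measurements and your provider's pricing: tokens per screenshot, extracted text tokens per page, number of visuals, and price per million tokens. No real figures are implied. The model also ignores output tokens and any price difference between text and vision models.

```python
def token_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of a token count at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


def compare_strategies(
    pages: int,
    text_tokens_per_page: int,
    visuals: int,
    tokens_per_image: int,
    price_per_million: float,
) -> tuple[float, float]:
    """Return (shotgun_cost, scalpel_cost) under a deliberately simple token model.

    shotgun: every page goes to a vision LLM as one screenshot.
    scalpel: extracted text goes as text; only the visuals go as images.
    """
    shotgun_tokens = pages * tokens_per_image
    scalpel_tokens = pages * text_tokens_per_page + visuals * tokens_per_image
    return (
        token_cost(shotgun_tokens, price_per_million),
        token_cost(scalpel_tokens, price_per_million),
    )
```

Under this model the scalpel wins whenever text_tokens_per_page < tokens_per_image * (1 - visuals / pages). In words: pages carry fewer text tokens than a screenshot costs, and not every page needs an image.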
Terms worth knowing.
- RAG (Retrieval-Augmented Generation): A technique where an LLM is given access to a private knowledge base at query time, enabling it to answer questions about documents it was never trained on.
- LightRAG: An open-source local RAG framework that stores knowledge as a graph of entities and relationships, enabling richer multi-hop retrieval than vector search alone.
- RAG Anything: A multimodal extension to LightRAG (same HKUDS team) that ingests non-text documents by parsing them locally before routing into the knowledge graph.
- MinerU: An open-source document parsing engine that classifies PDF content into text blocks, images, charts, tables, and LaTeX equations using specialized local models -- runs entirely on your machine.
- PaddleOCR: A local OCR model bundled inside MinerU that converts scanned or image-embedded text regions in documents to machine-readable strings.
- Knowledge Graph: A database that stores information as a network of named entities and the relationships between them, enabling complex multi-hop queries across a document corpus.
- Vector Database: A database storing high-dimensional numerical embeddings of text chunks, enabling semantic similarity search for RAG retrieval.
- Embedding: A numerical representation of text or an image caption in high-dimensional space, where semantically similar content is positioned nearby.
- Embedding double-wrap bug: A defect in RAG Anything example scripts where embeddings are wrapped in an encoding function twice, causing a dimension mismatch error at ingestion time.
Lines you could clip.
“Almost every RAG system suffers from the exact same problem. They can only handle text documents.”
“Why don't we just treat this entire thing as a screenshot? Because it's expensive and slow.”
“In the end, you didn't notice a dang thing. Again, as the user, all of this is invisible to you.”
The bait, then the rug-pull.
The ceiling of most RAG pipelines is not compute or cost -- it is document type. The moment a knowledge base encounters a scanned PDF or a chart embedded in a slide deck, the pipeline silently fails. This video is the fix.
Named ideas worth stealing.
Two-Bucket Document Parsing
- Text bucket (PaddleOCR -> LLM -> embeddings + entities)
- Image bucket (screenshot -> vision LLM -> embeddings + entities)
MinerU splits any document into a text path and an image path before anything reaches an LLM. Each path produces its own vector DB and knowledge graph, and all four artifacts are merged into one.
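The merge step can be sketched under a few stated assumptions:
- Each graph is a dict mapping entity name to a list of descriptions, plus a set of (source, relation, target) triples.
- Entities are deduplicated by case- and whitespace-normalized name.
- Duplicate descriptions are simply appended.

LightRAG's actual merge logic is not reproduced here. Because the output has the same shape as the input, the same function covers both merges: text graph with image graph, then the result with the existing LightRAG graph.

```python
def normalize(name: str) -> str:
    """Assumed entity key: collapse whitespace and ignore case."""
    return " ".join(name.split()).casefold()


def merge_graphs(*graphs: dict) -> dict:
    """Merge graphs shaped {"entities": {name: [desc, ...]}, "relations": {(src, rel, dst)}}."""
    entities: dict[str, list[str]] = {}
    relations: set[tuple[str, str, str]] = set()
    for graph in graphs:
        for name, descriptions in graph["entities"].items():
            merged = entities.setdefault(normalize(name), [])
            for desc in descriptions:
                if desc not in merged:  # keep order, drop exact duplicates
                    merged.append(desc)
        for src, rel, dst in graph["relations"]:
            relations.add((normalize(src), rel, normalize(dst)))
    return {"entities": entities, "relations": relations}


def merge_vectors(*stores: dict[str, list[float]]) -> dict[str, list[float]]:
    """Union of vector stores; conflicting vectors under one chunk id are an error."""
    merged: dict[str, list[float]] = {}
    for store in stores:
        for chunk_id, vector in store.items():
            if chunk_id in merged and merged[chunk_id] != vector:
                raise ValueError(f"chunk id collision: {chunk_id}")
            merged[chunk_id] = vector
    return merged
```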
How they asked for the click.
“Make sure to check out Chase AI plus if you wanna get your hands on that Claude Code masterclass”
Soft pitch at the very end, after all the value is delivered; a free community alternative is offered throughout.