Big Idea

The argument in one line.

RAG Anything solves the plain-text ceiling of most knowledge graph systems by running a free local parser that converts any document type into the same vector database and entity graph LightRAG already uses.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You already have a self-hosted LightRAG server running and want to feed it scanned PDFs, spreadsheets with charts, or image-heavy documents.
You use Claude Code to query a local knowledge graph and keep hitting walls when source material is not plain text.
You are building a personal or agency knowledge base and need multi-modal document ingestion without a cloud RAG subscription.
You want to understand the architecture of a production RAG pipeline well enough to debug it when ingestion breaks.

SKIP IF…

You have not set up LightRAG yet -- this video explicitly assumes the prior episode as a prerequisite.
You need a turnkey hosted solution; every step here requires running local Python scripts and Docker.

TL;DR

The full version, fast.

Most RAG systems can only ingest plain text, which breaks the moment you feed them scanned PDFs, charts, or images. RAG Anything -- from the same team that built LightRAG -- fixes this with a local document parser called MinerU that classifies every element in a PDF (text, image, chart, LaTeX) and routes each through a specialized pipeline. The text bucket and image bucket each produce a vector database and a knowledge graph, which are merged together, then merged with your existing LightRAG instance. The output is structurally identical to a text-only setup, and querying through Claude Code works exactly the same way as before.

Free for members

Chat with this breakdown — free.

Sign in and you get 23 free chat messages on us — ask for the hook, quote a framework, find the exact transcript moment, generate a markdown action plan. Bring your own key when you want unlimited.

Create a free account →

Chapters

Where the time goes.

00:00 – 00:48

01 · Intro

Hook on the plain-text limitation of RAG systems. RAG Anything introduced as the fix. Prerequisites established: assumes LightRAG is already running.

00:48 – 03:22

02 · RAG Anything Overview

High-level: RAG Anything is a multimodal wrapper for LightRAG from the same HKUDS team. Handles PDFs, images, charts. Brief sponsor mention (Chase AI+ masterclass). All output routes to the same knowledge graph.

03:22 – 13:11

03 · How It Works

Architecture walkthrough: MinerU runs locally and classifies PDF elements into text, images, charts, LaTeX. Text bucket -> PaddleOCR -> LLM -> embeddings + entities. Image bucket -> screenshot -> LLM (OCR + vision) -> embeddings + entities. Each path produces a vector DB and knowledge graph. All four are merged into one pair, then merged with the existing LightRAG instance.

13:11 – 18:19

04 · Install and Demo

One-shot Claude Code prompt installs RAG Anything, updates storage paths, swaps models to GPT-5.4 Nano, and patches the embedding double-wrap bug. Non-text ingestion requires a Python script wrapped into a Claude Code skill. Demo: querying a fake NovaTech SaaS PDF with a bar chart -- Claude Code returns monthly revenue data Jan-Sep 2025.

18:19 – 19:20

05 · Final Thoughts

MinerU runs locally on CPU/GPU -- API cost only for the LLM embedding step. Free Skool community has the one-shot prompt and Claude Code skill.

Atomic Insights

Lines worth screenshotting.

RAG systems that only handle text are broken for real-world use -- most business documents are PDFs with charts, not markdown files.
MinerU is a free, locally-running document parser that classifies every PDF element (header, text, chart, image, LaTeX equation) before any LLM sees it.
Separating text from images before sending to an LLM is dramatically cheaper than sending everything as screenshots -- a scalpel, not a shotgun.
From one non-text document, RAG Anything creates four intermediate artifacts (two vector DBs, two knowledge graphs) then merges all four into one.
The LightRAG + RAG Anything merge produces a single unified knowledge graph indistinguishable from a text-only setup at query time.
The only user-facing change after setup is invoking a Claude Code skill instead of dragging files into the LightRAG web UI.
Running MinerU on CPU is slow but free; switching to GPU PyTorch cuts processing time and Claude Code can configure it automatically.
The RAG Anything GitHub repo ships with an embedding double-wrap bug in its example scripts -- a one-shot Claude Code prompt fixes it.
Querying a chart-heavy PDF returns correct structured data values that LightRAG alone would have missed or hallucinated.
Architectural knowledge of the two-path system (text bucket vs. image bucket) is what lets you debug ingestion failures without guessing.

Takeaway

How to give any RAG system a document-type ceiling lift.

WHAT TO LEARN

The plain-text wall breaks most self-hosted RAG pipelines -- and the fix is a local parsing layer that is completely invisible at query time.

01Intro

The plain-text limitation of RAG systems is not an edge case -- scanned PDFs and chart-heavy documents are the norm in most business contexts.

03How It Works

MinerU classifies every PDF element into typed buckets (text, image, chart, LaTeX) before any LLM sees them -- this local pre-sort is what keeps API costs manageable at scale.
Sending images and text through separate LLM prompts rather than one giant screenshot prompt is the core cost-control insight: a scalpel beats a shotgun when processing thousands of documents.
From one non-text document, the system creates four intermediate artifacts (two vector DBs, two knowledge graphs) merged down to one -- structurally identical to a text-only LightRAG output.

04Install and Demo

The user-facing workflow change is exactly one step: replace the drag-and-drop upload with a Claude Code skill invocation; querying syntax stays the same.
MinerU runs on CPU by default and can be upgraded to GPU by asking Claude Code to reconfigure PyTorch -- no manual dependency management required.
Understanding the two-path architecture (text bucket vs. image bucket) is what lets you debug ingestion failures; treating it as a black box leaves you helpless when a document comes back empty.

Glossary

Terms worth knowing.

RAG (Retrieval-Augmented Generation): A technique where an LLM is given access to a private knowledge base at query time, enabling it to answer questions about documents it was never trained on.
LightRAG: An open-source local RAG framework that stores knowledge as a graph of entities and relationships, enabling richer multi-hop retrieval than vector search alone.
RAG Anything: A multimodal extension to LightRAG (same HKUDS team) that ingests non-text documents by parsing them locally before routing into the knowledge graph.
MinerU: An open-source document parsing engine that classifies PDF content into text blocks, images, charts, tables, and LaTeX equations using specialized local models -- runs entirely on your machine.
PaddleOCR: A local OCR model bundled inside MinerU that converts scanned or image-embedded text regions in documents to machine-readable strings.
Knowledge Graph: A database that stores information as a network of named entities and the relationships between them, enabling complex multi-hop queries across a document corpus.
Vector Database: A database storing high-dimensional numerical embeddings of text chunks, enabling semantic similarity search for RAG retrieval.
Embedding: A numerical representation of text or an image caption in high-dimensional space, where semantically similar content is positioned nearby.
Embedding double-wrap bug: A defect in RAG Anything example scripts where embeddings are wrapped in an encoding function twice, causing a dimension mismatch error at ingestion time.

Resources

Things they pointed at.

00:48linkLightRAG ↗

05:12linkRAG-Anything ↗

05:12linkMinerU ↗

02:47productChase AI+ Claude Code Masterclass ↗

18:19linkFree Chase AI Skool Community (prompts + skills) ↗

Quotables

Lines you could clip.

00:08

“Almost every RAG system suffers from the exact same problem. They can only handle text documents.”

Tight problem statement, zero setup needed, instantly relatable to anyone who has built a RAG pipeline→ TikTok hook↗ Tweet quote

09:03

“Why don't we just treat this entire thing as a screenshot? Because it's expensive and slow.”

Counter-intuitive insight delivered as a punchline -- works as a standalone clip→ IG reel cold open↗ Tweet quote

12:29

“In the end, you didn't notice a dang thing. Again, as the user, all of this is invisible to you.”

The payoff line that makes complex architecture feel accessible→ newsletter pull-quote↗ Tweet quote

The Script

Word for word.

Read-along

Don't just watch it. Burn it in.

See every word as it's spoken — crank it to 2× and still catch all of it. The same dual-channel trick behind Amazon's Kindle + Audible.

metaphoranalogy

Almost every RAG system suffers from the exact same problem. They can only handle text documents. So if you try to give it images, charts, graphs, whatever, most RAG systems just can't handle it.

And when I showed you LightRag yesterday, it's separate from the exact same problem. But today, I'm gonna show you the fix.

And that fix is RAG Anything. RAG Anything solves this document problem for us. It can handle images.

It can handle charts. It can handle graphs. And it allows us to create a rag system that actually deals with the documents you use.

Rag anything is from the same team that built LightRag. It plugs in directly into the LightRag system we already built yesterday. So it's really easy to introduce this into our stack.

And so today, I'm gonna show you exactly how to set it up and how it works under the hood so you can begin using one of the most powerful Rag systems out there. So in case it wasn't obvious enough from the opener, I'm going to assume you've already watched yesterday's LiteRag video. I'll put a link above if you haven't done that already Because today, I'm going to assume you've already set up your LightRag server, you understand how Rag works, and you understand this whole knowledge graph thing.

Because Rag Anything is essentially going to be a wrapper around LightRag. We're still gonna have the same LightRag web UI with some differences, but everything that gets pushed into Rag Anything, you know, these non text documents, eventually find their way to the same knowledge graph. We're gonna be asking it the same questions.

We're gonna be using the same API to query it through Claude code that we did yesterday. And the functionality we are going to be adding today is significant. It's not enough to build a rag system that is purely text.

We don't operate in a world that's purely text. How many of you have been given a PDF document that isn't even technically text? It's just scanned.

LightRag can't really handle that. Rag anything can. Now we will go a little technical today.

We'll get under the hood and I'll explain exactly how this whole system works. But big picture, what is it doing?

Rag anything is just looking at the documents that aren't text. It's basically doing exactly what LightRag does except to these non text documents. And after it creates its own knowledge graph and its own vector database, it merges it with the LightRag one, which is why everything ends up being in one nice, neat little place for us to ask questions about.

Now the only downsides about Rag Anything is it's a bit heavier. We have to download some models that live on our computer that help parse some of these non text documents. And when it comes to actually ingesting non text documents, we can't do it really through the light rag UI.

We have to use a script. Luckily, is where Claude code comes in. So for you, the user, after you set all this up, all you have to do to ingest non text documents is tell Claude code, hey.

Go ahead. Use the rag anything skill and ingest this document. It's that simple, and you ask the questions the same way you did before.

So really not too bad. And again, get all this functionality just by doing that.

Now before we go into how rag anything actually works, just wanna give a quick plug for my Cloud Code masterclass. Just came out a couple weeks ago, and it's the number one place to go from zero to AI dev, especially if you don't come from a technical background. I update this literally every week.

There's a new update coming tomorrow. So if you're someone who is really trying to master cloud code and has no idea where to start, well, this is for you.

There's a link to that in the comments. It's inside Chase AI plus. I also have the free Chase AI community.

If this is just too much for you, you're just getting started, link to that is in the description. That is where you also will find the prompts and the skills that I'm gonna talk about today. So make sure you check that out regardless.

Now let's talk about rag anything and how this thing actually works. To be honest, it's pretty simple, pretty self explanatory. So not to waste your time, I'm just gonna keep this image up for, like, ten seconds, and then we'll move on to the next thing.

Alright. Pretty good? Alright.

Let's move on. I'm just kidding. It's it's there's actually a bit going on.

This image makes it more confusing than it actually is. And if you understand what we did the other day with LightRag, remember all this conversation, you're gonna be good.

Rag anything kinda operates in a similar fashion just with a few extra steps. I wanna go through it because I think it's important to understand how these things work. You know, I think in AI in general, it's easy to become, like, super practical focused.

Like, I just wanna know how I install it, chase, and then how to use it. That's fine. You can skip ahead if that's you.

But I think if you wanna become a more mature AI dev and you kinda wanna separate yourself from the monkey I could replace you with that just hits accept accept accept and copies prompts and skills, then I think it's important to have some, you know, understanding of architecture because this is what's gonna separate you from other people and not just in terms of, like, how you can use this rag system, but in bigger level in higher level, bigger projects.

Right? This is how you begin to sort of, like, create your own skills, like, actually become good at this stuff.

So let's talk about it. So rag anything. Let's talk about the problem.

Right? The problem is I have a PDF that is a scanned PDF, and it's not really text, and yet I need to put it into my rag system.

Light rag can't handle it. So in comes RAG anything.

Right? It's got the cool llama with the six shades. So the first thing that happens is I'm going to ingest this document into RAG anything.

And the first thing it's going to do is it's going to use a program called Miner which runs on your computer completely locally for free, and it's gonna essentially break down this document into its component parts.

Miner U is an open source project. Again, it's essentially a document parser that includes a bunch of, like, miniature specialized models. All you need to know is if you're scared of this, it's open source.

I'll put a link down below. And again, this is what's going to be running and doing most of the work for us today. So Minor U is looking at this document and it says, okay.

This is a header. It creates a box around the header. It says this is text.

It says this is a chart. It says this is an image of a bar graph, and it says this is an equation written in latex. What it's done is it's looked at the document and it's broken it out, okay, into its special parts.

Minor u doesn't understand what's inside here. Minor u isn't reading the text. It doesn't get the text.

It doesn't understand what the chart is about. It just knows chart, text, image.

Okay? From there, it's going to send these component parts to individual specialized models that are part of MinorU.

So this is all invisible to you. This is all happening automatically under the hood. So the model one of the models is called, like, Paddle OCR.

That's what's gonna look at the text. So MinorU is sending this text block to Paddle OCR on your computer, and it's gonna pull out the text. Okay?

So now instead of being scanned text, it's actual text that reads company x reported strong q three twenty three. Results with revenue growth blah blah blah blah blah. Right?

Same for this Text. Same for the chart.

Right? It's also gonna turn it into text. Right?

Something an LLM can handle. Same thing with latex equations. It has a whole model that handles that.

Right? This is now no longer latex.

It's actually text except for images. So whether this is a bar chart or just it's really anything that it can't transform to text.

What it's gonna do instead is it's going to take a screenshot of it, and this is important. Alright? So now this is a screenshot.

It's an image. Screenshot. Love that.

So what do we have? We inserted a non text document.

It's been identified into its component parts, and we've taken those component parts and we've broken it down into two buckets. Right? We have the text bucket and we have the image bucket.

It's important to realize this. There's two paths that can go down, image or text. Alright?

You with me? So what it's going to do now is we're done using these internal models. Now we need to bring in the big boys.

Now we need to bring in something like GPT 5.4 mini. Of note, that isn't necessarily the case. You could keep this all local if you wanted to.

You could do something like Ollama. So now I take the text bucket and I push it to GPT 5.4 mini. And I include a prompt that says, I want you to break out this text for two things.

I want you to take that text and break it out into entities and relationships.

Remember entities and relationships? Remember our knowledge graph? Entity, entity, and sort of the relationship between them.

Okay? And I want you to break it out into what will be embeddings for vector database.

So embeddings, embed, and then I'm just going to say entities plus relationships.

Now thinking ahead, what's going to happen there? Well, the embeddings are gonna become embeddings in a vector database, and the entities and relationships are gonna become a knowledge graph just like we did with LightRag.

Right? Same thing. Same thing.

Except now now it's from the text bucket. But what about those images we had?

Right? What are we gonna do with these guys? Same thing.

This is going to get pushed to 5.4 as well, but it's going to be as a screenshot, as an OCR. So we're telling g p t 5.4, take a look at this screenshot and break it out into two things.

Right? Embeddings and also entities plus relationships.

Now why do we do that? Why don't we just shove it all into the same exact prompt and have it just OCR this entire thing? Right?

Why don't we just treat this entire thing as a screenshot? Because it's expensive and slow. What Rag Anything decided to do, and I think it's kinda smart, is it kinda takes a scalpel to this on your computer at the local level, breaking it out into text, breaking it into screenshots.

So when we go through these two paths, you're saving a ton of money and time. Because imagine you were trying to have ChatGPT look at 10,000 screenshots and then break out all the text, and from the text, break it out into embeddings and entities and relationships.

It take a lot of time and money. This is smarter. So entities and relationships from the image side, same exact thing.

It also gets a vector database, and it also gets a knowledge graph.

So what does that mean? That means from one document, we've now created four kind of things.

Right? We have two vector databases, and we have two knowledge graphs from our single non text document.

You with me? Now what do we have to do? Well, it's kind of obvious.

We need to merge these. So it's going to take these four things and just push them together.

Right? They're gonna pretty much overlay on top of one another. It's gonna match and base on entities essentially.

And you're just gonna get, you know, at the end, one vector database and one knowledge graph.

Pretty much the exact same thing we did up here with LightRag. Simple enough? If we were just using Rag anything, that would kinda be the extent of it.

However, remember, we're trying to lay rag anything on top of LightRag. I want all the power of LightRag, and I want all the power of rag anything.

So what happens now? Well, what happens is just a repeat of what you just saw. So let's kinda bring this guy down.

So now we have our rag anything set with a vector database and a knowledge graph and we have our light rag set.

So what do we do? We just merge those together. So then what happens is we get the rag everything and the light rag combined, which gives us finally one vector database and one knowledge graph.

And from there, it's just like it was before with LightRag on its own.

Right? You ask a question about whatever.

That get that question gets turned into a vector up here. It pulls the relevant vectors. And then it also goes down here, finds the correct entity, and then takes a look at what's nearby.

Okay? Maybe that was a little confusing. I hope I explained that okay.

The kind of recap To confuse you even more, what happens when I add a document that cannot be text? It goes into Rag Anything.

Rag Anything breaks out what text it can and then breaks out what images it can as well. It sends both of those to ChatGPT or whatever AI system you want. It breaks that out into embeddings, entities and relationships.

Those get turned into knowledge graphs and vector databases. We then merge those together. We now have one vector database and one knowledge graph for Rag Anything.

And since we've already been running this in LightRag or if you've added any more documents on top of that, you have an existing vector database and an existing knowledge graph.

To solve that, we simply merge them. In the end, you didn't notice a dang thing. Again, as the user, all of this is invisible to you.

Okay? None of this really matters to you. The only thing that might matter to you is what's happening over here with GPT 5.4 because it's gonna cost you some money.

But for educational purposes, that is how the Rag Anything system integrates with the LiteRag system and at the end of the day, it just means that you have a rag system that can handle non text documents.

And if you're still around after all that, now we can go into how you actually install this thing and use it. So now let's talk about the install and how to actually use it and a couple of things you need to watch out for.

So I created a one shot prompt that you can give Claude code that will install everything for you and update the proper models and all of that. All you need to do is just make sure you're in your Lightrag directory when you run this. So there's really three things it's going to be doing.

First of all, it's going to make sure we update that correct storage path since you already have a Docker LiteRag instance running. Two, we want to update the model because based on the GitHub, it, you know, was created a little while ago originally. So all the example scripts and all that use things like GPT four o mini.

So I have it on 5.4 nano. Understand you can change that if you want to. But I had it use 5.4 nano as well as keep text embedding three large so that we can just use OpenAI for everything.

It just keeps it simple. Play with it as you wish. Lastly, since we're using rag anything as essentially a wrapper on top of LightRag, some of the example scripts given in the GitHub repo are kind of wrong.

So there's like this embedding double wrap bug, which again, we just tell Cloud Code to fix and it will fix it. So you're just going to use this prompt. Again, it is inside the free school community.

Link is in the description. Just look up, rag anything, and you will find it there. And once you run that prompt, it will begin downloading everything.

And understand it's a little heavier because it needs to download MinorU and all those dependencies as well. Now let's talk about ingesting documents because this is kind of annoying and a pain in the butt. In a perfect world, the LightRag plus Rag anything situation would be very streamlined, and I could dump whatever I wanted to into LightRag slash Rag anything through a singular interface.

I could come into the UI. I could go to upload, and I could do that. You really can't with rag anything with LightRag.

You can still do this for text documents. So you can still do the normal workflow that I showed in the previous video where you go to the UI or you use the LightRag skill to upload documents.

You can't do that with Rag Anything. It has to go down essentially a different tunnel, a different pathway. But that different pathway with Rag Anything is a Python script.

There's no UI. There's no button to press. It's literally a script.

It's code you have to run. Now luckily, this is where Claude code comes in and makes it very simple because we're just going to turn that script inside of the repo into a skill.

So for you, once that skill's created, all you have to do is say, Claude code, use the rag anything skill to upload all these documents, all these non text documents. And when it does that, it will go through the minor you process.

It will take some time because it has to do all these, you know, things to it like we explained in the kinda technical section. But it will upload it to LightRag, and it will show up inside of your documents and inside of your knowledge graph.

Okay? That's the only weird part you need to know. The other weird part, to be honest, is once you do that, it also requires you to restart the Docker container.

But as part of the skill, that happens automatically. So again, from your point of view as the user, the only difference is you just need to invoke the skill.

Now this skill, the rag anything upload skill is also inside the free community. So just download it and then put it in your dot claud folder, and then it will work just fine. Now the one note on MinorU taking a while, that's because the way rag anything works when you download it, it's going to run on your CPU.

If you want it to run on your GPU, you have to have a different version of PyTorch. If that all went over your head, just if it's too slow for you, just tell Cloud Code, hey. Can we run PyTorch?

Can we run Miner U on our GPU? And it will walk you through it. Or in fact, it'll just do it all on its own.

But by default, it's just gonna run-in your CPU, so just know that. So let's see an example of this in action. So one of the documents we ingested was this PDF of NovaTech.

Right? SaaS revenue analysis. It's totally fake.

But the point is we ingested something that has this sorted bar chart. Right? So this is something that obviously would have been pulled out as an image, sent to chat GBT, yada yada yada.

Normally, LightRag wouldn't be able to handle this because it's just an image. It's chart it's hard for it to sort of break that out. But since we ran this through Rag anything, we can now ask a question via Claude Code about this.

So I asked Claude Code, can we query our LightRag database about monthly revenue trend for Novitech Inc. For January through September 2025? You can see here it actually didn't even use the skill.

It just straight up did the API request, which is fine as well with the query. What was the monthly revenue trend for Novitech Inc. From blah blah blah blah blah?

Now it gave a full response, I can take a look at the raw response if I wanted to. But what did it do? It came back with the full monthly breakdowns.

We see January '6, 4.6, February '9, four point nine, March five point four, five point four, on and on and on. So in terms of asking questions about these new documents, same thing as before. The only difference is the upload.

All you need to do is to invoke that skill that I'm giving you and then tell Cloud Code what you wanna put in there. You could point it at a whole folder. You can point it at a specific download.

It's just as easy. This is the only really weird thing you've gotta get used to is these two upload paths. But the actual question and answer is just plain language.

Plain language, even if you have you have the skills as well, which I also gave in the last video. But Cloud Code's also smart enough to understand the API structure of this whole thing because it's it's local. It's on your computer.

So that's really good when it comes to rag anything. I know the majority of this video was focused sort of on the technical aspects, but as you see, once we built that light rag foundation, actually adding rag anything on top of it isn't too hard, especially if we just use that one shot prompt I gave you.

There are some things you can tweak along the edges like anything when it comes to querying it. But really with Claude code, it's kind of in charge of all the weights that you can tune inside of LightRag.

And for that, I'm talking about if we go to the retrieval section, all the parameters here on the right. Again, Claude code knows which ones tend to be best for you. So overall, I hope this kind of explained how easy it is to set up rag anything and also how easy it is to add this level of functionality to your rag systems, which in many rag systems just isn't possible or it's very expensive.

And this is relatively cheap, especially with that whole minor u local parsing system we're able to set up. So as always, let me know what you thought. Make sure to check out Chase AI plus if you wanna get your hands on that ClodgeCode masterclass, and I'll see you around.

The Hook

The bait, then the rug-pull.

The ceiling of most RAG pipelines is not compute or cost -- it is document type. The moment a knowledge base encounters a scanned PDF or a chart embedded in a slide deck, the pipeline silently fails. This video is the fix.

Frameworks

Named ideas worth stealing.

05:12model

Two-Bucket Document Parsing

Text bucket (PaddleOCR -> LLM -> embeddings + entities)
Image bucket (screenshot -> vision LLM -> embeddings + entities)

MinerU splits any document into a text path and an image path before sending to an LLM. Each path produces its own vector DB and knowledge graph. All four artifacts are merged into one.

Steal forAny pipeline that needs to handle heterogeneous document types cheaply

CTA Breakdown