Big Idea

The argument in one line.

Giving an AI agent real utility requires giving it real access to your accounts, but a properly isolated architecture -- dedicated machine, MCP gateway, VLAN segmentation -- lets you minimize the blast radius to an acceptable level rather than choosing between useless and dangerous.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You already use agentic tools like Cursor or Codex and want to extend that power to non-code tasks like email triage, Slack categorization, and calendar management.
You are comfortable SSH-ing into a remote box, running background services, and babysitting agent work without panicking when something breaks.
You want a privacy-first alternative to cloud-managed assistant products where your OAuth tokens stay on hardware you physically control.
You have a spare machine or are willing to buy a Mac Mini or cheap VPS to dedicate entirely to an agent runtime.

SKIP IF…

You do not have a basic networking mental model -- VLANs, firewall rules, and port whitelisting are assumed throughout with no hand-holding.
You want a point-and-click personal assistant with no configuration overhead; this setup took the author weeks and he still calls it overkill.

TL;DR

The full version, fast.

Hermes Agent (by Nous Research) is a self-improving personal assistant that connects to your real data and builds Skills and cron jobs to stay useful over time. The author runs it on a Mac Mini locked to its own VLAN, with all OAuth tokens living on a NAS behind an Executor MCP gateway on a separate VLAN connected only via a single whitelisted port -- so a fully compromised agent machine still cannot reach your email or cloud accounts. The honest caveat: this is hyper-early-adopter territory. Start with OpenAI frontier models, lock in your workflows, then downgrade to a smaller cheaper model; that switch alone can cut inference cost by a factor of 100.

Chapters

Where the time goes.

00:00 – 02:49

01 · What Hermes Is

Introduction to Hermes as the evolution of OpenClaw; Skills-based memory demo inside a private Discord server; honest pitch that this is not for day-to-day coding.

02:49 – 04:46

02 · Sponsor: WorkOS

WorkOS auth platform -- AuthKit for consumer auth, SSO/SAML for enterprise, auth.md spec and MCP server authentication.

04:46 – 08:32

03 · Real-world usefulness and honest caveats

Personal anecdote: AT&T killed his internet for four days and he genuinely missed the agent. Caveats: requires technical understanding, not ready for non-technical users.

08:32 – 09:50

04 · Why Hermes over OpenClaw + model strategy

Python research project argument; frontier models first, downgrade after workflows lock in; $200/month OpenAI sub as the minimum viable inference budget.

09:50 – 13:30

05 · Security framing and Executor intro

The capability/surface-area tradeoff illustrated with email CRUD access; supply chain attack risk; Executor as an MCP gateway with disclosed YC investment.

13:30 – 15:46

06 · Architecture walkthrough

Mac Mini (VLAN 40) gets full root access; NAS (VLAN 30) hosts Executor with all OAuth tokens; Gmail locked to read-only at gateway level; zero personal accounts on agent machine.

15:46 – 17:09

07 · MCP composability problem

Why raw MCP requires 7-8 sequential tool calls vs. one CLI pipe; Executor code mode as the fix.

17:09 – 18:50

08 · Firewalla VLAN rules

Default-block all NAS traffic; two explicit allow rules -- port 4789 for Executor MCP and a backup folder protocol; already 12,000 hits in the first week.

18:50 – 21:01

09 · Permission system and sub-agent approval

Hermes built-in yes/no permission prompts in Discord; custom hook: sub-agent on low-reasoning model auto-approves low-risk operations, escalates deletes to human.

21:01 – 23:46

10 · Data preservation and OAuth setup

Why data preservation is the top concern; Codex desktop computer use for Google Cloud OAuth grunt work; GOG CLI and Composio as easier alternatives.

23:46 – 24:41

11 · Wrap-up

Setup diagrams on davis7.sh/home-server; recommendation to paste diagrams into ChatGPT to get guided through your own setup.

Atomic Insights

Lines worth screenshotting.

The more capabilities you give an AI agent, the more useful it is -- and the larger the surface area for damage. There is no architecture that eliminates this tradeoff, only ones that minimize it.
Running a Hermes Agent on a dedicated machine with no signed-in accounts means a fully compromised box still cannot touch your email or cloud data.
MCP is not composable like a CLI: a seven-step bash pipe is one tool call; the same task via raw MCP requires seven sequential round trips.
An MCP gateway running in code mode lets the agent write its own tool chains rather than calling each tool individually, recovering most of the composability advantage of CLIs.
VLAN isolation between the agent machine and the NAS means the only permitted traffic is one whitelisted port -- everything else is blocked by default at the firewall.
Skills in Hermes function as a practical substitute for a general memory system, since no good general memory solution for agents exists yet.
Use frontier models while figuring out the shape of your workflows; drop to a smaller model only once cron jobs and Skills are locked in -- the cost difference can be 100x.
A sub-agent running on low-reasoning inference can auto-approve most permission prompts and only escalate destructive operations, removing the human-approval bottleneck.
OAuth token setup in Google Cloud Console is genuinely painful; Codex desktop computer use is currently the most practical tool for automating that grunt work.
Hermes is hyper-early-adopter territory: the value is real, but you must understand what it is doing under the hood to maintain and optimize it.

Takeaway

The blast radius framework for safe agentic systems.

WHAT TO LEARN

The right question for any AI agent setup is not whether it can be compromised but how much damage a compromise can cause.

01 · What Hermes Is

Skills function as a practical memory system when no generalized agent memory solution exists yet.

03 · Real-world usefulness and honest caveats

This category of tool requires enough technical depth to diagnose what the agent is doing when things break -- that bar is still real in 2026.

04 · Why Hermes over OpenClaw + model strategy

Start every agentic workflow on the most capable frontier model and only downgrade after the workflow is repeatable; a 100x cost reduction is waiting once you have stable cron jobs and Skills.

05 · Security framing and Executor intro

Giving an agent access to your real accounts is what makes it useful -- isolating where those credentials live is what keeps the risk manageable.
Supply chain attacks through AI-installed packages are a real and underappreciated attack surface for autonomous agents.

06 · Architecture walkthrough

Put the agent on a dedicated machine with no personal accounts signed in; the worst case for that machine is a token rotation, not a data breach.
Store OAuth tokens on a separate, VLAN-isolated device and route all tool calls through a gateway that enforces read-only or scoped permissions -- the agent machine never needs the keys.

07 · MCP composability problem

The MCP composability gap means code-mode gateways dramatically outperform raw tool-list approaches for complex workflows; pick a gateway that supports code mode.

08 · Firewalla VLAN rules

Default-blocking all NAS traffic and whitelisting only the MCP port limits what a compromised agent machine can actually reach.
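The default-block-plus-allowlist idea can be sketched in a few lines of Python. This is not Firewalla configuration, just a toy model of the decision the firewall makes. The VLAN numbers (40 for the agent machine, 30 for the NAS) and port 4789 come from the chapter notes; the `AllowRule` type and `is_allowed_to_nas` function are hypothetical names. The second allow rule (the backup folder protocol) is left out because the source does not name its port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    src_vlan: int
    dst_vlan: int
    dst_port: int


# VLAN 40 = agent Mac Mini, VLAN 30 = NAS, 4789 = Executor MCP (per the chapter notes).
ALLOW_RULES = [AllowRule(src_vlan=40, dst_vlan=30, dst_port=4789)]


def is_allowed_to_nas(src_vlan: int, dst_vlan: int, dst_port: int) -> bool:
    """Default deny: traffic passes only if it matches an explicit allow rule."""
    return AllowRule(src_vlan, dst_vlan, dst_port) in ALLOW_RULES


print(is_allowed_to_nas(40, 30, 4789))  # MCP traffic from the agent: True
print(is_allowed_to_nas(40, 30, 22))    # SSH from the agent to the NAS: False
```

The point of the design is that nothing has to be enumerated on the deny side: any port nobody thought about is blocked because it never appears in the allow list.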

09Permission system and sub-agent approval

Permission prompts do not have to be a human bottleneck -- a cheap sub-agent on low-reasoning inference can auto-approve low-risk tool calls and escalate only destructive ones.
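A rough sketch of that approval flow, assuming a hook that receives a tool name and must return approve or deny. None of this is Hermes's actual hook API: `review`, `keyword_risk`, and the word list are hypothetical, and the keyword check is only a stand-in for the low-reasoning sub-agent described in the video.

```python
from typing import Callable

# Assumption: which verbs count as destructive. The source only says deletes are escalated.
DESTRUCTIVE_WORDS = ("delete", "trash", "remove")


def keyword_risk(tool_name: str) -> str:
    """Stand-in for the cheap sub-agent: 'high' if the tool name looks destructive."""
    name = tool_name.lower()
    return "high" if any(word in name for word in DESTRUCTIVE_WORDS) else "low"


def review(
    tool_name: str,
    ask_human: Callable[[str], bool],
    assess: Callable[[str], str] = keyword_risk,
) -> bool:
    """Auto-approve low-risk calls; hand anything else to the human."""
    if assess(tool_name) == "low":
        return True
    return ask_human(tool_name)


# The human callback here always says no, so only the auto-approved call goes through.
print(review("gmail.get_profile", ask_human=lambda tool: False))     # True
print(review("gmail.delete_message", ask_human=lambda tool: False))  # False
```

Passing the assessor in as a parameter keeps the escalation logic separate from whatever model ends up doing the risk judgment.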

Glossary

Terms worth knowing.

Hermes Agent: An open-source self-improving AI personal assistant by Nous Research that builds Skills from experience and integrates with tools via MCP, designed to run continuously on dedicated hardware.
OpenClaw: An earlier open-source personal AI agent framework that Hermes is compared against in this video; the predecessor that popularized the Telegram-based interface model.
MCP (Model Context Protocol): A protocol that lets AI models call external tools and data sources. Each tool call is a separate round trip, which limits composability compared to shell piping.
Executor: A YC-backed MCP gateway tool (executor.sh) that acts as a permission layer between an agent and its data sources, storing OAuth tokens off the agent machine and blocking destructive actions at the gateway level.
VLAN (Virtual LAN): A logically isolated segment of a local network. Used here to put the agent machine and the NAS on separate VLANs so they cannot communicate except through explicitly whitelisted ports.
Firewalla: A consumer network security device that acts as a firewall and VLAN manager, used here to enforce the default-block rule between the agent machine and the NAS.
Skills: In Hermes, reusable instruction sets the agent builds and refines over time to remember preferences and workflows -- the practical substitute for a generalized memory system.
Code mode: An MCP gateway feature (available in Executor) where the model writes code to discover and call tools rather than selecting from a flat list, enabling multi-step pipelines in a single invocation.
Supply chain attack: A security attack where malicious code is injected into a legitimate open-source package. Particularly relevant for autonomous agents that install packages without human review.
NAS (Network Attached Storage): A dedicated storage device on the local network. Used here as the trusted, more-secure host for the Executor MCP gateway and all OAuth credentials.

Resources

Things they pointed at.

00:00 · product · Hermes Agent

02:49 · product · WorkOS

11:40 · tool · Executor

18:20 · product · Firewalla

23:00 · tool · GOG CLI

23:10 · tool · Composio

23:46 · link · davis7.sh/home-server

Quotables

Lines you could clip.

10:20

“You should initially build it on the frontier, most powerful models, cost be damned. Once it is locked in, swap it over to a smaller model and you will cut your price down by a factor of a hundred.”

Quotable heuristic with a specific number that applies to any agentic product. → TikTok hook

10:55

“The more capabilities you give them, the more useful and powerful they get. But at the same exact time, you are also increasing the surface area for damage it can do.”

Clean two-sentence expression of the central tradeoff of agentic systems. → IG reel cold open

16:02

“The true problem with MCP is not any of the stuff with the spec or anything like that. It is the fact that MCP is not composable in the way that CLIs are.”

Contrarian reframe that cuts through the spec-debate noise; high share value in developer circles. → newsletter pull-quote

The Script

Word for word.

Wanna talk about Hermes agent. This is not a piece of technology that I ever really imagined myself finding all that useful. It is kind of like the evolution of OpenClaw.

If you're not familiar with OpenClaw, what it is is it's this new brand of agent effectively, kind of like a coding agent, except it has a lot more features built into it, like integrations, cron jobs, skills, gateways, all these different things you need to make a customizable personal assistant type program.

And I've tried using these things a couple times over the last few months, I just never really could get over a couple hurdles, which we're gonna go over in this video. These things are not perfect. I am very impressed with Hermes, I think it's awesome, and I do highly recommend setting it or something like it up.

But there's catches. The pitch of these things being your own customizable personal assistant, it is very possible to get it there. I have it there at this point where I am doing an insane amount of like, not dev work, like this is not something honestly for day to day dev work writing code.

This is more for everything that surrounds that in like more normal business-y type things, which is not what I usually talk about on this channel, but again, just hear me out on this one. It is really, really cool what you can do with this. And I think the best place to start here is actually in a personal Discord server.

I have my personal home that only has me and my bot in it. This is the place where I communicate with my Hermes agent. Most people are using OpenClaw or Hermes with Telegram.

I don't really know how that picked up, but it is pretty convenient to set up, and it is a really cool experience to just send a text and get a response back that sounds vaguely human and is doing personal assistant y type tasks, but it's honestly not a great experience for doing normal work stuff. I apologize that, like, I wanna show deeper stuff within the Discord, but this is all, like, very private sensitive information that is both personal and business related, which I unfortunately just can't really show on camera.

You're just gonna have to take my word for it that the fact that you have these threads over here is a much better system for interacting with it. Like, for example, I have this right here where I sent a message. I added Columbia one, which is what I named my bot.

If you wanna know why I named it that, just go read the Sun Eater series. It's from that. I won't say anything more.

But then you just give it a prompt or a question or whatever, and it will go through and actually do it. It's doing some skill views because Hermes is very deeply built into skills. We'll talk about that in a bit.

It's looking at the cron jobs it has set up, and also taking a look at the Discord response style skill, which is not actually an included skill, I believe. I think this is one it automatically added in for me as I was using it. Because a lot of the reason why I've been enjoying Hermes a lot is because the team is doing a ton of work to make it self improving, quote unquote, where like, everyone knows that these agents would be more useful if they had a good memory system.

That's obvious. The problem is we don't have good memory systems. We have good solutions for individual pieces of it, but there's no good generalized memory solution for agents at this point.

It just doesn't exist. So their solution is Skills, and honestly, they work pretty remarkably well. This response is formatted the way I like it formatted.

It's giving me good information, telling me what my skill groups are. I have stuff like Gmail, Google Drive, Notion, Todoist, read-only Slack, X analytics, YouTube databases, knowledge management stuff with g brain, because I am actually using g brain.

All memes aside, it's actually quite good. And, you know, a bunch of other stuff in here. I also have a bunch of cron jobs set up in here for like ingesting stuff into the g brain at the end of the day.

Some reminders coming up for when I'm gonna be flying again in the future, because it has access to my email, it can figure that out, which is really nice. It's also constantly checking my emails for me, because I have way too many inboxes at this point, because I technically speaking have about five jobs right now. It's a mess.

It helps a ton to have a first pass from an agent at this point, also synchronizing a bunch of data from YouTube, it's really good at doing that. But as useful as it is, I don't know if everyone should honestly be using this at this point. I see the vision, I know where it's going, but it's not there yet.

It has a lot of problems, which fortunately, today's sponsor doesn't. Today's sponsor WorkOS is a one of a kind auth solution, because it is the only one that can take you from your first commit all the way up to your first enterprise contract. When you're first getting started, they've got AuthKit to get you a really nice experience of just adding that sign in with Google button to your site and shipping it to some end users.

Then as you grow a little bit more and you need to start selling to businesses, which is happening earlier and earlier for real companies these days, especially as all this AI stuff takes over. They are the one you want when you start doing these enterprise contracts. SSO, SAML, Okta, all of these insane words that I don't even wanna think about.

They handle them for the biggest companies in the Valley, OpenAI, Anthropic, Cursor, Perplexity, Vercel, you get the picture. But they even go beyond just that. They are the only auth provider I know of that is taking agents this seriously.

For one, they have the new auth.md spec that they introduced that's a full open spec anyone can use that makes it really easy for an agent to sign into a platform that has an auth.md root on its site, and they're also the only sane way to authenticate MCP servers. I have tried a dozen different solutions for this.

They are the ones who got it right, they invested in it early, and it has definitely paid off. WorkOS has everything you need to do auth right, and considering you get 1,000,000 users for free, there is no reason to not build on top of it at davis7.link/workos. So like I said, Hermes Agent is great.

I do highly recommend looking into this. This is clearly where things are going. The fact that I can just open up my phone, go into Discord, send it a prompt, and get useful information out of that is absurdly useful.

This has changed the way I do a lot of things. I really can't imagine living without this to the point where, like, when I was back in Ohio last week traveling to see family, AT&T, being AT&T, fucked up the account transfer between my old roommate and me, because he moved out. So we needed to transfer the fiber line over, and when they did the account transfer, they screwed it up so that I didn't have internet for about four days last week, which meant that while I was back in Ohio, all of my home server stuff, my Hermes agent, my NAS, everything just went offline, and god, I missed it.

Once you get it going and you get it set up right, it's worth it, but I went through hell to set this thing up right. I do not think that everyone needs to go as insane as I did to build like a crazy home server rack for this thing. Like this is very overkill and very much just me being paranoid right now, and this required a pretty deep technical understanding of how all these things fit together.

Am I an expert in any of this? No. But I understand it well enough to talk to the agents and have them configure everything for me, babysit their work, and make sure they're doing everything correctly.

I cannot fathom a nontechnical person setting this up in a way that wouldn't be a horrific security nightmare and cause infinite numbers of problems. Like, will these things get there someday? 100%.

Like, I can see the vision at this point. It's going to happen. It's a matter of when, not if, but they're not there yet.

If you want to use this today, you need to know how all this stuff fits together to get the optimal experience. Even with things like the skills, knowing how these things work helps a ton. There's this weird design decision, which I understand why they made it, and I think it's the correct choice.

Hermes Agent is more abstracted than anything else I've ever worked with, and we're definitely trending in this direction. This is the way software's been trending for the last thirty years. We're trending to higher and higher levels of abstraction, and bigger and bigger black boxes.

Where when I'm just looking at the Discord server with my Hermes Agent, I can't intrinsically see what all the skills it has are, what data sources are hooked up in there, what all the cron jobs are, how all this stuff fits together. It's kind of abstracted away from you. You have to either just trust the black box machine to work nicely, or you have to do a lot of investigating yourself by like SSH-ing into the Mac Mini, or maybe you're running this on a VPS, and exploring it with Codex or Pi or something like that.

I've had a lot of good luck with that, just like I have a Pi instance on that Mac Mini that I will SSH into. Open up Pi, talk to it about Hermes, give it some documentation, go back and forth, figure out how all this stuff needs to fit together properly. This stuff isn't easy, and at least it's not easy yet.

I think we're very much in the hyper early adopter phase of all this stuff, so if this is cool to you and you've watched all this, fuck yeah. Go play with this, go set up your own setups, go figure out what does and doesn't work, come up with cool creative workflows, because the things this unlocks is insane. I don't have it pulled up here for obvious reasons, but one of the things I've been doing with this is anytime we're working on a deal with a company for my channel or Theo's channel or whatever, there's a lot of emails that go into that, a lot of Slack messages that go into that, a lot of resources that get shared back and forth.

It's kind of annoying to keep on top of all these things, and especially considering I'm spread so damn thin right now. It's hard for me to really be paying attention to these things, but the fact that I now have an agent that runs in the background and can just categorize all these things for me, figure out what's going on, give me deep breakdowns of exactly what's going on, is so powerful.

It is so useful to have. Even dumb little things, like when I was flying out to Ohio a week ago, I wanted to check whether or not my planes would have Starlink on them. I was just like, hey, can you check whether my flights later this week will have Starlink on them?

It went into my email, found the confirmation info, went and checked on the flight, did a bunch of research on it, and came back and gave me an actually useful answer to that question. I love this stuff. I've been having more fun with tech over the last, like, six months than I've had in, honestly, a very long time.

The overall problems I'm now working on and solving in tech and when I'm writing code are so cool. Because we don't know what the ultimate form factor and solution for this is. Like, maybe it is the Discord server with sending it a bunch of messages, but maybe it's not.

Maybe it's a really complicated web app or desktop app that just dynamically generates UI on the fly for you, and is like super proactive in recommending things when it has these deep integrations of all your different data sources. Or maybe it's something I haven't even thought of yet.

It's really fun to be working on these problems that we don't have answers to yet, and just kinda get to fuck around and experiment and see what actually works. I love it, and if all of this sounds cool to you, go try this stuff out. When I first tested these things out, I was doing everything with OpenClaw, not Hermes.

I actually found OpenClaw to be pretty good, but there was a little bit of weirdness with it that I didn't love, and I heard some really good things about Hermes. I was initially turned off by the fact that it's a very heavy Python project, because it's like, ugh, Python, gross. But the more I thought about it, the more I'm like, wait, actually, I think that makes a lot of sense, because the type of person who would write a giant Python project like this is the type of person who's probably a researcher, and I think researchers are the right people to be making this.

I know it's just a personal assistant AI blah blah blah, whatever. I don't care. This is a research problem.

They are actively trying to figure out how you make a self improving AI agent powered by the Frontier AI models right now. What does that look like? How do we do that?

They're working on a memory system, they're working on skills, they're working on the cron jobs, they're working on the interfaces for all of this. It's a very cool novel problem, and I've been very impressed with their team overall. Last recommendation here is when you're using Hermes Agent or OpenClaw, use the most powerful frontier models.

Claude will not let you do almost any of this. You get your token budget on your $200 a month plan, and then you're gonna burn through that in like an evening, because Opus models are slower, more expensive, less efficient. I just wouldn't recommend them, even if Claude can feel a little bit nicer to talk to.

5.5 made a lot of strides in making the OpenAI models nicer to talk to, and I'm hopeful that that trend will continue into the future. If you give it a good SOUL.md type thing, and give it a lot of direction on how you want it to sound and act, the OpenAI models are really good at doing what you say. That's actually a big reason why I like them better.

The only reasonable subscription for getting enough inference to run the Hermes agent or OpenClaw agent is definitely OpenAI. The $200 a month OpenAI sub will not get you unlimited usage here.

I've been running 5.5 xhigh for the main model behind my Hermes agent for a while now. The usage doesn't burn down that fast, and I do intend to switch this to a smaller, faster model, or maybe even like 5.5 low reasoning eventually. The reason why I'm doing high reasoning is because it's an undefined problem.

The way I like thinking about these at this point is extra high reasoning is great for when you're trying to figure out what the shape of this is. You have a general idea, but you're, you need to figure it out, and that's where this is right now. But once my workflows are locked in, I have the skills, I have the cron jobs, I have the entire pipeline really clocked in, then I'll just drop it down to a smaller, more efficient model that can just gun out the workflow.

It's the same thing with development. Like if you were developing an agent system of some sort that's being served to end users, you should initially build it on the frontier, most powerful models, like cost be damned, whatever, but once it's really good and you know what the workflow looks like, then once it's locked in, you swap it over to a smaller model, let that smaller model do it, and you'll cut your price down by like a factor of a hundred, something insane like that.
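To see how a factor like that can arise, here is a small arithmetic sketch. Every number in it is made up purely so the multiplication is easy to follow; none of it is real pricing or measured usage. The idea is that two savings multiply: a cheaper price per token, and fewer tokens once the workflow no longer needs heavy reasoning.

```python
def monthly_cost(tokens_millions: float, usd_per_million: float) -> float:
    """Cost in USD for a month of inference."""
    return tokens_millions * usd_per_million


# Hypothetical numbers, not real pricing or measured usage.
frontier = monthly_cost(tokens_millions=50, usd_per_million=10.0)  # exploring the workflow
small = monthly_cost(tokens_millions=10, usd_per_million=0.5)      # locked-in workflow

print(frontier, small, frontier / small)  # 500.0 5.0 100.0
```

In this toy case a 20x cheaper price and 5x fewer tokens combine into the 100x reduction; real ratios depend entirely on the models and workload involved.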

But the thing I do wanna talk about in this video is how do you set this thing up in a way that is safe ish, like, because there's always gonna be a level of risk, and secure ish. Effectively, what I wanna talk about is minimizing the potential damage this thing can do, because we're at this point where there's this really weird balance that you have to strike with agentic systems, or whatever you wanna call them, where the more capabilities you give them, the more useful and powerful they get, but at the same exact time, you are also increasing the surface area for damage it can do.

To just give you an example of like a very basic one like email, if you want your agent to be optimally useful for email stuff, having it write drafts for you, delete spam emails, clean out all like the marketing nonsense you get is really useful. But in order to do that, you are now giving the agent full access to read, write, update, and delete.

You're giving full CRUD access to your email to an agent that is a non-determinism machine that is vulnerable to things like prompt injections, or even more nefariously, in my opinion, stuff like the supply chain attacks. I haven't talked about this a huge amount on my channel, but I've definitely gone through the security psychosis thing lately of watching NPM get hammered with these horrific supply chain attacks that are really bad, really fuck up your computer, and they're on big major repos, because GitHub and NPM is just kind of a mess right now, and we're trying to figure out how to deal with the fact that AI can push so much code out, and AI is also doing a lot of installing for us. It's not very hard to imagine an autonomous agent running, doing its thing, accidentally installing a supply chain attacked version of a package, and fucking up your whole machine, leaking all your credentials, and suddenly your email is gone.

Your email has been compromised. It sucks. I did not want this to happen.

I have some pretty damn important email inboxes that I want to keep locked down, so I went overboard on this. And the way I actually did the securing for this is with a program called Executor. Full disclosure, this is created by a very close friend of mine, Reese.

He just got into YC with Executor. He is funded, and I was actually I think I was his first check I invested in his company. So obviously, I'm going to naturally be biased there, and I wanna disclose that fully, but I still do think that this is a very good solution for this, because effectively what it is is an MCP gateway.

The way my Hermes agent works is it is running on my Mac Mini, which is sitting right here, and that Mac Mini, the Hermes agent has full root access to it. Its profiles are on there.

You can do computer use stuff. It can read and write to the file system as much as it wants. It can see whatever it wants.

Like, I don't want to constrain it and sandbox it on the box. So it has a dedicated computer that is isolated at, obviously, the kernel level, because it's its own machine, to do whatever the hell it wants. I'm not signed into 1Password on that thing, not signed into any of my Google accounts on that thing.

It is very much locked down to just be Hermes agent running on there, plus my OpenAI Codex subscription. That's the one thing that's annoying that I have to just put on there. But it's not the end of the world, like worst case scenario, if this thing really did get fucked up, I just roll the token on that, and then it's whatever.

But everything else, like Slack auth, Gmail, Notion, etcetera etcetera, none of that lives on the Mac mini. It lives on Executor. Because what Executor functions as is an MCP gateway type thing, which I think is critical for doing these things in the most secure possible way to minimize damage.

This is the full set of tools that the Hermes agent has access to. You'll notice the IP address up here is like a local address. It's 192.168.30.10.

That is the IP address for my NAS on my local network. Again, if you wanna see more networking stuff, go watch the video I just made about my home server. What I can do here is see all the tools that are configured to be used by my Hermes agent on the Mac Mini.

And if you go into any of these, like, specific sources, like, say we go into the Gmail Foundry. This is the Gmail address for my personal company. The way this works is if you go into the tools, I can see I have users, and then you have a bunch of different tools in here like Get Profile.

This is marked as green, so it is allowed to get the profile and call that tool. It is not allowed to delete, or import, or insert, or modify, or send, or trash, or untrash, or do any of these different destructive actions. I have those blocked at the MCP gateway level, and since this MCP gateway is running on the NAS and not on the Mac mini, it's a lot more secure.

None of the OAuth tokens are on the Mac Mini, so even if that machine gets completely compromised, my email is not going to get compromised. And I have the Mac Mini and the NAS isolated from each other because they are each on their own VLAN, like a virtual LAN within my private LAN. Again, crazy networking stuff.
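The gateway idea described here can be sketched as a toy Python class. This is not Executor's code or API; `Gateway`, its `call` method, and the tool names are hypothetical stand-ins. It only illustrates the two properties from the walkthrough: the credential stays inside the gateway, and each tool is either allowed or blocked there, regardless of what the agent asks for.

```python
class Gateway:
    """Toy stand-in for an MCP gateway: holds the credential, enforces a per-tool allowlist."""

    def __init__(self, token: str, allowed_tools: set[str]):
        self._token = token                 # lives only on the gateway host (the NAS)
        self._allowed = set(allowed_tools)

    def call(self, tool: str) -> str:
        if tool not in self._allowed:
            raise PermissionError(f"{tool} is blocked at the gateway")
        # A real gateway would use self._token to call the upstream API here.
        return f"ok: {tool}"


# Only the read-style tool is enabled, mirroring the "green" Get Profile tool in the demo.
gateway = Gateway(token="placeholder-oauth-token", allowed_tools={"get_profile"})

print(gateway.call("get_profile"))
try:
    gateway.call("trash")
except PermissionError as err:
    print(err)
```

Because the agent machine only ever sees the return value of `call`, compromising it yields the ability to make allowed calls, not the token itself.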

I am entirely figuring this out as I go along. The way it works is if you look at the container manager on my NAS, and you go into executor here, this guy in the YAML configuration is exposing two ports. The first one is the UI, so it's exposing the UI for Executor, which is what I just showed, on 4788, and then it is exposing the MCP server, which is a thing that the Hermes agent can connect to.

And what's nice about Executor is its code mode. I've talked about this in the past. Oftentimes, one of the issues with MCP is, obviously, if you're just looking at this list, this is a lot of tools.

Even though we're disabling a lot of these, if we didn't disable all of them, there would be 79 tools exposed just on the Gmail Foundry. That's not even counting all of these other ones, each of which have at least 20 tools, often into the hundreds of tools. That's not feasible to just dump into a model's context.

Even with the frontier level ones, I just wouldn't wanna do that. So instead of doing that, you can use an MCP gateway like this, where the model can write code to find the tools that it has access to, and then write code to actually execute the tools, which solves the other major sin of MCP, and this is one that I'm honestly sad that I didn't figure out was such a big issue until I was listening to Mario, the creator of Pi, talk about it.
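The "find the tools, then execute them" idea can be sketched as follows. This is an illustration under stated assumptions, not Executor's real interface: the `TOOLS` registry, `search_tools`, `call_tool`, and the tool names are hypothetical stand-ins for a gateway's catalog.

```python
# Hypothetical in-memory registry standing in for a gateway's tool catalog.
TOOLS = {
    "gmail.get_profile": lambda: {"email": "user@example.com"},
    "gmail.list_messages": lambda query="": [f"message matching {query!r}"],
    "calendar.list_events": lambda: ["standup"],
}


def search_tools(keyword: str) -> list[str]:
    """Return only the matching tool names, not every tool's full schema."""
    return sorted(name for name in TOOLS if keyword in name)


def call_tool(name: str, **kwargs):
    """Execute one tool by name with keyword arguments."""
    return TOOLS[name](**kwargs)


def agent_snippet():
    """What a model-written snippet might look like: discover, then execute."""
    matches = search_tools("gmail")
    return matches, call_tool("gmail.list_messages", query="invoice")
```

The model's context only ever holds the two small helper signatures plus whatever the search returns, rather than dozens or hundreds of tool definitions.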

The true problem with MCP is not any of the stuff with the spec or anything like that. It is the fact that MCP is not composable in the way that CLIs are. If you think about what a CLI is, you can chain CLIs with pipes.

You can do a grep, and then you can pipe that into a bash script, and then you can pipe that into like a file writer or something like that, whatever you wanna do, and you can do effectively like five or six steps in one tool call. The model can write these chains of pipes as it goes. It can't do that with MCP.

It has to call the MCP, get the data back from the MCP, then call the next MCP, get the data back from that, and keep going down and down, to where if you were doing the same task in CLI bash stuff versus in MCP, you would have to do seven or eight tool calls just to do the MCP version, when you could just do one tool call in the CLI version.
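The round-trip difference can be made concrete with a toy counter. Everything here is hypothetical (`CountingClient`, `fetch`, `keep_errors`, `count`); the sketch only shows that three dependent steps cost three model-visible calls when issued one at a time, and one call when composed in code.

```python
class CountingClient:
    """Hypothetical client that counts model-visible tool-call round trips."""

    def __init__(self):
        self.round_trips = 0

    def call(self, fn, *args):
        self.round_trips += 1
        return fn(*args)


def fetch():
    return ["error: disk", "ok", "error: net", "ok"]


def keep_errors(lines):
    return [line for line in lines if line.startswith("error")]


def count(lines):
    return len(lines)


# Step by step: every intermediate result goes back to the model first.
stepwise = CountingClient()
lines = stepwise.call(fetch)
errors = stepwise.call(keep_errors, lines)
total_stepwise = stepwise.call(count, errors)

# Composed: the same three steps chained in one submitted piece of code.
composed = CountingClient()
total_composed = composed.call(lambda: count(keep_errors(fetch())))
```

Both paths produce the same answer; only the number of trips through the model differs.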

So humongous difference there. Code mode does help a ton with this, and since Executor has this built in, it is a much better way for the models to actually use and call tools. But the biggest thing about this is, like I said, the isolation.

If you just look at this kinda shitty diagram I threw together here, you have the Mac Mini running on VLAN 40, you have the NAS running on VLAN 30, and then I am using the Firewalla, which is a really cool piece of tech. I highly recommend getting one of these, even if you're not doing the crazy network I'm doing.

Just get like the Firewalla orange. This is what I set up at my parents' place for my secondary off-site backup NAS in their network. It's very simple, very easy to do, and it effectively acts as a gateway in between the WAN, which is the Internet, and your local network, and gives you a lot of control over what's actually happening there.

They don't have a great desktop experience, but their mobile experience is really, really good. And what I did for the NAS VLAN is I set up a bunch of rules here where by default, I am blocking traffic to agents, which is agents VLAN 40, and I'm also blocking traffic to main, which is VLAN 10. That's the thing that, like, my Eero router is on.

If you, like, connect to my WiFi, you will just be on VLAN 10. That's the default main LAN. All traffic is blocked to and from the NAS by default everywhere else on the network.

So like, if someone came on my network and they had a compromised device or were malicious, they would have no way to actually get into the NAS, because all of that network traffic would be blocked at the firewall level. It's really powerful to have this. But you'll notice at the top here, I have two little things: allow Executor MCP, and allow backups to agents.

So what I'm doing with this is I'm making it so that on the Mac Mini, the only way it can communicate over to the NAS on my local network is over port 4789, which is the actual MCP endpoint for calling and using all of the different tools and sources I have configured here. It gets used a ton.
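The rule set described here, default block plus one allowed port, can be modelled in a few lines. This is a sketch, not Firewalla's rule format: the `/24` subnets are assumptions (the transcript names the VLAN numbers, not their ranges), `Rule` and `is_allowed` are hypothetical, and the separate backup rule is left out because its protocol and port are not named.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, ip_address, ip_network
from typing import Optional

AGENTS = ip_network("192.168.40.0/24")  # VLAN 40, assumed subnet
NAS = ip_network("192.168.30.0/24")     # VLAN 30, assumed subnet


@dataclass(frozen=True)
class Rule:
    src: IPv4Network
    dst: IPv4Network
    port: Optional[int]  # None matches any port
    allow: bool


# First match wins, so the narrow allow rule sits above the broad block.
RULES = [
    Rule(AGENTS, NAS, 4789, True),   # allow Executor MCP
    Rule(AGENTS, NAS, None, False),  # block everything else from agents
]


def is_allowed(src: str, dst: str, port: int) -> bool:
    for rule in RULES:
        if (
            ip_address(src) in rule.src
            and ip_address(dst) in rule.dst
            and rule.port in (None, port)
        ):
            return rule.allow
    return False  # default deny for traffic no rule mentions
```

Under these assumptions a host on the agents VLAN reaches the NAS on 4789 and nothing else, including the Executor UI port.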

Like, I've already had this rule hit, like, 12,000 times since setting this up, like, a week or two ago. It's constantly hitting this, constantly doing stuff with this. But again, even if the Mac Mini gets compromised, it is not going to be able to do any real damage, because it'll just be able to use the stuff that is available from the Executor MCP. I still have a lot of permissions in there that I would not love to get leaked, but it's recoverable.

And it's not like they're getting root credentials that they can use to deeply fuck up my account. It has limited scope and access, and that's the most important thing here, because good permission systems are really hard to do. The other one, of course, is allow backups to agents. That's just because I have a special user and folder set up on my NAS, so that the agent can use one specific folder over, I forget what the protocol is called.

It's the thing that lets you do, like, virtual file systems over the network. Again, I'm not a networking guy. I'm figuring this stuff out as I go.

It's nice to give it the ability to, like, save daily snapshots onto the NAS, so again, if it gets compromised, I can restore it later. Same thing with like backing up some AI models I like using, YouTube archive, that kind of thing. Now all of that is to say, the Hermes agent team is still doing some really good work to make this thing as secure as possible.

Their permission system is remarkably robust. Like, I have found that it does a really good job of catching any time the agent is doing something even remotely dangerous, and it will surface a permission, yes or no, to me in Discord. It'll just ping me like, hey, is this okay, approve or deny?

And it works really nice within the Discord UI. That is built in, they're doing good work with that. I was even able to extend on top of it, because their API system is quite good.

I added in a hook so that whenever those permission checks come in, I actually have a sub agent go off, which is using a smaller model, well, not a smaller model, it's using 5.5 on low reasoning, to take a look at the history of the thread, what the model's asking to do, and give a quick yes or no on whether or not that should be allowed right out of the box.

If it is extremely sensitive, like if it's doing a delete or something like that, it will still escalate that up to me. But most day to day things, like if it's just writing to a file or piping between things within bash commands, because God, you should see the bash commands this thing outputs, it's insane, I don't have to manually approve, which is quite nice.
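The two-tier approval flow can be sketched like this. It is not the Hermes hook API: `triage`, `Decision`, and the word list are hypothetical, and `reviewer` is a plain callable standing in for the low-reasoning sub agent that returns yes or no.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    ESCALATE = "escalate"


# Hypothetical markers for actions that always need a human, whatever the reviewer says.
SENSITIVE_WORDS = {"delete", "rm", "trash", "drop"}


def triage(request: str, reviewer) -> Decision:
    words = set(request.lower().split())
    if words & SENSITIVE_WORDS:
        # Destructive actions skip the sub agent and go straight to the human.
        return Decision.ESCALATE
    # Routine actions are auto-approved only on an explicit yes from the reviewer.
    return Decision.APPROVE if reviewer(request) else Decision.ESCALATE
```

Checking the hard rule before consulting the reviewer means a wrong "yes" from the model can never approve a delete on its own.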

It's very similar to the new auto mode in Claude Code, if you're familiar with that at all. And so with all that said, you're still probably looking at this and thinking, okay, cool, but there's definitely still holes in this. There are ways this could fall apart. It still has access over that port, and that port still does expose things like reading email if the machine gets compromised.

That is the hard reality of this, that in order to make these things useful, you have to give them a lot of power and take on some risk there. Nondeterministic machines can do nondeterministic machine things, and really what I'm trying to do with this whole setup is I'm just trying to minimize the blast radius of the worst possible case scenario.

The reason I set up the NAS, and the reason I have the like cross country backup and all that stuff, is because the thing I am most worried about right now is data preservation. I do not want to lose my most important stuff. There's a lot of things that I have in there that I want to keep for a very, very long time.

Don't wanna risk that. Everything else is recoverable. Critical old data, photos, things like that are not recoverable, so this whole system does a lot of work to make losing them as close to impossible as humanly possible.

The other problem you run into here is the fact that it is really hard to hook up all these data sources in a way that actually feels good and works well. Honestly, the only reason I have all of these hooked up the way I do is because the Codex desktop app computer use is as good as it is. In order to get all of these different data sources working, you have to actually set up a client within the Google Cloud dashboard with the proper permissions and scoping and all that different stuff to log in to this thing.

That's a pain in the ass to do. One, you have to open up the Google Cloud dashboard, which I would not wish upon my worst enemy. I guess Codex now is my worst enemy, because I made Codex do all of this.

The Codex desktop app was quite good for just like going in, making the credentials, doing the shitty annoying grunt work of clicking through the dashboard, getting all those keys, copy pasting them into the executor dashboard, getting all that set up and ready for me, starting the sign in, getting it all the way up to the finish line, and then asking me, hey, does this look good?

Can I actually add this to your executor profile and sign into it? I don't think it's a perfect solution, but it's the best we've got right now. I hope that these companies come up with ways to make integrating their stuff a lot less painful in the future, but as of right now, OAuth scopes and a bunch of weird stuff like that is something you really do have to deal with, so if you're trying to do this at the most optimal level like the way I am here, you kinda have to go through all of this.

There are other solutions that make this easier, like there's an open source one called the GOG CLI, which I believe was created by Peter, the creator of OpenClaw. Effectively what that does is wrap the Gmail API with a nice little CLI that makes it way easier for you to sign in, manage permissions, do all that stuff for connecting up Gmail.

Again, the way you make these things useful is you give them access to a lot of other stuff. Composio is also a really good option for all this stuff. It is a much more managed solution that will handle a lot of the permissions and gateway stuff for you.

Much easier to work with out of the box, much less pain that you have to go through, but I, you know, I'm insane. I really like having huge amounts of control over all this stuff, which is why I went for Executor, which as of right now is a much lower level, more customizable version of all this that just worked quite nicely for me.

So yeah, Hermes Agent is dope. If there's something else you wanna hear me talk about on this, like more specific workflows or recommendations on how you should tune yours, happy to make that in the future.

I'll have a link down below to a bunch of resources on how I did my setup, like I have this page on my site for my home server. All of the details and instructions aren't in here, but what is nice about this is this diagram, or this diagram for the smaller setup. If you just copy paste these into, like, chatgpt.com, and just start asking it about how you should do this setup, it'll do a really good job of guiding you through it.

This will just give you like the pieces to seed the context with, and then you can go figure it out for yourself. I don't wanna prescribe things right now. We don't know enough, I don't know enough, it's too early.

You need to go experiment and play with this for yourself, and come up with your own solutions, and make sure you go in with the attitude of trying to understand what it's doing, because that will help a ton with everything from maintenance to just making it more powerful. If you understand it, you can optimize it a lot better than just, like, blindly asking the model to regress to the mean and seeing what it comes up with.

If you like the video, make sure you like and subscribe, and I'll talk to you again soon.

The Hook

The bait, then the rug-pull.

Ben Davis opens with a confession: Hermes Agent is something he never expected to care about. Twenty-four minutes later, it has changed how he manages email across five jobs, tracks sponsorship deals, and checks whether his flights have Starlink -- all from a Discord DM.

Frameworks