Big Idea

The argument in one line.

Hermes Agent now solves the three things that made every local AI agent impractical: it remembers your goals across sessions, multitasks without blocking, and lets you swap models mid-conversation so you stop paying premium prices for commodity tasks.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You already run Hermes Agent and want to know which of the recent updates are worth setting up first.
You are paying Anthropic or OpenAI API costs for routine tasks and want a concrete way to cut that spend.
You manage a content production workflow and want an agent that can run multi-step jobs in parallel while you film.
You want computer-use automation that works with whichever vision model you already subscribe to, not just Claude.

SKIP IF…

You have not set up Hermes Agent at all -- the video assumes familiarity with Telegram-based agent setup.
You are looking for a first-time introduction to local AI agents rather than a feature update tour.

TL;DR

The full version, fast.

Hermes Agent's latest release ships nine features that address persistent agent drift, memory loss, and cost bloat. The two structural upgrades are slash goal (a pinned multi-turn objective backed by a judge model that watches progress) and full session recall (cross-session memory with zero setup). Parallel execution comes via slash background, raw-idea-to-subtask routing via an auto Kanban, and cost control via slash model which lets you drop to a cheaper model mid-thread without losing context. For coding work, routing tasks to Codex CLI means Anthropic tokens are untouched for line-by-line generation. The host's ranked picks: slash goal, the curator, slash model, and native Codex.

Free for members

Chat with this breakdown — free.

Sign in and you get 23 free chat messages on us — ask for the hook, quote a framework, find the exact transcript moment, generate a markdown action plan. Bring your own key when you want unlimited.

Create a free account →

Chapters

Where the time goes.

00:00 – 02:14

01 · Hook + proof claims

Personal usage proof, stakes the 9-update promise, no developer knowledge required

02:14 – 03:35

02 · Webinar sponsor block

June 3 live webinar pitch for AI agency offers

03:35 – 06:53

03 · #1: Slash goal

Pinned multi-turn objectives, judge model, Ralph loop, slash sub-goal

06:53 – 07:35

04 · Goal quality caveat

Vague goals break the judge model -- specificity is the lever

07:35 – 09:32

05 · #2: Memory upgrade

Full session recall, cross-session cache, tool call indexing, zero setup

09:32 – 12:22

06 · #3: Slash background

Five concurrent background tasks, task IDs, foreground chat stays live

12:22 – 15:01

07 · #4: Auto Kanban

Triage to Specifier to subtasks to parallel sub-agents, orchestration=auto, demo with filming day brief

15:01 – 19:00

08 · #5: Computer use (any vision model)

Previously Claude-only; now GPT-5, Gemini, Grok Vision -- ClickUp navigation demo, remote task marking from phone

19:00 – 20:33

09 · #6: The Curator

Auto-running 7-day skill pruning agent, ranked skill list, zero config

20:33 – 24:50

10 · #7: Native video generation

Text-to-video via Grok or Fal.ai natively in Telegram, robot bartender demo

24:50 – 28:10

11 · #8: Slash model

Mid-conversation model swap, preserve full context, auto-selection by task complexity

28:10 – 31:04

12 · #9: Codex as a worker

Route coding to ChatGPT/Codex CLI -- Opus plans, Codex builds, Anthropic API untouched, landing page demo

31:04 – 22:55

13 · Top 4 + CTA

Host ranks slash goal, curator, slash model, native Codex as the four most slept-on; community plug

Atomic Insights

Lines worth screenshotting.

Most AI agents lose your original goal by message 10 -- Hermes slash goal pins the target and spins up a separate judge model to check whether each response is still moving toward it.
The judge model in slash goal does no work itself -- its only job is to watch the primary agent and flag drift, acting as an independent reviewer on every turn.
Session recall costs nothing to set up -- once you update Hermes, your full conversation history, tool call log, and cross-session cache are indexed and searchable by default.
Slash background lets you fire five concurrent research or inbox tasks and keep chatting in the foreground while they run, each with a unique task ID for later reference.
Computer use in Hermes now works with any vision-capable model -- GPT-5, Gemini, Grok Vision -- so you are not locked to Claude to drive your screen remotely.
The Curator runs silently every seven days, ranks your skills by usage frequency, and prunes dead ones without any manual intervention.
Native video generation inside Hermes removes the need for a separate AI video subscription -- text-to-video runs directly from chat using Grok or Fal.ai as the backend.
Slash model lets you drop to a cheaper model mid-conversation without losing a word of context -- most routine tasks do not need the most expensive model.
Instructing Hermes to auto-select the model tier based on task complexity means cost savings happen automatically, not just when you remember to manually downgrade.
Routing coding work to Codex CLI means the build runs on your ChatGPT subscription, not your Anthropic API budget -- Opus plans, Codex executes.
The goal spec quality determines the judge model quality -- a vague goal like build me an app gives the judge nothing to check against; specificity is the lever.
The auto Kanban Specifier takes a one-line brief, produces a full spec, breaks it into subtasks, and dispatches sub-agents in parallel -- what used to take a full afternoon now finishes while filming one video.

Takeaway

Nine ways to stop babysitting your AI agent.

WHAT TO LEARN

The core failure mode of every local AI agent is drift -- losing the goal, losing the context, blocking you while it works -- and these nine updates address all three at once.

Pinning a goal with an explicit success criterion and a judge model is structurally different from just typing instructions -- the judge creates an independent review loop that catches when the agent wanders.
Cross-session memory recall costs nothing to configure; the value comes from being specific enough in your original requests that the indexed history is actually retrievable later.
Running parallel background tasks requires thinking in queues: fire multiple jobs simultaneously rather than waiting for each one to complete before starting the next.
A raw idea dropped into an auto-orchestrated Kanban only produces useful subtasks if the original brief is tight -- vague inputs produce vague specs, regardless of how sophisticated the Specifier agent is.
Model-switching mid-conversation makes economic sense only if you audit which tasks in your workflow actually require high-reasoning models versus which ones are cleanup, formatting, or lookup.
Routing code generation to a CLI worker rather than the primary agent changes the cost structure of building: planning tokens are cheap, but execution tokens on premium models add up fast across long builds.
A self-maintaining skill library compounds over time -- skills you use daily rise to the top, dead weight gets pruned, and the agent response quality on frequent tasks improves without manual intervention.
Computer use gains most of its practical value not from obvious demos but from the edge case: completing a task on your machine while you are physically away from it.

Glossary

Terms worth knowing.

Ralph loop: Hermes term for a goal that stays pinned across every conversation turn until explicitly cleared -- the agent locks on target regardless of how many messages have passed.
Judge model: A second agent spawned by slash goal whose only function is to evaluate whether the primary agent output is progressing toward the stated goal, flagging drift rather than doing content work itself.
Slash background: A Hermes command that dispatches a task to run asynchronously in the background, leaving the foreground chat fully interactive. Each background job gets a unique task ID.
Auto Kanban: Hermes built-in project board with a Specifier agent that converts a raw triage idea into a full spec, decomposes it into subtasks, and routes each to a sub-agent when orchestration is set to auto.
Curator: A background maintenance agent built into Hermes that runs on a 7-day cycle, scores all skills by usage frequency, and prunes low-use ones to keep the skill library clean without user intervention.
Codex CLI: OpenAI command-line coding agent that Hermes can use as a worker node, routing line-by-line code generation to the user's ChatGPT subscription rather than consuming Anthropic API tokens.
Slash model: A Hermes command that swaps the active language model mid-conversation to a cheaper tier, a different provider, or a locally-run model without losing any prior context in the thread.
Computer use: An agent capability that lets Hermes control the host machine's GUI by clicking, navigating, and interacting with desktop apps, driven by any vision-capable model the user has configured.

Resources

Things they pointed at.

02:14productAI Accelerators webinar ↗

01:34linkSkool community Systems to Scale ↗

20:33toolFal.ai ↗

24:50toolOpenRouter ↗

24:50toolDeepSeek ↗

28:10toolCodex CLI ↗

Quotables