You Should Install Hermes Agent This Weekend
Cheap 1M-context models changed the model layer. Claude Code and Codex changed the coding layer. Hermes is starting to look like the runtime layer.
This article was originally published on X last Friday (8 May).
The mistake is treating it as another Claude Code competitor. The more useful setup is Hermes next to Claude Code or Codex.
Hermes is starting to look like the runtime layer that keeps long-running software work alive after the chat ends.
Hermes Agent v0.13.0 shipped May 7 with 864 commits, 588 pull requests, 295 contributors, and 8 closed P0 security issues in 7 days. The previous release had 1,096 commits.
The reason to test it this weekend is that all three layers changed at once: cheaper 1M-context models that proved they can hold up under real coding work, stronger coding agents, and a Hermes release focused on persistence instead of demo features.
TLDR
Install Hermes v0.13 this weekend as the runtime layer next to Claude Code or Codex, not as a replacement.
What it ships: durable Kanban, persistent /goal, Checkpoints v2 with /rollback, gateway auto-resume, post-write linting, 8 P0 security fixes.
The weekend test: one real test-failure cleanup on one real repo. By Monday: one repo, one goal, one model route, checkpoints, one failing test path closed.
Provider: OpenCode Go bundles DeepSeek V4-Pro and MiMo-V2.5-Pro behind one key for $5 the first month, $10 after.
What changed in the layers around the agent
DeepSeek V4-Pro went live April 24, 2026 at 1M context, currently 75% off through May 31 at $0.435 input and $0.87 output per million. MiMo-V2.5-Pro lands at $1 input and $3 output per million on OpenRouter under MIT license, also at 1M context. It scores 78.9 on SWE-bench Verified and 68.4 on TerminalBench 2.0.
OpenCode Go bundles both models into a $5-first-month, $10-after subscription. Model access at this context length is becoming less of the bottleneck.
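To put those list prices in per-task terms, here is a rough back-of-the-envelope sketch. The per-million prices are the ones quoted above. The model keys and the token counts (800k in, 50k out) are made-up illustration values, and the math ignores caching discounts or any other billing detail.

```python
# Per-million-token prices (USD) as quoted in this article.
# The dictionary keys are labels for this sketch, not official model IDs.
PRICES = {
    "deepseek-v4-pro": {"input": 0.435, "output": 0.87},  # promotional rate through May 31
    "mimo-v2.5-pro": {"input": 1.00, "output": 3.00},     # OpenRouter price
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one run at the listed per-million prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: one long session that reads 800k tokens and writes 50k.
for name in PRICES:
    print(f"{name}: ${run_cost(name, 800_000, 50_000):.4f}")
```

Under those assumptions a near-full-context session lands at roughly $0.39 on DeepSeek V4-Pro and $0.95 on MiMo-V2.5-Pro, which is the sense in which model access stops being the bottleneck.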
Claude Code and Codex are the strongest foreground coding workers most teams already use.
The runtime layer is what breaks first when developers try to use these tools for real work. The chat ends. The context evaporates. The goal is forgotten. The bad edits stay. Models route to whatever was configured an hour ago. The repo’s conventions are not inherited. This is the layer Hermes is going after.
Their Tenacity Release
Hermes Agent v0.13.0, tag v2026.5.7, shipped one week after the Curator Release. The feature names read like a runtime. Durable Kanban. Persistent /goal. Checkpoints v2. Gateway auto-resume. Post-write linting. A security wave. Providers as plugins.
Kanban is a durable task board backed by local SQLite, designed for work that crosses agent boundaries, restarts, and human handoffs. Workers run as full OS processes with their own identity. Heartbeats prove liveness, missed heartbeats trigger reclaim.
Zombie detection catches workers that stopped responding. Per-task retry budgets prevent infinite loops. A hallucination-recovery gate verifies completion claims before tasks close, catching workers that report “done” on patches that never landed.
Kanban is less interesting as a board and more interesting as a survival layer for agent work.
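For readers who want the mechanics, here is a minimal sketch of heartbeat-based reclaim with a retry budget on top of SQLite. It is not Hermes's schema or code. The table layout, the 30-second timeout, and the budget of 3 retries are all assumptions chosen for illustration.

```python
import sqlite3

HEARTBEAT_TIMEOUT = 30.0  # seconds of silence before a worker counts as a zombie (assumed)
MAX_RETRIES = 3           # per-task retry budget (assumed)


def open_board(path: str = ":memory:") -> sqlite3.Connection:
    """Create a toy task table. Status is queued, running, failed, or done."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'queued',
               retries INTEGER NOT NULL DEFAULT 0,
               last_heartbeat REAL
           )"""
    )
    return db


def claim(db: sqlite3.Connection, task_id: int, now: float) -> None:
    """A worker takes a queued task and records its first heartbeat."""
    db.execute(
        "UPDATE tasks SET status='running', last_heartbeat=? WHERE id=? AND status='queued'",
        (now, task_id),
    )


def heartbeat(db: sqlite3.Connection, task_id: int, now: float) -> None:
    """A live worker refreshes its timestamp."""
    db.execute(
        "UPDATE tasks SET last_heartbeat=? WHERE id=? AND status='running'",
        (now, task_id),
    )


def reclaim_zombies(db: sqlite3.Connection, now: float) -> int:
    """Requeue stale running tasks; fail the ones that are out of retries."""
    cutoff = now - HEARTBEAT_TIMEOUT
    # Exhausted tasks are marked failed first so they cannot loop forever.
    db.execute(
        "UPDATE tasks SET status='failed' "
        "WHERE status='running' AND last_heartbeat < ? AND retries >= ?",
        (cutoff, MAX_RETRIES),
    )
    cur = db.execute(
        "UPDATE tasks SET status='queued', retries=retries+1 "
        "WHERE status='running' AND last_heartbeat < ?",
        (cutoff,),
    )
    return cur.rowcount  # number of tasks put back in the queue


db = open_board()
db.execute("INSERT INTO tasks (title) VALUES ('Fix failing auth tests')")
claim(db, 1, now=0.0)
heartbeat(db, 1, now=10.0)
print(reclaim_zombies(db, now=20.0))   # heartbeat is fresh: nothing reclaimed
print(reclaim_zombies(db, now=100.0))  # heartbeat is stale: task goes back to the queue
```

Timestamps are passed in explicitly so the behavior is easy to test. A real board would read the clock and would also need the completion check the release describes, which this sketch leaves out.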
/goal <text> sets a standing objective and starts the first turn. After each turn, an auxiliary judge model checks whether the goal is done. If not, the agent continues, up to a 20-turn budget. Subcommands cover /goal status, /goal pause, /goal resume, /goal clear. The state lives in the session DB and survives /resume.
Most agents remember the last message. Hermes is trying to remember the mission.
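The control flow behind a standing goal can be sketched in a few lines. This is an illustration of the loop as described above, not Hermes's implementation: run_turn and judge_done are hypothetical stand-ins for the agent and the auxiliary judge model, and only the 20-turn budget comes from the release notes.

```python
from typing import Callable

MAX_TURNS = 20  # turn budget quoted in the article


def run_goal(
    goal: str,
    run_turn: Callable[[str, int], str],
    judge_done: Callable[[str, str], bool],
    max_turns: int = MAX_TURNS,
) -> tuple[bool, int]:
    """Run turns until the judge says the goal is met or the budget runs out.

    Returns (goal_met, turns_used).
    """
    for turn in range(1, max_turns + 1):
        transcript = run_turn(goal, turn)
        if judge_done(goal, transcript):
            return True, turn
    return False, max_turns


# Toy stand-ins: the "agent" fixes one of three failing tests per turn.
def fake_turn(goal: str, turn: int) -> str:
    return f"{max(0, 3 - turn)} tests failing"


def fake_judge(goal: str, transcript: str) -> bool:
    return transcript.startswith("0 ")


print(run_goal("Fix the failing tests", fake_turn, fake_judge))  # (True, 3)
```

The budget is what keeps a wrong judge from spinning forever: if judge_done never returns True, the loop still stops after max_turns.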
Checkpoints v2 rewrites state persistence with real pruning and disk guardrails. Snapshots run before write_file, patch, and destructive terminal commands. /rollback lists checkpoints. /rollback <N> restores to checkpoint N and undoes the last chat turn. /rollback diff <N> previews the diff. /rollback <N> <file> restores a single file. Gateway auto-resume keeps sessions alive across restarts and source-file reloads.
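The snapshot-before-write idea is simple enough to sketch. The class below is a toy under stated assumptions: it keeps previous file contents in memory, numbers checkpoints from 0, and skips the pruning and disk guardrails that Checkpoints v2 adds. The class and method names are invented for this example.

```python
import tempfile
from pathlib import Path


class Checkpoints:
    """Record a file's previous contents before every write."""

    def __init__(self) -> None:
        # Each entry is (path, previous contents); None means the file did not exist yet.
        self.entries = []

    def write_file(self, path: Path, text: str) -> int:
        """Snapshot the current state, then write. Returns the checkpoint number."""
        previous = path.read_text() if path.exists() else None
        self.entries.append((path, previous))
        path.write_text(text)
        return len(self.entries) - 1

    def rollback(self, n: int) -> None:
        """Restore the state captured at checkpoint n, undoing later writes newest first."""
        while len(self.entries) > n:
            path, previous = self.entries.pop()
            if previous is None:
                path.unlink(missing_ok=True)  # the write created the file, so remove it
            else:
                path.write_text(previous)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "config.py"
    cp = Checkpoints()
    cp.write_file(target, "DEBUG = False\n")       # checkpoint 0: file did not exist
    bad = cp.write_file(target, "DEBUG = Tru\n")   # checkpoint 1: holds the good version
    cp.rollback(bad)
    print(target.read_text())                      # DEBUG = False
```

The important property is ordering: the snapshot is taken before the write, so even an edit that corrupts the file leaves a clean restore point behind it.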
The release also closes 8 P0 security issues. Secret redaction is on by default. Discord role-allowlists are guild-scoped, closing a CVSS 8.1 cross-guild DM bypass. WhatsApp rejects strangers by default. TOCTOU windows close across auth.json and MCP OAuth. These fixes matter most when running Hermes as a self-hosted gateway exposed to chat apps. For local-only dev work, redaction-on-by-default is the relevant change. The unsexy half of long-running agent work, shipped in one release.
What Hermes was already doing
Most agent demos work for ten minutes. The hard part is the eleventh minute. By the time a real software task is half done, the agent has to remember the goal, inherit the repo’s conventions, route work to the right model, roll back bad edits, schedule background checks, survive rate limits, and hand specialized work to the best foreground coder for the job.
Hermes was already covering most of that surface before v0.13.
It reads .hermes.md, AGENTS.md, CLAUDE.md, and .cursorrules, and walks deeper into the repo as tools touch new directories. It supports 7 terminal backends: local, Docker, SSH, Singularity, Modal, Daytona, and Vercel Sandbox. It also supports 26+ providers, including Anthropic, OpenCode Go, OpenRouter, GitHub Copilot, and any OpenAI-compatible endpoint.
It runs cron jobs in two modes: full agent runs, or no_agent script-only watchdogs that cost zero LLM tokens. It rotates credential pools across multiple keys when one hits a rate limit. It exposes itself as an MCP server so Claude Code or Cursor can read its messaging state. It also exposes delegate_task for short, synchronous subtasks where Kanban’s durability would be overkill. Delegation is a function call. Kanban is the durable queue.
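Credential rotation is the easiest of these to picture in code. The sketch below shows the general pattern only. RateLimited, call_with_rotation, and the send callback are hypothetical names for this example, not part of Hermes or any provider SDK.

```python
class RateLimited(Exception):
    """Raised by the caller-supplied request function when a key is throttled."""


def call_with_rotation(keys, send):
    """Try each key in order, moving on when one is rate limited.

    `send` is a hypothetical function that performs the request with one key.
    """
    for key in keys:
        try:
            return send(key)
        except RateLimited:
            continue  # this key is throttled, so try the next one
    raise RateLimited("every key in the pool is rate limited")


# Toy request function: the first key is throttled, the second works.
def fake_send(key: str) -> str:
    if key == "key-a":
        raise RateLimited(key)
    return f"ok via {key}"


print(call_with_rotation(["key-a", "key-b"], fake_send))  # ok via key-b
```

A production pool would also track cooldowns so a throttled key is skipped until its window resets rather than retried on every call.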
Last week’s Curator Release added autonomous skill maintenance on a 7-day cycle. Last week solved skill decay. This week moved to work durability.
A coding agent that cannot roll back is not a developer tool. It is a patch generator with confidence. This is not a replacement pitch. It is a layer pitch.
What goes next to Claude Code
Hermes ships bundled skills for Claude Code and Codex. They are not competitors. They are the foreground coding workers Hermes hands implementation work to. Each skill teaches two modes: one-shot print mode for structured tasks, and interactive tmux mode for multi-turn refactor cycles.
The use-case split is the cleanest way to explain it.
If the task is “edit this file with me watching,” Claude Code or Codex may be the better tool. If the task is “keep this repo workflow alive across models, workers, schedules, memory, checkpoints, and messaging channels,” Hermes becomes the more interesting layer.
Hermes can be more powerful than Claude Code or Codex when the bottleneck is harness orchestration, not one-shot code generation.
How to actually use this on a real repo this weekend
Not an install tutorial. The setup commands are in the Appendix at the end. The point of this section is the first workflow.
The more useful question is what to do once Hermes is running. One workflow exercises the whole runtime in a single pass: a real test-failure cleanup on a real repo. It touches repo context, terminal commands, checkpoints, post-write linting, provider routing, and worker delegation in one sweep.
Drop into the repo and start Hermes:
cd /path/to/your/repo
hermes
If the repo has an AGENTS.md, CLAUDE.md, or .cursorrules, Hermes loads it on the first turn. If not, write a short .hermes.md with the test command, the code-style conventions, and the rule for when to fix the test versus the code.
Set the goal:
/goal Fix the failing tests in this repo. Run the test command, identify the smallest failing surface, patch one safe change at a time, stop when the test command passes. Use checkpoints before risky edits.
The point is to see whether Hermes can run the test command, identify failures, and patch them safely. Each write_file or patch lands a checkpoint first. Each Python, JSON, YAML, or TOML write triggers a delta lint. The judge model checks goal completion after each turn. A bad edit rolls back with /rollback <N>, or preview the diff first with /rollback diff <N>.
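One plausible reading of "delta lint" is: check the file before and after the write, and report only the problems the write introduced. The sketch below works under that assumption, using standard-library parsers for Python and JSON only. YAML and TOML are left out, and the function names are invented here, not taken from Hermes.

```python
import ast
import json


def syntax_errors(filename: str, text: str) -> set[str]:
    """Return the syntax problems found in Python or JSON text (empty if clean)."""
    try:
        if filename.endswith(".py"):
            ast.parse(text)
        elif filename.endswith(".json"):
            json.loads(text)
    except (SyntaxError, json.JSONDecodeError) as exc:
        return {f"{type(exc).__name__}: {exc.msg}"}
    return set()


def delta_lint(filename: str, before: str, after: str) -> set[str]:
    """Report only problems present after the write that were not there before."""
    return syntax_errors(filename, after) - syntax_errors(filename, before)


# A trailing comma turns valid JSON into invalid JSON, so the write gets flagged.
print(delta_lint("settings.json", '{"a": 1}', '{"a": 1,}'))

# A file that was already broken in the same way produces no new findings.
print(delta_lint("module.py", "x = (", "x = ("))
```

Reporting the difference rather than the full list keeps the agent focused on what its own edit broke, instead of on pre-existing noise in the repo.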
For multiple parallel failures, stand up a Kanban board:
hermes kanban init
hermes kanban create "Fix failing auth tests" --workspace dir:/absolute/path/to/repo
hermes kanban dispatch --max 1
hermes kanban watch
Each task spawns a worker as a separate OS process. Heartbeats prove liveness, zombies get reclaimed, retries are budgeted.
For the actual code-writing, delegate. Hermes ships bundled skills that hand work to Claude Code, Codex, or OpenCode, either in one-shot print mode or in an interactive tmux session for multi-turn refactor cycles. Hermes keeps the harness. The foreground worker writes the code.
Use OpenCode Go as the provider. The $5-first-month subscription bundles DeepSeek V4-Pro and MiMo-V2.5-Pro (and more amazing models) behind a single key. Switch between models depending on whether the workload leans more toward code edits or agentic tool use. OpenCode Go caps spend at $12 per 5-hour window, $30 per week, and $60 per month, so plan a fallback key for long-running cleanup work.
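The three caps interact, and a quick calculation shows why the fallback key matters. This sketch uses only the cap figures quoted above. The function name is made up, and it assumes spend is spread evenly across 5-hour windows.

```python
CAP_PER_WINDOW = 12.0  # USD per 5-hour window
CAP_PER_WEEK = 30.0    # USD per week
CAP_PER_MONTH = 60.0   # USD per month


def windows_until_blocked(spend_per_window: float) -> dict[str, float]:
    """Return how many 5-hour windows fit under the weekly and monthly caps.

    spend_per_window must be positive; anything above the window cap is clipped to it.
    """
    spend = min(spend_per_window, CAP_PER_WINDOW)
    return {"week": CAP_PER_WEEK / spend, "month": CAP_PER_MONTH / spend}


# Running flat out at the window cap:
print(windows_until_blocked(12.0))  # {'week': 2.5, 'month': 5.0}
```

At full burn the weekly cap arrives after 2.5 windows, about 12.5 hours of work, and the monthly cap after 5 windows. A long-running cleanup job can hit either well before the calendar period ends.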
Change the default endpoint stored in Hermes according to OpenCode docs.
The Monday outcome is simple: Hermes can read one repo, hold one goal, use one model route, create checkpoints, and help clean one failing test path without losing the thread.
Install if, wait if
Install if developers already use Claude Code or Codex daily, want long-running workflows that survive the chat closing, and want to test whether persistent goals, checkpoints, and durable Kanban actually change how a real repo gets worked on. One install, one provider config, one real test-failure cleanup is the whole weekend.
Wait if production controls, mature audit trails, or zero setup friction matter more than the runtime upside. macOS users with Python 3.13 hit installer conflicts. Docker users hit Node version mismatches and .venv permission issues. Remote terminal backends have a known environment-variable passthrough bug.
Discord lacks proxy support for restricted networks. Native Windows is not supported (coming soon), but you can use WSL. /goal uses a judge model, so completion checks can be wrong in either direction: a goal can be marked done before it is, or stay open after it should have closed.
Hermes v0.13 is “Tenacity,” not “Production.” Treat the install as a serious test, not a deployment.
OpenClaw has the louder marketplace story. 370,000+ GitHub stars, ClawHub registry, 50+ messaging channels.
Hermes is making a different bet. Fewer “install another skill” moments, more persistent runtime machinery around goals, workers, memory, checkpoints, and provider routing. Different bets, same category.
The next useful jump in coding agents may not be a smarter model. It may be the runtime that keeps the work alive after the chat window closes. Hermes Agent v0.13 looks like the strongest installable answer to that gap today.
Follow @AlphaSignalAI for more content like this. Also, check out our Harness Engineering workshop: 30 seats available, $150 each.
Subscribe at AlphaSignal.ai for daily AI signals. Read by 280,000+ developers.
Appendix: install commands
Single-curl install. macOS or Linux only. Use WSL2 for Windows. Native Windows is not supported.
# 1. Install Hermes
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.zshrc  # reload your shell config; use ~/.bashrc instead if your shell is bash
# 2. Configure provider + foreground worker
hermes setup
hermes model # pick provider opencode-go
hermes doctor
For full setup walkthroughs including provider OAuth flows, messaging-platform connections, and Docker backend setup, the official Hermes docs at https://hermes-agent.nousresearch.com/docs/ are the canonical reference.








