Status & health

Three ways to check Memoire is healthy, in increasing order of detail.

1. Public health badge

For uptime monitors, status pages, or just a sanity check:

curl -s https://api.trymemoire.com/readyz | jq .ready

This prints true when all subsystems are green and false otherwise. The HTTP status code follows the same rule (200 when ready, 503 otherwise), so you can wire the endpoint into any tool that understands readiness checks without parsing JSON.
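
For monitors that cannot run jq, the status-code rule can be checked directly. The following is a minimal sketch, not an official client: the helper names readyz_status_code and readyz_is_ready are hypothetical, and the 5-second timeout is an assumed value. It treats anything other than 200 as not ready, including the 000 that curl reports when the connection fails.

```shell
#!/bin/sh
# Hypothetical helper: fetch only the HTTP status code from /readyz.
# The response body is discarded. --max-time 5 is an assumed timeout,
# not a documented recommendation.
readyz_status_code() {
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' \
    "${1:-https://api.trymemoire.com/readyz}"
}

# Hypothetical helper: succeed only when the status code is 200.
# 503, and 000 from a failed connection, both count as not ready.
readyz_is_ready() {
  [ "$1" = "200" ]
}

# Example (performs a network call):
#   if readyz_is_ready "$(readyz_status_code)"; then
#     echo "ready"
#   else
#     echo "not ready"
#   fi
```

Keeping the fetch and the decision in separate functions makes the decision testable without network access.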

2. Dashboard status page

Sign in and visit /dashboard/status. You get:

  • Live ready/degraded banner with uptime
  • Build version, commit, build time
  • Fly.io machine ID, region, and image ref
  • Per-subsystem status (memory, runners, LLM router, Slack, watchdog, activity bus)
  • Last 20 errors with stack traces — useful when something flaked and you want to know if it was a known issue
  • Live event stream — Slack status-emoji transitions, runner spawns, procedure phase changes — auto-refreshing every 10s

3. Raw diagnostics endpoint

For programmatic access or to grep with tools you already use:

curl -s 'https://api.trymemoire.com/diagnostics?limit=50&category=slack-status' | jq .

See the full schema on the Gateway API page.
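
If you query several categories, a small wrapper that builds the URL avoids quoting mistakes around the & in the query string. This is a sketch under assumptions: the function name diagnostics_url is hypothetical, the default limit of 50 is illustrative, and omitting category is assumed (not confirmed here) to be accepted by the endpoint. Only the limit and category parameters come from the example above.

```shell
#!/bin/sh
# Hypothetical helper: build a /diagnostics URL.
# $1: limit (defaults to 50, an illustrative choice)
# $2: category (optional; assumed to be omittable)
diagnostics_url() {
  limit=${1:-50}
  category=${2:-}
  url="https://api.trymemoire.com/diagnostics?limit=${limit}"
  if [ -n "$category" ]; then
    url="${url}&category=${category}"
  fi
  printf '%s\n' "$url"
}

# Example (performs a network call):
#   curl -s "$(diagnostics_url 20 slack-status)" | jq .
```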

What the subsystem checks mean

  • memory — Core memory blocks loaded. Failing here usually means the org directory wasn't initialised (rare).
  • runners — At least one coding agent (Claude or Codex) is reachable. Both being down means new code-shipping tasks will fail; planning/research still works.
  • llm_router — At least one LLM provider has a healthy key. A direct API key counts as healthy even when the multi-account router is offline.
  • slack — Number of connected workspaces. Zero is fine if you're not using Slack; non-zero means socket-mode is live.
  • watchdog — How many subprocess runners the watchdog is currently tracking. Non-zero is normal; it means a task is in flight.
  • activity_bus — How many SSE clients are subscribed (the dashboard's live activity feed).

What we monitor on our side

Internally, the gateway pages on:

  • /readyz non-200 for > 60s (uptime monitor)
  • Runner mortality rate > 5% (process watchdog metric)
  • Webhook signature failures > 0 (security alert)
  • Stripe webhook backlog > 30s (billing alert)
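
If you want to mirror the first rule in your own monitoring, the core of it is a duration check on a run of failed probes. The sketch below is an illustration of that idea, not the gateway's internal implementation: the function name outage_exceeds_threshold is hypothetical, and the 10-second polling interval in the commented loop is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: has a run of failed probes lasted long enough to alert?
# $1: epoch seconds of the first failed probe in the current run
# $2: epoch seconds now
# $3: threshold in seconds (defaults to 60, matching the /readyz rule above)
# Succeeds only when the elapsed time is strictly greater than the threshold.
outage_exceeds_threshold() {
  first_failure=$1
  now=$2
  threshold=${3:-60}
  [ $((now - first_failure)) -gt "$threshold" ]
}

# Example polling loop (performs network calls; the 10s interval is assumed):
#   first_failure=""
#   while sleep 10; do
#     code=$(curl -s -o /dev/null -w '%{http_code}' https://api.trymemoire.com/readyz)
#     if [ "$code" = "200" ]; then first_failure=""; continue; fi
#     : "${first_failure:=$(date +%s)}"
#     if outage_exceeds_threshold "$first_failure" "$(date +%s)"; then
#       echo "ALERT: /readyz non-200 for more than 60s"
#     fi
#   done
```

Resetting first_failure on every 200 means a single flaky probe never accumulates toward the threshold.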

A public status page at status.trymemoire.com is planned.