— the system message is the contract
Your AI agent is replying.
But is it right?

Production-time monitoring for conversational AI. We turn your system message into rules, score every reply, and alert when behavior drifts.

↗ no demo gate · no SDR call
SYSTEM_MESSAGE · acme-pet-shop
MONITORED
You are a customer service agent for Acme Pet Shop.
Be friendly. Use the customer's first name when known.
// rule_001 · tone soft · qualitative
Never quote prices below R$ 50.
// rule_002 · constraint hard · deterministic
Always escalate angry customers to a human supervisor.
// rule_003 · escalation hard · qualitative
Never invent products outside the catalog.
// rule_004 · hallucination critical · qualitative
Reply in the customer's language.
// rule_005 · process soft · qualitative
02 · the judge

Every reply, scored against your rules.

An LLM judge reads every conversation, line by line, and scores it against the rules extracted from your system message. Pass, fail, or partial — with the exact span that triggered the verdict.

acme-pet-shop online · AI agent
WHATSAPP
Judge verdict · gpt-4o-mini · 1.4s
RULE FAILED
overall
0 / 100
passes
4/5
fails
1/5
latency
1.4s
per-rule breakdown
rule_001 · tone
✓ pass
92
rule_002 · constraint
✗ FAIL
0
Agent quoted R$ 32,90 — below the R$ 50 minimum stated in the system message.
"…is R$ 32,90 a pack…"
rule_003 · escalation
— n/a
rule_004 · hallucination
✓ pass
100
rule_005 · process
✓ pass
100
03 · the dashboard

Watch it all, in one room.

Score timeline. Per-rule violation rate. Recent conversations with the exact failures highlighted. Drift events caught and tagged.

All your bots side by side. One workspace, no tab juggling.

acme-pet-shop
clinica-vetmais
restaurante-tio-paco
+ add bot
range
24h 7d 30d
overall score
0.0 ↓ 4.2
vs 7d baseline
conversations · 24h
0 ↑ 11%
241 evaluated · 10% sample
drift events · 7d
0
1 OPEN
last: 3h ago · rule_002
eval cost · 7d
$0.00
8.4% of agent cost
Score timeline last 7 days
healthy 80+
warn 60–79
critical 0–59
DRIFT · rule_00287.4apr 30may 1may 2may 3may 4may 5today
04 · the alert

Your bot just slipped. You'll know first.

When the score drifts past your threshold, rendfly fires through every channel you've wired. Whoever is on call gets the exact rule that broke and the conversation that triggered it — within seconds.

rendfly-ops · #alerts
rendfly
APP
3:14 AM
DRIFT · CRITICAL
acme-pet-shop · score down 12 pts in 6h rule_002 violated 17× — see flagged conversations
LATEST FLAGGED "…is R$ 32,90 a pack…"
3:14
rendfly monitoring · acme-pet-shop
TODAY · 3:14 AM
DRIFT · CRITICAL
acme-pet-shop · score down 12 pts in 6h rule_002 violated 17× in last 6h. Agent quoted prices below R$ 50 minimum.
LATEST FLAGGED "…is R$ 32,90 a pack…"
View 17 flagged conversations →
3:14 AM
rendfly alerts@rendfly.com
to you · 3:14 AM
CRITICAL
Drift detected · acme-pet-shop down 12 pts rule_002 violated 17× in the last 6h. Average score dropped from 91.6 → 79.4.
LATEST FLAGGED SPAN "…is R$ 32,90 a pack…"
Open in dashboard →
05 · how to connect

Three paths in. One workspace out.

Pick whichever fits your stack today. Proxy mode is the default — change a single URL, ship in 30 seconds. API and SDK modes are there when you need them.

01 · PROXY
RECOMMENDED
Change one URL. Universal across providers — OpenAI, Anthropic, Gemini, Groq, Mistral. Sub-10ms overhead.
main.py
from openai import OpenAI
client = OpenAI(
base_url="https://api.rendfly.com/openai",
api_key="sk-rendfly-prod-…",
)
# everything else stays the same
30s setup
· all providers
Read docs →
02 · API
ZERO CODE
Touch nothing. OpenAI Stored Completions. Paste a read-only admin key and we'll pull. No deploys, no SDK swap.
1
Enable Stored Completions platform.openai.com / settings
2
Create read-only admin key scope: completions:read
3
Paste it in rendfly first eval ≤ 5 min
2 min setup
· openai only
Read docs →
03 · SDK
FRAMEWORK
One import. For LangChain, LlamaIndex, or vendored SDKs you can't easily reroute. Patches the client at runtime.
app.py
from rendfly import patch_openai
import openai
patch_openai(api_key="sk-rendfly-…")
# use openai SDK normally
openai.chat.completions.create(...)
60s setup
· python · node
Read docs →
06 · the blind spot

Your dashboard says green.
Your customer says no.

Existing observability watches infrastructure — HTTP, latency, error rates. Production conversational quality is a different category. The reply is delivered, the system is healthy, and yet the answer is wrong.

infrastructure observability What your APM sees
HTTP status
200 OK ✓
p95 latency
142ms ✓
error rate
0.01% ✓
uptime
99.97% ✓
ALL SYSTEMS HEALTHY
rendfly · conversational quality
What we actually see
64 / 100 overall · same hour
rule_002 · constraint
17 violations ✗
drift
−12 pts / 6h ▲
FLAGGED SPAN · acme-pet-shop "…is R$ 32,90 a pack…"
DRIFT CAUGHT · ALERT FIRED
07 · pricing

Pick the contract that fits your shape.

USD. Self-serve up to Agency. No demo gate, no SDR call. Cancel anytime — your data exports as JSON, even on the free tier.

INDIE

For one builder shipping one bot.

$29 /mo
5,000 conversations / mo
1 project · 1 user
email + slack alerts
single-judge eval, 10% sample
30-day PII retention
AGENCY
DEFAULT

For one team shipping ten clients.

$199 /mo
50,000 conversations / mo
10 projects · 5 users
+ whatsapp + pagerduty + webhooks
multi-judge consensus, 100% sample
1-year retention · multi-tenant
ENTERPRISE

For platform teams with compliance load.

Custom
unlimited conversations
unlimited projects + users
+ sso + byok + self-host
custom judge + conversation replay
SOC 2 Type II · 4h SLA

Your system message is already the contract.

We just turn it into rules, watch every reply, and tell you when behavior drifts. Change one URL — first eval shows up in 60 seconds.

no demo gate · no SDR call · 5k convs/mo free forever