Your Rules. Content Moderation, Enforced.

CriteriaBot blends classical ML with a multi-model LLM consensus engine to evaluate content against any criteria you define.

How It Works

01

Define Your Criteria

Write your rules in plain English. Group related criteria together for easy reuse across requests. Use the community library or start from scratch.

02

Submit Content

Post content to the API alongside the criteria you want to evaluate against. Use synchronous endpoints for real-time results, or async webhooks for high-volume batch jobs.

03

Consensus Evaluation

Each criterion is evaluated by the Arbiter, a panel of LLMs and trained ML models that vote and reach a weighted consensus. Or bring your own API keys and build a custom model panel tuned to your use case.

04

Structured Verdicts

Get a clean JSON verdict for each criterion: pass or fail, per rule. Wire results directly into your existing pipeline for automated approval, flagging, or remediation.
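Here is a minimal sketch of wiring per-rule verdicts into an approve/flag step. The response shape (a `verdicts` array whose items carry `criterion` and `result` fields) and the criterion names are hypothetical examples, not CriteriaBot's documented schema. Adapt the field names to the actual API response.

```python
import json
from dataclasses import dataclass

# Hypothetical response body: the field names and criterion names are
# illustrative only, not CriteriaBot's documented schema.
SAMPLE_RESPONSE = """
{
  "verdicts": [
    {"criterion": "no_personal_data", "result": "pass"},
    {"criterion": "no_medical_claims", "result": "fail"}
  ]
}
"""


@dataclass
class Decision:
    action: str          # "approve" or "flag"
    failed: list[str]    # names of the criteria that did not pass


def route(response_body: str) -> Decision:
    """Approve only if every criterion passed; otherwise flag with the failures."""
    verdicts = json.loads(response_body)["verdicts"]
    failed = [v["criterion"] for v in verdicts if v["result"] != "pass"]
    return Decision("approve" if not failed else "flag", failed)


if __name__ == "__main__":
    print(route(SAMPLE_RESPONSE))
```

Keeping the list of failed criteria (rather than a single boolean) lets a downstream step choose between flagging for review and targeted remediation per rule.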

Under the hood

Meet the Arbiter

A panel of LLMs and trained ML models whose individual verdicts are combined into a single result through a dynamically learned weighted consensus. No single model decides; agreement has to be earned.
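A weighted majority vote illustrates how individual model votes can be combined into one verdict. This is an assumed, simplified rule for illustration. The model names, the weights and the 0.5 threshold are hypothetical, and this page does not describe the Arbiter's exact combination method.

```python
def weighted_consensus(
    votes: dict[str, bool],
    weights: dict[str, float],
    threshold: float = 0.5,
) -> tuple[bool, float]:
    """Combine per-model pass/fail votes into one verdict by weighted majority.

    Returns (passed, pass_share), where pass_share is the fraction of the
    panel's total weight that voted "pass".
    """
    total = sum(weights[model] for model in votes)
    if total <= 0:
        raise ValueError("panel weights must sum to a positive number")
    pass_weight = sum(weights[model] for model, passed in votes.items() if passed)
    share = pass_weight / total
    # In this sketch, a share exactly at the threshold counts as a pass.
    return share >= threshold, share


if __name__ == "__main__":
    # Hypothetical three-model panel with hypothetical weights.
    votes = {"model_a": True, "model_b": False, "model_c": True}
    weights = {"model_a": 0.30, "model_b": 0.45, "model_c": 0.25}
    print(weighted_consensus(votes, weights))
```

Here model_b carries the largest single weight, but the two models voting "pass" together hold 0.55 of the panel's weight, so the content passes.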

Multi-model LLM panel

Every piece of content is evaluated independently by a curated panel of LLMs. Each model returns its own verdict, and no single model has the final say.

Classical ML calibration

Trained ML models run alongside the LLM panel. Calibrated on real verdict history, they catch patterns and biases that prompt-based models miss.

Dynamically learned weights

Model weights are not fixed. For each evaluation, the Arbiter computes per-model weights from each model's historical agreement with human verdicts on semantically similar content, so the models that have proven reliable on content like this count for more.
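A similarity-weighted k-nearest-neighbour estimate is one way to derive per-request weights from historical agreement on similar content. This sketch assumes each past item has an embedding vector and a record of which models agreed with the human verdict. Cosine similarity, `k`, and the pseudo-count `prior` are illustrative choices, not the Arbiter's documented algorithm.

```python
import math

# One historical item (assumed format): (embedding, {model_name: agreed_with_human})
HistoryItem = tuple[list[float], dict[str, bool]]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def learned_weights(
    query: list[float],
    history: list[HistoryItem],
    models: list[str],
    k: int = 5,
    prior: float = 1.0,  # must be positive to avoid division by zero
) -> dict[str, float]:
    """Weight each model by its similarity-weighted agreement rate on the
    k past items most similar to the query, then normalise to sum to 1."""
    neighbours = sorted(
        ((cosine(query, emb), agreed) for emb, agreed in history),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    raw = {}
    for model in models:
        # Pseudo-counts pull models with little relevant history toward 50%.
        agree, seen = prior, 2 * prior
        for sim, agreed in neighbours:
            s = max(sim, 0.0)  # ignore items with negative similarity
            if model in agreed:
                seen += s
                agree += s if agreed[model] else 0.0
        raw[model] = agree / seen
    total = sum(raw.values())
    return {model: w / total for model, w in raw.items()}


if __name__ == "__main__":
    history = [
        ([1.0, 0.0], {"model_a": True, "model_b": False}),
        ([0.9, 0.1], {"model_a": True, "model_b": False}),
        ([0.0, 1.0], {"model_a": False, "model_b": True}),
    ]
    print(learned_weights([1.0, 0.0], history, ["model_a", "model_b"], k=2))
```

In the usage example, model_a agreed with the human verdict on both items nearest the query, so it receives about 0.75 of the weight. model_b's agreement on the unrelated third item does not count.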

It gets smarter as you use it

Every verdict correction you submit feeds back into the Arbiter's weighting. Models that agree with your standards earn more influence over time; those that drift lose it. Corrections from Free and Starter users collectively improve the Community Arbiter, while Pro and Enterprise users' corrections fine-tune a private, dedicated model.
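A multiplicative-weights update is a standard way to let corrections shift influence between models. Each model that disagreed with the human correction has its weight multiplied by a factor `beta` below 1, and then the weights are renormalised. This textbook technique is swapped in for illustration only. The page does not specify how the Arbiter updates its weights, and `beta = 0.8` is an arbitrary example value.

```python
def apply_correction(
    weights: dict[str, float],
    model_verdicts: dict[str, bool],
    human_verdict: bool,
    beta: float = 0.8,
) -> dict[str, float]:
    """Multiplicative-weights update after one human correction.

    Every model whose verdict disagreed with the human is multiplied by beta
    (0 < beta < 1); the weights are then renormalised to sum to 1.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must be strictly between 0 and 1")
    updated = {
        model: w * (beta if model_verdicts[model] != human_verdict else 1.0)
        for model, w in weights.items()
    }
    total = sum(updated.values())
    return {model: w / total for model, w in updated.items()}


if __name__ == "__main__":
    weights = {"model_a": 0.5, "model_b": 0.5}
    # Hypothetical: model_b disagrees with the reviewer on three corrections.
    for _ in range(3):
        weights = apply_correction(
            weights, {"model_a": True, "model_b": False}, human_verdict=True
        )
    print(weights)
```

Penalties compound, so a model that keeps disagreeing with your corrections loses influence geometrically. Models that agree gain share through renormalisation.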

Simple, Transparent Pricing

Pay for what you use. Start free, scale as you grow. No hidden fees.

Free

$0 / month

Everything you need to get started.

  • 1,000 Arbiter verdicts / month, no API keys required
  • Full access to a library of predefined criteria
  • 10 custom criteria
Get started free

Most popular

Starter

$40 / month

For teams running real workloads.

  • 10,000 Arbiter verdicts / month
  • Unlimited custom criteria
  • BYOK: bring your own keys for any supported LLM provider
Subscribe - $40 / mo

Pro

$200 / month

A dedicated model trained on your data.

  • 60,000 Arbiter verdicts / month
  • Dedicated LoRA adapter fine-tuned on your verdicts
  • BYOK and unlimited custom criteria
Subscribe - $200 / mo

Credits

$10 one-time

Need more? Top up any time.

  • 2,000 Arbiter verdict credits
  • Stack on top of your plan
  • Never expire
Buy credits - $10

Need 500,000+ verdicts or priority fine-tuning? Talk to us about Enterprise.
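The listed prices let you estimate the cheapest option for a given monthly volume. This sketch makes several assumptions: credits are bought in whole $10 packs of 2,000, they are spent only after the plan's included verdicts run out, and nothing carries over between months. It also ignores feature differences such as BYOK, custom-criteria limits and the dedicated model, and it does not cover Enterprise pricing.

```python
import math

# (monthly fee in USD, included verdicts) as listed above.
PLANS = {"Free": (0, 1_000), "Starter": (40, 10_000), "Pro": (200, 60_000)}
PACK_PRICE, PACK_SIZE = 10, 2_000  # $10 buys 2,000 verdict credits


def monthly_cost(plan: str, verdicts: int) -> int:
    """Plan fee plus the whole credit packs needed above the plan's quota."""
    fee, included = PLANS[plan]
    overflow = max(0, verdicts - included)
    return fee + math.ceil(overflow / PACK_SIZE) * PACK_PRICE


if __name__ == "__main__":
    for n in (5_000, 15_000, 50_000):
        print(n, {plan: monthly_cost(plan, n) for plan in PLANS})
```

For example, 50,000 verdicts a month works out to $240 on Starter with credit top-ups, compared with $200 on Pro.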

Built for the Messiness of Real Content

Single-model classifiers break on edge cases. Keyword filters miss context. CriteriaBot is designed differently.

The Arbiter

A curated panel of LLMs evaluates each criterion independently and reaches a weighted consensus. No single model can skew the result: agreement is required.

Bring Your Own Models

Plug in your own OpenAI, Anthropic, or other API keys. Build custom consensus groups from any combination of models to match your accuracy, cost, and latency requirements.

Sync or Async

Use synchronous REST endpoints for real-time verdicts, or fire-and-forget with webhooks for high-volume batch processing.