# iGregulator — full documentation

Concatenated markdown of every guide page at https://igregulator.io/docs.
Source files in the order they appear in the sidebar.
Generated at build time; served verbatim at https://igregulator.io/llms-full.txt.

For the canonical machine-readable API schema, fetch the OpenAPI spec:
https://api.igregulator.io/openapi.json

---

# File: docs/index.mdx

---
title: Introduction
description: iGregulator API — iGaming licensing intelligence across UKGC, MGA, CGA, Kahnawake, Anjouan, and Tobique.
template: doc
sidebar:
  order: 1
---

:::caution[Legal notice]
Information provided by iGregulator is sourced from public regulator
records and is intended for **informational purposes only**. License
verification results do not constitute legal advice. Customers are
responsible for their own compliance, KYB, and AML decisions. Always
confirm critical licensing decisions directly with the issuing
regulator. Full terms at [/terms](https://igregulator.io/terms).
:::

iGregulator is a REST API for verifying **iGaming operator licences**
against the public registers of six regulators — UK Gambling Commission
(UKGC), Malta Gaming Authority (MGA), Curaçao Gaming Authority (CGA,
post-LOK), Kahnawake Gaming Commission (KGC), Anjouan Gaming Authority
(AGA), and Tobique Gaming Commission (TGC). Data is refreshed daily and
reflects regulator changes within 24 hours.

> **Building with AI?** → [/docs/for-ai-agents](/docs/for-ai-agents/)
> covers MCP, structured errors, `_meta` provenance, and
> machine-readable resources (llms.txt, OpenAPI) for LLM integrations.

## What you can do with it

- **Verify a domain** — hit `GET /v1/check?domain=X` and get an operator +
  licence + confidence score in one round-trip.
- **Look up an operator** — search by name or trading name via
  `GET /v1/operators/search?q=…` and drill into licences, domain portfolios,
  and regulatory actions.
- **Pull a whole jurisdiction** — paginate
  `GET /v1/jurisdictions/:code/operators` for a clean dataset of everyone
  currently licensed.
- **Track regulatory actions** — enforcement decisions (fines, warnings,
  licence revocations) surface via the operator detail endpoint.

## Data coverage (live as of April 2026)

| Jurisdiction | Licences | Source | Cadence |
| --- | --- | --- | --- |
| UKGC | ~3,460 | Public register ZIP | Daily 03:00 UTC |
| MGA | ~310 | Playwright-scraped SPA | Daily 03:15 UTC |
| CGA | ~650 | OGL PDF parse | Daily 03:30 UTC |
| KGC | ~60 | Interactive Gaming + CSPA HTML | Daily 03:45 UTC |
| AGA | ~1,275 | Embedded JSON on register page | Daily 04:00 UTC |
| TGC | ~160 | Static HTML table (via CF Worker proxy) | Daily 04:15 UTC |

## Who it's for

Four buyer profiles drive the product today:

- **Affiliate sites** — verify that a brand they're promoting is still
  licensed before writing reviews and paying out referral payments.
- **Compliance + AML teams** — weekly sweeps of their operator
  counterparties for status changes and enforcement actions.
- **Payment providers** — merchant onboarding checks + ongoing KYB.
- **Investment intelligence** — correlate licence churn, enforcement fines,
  and domain-expiry signals into early-warning scores.

## Start here

1. [Getting started](/docs/getting-started/) — first curl call in under a minute.
2. [Authentication](/docs/authentication/) — when you graduate from the public 10 req/hr limit.
3. [Confidence scoring](/docs/confidence/) — how the `/v1/check` endpoint picks between `high`, `medium`, and `low`.
4. [API playground](/docs/playground/) — try any endpoint interactively.
5. [Full endpoint reference](/docs/api/) — detailed schemas for every route.

---

# File: docs/getting-started.mdx

---
title: Getting started
description: First curl call in under a minute. No API key required.
template: doc
sidebar:
  order: 2
---

The fastest path to a working integration. You won't need an API key for
this walk-through — the `/v1/check` endpoint is public at 10 requests per
IP per hour.

## 1. Send your first request

```bash
curl "https://api.igregulator.io/v1/check?domain=bet365.com"
```

Response:

```json
{
  "query": { "domain": "bet365.com" },
  "match": {
    "confidence": "high",
    "match_type": "domain_exact",
    "operator": "Hillside (UK Sports) ENC",
    "operator_slug": "hillside-uk-sports-enc",
    "jurisdiction": "UKGC",
    "license_number": "055148-R-331498-001",
    "status": "active",
    "expires_at": null,
    "domain_association": "direct"
  },
  "alternatives": [],
  "confidence": "high"
}
```

That's it — no signup, no key. The `match` object is the answer;
`alternatives[]` populates when we're not 100% sure. See
[confidence scoring](/docs/confidence/) for the semantics.

## 2. Verify by licence number

Compliance teams often receive a licence number from a regulator and
need the reverse lookup — who holds it and what's its status? Same
endpoint, different query param:

```bash
curl "https://api.igregulator.io/v1/check?license_number=055148-R-331498-001"
```

Returns the same `{ query, match, alternatives, confidence }` shape;
`confidence: high` when the licence number exists in the register,
`none` otherwise. Pass `?domain=` **or** `?license_number=`, not
both.
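
The either/or rule is easy to enforce client-side before a request goes out. A minimal sketch, assuming you build URLs by hand; `build_check_url` and `BASE_URL` are illustrative names, not part of any official SDK:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.igregulator.io"


def build_check_url(domain=None, license_number=None):
    """Return a /v1/check URL carrying exactly one lookup parameter."""
    # The endpoint takes ?domain= or ?license_number=, not both, so reject
    # ambiguous or empty input before it reaches the network.
    if (domain is None) == (license_number is None):
        raise ValueError("pass exactly one of domain or license_number")
    if domain is not None:
        params = {"domain": domain}
    else:
        params = {"license_number": license_number}
    return f"{BASE_URL}/v1/check?{urlencode(params)}"


print(build_check_url(license_number="055148-R-331498-001"))
```

`urlencode` also takes care of escaping, so unusual characters in a domain or licence number cannot break the query string.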

## 3. Try a fuzzy match

```bash
curl "https://api.igregulator.io/v1/check?domain=paddypower.com"
```

This domain isn't in our authoritative registry, but the trading-name
fuzzy fallback finds it:

```json
{
  "query": { "domain": "paddypower.com" },
  "match": {
    "confidence": "medium",
    "match_type": "trading_name_fuzzy",
    "operator": "Power Leisure Bookmakers Limited",
    "license_number": "001034-R-315831-012"
  },
  "alternatives": [
    { "operator": "PPB Counterparty Services Limited", "similarity": 1 },
    { "operator": "PPB Entertainment Limited", "similarity": 1 },
    { "operator": "PPB GE Limited", "similarity": 1 }
  ]
}
```

Because `paddypower.com` trigram-matches four Flutter group entities at
similarity `1.0`, primary selection falls through a documented
tiebreaker cascade — see
[confidence scoring → Tiebreaking](/docs/confidence/#tiebreaking-for-equal-similarity).

## 4. Graduate to authenticated requests

Create an account when you hit the 10-per-hour ceiling, or when you need
any of the following:

- Higher volume (10k / 100k / unlimited depending on tier)
- The authenticated endpoints: `/v1/operators/:slug`, `/v1/licenses/*`
- Full search results (unauthenticated search caps at 3 rows)

[Create a free account](https://app.igregulator.io/signup): founding
members get the full Starter plan free, no card. Generate a key at
[app.igregulator.io/api-keys](https://app.igregulator.io/api-keys) and
attach it with a Bearer header:

```bash
curl -H "Authorization: Bearer YOUR_KEY" \
  "https://api.igregulator.io/v1/operators/search?q=paddy"
```

## 5. Explore interactively

Paste any endpoint into the **[API playground](/docs/playground/)** on this
site — it's a Scalar-powered try-it-out that runs against the live
production API. For authenticated endpoints, paste your key into the
Authorize dialog and execute without leaving the page.

## Stability guarantees

All `/v1/*` endpoints are **maintained indefinitely**. When `/v2/*` lands,
both versions will run in parallel for a minimum of 12 months. Individual
fields inside v1 get at least **90 days notice** before removal, surfaced
via `Deprecation: true` + `Sunset` response headers (RFC 9745 and RFC 8594,
respectively). Full policy in the [changelog](/docs/changelog/).
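
A client can watch for these headers and log a warning well before a field disappears. A minimal sketch, assuming `Sunset` carries a standard HTTP-date as RFC 8594 specifies; `deprecation_notice` is an illustrative helper, and `headers` is any plain mapping of response headers:

```python
from email.utils import parsedate_to_datetime


def deprecation_notice(headers):
    """Return (deprecated, sunset_datetime_or_None) for one response."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    deprecated = lowered.get("deprecation", "").strip().lower() == "true"
    sunset = lowered.get("sunset")
    # Assumption: Sunset looks like "Wed, 01 Jul 2026 00:00:00 GMT".
    sunset_at = parsedate_to_datetime(sunset) if deprecated and sunset else None
    return deprecated, sunset_at


flag, when = deprecation_notice(
    {"Deprecation": "true", "Sunset": "Wed, 01 Jul 2026 00:00:00 GMT"}
)
print(flag, when)
```

The sample header values are made up for illustration; real dates come from the API.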

## Next

- [Authentication](/docs/authentication/) — how to create + rotate keys.
- [Rate limits](/docs/rate-limits/) — quotas, headers, 429 handling.
- [Code examples](/docs/code-examples/) — JS/Python snippets.

---

# File: docs/authentication.mdx

---
title: Authentication
description: Bearer tokens — how to create, use, rotate, and revoke API keys.
template: doc
sidebar:
  order: 3
---

import { Tabs, TabItem, Aside } from '@astrojs/starlight/components';

iGregulator uses **Bearer tokens**: a single opaque API key sent in the
`Authorization` header. No OAuth, no JWTs, no per-request signatures.
Every authenticated endpoint on `api.igregulator.io` uses the same
scheme.

## 1. Overview

| | Public | Authenticated |
| --- | --- | --- |
| Needs a key | — | ✓ |
| Rate limit | 10 req / IP / hour | per-plan quota (Starter 10k/mo, Pro 100k/mo, Business fair-use) |
| Example endpoints | `/v1/check`, `/v1/jurisdictions`, `/v1/operators/search` | `/v1/operators/:slug`, `/v1/licenses/:id`, `/v1/jurisdictions/:code` |
| Who it's for | quick lookup, demo, embed in a landing page | production integrations, bulk jobs, compliance sweeps |

See [/pricing](https://igregulator.io/pricing) for the full plan
comparison. Signup is open and free for founding members (full Starter
plan); create an account at
[app.igregulator.io/signup](https://app.igregulator.io/signup). Paid
self-serve billing lands with Phase 2.

## 2. Generating keys

1. Sign in at [app.igregulator.io](https://app.igregulator.io/login).
2. Go to **[API keys](https://app.igregulator.io/api-keys)** in the nav.
3. Click **+ generate new key**. Give it a descriptive label
   ("Production server", "Local dev", "Staging job").
4. The raw key is displayed **once** — copy it now. We store a SHA-256
   hash only and cannot recover the plaintext. Lose it → rotate.

Key format: `igk_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX` (36 chars total).

The `igk_` prefix makes keys recognisable to GitHub secret scanning, so an
accidentally committed key can be flagged before a bot finds it.
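
A cheap shape check at startup catches the common configuration mistakes (empty variable, stray newline, wrong secret pasted). A minimal sketch based only on the prefix and total length stated above; `looks_like_api_key` is an illustrative name, and the check deliberately ignores the character set because this guide does not specify it:

```python
KEY_PREFIX = "igk_"
KEY_LENGTH = 36  # prefix plus 32 further characters, per the format above


def looks_like_api_key(value):
    """Shape check only: right prefix and right total length."""
    # This cannot tell a valid key from a revoked or mistyped one.
    # Only the API can do that, via a 401 response.
    return value.startswith(KEY_PREFIX) and len(value) == KEY_LENGTH


print(looks_like_api_key("igk_" + "a" * 32))
print(looks_like_api_key("igk_" + "a" * 32 + "\n"))
```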

## 3. Using keys

<Tabs>
<TabItem label="curl">

```bash
curl -H "Authorization: Bearer igk_yourkeyhere" \
  https://api.igregulator.io/v1/operators/paddy-power-holdings-limited
```

</TabItem>
<TabItem label="JavaScript">

```js
const res = await fetch(
  'https://api.igregulator.io/v1/operators/paddy-power-holdings-limited',
  { headers: { Authorization: `Bearer ${process.env.IGREGULATOR_API_KEY}` } },
);
if (!res.ok) throw new Error(`${res.status} ${await res.text()}`);
const operator = await res.json();
```

</TabItem>
<TabItem label="Python">

```python
import os, requests

r = requests.get(
    'https://api.igregulator.io/v1/operators/paddy-power-holdings-limited',
    headers={'Authorization': f'Bearer {os.environ["IGREGULATOR_API_KEY"]}'},
    timeout=10,
)
r.raise_for_status()
operator = r.json()
```

</TabItem>
</Tabs>

Public endpoints work without a key too, but attaching one **skips the
10/hour IP cap** and uses your plan quota instead — useful when serving
dashboards that can burst past the public ceiling.

## 4. Key rotation

Rotation is overlap-based — create the new key, deploy it, then revoke
the old one. No grace period is needed at our end; old keys stay
valid until you explicitly revoke them.

1. Generate a new key. Label with the rotation reason.
2. Deploy the new key to every consumer (CI variables, running services,
   teammates' `.env` files).
3. Verify traffic shifted — check the *Last used* column on the
   [API keys](https://app.igregulator.io/api-keys) page; the old key
   should show no recent usage.
4. Click **Revoke** on the old key in the dashboard. Confirmation is
   required. Revocation is immediate — the next request carrying the
   old key returns `401 auth_revoked`.

<Aside type="caution" title="If a key is compromised">
Don't wait to rotate. Revoke immediately, generate a replacement, then
investigate the leak — the damage window closes at revocation, not at
replacement.
</Aside>

## 5. Security

- **Never commit keys to version control.** We don't scan public repos
  for you; a leaked key is your risk. GitHub's secret-scanning may flag
  `igk_` strings, but don't rely on it as a safety net.
- **Use environment variables.** `process.env.IGREGULATOR_API_KEY` in
  Node, `os.environ['IGREGULATOR_API_KEY']` in Python, Docker secrets
  in containerised deploys.
- **One key per client.** Separate keys per environment (prod / staging /
  dev / CI) make revocation surgical — you kill the leaked instance
  without affecting every consumer.
- **Keys are stored hashed.** SHA-256, never plaintext. If the DB is
  ever read out, the keys themselves don't leak — only their prefixes
  (displayed in the UI anyway).
- **HTTPS only.** The API doesn't listen on port 80; HTTP would leak
  the key in plain text.
- **Report compromises** to founder@igregulator.io. We'll help
  triage and can check for anomalous usage patterns on our side.

## 6. Rate limits and quotas

Two independent ceilings are enforced on every authenticated request:

- **Per-second rate limit** — sustains your plan's burst ceiling
  (Starter 5/s, Pro 20/s, Business 100/s, Enterprise unlimited).
  Breach → `429 rate_limited`.
- **Monthly request quota** — plan quota per calendar month, UTC
  reset at the first of the month. Breach → `429 quota_exceeded`.

Every authenticated response carries headers you can read to stay
ahead of either ceiling:

| Header | Meaning |
| --- | --- |
| `X-Monthly-Quota-Limit` | Your plan's monthly ceiling, or `unlimited` |
| `X-Monthly-Quota-Used` | Count so far this month (omitted when `unlimited`) |
| `X-Monthly-Quota-Remaining` | Quota minus used (omitted when `unlimited`) |
| `X-Monthly-Quota-Reset` | ISO-8601 timestamp when the counter rolls over |
| `X-Monthly-Quota-Warning` | Present when usage ≥ 80%: `80% of monthly limit used` |
| `X-RateLimit-Limit` | Per-second ceiling for the current plan |
| `X-RateLimit-Policy` | Human-readable: `tier=starter;limit=5;window=second` |
| `RateLimit-Policy` | IETF draft format: `"default";q=5;w=1` |

See the [rate limits guide](/docs/rate-limits/) for parser examples and
the full tier table.
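
The monthly-quota headers can be folded into one small status object for dashboards or alerting. A minimal sketch, assuming the numeric headers carry plain integers (their exact formatting is not stated in this guide); `quota_status` is an illustrative name:

```python
def quota_status(headers):
    """Summarise the monthly-quota headers of one authenticated response."""
    lowered = {k.lower(): v for k, v in headers.items()}
    limit = lowered.get("x-monthly-quota-limit")
    if limit is None:
        return None  # no quota headers, e.g. an unauthenticated response
    if limit == "unlimited":
        # Used and Remaining are omitted on unlimited plans.
        return {"unlimited": True, "warning": False}
    return {
        "unlimited": False,
        "limit": int(limit),
        "used": int(lowered["x-monthly-quota-used"]),
        "remaining": int(lowered["x-monthly-quota-remaining"]),
        "reset_at": lowered.get("x-monthly-quota-reset"),
        # The warning header is only present once usage reaches 80%.
        "warning": "x-monthly-quota-warning" in lowered,
    }
```

Checking for the presence of `X-Monthly-Quota-Warning` rather than recomputing the percentage keeps the client in step with whatever threshold the server applies.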

## 7. Errors

Authenticated endpoints return structured JSON on every non-2xx. Branch
on `code` for behaviour, `details.reason` for refinement, and use
`details.suggestion` verbatim in user-facing messaging when present.

| Status | code | When |
| --- | --- | --- |
| 401 | `auth_required` | No `Authorization` header. |
| 401 | `auth_invalid` | Header malformed or key not recognised. |
| 401 | `auth_revoked` | Key was revoked via the dashboard. |
| 402 | `payment_required` | Your plan is `null` or `canceled`. |
| 429 | `rate_limited` | Per-second ceiling breached. Sleep, retry once. |
| 429 | `quota_exceeded` | Monthly quota exhausted. Wait for `reset_at` or upgrade. |

Full code reference lives in the [error handling guide](/docs/errors/).

Example body (401):

```json
{
  "error": "API key has been revoked",
  "code": "auth_revoked",
  "details": {
    "reason": "api_key_revoked",
    "suggestion": "Generate a new API key at https://app.igregulator.io/api-keys. Revoked keys cannot be restored."
  }
}
```

---

# File: docs/rate-limits.mdx

---
title: Rate limits
description: Public per-IP caps, authenticated per-plan quotas, and 429 handling.
template: doc
sidebar:
  order: 4
---

Two independent ceilings: **public per-IP** (no key) and **authenticated
per-plan** (monthly quota). Authenticated requests skip the IP ceiling
entirely.

## Public endpoints (no key)

| Endpoint | Limit |
| --- | --- |
| `GET /v1/check` | 10 req / IP / hour |
| `GET /v1/jurisdictions` | 10 req / IP / hour |
| `GET /v1/operators/search` | 10 req / IP / hour (+ 3-row cap on `limit`) |

The window is clock-aligned: the counter resets at the top of each UTC
hour. A caller that burns 10 requests at 14:58 waits two minutes, not
sixty.
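
The two-minute figure follows directly from the clock alignment. A minimal sketch of the arithmetic, useful when no response headers are at hand; `seconds_until_window_reset` is an illustrative name:

```python
from datetime import datetime, timedelta, timezone


def seconds_until_window_reset(now):
    """Seconds from `now` (timezone-aware) to the top of the next UTC hour."""
    now_utc = now.astimezone(timezone.utc)
    next_hour = now_utc.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return (next_hour - now_utc).total_seconds()


print(seconds_until_window_reset(datetime(2026, 4, 19, 14, 58, tzinfo=timezone.utc)))  # 120.0
```

When a response is available, prefer its `X-RateLimit-Reset` value over local clock arithmetic, since the server's clock is the one that counts.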

## Authenticated endpoints

| Plan | Monthly quota | Burst |
| --- | --- | --- |
| Starter | 10,000 calls | 5 req/sec |
| Pro | 100,000 calls | 20 req/sec |
| Business | Fair use | No fixed cap |
| Enterprise | Custom | Negotiated |

**Per-plan enforcement lands with Phase 2 billing.** Plan-aware
monthly quotas and burst ceilings activate once Stripe goes live.
Until then a lightweight **pre-launch daily cap** of 10,000
requests/day per API key acts as a safety net — see below.

## Pre-launch limitations

While Stripe integration is pending, authenticated keys on any plan
tier except **business** / **enterprise** carry an additional soft
cap of **10,000 requests/day per key**, counted per UTC day and reset
at `00:00:00Z`. Rationale: a key shared with a reviewer or journalist
shouldn't be able to silently exhaust our Cloudflare bandwidth budget
before billing lands. The 10k/day ceiling is deliberately picked to
exceed the Pro plan average (~3,333 req/day over a 30-day month at
100k/mo), so a realistic Pro customer never hits it.

Hit the cap → `429 rate_limited`:

```json
{
  "error": "Pre-launch rate limit exceeded",
  "code": "rate_limited",
  "details": {
    "reason": "prelaunch_daily_cap",
    "current_usage": 10000,
    "limit": 10000,
    "reset_at": "2026-04-21T00:00:00.000Z",
    "suggestion": "Pre-launch keys are capped at 10000 requests/day. Upgrade to a paid plan at https://igregulator.io/pricing for production use."
  }
}
```

Responses carry `X-Prelaunch-Daily-Limit`, `X-Prelaunch-Daily-Used`, and
`X-Prelaunch-Daily-Reset` headers so clients can monitor consumption. The
cap lifts automatically when the plan enforcement layer takes over; no
client change is needed.

## Response headers

Every public-endpoint response includes:

| Header | Meaning |
| --- | --- |
| `X-RateLimit-Limit` | Ceiling for this caller on this endpoint in the current window (e.g. `10`). |
| `X-RateLimit-Remaining` | Requests left. Never negative. |
| `X-RateLimit-Reset` | Unix epoch seconds when the window rolls over. |
| `X-RateLimit-Policy` | Human-friendly policy string: `tier=public;limit=10;window=hour`. Hand-readable, easy to `awk`. |
| `RateLimit-Policy` | IETF draft format ([draft-ietf-httpapi-ratelimit-headers-09](https://datatracker.ietf.org/doc/draft-ietf-httpapi-ratelimit-headers/)): `"default";q=10;w=3600`. Modern HTTP clients (Cloudflare SDK, Kong, etc.) auto-parse this. |
| `X-Upgrade-URL` | `https://igregulator.io/pricing` — surfaced so a UI can link "upgrade to keep going" on 429. |

### Parsing the policy headers

```js
// X-RateLimit-Policy — custom: tier=public;limit=10;window=hour
const policyCustom = Object.fromEntries(
  res.headers.get('X-RateLimit-Policy').split(';').map((kv) => kv.split('=')),
);
// { tier: 'public', limit: '10', window: 'hour' }

// RateLimit-Policy — IETF: "default";q=10;w=3600
const policyIetf = res.headers.get('RateLimit-Policy');
const q = policyIetf.match(/q=(\d+)/)?.[1];  // quota
const w = policyIetf.match(/w=(\d+)/)?.[1];  // window in seconds
```

## Handling 429

```http
HTTP/2 429
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1776604800
X-Upgrade-URL: https://igregulator.io/pricing

{
  "error": "Public rate limit reached (10/hour/IP).",
  "code": "rate_limited",
  "details": {
    "limit": 10,
    "window_seconds": 3600,
    "upgrade_url": "https://igregulator.io/pricing"
  }
}
```

Recommended client behaviour:

1. Check `X-RateLimit-Remaining` before every call.
2. On 429, sleep until `X-RateLimit-Reset`, then retry once.
3. If the same caller keeps hitting 429, that's a signal to authenticate or upgrade — not to back-off-and-retry indefinitely.
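
Steps 2 and 3 can be sketched as a retry-once wrapper. This is an illustration under stated assumptions, not a reference client: `send` is a hypothetical zero-argument callable returning `(status_code, headers)`, which keeps the sketch independent of any HTTP library, and `headers` is assumed to be a mapping keyed exactly as `X-RateLimit-Reset`:

```python
import time


def get_with_one_retry(send, now=time.time, sleep=time.sleep):
    """Call send(); on a 429, wait for X-RateLimit-Reset and retry once."""
    status, headers = send()
    if status != 429:
        return status, headers
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    # Never sleep a negative amount if the window has already rolled over.
    sleep(max(0.0, reset_at - now()))
    # A second 429 is handed back to the caller rather than retried again.
    return send()
```

Injecting `now` and `sleep` makes the wrapper testable without waiting for a real window to expire.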

## Tips

- **Don't scrape the public endpoint.** Paginate the authenticated `/v1/jurisdictions/:code/operators` list once a day and cache it; each register only refreshes once daily (between 03:00 and 04:15 UTC, depending on the jurisdiction).
- **Front-ends that surface check results to end-users** — call the API from *your* server with a single authenticated key and enforce the 10/hour per-IP limit there. Don't let every browser session hit us directly: sessions behind a shared IP draw on one public 10/hour allowance and will exhaust it quickly.
- **Bulk re-verification** (weekly AML sweep of 2,000 operators) — use the authenticated operator/licence endpoints, not `/v1/check`.

---

# File: docs/endpoints.mdx

---
title: Endpoints
description: Map of every public and authenticated endpoint.
template: doc
sidebar:
  order: 5
---

Every endpoint has full schemas in the [API reference](/docs/api/) and an
interactive executor in the [playground](/docs/playground/). This page
is a map, not a reference — use it to pick which surface you need.

## Public (no key)

| Endpoint | Use |
| --- | --- |
| `GET /v1/check` | Verify a domain or licence number in one round trip. Confidence-scored. See the [confidence guide](/docs/confidence/). |
| `GET /v1/jurisdictions` | List the six jurisdictions we cover, with name / country / currency / licence types. |
| `GET /v1/operators/search?q=…` | Type-ahead search by operator display name or trading name. Unauthenticated: capped at 3 rows. |

## Authenticated (Bearer token)

| Endpoint | Use |
| --- | --- |
| `GET /v1/jurisdictions/:code` | Single jurisdiction detail. |
| `GET /v1/jurisdictions/:code/operators` | Paginated operators under that jurisdiction. Sort = display_name asc. |
| `GET /v1/operators/:slug` | Full operator detail — metadata + licences + domains. |
| `GET /v1/operators/:slug/licenses` | Licences for one operator. Append `?include_history=true` for the status-change log. |
| `GET /v1/licenses/:license_id` | Licence by uuid. Useful when you want a pinned detail page. |
| `GET /v1/licenses/:license_id/history` | Status-change timeline for a single licence. |

## System

| Endpoint | Use |
| --- | --- |
| `GET /v1/health` | Liveness probe. 200 if postgres + redis are reachable. |
| `GET /v1/health/coverage` | Per-jurisdiction scraper freshness — last successful scrape, age in hours, fresh vs stale flag per our SLA (UKGC 24 h, MGA / CW / KH 48 h). Public at 10 req / IP / hour. |
| `GET /openapi.json` | Canonical OpenAPI 3.1 spec. Consume it, generate a client, etc. |

## Response shape conventions

- **Top-level list endpoints** return an envelope:
  `{ q, total, limit, offset, <rows>, _meta }`. These support
  pagination via `?limit=` + `?offset=`. Examples:
  `GET /v1/jurisdictions/:code/operators`,
  `GET /v1/operators/search`,
  `GET /v1/operators/:slug/licenses`,
  `GET /v1/licenses/:license_id/history`.
- **Single-row endpoints** return the row directly — no wrapping
  envelope.
- **Nested arrays on detail endpoints are bare arrays, not
  paginated.** `GET /v1/operators/:slug` returns a full operator
  with its `licenses[]` + `domains[]` inline, without limit /
  offset. Need pagination over an operator's licences? Use
  `GET /v1/operators/:slug/licenses` instead — the paginated form
  ships a proper envelope.
- Timestamps are ISO-8601 UTC (`2026-04-19T12:00:00Z`).
- Dates are `YYYY-MM-DD`.
- UUIDs are lowercase, hyphenated, v4.
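
Walking a paginated envelope with `limit` and `offset` can be sketched as a generator. Everything named here is an assumption for illustration: `fetch_page(limit, offset)` is a hypothetical callable returning the decoded envelope, `rows_key` is the name of the row array (shown as `<rows>` above, so check the API reference for the real key), and the default `page_size` of 100 is not a documented maximum:

```python
def iter_rows(fetch_page, rows_key, page_size=100):
    """Yield every row of a paginated list endpoint."""
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        rows = page[rows_key]
        yield from rows
        offset += len(rows)
        # Stop on an empty page or once `total` rows have been seen.
        if not rows or offset >= page["total"]:
            return
```

Advancing by `len(rows)` rather than `page_size` keeps the loop correct even if the server returns fewer rows than requested.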

---

# File: docs/confidence.mdx

---
title: Confidence scoring
description: How /v1/check picks between high / medium / low, and when it refuses.
template: doc
sidebar:
  order: 6
---

The `/v1/check` endpoint returns a `match` object with a `confidence`
field. This page explains what each level means, how we pick it, and
how UIs should render it.

## The three levels

| `confidence` | What it means | Render as |
| --- | --- | --- |
| `high` | Exact or root-domain match in our authoritative registry. | Green check. Safe to say "this site is licensed by X". |
| `medium` | Domain root matched a trading name or operator name. We can identify the operator but can't *prove* this domain is theirs. | Amber / neutral. "Likely operated by X" phrasing. |
| `low` | A weak fuzzy match (operator returned, but below the strong-similarity bar), **or** the domain root is a generic gambling term (`casino.com`, `poker.com`) where too many operators share the label to pick one. | Gray / warning. "We can't confirm this domain." |

A fourth value — `none` — appears in the top-level `confidence` field
(not `match.confidence`) when `match` is `null`.

## Why a query missed: `match_absence_reason`

When `match` is `null`, a bare `low` or `none` is ambiguous: on its own it
conflates "generic term" with "we checked and it isn't there". A miss
therefore carries two extra fields (present **only** when `match` is
`null`) so you can phrase the answer precisely instead of guessing:

| `match_absence_reason` | Meaning | Say to your user |
| --- | --- | --- |
| `generic_term` | The label is an ultra-generic gambling word (`casino.com`); we can't map it to one operator. | "Can't identify a specific operator from this domain." |
| `no_record_found` | A specific query we checked against **every** covered register and did not find. | "Not found in any of the N jurisdictions iGregulator covers." — **never** an unqualified "unlicensed". |

`checked_jurisdictions` accompanies the miss — the exact register codes we
checked (e.g. `["AN","CW","KH","MGA","TGC","UKGC"]`) — so a "not licensed"
claim is always scoped to our coverage, never stated as an absolute.

```json
{
  "query": { "domain": "some-unknown-site.com" },
  "match": null,
  "confidence": "none",
  "match_absence_reason": "no_record_found",
  "checked_jurisdictions": ["AN", "CW", "KH", "MGA", "TGC", "UKGC"]
}
```

These fields are **absent** when there's a match — check `match === null`
first, then branch on `match_absence_reason`.
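
That branching order can be sketched as a small formatter. `describe_check` is an illustrative name; the user-facing sentences are the ones suggested in the table above, and the final fallback string is an assumption for responses that carry no reason:

```python
def describe_check(result):
    """Turn a /v1/check response body into a scoped, user-facing sentence."""
    match = result.get("match")
    if match is not None:
        confidence = result.get("confidence") or match.get("confidence")
        return f"Matched {match['operator']} ({confidence} confidence)."
    reason = result.get("match_absence_reason")
    if reason == "generic_term":
        return "Can't identify a specific operator from this domain."
    if reason == "no_record_found":
        n = len(result.get("checked_jurisdictions", []))
        # Scope the claim to coverage; never say "unlicensed" outright.
        return f"Not found in any of the {n} jurisdictions iGregulator covers."
    return "No match."


miss = {
    "query": {"domain": "some-unknown-site.com"},
    "match": None,
    "confidence": "none",
    "match_absence_reason": "no_record_found",
    "checked_jurisdictions": ["AN", "CW", "KH", "MGA", "TGC", "UKGC"],
}
print(describe_check(miss))
```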

## match_type

Tells you *how* we arrived at the match, useful for debugging and UX
differentiation.

| `match_type` | Source |
| --- | --- |
| `domain_exact` | Found the domain in the `domains` table — sourced from the regulator's official register. Carries `domain_association` (`direct` or `white_label`). |
| `trading_name_fuzzy` | Trigram similarity ≥ 0.55 against `operators.trading_names[]` after stripping the TLD. Used when the domain isn't registered but the brand exists. 0.55 was picked empirically against the UKGC register: it catches legitimate variants (`paddypower` ↔ `paddy-power`, `skybet` ↔ `sky-bet`) while rejecting the long tail of single-syllable collisions (`gold`, `star`, `royal`) where the label is too generic to mean one operator. Below 0.55 we land in `low`-confidence territory either way; above it the trigger is stable. |
| `name_similarity` | Last-chance similarity against `operators.display_name` — rarely fires for B2C domains, useful when no trading name was populated upstream. |

## domain_association

When `match_type = domain_exact`, we differentiate:

- **`direct`** — the licensee runs the site themselves. The `operator` field is the company your end-user is gambling with.
- **`white_label`** — the licensee has authorised a third-party brand to trade on the domain under their permit. The `operator` field is the *licensee*, not the brand. UK-licensed white-label arrangements are legal and common; surfacing the relationship lets you show "operated by Brand X under ProgressPlay's UKGC permit".

Fuzzy matches (`trading_name_fuzzy`, `name_similarity`) don't populate
`domain_association` — we don't have a domain row to read it from, so
the field is `null`.

## Tiebreaking for equal similarity

Brand names like *Paddy Power* trigram-match several sister companies
(PPB Counterparty, PPB Entertainment, PPB GE, Power Leisure
Bookmakers) at similarity `1.0`. To keep the primary match **stable
across DB reindex and VACUUM**, `/v1/check` applies a documented
tiebreaker cascade whenever the top candidates are tied on similarity:

1. **similarity DESC** — closeness wins first, as ever.
2. **has_active DESC** — operators with at least one `active` licence
   are preferred over operators whose licences are all expired /
   revoked.
3. **oldest_active_issued ASC** — among active-licence candidates, the
   one whose *oldest* active licence issued first wins. Stability
   signal: a parent entity that has been licensed longest is the most
   useful "who actually runs this brand" answer.
4. **total_licenses DESC** — more licences across the register → more
   likely a parent entity rather than a single-purpose subsidiary.
5. **operator_slug ASC** — lexicographic final fallback. Always
   deterministic even when every previous rank is tied.

Clients that cache domain → operator mappings can rely on the primary
result remaining stable between index rebuilds; any change in primary
reflects a change in the underlying registry data, not PG query
randomness.
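
The cascade maps naturally onto a tuple sort key. The sketch below only illustrates the ordering; it is not the server's query. The candidate dictionaries, their field names, and the placeholder operators are invented for the example:

```python
from datetime import date


def tiebreak_key(c):
    """Sort key mirroring the five-step cascade; the smallest key wins."""
    return (
        -c["similarity"],                       # 1. similarity DESC
        not c["has_active"],                    # 2. has_active DESC (False sorts first)
        c["oldest_active_issued"] or date.max,  # 3. oldest issue date ASC, missing last
        -c["total_licenses"],                   # 4. total_licenses DESC
        c["operator_slug"],                     # 5. slug ASC, final fallback
    )


# Placeholder candidates, all tied on similarity.
candidates = [
    {"operator_slug": "beta-ltd", "similarity": 1.0, "has_active": True,
     "oldest_active_issued": date(2015, 3, 1), "total_licenses": 2},
    {"operator_slug": "alpha-ltd", "similarity": 1.0, "has_active": False,
     "oldest_active_issued": None, "total_licenses": 9},
    {"operator_slug": "gamma-ltd", "similarity": 1.0, "has_active": True,
     "oldest_active_issued": date(2009, 6, 1), "total_licenses": 1},
]
primary = min(candidates, key=tiebreak_key)
print(primary["operator_slug"])  # gamma-ltd
```

`alpha-ltd` has the most licences and the lowest slug, yet loses at step 2 because none of its licences is active; `gamma-ltd` then beats `beta-ltd` at step 3 on the older issue date.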

## alternatives[]

Up to 3 runner-up candidates, sorted by similarity descending.

- On `confidence: medium`, these are operators with the same or similar trading name that we ranked below the primary match.
- On `confidence: low` (generic label), this is always `[]` — we refuse to guess when the label is ambiguous.
- On `confidence: high` / `none`, also `[]`.

## Why the generic-label filter exists

Without it, `GET /v1/check?domain=casino.com` would return "Casino MK
Limited" with `confidence: medium` — deterministically, because "casino"
matches that trading name at similarity 1.0. But the actual casino.com
is licensed elsewhere (MGA, Gibraltar) which we don't cover, and the
answer "yes, Casino MK owns it" would be wrong.

The blocklist is a substring regex over the normalised label —
`casino`, `poker`, `bingo`, `gambling`, `bet`, `slot`/`slots`,
`sportsbook`, `roulette`, `blackjack`, `wager`, `lottery`,
`gaming`. Matches anywhere in the label, so `casino.com`,
`bestcasino.com`, and `casino-bonus.com` all return `confidence:
low` with empty `alternatives[]`. Licensed brands containing one
of these keywords (`bet365.com`, `pokerstars.com`) still resolve
to `high` because the domain-exact match runs before the generic
gate.
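
A client-side approximation of the gate helps predict which domains will come back `low`. This is a sketch built from the keyword list above, not the production regex; the label extraction (lower-case, drop a leading `www.`, take everything before the first dot) is an assumption, since the real normalisation is not described here:

```python
import re

# `slot` also covers `slots` because the match is a substring search.
GENERIC_TERMS = re.compile(
    r"casino|poker|bingo|gambling|bet|slot|sportsbook|roulette|blackjack|wager|lottery|gaming"
)


def is_generic_label(domain):
    """True when the domain's first label contains a blocklisted term."""
    host = domain.lower()
    if host.startswith("www."):
        host = host[4:]
    label = host.split(".", 1)[0]
    return GENERIC_TERMS.search(label) is not None


for d in ("casino.com", "bestcasino.com", "casino-bonus.com", "paddypower.com"):
    print(d, is_generic_label(d))
```

Note that `bet365.com` also trips this check. It still resolves to `high` in the API only because the domain-exact lookup runs first, so treat a `True` here as "generic unless the register lists this exact domain".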

---

# File: docs/batch.mdx

---
title: Batch domain check
description: Verify many domains in one request — a KYB sweep or affiliate-list audit without N sequential calls.
template: doc
sidebar:
  order: 8
---

`POST /v1/check/batch` resolves up to **100 domains in one request**, so a
KYB sweep or affiliate-list audit is one round trip instead of N. 200
merchants = 2 calls, not 200.

This endpoint requires an API key (the single `GET /v1/check` stays
keyless) and accepts domains only, not licence numbers.

## Request

```bash
curl -s -X POST https://api.igregulator.io/v1/check/batch \
  -H "Authorization: Bearer $IGREGULATOR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"domains":["bet365.com","www.virginbet.com","casino.com"]}'
```

## Response

`checked_jurisdictions` is returned **once** at the top (it's the same for
every row — keeps the payload lean). Each result mirrors the single-check
shape: `match`, `confidence`, and `match_absence_reason` on a miss.

```json
{
  "count": 3,
  "checked_jurisdictions": ["AN", "CW", "KH", "MGA", "TGC", "UKGC"],
  "results": [
    { "query": { "domain": "bet365.com" }, "match": { "operator": "Hillside (UK Sports) ENC", "...": "…" }, "confidence": "high" },
    { "query": { "domain": "www.virginbet.com" }, "match": { "operator": "Virgin Bet Limited", "domain_association": "white_label", "...": "…" }, "confidence": "high" },
    { "query": { "domain": "casino.com" }, "match": null, "confidence": "low", "match_absence_reason": "generic_term" }
  ]
}
```

## Partial success

A malformed hostname doesn't fail the batch — that row comes back with an
`error` and `match: null`, and every other domain still resolves:

```json
{ "query": { "domain": "not a domain" }, "match": null, "confidence": "none", "error": "invalid_hostname" }
```

## Limits & semantics

- **Max 100 domains** per request; split longer lists across several requests.
- Domains are resolved with bounded concurrency server-side — order of
  `results` follows the order you sent.
- Counts as **one request** against your plan quota today.
- Each result uses the same matching as `GET /v1/check` (exact host →
  eTLD+1 → fuzzy), so `www`/apex variants resolve identically.
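
Splitting a longer list to respect the 100-domain cap is a one-liner. A minimal sketch; `chunked` is an illustrative helper and the `.example` hostnames are placeholders:

```python
def chunked(domains, size=100):
    """Split a domain list into batches no larger than `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [domains[i:i + size] for i in range(0, len(domains), size)]


batches = chunked([f"site{i}.example" for i in range(250)])
print([len(b) for b in batches])  # [100, 100, 50]
```

Each batch then becomes the `domains` array of one `POST /v1/check/batch` call, and because results keep the order sent, concatenating the `results` arrays restores the original order.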

## Clients

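```python
import os

import requests

KEY = os.environ["IGREGULATOR_API_KEY"]
domains = ["bet365.com", "www.virginbet.com", "casino.com"]  # your list here

r = requests.post(
    "https://api.igregulator.io/v1/check/batch",
    headers={"Authorization": f"Bearer {KEY}"},
    json={"domains": domains[:100]},  # at most 100 domains per request
    timeout=30,
)
r.raise_for_status()  # surface 401 / 429 instead of a KeyError on "results"
for row in r.json()["results"]:
    m = row["match"]
    if m and row["confidence"] in ("high", "medium"):
        # .get() because abbreviated fuzzy matches may omit `status`
        verdict = f"{m['operator']} ({m.get('status')})"
    elif row.get("error"):
        verdict = f"invalid: {row['error']}"
    else:
        verdict = f"no match ({row.get('match_absence_reason')})"
    print(row["query"]["domain"], "→", verdict)
```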

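```javascript
const KEY = process.env.IGREGULATOR_API_KEY;
const domains = ['bet365.com', 'www.virginbet.com', 'casino.com']; // your list here

const res = await fetch('https://api.igregulator.io/v1/check/batch', {
  method: 'POST',
  headers: { Authorization: `Bearer ${KEY}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({ domains: domains.slice(0, 100) }), // at most 100 per request
});
// Surface 401 / 429 instead of destructuring an error body.
if (!res.ok) throw new Error(`${res.status} ${await res.text()}`);
const { results } = await res.json();
for (const row of results) {
  if (row.match && ['high', 'medium'].includes(row.confidence)) {
    console.log(row.query.domain, '→', row.match.operator, row.match.status);
  } else {
    console.log(row.query.domain, '→', row.error ?? `no match (${row.match_absence_reason})`);
  }
}
```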

---

# File: docs/point-in-time.mdx

---
title: Point-in-time lookups (as_of)
description: Reconstruct a licence's status as of a past date — strictly within iGregulator's observation window.
template: doc
sidebar:
  order: 9
---

iGregulator keeps the **transition history** of every licence, so you can ask
"what was this operator's status on date X" — the question a compliance
review asks constantly ("was this merchant licensed at the time of the
transaction three months ago?") and the one no incumbent can answer, because
none keeps the history.

Pass `?as_of=` to `/v1/check`, `/v1/licenses/{id}`, or `/v1/operators/{slug}`.

## The one rule that matters

**`as_of` answers only within our observation window. We never extrapolate a
status before `tracking_since` — the moment we first recorded the licence.**

Our history begins when our scraper first saw a record (`change_type:
"created"`). We do not know what was true before that, so we never guess.
Asking about a date before `tracking_since` returns `knowledge:
"before_tracking"` with a **null** status — not a fabricated "active". A tool
that invented pre-observation history would force you to assert a historical
fact the data never witnessed; that is worse than not having the feature.

> **iGregulator answers "as of date X" only within its observation window —
> it tells you when it started watching, and never invents a status it didn't
> observe.**

## The four states of knowledge

| `knowledge` | When | `status_as_of` |
| --- | --- | --- |
| `observed` | The date is within our window (≥ `tracking_since`). | The real status then. |
| `before_tracking` | The date predates when we started watching. | `null` — unknowable, **not** a guess. `tracking_since` tells you the lower bound. |
| `no_such_license` | We have no history for this licence at all. | `null`. |
| `no_license_resolved` | (`/v1/check` only) A fuzzy match with no specific licence to time-travel. | `null`. |

The `as_of` object also returns `established_by` — the exact history
transition in effect on your date (`changed_at`, `new_status`, `change_type`,
`source_url`) — so you can see *when* that status was last confirmed relative
to your query.

## Date semantics

- A bare `YYYY-MM-DD` is interpreted as **end of that day, UTC** (status at
  close of day).
- A full ISO-8601 datetime is honoured as given.
- A date **in the future** returns `400` — we never answer about a date we
  haven't observed. It is never silently clamped to "now".
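
The rules above can be mirrored client-side, for instance to validate user input before spending a request. This is an illustrative sketch, not the server's implementation: `normalise_as_of` is a hypothetical name, treating a timezone-less datetime as UTC is an assumption, and so is treating today's bare date (whose end of day has not yet passed) as "future".

```python
from datetime import datetime, time, timezone


def normalise_as_of(value: str, now: datetime) -> datetime:
    """Turn an as_of string into the instant the documented rules describe."""
    if len(value) == 10:
        # Bare YYYY-MM-DD: end of that day, UTC.
        day = datetime.strptime(value, "%Y-%m-%d").date()
        moment = datetime.combine(day, time(23, 59, 59, 999000), tzinfo=timezone.utc)
    else:
        # Full ISO-8601 datetime: honoured as given.
        moment = datetime.fromisoformat(value.replace("Z", "+00:00"))
        if moment.tzinfo is None:
            moment = moment.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    if moment > now:
        raise ValueError("as_of is in the future; the API answers 400 for these")
    return moment


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
print(normalise_as_of("2026-01-01", now).isoformat())
```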

## Examples

`before_tracking` — asking before we started watching:

```json
// GET /v1/licenses/140a822c-…?as_of=2026-01-01
{
  "as_of": "2026-01-01T23:59:59.999Z",
  "knowledge": "before_tracking",
  "status_as_of": null,
  "established_by": null,
  "tracking_since": "2026-04-17T15:15:40.055Z"
}
```

`observed` — a date after a revocation transition:

```json
// GET /v1/licenses/140a822c-…?as_of=2026-05-20
{
  "as_of": "2026-05-20T23:59:59.999Z",
  "knowledge": "observed",
  "status_as_of": "revoked",
  "established_by": {
    "changed_at": "2026-05-13T01:00:05.215Z",
    "new_status": "revoked",
    "change_type": "status_change",
    "source_url": "https://www.gamblingcommission.gov.uk/downloads/business-licence-data.zip"
  },
  "tracking_since": "2026-04-17T15:15:40.055Z"
}
```
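
A consumer usually wants to collapse the `as_of` object into "known status" versus "unknown". The sketch below is one conservative reading of the `knowledge` values; `interpret` is a hypothetical helper, and any value it does not recognise is treated as unknown.

```python
from typing import Optional, Tuple


def interpret(as_of: dict) -> Tuple[str, Optional[str]]:
    """Return (verdict, status). Only `observed` carries a usable status."""
    knowledge = as_of.get("knowledge")
    if knowledge == "observed":
        return ("known", as_of.get("status_as_of"))
    # before_tracking, no_such_license, no_license_resolved, or anything new:
    # the status is unknowable, which is not the same as "unlicensed".
    return ("unknown", None)


before = {"knowledge": "before_tracking", "status_as_of": None,
          "tracking_since": "2026-04-17T15:15:40.055Z"}
after = {"knowledge": "observed", "status_as_of": "revoked"}
print(interpret(before), interpret(after))
```

When the verdict is `unknown` because of `before_tracking`, `tracking_since` is the earliest date you can re-ask about.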

## Endpoint notes

- **`/v1/licenses/{id}?as_of=`** — cleanest: one licence, one `as_of` object.
- **`/v1/operators/{slug}?as_of=`** — resolved **per licence** (each licence
  in the array gets its own `as_of`); we don't collapse a multi-jurisdiction
  operator into a single status — you aggregate as your policy requires.
- **`/v1/check?domain=X&as_of=`** — the domain→operator attribution is taken
  as **current**; only the licence **status** is time-travelled. Historical
  domain attribution (we keep `first_seen` on domains) is a future addition.
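
Since the operator endpoint leaves aggregation to you, here is a sketch of one possible policy: an operator counts as licensed on the date if at least one licence was observed as `active`, and as unknown if nothing observed is active but some licence could not be observed. The function name and the policy itself are illustrative assumptions, not API behaviour.

```python
from typing import Iterable


def licensed_on_date(licence_as_of: Iterable[dict]) -> str:
    """Collapse per-licence as_of objects into 'yes', 'unknown', or 'no'."""
    items = list(licence_as_of)
    if any(a.get("knowledge") == "observed" and a.get("status_as_of") == "active"
           for a in items):
        return "yes"
    if any(a.get("knowledge") != "observed" for a in items):
        return "unknown"  # at least one licence predates our observation window
    return "no"


# The input is the list of as_of objects, one per licence in the response.
print(licensed_on_date([
    {"knowledge": "observed", "status_as_of": "revoked"},
    {"knowledge": "before_tracking", "status_as_of": None},
]))
```

A stricter policy (for example, every licence must be observed and active) only changes the two `any(...)` conditions.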

---

# File: docs/pagination.mdx

---
title: Pagination
description: limit and offset on list endpoints.
template: doc
sidebar:
  order: 7
---

List endpoints accept `limit` and `offset` query parameters and return a
`total` in the envelope so callers know when they've walked to the end.

## Endpoints that paginate

- `GET /v1/operators/search?q=…`
- `GET /v1/jurisdictions/:code/operators`
- `GET /v1/operators/:slug/licenses`

## Parameters

| Param | Default | Max | Notes |
| --- | --- | --- | --- |
| `limit` | 50 | 200 (authenticated) / 3 (unauth on `/operators/search`) | Negative values rejected with 400. |
| `offset` | 0 | — | Zero-based. High offsets have linear scan cost; a `cursor` parameter is planned for very large lists (see "Why not cursor-based?" below). |

## Response envelope

```json
{
  "q": "...",
  "total": 1420,
  "limit": 50,
  "offset": 100,
  "operators": [ ]
}
```

- `total` — rows matching the query, ignoring limit/offset.
- `operators` — the current page; never more than `limit` rows.

## Walking a result set

```bash
# Bash loop — fetch all operators for UKGC.
offset=0
while :; do
  resp=$(curl -sH "Authorization: Bearer $KEY" \
    "https://api.igregulator.io/v1/jurisdictions/UKGC/operators?limit=200&offset=$offset")
  rows=$(echo "$resp" | jq '.operators | length')
  [ "$rows" -eq 0 ] && break
  echo "$resp" | jq '.operators[]'
  offset=$((offset + rows))
done
```

## Why not cursor-based?

Offset pagination is simpler to document, easier for UIs that render
page numbers, and cheap for our table sizes (~3,700 operators, ~4,500
licences). When any list crosses 100k rows we'll add a `cursor` query
param alongside — offset stays supported for back-compat.

## Rate-limit interplay

Each paginated request is one API call against your quota. A full
sweep of 3,700 UKGC operators at `limit=200` is 19 calls — well
within Starter's 10k monthly quota, trivial within Pro's 100k.
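
The call count is a ceiling division of the row total by the page size. A small sketch (`calls_for_sweep` is a hypothetical helper):

```python
import math


def calls_for_sweep(total: int, limit: int = 200) -> int:
    """Number of pages needed to return `total` rows at `limit` rows per page."""
    if limit < 1:
        raise ValueError("limit must be positive")
    return math.ceil(total / limit)


print(calls_for_sweep(3700))  # 19
```

A loop that stops only when it receives an empty page makes one extra request on top of this; stopping once `offset` reaches `total` avoids it.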

---

# File: docs/errors.mdx

---
title: Error handling
description: Error code reference, retry strategy, deprecation warnings.
template: doc
sidebar:
  order: 8
---

Errors always return JSON with a stable `code` field. Use `code` for
branching in clients, not the HTTP status or the human message (the
human message can change).

## Response shape

```json
{
  "error": "Human-readable explanation.",
  "code": "rate_limited",
  "details": {
    "limit": 10,
    "window_seconds": 3600
  }
}
```

## Code reference

| HTTP | code | When | Retry? |
| --- | --- | --- | --- |
| 400 | `invalid_query` | Missing / malformed query params. | No — fix the request. |
| 400 | `invalid_slug` | Slug path param failed validation. | No. |
| 401 | `auth_required` | No `Authorization` header on a gated endpoint. | No — attach the header. |
| 401 | `auth_invalid` | Header malformed or key not recognised. | No. |
| 401 | `auth_revoked` | Key has been revoked. | No — provision a new key. |
| 404 | `not_found` | Slug / id not in the registry. | No. |
| 429 | `rate_limited` | Hit the public 10/hr or per-plan ceiling. | Yes — wait until `X-RateLimit-Reset`. |
| 500 | `server_error` | Unhandled upstream failure. | Yes — exponential backoff, 3 attempts. |

## Retry strategy

- **4xx (except 429)** — fix the request, don't retry. The same bad input will always 4xx.
- **429** — sleep until `X-RateLimit-Reset` (Unix epoch seconds), retry once. If you hit 429 again, you're under-provisioned — upgrade or authenticate, don't loop.
- **5xx** — exponential backoff, up to 3 retries (1s, 2s, 4s). If a 5xx persists after the last retry (about 7 seconds of backoff in total), you're better off surfacing a failure state than holding the UI hostage.

## Reference implementation (JavaScript)

```js
async function igRequest(path, init = {}, attempt = 0) {
  const r = await fetch('https://api.igregulator.io' + path, init);
  if (r.ok) return r.json();

  const body = await r.json().catch(() => ({ code: 'parse_error' }));
  const code = body.code ?? 'unknown';

  // Retry a 429 once only; a second 429 means the quota is too small.
  if (r.status === 429 && attempt === 0) {
    const reset = Number(r.headers.get('X-RateLimit-Reset')) * 1000;
    const sleepFor = Math.max(0, reset - Date.now());
    if (sleepFor < 5 * 60_000) {
      await new Promise((res) => setTimeout(res, sleepFor + 1_000));
      return igRequest(path, init, attempt + 1);
    }
  }

  if (r.status >= 500 && attempt < 3) {
    await new Promise((res) => setTimeout(res, 2 ** attempt * 1_000));
    return igRequest(path, init, attempt + 1);
  }

  const err = new Error(body.error ?? r.statusText);
  err.status = r.status;
  err.code = code;
  err.details = body.details;
  throw err;
}
```
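
The retry rules can also be expressed as a pure decision function, which is easy to unit-test separately from the HTTP client. A Python sketch under the policy described in "Retry strategy"; `retry_delay` and its parameters are hypothetical names:

```python
from typing import Optional

MAX_5XX_RETRIES = 3  # backoff of 1s, 2s, 4s


def retry_delay(status: int, attempt: int,
                reset_epoch: Optional[int], now: float) -> Optional[float]:
    """Seconds to wait before retrying, or None to give up.

    `attempt` is the number of retries already made (0 on the first failure).
    """
    if status == 429:
        if attempt >= 1 or reset_epoch is None:
            return None  # retry a 429 once only
        return max(0.0, reset_epoch - now)
    if 500 <= status < 600:
        return float(2 ** attempt) if attempt < MAX_5XX_RETRIES else None
    return None  # other 4xx: fix the request instead


print([retry_delay(500, a, None, 0.0) for a in range(4)])  # [1.0, 2.0, 4.0, None]
```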

---

# File: docs/code-examples.mdx

---
title: Code examples
description: curl / JavaScript / Python snippets for common operations.
template: doc
sidebar:
  order: 9
---

Copy-paste snippets for the two things every integration does first —
check a domain, then walk an authenticated list. No SDKs yet; the API
is small enough that 15 lines of `fetch` / `requests` does the job.

## Domain check — curl

```bash
curl -sG https://api.igregulator.io/v1/check \
  --data-urlencode 'domain=paddypower.com' | jq
```

## Domain check — JavaScript (fetch)

```js
const res = await fetch(
  'https://api.igregulator.io/v1/check?domain=paddypower.com'
);
const { match, confidence } = await res.json();

if (confidence === 'high' || confidence === 'medium') {
  console.log(
    `${match.operator} (licensed by ${match.jurisdiction}, ${match.license_number}) — confidence ${confidence}`,
  );
} else {
  console.log('No confident match found.');
}
```

## Domain check — Python (requests)

```python
import requests

r = requests.get(
    'https://api.igregulator.io/v1/check',
    params={'domain': 'paddypower.com'},
    timeout=5,
)
r.raise_for_status()
data = r.json()

match = data.get('match')
if match and data['confidence'] in ('high', 'medium'):
    print(f"{match['operator']} — {match['jurisdiction']} {match['license_number']}"
          f" (confidence={data['confidence']})")
else:
    print('No confident match.')
```

## Walk all UKGC operators — JavaScript

```js
const KEY = process.env.IGREGULATOR_KEY;
const BASE = 'https://api.igregulator.io';

async function* paginate(jurisdiction) {
  let offset = 0;
  const limit = 200;
  while (true) {
    const r = await fetch(
      `${BASE}/v1/jurisdictions/${jurisdiction}/operators?limit=${limit}&offset=${offset}`,
      { headers: { Authorization: `Bearer ${KEY}` } },
    );
    if (!r.ok) throw new Error(`HTTP ${r.status}`);
    const body = await r.json();
    if (body.operators.length === 0) return;
    for (const op of body.operators) yield op;
    offset += body.operators.length;
  }
}

for await (const op of paginate('UKGC')) {
  console.log(op.slug, op.display_name);
}
```

## Walk all UKGC operators — Python

```python
import os, requests

KEY = os.environ['IGREGULATOR_KEY']
BASE = 'https://api.igregulator.io'
session = requests.Session()
session.headers['Authorization'] = f'Bearer {KEY}'

def paginate(jurisdiction):
    offset, limit = 0, 200
    while True:
        r = session.get(
            f'{BASE}/v1/jurisdictions/{jurisdiction}/operators',
            params={'limit': limit, 'offset': offset},
            timeout=10,
        )
        r.raise_for_status()
        rows = r.json()['operators']
        if not rows:
            return
        for op in rows:
            yield op
        offset += len(rows)

for op in paginate('UKGC'):
    print(op['slug'], op['display_name'])
```

## Bulk domain verification — rate-limit aware

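If you're verifying a list of 500 domains as part of a nightly sweep,
authenticate and sleep between requests to stay under the per-second
burst cap. Pacing the loop is simpler than leaning on retry-on-429
alone; the loop below still retries once if a 429 slips through.
`POST /v1/check/batch` (up to 100 domains per request) cuts the same
sweep to five calls.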

```python
import os, time, requests

KEY = os.environ['IGREGULATOR_KEY']
DOMAINS = open('domains.txt').read().splitlines()
S = requests.Session()
S.headers['Authorization'] = f'Bearer {KEY}'

for d in DOMAINS:
    r = S.get(
        'https://api.igregulator.io/v1/check',
        params={'domain': d}, timeout=5,
    )
    if r.status_code == 429:
        reset = int(r.headers.get('X-RateLimit-Reset', 0))
        wait = max(1, reset - int(time.time()))
        time.sleep(wait + 1)
        r = S.get(
            'https://api.igregulator.io/v1/check',
            params={'domain': d}, timeout=5,
        )
    r.raise_for_status()
    print(d, r.json()['confidence'])
    time.sleep(0.05)  # 20 req/sec ceiling headroom for Pro
```

---

# File: docs/webhooks.mdx

---
title: Webhooks
description: Push-based alerts on licence changes, expiries, and regulatory actions. HMAC-signed, retried, deliverable to any HTTPS endpoint.
template: doc
sidebar:
  order: 11
---

import { Tabs, TabItem, Aside } from '@astrojs/starlight/components';

iGregulator delivers change alerts via HTTP POST to a URL you
control. Ten event types, HMAC-SHA256 signed, up to seven delivery
attempts with jittered backoff. Set up alerts in the
**[dashboard](https://app.igregulator.io/webhooks)** with no code:
register a URL, select events, and copy the secret when it is revealed.

> Just want to wire it up fast? Skip to
> [/docs/webhooks/quickstart](/docs/webhooks/quickstart/) for a
> 2-minute tour using webhook.site as the receiver.

## 1. Event types

Ten dot-notation events. Subscribe to any subset per endpoint.

| Event | Fires |
| --- | --- |
| `license.status_changed` | Any transition between `active` / `suspended` / `revoked` / `expired`. |
| `license.expiring_30d` | Active licence with `expiry_date` exactly 30 days from now. |
| `license.expiring_60d` | Same, 60 days. |
| `license.expiring_90d` | Same, 90 days. |
| `license.expired` | Status became `expired` (either explicitly or via date). |
| `license.issued` | New licence first observed in a scraper run. |
| `regulatory_action.added` | Fine / warning / revocation / licence_suspension entry added. |
| `coverage.degraded` | `/v1/health/coverage` transitions to `degraded` for a jurisdiction. |
| `coverage.restored` | `/v1/health/coverage` transitions back to `healthy`. |
| `webhook.endpoint_degraded` | Self-notification: one of your endpoints is failing >20% of deliveries. Fires only to your OTHER endpoints. |

<Aside type="caution" title="Subscribing to multiple expiry windows">
Subscribing to all three (`license.expiring_30d`, `_60d`, `_90d`)
delivers three separate webhooks per licence as it approaches
expiry. By design — they're semantically distinct warnings, not
duplicates. Subscribe only to the warning period(s) you act on.

```js
function onWebhook(event) {
  switch (event.event) {
    case 'license.expiring_90d':
      // Notify, no blocker — 3 months is plenty
      notifyCompliance(event.data);
      break;
    case 'license.expiring_60d':
      // Compose renewal paperwork
      kickoffRenewalFlow(event.data);
      break;
    case 'license.expiring_30d':
      // Escalate, block new referrals to the operator
      lockOperator(event.data);
      break;
  }
}
```
</Aside>

## 2. Envelope

Every event — production or test — carries the same outer shape:

```json
{
  "event": "license.status_changed",
  "event_id": "evt_01HX8EGQK3J7WA6MYTP7ZGYF21",
  "api_version": "2026-04-20",
  "timestamp": "2026-04-20T14:32:00.000Z",
  "livemode": true,
  "data": {
    "license_id": "uuid",
    "license_number": "039028-R-319297-013",
    "operator_id": "uuid",
    "operator_slug": "888-uk-limited",
    "jurisdiction_code": "UKGC",
    "previous_status": "active",
    "new_status": "suspended",
    "changed_at": "2026-04-20T03:04:12.000Z",
    "source_url": "https://www.gamblingcommission.gov.uk/..."
  }
}
```

- `event_id` — ULID; sorts lexicographically in creation order. Dedupe on this.
- `data.previous_status` — may be `null` when a licence is first observed
  (`change_type: created` — status is its initial state).
- `api_version` — date constant. Bumped when a `data` shape gets a
  breaking change; old subscribers keep receiving the previous
  version until they upgrade.
- `livemode` — `false` only for `test.ping` events from the
  dashboard Test button.
- `timestamp` — when we emitted the event. Not when the change
  happened (that's in `data.*_at`).

### Regulatory action amounts

`regulatory_action.added` includes `amount_minor_units` — in the
**smallest currency unit** (pence for GBP, cents for USD). £5
million = `500000000` (5,000,000 × 100 pence). We store it this way
so a £5M fine never gets confused with £5,000.

```json
{
  "action_type": "fine",
  "amount_minor_units": 500000000,
  "currency": "GBP"
}
```
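
To display an amount, divide by the currency's minor-unit factor using decimal arithmetic rather than floats. A minimal sketch; the exponent table is an assumption you would extend for the currencies you actually receive, and `to_major_units` is a hypothetical helper:

```python
from decimal import Decimal

# Assumption: decimal places per currency code (two for GBP, USD, EUR).
MINOR_UNIT_EXPONENT = {"GBP": 2, "USD": 2, "EUR": 2}


def to_major_units(amount_minor_units: int, currency: str) -> Decimal:
    """Convert pence/cents to pounds/dollars without float rounding."""
    exponent = MINOR_UNIT_EXPONENT[currency]  # KeyError for unknown currencies
    return Decimal(amount_minor_units).scaleb(-exponent)


print(to_major_units(123456, "GBP"))  # 1234.56
```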

## 3. Headers

Every delivery, including retries:

```
Content-Type: application/json
User-Agent: iGregulator-Webhook/1 (+https://igregulator.io)
X-iGregulator-Event: license.status_changed
X-iGregulator-Event-Id: evt_01HX8EGQK3J7WA6MYTP7ZGYF21
X-iGregulator-Timestamp: 1776717845
X-iGregulator-Delivery-Id: 3b1e2c4a-f8d1-4a7c-8b5f-111111111111
X-iGregulator-Attempt: 2
X-iGregulator-Signature: t=1776717845,v1=abcd…ef01
```

`X-iGregulator-Attempt` is the 1-based attempt number: `1` is the
first delivery, `2` the first retry, and so on up to `7`.
`X-iGregulator-Missed-Deliveries` appears on the next successful
delivery after one or more deliveries were abandoned (all 7
attempts exhausted). Fetch them via
`GET /v1/webhooks/:id/deliveries?status=abandoned`.

## 4. Signature verification

The `X-iGregulator-Signature` header is a Stripe-style CSV:

```
t=<unix-epoch-seconds>,v1=<hex>[,v1=<hex>…]
```

- `t` — the timestamp used in the HMAC input.
- `v1` — HMAC-SHA256 hex digest. May appear multiple times when a
  secret rotation is in progress; each `v1` is the signature
  computed with a different active secret. Accept the delivery if
  **any** `v1` value matches.

Signed input = `${t}.${raw_body}`. The HTTP body is included
byte-for-byte; do not re-serialise the JSON before verifying, or
whitespace drift will break the MAC.

<Tabs>
<TabItem label="Node.js">

```js
import crypto from 'node:crypto';

function verify(req, rawBody, secret) {
  const header = req.headers['x-igregulator-signature'];
  if (!header) return false;
  const parts = Object.fromEntries(
    header.split(',').map((p) => p.split('=')),
  );
  const signed = `${parts.t}.${rawBody}`;
  const expected = crypto.createHmac('sha256', secret)
    .update(signed).digest('hex');

  // Header may have several v1= values — iterate and accept any.
  const candidates = header
    .split(',')
    .filter((p) => p.startsWith('v1='))
    .map((p) => p.slice(3));
  return candidates.some((candidate) =>
    candidate.length === expected.length &&
    crypto.timingSafeEqual(Buffer.from(candidate), Buffer.from(expected)),
  );
}
```

</TabItem>
<TabItem label="Python">

```python
import hmac, hashlib

def verify(headers, raw_body: bytes, secret: str) -> bool:
    header = headers.get('x-igregulator-signature')
    if not header:
        return False
    parts = dict(p.split('=', 1) for p in header.split(',') if '=' in p)
    if 't' not in parts:
        return False
    # Sign the raw bytes as received; never decode and re-encode the body.
    signed = parts['t'].encode() + b'.' + raw_body
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    candidates = [p[3:] for p in header.split(',') if p.startswith('v1=')]
    return any(hmac.compare_digest(c.encode(), expected.encode()) for c in candidates)
```

</TabItem>
<TabItem label="curl + jq">

```bash
# Debug verification from a saved request. Pass raw body on stdin.
SIG_HEADER="t=1776717845,v1=abcd...ef01"
SECRET="whsec_..."
T=$(echo "$SIG_HEADER" | tr ',' '\n' | awk -F= '/^t/{print $2}')
EXPECTED=$(printf '%s.%s' "$T" "$(cat)" \
  | openssl dgst -sha256 -hmac "$SECRET" -hex | awk '{print $2}')
echo "$SIG_HEADER" | tr ',' '\n' | grep "^v1=" | awk -F= '{print $2}' \
  | grep -Fqx "$EXPECTED" && echo "ok" || echo "mismatch"
```

</TabItem>
</Tabs>

<Aside type="caution" title="Replay protection">
Reject the delivery if `|now - X-iGregulator-Timestamp|` exceeds
300 seconds — an attacker who captured a delivery can't re-send it
an hour later. Our retries never push timestamps past this window
(the worker re-signs on each attempt with the current time).
</Aside>
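
The freshness check is a single comparison. A sketch of it, with `is_fresh` as a hypothetical helper name; it takes the timestamp as a string, so you can pass either the `X-iGregulator-Timestamp` value or the `t` field from the signature header (the latter is the value covered by the HMAC):

```python
TOLERANCE_SECONDS = 300  # documented replay window


def is_fresh(timestamp: str, now: float, tolerance: int = TOLERANCE_SECONDS) -> bool:
    """True when the delivery timestamp is within `tolerance` seconds of `now`."""
    try:
        sent_at = int(timestamp)
    except (TypeError, ValueError):
        return False  # missing or malformed timestamp: reject
    return abs(now - sent_at) <= tolerance


print(is_fresh("1776717845", now=1776717900.0))  # True
```

Run this after the signature check passes, so an attacker cannot bypass it by editing an unsigned value.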

## 5. Retry policy

Failed deliveries retry on a jittered schedule:

| Attempt | Delay after previous | With ± 20 % jitter |
| --- | --- | --- |
| 1 | — | fired immediately |
| 2 | 30 s | 24 – 36 s |
| 3 | 2 m | 1:36 – 2:24 m |
| 4 | 10 m | 8 – 12 m |
| 5 | 1 h | 48 – 72 m |
| 6 | 6 h | 4.8 – 7.2 h |
| 7 | 24 h | 19.2 – 28.8 h |
| — | abandon after 7 total attempts | |
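
The jitter column is each base delay scaled by 0.8 and 1.2. The sketch below recomputes the windows and the nominal time from first attempt to abandonment; the constant and function names are illustrative.

```python
from typing import Tuple

BASE_DELAYS_SECONDS = [30, 120, 600, 3600, 21600, 86400]  # before attempts 2..7
JITTER = 0.20


def jitter_window(base: float, jitter: float = JITTER) -> Tuple[float, float]:
    """Lowest and highest delay once +/- jitter is applied to `base`."""
    return (base * (1 - jitter), base * (1 + jitter))


for attempt, base in enumerate(BASE_DELAYS_SECONDS, start=2):
    low, high = jitter_window(base)
    print(f"attempt {attempt}: {round(low)}s to {round(high)}s")

nominal_hours = sum(BASE_DELAYS_SECONDS) / 3600
print(f"nominal span before abandon: {nominal_hours:.1f} h")
```

The nominal delays add up to 112,350 seconds, a little over 31 hours, so a receiver that is down for longer than that can miss deliveries entirely.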

Failure = any non-2xx response or network error (DNS, timeout, TLS,
connection reset). **3xx redirects are treated as failures on
purpose** — point your URL at the final destination. Following them
silently would let an attacker redirect deliveries to an internal
metadata service after the URL passed creation-time checks. Common
gotcha: API gateways that issue a transparent `https://` upgrade
on `http://` URLs — register the `https://` form directly to
avoid the redirect.

Timeout per attempt: **10 seconds**. Long-running receivers should
ack fast and process async (return 2xx immediately, queue the body
for your worker).

## 6. Delivery guarantees

- **At-least-once.** A network blip may have us deliver the same
  event twice. Dedupe on `event_id`.
- **No ordering.** Events from different operators run independently;
  even events on the same operator can arrive out of order during
  a retry burst. Use `data.*_at` timestamps inside the payload to
  sequence consumer-side state.
- **Scraper outages queue.** If a jurisdiction's scraper stalls,
  detected changes queue up and fire on the next successful run
  — no lost events, just a delayed batch.
- **Retention:** 30 days for both `webhook_events` (replay window)
  and `webhook_deliveries` (delivery history). Fetch deliveries
  via `GET /v1/webhooks/:id/deliveries` while they're still in
  the window; older rows are pruned daily.
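
Because ordering is not guaranteed, a consumer can compare the payload timestamp with what it already stored and ignore anything older. A minimal in-memory sketch for `license.status_changed`; the `state` dict stands in for your own store, and the function names are hypothetical:

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse the ISO-8601 timestamps used in payloads (trailing Z means UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def apply_status_change(state: dict, event: dict) -> bool:
    """Apply the event unless an equal or newer change is already stored."""
    data = event["data"]
    key = data["license_id"]
    current = state.get(key)
    if current and parse_ts(current["changed_at"]) >= parse_ts(data["changed_at"]):
        return False  # stale or duplicate: keep what we have
    state[key] = {"status": data["new_status"], "changed_at": data["changed_at"]}
    return True
```

This complements `event_id` dedupe: dedupe drops exact repeats, while the timestamp comparison stops a late-arriving older event from overwriting newer state.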

## 7. Integration patterns

### Pattern A — Webhooks primary

The simplest setup. Create an endpoint, subscribe to events,
process them in real time.

```js
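// express.raw() keeps the body as a Buffer so the HMAC sees the exact bytes.
app.post('/igregulator-webhook', express.raw({ type: 'application/json' }), (req, res) => {
  const rawBody = req.body.toString('utf8');
  if (!verify(req, rawBody, process.env.WEBHOOK_SECRET)) {
    return res.status(400).send('bad signature');
  }
  res.sendStatus(200); // ACK fast, process async
  void queueForProcessing(JSON.parse(rawBody));
});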
```

### Pattern B — Polling fallback

For agents that can't accept inbound webhooks (locked-down corporate
network, local dev), use `GET /v1/watchlist/events` — see
[watchlist docs](/docs/watchlist/). Bootstrap with `since=<ISO>`,
then switch to cursor pagination.

### Pattern C — Hybrid webhook + polling

Run both. Webhooks are the low-latency primary; polling covers
the few-hour window where your receiver was down and deliveries
might abandon. Because the same `event_id` ships on both channels,
dedupe on it and there's no double-processing.

<Tabs>
<TabItem label="Node.js">

```js
import Redis from 'ioredis';
const redis = new Redis(process.env.REDIS_URL);
const DEDUPE_TTL = 30 * 24 * 60 * 60; // match the 30-day event retention window

async function processOnce(event) {
  // SET NX returns null if the key already exists.
  const set = await redis.set(
    `event_seen:${event.event_id}`, '1',
    'EX', DEDUPE_TTL, 'NX',
  );
  if (set === null) return; // already processed
  await handleEvent(event); // your business logic
}

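// Webhook handler. express.raw() keeps the body as a Buffer so the HMAC sees the exact bytes.
app.post('/webhook', express.raw({ type: 'application/json' }), async (req, res) => {
  const rawBody = req.body.toString('utf8');
  if (!verify(req, rawBody, process.env.WEBHOOK_SECRET)) return res.sendStatus(400);
  res.sendStatus(200);
  await processOnce(JSON.parse(rawBody));
});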

// Hourly polling fallback:
async function pollBackfill() {
  let cursor = await redis.get('watchlist:cursor');
  while (true) {
    const url = new URL('https://api.igregulator.io/v1/watchlist/events');
    if (cursor) url.searchParams.set('cursor', cursor);
    else url.searchParams.set('since', new Date(Date.now() - 3600_000).toISOString());
    url.searchParams.set('limit', '100');
    const r = await fetch(url, { headers: { Authorization: `Bearer ${apiKey}` } });
    const { events, next_cursor, has_more } = await r.json();
    for (const event of events) await processOnce(event);
    if (next_cursor) {
      cursor = next_cursor; // advance, otherwise the next pass refetches the same page
      await redis.set('watchlist:cursor', cursor);
    }
    if (!has_more || !next_cursor) break;
  }
}
```

</TabItem>
<TabItem label="Python">

```python
import os, redis, time, requests

r = redis.Redis.from_url(os.environ['REDIS_URL'])
DEDUPE_TTL = 30 * 24 * 3600  # match the 30-day event retention window

def process_once(event):
    # SET NX returns None if key already exists.
    if r.set(f"event_seen:{event['event_id']}", '1',
             ex=DEDUPE_TTL, nx=True) is None:
        return  # already processed
    handle_event(event)  # your business logic

# Webhook handler (Flask):
@app.post('/webhook')
def webhook():
    if not verify(request.headers, request.get_data(), SECRET):
        abort(400)
    event = request.get_json()
    # ACK fast, process async in a worker queue
    spawn(process_once, event)
    return '', 200

# Hourly backfill:
def poll_backfill():
    cursor = r.get('watchlist:cursor')
    while True:
        params = {'limit': 100}
        if cursor: params['cursor'] = cursor
        else: params['since'] = time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime(time.time() - 3600))
        resp = requests.get(
            'https://api.igregulator.io/v1/watchlist/events',
            headers={'Authorization': f'Bearer {API_KEY}'},
            params=params, timeout=10,
        ).json()
        for event in resp['events']: process_once(event)
        cursor = resp.get('next_cursor')  # advance, or the loop refetches the same page
        if cursor:
            r.set('watchlist:cursor', cursor)
        if not resp.get('has_more') or not cursor: break
```

</TabItem>
</Tabs>

## 8. Testing

- **Dashboard Test button** — fires a synthetic `test.ping` event
  at your endpoint. Does NOT create delivery history rows. Use
  during wiring to confirm the signature path.
- **webhook.site** — paste your URL there, hit Test, inspect
  headers + body. Fastest way to see what a delivery looks like
  before your server exists.
- **ngrok / cloudflared** — tunnel a local dev server to a public
  URL. `http://localhost` is blocked by our SSRF filter at
  creation time; a tunnel gives you a real routable host.

## 9. Secret rotation

Rotation overlap is 7 days. The rotation flow:

1. Click **Rotate** on the endpoint in the dashboard.
2. We issue a new secret and stamp the previous one with
   `expires_at = NOW() + 7 days`.
3. Deliveries sign with **both** secrets during the overlap —
   every delivery carries `v1=<hex>,v1=<hex>` in the signature
   header.
4. Update your server to accept the new secret. Existing code that
   uses the old one keeps verifying until day 7.
5. After day 7, the old secret expires and deliveries sign only
   with the new one.
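
On the receiving side, the overlap is easiest to handle by holding a list of secrets and accepting a delivery when any `v1` value matches any of them. A sketch under the signing scheme described in §4 (`sign` and `verify_any` are hypothetical helpers, and the secret strings in the usage line are placeholders):

```python
import hashlib
import hmac
from typing import Iterable


def sign(secret: str, t: str, raw_body: bytes) -> str:
    """HMAC-SHA256 hex digest over the timestamp, a dot, and the raw body."""
    return hmac.new(secret.encode(), t.encode() + b"." + raw_body,
                    hashlib.sha256).hexdigest()


def verify_any(header: str, raw_body: bytes, secrets: Iterable[str]) -> bool:
    """Accept if any v1 value in the header matches any configured secret."""
    parts = [p.split("=", 1) for p in header.split(",") if "=" in p]
    t = next((v for k, v in parts if k == "t"), None)
    if t is None:
        return False
    candidates = [v for k, v in parts if k == "v1"]
    return any(
        hmac.compare_digest(candidate.encode(), sign(secret, t, raw_body).encode())
        for secret in secrets
        for candidate in candidates
    )


body = b'{"event":"test.ping"}'
header = "t=1776717845,v1=" + sign("old-secret", "1776717845", body)
print(verify_any(header, body, ["new-secret", "old-secret"]))  # True
```

Deploy with both secrets configured, then drop the old one once the overlap has ended.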

<Aside type="caution" title="Silent fail after rotation">
If your server isn't updated to accept the new secret before the
old one expires (day 7), every delivery after that fails signature
verification on your side. Nothing looks wrong from ours: the request
still reaches your server, and we can't tell that your verification is
using a stale secret. If your receiver answers a bad signature with a
non-2xx (as the handlers in §7 do), the dashboard's Deliveries modal
shows a rising failure rate; it's the canary. Set an alert on
`webhook.endpoint_degraded` (self-notification to your other
healthy endpoints) for a second layer.
</Aside>

## 10. Errors at creation time

- `400 invalid_webhook_url` + `details.reason: private_ip_blocked` —
  URL resolved to a private / loopback / link-local IP
  (including `169.254.169.254`, AWS + GCP metadata). Use a public
  host or a tunnel.
- `400 invalid_webhook_url` + `details.reason: invalid_scheme` —
  Only `http://` and `https://` accepted; https strongly preferred.
- `400 invalid_query` + `details.reason: invalid_event_type` —
  You passed an unrecognised event name. Compare against the list
  in §1.
- `403 quota_exceeded` — You hit your plan's
  `max_webhook_endpoints`. Pause or delete an endpoint, or upgrade.

## 11. Best practices

- **Respond 2xx in < 5 seconds, process async.** We time out at
  10 s; if your receiver regularly takes 5+ s you'll start hitting
  retries.
- **Verify every delivery.** Skip only for `test.ping` if you
  treat test events as connectivity checks and not real data.
- **Dedupe on `event_id`.** Always. At-least-once delivery means
  duplicates will happen eventually.
- **Don't assume ordering.** Use `data.*_at` timestamps.
- **Keep a polling fallback for critical paths** — see Pattern C.
- **Alert on `webhook.endpoint_degraded`.** If one of your
  endpoints is failing, we notify your OTHER endpoints so you
  don't have to hear about it from a missing downstream action.

---

# File: docs/webhooks/quickstart.mdx

---
title: Webhooks quickstart
description: First webhook delivery in two minutes using webhook.site — no server required.
template: doc
sidebar:
  order: 1
---

This is "get it working in two minutes": no signature verification
yet, no production-grade receiver. Head to
[/docs/webhooks](/docs/webhooks/) when you're ready to harden.

## 1. Set up a receiver

Open [webhook.site](https://webhook.site). It hands you a unique
URL like `https://webhook.site/<random-uuid>`. Copy the URL;
that's your temporary endpoint. Leave the page open; deliveries
show up in real time.

## 2. Create the webhook

1. Sign in to the
   [iGregulator dashboard](https://app.igregulator.io/webhooks).
2. Click **+ create webhook**.
3. Paste the webhook.site URL.
4. Subscribe to any event type for now — `license.status_changed`
   is the most common; you can change later.
5. Submit. You'll see a reveal dialog with a
   `whsec_<base64url>` secret — **copy it** (you'll need it when
   you harden later) and click "copy + close".

## 3. Fire a test delivery

Back in the dashboard, click **test** on your new endpoint row.
A result dialog pops up showing:

- `delivered: true`
- HTTP status your endpoint returned
- Latency in ms
- Response body

Switch to the webhook.site tab: the request is there, complete
with headers, including:

```
X-iGregulator-Event: test.ping
X-iGregulator-Event-Id: evt_test_...
X-iGregulator-Signature: t=...,v1=...
```

That's a real production-shape delivery — just flagged
`livemode: false` in the envelope so you don't treat it as
business data.

## 4. Harden for production

- **Verify signatures.** See
  [/docs/webhooks § signature verification](/docs/webhooks/#4-signature-verification) —
  includes Node / Python / curl examples.
- **Replace webhook.site.** Stand up a real HTTP server; the same
  envelope + headers will land there.
- **Dedupe on `event_id`.** At-least-once delivery means duplicates
  during retries. See [Pattern C](/docs/webhooks/#7-integration-patterns).
- **Understand retries.** 7 attempts, jittered backoff — start at
  30 s, end at ~24 h. Details in
  [§5 retry policy](/docs/webhooks/#5-retry-policy).

Main reference: [/docs/webhooks](/docs/webhooks/).

---

# File: docs/watchlist.mdx

---
title: Watchlist
description: Track specific operators for automated alerts on licence changes, expiries, and regulatory actions. Webhook push or polling fallback.
template: doc
sidebar:
  order: 12
---

import { Aside } from '@astrojs/starlight/components';

A watchlist is your list of operators you care about, plus the
automation layer that fires events when any of them changes. Add
an operator once, get alerted whenever its licence status flips,
a regulatory action lands, or an expiry date approaches — without
writing polling loops against every endpoint we offer.

## 1. Overview

Plan limits (also in [pricing](/pricing)):

| Tier | Watchlist cap | Webhooks | Polling |
| --- | --- | --- | --- |
| Starter | 25 operators | 1 endpoint | 10 / hour |
| Pro | 250 operators | 5 endpoints | 60 / hour |
| Business | unlimited | 20 endpoints | 600 / hour |
| Enterprise | unlimited | unlimited | unlimited |

Webhooks are the primary alert channel. Polling exists for agents
that can't accept inbound HTTP (corporate networks, air-gapped
analytics, local dev) and as a backfill mechanism during webhook
outages.

## 2. Managing the watchlist

### Dashboard

Sign in and open
[app.igregulator.io/watchlist](https://app.igregulator.io/watchlist).
Type an operator name — we typeahead against every operator
slug we know about. Click to add. Click remove to drop.

### API

Same surface via bearer token:

```bash
# Current watchlist + count + plan cap
curl -H "Authorization: Bearer igk_..." \
  https://api.igregulator.io/v1/watchlist

# Add an operator by slug (lowercase, hyphens)
curl -X POST -H "Authorization: Bearer igk_..." \
  -H "Content-Type: application/json" \
  -d '{"operator_slug":"888-uk-limited"}' \
  https://api.igregulator.io/v1/watchlist/operators

# Remove (idempotent)
curl -X DELETE -H "Authorization: Bearer igk_..." \
  https://api.igregulator.io/v1/watchlist/operators/888-uk-limited

# Paginated listing with current licence status per operator
curl -H "Authorization: Bearer igk_..." \
  "https://api.igregulator.io/v1/watchlist/operators?limit=50&offset=0"
```

Discover slugs with `GET /v1/operators/search?q=<name>`.

## 3. Receiving events

### Webhook push (primary)

Create an endpoint with the `watchlist_only: true` flag (default).
Deliveries fire for operators in your watchlist only — no
firehose, no noise. Events covered:

- `license.status_changed`
- `license.expiring_30d` / `_60d` / `_90d`
- `license.expired`
- `license.issued` (only if you were watching the operator when
  the new licence was detected)
- `regulatory_action.added`

See [/docs/webhooks](/docs/webhooks/) for the signing + retry
protocol. Quickstart in 2 minutes:
[/docs/webhooks/quickstart](/docs/webhooks/quickstart/).

### Polling (fallback)

`GET /v1/watchlist/events` returns the same envelope events,
pulled. Cursor-paginated so you don't re-process events between
runs. Rate-limited per plan (see §1). Every response carries
`X-Poll-RateLimit-Limit`, `-Remaining`, `-Reset`, `-Window`, and a
`X-Poll-Recommended-Interval` hint in seconds.

```bash
# Bootstrap: 30-day window (matches event retention)
curl -H "Authorization: Bearer igk_..." \
  "https://api.igregulator.io/v1/watchlist/events?since=2026-04-01T00:00:00Z&limit=100"

# Steady state: use the next_cursor from the previous response
curl -H "Authorization: Bearer igk_..." \
  "https://api.igregulator.io/v1/watchlist/events?cursor=eyJ0cy...&limit=100"
```

Response:

```json
{
  "events": [
    {
      "event": "license.status_changed",
      "event_id": "evt_...",
      "api_version": "2026-04-20",
      "timestamp": "...",
      "livemode": true,
      "data": { ... }
    }
  ],
  "next_cursor": "eyJ0c...",
  "has_more": true
}
```

`events` payloads are **identical** to the webhook envelope
(minus the signature — polling authenticates via your API key,
not per-delivery HMAC). Dedupe on `event_id` whether you're
receiving via webhook or poll; you'll use the same key for both.
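
The shared dedupe path can be sketched as below. This is a minimal illustration, not a reference client: `process_page`, `seen_ids`, and `handle` are hypothetical names, and the in-memory set stands in for whatever durable store (a database table with a unique key on `event_id`, for instance) you would use in production.

```python
from typing import Any, Callable, Dict, Optional, Set


def process_page(
    page: Dict[str, Any],
    seen_ids: Set[str],
    handle: Callable[[Dict[str, Any]], None],
) -> Optional[str]:
    """Handle each unseen event once and return the cursor to persist."""
    for event in page.get("events", []):
        event_id = event["event_id"]
        if event_id in seen_ids:
            continue  # duplicate from a retry, a replay, or the other channel
        handle(event)
        seen_ids.add(event_id)  # mark as seen only after the handler succeeded
    return page.get("next_cursor")
```

A webhook receiver can call the same `handle` behind the same `seen_ids` check, which is what makes running both channels side by side safe.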

### Polling best practices

- **Persist the cursor.** Save `next_cursor` after each successful
  batch to your own DB / disk. On restart, resume from it.
- **Dedupe on `event_id`.** Crash windows can cause you to
  re-process the last batch; same dedupe path you'd use for
  webhooks covers this.
- **Respect `X-Poll-Recommended-Interval`.** It's `ceil(3600 / limit)` —
  sleeping that long between polls guarantees you never hit the
  hour ceiling. Starter = 360 s, Pro = 60 s, Business = 6 s.
- **On 429, wait until `reset_at`.** Don't exponential-backoff —
  the window resets deterministically on the hour boundary. See
  [/docs/rate-limits](/docs/rate-limits/).
- **Hybrid webhooks + polling is the resilient pattern.** Webhooks
  for low latency, polling for the few-hour windows where your
  receiver was down and deliveries abandoned. Both ship the same
  `event_id`, so dedupe makes double-processing a no-op. Example
  in [/docs/webhooks § Pattern C](/docs/webhooks/#7-integration-patterns).
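
A rough sketch of the first and third practices, assuming the cursor lives in a local JSON file. The helper names and the file layout are illustrative only; `headers` is expected to be whatever mapping your HTTP client returns (most clients make it case-insensitive, a plain `dict` is not).

```python
import json
import math
import os
import tempfile
from typing import Mapping, Optional


def recommended_interval(polls_per_hour: int) -> int:
    """ceil(3600 / limit), the formula quoted above."""
    return math.ceil(3600 / polls_per_hour)


def poll_interval(headers: Mapping[str, str], polls_per_hour: int) -> int:
    """Prefer the server's hint; fall back to the local formula."""
    hint = headers.get("X-Poll-Recommended-Interval")
    if hint is not None and hint.strip().isdigit():
        return int(hint)
    return recommended_interval(polls_per_hour)


def save_cursor(path: str, cursor: str) -> None:
    """Write to a temp file, then rename, so a crash never leaves a torn file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump({"cursor": cursor}, fh)
    os.replace(tmp, path)


def load_cursor(path: str) -> Optional[str]:
    """Return the saved cursor, or None on first run."""
    try:
        with open(path) as fh:
            return json.load(fh)["cursor"]
    except FileNotFoundError:
        return None
```

Saving the cursor only after a batch has been handled means a crash replays at most one batch, which the `event_id` dedupe absorbs.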

### `since=` is capped at 30 days

That matches our event retention. Older values fail with
`400 since_exceeds_retention_window`. Rare in practice — the
first call uses `since`, every subsequent call uses the cursor.
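
If a poller may restart after a long outage, it can help to clamp `since` before the first call. The sketch below assumes the 30-day cap is measured from the time of the request; the five-minute safety margin and the function name are illustrative choices, not documented behaviour.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=30)
SAFETY_MARGIN = timedelta(minutes=5)  # assumption: headroom for clock skew


def bootstrap_since(last_seen: Optional[datetime],
                    now: Optional[datetime] = None) -> str:
    """Return a `since` value that stays inside the retention window.

    `last_seen` must be timezone-aware (or None on first run).
    """
    now = now or datetime.now(timezone.utc)
    earliest = now - RETENTION + SAFETY_MARGIN
    since = earliest if last_seen is None else max(last_seen, earliest)
    return since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

When the clamp kicks in, events older than the window are gone from the feed, so treat that case as a gap to reconcile against current operator state rather than as a clean catch-up.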

## 4. Plan limits in detail

Once you reach your watchlist cap, the next
`POST /v1/watchlist/operators` returns `403 watchlist_quota_exceeded`
with the current count + cap. Remove an operator or upgrade.

Polling ceiling hits return `429 watchlist_events_poll_limit` with
a concrete `reset_at` timestamp — not exponential backoff,
deterministic hour-boundary reset. Webhooks do not count against
this limit.
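
A small sketch of the "wait until `reset_at`" rule. It assumes `reset_at` is an ISO-8601 timestamp and that you have already pulled it out of the 429 response; where exactly it sits in the body is not shown here, so the function takes the string directly.

```python
from datetime import datetime, timezone
from typing import Optional


def seconds_until_reset(reset_at: str, now: Optional[datetime] = None) -> float:
    """How long to sleep before the next poll; never negative."""
    now = now or datetime.now(timezone.utc)
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalise it.
    reset = datetime.fromisoformat(reset_at.replace("Z", "+00:00"))
    if reset.tzinfo is None:
        reset = reset.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return max(0.0, (reset - now).total_seconds())
```

Call `time.sleep(seconds_until_reset(...))` once and resume from the saved cursor; there is no need for a retry loop with growing delays.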

## 5. What events don't fire for watched operators

- **`coverage.degraded` / `coverage.restored`** — these are
  jurisdiction-level, not operator-level. They're emitted
  regardless of watchlist membership; anyone subscribed to the
  event type gets them.
- **`webhook.endpoint_degraded`** — self-notification about your
  own endpoints. Ignores watchlist.

## 6. Troubleshooting

- **No events arriving** — your watchlist may be empty, or the
  operators you track haven't had any changes this month. Run
  `GET /v1/watchlist/events` with `since` set up to 30 days back
  (for example `since=2026-04-01T00:00:00Z`) to list what fired in
  that window.
- **Too many events** — too-broad subscription. Each of the three
  expiry windows fires separately; narrow to the one(s) you act
  on.
- **Events for operators I don't watch** — your webhook might
  have `watchlist_only: false`. Set it back with `PATCH /v1/webhooks/:id`
  and a body of `{ "watchlist_only": true }`.

Related: [/docs/webhooks](/docs/webhooks/),
[/docs/rate-limits](/docs/rate-limits/).

---

# File: docs/for-ai-agents.mdx

---
title: For AI agents
description: Machine-readable resources, structured errors, MCP server, and integration patterns for LLM-powered integrations.
template: doc
sidebar:
  order: 10
---

Building an AI agent, LLM-powered integration, or automated compliance
tooling? Everything you need to wire iGregulator into a language-model
workflow lives here.

## Quick discovery

Three machine-readable resources document the entire product. One
fetch each — no crawling required.

- **[llms.txt](/llms.txt)** — structured index per the
  [llmstxt.org spec](https://llmstxt.org). ~3 KB, complete product
  map with links into the longer docs.
- **[llms-full.txt](/llms-full.txt)** — every `/docs/*` page
  concatenated into one file. ~37 KB. One fetch buys full context.
- **[OpenAPI 3.1 spec](https://api.igregulator.io/openapi.json)** —
  authoritative API schema. Every endpoint, request/response shape,
  auth scheme, rate-limit description.

Every page on this site also emits `<link rel="alternate">` pointers
to all three URLs so a crawler hitting the landing can auto-discover
them. The homepage additionally returns `Link` response headers
(RFC 8288 / RFC 9727) pointing at the catalog, OpenAPI spec, docs, and
server card — so headless agents find them without parsing HTML.

### Well-known discovery endpoints

Standards-based entrypoints under `/.well-known/` (and the apex), for
agents that look there first:

- **[/.well-known/mcp/server-card.json](/.well-known/mcp/server-card.json)**
  — MCP Server Card (SEP-1649): server info, transport endpoint, and the
  full tool list. (Legacy [/.well-known/mcp.json](/.well-known/mcp.json)
  is still served too.)
- **[/.well-known/api-catalog](/.well-known/api-catalog)** — API catalog
  (RFC 9727) linking the OpenAPI spec, docs, `llms.txt`, and the
  `/v1/health` status endpoint.
- **[/.well-known/agent-skills/index.json](/.well-known/agent-skills/index.json)**
  — Agent Skills discovery index (v0.2.0). Currently ships a
  `verify-gambling-license` skill (digest-pinned `SKILL.md`).
- **[/auth.md](/auth.md)** — how to authenticate: bearer API keys (we run
  no OAuth server, so there are deliberately no `/.well-known/oauth-*`
  documents).

Content usage is declared via `Content-Signal` in
[robots.txt](/robots.txt) — `search`, `ai-input`, and `ai-train` are all
permitted.

## Agent-friendly features

Deliberate design choices that make integration cleaner for agents.

### Structured error details

Every error response carries `details.reason` + `details.suggestion`.
Agents branch on `reason` instead of free-form text and surface
`suggestion` to the user / caller as-is.

```json
{
  "error": "domain is not a valid hostname",
  "code": "invalid_query",
  "details": {
    "field": "domain",
    "reason": "not_a_valid_hostname",
    "suggestion": "Pass a bare hostname — no scheme, no path, no underscores. Example: 'paddypower.com' or 'www.bet365.com'."
  }
}
```

The full `reason` vocabulary is stable; see
[error handling](/docs/errors/) for the code + reason matrix.
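
Branching on `reason` might look like the sketch below. The two reasons in the table are ones that appear in these docs; the action labels (`fix_input`, `upgrade_plan`, `escalate`) are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, Tuple

# Map stable `details.reason` values to what the agent should do next.
REASON_ACTIONS = {
    "not_a_valid_hostname": "fix_input",
    "endpoint_requires_paid_plan": "upgrade_plan",
}


def triage_error(body: Dict[str, Any]) -> Tuple[str, str]:
    """Return (action, message to surface) for an error response body."""
    details = body.get("details") or {}
    action = REASON_ACTIONS.get(details.get("reason"), "escalate")
    # Surface the suggestion as-is; fall back to the free-form error text.
    message = details.get("suggestion") or body.get("error", "")
    return action, message
```

Unknown reasons fall through to `escalate` rather than raising, so a newly added reason degrades to a human hand-off instead of a crash.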

### Response `_meta` field

Data-returning endpoints include a `_meta` envelope with provenance:

- `scraped_at` — ISO-8601 timestamp when we last pulled this record.
- `source_url` — exact regulator-register URL, or null if the row
  doesn't map to one URL.
- `confidence_hint` — `authoritative` (direct register dump),
  `scraped` (HTML / PDF), or `derived` (fuzzy match, not a direct
  lookup).
- `source_modified_at` — regulator-side modification timestamp if
  exposed (UKGC only), else null.

An agent can say "verified via UKGC official register, scraped 6 h
ago" with a real evidence trail, not a gloss.
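
One way to build that statement from the envelope, as a sketch. It assumes you pass the `_meta` object itself plus a regulator label of your choosing; the output wording is illustrative.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def provenance_line(meta: Dict[str, Any], regulator: str,
                    now: Optional[datetime] = None) -> str:
    """Turn a `_meta` envelope into a one-line evidence statement."""
    now = now or datetime.now(timezone.utc)
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalise it.
    scraped = datetime.fromisoformat(meta["scraped_at"].replace("Z", "+00:00"))
    if scraped.tzinfo is None:
        scraped = scraped.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    age_hours = int((now - scraped).total_seconds() // 3600)
    hint = meta.get("confidence_hint", "unknown")
    source = meta.get("source_url") or "no single source URL"
    return f"{regulator} register ({hint}), scraped {age_hours} h ago, source: {source}"
```

Keeping `confidence_hint` in the sentence lets a reader tell a direct register dump from a fuzzy `derived` match at a glance.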

### `/v1/health/coverage`

Public endpoint exposing per-jurisdiction scraper freshness.
`status: healthy` vs `degraded`, `age_hours`, `record_count` per
regulator. See [endpoints](/docs/endpoints/). Useful for SLA
dashboards and status pages.

### Stable `operationId`s

Every endpoint has a stable `operationId` (`checkDomain`,
`searchOperators`, `getOperator`, `listJurisdictions`, `getLicense`,
`getLicenseHistory`, `checkCoverage`). SDK generators and MCP
servers use these as function names — no `getV1CheckDomain` slug
noise.

### Dual rate-limit headers

Both custom and IETF-draft standard formats ship on every response.
Parse either, both authoritative:

```
X-RateLimit-Policy: tier=public;limit=10;window=hour
RateLimit-Policy: "default";q=10;w=3600
```

The IETF draft (`draft-ietf-httpapi-ratelimit-headers`) is what
Cloudflare, Kong, and similar gateways auto-parse; the custom format
is human-readable for logs.
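
For clients without a gateway in front, both header values shown above can be parsed by hand. This is a deliberately small sketch: it handles a single policy per header and does not implement the full structured-fields grammar the IETF draft builds on.

```python
from typing import Dict, Union


def parse_custom_policy(value: str) -> Dict[str, str]:
    """Parse the custom format, e.g. 'tier=public;limit=10;window=hour'."""
    pairs = (part.split("=", 1) for part in value.split(";") if "=" in part)
    return {key.strip(): val.strip() for key, val in pairs}


def parse_ietf_policy(value: str) -> Dict[str, Union[str, int]]:
    """Parse a single IETF-draft policy item, e.g. '"default";q=10;w=3600'."""
    name, *params = [part.strip() for part in value.split(";")]
    parsed: Dict[str, Union[str, int]] = {"name": name.strip('"')}
    for param in params:
        key, _, raw = param.partition("=")
        parsed[key] = int(raw) if raw.isdigit() else raw
    return parsed
```

In the IETF form, `q` is the quota and `w` the window in seconds, so the example above reads as 10 requests per 3600 s, matching `limit=10;window=hour`.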

### Deprecation lifecycle

Breaking changes are announced via standard headers (example shape):

```
Deprecation: true
Sunset: <HTTP-date, per RFC 8594>
Link: </docs/api#check>; rel="deprecation"; type="text/html"
```

Per RFC 9745 (`Deprecation`) and RFC 8594 (`Sunset`). Minimum 90 days' notice. An agent caching
request shapes can inspect `Sunset` before assuming stability. No fields
are currently deprecated; the headers will reappear when the next
removal cycle starts.
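
A sketch of that inspection, assuming `Sunset` carries an HTTP-date (the format RFC 8594 defines) and that `headers` is the response-header mapping from your HTTP client. The function name is illustrative.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def days_until_sunset(headers: Mapping[str, str],
                      now: Optional[datetime] = None) -> Optional[int]:
    """Whole days left before removal, or None if nothing is deprecated."""
    sunset = headers.get("Sunset")
    if "Deprecation" not in headers or sunset is None:
        return None
    now = now or datetime.now(timezone.utc)
    removal = parsedate_to_datetime(sunset)
    if removal.tzinfo is None:
        removal = removal.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return (removal - now).days
```

An agent could refresh its cached request shapes whenever this returns a number, and treat a small or negative value as a signal to re-read the changelog before the next call.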

## MCP server

Live at [`mcp.igregulator.io`](https://mcp.igregulator.io). Streamable
HTTP transport (current MCP spec — SSE used for streaming responses).
Compatible with Claude Desktop, Cursor, Windsurf, Cline, and any other
client that speaks MCP.

Tools exposed (lean output — a compact verdict, not the full REST payload):

- `check_domain` — verify a licence by domain (supports `as_of`)
- `check_domain_batch` — up to 100 domains in one call (KYB sweep)
- `search_operators` — search the register by name
- `get_operator` — full operator detail (supports `as_of`)
- `get_operator_regulatory_actions` — enforcement history (fines, suspensions)
- `check_coverage` — data freshness per jurisdiction
- `list_jurisdictions` — all covered regulators
- `get_jurisdiction` — single regulator metadata
- `get_license` — single licence detail
- `get_license_history` — status-change timeline

On a no-match, `check_domain` returns `match_absence_reason` +
`checked_jurisdictions` — never collapse a miss into a bare "unlicensed".

Auth = same API keys as the direct HTTP API. Setup walkthrough +
example prompts at [/docs/mcp](/docs/mcp/). Discovery via the
[MCP Server Card](/.well-known/mcp/server-card.json) (SEP-1649) and the
legacy [/.well-known/mcp.json](/.well-known/mcp.json).
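
For clients that speak MCP directly rather than through a desktop app, a tool invocation is a JSON-RPC `tools/call` request. The sketch below only builds the message body; it leaves out the `initialize` handshake and the HTTP transport, and the `domain` argument name is an assumption, so check `tools/list` or the server card for the real input schema.

```python
import json
from typing import Any, Dict


def tool_call(name: str, arguments: Dict[str, Any],
              request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` message for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical arguments: the real schema comes from the server's tool list.
body = json.dumps(tool_call("check_domain", {"domain": "example.com"}))
```

In practice an MCP client library handles this framing for you; the shape is shown only to make clear what a tool name from the list above maps onto.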

## WebMCP (in-browser tools)

Distinct from the server above: the homepage registers
[WebMCP](https://webmcp.org/) tools via `navigator.modelContext`, so an
agent **driving a browser** can act without wiring up the HTTP/MCP
integration at all. Two read-only tools, backed by the public API
(10 req/IP/hour, no key):

- `check_gambling_license` — verify by domain or licence number.
- `search_gambling_operators` — search operators by name.

They're feature-detected, so they simply don't appear in browsers
without the WebMCP API. Use the server-side MCP for production
integrations; WebMCP is the zero-setup path for browser agents.

## Integration patterns

### Merchant onboarding verification

Payment-processor KYB flow:

1. User submits a merchant application with a domain.
2. Agent calls `GET /v1/check?domain=X`.
3. Branch on the response:
   - `confidence: high` + `status: active` → approve.
   - `confidence: medium` → flag for manual compliance review.
   - `confidence: low` → **manual review, not reject** — `low` fires
     when the domain root is a generic gambling label
     (`casino.com`, `poker.com`, etc.) and `/v1/check` refuses to
     guess which operator runs it. The site may well be licensed,
     just not identifiable from the label alone. Auto-rejection
     here blocks legitimate operators.
   - `match: null` → **branch on `match_absence_reason`, don't blanket-reject**:
     - `generic_term` → manual review (ambiguous label, see above).
     - `no_record_found` → not in any covered register. Scope the verdict to
       `checked_jurisdictions` ("not licensed in any of the 6 jurisdictions we
       cover") rather than an unqualified "unlicensed".
   - Any match with `status: revoked` / `suspended` / `expired` →
     reject regardless of confidence.
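
The branching above can be sketched as one function. It assumes `confidence` and `status` are read from `match`, and that `match_absence_reason` and `checked_jurisdictions` sit at the top level of the response; the verdict labels are hypothetical.

```python
from typing import Any, Dict, Tuple

BAD_STATUSES = {"revoked", "suspended", "expired"}


def onboarding_decision(response: Dict[str, Any]) -> Tuple[str, str]:
    """Return (verdict, explanation) for a /v1/check response."""
    match = response.get("match")
    if match is None:
        reason = response.get("match_absence_reason")
        if reason == "no_record_found":
            checked = ", ".join(response.get("checked_jurisdictions", []))
            return "not_found", f"not licensed in any covered register ({checked})"
        return "manual_review", f"no match: {reason}"
    status = match.get("status")
    confidence = match.get("confidence")
    if status in BAD_STATUSES:  # checked first: overrides any confidence level
        return "reject", f"licence status is {status}"
    if confidence == "high" and status == "active":
        return "approve", "high-confidence match with active licence"
    return "manual_review", f"confidence={confidence}, status={status}"
```

The status check runs before the confidence check so that a high-confidence match on a revoked licence can never fall into the approve branch.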

### Daily regulatory sweep

Portfolio-monitoring automation:

1. Agent keeps a list of N operator slugs in its CRM / knowledge base.
2. Daily cron: iterate the list, call `GET /v1/operators/:slug`.
3. Diff against previous day's snapshot. Detect status changes,
   regulatory actions, expiry windows.
4. Alert compliance team on anomalies.
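
Step 3 is a plain dictionary diff once each day's responses are reduced to a `{slug: status}` map. How you extract the status from the operator detail is left to the caller here, since that response shape is not reproduced on this page; the function name and message wording are illustrative.

```python
from typing import Dict, List


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """List human-readable changes between two {slug: status} snapshots."""
    changes = []
    for slug, status in sorted(current.items()):
        before = previous.get(slug)
        if before is None:
            changes.append(f"{slug}: first seen as {status}")
        elif before != status:
            changes.append(f"{slug}: {before} -> {status}")
    # Operators that vanished from today's run deserve a look too.
    for slug in sorted(set(previous) - set(current)):
        changes.append(f"{slug}: missing from today's sweep")
    return changes
```

An empty list means nothing to alert on; anything else goes to the compliance channel in step 4.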

### Domain reputation scoring

Risk-scoring for gambling-adjacent domains:

1. Agent receives an unknown gambling-related domain.
2. Calls `GET /v1/check?domain=X`.
3. For a confident match, fetches the operator's enforcement history from
   `GET /v1/operators/{slug}/regulatory-actions` and combines `confidence`
   + `match_type` + any regulatory actions into a composite risk score.
4. Score feeds the downstream decision (list / delist / require
   extra verification).

### Bulk KYB sweep (batch)

Auditing a whole merchant book or affiliate list in one shot:

1. Collect the domains (chunks of 100).
2. `POST /v1/check/batch` with `{ "domains": [...] }` — one tool-call per
   100 instead of one per domain.
3. Iterate `results`: each row carries `match` + `confidence` +
   `match_absence_reason` (same semantics as the single check); a malformed
   entry comes back with `error` and doesn't fail the batch.
4. `checked_jurisdictions` is returned once at the top — use it to scope
   every "not found" verdict. See [Batch domain check](/docs/batch/).
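
Steps 1 to 3 might be sketched as follows. The chunk size comes from the 100-domain limit above; the row labels returned by `classify_row` are hypothetical, and sending the request is left to your HTTP client.

```python
import json
from typing import Any, Dict, Iterator, Sequence

BATCH_LIMIT = 100  # maximum domains per POST /v1/check/batch


def batch_payloads(domains: Sequence[str]) -> Iterator[str]:
    """Yield one JSON request body per chunk of up to 100 domains."""
    for start in range(0, len(domains), BATCH_LIMIT):
        yield json.dumps({"domains": list(domains[start:start + BATCH_LIMIT])})


def classify_row(row: Dict[str, Any]) -> str:
    """Bucket one entry of `results` without failing on malformed rows."""
    if "error" in row:
        return "invalid_input"
    if row.get("match") is None:
        return row.get("match_absence_reason", "no_match")
    return "matched"
```

Rows that come back as `no_record_found` should be reported together with the batch-level `checked_jurisdictions`, as step 4 describes.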

### Retrospective transaction check (as_of)

"Was this merchant licensed at the time of the transaction?":

1. `GET /v1/check?domain=X&as_of=2026-03-01` (or `/v1/licenses/{id}?as_of=`).
2. Read the `as_of` object, and **honour `knowledge`**:
   - `observed` → `status_as_of` is the real status then; `established_by`
     shows when it was last confirmed relative to your date.
   - `before_tracking` → the date predates our observation window. Do **not**
     assert a status — tell the user we weren't watching before
     `tracking_since`. This is the difference between a defensible answer and
     a fabricated one. See [Point-in-time lookups](/docs/point-in-time/).
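
Honouring `knowledge` can be as simple as the sketch below. It takes the `as_of` object from the response and uses the field names mentioned above; the third value, `no_such_license`, is the one listed in the 1.8.0 changelog entry, and the output wording is illustrative.

```python
from typing import Any, Dict


def describe_as_of(as_of: Dict[str, Any]) -> str:
    """Produce an answer that never asserts more than was observed."""
    knowledge = as_of.get("knowledge")
    if knowledge == "observed":
        return (f"status was {as_of.get('status_as_of')} "
                f"(established by: {as_of.get('established_by')})")
    if knowledge == "before_tracking":
        return f"unknown: date predates tracking_since={as_of.get('tracking_since')}"
    if knowledge == "no_such_license":
        return "unknown: no such licence"
    return "unknown: unrecognised knowledge value"
```

Every branch other than `observed` returns an explicit "unknown", so a caller cannot accidentally read a status out of a response that does not contain one.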

## Start integrating

1. Fetch [llms-full.txt](/llms-full.txt) for complete docs context in
   a single request.
2. Review the [OpenAPI spec](https://api.igregulator.io/openapi.json).
3. Try endpoints interactively in the
   [playground](/docs/playground/).
4. [Create a free account](https://app.igregulator.io/signup) for an
   API key and MCP server access — free for founding members.

---

Questions, integration help, feedback — founder@igregulator.io.

---

# File: docs/changelog.mdx

---
title: Changelog
description: API-level changes, deprecations, breaking-change policy.
template: doc
sidebar:
  order: 99
---

API-level changes only. Internal refactors, scraper updates, and infra
changes don't appear here unless they surface in a response shape or
error behaviour.

## Versioning policy

All `/v1/*` endpoints are maintained **indefinitely**. We do not silently
retire versioned routes. Future major versions will be introduced under a
new path (`/v2/*`), with **v1 and v2 running in parallel for a minimum of
12 months** after v2 launch. Migration guidance lands in this changelog at
v2 launch.

## Breaking-change policy

- Breaking changes **always** land behind a new URL path (`/v2/...`) — we never break a stable endpoint's response shape in place.
- Additive changes (new fields, new endpoints, loosened validation) ship at any time without a version bump.
- Deprecation flow for individual fields inside a stable version:
  - `Deprecation: true` header on affected responses the day the field is marked.
  - `Sunset` header (RFC 8594) with the removal date, **minimum 90 days** out.
  - Changelog entry below with the migration guidance.
  - For agents: `Link: …; rel="deprecation"` surfaced alongside the header.

## 1.8.0 — 2026-05-29

All additive — no breaking changes.

- **Batch domain check.** New `POST /v1/check/batch` (authenticated) resolves up to 100 domains in one request; each result mirrors the single-check shape, `checked_jurisdictions` is returned once at the top, and a malformed hostname comes back as a per-row `error` without failing the batch. See [Batch domain check](/docs/batch/).
- **`match_absence_reason` + `checked_jurisdictions` on `/v1/check`.** When `match` is `null`, the response now says *why* — `generic_term` (ambiguous label) vs `no_record_found` (checked, not present) — and lists the exact registers checked, so a "not licensed" verdict is scoped to coverage, never absolute. Present only on a miss. See [Confidence scoring](/docs/confidence/).
- **Point-in-time lookups (`?as_of=`).** `/v1/check`, `/v1/licenses/{id}`, and `/v1/operators/{slug}` accept `?as_of=` to reconstruct a licence's status as of a past date from transition history — **strictly within our observation window** (`knowledge: observed | before_tracking | no_such_license`; never extrapolated before `tracking_since`; future dates `400`). See [Point-in-time lookups](/docs/point-in-time/).
- **`domains[].association` on operator detail.** `GET /v1/operators/{slug}` now returns `direct` / `white_label` per domain (the column was always populated; the field was missing from the response).
- **`/v1/check` domain matching now collapses www / apex / subdomain variants** via the Public Suffix List (registrable-domain fallback after exact-host), so `virginbet.com` and `www.virginbet.com` resolve to the same operator. Single-owner guard prevents over-matching shared white-label platform domains.
- **Rate-limit headers on authenticated responses.** `X-RateLimit-Limit` / `-Remaining` / `-Reset` + `X-Upgrade-URL` now accompany authenticated `/v1/*` responses, not just the public path.
- **OpenAPI completeness.** `GET /v1/operators/{slug}/regulatory-actions` and `GET /v1/health/coverage` are now in the spec; `ApiError.code` documents `payment_required` + `quota_exceeded`.
- **MCP server caught up to the API.** New tools `check_domain_batch`, `get_operator_regulatory_actions`, `check_coverage`; `check_domain` + `get_operator` accept `as_of`. Tool output is now **lean** (a compact verdict, not the full REST payload) — and `match_absence_reason` + `checked_jurisdictions` are preserved on a miss, so agents never collapse "not found" into "unlicensed". See [MCP server](/docs/mcp/).

## 1.7.0 — 2026-04-29

- **Tobique Gaming Commission (TGC) re-enabled — 6th jurisdiction live.** The Cloudflare Worker proxy at `igregulator-scraper-proxy.scvgr-agent.workers.dev` now bridges btc → thetgc.ca, bypassing the IP-reputation block that paused us in 1.6.1. Same scraper code, same daily 04:15 UTC slot — only the transport changed (HMAC-signed POST to the Worker, the Worker fetches the upstream CF→CF). Scraper opts in via `USE_PROXY=true`; other scrapers unaffected.
- **New `@igregulator/scraper-utils` package** carrying the reusable `proxyFetch` helper. Future jurisdictions whose upstream blocks our IP can opt in by setting two env vars (`USE_PROXY=true`, `PROXY_HMAC_SECRET=…`) and adding their hostname to the Worker's `ALLOWED_DOMAINS` list — no code change in the scraper itself.
- **Migration 0018** re-inserts the TGC jurisdiction row removed in 0017.

## 1.6.1 — 2026-04-28

- **Tobique Gaming Commission (TGC) ingestion deferred.** PR #119 shipped the scraper, but the upstream `thetgc.ca` blocks the production origin IP at the Cloudflare edge (HTTP 403 across all UAs and header shapes). Other hosts return 200; this is an IP-reputation block on the Hetzner range. We've surgically rolled the public surfaces back to 5 jurisdictions while the scraper code stays merged. Re-enables cleanly once we land a Cloudflare Worker proxy on `scvgr-agent.workers.dev`. Investigation + path-forward documented in [docs/scrapers/tobique-investigation.md](https://github.com/igregulator/igregulator/blob/main/docs/scrapers/tobique-investigation.md).

## 1.6.0 — 2026-04-28

- **Tobique Gaming Commission (TGC) added — 6th jurisdiction.** *(Reverted on the public surface — see 1.6.1. Scraper code stays merged for re-enable when the proxy lands.)* ~160 licences ingested daily from [thetgc.ca/license-holders/](https://thetgc.ca/license-holders/). License-type vocabulary `B2C` / `B2B`. License numbers are synthesised as `TGC/<TYPE>/<slug>` (TGC doesn't publish IDs, same convention as KH). Cron 04:15 UTC; regulatory tier shifted +15 min. The fuzzy + domain-exact match flow on `/v1/check` includes TGC automatically.
- **Domain coverage 0% out of the box for TGC.** Same upstream-doesn't-publish-websites bucket as Curaçao. Documented at [/docs/coverage-methodology](/docs/coverage-methodology/). Phase 4 WHOIS / Tranco enrichment will close this for both regulators in one pass.

## 1.5.0 — 2026-04-28

- **Trust-signals + positioning sweep.** New pages: [/about](https://igregulator.io/about), [/terms](https://igregulator.io/terms), [/privacy](https://igregulator.io/privacy). Footer reorganised — Company column now lists About / Changelog / Terms / Privacy. Stale "MCP server (soon)" replaced with a live link.
- **Legal disclaimer surfaced.** Now rendered as a callout at the top of [/docs](https://igregulator.io/docs/) and embedded in the OpenAPI spec's top-level `info.description` so agents reading the spec see it. Same wording: results are informational, customers responsible for their own compliance decisions.
- **Hero copy iteration — pain-driven.** "iGaming licensing intelligence API" → **"Verify gambling operator licenses before they cost you."** Sub-copy mentions the buyer profiles (compliance teams, payment providers, affiliate networks) explicitly. Meta description, og:description, llms.txt opening line aligned. No data or endpoint changes.

## 1.4.0 — 2026-04-28

- **MGA domain enrichment.** The MGA scraper now fans out from each B2C license to the legacy `authorisation.mga.org.mt/verification.aspx` page and pulls the **Website URL(s)** field. Runs daily in the same 03:15 UTC slot as the primary register pass, ~30 s wall time at p-limit 5, no separate cron entry. Recovers ~110 B2C operator domains from a previous baseline of zero. B2B / CRP licenses are skipped — the upstream verification page omits the Website URL section for non-consumer-facing license classes.
- **Coverage methodology update.** `/v1/health/coverage` now exposes `domain_coverage.{operators_with_domain, normative_operators, coverage_pct}` per jurisdiction. The denominator is operators where domain disclosure is normative for their license type — excludes B2B-only types (CSPA, B2B, Non-Remote, Ancillary, supplier permits, etc.) that don't have consumer-facing domains by design. Methodology change only; no data changes. **Affected metrics under the new denominator:**
  - **AN** — 98% of B2C operators have ≥1 domain
  - **CW** — 0% (upstream ceiling, no scraper-side work possible — see [scope doc](https://github.com/igregulator/igregulator/blob/main/docs/scrapers/cw-domain-investigation.md))
  - **KH** — 69% of Interactive Gaming Permit holders (gap is upstream non-disclosure)
  - **MGA** — 0% pre-1.4.0 enrichment, ~90% post
  - **UKGC** — 40% of `Remote`-license operators (upstream meaningful ceiling, the rest don't operate consumer sites)
- **UKGC parser audit closed without code changes.** Time-boxed re-audit confirmed the ~17.3% raw figure is 96% of the 482-operator meaningful ceiling (Active + White Label distinct accounts in `domain-names.csv`). Inactive-domain rows aren't ingested deliberately to avoid stale `/v1/check` matches. Findings documented in [docs/scrapers/ukgc-domain-gap.md](https://github.com/igregulator/igregulator/blob/main/docs/scrapers/ukgc-domain-gap.md).

## 1.3.1 — 2026-04-28

- **MCP server verified live in production.** End-to-end smoke against `https://mcp.igregulator.io/mcp`: `tools/list` returns all 7 tools, `check_domain` resolves UKGC + AN matches with correct confidence hints, `api_request_log` rows tagged `source='mcp'`. DNS via Cloudflare proxy, TLS via the existing Origin CA cert (extended to cover the new subdomain). Pricing page lists MCP support on Starter onwards. Claude Code path documented at /docs/mcp via `claude mcp add --transport http`.

## 1.3.0 — 2026-04-28

- **MCP server live at `mcp.igregulator.io`.** Streamable-HTTP transport (current MCP spec, SSE for streaming). Seven tools exposed: `check_domain`, `search_operators`, `get_operator`, `list_jurisdictions`, `get_jurisdiction`, `get_license`, `get_license_history`. Bearer-token auth — same API keys as the direct HTTP API; tool calls forward to api.igregulator.io and count against your existing per-key quota. Setup walkthrough at [/docs/mcp](/docs/mcp/). Discovery manifests at `/.well-known/mcp.json` on all three iGregulator surfaces.
- **`api_request_log.source` column added.** New requests are tagged `http` or `mcp` so admin analytics can split MCP usage from direct-HTTP usage. No client-visible change.

## 1.2.0 — 2026-04-28

- **Anjouan jurisdiction live.** 5th regulator: Anjouan Gaming Authority (AGA), ~1,275 licences ingested daily from [anjouangaming.com/license-register/](https://anjouangaming.com/license-register/). License-type vocabulary `B2C` / `B2B` / `White Labeling`. The fuzzy + domain-exact match flow on `/v1/check` includes AN automatically — no client-side change needed. Coverage table at [/docs/](/docs/) refreshed.
- **Marketing copy correction.** Hero / meta / llms.txt now say "Daily-refreshed … updated within 24 hours" instead of "Real-time." The data was never real-time; the new wording matches what scrapers actually deliver (cron 03:00–04:00 UTC, depending on jurisdiction).

## 1.1.0 — 2026-04-28

- **Flat field removal on `/v1/check`.** Top-level legacy keys (`licensed`, `jurisdiction`, `license_number`, `operator`, `status`, `expires_at`) **removed** along with the `Deprecation` / `Sunset` / `Link` headers. Read `match.*` instead. Removed ahead of the announced 2026-05-19 sunset because no customers are integrated against the flat shape yet.
- **Pre-launch surface gating.** `trial` keys are now restricted to `/v1/check` only (1,000/day per-key cap). Other authenticated endpoints return `402 payment_required` with `details.reason=endpoint_requires_paid_plan`. Behaviour lifts automatically once `PRELAUNCH_DAILY_CAP=0` (post-Stripe).
- **Webhook retention 7 → 30 days.** `webhook_deliveries` history now matches the `webhook_events` replay window. Existing rows live longer immediately; nothing to migrate.
- **Generic-label blocklist on `/v1/check` widened.** Now substring-match instead of exact-match — `bestcasino.com`, `casino-bonus.com` return `confidence: low` like `casino.com` already did. Licensed brands containing a generic keyword (`bet365.com`, `pokerstars.com`) still resolve via the domain hit before the gate runs.
- **OpenAPI spec** documents the per-jurisdiction `license_types` vocabulary inline so client-side enums can be coded against the audited values (UKGC `Remote`/`Non-Remote`/`Ancillary Remote`, MGA `Type 1-4`/`B2B`/`B2C`, CW `B2C`/`B2B`, KH `Interactive Gaming Permit`/`CSPA`).

## 1.0.0 — 2026-04-19

**Initial public docs release.**

- **/v1/check** response shape: `{ query, match, alternatives, confidence }`. `match.confidence` ∈ `high | medium | low`, `match_type` ∈ `domain_exact | trading_name_fuzzy | name_similarity`, `domain_association` ∈ `direct | white_label | null`.
- **Public endpoints**: `/v1/check`, `/v1/jurisdictions`, `/v1/operators/search`. Rate-limited 10 req / IP / hour.
- **White Label ingestion** live for UKGC — domains with `Status = 'White Label'` in the UKGC register now load with `association = 'white_label'` on the domain row.
- **OpenAPI 3.1 spec** available at [api.igregulator.io/openapi.json](https://api.igregulator.io/openapi.json) (canonical source; Scalar + starlight-openapi both consume it). The old Swagger UI on `api.igregulator.io/docs` now 301-redirects to [/docs/api/](/docs/api/) — unified docs at [/docs](/docs/) supersede it.

## 0.3.0 — 2026-04-17

- **Regulatory actions** surfaced on the operator detail page. Cross-jurisdiction feed: UKGC Public Register, MGA Decisions, CGA Warnings.
- New endpoint: `/v1/operators/:slug/regulatory-actions` (authenticated).

## 0.2.0 — 2026-04-08

- Kahnawake Gaming Commission jurisdiction added.
- Curaçao scraper migrated to the post-LOK OGL PDF source.
- Licence category harmonisation: `remote | non-remote | ancillary | permit | other`.

## 0.1.0 — 2026-03-28

- First operational release. UKGC-only coverage.
- Core schema: `operators`, `licenses`, `domains`, `jurisdictions`.
- Dashboard lives at `app.igregulator.io`.

---