Confidence scoring

The /v1/check endpoint returns a match object with a confidence field. This page explains what each level means, how we pick it, and how UIs should render it.

| confidence | What it means | Render as |
|---|---|---|
| high | Exact or root-domain match in our authoritative registry. | Green check. Safe to say “this site is licensed by X”. |
| medium | Domain root matched a trading name or operator name. We can identify the operator but can’t prove this domain is theirs. | Amber / neutral. “Likely operated by X” phrasing. |
| low | A weak fuzzy match (operator returned, but below the strong-similarity bar), or the domain root is a generic gambling term (casino.com, poker.com) where too many operators share the label to pick one. | Gray / warning. “We can’t confirm this domain.” |
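
As a rough illustration of the rendering guidance above, a client might keep a small lookup from confidence level to badge colour and phrasing. This is only a sketch: the names RENDER_HINTS and render_hint are hypothetical and not part of the API, and the cautious fallback for unrecognised values is an assumption. Only the levels, colours, and wording come from the table.

```python
# Hypothetical client-side mapping. Only the confidence values and the
# suggested colours/phrasing are taken from the documentation table.
RENDER_HINTS = {
    "high": ("green", "This site is licensed by {operator}."),
    "medium": ("amber", "Likely operated by {operator}."),
    "low": ("gray", "We can't confirm this domain."),
}


def render_hint(confidence: str, operator: str = "") -> tuple:
    """Return a (colour, message) pair for a confidence level."""
    # Assumption: anything unrecognised (including "none") is rendered
    # with the most cautious style rather than raising an error.
    colour, template = RENDER_HINTS.get(confidence, RENDER_HINTS["low"])
    return colour, template.format(operator=operator)
```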

A fourth value — none — appears in the top-level confidence field (not match.confidence) when match is null.

When match is null, a bare low or none is ambiguous: it used to conflate “generic term” with “we checked and it isn’t there”. On a miss, the response therefore carries two extra fields, match_absence_reason and checked_jurisdictions (both present only when match is null), so you can phrase the answer precisely instead of guessing:

| match_absence_reason | Meaning | Say to your user |
|---|---|---|
| generic_term | The label is an ultra-generic gambling word (casino.com); we can’t map it to one operator. | “Can’t identify a specific operator from this domain.” |
| no_record_found | A specific query we checked against every covered register and did not find. | “Not found in any of the N jurisdictions iGregulator covers.” Never an unqualified “unlicensed”. |

checked_jurisdictions accompanies the miss — the exact register codes we checked (e.g. ["AN","CW","KH","MGA","TGC","UKGC"]) — so a “not licensed” claim is always scoped to our coverage, never stated as an absolute.

{
  "query": { "domain": "some-unknown-site.com" },
  "match": null,
  "confidence": "none",
  "match_absence_reason": "no_record_found",
  "checked_jurisdictions": ["AN", "CW", "KH", "MGA", "TGC", "UKGC"]
}

These fields are absent when there’s a match — check match === null first, then branch on match_absence_reason.
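
The branching described above could look roughly like the sketch below. It assumes the response has already been parsed into a dict with the field names shown in the example payload; describe_miss is a hypothetical helper, and the fallback message for an unrecognised reason is an assumption rather than documented behaviour.

```python
from typing import Optional


def describe_miss(response: dict) -> Optional[str]:
    """Return a user-facing message for a miss, or None when there is a match."""
    # The absence fields only exist when match is null, so check that first.
    if response.get("match") is not None:
        return None

    reason = response.get("match_absence_reason")
    if reason == "generic_term":
        return "Can't identify a specific operator from this domain."
    if reason == "no_record_found":
        # Scope the claim to the registers that were actually checked.
        checked = response.get("checked_jurisdictions", [])
        return f"Not found in any of the {len(checked)} jurisdictions iGregulator covers."
    # Assumption: stay cautious if the reason is missing or unknown.
    return "We can't confirm this domain."
```

Deriving N from checked_jurisdictions, instead of hard-coding it, keeps the message accurate if coverage changes.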

The match_type field tells you how we arrived at the match, which is useful for debugging and UX differentiation.

| match_type | Source |
|---|---|
| domain_exact | Found the domain in the domains table, which is sourced from the regulator’s official register. Carries domain_association (direct or white_label). |
| trading_name_fuzzy | Trigram similarity ≥ 0.55 against operators.trading_names[] after stripping the TLD. Used when the domain isn’t registered but the brand exists. 0.55 was picked empirically against the UKGC register: it catches legitimate variants (paddypower vs. paddy-power, skybet vs. sky-bet) while rejecting the long tail of single-syllable collisions (gold, star, royal) where the label is too generic to mean one operator. Below 0.55 we land in low-confidence territory either way; above it the trigger is stable. |
| name_similarity | Last-chance similarity against operators.display_name. Rarely fires for B2C domains; useful when no trading name was populated upstream. |
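
To give a feel for how a 0.55 trigram threshold behaves, here is a simplified sketch of trigram similarity as a Jaccard ratio over padded character trigrams. It is an approximation for illustration only: the service's actual tokenisation and padding rules are not documented here and may score hyphenated labels differently. The label goldenpalace is an invented example, and THRESHOLD simply mirrors the 0.55 figure from the table.

```python
THRESHOLD = 0.55  # the documented cut-off for trading_name_fuzzy


def trigrams(text: str) -> set:
    """Return the set of 3-character windows over the padded, lowercased text."""
    # Assumption: pad with two leading spaces and one trailing space so that
    # the start and end of the label contribute their own trigrams.
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by total distinct trigrams (Jaccard ratio)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def is_strong_match(label: str, trading_name: str) -> bool:
    return similarity(label, trading_name) >= THRESHOLD
```

Under this approximation, paddypower against paddy-power clears the threshold, while a short generic label such as gold against the longer invented name goldenpalace does not.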

When match_type = domain_exact, the domain_association field distinguishes two cases:

  • direct — the licensee runs the site themselves. The operator field is the company your end-user is gambling with.
  • white_label — the licensee has authorised a third-party brand to trade on the domain under their permit. The operator field is the licensee, not the brand. UK-licensed white-label arrangements are legal and common; surfacing the relationship lets you show “operated by Brand X under ProgressPlay’s UKGC permit”.

Fuzzy matches (trading_name_fuzzy, name_similarity) don’t populate domain_association — we don’t have a domain row to read it from, so the field is null.
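
A client could turn domain_association into display text along these lines. This is a sketch under assumptions: describe_association is a hypothetical helper, and the brand and regulator arguments are supplied by the caller because this page does not document which response fields, if any, carry them.

```python
from typing import Optional


def describe_association(
    operator: str,
    association: Optional[str],
    brand: str = "this brand",
    regulator: str = "",
) -> str:
    """Phrase the operator relationship for direct, white_label, or null."""
    if association == "direct":
        # The licensee runs the site themselves.
        return f"Operated by {operator}."
    if association == "white_label":
        # The operator field is the licensee, not the brand.
        permit = f"{regulator} permit" if regulator else "permit"
        return f"Operated by {brand} under {operator}'s {permit}."
    # Fuzzy matches leave domain_association null, so claim less.
    return f"Likely operated by {operator}."
```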

Brand names like Paddy Power trigram-match several sister companies (PPB Counterparty, PPB Entertainment, PPB GE, Power Leisure Bookmakers) at similarity 1.0. To keep the primary match stable across DB reindex and VACUUM, /v1/check applies a documented tiebreaker cascade whenever the top candidates are tied on similarity:

  1. similarity DESC — closeness wins first, as ever.
  2. has_active DESC — operators with at least one active licence are preferred over operators whose licences are all expired / revoked.
  3. oldest_active_issued ASC — among active-licence candidates, the one whose oldest active licence issued first wins. Stability signal: a parent entity that has been licensed longest is the most useful “who actually runs this brand” answer.
  4. total_licenses DESC — more licences across the register → more likely a parent entity rather than a single-purpose subsidiary.
  5. operator_slug ASC — lexicographic final fallback. Always deterministic even when every previous rank is tied.
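
The cascade above maps naturally onto a single composite sort key. The sketch below assumes candidates are available as in-memory records; the Candidate class and rank function are hypothetical, with field names borrowed from the list. Treating a missing oldest_active_issued as sorting last is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Candidate:
    operator_slug: str
    similarity: float
    has_active: bool
    oldest_active_issued: Optional[date]  # None when no licence is active
    total_licenses: int


def rank_key(c: Candidate) -> tuple:
    return (
        -c.similarity,                       # 1. similarity DESC
        not c.has_active,                    # 2. has_active DESC (False sorts first)
        c.oldest_active_issued or date.max,  # 3. oldest_active_issued ASC, missing last
        -c.total_licenses,                   # 4. total_licenses DESC
        c.operator_slug,                     # 5. operator_slug ASC, final fallback
    )


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Return candidates in tiebreaker order; the first element is the primary."""
    return sorted(candidates, key=rank_key)
```

Because the last component is a unique slug, two distinct candidates never compare equal, so the result does not depend on the order in which rows arrive.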

Clients that cache domain → operator mappings can rely on the primary result remaining stable between index rebuilds; any change in primary reflects a change in the underlying registry data, not PG query randomness.

The alternatives[] field holds up to 3 runner-up candidates, sorted by similarity descending.

  • On confidence: medium, these are operators with the same or similar trading name that we ranked below the primary match.
  • On confidence: low (generic label), this is always [] — we refuse to guess when the label is ambiguous.
  • On confidence: high / none, also [].

Without the generic-term gate, GET /v1/check?domain=casino.com would deterministically return “Casino MK Limited” with confidence: medium, because “casino” matches that trading name at similarity 1.0. But the actual casino.com is licensed elsewhere (MGA, Gibraltar) which we don’t cover, and the answer “yes, Casino MK owns it” would be wrong.

The blocklist is a substring regex over the normalised label: casino, poker, bingo, gambling, bet, slot/slots, sportsbook, roulette, blackjack, wager, lottery, gaming. It matches anywhere in the label, so casino.com, bestcasino.com, and casino-bonus.com all return confidence: low with empty alternatives[]. Licensed brands containing one of these keywords (bet365.com, pokerstars.com) still resolve to high because the domain-exact match runs before the generic gate.
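
The ordering described above (exact match first, generic gate second) can be sketched as follows. The term list is taken from the paragraph above; everything else is an assumption for illustration: the function names, the use of lowercasing as the only normalisation step, the naive label extraction, and the in-memory exact_domains set standing in for the domains table.

```python
import re

# Substring pattern built from the documented blocklist terms.
GENERIC_TERMS = re.compile(
    r"casino|poker|bingo|gambling|bet|slots?|sportsbook|"
    r"roulette|blackjack|wager|lottery|gaming"
)


def is_generic_label(label: str) -> bool:
    # Assumption: normalisation here is just lowercasing.
    return GENERIC_TERMS.search(label.lower()) is not None


def route(domain: str, exact_domains: set) -> str:
    """Return which branch would handle the domain."""
    # Exact matches are tried first, so licensed brands that contain a
    # blocklisted keyword never reach the generic gate.
    if domain in exact_domains:
        return "domain_exact"
    label = domain.split(".")[0]  # naive label extraction, illustration only
    if is_generic_label(label):
        return "generic_term"
    return "fuzzy_lookup"
```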