Methodology & Data Transparency

What KeyAudit indexes

KeyAudit is a public-facing index of 60M+ leaked wallet records across 36 blockchains, aggregated from 31 publicly credited data sources. Every record is tagged with the chain it was derived from, the source it came from, and a confidence tier that reflects how the leak was discovered.

The index is not a breach corpus in the credential-stuffing sense — most entries are addresses with derivable private keys, not stolen account credentials. The provenance breaks down roughly as: confirmed on-chain theft incidents, OFAC and similar sanctions lists, academic brain-wallet research datasets, community-curated scam address lists, and dictionary-derived theoretical matches.

Confidence tiers

Every record sits on a five-tier confidence ladder. The tier determines how much weight to put on a positive match.

confirmed_stolen — the address is documented as the destination or origin of an on-chain theft (rug pulls, exchange hacks, bridge exploits with public post-mortems).
sanctioned — listed by OFAC, EU, UK, or similar authorities. A match here is a hard legal-risk signal, not just a security one.
academic_dataset — extracted from peer-reviewed brain-wallet or weak-key research (e.g. Vasek et al., Trezor analyses). The address is derived from a key the researchers showed to be guessable.
community_curated — phishing-tracker lists (ScamSniffer, Chainabuse), CryptoScamDB, and similar volunteer-maintained corpora. Higher coverage, lower individual verification.
dict_derived — addresses computed from common wordlists, leaked password dumps, and brain-wallet seed candidates. A hit here is theoretical — it means the input parses to an address an attacker could trivially derive, not that funds were actually stolen.
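
One way to picture the ladder is as an ordered enumeration, so that a query matching several records can be reported at its strongest tier. This is a minimal sketch, not KeyAudit's implementation: the numeric ranks assume the tiers are listed from strongest to weakest evidence, and the helper name strongest_tier is hypothetical.

```python
from enum import IntEnum


class ConfidenceTier(IntEnum):
    # Ranks are an assumption: higher means stronger evidence, following
    # the order in which the tiers are listed in the text.
    CONFIRMED_STOLEN = 5
    SANCTIONED = 4
    ACADEMIC_DATASET = 3
    COMMUNITY_CURATED = 2
    DICT_DERIVED = 1


def strongest_tier(tier_names):
    """Return the highest-ranked tier among matched records, or None if empty."""
    tiers = [ConfidenceTier[name.upper()] for name in tier_names]
    return max(tiers, default=None)


# A query that matched both a wordlist entry and a sanctions entry.
print(strongest_tier(["dict_derived", "sanctioned"]).name)  # SANCTIONED
```

Using an ordered type keeps the "how much weight" question to a single comparison, but a real deployment may well treat sanctioned matches as a separate legal signal rather than a rung on the same scale.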

How a query is processed

The leak checker accepts three input shapes: a raw public address, a BIP-39 mnemonic, or a private key (hex or WIF). For mnemonics and keys, the input is hashed in your browser via SubtleCrypto.digest('SHA-256', ...) before any network request leaves your device. The server only sees a 32-byte hash.
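
To reproduce the kind of value the server receives, the same SHA-256 step can be run offline. The sketch below uses Python's hashlib rather than the browser's SubtleCrypto, purely for illustration. The whitespace normalization and the function name hash_secret are assumptions; the page does not specify how the client normalizes input before hashing.

```python
import hashlib


def hash_secret(secret: str) -> bytes:
    """Return the 32-byte SHA-256 digest of a mnemonic or private-key string."""
    # Assumed normalization: trim the ends and collapse runs of whitespace.
    # The real client may normalize differently (or not at all).
    normalized = " ".join(secret.split())
    return hashlib.sha256(normalized.encode("utf-8")).digest()


# Placeholder input only; never paste a real seed phrase into example code.
digest = hash_secret("  placeholder   words  not a real seed ")
print(len(digest))  # 32
print(digest.hex())
```

Because SHA-256 is deterministic, two clients that normalize the input identically produce the same digest, which is what makes a server-side lookup by hash possible at all.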

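The hash is checked first against an in-memory Bloom filter, which rejects most non-matches in constant time without touching the database. Because a Bloom filter can report false positives but never false negatives, any filter hit is then confirmed exactly against a MySQL index on address_hash. No plaintext seed or key is ever transmitted, logged, or persisted server-side.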
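
The two-stage check can be sketched as follows. This is an illustrative toy under stated assumptions: the BloomFilter class, its size and hash count, and the lookup function are all hypothetical, and a Python set stands in for the MySQL index on address_hash.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4):
        # size_bits is assumed to be a multiple of 8.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with an index byte.
        for i in range(self.num_hashes):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:8], "big") % self.size_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def lookup(query_hash: bytes, bloom: BloomFilter, exact_index: set) -> bool:
    # Stage 1: a Bloom miss is definitive, so the database is never consulted.
    if not bloom.might_contain(query_hash):
        return False
    # Stage 2: a Bloom hit may be a false positive, so confirm it exactly.
    return query_hash in exact_index


known = hashlib.sha256(b"indexed example").digest()
bloom, index = BloomFilter(), {known}
bloom.add(known)
print(lookup(known, bloom, index))  # True
```

The design point is that the expensive exact lookup only runs for the small fraction of queries that pass the filter, while the final answer never depends on the filter alone.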

Data sources

Every entry in KeyAudit links back to its public source. The full list is at /en/source with per-source coverage statistics. Major contributors include the CryptoScamDB phishing corpus, the OFAC SDN list, ScamSniffer indicator feeds, Vasek et al.'s 2016 brain-wallet study, and SecLists' top-1000 password dump derivations. We index nothing that isn't already public.

Limits and what we do not claim

A dict_derived hit is not evidence of theft. It means the input you queried derives to an address an attacker could trivially recompute from a common wordlist or password dump. It does not mean that any third party necessarily knows your specific phrase. Even so, if your wallet shows a dict_derived match, the prudent response is to move your funds to a new wallet whose BIP-39 seed was generated on a hardware wallet.

Conversely, a clean lookup is not a guarantee of safety. Targeted attacks (SIM swaps, malware key exfiltration, supply-chain compromise) leave no trace in dictionary or research datasets. KeyAudit catches commodity-grade key compromises, not bespoke ones.

We do not run on-chain transaction-graph analysis (Chainalysis territory). We do not track wallet activity over time. We do not deanonymize. Every dataset we index is already public.

Update cadence

The index is refreshed on a rolling basis as upstream sources publish updates: sanctions lists weekly, scam-address feeds daily, academic corpora when new research lands. Aggregate statistics on /en/stats are recomputed every six hours from the live database.
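
The cadence above could be expressed as a small schedule table. This is a sketch only: the intervals come from the text, but the dictionary keys and the is_due helper are hypothetical names, and academic corpora are left out because they are refreshed on publication rather than on a timer.

```python
from datetime import datetime, timedelta

# Intervals as described in the text; the key names are illustrative labels.
REFRESH_INTERVALS = {
    "sanctions_lists": timedelta(weeks=1),
    "scam_address_feeds": timedelta(days=1),
    "aggregate_stats": timedelta(hours=6),
}


def is_due(kind: str, last_refresh: datetime, now: datetime) -> bool:
    """Return True once the interval for this kind of data has elapsed."""
    return now - last_refresh >= REFRESH_INTERVALS[kind]


last = datetime(2024, 1, 1, 0, 0)
print(is_due("aggregate_stats", last, datetime(2024, 1, 1, 6, 0)))  # True
print(is_due("sanctions_lists", last, datetime(2024, 1, 3, 0, 0)))  # False
```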