This page is my digression into crypto trading and fraud detection, both of which are Thoroughly Modern problems. If, however, you just want the fun, animated part (TL;DR), skip ahead to the filmstrip.
Science Series #37 Crypto-Clustering Visualisations
Understanding the language is the first step to understanding the data. Since blockchain and crypto are relatively new, I feel it's important to get our vocab straight. Admittedly, this is also partly for my wife and kids, so that when we talk about things like Fungible Tokens we're all on the same page. Here are some vocab essentials.
Blockchain: A digital record book shared across many computers. Once a page is filled and added to the book, it cannot be torn out or changed, making the history permanent and secure.
Ledger: The public "account book" for a cryptocurrency. It records every transaction ever made, allowing anyone to verify the history of a coin without needing a bank.
Wallet: A digital tool that stores the "keys" (passwords) needed to access your crypto on the blockchain. It doesn't hold the coins themselves, but it proves you own them.
Wallet Address: A string of letters and numbers (like a username or email address) that represents a specific destination for crypto funds. It is safe to share publicly.
On-chain vs. Off-chain: On-chain transactions are recorded directly on the public blockchain. Off-chain transactions happen outside the blockchain (like an IOU between friends) and are not immediately visible to the public.
Fungible: Something that can be exchanged one-for-one with another identical item. One dollar bill is worth exactly the same as another dollar bill.
Non-Fungible Token (NFT): A unique digital asset that cannot be replaced. Unlike a dollar, a specific digital artwork or trading card is one-of-a-kind and cannot be swapped evenly.
Smart Contract: A self-running program stored on the blockchain. It automatically executes actions (like sending money) when specific conditions are met, without needing a lawyer or middleman.
Transaction Fee: A small payment made to the network to process a transaction. Think of it like a toll you pay to drive on a highway; it pays the computers that keep the network running.
Mining: The process where powerful computers solve complex math problems to verify transactions. As a reward for this work, the computer's owner earns new coins.
Structuring: Breaking a large amount of money into smaller chunks to move it unnoticed. Criminals do this to stay below the thresholds that trigger the reports banks and exchanges are required to file.
Trading Bots: Software programs that automatically buy and sell assets based on pre-set rules. They act faster than humans and are often used for high-frequency trading.
Liquidity Chasing: Moving funds specifically to platforms where trading is active and it is easy to cash out. It helps hide illegal funds in the massive flow of legitimate money.
Private Key: A secret password (like a long PIN) that proves you own your crypto wallet. If you lose it, you lose access to your money forever.
DeFi (Decentralized Finance): Financial services (like loans, trading, or savings) built on blockchains without banks. They run automatically on code and are open to anyone with an internet connection.
Token: A unit of value created on an existing blockchain. Unlike a coin (which has its own network), a token lives on a network like Ethereum.
KYC (Know Your Customer): The process businesses use to verify the identity of their clients. It is a standard check to prevent money laundering and ensure users are who they say they are.
I might define these in the future: Mixer/Tumbler, Atomic Swap, Bridge, Governance Token, Stablecoin, Hash Rate, 51% Attack.
Finding illicit activity in a sea of transactions is similar to how streaming services recommend movies. It is fundamentally a similarity problem. If two wallets interact with each other far more often than random chance would suggest, they are likely coordinating their behavior.
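One standard way to make "more often than random chance would suggest" concrete is lift: the observed rate at which two wallets appear together, divided by the rate you would expect if they were independent. Values well above 1 mean the pair shows up together more than chance predicts. The sketch below assumes activity has already been grouped into time buckets, each an array of wallet addresses. The `lift` function and the sample buckets are hypothetical and are not the code behind the visualisation.

```javascript
// Lift = P(A and B) / (P(A) * P(B)), estimated by counting time buckets.
// Each bucket is an array of the wallet addresses active in that window.
function lift(buckets, a, b) {
  const n = buckets.length;
  let countA = 0;
  let countB = 0;
  let countAB = 0;
  for (const bucket of buckets) {
    const hasA = bucket.includes(a);
    const hasB = bucket.includes(b);
    if (hasA) countA++;
    if (hasB) countB++;
    if (hasA && hasB) countAB++;
  }
  // Undefined if either wallet never appears in any bucket
  if (countA === 0 || countB === 0) return NaN;
  // (countAB / n) / ((countA / n) * (countB / n)) simplifies to this:
  return (countAB * n) / (countA * countB);
}

// Hypothetical buckets: B and C almost always show up together
const buckets = [
  ["A", "B", "C"],
  ["B", "C"],
  ["B", "C", "D"],
  ["A", "D"],
  ["D"],
];

console.log(lift(buckets, "B", "C")); // ~1.67: together more than chance
console.log(lift(buckets, "A", "D")); // ~0.83: together less than chance
```

Lift corrects for raw activity. A very busy wallet will co-occur with many others simply by chance, and lift discounts those pairings in a way that a raw count does not.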
Our approach builds a Co-occurrence Matrix. Imagine a giant grid where every row and column is a wallet address. If Wallet A sends money to Wallet B, we add 1 to the cell where A's row meets B's column (and to the mirrored B/A cell, so the grid stays symmetric). Over time, these counts grow, revealing hidden networks.
Wallet Addresses
+----------+----------+----------+----------+
|          | Wallet A | Wallet B | Wallet C |
+----------+----------+----------+----------+
| Wallet A |    0     |    4     |    1     |
+----------+----------+----------+----------+
| Wallet B |    4     |    0     |    22    |  <- High Score
+----------+----------+----------+----------+
| Wallet C |    1     |    22    |    0     |
+----------+----------+----------+----------+
Interpretation:
- Wallets A and B interacted 4 times (Moderate; worth a second look?)
- Wallets A and C interacted only 1 time (Likely Random)
- Wallets B and C interacted 22 times (Suspiciously High)
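The matrix can be built from a plain list of transfers. Here is a minimal sketch. It assumes each transfer is a `{ from, to }` object and that direction doesn't matter, because the table is symmetric, so A→B and B→A count toward the same pair. `buildInteractionMatrix` and the sample transfers are hypothetical, chosen only to reproduce the table's counts.

```javascript
// Build a symmetric wallet-by-wallet interaction matrix from transfers.
// Direction is ignored: A->B and B->A both increment the same pair.
function buildInteractionMatrix(transfers, wallets) {
  // Start with an all-zero grid keyed by wallet address
  // (Object.create(null) so `in` only sees our own rows)
  const matrix = Object.create(null);
  for (const row of wallets) {
    matrix[row] = {};
    for (const col of wallets) matrix[row][col] = 0;
  }
  for (const { from, to } of transfers) {
    // Skip self-transfers and wallets outside the grid
    if (from === to || !(from in matrix) || !(to in matrix)) continue;
    matrix[from][to] += 1;
    matrix[to][from] += 1; // mirrored cell keeps the grid symmetric
  }
  return matrix;
}

// Hypothetical transfers chosen to reproduce the table's counts
const transfers = [
  ...Array(4).fill({ from: "Wallet A", to: "Wallet B" }),
  { from: "Wallet C", to: "Wallet A" },
  ...Array(22).fill({ from: "Wallet B", to: "Wallet C" }),
];
const m = buildInteractionMatrix(transfers, ["Wallet A", "Wallet B", "Wallet C"]);
console.log(m["Wallet B"]["Wallet C"]); // 22
console.log(m["Wallet A"]["Wallet C"]); // 1
```

A nested object keeps the example readable. With thousands of wallets, a sparse structure such as a `Map` keyed by wallet pairs would avoid allocating cells for pairs that never interact.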
In the visualization on the next page, we use a version of this matrix that we built earlier, though in principle it could be built in real time. We group transactions into "time buckets" (e.g., 1-hour windows). If two wallets appear in the same bucket repeatedly, the edge between them is weighted more heavily and drawn thicker.
To find potential shenanigans, I implemented a "Time-Window Clustering" algorithm over that matrix. Instead of just looking at who sent money to whom, we also look at when they transacted. This helps distinguish legitimate high-volume traders from intentionally coordinated structuring.
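```javascript
// rawTransactions: array of { from, to, timeStamp } objects, where
// timeStamp is Unix time in seconds (assumed; adjust if your data uses ms).
function calculateClusters(rawTransactions, timeWindowMinutes, minThreshold) {
  // 1. Create Time Buckets
  // Divide the timeline into chunks (e.g., 45 mins)
  const windowSeconds = timeWindowMinutes * 60;
  const buckets = {};
  rawTransactions.forEach(tx => {
    const bucketID = Math.floor(Number(tx.timeStamp) / windowSeconds);
    // Group sending addresses that were active in the same window
    if (!buckets[bucketID]) buckets[bucketID] = new Set();
    buckets[bucketID].add(tx.from);
  });

  // 2. Build Adjacency Matrix (Pairwise Counting)
  // Count how often two wallets appear in the same bucket
  const matrix = {};
  Object.values(buckets).forEach(bucket => {
    // Every unordered pair of addresses in the bucket gets +1 "closeness"
    const addrs = [...bucket].sort();
    for (let i = 0; i < addrs.length; i++) {
      for (let j = i + 1; j < addrs.length; j++) {
        const key = `${addrs[i]}|${addrs[j]}`;
        matrix[key] = (matrix[key] || 0) + 1;
      }
    }
  });

  // 3. Apply Threshold
  // Only keep connections seen at least 'minThreshold' times;
  // this filters out noise and random one-off co-occurrences
  const edgeList = Object.entries(matrix)
    .filter(([, weight]) => weight >= minThreshold)
    .map(([key, weight]) => {
      const [source, target] = key.split("|");
      return { source, target, weight };
    });

  // Nodes are the wallets that appear in at least one surviving edge
  const nodeList = [...new Set(edgeList.flatMap(e => [e.source, e.target]))]
    .map(id => ({ id }));
  return { nodes: nodeList, edges: edgeList };
}
```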
By adjusting the Time Window and Threshold sliders on the next page, you can tune the sensitivity. A small time window catches rapid-fire bot activity, while a larger window reveals long-term structuring schemes. I find stuff like this so cool, I dedicated a whole page to it!
Once we have identified a cluster of wallets, the next challenge is determining intent. How do we differentiate between a high-frequency trading bot and a "smurf" (a person hired to structure illicit funds)? Both move fast and both move often, but they leave different digital footprints.
Bots: Operate on precise math. They trade odd amounts like 1.342 ETH.
Smurfs: Operate on human logic. They prefer round numbers or amounts just under reporting limits (e.g., $9,900).
Bots: Provide liquidity to the market. They interact with thousands of random wallets.
Smurfs: Work in closed loops. They primarily interact with a small, specific group of wallets (a cluster) to move money in a circle.
Bots: React to market signals. They are erratic and execute in milliseconds.
Smurfs: Follow a schedule. Transactions often appear at regular intervals (e.g., every 5 minutes, or exactly every 45 minutes) to avoid detection.
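The amount and timing heuristics can be turned into simple numeric signals. This is a minimal sketch, assuming amounts have been converted to USD and timestamps are in seconds. `isRoundOrJustUnder`, `intervalRegularity`, the $10,000 limit and the 10% "just under" margin are all hypothetical choices for illustration.

```javascript
// Amount heuristic: flag round numbers or amounts just under a reporting
// limit. The $10,000 limit and 10% margin are illustrative assumptions.
function isRoundOrJustUnder(amountUsd, limitUsd = 10000, margin = 0.1) {
  const isRound = amountUsd % 100 === 0; // e.g. 9,900 or 5,000
  const justUnder =
    amountUsd < limitUsd && amountUsd >= limitUsd * (1 - margin);
  return isRound || justUnder;
}

// Timing heuristic: coefficient of variation (std dev / mean) of the gaps
// between consecutive transactions. Near 0 means a clockwork schedule.
function intervalRegularity(timestamps) {
  if (timestamps.length < 3) return NaN; // need at least two gaps
  const t = [...timestamps].sort((a, b) => a - b);
  const gaps = t.slice(1).map((x, i) => x - t[i]);
  const mean = gaps.reduce((sum, g) => sum + g, 0) / gaps.length;
  if (mean === 0) return NaN; // all transactions at the same instant
  const variance =
    gaps.reduce((sum, g) => sum + (g - mean) ** 2, 0) / gaps.length;
  return Math.sqrt(variance) / mean;
}

console.log(isRoundOrJustUnder(9900));    // true  (round and just under)
console.log(isRoundOrJustUnder(1234.57)); // false (odd, bot-like amount)
console.log(intervalRegularity([0, 2700, 5400, 8100])); // 0 (every 45 min)
```

A coefficient of variation near 0 matches the scheduled, smurf-like pattern. Erratic, signal-driven bot activity would be expected to score noticeably higher.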
By applying these heuristics to the clusters found in our matrix, we can programmatically flag suspicious behavior for further investigation, separating legitimate liquidity provision from illicit structuring.
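At the cluster level, the counterparty heuristic becomes a closed-loop check: what share of the transfers touching a cluster both start and end inside it? Here is a minimal sketch. `closedLoopRatio`, `flagCluster` and the 0.8 cutoff are hypothetical, and a real system would weigh several signals together before flagging anything for review.

```javascript
// Share of transfers touching the cluster that both start and end inside it.
// Liquidity-providing bots trade with many outside wallets (low ratio);
// a smurfing ring mostly passes money among its own members (high ratio).
function closedLoopRatio(cluster, transfers) {
  const members = new Set(cluster);
  let touching = 0;
  let internal = 0;
  for (const { from, to } of transfers) {
    const fromIn = members.has(from);
    const toIn = members.has(to);
    if (!fromIn && !toIn) continue; // unrelated to this cluster
    touching++;
    if (fromIn && toIn) internal++;
  }
  return touching === 0 ? 0 : internal / touching;
}

// Flag for human review only; the cutoff is an illustrative assumption.
function flagCluster(cluster, transfers, cutoff = 0.8) {
  return closedLoopRatio(cluster, transfers) >= cutoff;
}

// Hypothetical ring: three wallets passing funds in a circle, plus one exit
const ringTransfers = [
  { from: "W1", to: "W2" },
  { from: "W2", to: "W3" },
  { from: "W3", to: "W1" },
  { from: "W1", to: "W2" },
  { from: "W3", to: "Exchange" },
];
console.log(closedLoopRatio(["W1", "W2", "W3"], ringTransfers)); // 0.8
console.log(flagCluster(["W1", "W2", "W3"], ringTransfers));     // true
```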
Experiment with time windows and thresholds to identify clusters.