Crypto Trading and Wallet Fraud

This page is my digression into crypto trading and fraud detection - both of which are Thoroughly Modern problems. If, however, you just want to skip to the fun, animated part...

TL;DR this button will skip to the filmstrip...
Science Series #37 Crypto-Clustering Visualisations

Key Terminology

Understanding the language is the first step to understanding the data. Since blockchain and crypto are relatively new, I feel it's important to get our vocab straight. Admittedly, this is also partly for my wife and kids, so that when we talk about things like Fungible Tokens we're all on the same page. Here are some vocab essentials.

Blockchain

A digital record book shared across many computers. Once a page is filled and added to the book, it cannot be torn out or changed, making the history permanent and secure.

Public Ledger

The public "account book" for a cryptocurrency. It records every transaction ever made, allowing anyone to verify the history of a coin without needing a bank.

Wallet

A digital tool that stores the "keys" (passwords) needed to access your crypto on the blockchain. It doesn't hold the coins themselves, but proves you own them.

Address

A string of letters and numbers (like a username or email) that represents a specific destination for crypto funds. It is safe to share publicly.

On-Chain / Off-Chain

On-chain transactions are recorded directly on the public blockchain. Off-chain transactions happen outside the blockchain (like an IOU between friends) and are not immediately visible to the public.

Fungible

Something that can be exchanged one-for-one with another identical item. One dollar bill is worth exactly the same as another dollar bill.

Non-Fungible Token (NFT)

A unique digital asset that cannot be replaced. Unlike a dollar, a specific digital artwork or trading card is one-of-a-kind and cannot be swapped evenly.

Smart Contract

A self-running program stored on the blockchain. It automatically executes actions (like sending money) when specific conditions are met, without needing a lawyer or middleman.

Gas Fee

A small payment made to the network to process a transaction. Think of it like a toll fee you pay to drive on a highway; it pays the computers that keep the network running.

Mining

The process where powerful computers solve complex math problems to verify transactions. As a reward for this work, the computer earns new coins.

Smurfing (Structuring)

Breaking a large amount of money into smaller chunks to move it unnoticed. Criminals do this to avoid triggering "suspicious activity" reports that banks or exchanges are required to file.

Robot Trading (Bots)

Software programs that automatically buy and sell assets based on pre-set rules. They act faster than humans and are often used for high-frequency trading.

Liquidity Chasing

Moving funds specifically to platforms where trading is active and it is easy to cash out. Legitimate traders do this too, but it can also help hide illegal funds in the massive flow of legitimate money.

Private Key

A secret password (like a long PIN) that proves you own your crypto wallet. If you lose this, you lose access to your money forever.

DeFi (Decentralized Finance)

Financial services (like loans, trading, or savings) built on blockchains without banks. They run automatically on code and are open to anyone with an internet connection.

Token

A unit of value created on an existing blockchain. Unlike a coin (which has its own network), a token lives on a network like Ethereum.

KYC (Know Your Customer)

The process businesses use to verify the identity of their clients. It is a standard check to prevent money laundering and ensure users are who they say they are.

I might define these in the future: Mixer/Tumbler, Atomic Swap, Bridge, Governance Token, Stablecoin, Hash Rate, 51% Attack

The Algorithm

Finding illicit activity in a sea of transactions is similar to how streaming services recommend movies. It is fundamentally a similarity problem. If two wallets interact with each other far more often than random chance would suggest, they are likely coordinating their behavior.

Our approach builds a Co-occurrence Matrix. Imagine a giant grid where every row and column is a wallet address. Each time Wallet A sends money to Wallet B, we add 1 to the cell where the two meet. Over time, these numbers grow, revealing hidden networks.

The Matrix Concept


          Wallet Addresses
          +-----------+-----------+-----------+-----------+
          |           |  Wallet A |  Wallet B |  Wallet C |
          +-----------+-----------+-----------+-----------+
          | Wallet A  |     0     |     4     |     1     |
          +-----------+-----------+-----------+-----------+
          | Wallet B  |     4     |     0     |    22     |  <- High Score
          +-----------+-----------+-----------+-----------+
          | Wallet C  |     1     |    22     |     0     |
          +-----------+-----------+-----------+-----------+
          
          Interpretation:
          - Wallets A and B interacted 4 times (Slightly Elevated?)
          - Wallets A and C interacted only 1 time (Likely Random)
          - Wallets B and C interacted 22 times (Suspiciously High)
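
A minimal sketch of how a grid like this could be filled from a list of transfers. The `from`/`to` field names, the `buildInteractionMatrix` and `interactions` helpers, and the sample data are assumptions for illustration, not the code behind the visualization. Counts are mirrored so that A-B and B-A always agree, as in the table.

```javascript
// Hypothetical helper: counts interactions between every pair of wallets.
// Assumes each transfer is an object with `from` and `to` address strings.
function buildInteractionMatrix(transfers) {
  const matrix = new Map(); // address -> Map(address -> count)

  const bump = (a, b) => {
    if (!matrix.has(a)) matrix.set(a, new Map());
    const row = matrix.get(a);
    row.set(b, (row.get(b) || 0) + 1);
  };

  for (const { from, to } of transfers) {
    if (from === to) continue; // ignore self-transfers (diagonal stays 0)
    bump(from, to);
    bump(to, from); // mirror the count so the grid is symmetric
  }
  return matrix;
}

// Look up one cell; missing cells count as 0.
function interactions(matrix, a, b) {
  return matrix.get(a)?.get(b) ?? 0;
}

// Made-up sample data
const sampleTransfers = [
  { from: 'A', to: 'B' },
  { from: 'B', to: 'A' },
  { from: 'B', to: 'C' },
  { from: 'C', to: 'B' },
  { from: 'B', to: 'C' },
];

const grid = buildInteractionMatrix(sampleTransfers);
console.log(interactions(grid, 'A', 'B')); // 2
console.log(interactions(grid, 'B', 'C')); // 3
console.log(interactions(grid, 'A', 'C')); // 0
```

A nested `Map` is used instead of a dense grid because most wallet pairs never interact, so most cells would be zero.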
        

In the visualization on the next page, we use a version of this matrix that we built earlier, but in principle it could be built in real time. We group transactions into "time buckets" (e.g., 1-hour windows). If two wallets appear in the same bucket repeatedly, the edge between them is weighted higher and drawn thicker.

The Core Logic - in Pseudocode

To find potential shenanigans I implemented a "Time-Window Clustering" algorithm over that matrix. Instead of just looking at who sent money to whom, we look at when they transacted. This helps distinguish legitimate high-volume traders from intentional, coordinated structuring.


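  function calculateClusters(timeWindowMinutes, minThreshold, transactions = rawTransactions) {
    // 1. Create Time Buckets
    //    Divide the timeline into chunks (e.g., 45 mins).
    //    Assumption: tx.timeStamp is Unix time in seconds, so the window
    //    is converted from minutes to seconds before dividing.
    const windowSeconds = timeWindowMinutes * 60;
    const buckets = {};
    transactions.forEach(tx => {
      const bucketID = Math.floor(Number(tx.timeStamp) / windowSeconds);
      // Group addresses that were active (sending) at the same time
      if (!buckets[bucketID]) buckets[bucketID] = new Set();
      buckets[bucketID].add(tx.from);
    });

    // 2. Build Adjacency Matrix (Pairwise Counting)
    //    Count how often two wallets appear in the same bucket
    const matrix = {};
    Object.values(buckets).forEach(bucket => {
      // Sort so that each pair always produces the same "a|b" key
      const addresses = [...bucket].sort();
      for (let i = 0; i < addresses.length; i++) {
        for (let j = i + 1; j < addresses.length; j++) {
          // Both wallets are in this bucket: increment their "closeness" score
          const key = addresses[i] + '|' + addresses[j];
          matrix[key] = (matrix[key] || 0) + 1;
        }
      }
    });

    // 3. Apply Threshold
    //    Only keep connections that happen more than 'minThreshold' times
    //    This filters out noise and random one-off transactions
    //    (the { source, target, weight } edge shape is an assumption)
    const edgeList = Object.entries(matrix)
      .filter(([, count]) => count > minThreshold)
      .map(([key, count]) => {
        const [source, target] = key.split('|');
        return { source, target, weight: count };
      });

    // Nodes are the wallets that still have at least one edge
    const nodeList = [...new Set(edgeList.flatMap(e => [e.source, e.target]))];

    return { nodes: nodeList, edges: edgeList };
  }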
        

By adjusting the Time Window and Threshold sliders on the next page, you can tune the sensitivity. A small time window catches rapid-fire bot activity, while a larger window reveals long-term structuring schemes. I find stuff like this so cool, I dedicated a whole page to it!
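
To make the effect of the window size concrete, here is a small self-contained sketch. The `pairCounts` helper, the made-up data, and the assumption that `timeStamp` is in Unix seconds are all illustrative; the sliders on the next page may be wired differently.

```javascript
// Hypothetical helper: counts how many time buckets each pair of senders shares.
// Assumes tx.timeStamp is Unix time in seconds.
function pairCounts(transactions, windowMinutes) {
  const windowSeconds = windowMinutes * 60;
  const buckets = new Map(); // bucketID -> Set of sender addresses
  for (const tx of transactions) {
    const id = Math.floor(tx.timeStamp / windowSeconds);
    if (!buckets.has(id)) buckets.set(id, new Set());
    buckets.get(id).add(tx.from);
  }

  const counts = new Map(); // "a|b" -> number of shared buckets
  for (const senders of buckets.values()) {
    const sorted = [...senders].sort();
    for (let i = 0; i < sorted.length; i++) {
      for (let j = i + 1; j < sorted.length; j++) {
        const key = `${sorted[i]}|${sorted[j]}`;
        counts.set(key, (counts.get(key) || 0) + 1);
      }
    }
  }
  return counts;
}

// Made-up data: X and Y always fire within seconds of each other,
// while Z follows X roughly 20 minutes later, once per hour.
const demoTxs = [];
for (let hour = 0; hour < 3; hour++) {
  const t = hour * 3600;
  demoTxs.push({ from: 'X', timeStamp: t + 10 });
  demoTxs.push({ from: 'Y', timeStamp: t + 15 });
  demoTxs.push({ from: 'Z', timeStamp: t + 1200 });
}

const narrow = pairCounts(demoTxs, 1); // 1-minute window
const wide = pairCounts(demoTxs, 60);  // 60-minute window

console.log(narrow.get('X|Y'), narrow.get('X|Z')); // 3 undefined
console.log(wide.get('X|Y'), wide.get('X|Z'));     // 3 3
```

With the 1-minute window only the rapid-fire X-Y pair shows up; widening to 60 minutes also links Z, whose transactions trail by about 20 minutes.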

Heuristics & Analysis

Once we have identified a cluster of wallets, the next challenge is determining intent. How do we differentiate between a high-frequency trading bot and a "smurf" (a person hired to structure illicit funds)? Both move fast and both move often, but they leave different digital footprints.

1. The Round Number Test

Bots: Operate on precise math. They trade odd amounts like 1.342 ETH.

Smurfs: Operate on human logic. They prefer round numbers or amounts just under reporting limits (e.g., $9,900).
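
One way the round number test could be sketched in code. The function names, the 100-unit step, the 10,000 limit, and the 5% margin are illustrative assumptions, and amounts are assumed to be already converted to dollars; none of these values come from the visualization itself.

```javascript
// Hypothetical check: is the amount a (near) multiple of `step`?
function isRoundAmount(amount, step = 100, tolerance = 1e-9) {
  const remainder = Math.abs(amount % step);
  return remainder < tolerance || step - remainder < tolerance;
}

// Hypothetical check: does the amount sit just below a reporting limit?
// `limit` and `margin` are illustrative values only.
function isJustUnderLimit(amount, limit = 10000, margin = 0.05) {
  return amount < limit && amount >= limit * (1 - margin);
}

// Share of a wallet's transfers that trip either check (0 to 1).
function roundNumberShare(amounts) {
  if (amounts.length === 0) return 0;
  const hits = amounts.filter(a => isRoundAmount(a) || isJustUnderLimit(a));
  return hits.length / amounts.length;
}

// Made-up samples
console.log(roundNumberShare([9900, 5000, 9850, 2500]));      // 1
console.log(roundNumberShare([1342.17, 87.03, 412.9, 9.61])); // 0
```

A share close to 1 leans toward human-chosen amounts, while a share close to 0 looks more like computed ones. It is a hint, not proof.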

2. The Inner Circle

Bots: Provide liquidity to the market. They interact with thousands of random wallets.

Smurfs: Work in closed loops. They primarily interact with a small, specific group of wallets (a cluster) to move money in a circle.
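
A rough sketch of how the "closed loop" idea could be measured: the share of a wallet's transfers whose counterparty sits inside a given cluster. The `insideClusterShare` helper, the `from`/`to` field names, and the sample data are assumptions for illustration.

```javascript
// Hypothetical metric: what share of a wallet's transfers stay inside a cluster?
// Assumes each transfer is an object with `from` and `to` address strings.
function insideClusterShare(wallet, transfers, clusterMembers) {
  const members = new Set(clusterMembers);
  let total = 0;
  let inside = 0;
  for (const { from, to } of transfers) {
    if (from !== wallet && to !== wallet) continue; // not this wallet's transfer
    const counterparty = from === wallet ? to : from;
    total += 1;
    if (members.has(counterparty)) inside += 1;
  }
  return total === 0 ? 0 : inside / total;
}

// Made-up sample: S1-S3 pass funds in a ring, BOT trades with outsiders.
const ringTransfers = [
  { from: 'S1', to: 'S2' },
  { from: 'S2', to: 'S3' },
  { from: 'S3', to: 'S1' },
  { from: 'S1', to: 'S2' },
  { from: 'BOT', to: 'W1' },
  { from: 'BOT', to: 'W2' },
  { from: 'W3', to: 'BOT' },
  { from: 'BOT', to: 'S1' },
];
const ring = ['S1', 'S2', 'S3'];

console.log(insideClusterShare('S1', ringTransfers, ring));  // 0.75
console.log(insideClusterShare('BOT', ringTransfers, ring)); // 0.25
```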

3. The Rhythm of Time

Bots: React to market signals. They are erratic and execute in milliseconds.

Smurfs: Follow a schedule. Transactions often appear at regular intervals (e.g., always within 5 minutes of each other, or every 45 minutes exactly) to avoid detection.
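
Regularity can be approximated by looking at the gaps between a wallet's consecutive transactions. The sketch below uses the coefficient of variation (standard deviation divided by mean) of those gaps; that choice of metric, the `intervalVariation` name, and the Unix-seconds timestamps are assumptions for illustration rather than what the visualization does.

```javascript
// Hypothetical metric: coefficient of variation of the gaps between
// consecutive transactions. Values near 0 mean a clock-like schedule;
// larger values mean erratic timing. Assumes timestamps are Unix seconds.
function intervalVariation(timestamps) {
  if (timestamps.length < 3) return null; // need at least two gaps
  const sorted = [...timestamps].sort((a, b) => a - b);
  const gaps = sorted.slice(1).map((t, i) => t - sorted[i]);
  const mean = gaps.reduce((sum, g) => sum + g, 0) / gaps.length;
  if (mean === 0) return 0; // all transactions share one timestamp
  const variance = gaps.reduce((sum, g) => sum + (g - mean) ** 2, 0) / gaps.length;
  return Math.sqrt(variance) / mean;
}

// Made-up samples
const every45Min = [0, 2700, 5400, 8100, 10800];
const erratic = [0, 2, 900, 905, 7200];

console.log(intervalVariation(every45Min));  // 0
console.log(intervalVariation(erratic) > 1); // true
```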

By applying these heuristics to the clusters found in our matrix, we can programmatically flag suspicious behavior for further investigation, separating legitimate "liquidity chasing" from illicit structuring.

Launch Visualization

Experiment with time windows and thresholds to identify clusters.