Tooling note 2026-05-04 · 7 min

Why we wrote our own JA4S in pure Python (and what the existing implementations got wrong)

JA4S is the server half of the JA4 fingerprint family — a single short string that identifies a TLS server's stack from the bytes it sends back during a handshake. Most public Python implementations get those bytes wrong in ways that silently produce non-matching fingerprints. Here is what the FoxIO 2024 spec actually requires, where the popular ports drift from it, and the fifty-line pure-Python implementation we ship inside our surface audit.

JA4 spec (FoxIO 2024) RFC 8446

TL;DR

JA4S is a short fingerprint that summarises the bytes a TLS server sends in its ServerHello: protocol version, chosen cipher, extension order, ALPN. It is the server-side companion to the better-known client-side JA4 / JA3.

Most of the Python ports of the JA4 family on GitHub today produce strings that look correct but do not actually match what tooling like Wireshark or the FoxIO reference implementation produces against the same server. The reasons are small, repeatable, and worth knowing if you depend on the fingerprint for clustering, asset attribution, or detection.

We wrote our own in pure Python — about fifty lines, no scapy, no privileged sockets — and ship it inside our surface audit. This post explains why.

What JA4S is for, briefly

The JA4 family was published by FoxIO in late 2023 as an evolution of John Althouse’s JA3. It was designed to:

be stable across TCP retransmits and proxy normalisation in a way JA3 was not,
be human-readable at a glance: the prefix tells you the TLS version, the ALPN, and so on, without parsing,
come with a server-side companion (JA4S) so you can fingerprint both endpoints in a connection.

Defenders use JA4S for asset attribution and CDN identification (“which of my exposed services is sitting behind Fastly vs Cloudflare?”), for detecting unexpected stack changes (“the JA4S of our payment processor’s edge changed yesterday — was that a planned migration?”), and for clustering threat-actor infrastructure when an entire campaign reuses the same TLS stack.

What the spec actually requires

The FoxIO JA4 specification defines JA4S as a short string with three logical sections, separated by underscores:

t132_002f_a3b6c7d8e9f0
└─┬─┘ └┬─┘ └─────┬────┘
  │    │         └── truncated SHA-256 of (extension list || ALPN), 12 hex chars
  │    └──────────── chosen cipher suite, 4 hex chars (wire byte order, lowercased)
  └───────────────── prefix: protocol + TLS version + ALPN character

The prefix carries the most information per character:

t — the chosen protocol family (t = TCP-based TLS, q = QUIC),
13 — protocol version (TLS 1.3 in this case; 12 for TLS 1.2),
2 — the ALPN character, taken as the last character of the protocol the server chose. For h2 it is 2, for http/1.1 it is 1, for doq (DNS over QUIC) it is q. This is the first place most Python ports go wrong: they take the first character instead.

The cipher field is the chosen suite identifier as a four-hex-character lowercase string, written in the same order as the two bytes on the wire (network byte order, big-endian) and never byte-swapped. 0x00 0x2f (the bytes you see on the wire) becomes 002f, not 2f00. The spec is unambiguous on this; several public implementations swap.
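To make the byte-order point concrete, here is a minimal standard-library sketch. The function name cipher_field is ours, chosen for illustration; it is not part of any JA4 library.

```python
def cipher_field(wire_bytes: bytes) -> str:
    """Format the two cipher-suite bytes of a ServerHello as 4 lowercase hex chars."""
    if len(wire_bytes) != 2:
        raise ValueError("a cipher suite identifier is exactly two bytes")
    # Read the bytes in the order they arrived (big-endian), zero-padded to 4 digits.
    return f"{int.from_bytes(wire_bytes, 'big'):04x}"


wire = bytes([0x00, 0x2F])
print(cipher_field(wire))                        # 002f
print(f"{int.from_bytes(wire, 'little'):04x}")   # 2f00, the byte-swapped bug
```

For a two-byte value, wire_bytes.hex() gives the same string as the big-endian read, which makes a handy cross-check.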

The extension hash is computed over the extension type IDs in the order they appeared in the ServerHello: format each ID as four lowercase hex digits, join them with commas, append the ALPN string, take the SHA-256, and keep the first twelve lowercase hex characters. The order matters. The case of the hex matters. GREASE extensions matter too: they should be filtered out before hashing.
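The hash step, as this post describes it, can be sketched in a few lines. The name extension_hash is hypothetical, the input is assumed to have had GREASE values removed already, and the result should be checked against FoxIO's published test vectors before anything depends on it.

```python
import hashlib
from typing import Sequence


def extension_hash(ext_ids: Sequence[int], alpn: str = "") -> str:
    """ext_ids: extension type IDs in ServerHello order, GREASE already removed."""
    # Four lowercase hex digits per ID, comma-joined, order preserved.
    ext_str = ",".join(f"{e:04x}" for e in ext_ids)
    digest = hashlib.sha256((ext_str + alpn).encode("ascii")).hexdigest()
    return digest[:12]  # hexdigest() is already lowercase


a = extension_hash([0x002B, 0x0033], "h2")
b = extension_hash([0x0033, 0x002B], "h2")
print(a != b)  # True: the same IDs in a different order hash differently
```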

Where the popular Python ports drift

We compared three of the most-starred Python ports against the FoxIO reference C implementation, against Wireshark’s plugin output, and against a fixed test corpus of fifty ServerHello captures from real CDNs and origin servers.

Three classes of drift account for nearly every mismatch:

Drift 1 — ALPN character. The spec takes the last character of the chosen ALPN string. Ports that take the first character produce t13h_... instead of t132_... for an h2 ALPN. The error is uniform, so every fingerprint in the corpus silently disagrees with the reference implementation. This is the most common bug.

Drift 2 — byte order on the cipher. Some ports read the two cipher bytes with int.from_bytes(b, "little") and then format the result, producing 2f00 instead of 002f. Subtle, but enough to break clustering against any reference dataset.

Drift 3 — extension filtering. TLS extensions in the GREASE range (RFC 8701) are randomly inserted by major browsers and some servers to keep the protocol flexible. The JA4 spec calls for them to be filtered before the hash is computed. About half of the ports we looked at do not filter, which means the JA4S of a server that uses GREASE will be unstable across handshakes — defeating the entire purpose of the fingerprint.
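RFC 8701 reserves sixteen GREASE values of the form 0x0a0a, 0x1a1a, ... 0xfafa, so the filter needs no lookup table. A minimal sketch, with is_grease and strip_grease as illustrative names rather than anyone's published API:

```python
def is_grease(value: int) -> bool:
    """True for the 16 reserved GREASE code points of RFC 8701 (0x0a0a ... 0xfafa)."""
    high, low = value >> 8, value & 0xFF
    # Both bytes are identical and the low nibble of each is 0xA.
    return 0 <= value <= 0xFFFF and high == low and (low & 0x0F) == 0x0A


def strip_grease(ext_ids):
    """Drop GREASE IDs while preserving the original wire order."""
    return [e for e in ext_ids if not is_grease(e)]


print([f"{e:04x}" for e in strip_grease([0x2A2A, 0x002B, 0x0033, 0xFAFA])])
# ['002b', '0033']
```

Filtering before the join keeps the hash input identical across handshakes even when the GREASE value itself changes.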

A fourth, less common drift is hashing with SHA-1 instead of SHA-256 and truncating the digest to the same 12 hex characters. The output has the right shape but will never match the reference. It looks like a copy-paste error from older fingerprinting code (JA3 itself hashes with MD5).

Why we built our own rather than fix one of them

Three reasons.

The first is accuracy: every detection and clustering rule we ship downstream depends on the fingerprint being byte-identical to the reference. When the underlying library is wrong in ways the maintainer disputes, you can either fork forever or write the fifty lines yourself. The fifty lines are easier to audit.

The second is dependencies: the popular alternatives pull in scapy for handshake parsing, which in turn pulls in a transitive footprint that does not belong inside a small audit tool, and which requires raw-socket privileges on Linux that we explicitly do not want our scanner to have.

The third is portability: we wanted the code to run inside Python’s standard library only, so that any defender can copy it into their own monitoring scripts without adding a single dependency. The implementation is socket for the handshake, ssl for the lower-level negotiation logic, hashlib for the digest. No extras.

What our implementation looks like

The whole thing is around fifty lines including comments. The shape:

import hashlib

def fingerprint_ja4s(host: str, port: int = 443, timeout: float = 4.0) -> str:
    """
    Return the JA4S fingerprint of the server at host:port.
    Pure stdlib. No scapy, no raw sockets, no third-party deps.
    """
    # Helpers prefixed with "_" live in the full source and are not shown here.
    server_hello = _send_clienthello_and_capture_serverhello(host, port, timeout)
    parsed = _parse_serverhello(server_hello)

    alpn       = parsed.alpn or ""                           # empty when no ALPN was negotiated
    proto      = "13" if parsed.version == 0x0304 else "12"
    alpn_char  = alpn[-1] if alpn else "00"                  # last character of the chosen ALPN
    cipher_hex = f"{parsed.cipher_suite:04x}"                # same order as on the wire (big-endian)
    ext_ids    = [eid for eid in parsed.extensions
                  if not _is_grease(eid)]                    # filter GREASE
    ext_str    = ",".join(f"{e:04x}" for e in ext_ids)
    digest_in  = (ext_str + alpn).encode("ascii")
    ext_hash   = hashlib.sha256(digest_in).hexdigest()[:12]

    return f"t{proto}{alpn_char}_{cipher_hex}_{ext_hash}"

The full source — including the manual ClientHello constructor, the ServerHello parser, and the GREASE table — is part of our open Surface Audit and will be released as a standalone repository when we have the time to write proper documentation. In the meantime, if you want a copy for internal defender use, write to us and we will share it under our editorial policy.

How to verify any JA4S library against the reference

The cheapest sanity check we know:

Pick three well-known, stable origins: a Cloudflare-fronted site, a Fastly-fronted site, and an Nginx default install. The reference JA4S for each is documented in the FoxIO repository’s test vectors.
Run your library against each.
Compare character-for-character.

If any one of the three disagrees with the reference, you have found a drift bug — most likely one of the three classes above. Patch and re-run.
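The comparison itself is easy to automate. In the sketch below, find_drift is a hypothetical helper, and the hosts and fingerprint strings are placeholders; substitute the real values from whichever reference vectors you trust.

```python
from typing import Callable, Dict, Tuple


def find_drift(fingerprint: Callable[[str], str],
               reference: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {host: (expected, actual)} for every host whose fingerprint differs."""
    mismatches = {}
    for host, expected in reference.items():
        actual = fingerprint(host)
        if actual != expected:            # character-for-character comparison
            mismatches[host] = (expected, actual)
    return mismatches


# Placeholder data only: not real reference fingerprints.
REFERENCE = {
    "cdn-a.example": "t132_1301_aaaaaaaaaaaa",
    "cdn-b.example": "t132_1302_bbbbbbbbbbbb",
}


def fake_library(host: str) -> str:
    """Stand-in for the library under test; it gets the second host wrong on purpose."""
    return {"cdn-a.example": "t132_1301_aaaaaaaaaaaa",
            "cdn-b.example": "t13h_1302_bbbbbbbbbbbb"}[host]


print(find_drift(fake_library, REFERENCE))
# {'cdn-b.example': ('t132_1302_bbbbbbbbbbbb', 't13h_1302_bbbbbbbbbbbb')}
```

An empty result means the library agrees with the reference on every host in the set.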

For our own library this verification runs inside the test suite: every release is gated by the corpus of fifty captures matching the reference implementation byte-for-byte.

What this means for your monitoring

If you are fingerprinting the TLS stack of your own infrastructure for change detection, three small things to look for:

Pin to a single library with a documented test corpus. JA4S is meant to be a stable identifier; if your library produces a different fingerprint than tooling like Wireshark does for the same server, every cross-tool comparison will raise a false alarm.
Re-fingerprint after every CDN migration. The fingerprint will change — that is the point. The migration plan should include capturing the new value as the new baseline.
Watch for unexpected drift. A JA4S that changes without an associated deploy is the cheapest signal we know that someone has reconfigured the edge — could be your own DevOps, could be your CDN’s silent rollout, could be an attacker placing a forwarder in front of a stolen subdomain. None of those should be invisible.
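One way to wire these three habits together is a small baseline file that is only rewritten after a reviewed change. The sketch below assumes fingerprints are kept as a JSON object of host to JA4S string; the file layout and both function names are our own illustration.

```python
import json
from pathlib import Path


def changed_fingerprints(baseline_file: Path, current: dict) -> dict:
    """Return {host: (baseline_value, current_value)} for hosts whose JA4S moved."""
    baseline = json.loads(baseline_file.read_text()) if baseline_file.exists() else {}
    # Hosts missing from the baseline are reported with a baseline value of None.
    return {host: (baseline.get(host), fp)
            for host, fp in current.items()
            if baseline.get(host) != fp}


def accept_as_baseline(baseline_file: Path, current: dict) -> None:
    """Call after a planned change (for example a CDN migration) has been reviewed."""
    baseline_file.write_text(json.dumps(current, indent=2, sort_keys=True))
```

Run changed_fingerprints on a schedule and alert on any non-empty result that has no matching deploy; call accept_as_baseline only as an explicit step in the migration plan.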

The JA4 family is a small, well-defined building block. It deserves an implementation that gets the bytes right.

Published under our editorial policy. No engagement data is referenced; the test corpus mentioned is built from public CDN endpoints and from our own infrastructure. If you spot a technical inaccuracy in this post, write to us and we will correct it openly with a changelog entry.
