
April 19, 2026

Introducing Cadence.

How I learned to train an NFL transformer in my high school Latin class.

Find the verb.

My name is Liam Hyde. I'm a Cal Poly CS grad, now an MSQE candidate, and I didn't know it at the time, but the class that first made the mechanics of transformers click for me was high school Latin.

My Latin teacher's refrain, through every translation we ever did, was find the verb. He hammered it into us over and over. He didn't particularly care whether we had every word translated correctly, as long as the structure of the sentence was right. You could leave a raw Latin word inside your English translation and still get most of the credit, provided the English around it was properly shaped.

Same move, different substrate.

That Latin move, find the verb, is the same move a transformer does when it reads a sentence.

Take "the quick brown fox jumped over the lazy dog." Ask a transformer what "dog" means in that sentence, and the first thing it has to decide is which other words are relevant to pinning down what "dog" is doing here. The article "the" in front of it matters; "lazy" sits right next to it and modifies it; "over" governs the prepositional phrase "dog" sits inside; "jumped" is the verb those prepositions hang off. That weighted lookup, every token asking every other token how much they matter for this particular computation, is called cross-attention. The name is less important than the move, which is: find the verb, then let everything else resolve against it.

Figure: attention over "the quick brown fox jumped over the lazy dog" · dog reads every other word
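
If it helps to see the move as code, here is a minimal numpy sketch of that weighted lookup. A real transformer projects each token into separate query, key, and value vectors with learned matrices and stacks many such layers; this toy version skips all of that and lets raw stand-in embeddings score each other, so the numbers it prints are noise, but the shape of the computation is the one described above.

```python
import numpy as np

def self_attention(X):
    """Each row of X is one token; every token scores every other token."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                              # how much each token asks of each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax across the sentence
    return weights @ X, weights                                # each token becomes a weighted blend of the rest

tokens = ["the", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"]
rng = np.random.default_rng(0)
X = rng.normal(size=(len(tokens), 8))                          # stand-in embeddings, one per word

_, weights = self_attention(X)
print(list(zip(tokens, np.round(weights[-1], 2))))             # what "dog" reads from every other word
```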

Try a harder sentence. "The bank refused the loan because it was insolvent." Ask what "it" refers to. "It" by itself could point anywhere, but the transformer looks across the sentence, sees "bank" and "loan" as candidate antecedents, weights "insolvent" heavily, and resolves "it" to the bank, because banks can be insolvent in a way loans cannot. The weights that let that resolution land are the same shape of weights that pinned down "dog" in the fox sentence.

Trained on enough text, the model's weights settle into a configuration that does that operation well on inputs it has never seen before. My Latin teacher was drilling the same operation by hand.

A case ending on a Latin noun tells you that noun's role relative to the verb; an attention weight on a token tells you that token's relevance relative to whatever the computation is currently trying to resolve. The Romans happened to encode the mechanism in suffixes and a transformer encodes it in weights, but both are answering the same question: which of the other words in this sentence do I depend on, and how much.

Why it generalizes.

People call language models "just autocomplete" or "a next-token predictor," which is literally what they output and also the least interesting possible read of what's happening.

Call the output autocomplete if you want, but the computation that produces it is a relationship engine. Autocomplete suggests the model is looking up the next word in some giant memorized table, which isn't what's going on at all; the model is running structural computation over the whole sentence, every token against every other token, with weights learned from the relationships that actually hold in natural language.

Structural computation generalizes in a way lookup does not. A memorized table only answers questions about the exact data that built it, while a relationship engine answers questions about any new input it's handed as long as the input has the same shape. A sentence has that shape, and so does an image, a chunk of code, a protein, or a stack of scouting cards for a football team.

A team is a stack of cards.

One card per starter on offense, defense, and special teams; one for the head coach; one for each coordinator; one for the cap situation; one for what the scheme actually looks like this year; one for where the team sits in the standings.

QB: Darnold · WR: Smith-Njigba · CB: Witherspoon · HC: Macdonald · OC: Fleury · DC: Durde · CAP: $303M · DRAFT: 4 picks
Example · Seattle, 2026

The same attention math applies, unchanged. Every card reads every other card, weighted by whatever the transformer has learned matters for the question being asked.
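
As a sketch of that claim, here is the same math run over a card stack instead of a sentence. The card labels mirror the Seattle example above; the vectors are random stand-ins, because how Cadence actually encodes a card is not something this sketch knows.

```python
import numpy as np

cards = ["QB", "WR", "CB", "HC", "OC", "DC", "CAP", "DRAFT"]
rng = np.random.default_rng(1)
card_vecs = rng.normal(size=(len(cards), 8))             # stand-in encodings, one per card

d = card_vecs.shape[-1]
scores = card_vecs @ card_vecs.T / np.sqrt(d)            # every card scores every other card
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax across the stack

print(list(zip(cards, np.round(weights[0], 2))))         # what the QB card reads from the rest of the stack
```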

Ask what the 2022 Rams were and the quarterback card dominates the read; Stafford was the load-bearing piece, and the structure routes through him even with Kupp and Donald in the stack. Ask what the 2017 Jaguars were and the pass rush and coverage cards take over, with Bortles barely moving the needle. The weights reallocate themselves for different team shapes.

Nobody told the model "quarterback matters more when it matters more." Nobody told it "pass rush dominates for certain defensive archetypes." It figured that out from ten seasons of actual teams and what happened to them, and the priorities came from the league rather than a rule sheet.

What comes out the other side is a short numeric summary, and two teams with similar summaries are similar teams. The site calls that summary a fingerprint, and the model that computes it is called Cadence. Every archetype label, radar chart, and "looks most like" call on the site is reading off the same fingerprint from a different angle.

Figure: fingerprint · 4-axis projection (PASS / RUN / RUSH / COV)

The fingerprint is not a single static read either. The same stack of cards, asked a different question, returns different weights. Ask the machinery "can this team run it back in January?" and the cap card, the quarterback card, and the pass-rush card do most of the work. Ask "is this team rebuilding?" and the draft capital and the age of the defensive core jump forward. Ask "does this scheme survive a coordinator change?" and the head coach card and the play-caller card reorder themselves.
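
One way to picture a question-dependent fingerprint is attention pooling where the question supplies the query and the cards supply everything else. The function below is an assumption about the shape of the idea, not Cadence's architecture; every name and dimension in it is made up.

```python
import numpy as np

def fingerprint(card_vecs, question_vec):
    """Pool the card stack into one vector, weighted by relevance to the question."""
    d = card_vecs.shape[-1]
    scores = card_vecs @ question_vec / np.sqrt(d)   # how much each card matters for this question
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    return weights @ card_vecs                       # the short numeric summary

rng = np.random.default_rng(2)
cards = rng.normal(size=(10, 16))                    # one row per card, values are placeholders
run_it_back = rng.normal(size=16)                    # stand-in for "can this team run it back in January?"
rebuilding = rng.normal(size=16)                     # stand-in for "is this team rebuilding?"

# Same stack of cards, different question, different fingerprint.
print(np.round(fingerprint(cards, run_it_back)[:4], 2))
print(np.round(fingerprint(cards, rebuilding)[:4], 2))
```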

A front office is a team of specialists.

A team is a stack of cards, but Cadence isn't a single reader holding all of them at once. Inside the model the work splits across a small roster of specialists — a cap guy who knows every restructure mechanic and void-year trap, a college scout whose whole job is pinning draft valuations against positional scarcity, a pro scout tracking the waiver wire and the trade deadline, a negotiator who models how players respond to offers under different franchise-tag pressures, a coaching consultant reasoning about scheme fit and HC-OC-DC chemistry. Below all of them sits a shared layer every decision runs through, the league rules and cap floor and roster limits that don't change whether you're drafting an edge or chasing a receiver in November.

Each specialist takes the lead when the question routes their way. Ask about a Mahomes restructure and the cap guy does the math; ask what to do with the Bears' top-ten pick and the college scout carries it, with the others still weighing in from the room. The literature calls this a mixture of experts, which inside a football front office is literally what it is.
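
The textbook mixture-of-experts shape is a router that scores each specialist for the question at hand and blends their outputs by those scores. The sketch below uses front-office names for the experts; the gating math is the standard version, and nothing about it is claimed to match Cadence's internals.

```python
import numpy as np

EXPERTS = ["cap", "college_scout", "pro_scout", "negotiator", "coaching"]
rng = np.random.default_rng(3)
d = 16
gate_matrix = rng.normal(size=(len(EXPERTS), d))                             # learned in a real model
expert_fns = [lambda x, W=rng.normal(size=(d, d)): W @ x for _ in EXPERTS]   # stand-in specialists

def route(question_vec):
    logits = gate_matrix @ question_vec                  # one score per specialist
    gates = np.exp(logits - logits.max())
    gates = gates / gates.sum()                          # how much each specialist leads on this question
    outputs = np.stack([fn(question_vec) for fn in expert_fns])
    return gates @ outputs, dict(zip(EXPERTS, np.round(gates, 2)))

answer, who_led = route(rng.normal(size=d))
print(who_led)   # the routing weights: which rooms carried this particular call
```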

A front office thinks before it acts.

Teams don't decide in one pass. A first-round pick goes through the draft room, the head coach, the GM, the owner, the capologist, and two rounds of whiteboarding before anyone hands a card to Goodell; a minimum-salary signing goes through a group text.

Cadence has the same gradient. Every decision runs through a recurrent loop that reads its own work in progress, consults the specialists, and reconsiders before committing, sometimes once, sometimes up to eight times. Cheap calls halt early, and the consequential ones keep looping. The loops are not chain of thought written out in visible English; they're deliberation in continuous hidden states, each pass sharpening the fingerprint before it gets emitted. The model has room to second-guess itself, and it uses that room when the stakes warrant it.
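
A loop with that shape might look like the sketch below: refine a state for up to eight passes, stop early once another pass stops moving the answer. The convergence test is a stand-in for whatever halting rule Cadence actually uses; only the up-to-eight-passes budget comes from the description above.

```python
import numpy as np

def deliberate(state, step_fn, max_passes=8, tol=1e-3):
    """Keep revising until another pass stops moving the answer, or the budget runs out."""
    for i in range(1, max_passes + 1):
        new_state = step_fn(state)                    # consult the specialists, revise the read
        if np.linalg.norm(new_state - state) < tol:   # cheap calls settle early
            return new_state, i
        state = new_state
    return state, max_passes                          # consequential calls use the whole budget

rng = np.random.default_rng(4)
W = rng.normal(size=(16, 16)) * 0.02                  # a contraction, so this toy loop settles
step = lambda s: np.tanh(W @ s)

final, passes_used = deliberate(rng.normal(size=16), step)
print(passes_used)                                    # fewer than 8 when the read converges early
```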

A franchise has a voice.

Not every front office makes decisions the same way. Belichick-era New England ran through Belichick; the current Colts let the GM drive personnel while the head coach handles the scheme; the Cowboys have Jerry. Authority routes differently inside different buildings, and the same move can be the right call in one org and a firing offense in another.

Cadence carries a per-franchise signal that shifts which specialist wins a tied vote. Same proposal, same cards, different decision, because the org voted differently, which is how it already works in real buildings.
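
A simple way to implement that kind of signal is a small per-franchise vector added to the router's scores before they're normalized, so the same proposal can tip toward a different specialist in a different building. The team codes and the additive bias below are illustrative assumptions, nothing more.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n_experts = 16, 5
gate_matrix = rng.normal(size=(n_experts, d))
franchise_bias = {"NE": rng.normal(size=n_experts),    # hypothetical per-org vectors
                  "DAL": rng.normal(size=n_experts)}

def routing_weights(team, proposal_vec):
    logits = gate_matrix @ proposal_vec + franchise_bias[team]   # same proposal, org-specific tilt
    gates = np.exp(logits - logits.max())
    return gates / gates.sum()

proposal = rng.normal(size=d)                                    # the same cards, the same proposal
print(np.argmax(routing_weights("NE", proposal)),
      np.argmax(routing_weights("DAL", proposal)))               # which specialist wins in each building
```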

So how is this useful?

A sharp human analyst watching film can hold maybe five to ten relationships in their head at a time, and a very sharp one can hold more, but nobody can hold every relationship across 32 rosters and 10 seasons while also tracking this week's shifts against historical comps, which is exactly what Cadence does.

When Cadence tells you the current version of a team looks most like a team from another era, that comparison didn't come from a human making a call; the fingerprint for the current team fell closest, in the space of all team shapes, to the fingerprint for that older one. The cards that drove the similarity, whatever they happen to be for the two teams involved, are the same cards that drove the similarity between that older team and the teams it looked most like in its own era. The comparison survives the math instead of depending on a narrative someone decided to believe.
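
That "looks most like" call reduces to a nearest-neighbor search over fingerprints. The sketch below uses cosine similarity; whether the real distance is cosine, Euclidean, or something learned is not something this sketch knows, and the library of historical fingerprints here is random filler.

```python
import numpy as np

def most_similar(query_fp, library):
    """library maps 'team, season' labels to fingerprint vectors."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(library, key=lambda label: cosine(query_fp, library[label]))

rng = np.random.default_rng(6)
library = {f"team {year}": rng.normal(size=32) for year in range(2016, 2026)}   # placeholder history
current = rng.normal(size=32)                                                   # placeholder current team

print(most_similar(current, library))   # the historical fingerprint that falls closest
```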

It's the same read Latin class trained. Find the structural spine. Resolve everything else against it. The vocabulary is football, the verbs are roster decisions, the stakes are a Sunday instead of a translation exam, but the mental move is the same.

Going forward this is going to be a football blog. Some posts will be data-focused, some will be team takes, and there will be Seahawks propaganda. First real post this week: the case for the "low positional value" 2026 top 10.