Published 2026-05-19.

How sportsbook APIs actually work: an honest tour of the layer underneath

If you've shopped for a sports-betting data API in the last year, you've seen a lot of marketing about "real-time odds across every book." Very little of that marketing tells you what's actually happening underneath. This is the post that does. No NDAs, no hidden architecture; just the operational truth.

Sportsbooks don't publish a stable public API

The first thing to understand: there is no "Sportsbook API." DraftKings, FanDuel, Bet365, Pinnacle: each of them runs its own internal systems for traders, risk, and the consumer app. None of those systems is offered to the public as a documented, paid, contract-backed API.

What sportsbook aggregators (us, our competitors, every aggregator) work with is the public-facing surface those companies expose to their own consumer applications: the JSON endpoints their websites call, the WebSocket feeds their mobile apps subscribe to, the AJAX URLs you'd see if you opened Chrome DevTools' Network tab while browsing the sportsbook. These surfaces are not stable, not contractual, and the books can change them whenever they want.

The operational truth: every aggregator runs a small army of parsers, each tailored to one book's current public surface. When a book changes its JSON shape, the parser breaks. An aggregator's quality is largely the speed at which it detects breakage and ships a fix.
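Detecting breakage quickly mostly means failing loudly the moment a payload stops matching what the parser expects. Here is a minimal sketch of that idea. The field names (`id`, `markets`) and the `ShapeChanged` exception are hypothetical, chosen for illustration; they are not any book's real schema.

```python
from typing import Any

# Hypothetical fields this illustrative parser expects on every event object.
REQUIRED_EVENT_KEYS = {"id", "markets"}


class ShapeChanged(Exception):
    """Raised when an upstream payload no longer matches the expected shape."""


def check_event_shape(event: Any) -> None:
    """Raise ShapeChanged instead of silently parsing a payload we don't understand."""
    if not isinstance(event, dict):
        raise ShapeChanged(f"expected an object, got {type(event).__name__}")
    missing = REQUIRED_EVENT_KEYS - event.keys()
    if missing:
        raise ShapeChanged(f"missing keys: {sorted(missing)}")
    if not isinstance(event["markets"], list):
        raise ShapeChanged("'markets' is no longer a list")
```

A raised `ShapeChanged` is the kind of signal that could feed a breakage counter or page an on-call engineer, rather than letting stale or half-parsed prices flow downstream.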

What ingestion actually looks like

Per book, you typically have three flavors of public surface to choose from:

1. JSON REST endpoints

The most common. The bookmaker's website or mobile app makes a GET request to a URL like https://www.bovada.lv/services/sports/event/coupon/events/A/description/baseball/mlb. The response is a structured JSON object describing every event, market, and price for that sport.

An aggregator polls that endpoint on a cadence and parses the JSON into a normalized schema. Polling cadence ranges from sub-second (for high-volume sharp books) to 30+ seconds (for slow-moving DFS apps). The cadence is bounded by what the book's WAF tolerates without rate-limiting you, which we'll come back to.
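To make "parses the JSON into a normalized schema" concrete, here is a small sketch of the parse step alone, with the HTTP fetch left out. The payload layout (`events` / `markets` / `outcomes`) and the `PriceRow` fields are assumptions made up for this example; every book's real payload differs, which is exactly why each one needs its own parser.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PriceRow:
    """One normalized price: the same flat shape regardless of source book."""
    book: str
    event_id: str
    market_key: str
    outcome: str
    price: Optional[float]  # None when the book quotes no price for this outcome


def normalize(book: str, raw: str) -> List[PriceRow]:
    """Flatten a (hypothetical) nested events/markets/outcomes payload into rows."""
    payload = json.loads(raw)
    rows: List[PriceRow] = []
    for event in payload.get("events", []):
        for market in event.get("markets", []):
            for outcome in market.get("outcomes", []):
                rows.append(
                    PriceRow(
                        book=book,
                        event_id=str(event["id"]),
                        market_key=market["key"],
                        outcome=outcome["name"],
                        price=outcome.get("price"),
                    )
                )
    return rows
```

In this sketch, a poller would call `normalize("examplebook", response_text)` after each fetch and write the resulting rows to storage; the book name here is a placeholder.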

2. WebSocket streams

Many bookmakers run their mobile/web frontend on top of a long-lived WebSocket connection to their backend. That connection emits price-change events as they happen, ~10-100ms after the trader desk moves the line. If you can attach to that WebSocket as the consumer app does (with the same handshake, same auth tokens, same subscribe messages), you get faster updates than any polling cadence.

The trade-off: WebSocket capture is much more fragile. The handshake usually involves a session token from the website's HTML, the subscribe messages need to be framed exactly right, and the price-change events come in a proprietary wire format that needs reverse-engineering. Every aggregator that claims "sub-1s freshness" has invested heavily in WebSocket capture infrastructure for the books that support it.
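Once the handshake and framing are solved, the remaining job is keeping a local price table in sync with the stream of change events. The sketch below shows only that last step. The message format (a JSON object with `type`, `event_id`, `market_key`, `outcome`, `price`) is invented for illustration; real feeds use their own proprietary formats.

```python
import json
from typing import Dict, Tuple

# (event_id, market_key, outcome) identifies one price in the local table.
Key = Tuple[str, str, str]


def apply_message(book: Dict[Key, float], raw: str) -> None:
    """Apply one (hypothetical) price-change message to the in-memory table."""
    msg = json.loads(raw)
    if msg.get("type") != "price_change":
        return  # heartbeats and any message types this sketch does not model
    key = (msg["event_id"], msg["market_key"], msg["outcome"])
    if msg.get("price") is None:
        book.pop(key, None)  # the upstream withdrew the price
    else:
        book[key] = float(msg["price"])
```

Treating a missing price as a withdrawal, rather than keeping the last known value, is a design choice in this sketch: it avoids serving a price the book is no longer offering.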

3. SPA capture (Single Page App, browser-rendered)

Some books don't expose prices as JSON anywhere. The prices are baked into the DOM after the React app renders. To get those prices, you literally run a headless Chromium browser, load the page, wait for the SPA to populate the DOM, and read the rendered prices out via Playwright or Puppeteer.

This is the most expensive flavor (1 Chromium per book per session, 200-300 MB RAM each) and the most fragile (the DOM structure changes any time the book pushes a new web build). Books we've seen require this: Caesars deep player props, DraftKings alt-line Unders.

Why aggregators care so much about egress IPs

Two reasons.

Reason 1: WAF and rate-limiting

Cloudflare, AWS WAF, and similar tools fingerprint clients by IP address, TLS fingerprint, request cadence, headers, and a dozen other signals. A bookmaker running Cloudflare in front of their consumer API can decide "this IP is making 100 requests per minute against a public endpoint that customers normally hit 1-2 times per minute; rate-limit them to 5 requests per minute." That's the friendly response. The unfriendly response is a 403 block.

Aggregators counter this by:

Polling slower than the WAF threshold.
Spreading polling across many IPs (residential proxies, a fleet of small VMs in different regions, ingest nodes on residential connections).
Using a TLS library that impersonates Chrome (curl_cffi rather than vanilla httpx, because the JA3 fingerprint matters).
Rotating User-Agent and Accept-Language headers to match what real browsers send.
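The first item, polling slower than the threshold, is the simplest to show. Below is a minimal pacing sketch that enforces a minimum gap between requests. The `Pacer` class and its parameters are hypothetical names for this example, and the threshold itself is something you would have to discover per book; nothing here is a published limit.

```python
import time
from typing import Callable


class Pacer:
    """Enforce a minimum interval between polls so cadence stays under a chosen cap."""

    def __init__(
        self,
        max_per_minute: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next poll is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot relative to when this poll actually runs.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

A polling loop would call `pacer.wait()` before each request. The clock and sleep functions are injectable only so the pacing logic can be tested without real waiting.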

Reason 2: Geofencing

US sportsbooks are licensed state-by-state. Caesars serves its New Jersey customers from a different domain than its Pennsylvania customers, and each domain refuses to serve anyone outside its licensed state. If your aggregator's egress is in Germany (Hetzner), Singapore (DigitalOcean APAC), or anywhere else outside the US, you cannot reach US-state-specific sportsbook surfaces. The bookmaker's WAF returns a "not available in your region" page.

This is why serious aggregators run their ingest infrastructure from US-based IPs. The ones with comprehensive Caesars NJ coverage run a node in NJ. The ones with the full PA-only DraftKings surface run a node in PA. Aggregators with no US presence offer materially less coverage than they advertise.
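The coverage consequence can be expressed as a lookup: a state-specific surface is reachable only if some ingest node sits in that state. This is a sketch with made-up node names and surface keys, not a description of any real fleet.

```python
from typing import Dict, Optional

# Hypothetical fleet: node name -> US state the node's egress IP is in.
NODES: Dict[str, str] = {"node-nj-1": "NJ", "node-pa-1": "PA"}

# Hypothetical surfaces: surface key -> state whose geofence it enforces.
SURFACES: Dict[str, str] = {
    "caesars_nj": "NJ",
    "draftkings_pa": "PA",
    "caesars_mi": "MI",
}


def pick_node(
    surface: str,
    nodes: Dict[str, str] = NODES,
    surfaces: Dict[str, str] = SURFACES,
) -> Optional[str]:
    """Return a node inside the surface's state, or None if there is a coverage gap."""
    state = surfaces[surface]
    for name, node_state in sorted(nodes.items()):
        if node_state == state:
            return name
    return None
```

With the example data above, `pick_node("caesars_mi")` returns `None`: no node in that state means that surface simply is not covered, whatever the marketing page says.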

What "real-time" actually means

"Real-time" is a marketing word. The honest version is per-source freshness, and it's bounded by:

The bookmaker's update cadence (how often they post new prices upstream).
The aggregator's polling cadence (how often they call the upstream).
Network latency.
The aggregator's processing time (parse + database write + WS broadcast).

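For a sharp book like Pinnacle, the upstream updates every 1-3 seconds when the line is moving and stays quiet for tens of seconds between moves. With a 1-second polling cadence, a move is captured within about 1.5 seconds end-to-end in the worst case: up to one full polling interval of waiting, because the move lands at a random point in the polling window, plus the roughly half a second of processing these figures assume. The typical move is picked up sooner, after about half the interval plus processing. With a 5-second polling cadence, the worst case is about 5.5 seconds. There's no shortcut around this unless you're on the WebSocket path.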
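The arithmetic behind those figures fits in a few lines. This sketch assumes a move lands uniformly at random within the polling window and that network plus processing overhead is a flat 0.5 seconds, which is the overhead the 1.5-second and 5.5-second figures imply. Both are modeling assumptions, not measurements.

```python
from typing import Tuple


def staleness(poll_interval_s: float, overhead_s: float = 0.5) -> Tuple[float, float]:
    """Return (expected, worst_case) seconds from an upstream move to a stored price.

    Assumes the move arrives uniformly at random within the polling window, so the
    average wait for the next poll is half the interval and the worst is a full one.
    """
    expected = poll_interval_s / 2 + overhead_s
    worst_case = poll_interval_s + overhead_s
    return expected, worst_case


print(staleness(1.0))  # (1.0, 1.5)
print(staleness(5.0))  # (3.0, 5.5)
```

The worst-case column reproduces the 1.5-second and 5.5-second bounds; the expected column shows that the typical move is seen somewhat sooner than the bound.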

If an aggregator advertises sub-1-second freshness across the board, ask them which books and how. The honest answer is "Pinnacle via WebSocket where available; Bovada via 1-second REST polling from a fleet of regional nodes; FanDuel via mobile API at 2-3 seconds; DraftKings depends on the day." If the answer is "all books, all sources, sub-1s," they're stretching.

The honest question to ask any aggregator

Before signing up, ask:

Show me a public endpoint that returns the age in seconds of each source's most recent write, right now. If they can't, they don't measure their own freshness and you can't either. (Ours: /v1/meta/source-quality.)
When a book quotes only one side of a prop, what do you return for the missing side? Right answer: null. Wrong answer: any form of inference. (Our policy: null, always.)
Where do you run your ingest nodes? If they're single-region, ask about geofenced books and watch them stretch.
Where is your changelog? Every parser change should be public. (Ours: /changelog, updated within minutes of each deploy.)
What happens when one of your sources breaks? If the answer involves "alerting" or "we get notified," that's reactive. The honest answer involves circuit breakers, automatic fallback paths, and a published incident timeline.
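For the last question, it helps to know what a circuit breaker with a fallback path looks like in its simplest form. The sketch below is a generic version of the pattern, not our production code. The class name, thresholds, and the `primary` / `fallback` callables are all illustrative assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """After repeated failures, skip the broken source for a cooldown and use a fallback."""

    def __init__(
        self,
        max_failures: int = 3,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def is_open(self) -> bool:
        """True while the source is being skipped; False once the cooldown has elapsed."""
        if self._opened_at is None:
            return False
        return self._clock() - self._opened_at < self.cooldown_s

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()  # do not even try the broken source
        try:
            result = primary()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open (or re-open) the breaker
            return fallback()
        # Success: close the breaker and reset the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Here `primary` might be the WebSocket path and `fallback` a slower REST poll. After the cooldown, the next call acts as a single trial: if it fails, the breaker re-opens immediately; if it succeeds, normal service resumes.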

What we publish

Because this is our space, here are the receipts on what ParlayAPI publishes to address each of the questions above:

/v1/meta/source-quality: live per-source SLA classification.
/v1/meta/per-book-sla: the threshold map.
/v1/status/history: trailing 24h-7d uptime per source.
/changelog: every parser change.
/v1/meta/parser-coverage: actual market_keys observed per book per sport.
/docs/runbooks: post-mortems for past incidents, including the 2026-05-19 gateway 502 outage we wrote up while it was fresh.

For a longer comparison, see our 12-point vendor evaluation rubric.

Closing thought

Sportsbook data isn't a clean B2B API surface. It's reverse-engineered access to consumer-facing endpoints that the bookmaker can change at any time. The quality difference between aggregators isn't "do they have an API"; it's how fast they detect breakage, how honest they are about coverage gaps, and how much they invest in geographic + WAF diversity. Pick the one whose dashboard is harder to lie on.

Last verified: 2026-05-19.