2026-05-19 · architecture latency real-time

Latency budget for a betting agent: every step from book to bet

If you're chasing arbitrage or +EV value, the only number that matters is end-to-end latency: how long between a soft book mispricing a line and your bet hitting their server. Books are sophisticated enough that genuine windows close within 1 to 30 seconds. If your stack takes 20 seconds, you catch tail-end edges only. If you can run end-to-end in under 2 seconds, you catch the meat.

This post breaks down every step in that path with concrete numbers. We've measured each on production traffic running against US-licensed sportsbooks. If you're building a betting agent, model-runner, or arb scanner, this is the budget you're working with.

The full pipeline

Eight stages between the book moving a line and your bet getting confirmed:

1. Book publishes update: internal pricing engine pushes new line to their public-facing API or WebSocket.
2. Aggregator ingests: your data provider observes the update, parses it, persists it.
3. Aggregator broadcasts: REST cache invalidates or WebSocket push fires to subscribed clients.
4. Network to your agent: over the wire from aggregator edge to your machine.
5. Agent decision: your model evaluates whether the price is bet-worthy, computes stake.
6. Bet submission to book: POST to the book's bet-placement API.
7. Book risk-engine evaluates: they decide whether to accept your bet at the displayed price or adjust it.
8. Acceptance / line-change response: confirmation comes back, you log it.
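To see where your own stack spends its time, one option is to stamp every stage you can observe from the agent side and log the deltas. The sketch below is a minimal illustration, not part of any ParlayAPI client: `StageTimer` and the stage names are hypothetical, and it only covers the stages that run on your own clock (roughly stages 4 through 8).

```python
import time


class StageTimer:
    """Records a monotonic timestamp per stage and reports per-stage deltas."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the timer can be tested with fake time.
        self._clock = clock
        self._marks = []  # list of (stage_name, timestamp_seconds)

    def mark(self, stage):
        """Stamp the moment a stage finished."""
        self._marks.append((stage, self._clock()))

    def deltas_ms(self):
        """Milliseconds elapsed between each consecutive pair of marks."""
        out = {}
        for (_, t_prev), (name, t_cur) in zip(self._marks, self._marks[1:]):
            out[name] = (t_cur - t_prev) * 1000.0
        return out

    def total_ms(self):
        """Milliseconds between the first and the last mark."""
        if len(self._marks) < 2:
            return 0.0
        return (self._marks[-1][1] - self._marks[0][1]) * 1000.0
```

Call `mark("push_received")` when a push arrives, then `mark("decision_done")`, `mark("bet_sent")`, and `mark("confirmed")` as each step completes; `deltas_ms()` then shows which stage is eating the budget. A monotonic clock is used so that wall-clock adjustments cannot produce negative deltas.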

Time budget table

Concrete numbers per stage. The "tight" column is what a production system actually hits. The "slack" column is what a hobby setup hits. The "stale" column is where you're effectively betting blind.

Stage	Tight (ms)	Slack (ms)	Stale (ms)	Optimization lever
1. Book publishes	0	0	0	You don't control this; it's the t=0 reference.
2. Aggregator ingest	200	800	3000+	WebSocket subscriptions or per-book hot loops. Polling = slack/stale; push = tight.
3. Aggregator broadcast	50	200	5000+	WebSocket fan-out at the edge. CDN-cached REST = stale by definition.
4. Network to agent	20	100	500	Geographic proximity to aggregator. East-coast US to East-coast US = ~20ms.
5. Agent decision	10	100	2000	Pre-computed sharp-anchor fair prob, lookup-table edge thresholds. Avoid synchronous DB writes in the hot path.
6. Bet submission	200	600	2000	Persistent connection to the book. Avoid re-auth per bet.
7. Book risk-engine	200	800	5000	You don't control this. Some books are intentionally slow on edge accounts (latency = risk filter).
8. Confirmation	100	300	1000	Network return path from the book to your agent (the return leg of stage 6).
Total round-trip	780	2900	18500+

Tight ~780ms. Achievable. Sub-second round trip from book-line-move to bet-confirmation on a well-architected stack.

Slack ~2900ms. Reasonable for most operators. Catches most opportunities; misses the tightest windows.

Stale 18s+. The line has either moved or the edge has been hit by other faster operators by the time your bet reaches the book.
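The table is easy to turn into a checkable budget. The sketch below copies the per-stage numbers from the table, recomputes the totals, and flags any stage where a measured latency exceeds the chosen tier. The dictionary keys and the `over_budget` helper are illustrative names, not part of any API.

```python
# Per-stage budgets in milliseconds, copied from the table:
# (tight, slack, stale). The stale column is a lower bound (the "+" rows).
BUDGET_MS = {
    "book_publishes": (0, 0, 0),
    "aggregator_ingest": (200, 800, 3000),
    "aggregator_broadcast": (50, 200, 5000),
    "network_to_agent": (20, 100, 500),
    "agent_decision": (10, 100, 2000),
    "bet_submission": (200, 600, 2000),
    "book_risk_engine": (200, 800, 5000),
    "confirmation": (100, 300, 1000),
}
TIERS = ("tight", "slack", "stale")


def totals():
    """Sum each column of the budget table."""
    return {
        tier: sum(row[i] for row in BUDGET_MS.values())
        for i, tier in enumerate(TIERS)
    }


def over_budget(measured_ms, tier="tight"):
    """Return {stage: overshoot_ms} for stages slower than the tier allows."""
    col = TIERS.index(tier)
    return {
        stage: ms - BUDGET_MS[stage][col]
        for stage, ms in measured_ms.items()
        if ms > BUDGET_MS[stage][col]
    }


print(totals())  # {'tight': 780, 'slack': 2900, 'stale': 18500}
```

Feeding it your own per-stage measurements, for example `over_budget({"agent_decision": 150})`, returns `{"agent_decision": 140}`: the decision step is 140ms over the tight budget.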

Where the time actually goes

Three observations from running this in production:

Stage 2 (aggregator ingest) dominates

If you're polling a REST endpoint every 30 seconds, you've already lost. Your average ingest latency is half your polling interval (15 seconds), plus however old the cached response already was when you fetched it. A 30-second poll against a 60-second-TTL cache = up to 90 seconds of staleness in the worst case, and roughly 45 seconds on average, before the data even gets to you.
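The polling arithmetic can be written down in a few lines. This is a back-of-the-envelope model, assuming the book's update lands at a uniformly random point in both your poll cycle and the cache's refresh cycle; `polling_staleness` is a hypothetical helper name.

```python
def polling_staleness(poll_interval_s, cache_ttl_s):
    """Return (best, average, worst) staleness in seconds for REST polling.

    Assumes the update lands at a uniformly random point in both the
    poll cycle and the cache refresh cycle, and ignores network time.
    """
    best = 0.0  # update lands just before a cache refresh and a poll
    average = poll_interval_s / 2 + cache_ttl_s / 2
    worst = float(poll_interval_s + cache_ttl_s)
    return best, average, worst


# 30-second poll against a 60-second-TTL cache
print(polling_staleness(30, 60))  # (0.0, 45.0, 90.0)
```

Under this model, dropping the poll interval alone only removes the first term; the cache TTL still sets a floor on the average, which is why push delivery changes the picture more than faster polling does.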

WebSocket flips this. The aggregator's hot loop hits the book on its own cadence (often 200-500ms) and pushes the diff to your socket as soon as it lands. Your effective ingest latency is the book's update cadence plus the aggregator's processing overhead, which is typically 300-800ms total.

Action item. If you're polling and seeing edge windows that "close before I can place the bet", you don't have an agent-latency problem; you have an ingest-latency problem. Switch to WebSocket. See /docs/websocket and examples/ws_reference_client.py.

Stage 7 (book risk-engine) is the boss fight

You can hyper-optimize stages 2-6, get your bet to the book's server in 300ms, and still wait 2+ seconds at the book for risk-engine evaluation. Books deliberately slow-evaluate accounts they've flagged as sharp. There's a latency floor on bet acceptance that no amount of client-side speed can get under.

The defense: don't pattern-bet (always taking the off-market side, always near max stake, always within seconds of a line move). Sprinkle in recreational-shaped bets, vary stake size, and take some "value" plays that you'd lose to a model. Over the long run, this makes you less likely to be flagged as sharp, which in turn keeps the risk-engine latency down.

Stage 5 (agent decision) is the easy win

Most agents waste 1-2 seconds in decision time because they synchronously query a database, run a heavy stats package, or recompute fair value from scratch. Pre-compute everything: store the sharp anchor's no-vig fair prob per (event, market, side) in memory, refresh it on every push, and the agent's "is this +EV" check becomes a single dictionary lookup plus an arithmetic comparison. Sub-10ms.

# bad: sync DB + recompute on every check
def is_ev(book_price, event_id, market, side):
    sharp_price = db.query("SELECT price FROM sharp WHERE ...")  # 50-200ms
    fair = compute_devig(sharp_price, ...)  # 10-50ms
    return implied(book_price) < fair

# good: in-memory lookup
# FAIR_PROBS maps (event_id, market, side) -> no-vig fair prob, refreshed on every push
FAIR_PROBS = {}

def is_ev(book_price, event_id, market, side):
    fair = FAIR_PROBS.get((event_id, market, side))  # <1ms
    if fair is None:
        return False  # no sharp anchor for this line yet: skip the bet
    return implied(book_price) < fair  # <1ms
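The `implied` and de-vig helpers used above are not shown in the post. One plausible shape for them is sketched below, assuming prices are quoted in American odds and the sharp anchor is de-vigged with the simple multiplicative (normalization) method on a two-outcome market. Both choices are assumptions for illustration, as are the `devig_two_way` and `on_sharp_update` names; other de-vig methods exist and give slightly different fair probabilities.

```python
def implied(american_odds):
    """Implied probability of a price quoted in American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def devig_two_way(odds_a, odds_b):
    """No-vig fair probabilities for a two-outcome market.

    Multiplicative method: divide each implied probability by their sum,
    so the two fair probabilities add up to exactly 1.
    """
    p_a, p_b = implied(odds_a), implied(odds_b)
    total = p_a + p_b  # exceeds 1.0 by the book's margin
    return p_a / total, p_b / total


# In-memory table keyed by (event_id, market, side), as described in the prose.
FAIR_PROBS = {}


def on_sharp_update(event_id, market, side_a, odds_a, side_b, odds_b):
    """Refresh the table when the sharp anchor moves (called on every push)."""
    fair_a, fair_b = devig_two_way(odds_a, odds_b)
    FAIR_PROBS[(event_id, market, side_a)] = fair_a
    FAIR_PROBS[(event_id, market, side_b)] = fair_b
```

For example, a sharp line of -110 on both sides de-vigs to a fair probability of 0.5 each. A soft book offering +105 on one side implies about 0.488, which is below 0.5, so the in-memory check flags it as +EV without touching a database.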

The 1-3 second window

Real-world arb and +EV windows on liquid US books typically last 1 to 30 seconds. The distribution is heavily front-loaded: 80% of opportunities close within the first 5 seconds, 50% within 2 seconds. Your round-trip latency has to be less than the window for the bet to land at the displayed price.

Practical implication: getting from 5-second total latency to 2-second total latency more than doubles your hit rate. At 5 seconds you only reach the roughly 20% of windows that stay open that long (the long tail); at 2 seconds you reach roughly 50% (half the meat). Getting from 2-second to 1-second is another 50% improvement on top of that.
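As a rough illustration of that arithmetic, the quoted figures (50% of windows closed by 2 seconds, 80% by 5 seconds, all closed by 30 seconds) can be turned into a hit-rate estimate. The linear interpolation between those points is an assumption made only for this sketch, and the function deliberately refuses latencies outside the 2 to 30 second range the figures cover; `hit_rate` is a hypothetical name.

```python
# (seconds, fraction of windows already closed), from the figures quoted above.
CLOSED_BY = [(2.0, 0.50), (5.0, 0.80), (30.0, 1.00)]


def hit_rate(latency_s):
    """Estimated fraction of windows still open after latency_s seconds.

    Linearly interpolates between the quoted points (an assumption).
    """
    if not CLOSED_BY[0][0] <= latency_s <= CLOSED_BY[-1][0]:
        raise ValueError("outside the 2-30 s range the quoted figures cover")
    for (t0, c0), (t1, c1) in zip(CLOSED_BY, CLOSED_BY[1:]):
        if latency_s <= t1:
            closed = c0 + (c1 - c0) * (latency_s - t0) / (t1 - t0)
            return 1.0 - closed
```

Under these assumptions, a 5-second stack reaches about 20% of windows and a 2-second stack about 50%, a ratio of about 2.5; a 3.5-second stack lands in between at about 35%.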

The point of diminishing returns. Below ~700ms, additional latency improvements stop helping. You're hitting the book's risk-engine floor (stage 7) regardless. Don't co-locate in the book's data center; the book will detect that, flag you, and either limit you or push you off entirely. Sub-second is the sweet spot; sub-300ms is suspicious and counterproductive.

Reference architectures

"Polling + REST" (slack tier)

Poll /v1/sports/{sport}/odds every 5 seconds.
Diff against last snapshot, flag edges.
Place bets via book's web flow (browser automation or HTTP).
Expected end-to-end: 3 to 8 seconds. Catches ~30% of arbs.
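The "diff against last snapshot" step might look like the sketch below. It assumes each poll response has already been flattened into a dictionary keyed by (book, event_id, market, side) with the price as the value; that key layout and the `diff_snapshots` name are assumptions for illustration, not the actual response shape of the odds endpoint.

```python
def diff_snapshots(previous, current):
    """Return {key: (old_price, new_price)} for lines that changed or appeared.

    old_price is None when the line was not in the previous snapshot.
    """
    changes = {}
    for key, new_price in current.items():
        old_price = previous.get(key)
        if old_price != new_price:
            changes[key] = (old_price, new_price)
    return changes


previous = {("book_a", "evt1", "moneyline", "home"): -110}
current = {
    ("book_a", "evt1", "moneyline", "home"): -105,  # moved
    ("book_a", "evt1", "moneyline", "away"): -115,  # newly listed
}
print(diff_snapshots(previous, current))
```

Only the changed keys need to go through the edge check, which keeps the per-poll work proportional to how much moved rather than to the size of the full snapshot.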

"WebSocket + REST hybrid" (tight tier)

Subscribe to wss://parlay-api.com/v1/ws for diff pushes.
Pre-compute fair-prob table in memory; refresh on every push.
Persistent HTTPS connection to each funded book; bet submitted on websocket-trigger.
Expected end-to-end: 800ms to 1.5s. Catches ~70% of arbs.

"WebSocket + co-located agent" (production tier)

Agent runs in the same data center region as ParlayAPI's edge (typically us-east-1 or us-east-2).
Pre-warmed bet submission threads per book.
Decision logic in a compiled language (C, Rust, or Go), with allocation kept off the hot path so that garbage-collected runtimes such as Go's don't pause mid-decision.
Expected end-to-end: 500-800ms. Catches ~85% of arbs (the remaining 15% are inside the book's evaluation latency floor).

How we measure

Every response we return has an X-Source-Latency-Ms header indicating how stale the aggregator's snapshot was at the time of return. WebSocket pushes carry the same field in their payload. You can plot end-to-end by subtracting the upstream observed_at from your received_at. We track 50th, 95th, and 99th percentile per book on /uptime.
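If you collect those timestamp pairs yourself, the same percentiles can be computed locally. The sketch below assumes `observed_at` and `received_at` are both epoch milliseconds and that your clock is reasonably synchronized with the upstream one (for example via NTP); the units, the nearest-rank percentile method, and the helper names are assumptions for illustration.

```python
def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct is an int in 1..100)."""
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding.
    rank = (pct * len(sorted_values) + 99) // 100
    return sorted_values[max(rank, 1) - 1]


def latency_summary(samples):
    """Summarize (observed_at_ms, received_at_ms) pairs as p50/p95/p99."""
    latencies = sorted(received - observed for observed, received in samples)
    if not latencies:
        return {}
    return {f"p{p}": percentile(latencies, p) for p in (50, 95, 99)}
```

Tracking the tail (p95, p99) rather than just the median matters here, because a stack that is fast on average but occasionally stalls will still miss the short windows when it counts.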

Want to verify your own stack? Run our WebSocket reference client with the built-in latency logger; it timestamps every push at receive and prints the per-book end-to-end every minute.

One more thing

Latency is a means, not an end. The end is edge captured × throughput × longevity. Sub-second latency captures more edge, but if you're betting in patterns the book detects, your throughput collapses and your longevity goes to zero. Latency is necessary but not sufficient. The sharpest operators have sub-second stacks AND vary their bet patterns; the books can't tell them apart from sophisticated recreational bettors until much later.

If you want the latency floor we describe here, ParlayAPI ships sub-second WebSocket on the scale tier ($499/mo) and sub-3-second REST on every tier including free. Try the WebSocket reference client at examples/ws_reference_client.py or sign up at /signup.
