{"version":"0.1","as_of":1779312874,"count":2,"limit":50,"filters":{"status":null,"since":null},"status_summary":"monitoring","incidents":[{"id":"2026-05-15-pinnacle-relay","title":"Pinnacle data routed via GCP relay; freshness elevated","status":"monitoring","severity":"minor","started_at":"2026-05-15T17:42:00Z","resolved_at":null,"duration_seconds":null,"impact":"Pinnacle market data (h2h, spreads, totals, period markets) is being delivered via a routed path through our GCP secondary location instead of the primary Hetzner egress. Customer-visible freshness for Pinnacle is elevated from the usual sub-3-second range to 30-60 seconds. All other 79 sources unaffected. No data loss; only freshness is affected for Pinnacle. /v1/odds, /v1/props, /v1/period-odds, /v1/events continue to return Pinnacle prices.","root_cause":"An internal rate-limit probe from the production Hetzner IP stacked on top of the existing collector's request rate to Pinnacle's public guest API, briefly exceeding Pinnacle's Cloudflare WAF tolerance. Pinnacle's WAF responded by returning HTTP 403 to the Hetzner IP. The probe is the proximate cause; the root cause is that the probe was designed in isolation without modeling existing production load.","resolution":"Detected within 1 minute via collector logs. Engaged circuit breaker to halt the 403 storm. Activated standby GCP relay infrastructure (SSH tunnel + pproxy) so Pinnacle traffic now routes from our Hetzner box through our GCP secondary location at us-east1. Pinnacle sees the GCP egress IP, which is healthy. An auto-revert background probe checks Pinnacle direct-IP availability every 10 minutes; the relay disengages automatically after 3 consecutive successful direct probes. Expected restoration of direct routing within 24 hours from incident start.","affected_endpoints":["/v1/sports/{sport}/odds (bookmaker=pinnacle)","/v1/sports/{sport}/props (bookmaker=pinnacle)","/v1/period-odds (bookmaker=pinnacle)","/v1/events (bookmaker=pinnacle)"],"post_mortem_url":"/changelog#2026-05-15"},{"id":"2026-05-15-cutover","title":"Brief tunnel cutover during M4 to Hetzner migration","status":"resolved","severity":"minor","started_at":"2026-05-15T05:52:00Z","resolved_at":"2026-05-15T05:52:03Z","duration_seconds":3,"impact":"Customer-facing /healthz served by GCP nginx backup upstream during a ~3 second tunnel cutover. /v1/* paths returned 502 briefly. All paths returned to 200 within 3 seconds.","root_cause":"Planned migration of API origin from M4 mini (home network, swap-thrashing) to Hetzner EX44 (Falkenstein DC, NVMe RAID1, 62 GB RAM). The systemd autossh tunnel on Hetzner started during a 3-second window when the M4 launchd tunnel was already unloaded. GCP nginx backup upstream served /healthz during the gap.","resolution":"Tunnel came online cleanly. No customer support tickets received. Architecture remains stable post-cutover (10+ hours of zero incidents at the time of this entry).","affected_endpoints":["/v1/*"],"post_mortem_url":"/changelog#2026-05-15"}],"related":{"sla_url":"/v1/meta/sla","status_html_url":"/status","live_status_json_url":"/v1/status","provider_state_url":"/v1/meta/provider-state","changelog_url":"/v1/meta/changelog"},"note":"Source-of-truth is src/api/static/incidents.json, human-edited and git-versioned. Schema may evolve under the version field; current fields are stable."}