LiveKit Ops · EU

Self-hosted LiveKit, run like a product.

Production audits, EU-jurisdiction deployments, and ops retainers — for video/voice infra that can’t run on US cloud, or shouldn’t at your volume.

EU media & telemetry stay in EU · Hetzner/OVH/Scaleway · GDPR/NIS2-literate

[Diagram: browser, mobile, and SIP clients connect to livekit-server (SFU), egress, turn-detector, and TURN inside EU jurisdiction (Hetzner FR / DE), where media and telemetry stay in the EU. On US cloud, telemetry goes to the US.]

Is this you?

  • You’re in a regulated vertical — telehealth, online notarization, legal — and "media transits US cloud" is a sentence your compliance officer cannot sign.
  • You’re past the volume where managed pricing makes sense, and the invoice line for video is growing faster than the revenue line it supports.
  • GDPR isn’t a checkbox for you — EU data residency is a feature you sell to your own customers, and you need it to be provably true.

If none of these apply, you probably want LiveKit Cloud, and that’s fine. See below.

What breaks in production

The demo works. Then real traffic arrives. These are the failures that page self-hosted LiveKit teams:

livekit-server RSS climbing +38 MB/h after a SIP re-INVITE storm — goroutine leak in the signaling path, OOM at 03:40.
turn-detector spiking to 2.1 GB on long silent rooms — agent workers killed mid-call, users blame your app.
SIP↔WebRTC p95 latency 180 ms → 460 ms once egress lands on the same node as the SFU.
room-composite egress rolling on empty rooms nobody closed — storage bill ×3 in one month.
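
A leak like the first one is usually visible hours before the OOM if you watch the slope of RSS instead of its absolute value. The sketch below is illustrative only: it fits a least-squares line to RSS samples and projects when the process would reach a memory limit. The `Sample` type, the function names, and any limit you pass in are assumptions made for this example, not part of LiveKit or of any particular monitoring stack.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    hours: float   # time since the first sample, in hours
    rss_mb: float  # resident set size of the process, in MB


def leak_slope_mb_per_hour(samples: List[Sample]) -> float:
    """Least-squares slope of RSS over time, in MB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(s.hours for s in samples) / n
    mean_r = sum(s.rss_mb for s in samples) / n
    var_t = sum((s.hours - mean_t) ** 2 for s in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((s.hours - mean_t) * (s.rss_mb - mean_r) for s in samples)
    return cov / var_t


def hours_until_limit(samples: List[Sample], limit_mb: float) -> Optional[float]:
    """Projected hours from the last sample until RSS reaches limit_mb.

    Returns None when RSS is flat or shrinking, i.e. no leak is visible.
    """
    slope = leak_slope_mb_per_hour(samples)
    if slope <= 0:
        return None
    headroom = limit_mb - samples[-1].rss_mb
    return max(headroom, 0.0) / slope
```

Alerting on "projected hours until limit" rather than on a fixed RSS threshold gives the on-call engineer time to restart or drain a node during working hours. A real deployment would feed this from whatever metrics pipeline is already in place.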

Three ways to engage

Production Audit

from $3k · 1 week

  • Full pass over your deployment: config, scaling, SIP/egress, observability
  • Load test against your real traffic shape
  • Written report: what breaks first, at what number, and the fix

What you keep: the report, the load-test harness, a prioritized fix list.

EU Deployment

from $5k

  • livekit-server + TURN + egress on Hetzner/OVH/Scaleway, IaC from day one
  • Monitoring, alerting, backup and restore drills included
  • Media and telemetry pinned to EU jurisdiction, documented for your DPA

What you keep: the Terraform/Ansible repo, runbook, dashboards, every credential.

Ops Retainer

from $1.5k/mo · SLA

  • Response SLA, upgrades, capacity planning
  • Incident postmortems, in writing
  • Monthly report: usage, cost per participant-minute, next risks

What you keep: everything. Runbook and access stay yours — no lock-in by design.

Not sure you need self-hosting?

I also do LiveKit Cloud migration assessments — sometimes the answer is Cloud, and I’ll tell you so in writing.

Ask about an assessment

Work, not adjectives

$ tail -n 4 incidents.log · creator platform, per-minute video billing — LiveKit in production
2025-12 symptom: per-minute billing ticking before anyone was in the room
cause: billing auto-started on call creation, not on participant join
fix: state machine keyed to room presence — billing starts only when both sides are in
2026-01 symptom: client kept paying after the other side dropped mid-call
cause: no pause semantics on disconnect
fix: pause/resume tied to room presence; paused time excluded from the charge
2026-05 symptom: scheduled calls dying with room_unavailable the moment someone opened the call screen early
cause: issuing a join token flipped call state; the room monitor saw an empty room and finalized the call
fix: state transitions driven only by LiveKit participant_joined webhooks — not by token issuance
2026-05 symptom: a cancelled call could still be joined — /join-token happily issued a valid LiveKit access token
cause: no finalized-state check before token issuance
fix: 410 for cancelled/completed/no_show; tokens only for live states
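
The fixes in this log share one idea: billing and call state follow who is actually in the room, never call creation or token issuance. The following is a minimal sketch of that idea, not the platform's actual code. `BillingMeter`, `join_token_status`, the identities, and the timestamps are hypothetical names chosen for illustration. The handler names mirror LiveKit's `participant_joined` and `participant_left` webhook events, but webhook parsing and signature verification are left out.

```python
from dataclasses import dataclass, field
from typing import Optional

# Finalized call states taken from the log above; any other state is
# treated as live in this sketch.
FINALIZED = {"cancelled", "completed", "no_show"}


def join_token_status(call_state: str) -> int:
    """HTTP status a hypothetical /join-token handler would return."""
    return 410 if call_state in FINALIZED else 200


@dataclass
class BillingMeter:
    """Accumulates billable seconds only while every required party is present."""
    required: frozenset
    present: set = field(default_factory=set)
    billable_seconds: float = 0.0
    running_since: Optional[float] = None  # None means the meter is paused

    def on_participant_joined(self, identity: str, ts: float) -> None:
        self.present.add(identity)
        self._sync(ts)

    def on_participant_left(self, identity: str, ts: float) -> None:
        self.present.discard(identity)
        self._sync(ts)

    def _sync(self, ts: float) -> None:
        both_in = self.required <= self.present
        if both_in and self.running_since is None:
            # Everyone is in the room: start or resume the meter.
            self.running_since = ts
        elif not both_in and self.running_since is not None:
            # Someone dropped: bank the elapsed time and pause.
            self.billable_seconds += ts - self.running_since
            self.running_since = None
```

The meter deliberately has no method for "token issued" or "call created": the only inputs that can move it are presence events, which is what keeps an early-opened call screen or a one-sided room from being charged.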

I’m a backend and infrastructure engineer working in Node and Go, with 3.5 years in production. For the last two I have run LiveKit/WebRTC for a creator platform with per-minute billing: SFU scaling, SIP bridging, voice agents, egress pipelines. GitHub: github.com/lfazliev.

FAQ

Why not LiveKit Cloud?

For most teams, Cloud is the right answer — and I’ll say so (see the assessment above). But Cloud doesn’t let you pin all telemetry to an EU region, and past a certain volume the per-minute economics flip against you. If either applies, self-hosting is a business decision, not an ideology.
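
The point where the economics flip can be estimated with simple arithmetic. The sketch below is a rough model under stated assumptions: self-hosting has a fixed monthly cost (servers plus ops) and a small marginal cost per participant-minute, while managed pricing is purely per minute. The function name and every number you would pass in are hypothetical. They are not LiveKit Cloud's actual prices, which should be taken from the current price list.

```python
from typing import Optional


def breakeven_minutes(
    fixed_monthly_cost: float,
    managed_price_per_minute: float,
    selfhosted_cost_per_minute: float = 0.0,
) -> Optional[float]:
    """Monthly participant-minutes above which self-hosting is cheaper.

    Returns None when the managed per-minute price does not exceed the
    self-hosted marginal cost, in which case self-hosting never pays off
    on price alone.
    """
    margin = managed_price_per_minute - selfhosted_cost_per_minute
    if margin <= 0:
        return None
    return fixed_monthly_cost / margin
```

Below the returned volume the managed service is cheaper, and above it the fixed cost of self-hosting is spread thin enough to win. The model ignores migration cost and engineering time, so treat the result as a first filter and not as a decision.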

Who am I dealing with?

One person. A solo specialist operating as an Armenian sole proprietorship, with EU hosting and EU sub-processors only. Contracts under EU law available on request. No agency, no subcontractors you haven’t met.

What if you get hit by a bus?

Every engagement ships with a runbook, and every credential, repo, and dashboard is yours from day one. Any competent infra engineer can pick up where I left off. That’s not a promise — it’s in the deliverables list above.

What does the retainer SLA cover?

Defined response times for incidents, scheduled upgrades, and capacity reviews. Exact response tiers and escalation paths are agreed per contract — ask on the call and you’ll get them in writing before signing.

How fast can we start?

Audits start within two weeks of the call. Deployments are scheduled after an audit or scoping session — typically 4–6 weeks end to end. Retainers begin the month after onboarding.