Scaling and HA

Velixir scales along two dimensions: vertical (the plan, which sets per-replica CPU, memory, and disk) and horizontal (the replica count, optionally autoscaled). There is also an optional high-availability mode that keeps warm-standby replicas ready for instant failover.

Plans

Plans set per-replica resources. Every app picks one at creation. You can change the plan at any time without triggering a redeploy; the next deploy rolls the pods over onto the new plan.

Plans range from a small dev tier (shared CPU, 512 MB RAM) up to dedicated multi-vCPU plans for production traffic. The full plan grid is on the Scaling tab of each app and on the public pricing page.

Autoscaling

On the Scaling tab, set:

Min / max replicas - the bounds the autoscaler operates within.
CPU threshold - scale up when sustained CPU exceeds this percentage. Required.
Memory threshold - optional. When both thresholds are set, Kubernetes scales up if either one is exceeded.
Sustain for / cool-down - how long load must exceed the threshold before scaling up, and how long to wait before scaling down. The defaults (180s sustain, 300s cool-down) suit most apps.
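To make the sustain and cool-down settings concrete, here is a minimal Python sketch of how such gates could work. It is not Velixir's or Kubernetes' implementation: the Sample type, both function names, and the reading of cool-down as "time since load last exceeded the threshold" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    ts: float       # seconds since an arbitrary start point
    cpu_pct: float  # CPU utilisation, 0-100


def sustained_above(samples, threshold_pct, sustain_s, now):
    """True if every sample in the last `sustain_s` seconds exceeds the
    threshold and the history reaches back far enough to cover that window."""
    window = [s for s in samples if now - sustain_s <= s.ts <= now]
    if not window:
        return False
    covers_window = min(s.ts for s in samples) <= now - sustain_s
    return covers_window and all(s.cpu_pct > threshold_pct for s in window)


def may_scale_down(last_above_ts, cool_down_s, now):
    """Assumed rule: scale-down is allowed once `cool_down_s` seconds have
    passed since load last exceeded the threshold."""
    return now - last_above_ts >= cool_down_s
```

With the defaults, a brief spike that dips back under the threshold inside the 180s window would not trigger a scale-up in this sketch, and a scale-down would wait a full 300s after the last breach.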

We use the standard Kubernetes Horizontal Pod Autoscaler (HPA) under the hood. Metrics come from metrics-server inside the cluster - the same data that drives the live charts on the Overview tab.
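As a rough illustration of what the HPA does with these settings: its documented core rule is desired = ceil(current replicas * observed metric / target metric), evaluated per metric, with the largest result winning. The sketch below applies that rule and clamps the result to the min / max bounds. The function name and argument layout are hypothetical, and the HPA's tolerance band and stabilization handling are left out.

```python
import math


def desired_replicas(current_replicas, metrics, min_replicas, max_replicas):
    """Simplified HPA replica calculation.

    metrics: list of (observed_pct, target_pct) pairs, for example CPU and,
    optionally, memory. The largest per-metric proposal wins, which is why
    exceeding either threshold is enough to scale up.
    """
    proposals = [
        math.ceil(current_replicas * observed / target)
        for observed, target in metrics
    ]
    # Clamp to the configured min / max replica bounds.
    return max(min_replicas, min(max_replicas, max(proposals)))
```

For example, 3 replicas at 90% CPU against a 60% threshold gives ceil(3 * 90 / 60) = 5 replicas, provided 5 falls within the configured bounds.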

Slots

Every app has a production slot plus optional staging slots. Slots share the app's plan but each has its own replicas and its own deploy history. Use them to:

Stage releases - deploy to staging, smoke-test at staging-<your-app>-<region>.velixir.run, then promote.
Run preview environments - spin up a dedicated slot for each PR (and tear it down when the PR merges).

Promoting a slot is near-instant: we swap which slot the edge route points at. The previously live pods stay warm, so flipping back is also fast.
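Conceptually, promotion is a pointer swap rather than a redeploy. The Python sketch below models that idea only; EdgeRoute, promote, roll_back, and staging_hostname are hypothetical names, not part of any Velixir API. The hostname pattern is the staging one described above.

```python
from dataclasses import dataclass
from typing import Optional


def staging_hostname(app: str, region: str) -> str:
    """Build the smoke-test hostname for an app's staging slot."""
    return f"staging-{app}-{region}.velixir.run"


@dataclass
class EdgeRoute:
    live: str                       # slot currently receiving production traffic
    previous: Optional[str] = None  # slot that was live before; its pods stay warm

    def promote(self, slot: str) -> None:
        """Point the route at `slot`, remembering the slot it replaces."""
        if slot != self.live:
            self.previous, self.live = self.live, slot

    def roll_back(self) -> None:
        """Flip back to the previously live slot, if there is one."""
        if self.previous is not None:
            self.live, self.previous = self.previous, self.live
```

Because nothing is rebuilt or restarted in this model, both promote and roll_back amount to a single reassignment, which matches the near-instant behaviour described above.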

High availability

On the Scaling tab, toggle Enable HA. We:

Provision warm-standby replicas alongside your primary, spread across worker nodes.
Wire the edge so traffic fails over automatically if the primary set goes unhealthy.

The warm-standby plan is configurable separately, so you can run a cheaper plan on the standby if its only job is to absorb traffic during an outage. Standby replicas are billed at the normal hourly rate while they're running; pausing the slot stops billing, but failover then takes longer.

Failover is automatic. You don't need to flip a switch.
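The routing decision behind automatic failover can be pictured with the short Python sketch below. It assumes the primary set counts as unhealthy only when none of its replicas pass health checks; that rule and the route_target name are illustrative assumptions, not the edge's actual logic.

```python
def route_target(primary_health, standby_health):
    """Choose which replica set should receive traffic.

    Each argument is a list of booleans, one per replica (True = healthy).
    Returns "primary", "standby", or None if nothing is healthy.
    """
    if any(primary_health):
        return "primary"
    if any(standby_health):
        return "standby"
    return None
```

In this model traffic returns to the primary as soon as any primary replica is healthy again, with no manual step in either direction.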