Summary: Our platform is engineered for high availability across power, cooling, network, compute, and storage. We target a best-effort 99.9% annual uptime for core infrastructure. This target excludes announced maintenance and events outside our reasonable control (e.g., natural disasters, third-party incidents such as fiber cuts or grid/substation failures).
What “99.9% SLO” Means
- Target (best effort): 99.9% availability, calculated annually.
- Scope: Infrastructure reachability and platform services at our network edge and hypervisor layer.
- SLO, not SLA: This is a service level objective published for transparency, not a service level agreement with a credit-back guarantee.
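To make the target concrete, the arithmetic can be sketched as below. This is an illustrative calculation only, assuming a 365-day year; the function name `downtime_budget_hours` is hypothetical and not part of any platform tooling. At 99.9%, the remaining 0.1% of 8,760 hours is about 8.76 hours of unplanned downtime per year.

```python
def downtime_budget_hours(target: float, period_hours: float = 365 * 24) -> float:
    """Return the hours of downtime permitted in the period at the given target.

    Assumption: a 365-day year (8,760 hours); leap years are ignored.
    """
    return (1.0 - target) * period_hours


budget = downtime_budget_hours(0.999)
print(f"{budget:.2f} hours (about {budget * 60:.0f} minutes) per year")
```

Because the target is calculated annually, a single longer incident can still fit within the yearly budget even if it would exceed a monthly one.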
Infrastructure Redundancy
Power (A/B paths)
- UPS + generator for seamless bridging and extended runtime.
- Dual PSUs per server, each fed from an independent PDU on a separate power phase.
- Regular testing, monitoring, and documented switchover runbooks.
Cooling (N+1)
- Two independent AC units (N+1); either unit can handle the full thermal load while the other is serviced.
- Continuous temperature and humidity monitoring with alerting.
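The N+1 property described above (the load stays covered with any one unit out of service) can be expressed as a simple check. This is a minimal sketch under stated assumptions: the capacities and load figures are hypothetical, and `survives_single_failure` is an illustrative name, not an actual monitoring function.

```python
def survives_single_failure(unit_capacities_kw: list[float], load_kw: float) -> bool:
    """Return True if the thermal load is still covered with any one unit offline."""
    if len(unit_capacities_kw) < 2:
        # A single unit has no spare to fall back on.
        return False
    total = sum(unit_capacities_kw)
    # Remove each unit in turn and confirm the rest still cover the load.
    return all(total - capacity >= load_kw for capacity in unit_capacities_kw)


# Hypothetical figures: two 10 kW units serving an 8 kW thermal load.
print(survives_single_failure([10.0, 10.0], 8.0))
```

With two units, the check passes only when each unit alone can carry the full load, which matches the design stated in this section.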
Network
- Dual FTTO uplinks on diverse paths/carriers with dynamic routing for rapid failover.
- Redundant switching, plus upstream DDoS mitigation provided on a best-effort basis.
Storage & Ceph Availability
- Ceph-backed storage is available on Cloud VPS and Nextcloud.
- Ceph is not available on Dedicated Servers or AMD VPS.
Operations
- Continuous monitoring of power, cooling, network, compute, and storage.
- Proactive maintenance with rollback plans; many tasks are non-disruptive due to redundancy.
- Incident response guided by documented runbooks and escalation paths.
Scheduled Maintenance (Excluded from 99.9%)
- We announce maintenance windows in advance and aim for off-peak timing.
- If impact is expected, it will be stated in the notice; many actions are performed live with no interruption.
Exclusions (Announced Maintenance and Events Outside Our Reasonable Control)
- Announced maintenance windows.
- Natural disasters/force majeure (e.g., earthquakes, floods, severe storms, wildfires).
- Third-party incidents (e.g., accidental fiber cuts, upstream carrier failures, grid/substation failures).
- Customer-side causes (e.g., guest OS/app misconfiguration, exhausted resources, firewall rules, or changes by the customer/their vendors).
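The way these exclusions affect the annual figure can be sketched as follows. This is a hedged illustration, not the actual reporting method: it assumes excluded incidents are simply left out of the counted downtime while the measurement period stays at a full 365-day year. The `Outage` type, the function name, and all incident durations are hypothetical.

```python
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year


@dataclass
class Outage:
    minutes: float
    excluded: bool  # True for announced maintenance, force majeure, third-party, or customer-side causes


def annual_availability(outages: list[Outage], period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return availability as a fraction, counting only non-excluded downtime."""
    counted = sum(o.minutes for o in outages if not o.excluded)
    return 1.0 - counted / period_minutes


# Hypothetical incident log for one year.
outages = [
    Outage(minutes=120, excluded=True),   # announced maintenance window
    Outage(minutes=45, excluded=False),   # internal hardware failure
    Outage(minutes=300, excluded=True),   # upstream fiber cut
]

availability = annual_availability(outages)
print(f"Availability: {availability:.4%}, target met: {availability >= 0.999}")
```

In this example only the 45-minute internal failure counts against the target; the other two incidents fall under the exclusions listed above.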
Conclusion
Our layered design (A/B power, N+1 cooling, diverse fiber uplinks, and fault-tolerant storage where applicable) is built to achieve a best-effort 99.9% annual uptime. While no system can guarantee 100% availability, this architecture minimizes the impact of component failures and enables maintenance with minimal disruption. The 99.9% target is calculated annually and excludes announced maintenance and events beyond our reasonable control.