Summary: Our platform is engineered for high availability across power, cooling, network, compute, and storage. We target a best-effort 99.9% annual uptime for core infrastructure. This target excludes announced maintenance and events outside our reasonable control (e.g., natural disasters, third-party incidents such as fiber cuts or grid/substation failures).
What “99.9% SLO” Means
- Target (best effort): 99.9% availability, calculated annually.
 - Scope: Infrastructure reachability and platform services at our network edge and hypervisor layer.
 - SLO, not SLA: This is a transparency goal, not a credit-back guarantee.
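
As a rough illustration of what a 99.9% annual target implies, the sketch below converts an availability percentage into a downtime budget. The function name `downtime_budget` and the 365-day year are assumptions made for this example; they do not describe our internal tooling.

```python
from datetime import timedelta


def downtime_budget(target_pct: float,
                    period: timedelta = timedelta(days=365)) -> timedelta:
    """Return the maximum downtime that still meets target_pct over the period.

    Assumes a 365-day year by default (an illustrative choice).
    """
    if not 0 <= target_pct <= 100:
        raise ValueError("target_pct must be between 0 and 100")
    # The unavailable fraction of the period is the downtime budget.
    return period * ((100 - target_pct) / 100)


budget = downtime_budget(99.9)
print(f"{budget.total_seconds() / 3600:.2f} hours per year")  # 8.76 hours per year
```

Under these assumptions, 99.9% leaves roughly 8.76 hours of unplanned downtime per year, before any exclusions are applied.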
 
Infrastructure Redundancy
Power (A/B paths)
- UPS + generator for seamless bridging and extended runtime.
 - Dual PSUs per server, each to an independent PDU on separate power phases.
 - Regular testing, monitoring, and documented switchover runbooks.
 
Cooling (N+1)
- Two independent air-conditioning units (N+1); either unit can carry the full thermal load while the other is serviced.
 - Continuous temperature and humidity monitoring with alerting.
 
Network
- Dual fiber-to-the-office (FTTO) uplinks on diverse paths and carriers, with dynamic routing for rapid failover.
 - Redundant switching, plus upstream DDoS mitigation provided on a best-effort basis.
 
Storage & Ceph Availability
- Ceph-backed storage is available on Cloud VPS and Nextcloud.
 - Ceph is not available on Dedicated Servers or AMD VPS.
 
Operations
- Continuous monitoring of power, cooling, network, compute, and storage.
 - Proactive maintenance with rollback plans; many tasks are non-disruptive due to redundancy.
 - Incident response guided by documented runbooks and escalation paths.
 
Scheduled Maintenance (Excluded from 99.9%)
- We announce maintenance windows in advance and aim for off-peak timing.
 - If impact is expected, it will be stated in the notice; many actions are performed live with no interruption.
 
Exclusions from the 99.9% Target
- Announced maintenance windows.
 - Natural disasters/force majeure (e.g., earthquakes, floods, severe storms, wildfires).
 - Third-party incidents (e.g., accidental fiber cuts, upstream carrier failures, grid/substation failures).
 - Customer-side causes (e.g., guest OS/app misconfiguration, exhausted resources, firewall rules, or changes by the customer/their vendors).
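
One way to picture how these exclusions interact with the annual calculation is the sketch below. It is only an illustration: the category labels, the `Outage` record, and the choice to keep the full year as the denominator while not counting excluded downtime are all assumptions, not a description of how we actually compute the figure.

```python
from dataclasses import dataclass

# Hypothetical category labels mirroring the exclusion list above.
EXCLUDED = {"announced_maintenance", "force_majeure", "third_party", "customer_side"}

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


@dataclass
class Outage:
    minutes: float
    cause: str  # one of EXCLUDED, or another label such as "infrastructure"


def annual_availability(outages, period_minutes=MINUTES_PER_YEAR):
    """Percentage availability, counting only outages not covered by an exclusion."""
    counted = sum(o.minutes for o in outages if o.cause not in EXCLUDED)
    return 100 * (1 - counted / period_minutes)


# Made-up sample events, for illustration only.
outages = [
    Outage(120, "announced_maintenance"),  # excluded
    Outage(45, "third_party"),             # excluded
    Outage(30, "infrastructure"),          # counted
]
availability = annual_availability(outages)
print(f"{availability:.4f}%")
```

In this made-up sample only the 30-minute infrastructure outage counts against the target, so the year would still sit above 99.9%.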
 
Conclusion
Our layered design (A/B power, N+1 cooling, diverse fiber uplinks, and fault-tolerant storage where applicable) is built to achieve a best-effort 99.9% annual uptime. While no system can guarantee 100% availability, this architecture minimizes the impact of component failures and enables maintenance with minimal disruption. The 99.9% target is calculated annually and excludes announced maintenance and events beyond our reasonable control.