Summary: Our platform is engineered for high availability across power, cooling, network, compute, and storage. We target a best-effort 99.9% annual uptime for core infrastructure. This target excludes announced maintenance and events outside our reasonable control (e.g., natural disasters, third-party incidents such as fiber cuts or grid/substation failures).

What “99.9% SLO” Means

  • Target (best effort): 99.9% availability, calculated annually.
  • Scope: Infrastructure reachability and platform services at our network edge and hypervisor layer.
  • SLO, not SLA: This is a transparency goal, not a credit-back guarantee.
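
To put the target in concrete terms, the short sketch below converts an availability percentage into an annual downtime budget. The function name and the 365-day year are illustrative assumptions, not part of our tooling.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; assumes a non-leap year


def downtime_budget_hours(target: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime (in hours) that still meets the availability target."""
    return (1.0 - target) * period_hours


budget = downtime_budget_hours(0.999)
print(f"{budget:.2f} hours per year")               # 8.76 hours per year
print(f"{budget * 60 / 12:.1f} minutes per month")  # 43.8 minutes per month (average)
```

In other words, a 99.9% annual target leaves room for roughly 8 hours 46 minutes of non-excluded downtime per year.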

Infrastructure Redundancy

Power (A/B paths)

  • UPS + generator for seamless bridging and extended runtime.
  • Dual PSUs per server, each connected to an independent PDU on a separate power phase.
  • Regular testing, monitoring, and documented switchover runbooks.

Cooling (N+1)

  • Two independent AC units (N+1); either unit can handle the full thermal load while the other is serviced.
  • Continuous temperature and humidity monitoring with alerting.
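
The N+1 property described above can be expressed as a simple check: after losing any single unit, the remaining capacity must still cover the thermal load. This is a minimal sketch. The function name and the kW figures are hypothetical and do not reflect our actual unit ratings.

```python
def survives_single_failure(unit_capacities_kw, load_kw):
    """Return True if the remaining units can carry the load after any one unit is lost."""
    if len(unit_capacities_kw) < 2:
        return False  # a single unit has no redundancy at all
    total = sum(unit_capacities_kw)
    return all(total - capacity >= load_kw for capacity in unit_capacities_kw)


# Hypothetical example values
print(survives_single_failure([30.0, 30.0], 25.0))  # True
print(survives_single_failure([30.0, 20.0], 25.0))  # False: losing the 30 kW unit leaves 20 kW
```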

Network

  • Dual FTTO uplinks on diverse paths/carriers with dynamic routing for rapid failover.
  • Redundant switching, plus upstream DDoS mitigation on a best-effort basis.

Storage & Ceph Availability

  • Ceph-backed storage is available on Cloud VPS and Nextcloud.
  • Ceph is not available on Dedicated Servers or AMD VPS.

Operations

  • Continuous monitoring of power, cooling, network, compute, and storage.
  • Proactive maintenance with rollback plans; many tasks are non-disruptive due to redundancy.
  • Incident response guided by documented runbooks and escalation paths.

Scheduled Maintenance (Excluded from 99.9%)

  • We announce maintenance windows in advance and aim for off-peak timing.
  • If impact is expected, it will be stated in the notice; many actions are performed live with no interruption.

Exclusions (Outside Our Reasonable Control)

  • Announced maintenance windows.
  • Natural disasters/force majeure (e.g., earthquakes, floods, severe storms, wildfires).
  • Third-party incidents (e.g., accidental fiber cuts, upstream carrier failures, grid/substation failures).
  • Customer-side causes (e.g., guest OS/app misconfiguration, exhausted resources, firewall rules, or changes by the customer/their vendors).
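
As an illustration of how exclusions affect the figure, the sketch below computes annual availability from a list of outages, counting only those that are not excluded. The `Outage` type, the sample incidents, and the choice to keep the full year as the denominator are assumptions made for this example, not a description of our internal reporting.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float
    excluded: bool  # True for announced maintenance, force majeure, third-party or customer-side causes


def availability(outages, period_minutes=365 * 24 * 60):
    """Fraction of the period that counts as available; excluded outages are not counted as downtime."""
    counted = sum(o.minutes for o in outages if not o.excluded)
    return 1.0 - counted / period_minutes


# Hypothetical incidents over one year
outages = [
    Outage(minutes=120, excluded=True),   # announced maintenance window
    Outage(minutes=45, excluded=False),   # internal switch failure
    Outage(minutes=300, excluded=True),   # upstream carrier fiber cut
]
print(f"{availability(outages) * 100:.4f}%")  # 99.9914%
```

Here only the 45-minute internal failure counts against the target, so the year still meets 99.9%.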

Conclusion

Our layered design (A/B power, N+1 cooling, diverse fiber uplinks, and fault-tolerant storage where applicable) is built to achieve a best-effort 99.9% annual uptime. While no system can guarantee 100% availability, this architecture minimizes the impact of component failures and enables maintenance with minimal disruption. The 99.9% target is calculated annually and excludes announced maintenance and events beyond our reasonable control.
