WISPGate Technical Details

High-Availability Architecture

Carrier-grade high availability ensuring resilience, failover, and continuous service uptime

Motivation & Why We Need HA

  • Many clients have thousands to hundreds of thousands of subscribers, so load and scale demands are real.
  • Single-server deployments lead to resource saturation (CPU, RAM, I/O), with a risk of outages or performance degradation.
  • Growing demand for carrier-grade reliability and 24/7 uptime, especially for LTE (PCRF/HSS/OCS) and real-time services (IPTV, VoIP).
  • HA ensures redundancy, failover, load balancing, and maintenance with zero downtime, protecting revenue streams and network stability.

Key Layers to Protect (What HA Covers in WISPGate Ecosystem)

Layer / Component | Role / What It Handles
AAA / Policy / Charging (PCRF/HSS/OCS / RADIUS / DHCP / SIP-auth) | Real-time subscriber control, authentication, policy, quota, billing triggers
BSS / OSS Core (WISPGate API, UI, Mediation, Rating, Billing) | Business logic: user management, billing, reporting, integrations, DB
Data-Plane (User Traffic) (EPC, BNG, BRAS, UPF, NAT, routing) | Not managed by WISPGate, but must remain unaffected if control-plane fails
Supporting Services (Cache, Queue, DB, Logs) | State, sessions, queues, data consistency, CDR persistence

Our HA design ensures that the failure of a single node does not impact subscriber control, billing, or service availability.

HA Option Matrix

# | Model | Best For
1 | Vertical Scale (Single Large Server) | Small / low-criticality deployments
2 | Single-Site Active/Passive (N+1) | Medium-size clients (~100–150k subs)
3 | Single-Site Active/Active (Scale-Out) | Large clients (100k–500k+ subs)
4 | Dual-Site Active/Passive (DR) | National operators needing DR/BC plans
5 | Dual-Site Active/Active (Geo) | Carrier-scale multi-regional deployments
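
As a rough illustration of how the matrix might be read, the sketch below maps a deployment profile to an option number. The function name, its parameters, the order of the checks, and the 150k cutoff between Options 2 and 3 are assumptions drawn loosely from the "Best For" column; this is not a WISPGate sizing rule.

```python
def suggest_ha_option(subscribers: int,
                      needs_dr: bool = False,
                      multi_region: bool = False,
                      low_criticality: bool = False) -> int:
    """Return an option number (1-5) from the HA Option Matrix.

    Hypothetical helper: the order of checks and the 150_000 cutoff
    are illustrative assumptions, not vendor guidance.
    """
    if multi_region:
        return 5  # Dual-Site Active/Active (Geo)
    if needs_dr:
        return 4  # Dual-Site Active/Passive (DR)
    if low_criticality:
        return 1  # Vertical Scale (Single Large Server)
    if subscribers <= 150_000:
        return 2  # Single-Site Active/Passive (N+1)
    return 3      # Single-Site Active/Active (Scale-Out)
```

In practice the decision also depends on budget, traffic mix, and operational maturity, which a helper like this does not capture.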

Option 1: Vertical Scale (Single Large Server)

Pros

  • Fastest, cheapest initial step
  • No architectural complexity, minimal operational overhead

Cons

  • Single point of failure → full outage on crash or hardware fault
  • No redundancy, no failover, no rolling upgrade, limited scalability

When to Use

  • Short-term "bridge" until HA deployment
  • Very small subscriber base or low-criticality deployments

Option 2: Single-Site, Active/Passive HA (N+1)

  • 2 WISPGate app nodes behind a load-balancer (active/passive)
  • Primary DB + synchronous standby DB (auto-failover)
  • Clustered cache/queue layer (Redis, RabbitMQ)
  • 2 AAA / RADIUS / DHCP / PCRF / HSS / OCS nodes with failover configured
  • All nodes reference the same shared DB and data store
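
The active/passive behavior described above can be sketched as a priority-ordered health check: traffic goes to the first healthy node, and the standby only takes over when the primary is marked down. The `Node` class, the `pick_active` function, and the node names are hypothetical and stand in for whatever the load balancer or cluster manager actually does.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    healthy: bool = True

def pick_active(nodes: list[Node]) -> Node:
    """Return the first healthy node in priority order (primary first)."""
    for node in nodes:
        if node.healthy:
            return node
    raise RuntimeError("no healthy node available")

# Hypothetical two-node app tier: app-1 is primary, app-2 is standby.
primary, standby = Node("app-1"), Node("app-2")
before_failure = pick_active([primary, standby]).name

primary.healthy = False  # simulate a crash detected by a health check
after_failure = pick_active([primary, standby]).name
```

Here `before_failure` is the primary and `after_failure` is the standby; the short interruption mentioned under Cons corresponds to the time it takes to detect the failure and switch.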

Pros

  • Strong availability improvement vs a single server
  • Simple to implement and manage
  • Works for LTE and fixed-access (non-LTE) environments

Cons

  • Failover may cause a short interruption (DB switchover)
  • Capacity still limited by a single active DB node
  • No protection against failure of the whole data center

Best For

  • Medium-size customers (up to ~100–150k subs)
  • Clients needing reliable uptime but not yet at carrier-scale

Option 3: Single-Site, Active/Active (Scale-Out HA)

  • Multiple stateless WISPGate app/worker nodes behind LB for horizontal scaling
  • DB cluster (multi-node, master + replicas / Galera / cluster manager)
  • Redis / queue cluster for caching, sessions, tasks
  • Multiple AAA / RADIUS / PCRF / HSS / OCS nodes, all active, for load distribution
  • Diameter / RADIUS / SIP endpoints configured across all nodes
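
In an active/active layout every healthy node takes a share of the traffic. A minimal sketch of that idea, assuming simple round-robin distribution that skips nodes marked down, is shown below. `RoundRobinPool` and the node names are hypothetical; a real load balancer or Diameter routing agent would add weights, connection tracking, and active health probes.

```python
import itertools

class RoundRobinPool:
    """Rotate over nodes, skipping any that are currently marked down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.down = set()
        self._cycle = itertools.cycle(self.nodes)

    def mark_down(self, node):
        self.down.add(node)

    def mark_up(self, node):
        self.down.discard(node)

    def next_node(self):
        # Look at each node at most once per call.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node not in self.down:
                return node
        raise RuntimeError("no healthy node available")

pool = RoundRobinPool(["aaa-1", "aaa-2", "aaa-3"])
all_up = [pool.next_node() for _ in range(4)]

pool.mark_down("aaa-2")  # remaining nodes absorb the load
one_down = [pool.next_node() for _ in range(3)]
```

Because no node is special, taking one out of rotation for maintenance only reduces capacity, which is what makes rolling upgrades possible.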

Pros

  • Real throughput scaling: can handle large subscriber counts and high request rates
  • Smooth rolling upgrades, no downtime for maintenance
  • Better performance isolation (billing load, AAA load, UI load separated)

Cons

  • Increased complexity: requires mature DevOps / monitoring / orchestration
  • DB cluster management, split-brain risk, replication consistency challenges
  • Configuration errors can cause worse failures than a single-node deployment
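
The split-brain risk is commonly handled with a majority quorum: a partition keeps accepting writes only if it can see more than half of the cluster. The sketch below shows that rule in its simplest, unweighted form. `has_quorum` is a hypothetical helper, and actual cluster managers may use weighted votes or arbitrator nodes.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True if this partition sees a strict majority of the cluster."""
    return reachable_nodes > cluster_size // 2

# Three-node cluster split 2 / 1: only the larger side keeps quorum.
three_node_majority = has_quorum(2, 3)
three_node_minority = has_quorum(1, 3)

# Two-node cluster split 1 / 1: neither side has a majority.
two_node_split = has_quorum(1, 2)
```

This is why DB clusters are usually sized with an odd number of voting members: an even split leaves no side able to continue safely.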

Best For

  • Large clients (100k–500k+ subs)
  • Deployments expecting growth, heavy traffic (especially LTE + real-time billing), or multiple services (IPTV, VoIP, Internet)

Option 4: Dual-Site, Active/Passive (DR / Disaster Recovery + HA)

  • Primary data center with full HA stack (as in Option 2 or 3)
  • Secondary data center (warm standby) with asynchronous DB replication
  • Idle or limited service load on the secondary DC
  • Network elements (BNG, MME, NAS, Diameter peers) configured with primary + secondary endpoints

Pros

  • Protection against failure of an entire data center
  • Secondary DC can be scaled down (cost-efficient warm standby)
  • Clear Disaster Recovery path

Cons

  • Potential data loss during failover due to async replication (RPO)
  • Requires disciplined DR procedures (manual or scripted switchover)
  • Extra cost for duplicate infrastructure and maintenance overhead
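
The RPO exposure from asynchronous replication can be estimated with simple arithmetic: whatever was written on the primary but not yet shipped to the secondary is lost if the primary site disappears. The helper below and its input figures are purely illustrative assumptions, not measurements from any deployment.

```python
def records_at_risk(writes_per_second: float,
                    replication_lag_seconds: float) -> float:
    """Upper-bound estimate of writes not yet replicated at failover time."""
    return writes_per_second * replication_lag_seconds

# Hypothetical figures: 200 charging writes/s and 5 s of replication lag.
at_risk = records_at_risk(writes_per_second=200, replication_lag_seconds=5)
```

With these assumed numbers, about 1,000 writes could be missing on the secondary after an unplanned switchover, which is why DR procedures usually include a reconciliation step.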

Best For

  • National / regional operators needing high business continuity guarantees
  • Clients who require formal DR / BC (business continuity) plans

Option 5: Dual-Site, Active/Active (Geo-Distributed HA & Scale-Out)

  • Full HA stack (Active/Active) replicated in two or more data centers
  • Either:
    • Multi-primary DB cluster across DCs (complex), or
    • Sharding by region (DC1 handles region A subs, DC2 handles region B subs)
  • Regional AAA / PCRF / RADIUS / HSS / OCS services, local to each DC
  • Cross-DC replication or synchronization for global data (billing, CDRs, reporting)
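
The region-sharding variant can be sketched as a lookup from subscriber region to a home data center, with a fallback to any surviving DC when the home site is down. The mapping, the DC names, and the `serving_dc` function are hypothetical, and the fallback assumes the surviving DC has access to replicated subscriber data.

```python
# Hypothetical shard map: each region has one home data center.
REGION_TO_DC = {"region-a": "dc1", "region-b": "dc2"}

def serving_dc(region: str, dc_up: dict[str, bool]) -> str:
    """Return the home DC for a region, or a surviving DC if it is down."""
    home = REGION_TO_DC[region]
    if dc_up.get(home, False):
        return home
    # Home DC is unavailable: fall back to any DC that is still up.
    for dc in sorted(set(REGION_TO_DC.values())):
        if dc_up.get(dc, False):
            return dc
    raise RuntimeError("no data center available")

normal = serving_dc("region-a", {"dc1": True, "dc2": True})
dc1_lost = serving_dc("region-a", {"dc1": False, "dc2": True})
```

When `dc1` is lost, region A subscribers are served from `dc2`, which carries both regions at reduced headroom. That matches the "partial degradation" noted under Pros.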

Pros

  • Maximum scalability and resilience
  • Can survive losing an entire DC with only partial degradation
  • Ideal for multi-region / multi-country deployments

Cons

  • Very high architecture complexity: significant expertise needed
  • Data model must be carefully designed (sharding, replication, session affinity, data consistency)
  • Overkill (and cost-heavy) for most use cases

Best For

  • Carrier-scale clients with a multi-regional footprint
  • Strategic long-term deployments where growth and uptime at scale are critical

LTE vs Non-LTE: How HA Differs in Practice

LTE (EPC + HSS/PCRF/OCS)

Key points:

  • Session state on Gx/Gy/S6a must not be lost on node failure
  • Charging reliability: you cannot lose CCR/CCA, CDRs, or quota changes
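
One way to keep quota changes safe when a Diameter peer retransmits a CCR after a failover is to make the handler idempotent. In Diameter Credit-Control, each request within a session carries a Session-Id and a CC-Request-Number, so the pair can serve as a deduplication key. The sketch below is a simplified illustration: `QuotaLedger` is hypothetical, and the in-memory dictionary stands in for a shared, durable store (DB or Redis cluster) that all nodes would need to consult.

```python
class QuotaLedger:
    """Grant quota once per (Session-Id, CC-Request-Number) pair."""

    def __init__(self, balance: int):
        self.balance = balance
        # Assumption: in production this map lives in shared durable storage.
        self._answers = {}

    def handle_ccr(self, session_id: str, request_number: int,
                   requested_units: int) -> int:
        key = (session_id, request_number)
        if key in self._answers:
            # Retransmitted request: replay the stored answer, charge nothing.
            return self._answers[key]
        granted = min(requested_units, self.balance)
        self.balance -= granted
        self._answers[key] = granted
        return granted

ledger = QuotaLedger(balance=100)
first = ledger.handle_ccr("sess-1", 1, 60)
retry = ledger.handle_ccr("sess-1", 1, 60)   # duplicate after failover
second = ledger.handle_ccr("sess-1", 2, 60)  # genuinely new request
```

The retried request returns the same grant without debiting the balance again, while the new request number is processed normally against what remains.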

Minimum sane design (short-term LTE HA):

  • Option 2 (Single-Site A/P) with:
    • 2 PCRF nodes behind LB or with Diameter failover
    • 2 HSS nodes sharing a replicated DB
    • OCS logic behind LB and DB cluster

Medium-term (proper LTE HA):

  • Option 3 (Single-Site A/A) for PCRF/HSS/OCS with:
    • Stateless Diameter frontends
    • All state in a DB/Redis cluster with proper durability
    • EPC configured to load-balance across multiple peers

Non-LTE (FWA, Fiber, DSL)

Key points:

  • RADIUS and DHCP are critical but easier than Diameter
  • Payments and billing must not double-charge or mis-rate during failover

Minimum HA

  • Two RADIUS/DHCP nodes (active/active) + DB primary/standby
  • BRAS/BNG/routers configured with both RADIUS servers
  • WISPGate app cluster in A/P or A/A behind LB
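
From the access device's point of view, having both RADIUS servers configured usually means timeout-and-retry: try the first server a few times, then move to the next. The sketch below models that behavior in isolation. `send_with_failover`, `fake_send`, and the server names are hypothetical; on a real BRAS/BNG this logic lives in the device's RADIUS client configuration.

```python
from typing import Callable, Sequence

def send_with_failover(servers: Sequence[str],
                       send: Callable[[str], str],
                       retries_per_server: int = 2) -> str:
    """Try each server in order, retrying on timeout before moving on."""
    for server in servers:
        for _attempt in range(retries_per_server):
            try:
                return send(server)
            except TimeoutError:
                continue  # retry this server, then fall through to the next
    raise TimeoutError("all configured servers timed out")

def fake_send(server: str) -> str:
    """Stand-in transport: radius-1 is down, radius-2 answers."""
    if server == "radius-1":
        raise TimeoutError(server)
    return f"Access-Accept from {server}"

reply = send_with_failover(["radius-1", "radius-2"], fake_send)
```

The retry count and timeout determine how long a subscriber login stalls when the first server is down, so they are worth tuning together with the health checks on the server side.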

Extended HA

  • Add mediation/rating workers running active/active
  • Dedicated DB read replicas for heavy reporting / BI to keep the primary fast
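
Keeping reporting load off the primary comes down to routing by workload type. A minimal sketch, assuming the application can tag each query as transactional or reporting, might look like this. `QueryRouter`, the workload labels, and the host names are hypothetical.

```python
import itertools

class QueryRouter:
    """Send reporting queries to read replicas, everything else to primary."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Rotate over replicas; None means no replica is configured.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, workload: str) -> str:
        if workload == "reporting" and self._replicas is not None:
            return next(self._replicas)
        return self.primary

router = QueryRouter("db-primary", ["db-replica-1", "db-replica-2"])
billing_target = router.target_for("billing")
report_targets = [router.target_for("reporting") for _ in range(3)]
```

Replica reads can lag behind the primary, so this split suits reporting and BI rather than balance checks or anything that must see the latest write.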

IPTV & VoIP Billing/Management

For IPTV and VoIP, the HA pattern is mostly:

Signalling / Control:

  • SIP/Softswitch/RADIUS/Diameter: 2+ nodes active/active
  • WISPGate provides AAA and charging decisions via RADIUS/Diameter/API

Billing / CDR Rating:

  • Mediation jobs run on multiple worker nodes with idempotent rating logic
  • Message queues (Kafka/RabbitMQ) replicated across nodes
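
Idempotent rating means a CDR that is delivered twice (for example, after a worker crash and queue redelivery) is charged only once. The sketch below shows the idea with a record of already-rated CDR IDs. The field names, the flat per-minute price, and the in-memory `rated` dictionary are assumptions for illustration; a real deployment would keep this record in the shared database.

```python
def rate_batch(cdrs: list[dict], rated: dict[str, int],
               cents_per_minute: int) -> int:
    """Rate each CDR once, keyed by its ID; return the running total in cents."""
    for cdr in cdrs:
        if cdr["id"] in rated:
            continue  # already rated: skip redelivered record
        rated[cdr["id"]] = cdr["minutes"] * cents_per_minute
    return sum(rated.values())

rated: dict[str, int] = {}
batch = [{"id": "cdr-1", "minutes": 3}, {"id": "cdr-2", "minutes": 10}]

first_total = rate_batch(batch, rated, cents_per_minute=5)
# Same batch redelivered after a simulated worker failure.
second_total = rate_batch(batch, rated, cents_per_minute=5)
```

Both totals are identical, so redelivery does not inflate the bill. Amounts are kept in integer cents to avoid floating-point rounding in charges.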

In terms of HA architecture, IPTV and VoIP fall under the same BSS/AAA HA stack you design for LTE and fixed access. You don't want a separate bespoke HA story per service; you want a unified "Charging & AAA Cluster" architecture.

Summary / Conclusion

Recap:

  • ✔ HA is mandatory for any mid-size or large deployment, keeping control, billing, and revenue streams safe.
  • ✔ Single-server vertical scale is a temporary "stop-gap," not a long-term solution.
  • ✔ Recommended standard: Single-Site Active/Passive (Tier 1) or Active/Active (Tier 2) as baseline HA offering.
  • ✔ For large clients / strategic deployments: Dual-Site / Geo-Distributed HA.
