Azure Networking book cover by Jose Moreno and Adam Stuart

In my view there are many certifications that do not mean much in the real world. I've met a lot of engineers with few certs, for many reasons: some struggle with the exam format because of neurodiversity, others just don't like taking exams. CCIE is not one of those certifications. For me, anyone with a CCIE does not walk on water, they simply moonwalk on water! The networking knowledge needed is intense. I have worked directly with Adam Stuart at Microsoft and he is not only top class technically, he is a bloody great guy too. Along with Jose Moreno, he co-authored Azure Networking. What follows pulls out the ExpressRoute high availability guidance from their book and frames it for engineers who need a design that keeps working when a circuit fails at 03:00. The book is great to read start to finish or to use as a reference book, as I do. Click the book cover to go through to Amazon and buy your own copy.

Local first steps

Start with two circuits in two different peering locations. Separate providers if you can. Separate facilities always. If both circuits land in the same building you have a single point of failure you do not control. Order dual ExpressRoute circuits, terminate them in different metros, and validate that each has its own physical path. Only when the real world plumbing is diverse should you move on to routing.
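A minimal sketch of that diversity check, assuming you keep a simple inventory of each circuit's provider, peering location and the facility it physically lands in. The `Circuit` fields, circuit names and facility names are hypothetical; the real values come from your providers' documentation.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Circuit:
    name: str
    provider: str
    peering_location: str
    facility: str  # the building the circuit physically lands in


def diversity_findings(circuits: list[Circuit]) -> list[str]:
    """Compare every pair of circuits and report shared infrastructure."""
    findings = []
    for a, b in combinations(circuits, 2):
        if a.facility == b.facility:
            findings.append(f"{a.name}/{b.name}: same facility {a.facility} (single point of failure)")
        if a.peering_location == b.peering_location:
            findings.append(f"{a.name}/{b.name}: same peering location {a.peering_location}")
        if a.provider == b.provider:
            findings.append(f"{a.name}/{b.name}: same provider {a.provider} (allowed, but less diverse)")
    return findings


# Hypothetical inventory: two providers, two metros, two buildings.
circuits = [
    Circuit("er-london", "provider-a", "London", "lon-facility-1"),
    Circuit("er-amsterdam", "provider-b", "Amsterdam", "ams-facility-1"),
]
print(diversity_findings(circuits))  # []
```

An empty list is the only acceptable answer before you move on to routing.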

Why it matters

ExpressRoute is often the lifeline to your workloads. A single circuit outage without a tested design means downtime, escalations, and long nights. A sound design means the failure is a log entry. The goal is simple: in normal operation, each site uses the nearest Azure region. In failure, traffic shifts automatically to the surviving path without violating security or policy.

How easy it is

The patterns are straightforward once you decide on the topology.

  1. Build two circuits in different locations.
  2. Connect them to two Azure regions.
  3. Choose your topology: Bow Tie for lowest latency and clarity, or Square for cost-sensitive scenarios where a longer failover path is acceptable.
  4. Set routing preferences so the right paths are preferred in steady state and failover is automatic.
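Step 4 can be sketched as a simplified BGP decision, assuming the on-premises routers give routes learned over the local circuit a higher local preference than routes learned over the remote one. Real BGP evaluates more attributes than this; the prefix, circuit names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    via: str          # circuit the route was learned over
    local_pref: int   # higher is preferred
    as_path_len: int  # shorter is preferred when local_pref ties


def best_path(routes, prefix, down=frozenset()):
    """Simplified BGP choice: drop routes over failed circuits, then highest
    local preference, then shortest AS path. Real BGP has more tie-breakers."""
    candidates = [r for r in routes if r.prefix == prefix and r.via not in down]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.local_pref, -r.as_path_len))


# Hypothetical view from one site: the Azure prefix is learned over both circuits,
# and the local circuit carries the higher local preference.
routes = [
    Route("10.1.0.0/16", "er-local", local_pref=200, as_path_len=2),
    Route("10.1.0.0/16", "er-remote", local_pref=100, as_path_len=2),
]
print(best_path(routes, "10.1.0.0/16").via)                              # er-local
print(best_path(routes, "10.1.0.0/16", down={"er-local"}).via)           # er-remote
print(best_path(routes, "10.1.0.0/16", down={"er-local", "er-remote"}))  # None
```

Failover needs no human action: when the local circuit's routes are withdrawn, the remote route is the only candidate left.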

You must test. You should document exact expected paths per prefix family and you should rehearse failure modes.
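One way to keep that documentation honest is to hold the expected paths as data and compare them with what you observe during a failure rehearsal. This is a minimal sketch; the scenario names, prefix families ("region-1" meaning the prefixes hosted in that region) and circuit names are hypothetical, and the observed values would come from your own route tables or traceroutes.

```python
# Expected circuit per failure scenario and prefix family. All names are hypothetical.
EXPECTED = {
    "steady": {"region-1": "er-site-1", "region-2": "er-site-2"},
    "er-site-1 down": {"region-1": "er-site-2", "region-2": "er-site-2"},
    "er-site-2 down": {"region-1": "er-site-1", "region-2": "er-site-1"},
}


def compare(expected, observed):
    """Return (scenario, prefix family, expected, observed) for every mismatch."""
    mismatches = []
    for scenario, paths in expected.items():
        seen = observed.get(scenario, {})
        for family, want in paths.items():
            got = seen.get(family)
            if got != want:
                mismatches.append((scenario, family, want, got))
    return mismatches


# Paths recorded during a rehearsal; here one failover did not behave as documented.
observed = {
    "steady": {"region-1": "er-site-1", "region-2": "er-site-2"},
    "er-site-1 down": {"region-1": "er-site-2", "region-2": "er-site-2"},
    "er-site-2 down": {"region-1": "er-site-1", "region-2": None},
}
for mismatch in compare(EXPECTED, observed):
    print(mismatch)
```

Any printed line is a finding to fix before the real outage finds it for you.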

How much it will save your backside

I frequently say that the ONLY thing we can guarantee in IT is that things fail, typically when we most need them up. Circuits are no different; they fail. Peering locations have maintenance. A Bow Tie or Square design turns a provider event into a non-event for the business. The difference is whether people sleep or not.

The Bow Tie design

Bow Tie connects each on-premises site to both Azure regions. You then bias routing so that each site prefers its local region in steady state.

ExpressRoute Bow Tie design
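On the Azure side, one lever for that bias is the routing weight on each ExpressRoute connection: a higher weight is preferred, and Microsoft's routing optimisation guidance describes it as being evaluated before AS path length. The sketch below is a simplified model under that assumption; the region names, circuit names and weight values are hypothetical, and your on-premises routers still need their own preference for the other direction.

```python
# Hypothetical connection weights per (region gateway, circuit); higher wins.
WEIGHTS = {
    ("region-1", "er-site-1"): 100,
    ("region-1", "er-site-2"): 0,
    ("region-2", "er-site-1"): 0,
    ("region-2", "er-site-2"): 100,
}


def gateway_choice(region, as_path_len, down=frozenset()):
    """Circuit a region's gateway uses for one on-premises prefix.

    as_path_len maps circuit -> AS path length of that prefix over the circuit.
    Assumed order: highest connection weight, then shortest AS path.
    """
    options = [c for (r, c) in WEIGHTS if r == region and c not in down]
    if not options:
        return None
    return max(options, key=lambda c: (WEIGHTS[(region, c)], -as_path_len[c]))


# The prefix is prepended over er-site-1, but weight still keeps region-1 local.
lengths = {"er-site-1": 3, "er-site-2": 1}
print(gateway_choice("region-1", lengths))                      # er-site-1
print(gateway_choice("region-2", lengths))                      # er-site-2
print(gateway_choice("region-1", lengths, down={"er-site-1"}))  # er-site-2
```

The design choice here is that each region is pinned to its local circuit by weight alone, so a stray AS path prepend cannot quietly move steady-state traffic across the bow.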

Operational intent:

Practical routing notes from the book:

Why engineers like Bow Tie:

The Square design

Square connects each site only to its local region. In failure, traffic rides the Azure backbone to reach the other region. You trade some latency in a failover for lower circuit costs and a simpler commercial model.

ExpressRoute Square design
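A minimal sketch of that trade-off, modelling the Square as a small graph and using Dijkstra's shortest path to find the route from a site to its local region before and after its circuit fails. The node names, link names and latencies are hypothetical, chosen only to show that the failover path exists and is longer.

```python
import heapq

# Hypothetical one-way latencies in milliseconds; replace with your measurements.
LINKS = {
    ("site-1", "region-1"): ("er-1", 2),
    ("site-2", "region-2"): ("er-2", 2),
    ("site-1", "site-2"): ("wan", 10),
    ("region-1", "region-2"): ("azure-backbone", 15),
}


def shortest_path(src, dst, down=frozenset()):
    """Dijkstra over the Square topology, skipping links named in `down`."""
    graph = {}
    for (a, b), (name, ms) in LINKS.items():
        if name in down:
            continue
        graph.setdefault(a, []).append((b, ms))
        graph.setdefault(b, []).append((a, ms))
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, ms in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + ms, nxt, path + [nxt]))
    return None


print(shortest_path("site-1", "region-1"))
# (2, ['site-1', 'region-1'])
print(shortest_path("site-1", "region-1", down={"er-1"}))
# (27, ['site-1', 'site-2', 'region-2', 'region-1'])
```

With these made-up numbers the failover path is over ten times the steady-state latency, which is exactly the cost you accept in exchange for fewer circuit connections.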

When Square makes sense:

Caveats to plan for:

Adding the safety raft: VPN as failover

When risk appetite demands a third path, layer site-to-site VPN on top as a last resort. Terminate it on an Azure VPN gateway or NVA and advertise a limited set of critical prefixes. Keep throughput expectations realistic, and test your routing so the VPN is only selected when both ExpressRoute paths are unavailable.
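A simplified model of why the VPN only wins when both circuits are gone, assuming the behaviour Microsoft documents for coexisting gateways: longest prefix match is evaluated first, and for an identical prefix ExpressRoute is preferred over site-to-site VPN. The trap it shows is that advertising a more specific prefix over the VPN pulls traffic onto it even while ExpressRoute is healthy. Prefixes are hypothetical.

```python
import ipaddress

# Assumed tie-break for identical prefixes: ExpressRoute before VPN.
SOURCE_RANK = {"expressroute": 0, "vpn": 1}


def select(routes, destination):
    """routes: (prefix, source, up) tuples. Longest prefix match first,
    then source rank. Returns (prefix, source) or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), source)
        for prefix, source, up in routes
        if up and dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    net, source = min(matches, key=lambda m: (-m[0].prefixlen, SOURCE_RANK[m[1]]))
    return str(net), source


# Two ExpressRoute circuits plus the VPN, all advertising the same prefix.
one_er_down = [("10.0.0.0/16", "expressroute", False), ("10.0.0.0/16", "expressroute", True),
               ("10.0.0.0/16", "vpn", True)]
both_er_down = [("10.0.0.0/16", "expressroute", False), ("10.0.0.0/16", "expressroute", False),
                ("10.0.0.0/16", "vpn", True)]
leaky = [("10.0.0.0/16", "expressroute", True), ("10.0.1.0/24", "vpn", True)]
print(select(one_er_down, "10.0.1.5"))   # ('10.0.0.0/16', 'expressroute')
print(select(both_er_down, "10.0.1.5"))  # ('10.0.0.0/16', 'vpn')
print(select(leaky, "10.0.1.5"))         # ('10.0.1.0/24', 'vpn')
```

The third case is the one to test for: keep the VPN advertisements no more specific than what ExpressRoute carries.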

Coexistence and transitivity

If you run ExpressRoute and VPN together, Azure Route Server can allow the gateways to exchange routes so each overlay knows the other exists. For traffic inspection through Azure Firewall or an NVA, bind a route table to the gateway subnet and steer specific prefixes through the firewall while still allowing the gateways to learn each other's routes for reachability. Treat Route Server as a control plane helper. It teaches routes. It does not steer traffic through your firewall; the UDRs in that route table do.
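To see why the route table does the steering, here is a simplified model of how Azure picks a route for a subnet: longest prefix match first, then, for identical prefixes, user-defined routes over BGP routes over system routes, as Microsoft's virtual network traffic routing documentation describes. The gateway subnet entries, the firewall address 10.0.2.4 and the prefixes are hypothetical.

```python
import ipaddress

# Priority for identical prefixes: user-defined, then BGP, then system routes.
ORIGIN_RANK = {"user": 0, "bgp": 1, "system": 2}


def effective_next_hop(routes, destination):
    """routes: (prefix, origin, next_hop) tuples for one subnet."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), origin, hop)
        for prefix, origin, hop in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return min(matches, key=lambda m: (-m[0].prefixlen, ORIGIN_RANK[m[1]]))[2]


gateway_subnet = [
    ("10.1.0.0/16", "system", "VNetPeering"),            # spoke VNet
    ("10.1.0.0/16", "user", "10.0.2.4"),                 # UDR: spoke via firewall
    ("192.168.0.0/16", "bgp", "VirtualNetworkGateway"),  # branch learned via Route Server
]
print(effective_next_hop(gateway_subnet, "10.1.5.5"))     # 10.0.2.4
print(effective_next_hop(gateway_subnet, "192.168.1.1"))  # VirtualNetworkGateway
```

Spoke traffic is forced through the firewall by the UDR, while the branch prefix Route Server taught the gateways still resolves directly.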

Local first steps checklist

Why it matters

Because the day a circuit drops is not the day you want to start drawing diagrams. This design work prevents outages, preserves user trust, and avoids noisy board updates.

How easy it is

It is engineering, not wizardry. The patterns are known. The test cases are repeatable. The only real mistake is not building for failure and not rehearsing it.

How much it will save your backside

Enough that the outage becomes a ticket rather than a crisis. Enough that your name is associated with resilience instead of recovery.

Go deeper

This post skims key lessons from Azure Networking by Jose Moreno and Adam Stuart. The book goes further with redistribution metrics, weight behaviour, AS path strategies, coexistence patterns, and clean diagrams. If you work with Azure networking, buy it, read it, and keep it within reach.