OSPF Split-Brain Problem and Designated Router (DR) Election

Everyone who studied for CCNA/CCNP learned that once an OSPF router is elected Designated Router (DR), it will not give up this status to a new OSPF router that becomes active on the same broadcast network, even if the new router has a better priority. Well, this is not completely true. There is a situation where a DR has to willingly enter a limited form of the election process again, and possibly lose, to prevent issues in the network.

In this post, let me show you this problem, called OSPF split-brain, on a live example topology that anyone can build in the GNS3 simulator if they wish.

First, let’s summarize what most people know about OSPF DR election, none of which is incorrect.

OSPF election of Designated Router (DR)

  • The DR is the router with the highest priority; if priorities are equal, the highest router ID wins.
  • The Backup Designated Router (BDR) is the router with the second-highest priority/router ID.
  • All other routers in the common broadcast domain form full adjacencies only with the DR and BDR.
  • DR and BDR elections are non-preemptive. This means that once a DR and BDR are established, they keep their status even when new routers with better priorities become active on the same broadcast segment. In other words, no new election is held for every OSPF router that comes late.

Let’s start with an OSPF topology on which we will demonstrate an example of DR and BDR election:

OSPF topology

So, the question for the network-oriented reader is: which routers in this topology are going to be DR and BDR for the common 10.0.0.0/24 network?

The answer is that R1 will be DR and R3 will be BDR. This can be seen in the output of “show ip ospf neighbor” on any of the routers.

R4# show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1         100   FULL/DR         00:00:38    10.0.0.1        FastEthernet0/0
2.2.2.2          50   2WAY/DROTHER    00:00:30    10.0.0.2        FastEthernet0/0
3.3.3.3          75   FULL/BDR        00:00:39    10.0.0.3        FastEthernet0/0
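
The ranking rule behind this result can be sketched in a few lines of Python. This is only a minimal sketch, assuming all four routers come up at about the same time and nobody has claimed the DR or BDR role yet. The function names are mine, and R4's own priority (25) is taken from the output shown later in this post, because a router does not list itself in its own neighbor table.

```python
from ipaddress import IPv4Address

# Router ID -> OSPF interface priority for the 10.0.0.0/24 segment.
PRIORITIES = {
    "1.1.1.1": 100,
    "2.2.2.2": 50,
    "3.3.3.3": 75,
    "4.4.4.4": 25,  # R4's own value, taken from a later output in this post
}


def rank_key(router_id, priority):
    """Higher priority wins; a numerically higher router ID breaks ties."""
    return (priority, int(IPv4Address(router_id)))


def cold_start_election(priorities):
    """Return (dr, bdr) when no router has claimed either role yet."""
    # Routers with priority 0 are never eligible to become DR or BDR.
    eligible = [rid for rid, prio in priorities.items() if prio > 0]
    ranked = sorted(
        eligible,
        key=lambda rid: rank_key(rid, priorities[rid]),
        reverse=True,
    )
    dr = ranked[0] if ranked else None
    bdr = ranked[1] if len(ranked) > 1 else None
    return dr, bdr


print(cold_start_election(PRIORITIES))
```

With the priorities above this gives 1.1.1.1 as DR and 3.3.3.3 as BDR, the same roles R4 reports. Router IDs are compared as numbers rather than as strings, so 10.0.0.1 correctly beats 9.9.9.9 in a tie.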

As already mentioned, the OSPF DR election is non-preemptive. This means that even if we change the priorities so that some other router has a better priority, no new election will start as long as the current DR is active. So let’s change the priority on R2 to a higher value.

OSPF Topology with Priority Change on R2

Now let’s have a look at the election state. Despite the new priority for R2 (neighbor ID 2.2.2.2), the DR and BDR have not changed:

    R4# show ip ospf neighbor

    Neighbor ID     Pri   State           Dead Time   Address         Interface
    1.1.1.1         100   FULL/DR         00:00:35    10.0.0.1        FastEthernet0/0
    2.2.2.2         200   2WAY/DROTHER    00:00:36    10.0.0.2        FastEthernet0/0
    3.3.3.3          75   FULL/BDR        00:00:35    10.0.0.3        FastEthernet0/0

NO CHANGE!
So up to this point, the theory that most CCNA/CCNP people know is correct. Let’s simulate a connection issue in our topology.

OSPF Broadcast Network Separated

I built this topology with two switches specifically so that the routers can be separated into pairs by losing the connection between the switches. The routers get no direct indication of the failure, because their own physical interfaces stay up. Before the separation, R1 was DR and R3 was BDR (because the election is non-preemptive, R2's higher priority changed nothing).

When we separated the networks, the dead timers started to expire for the neighbors that ended up on the other side of the split. From the perspective of R4, shown below, R4 lost its connection to the old BDR (R3) and also to R2. Afterwards, R4 became the new BDR for the “left” part of the network.

R4# show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1         100   FULL/DR         00:00:35    10.0.0.1        FastEthernet0/0
2.2.2.2         200   2WAY/DROTHER    00:00:06    10.0.0.2        FastEthernet0/0
3.3.3.3          75   FULL/BDR        00:00:04    10.0.0.3        FastEthernet0/0
R4#
*Mar  1 00:42:37.915: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet0/0 from FULL to DOWN, Neighbor Down: Dead timer expired
*Mar  1 00:42:39.299: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on FastEthernet0/0 from EXSTART to DOWN, Neighbor Down: Dead timer expired
R4# show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1         100   FULL/DR         00:00:31    10.0.0.1        FastEthernet0/0
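
The dead-timer behaviour seen in this log can be modelled roughly as follows. This is a hedged sketch, not how IOS implements it: the class and method names are hypothetical, and the 40-second dead interval is an assumption based on the usual broadcast-network defaults (Hello every 10 seconds, dead interval four times that).

```python
DEAD_INTERVAL = 40.0  # seconds; assumed default (4 x 10 s Hello interval)


class NeighborTable:
    """Tracks when the last Hello from each neighbor was heard."""

    def __init__(self, dead_interval=DEAD_INTERVAL):
        self.dead_interval = dead_interval
        self.last_hello = {}  # router ID -> time of the last Hello

    def hello_received(self, router_id, now):
        # Every Hello resets that neighbor's dead timer.
        self.last_hello[router_id] = now

    def expire(self, now):
        """Remove and return the neighbors whose dead timer has run out."""
        dead = sorted(
            rid
            for rid, seen in self.last_hello.items()
            if now - seen >= self.dead_interval
        )
        for rid in dead:
            del self.last_hello[rid]
        return dead


# R4's view: all three neighbors are heard at t=0, then the switches split.
table = NeighborTable()
for rid in ("1.1.1.1", "2.2.2.2", "3.3.3.3"):
    table.hello_received(rid, now=0.0)

# After the split only R1's Hellos still reach R4.
for t in (10.0, 20.0, 30.0, 40.0):
    table.hello_received("1.1.1.1", now=t)

print("down:", table.expire(now=40.0))
print("still up:", sorted(table.last_hello))
```

Note that nothing tells R4 that the link between the switches is gone. It only notices that Hellos from R2 and R3 stop arriving, which is why the neighbors stay listed with a shrinking dead time until the timer reaches zero.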

On the right side, the DR/BDR roles of R2 and R3 have also changed:

    R3(config-if)#do show ip ospf neighbor

    Neighbor ID     Pri   State           Dead Time   Address         Interface
    2.2.2.2         200   FULL/BDR        00:00:37    10.0.0.2        FastEthernet0/0
    R2(config-if)#do sh ip ospf neighbor

    Neighbor ID     Pri   State           Dead Time   Address         Interface
    3.3.3.3          75   FULL/DR         00:00:30    10.0.0.3        FastEthernet0/0

So despite having a smaller priority than R2, R3 immediately changed its role from BDR to DR once it detected that the old DR (R1) was unreachable. R2 was elected as the new BDR, which is what the two outputs above show: R3 sees 2.2.2.2 as BDR, and R2 sees 3.3.3.3 as DR.

In summary for “left” part of the network:

R1 – DR
R4 – BDR

In summary for “right” part of the network:

R3 – DR
R2 – BDR

Now, who can tell what happens when I put the networks back together? Who will be DR and who will be BDR? Will there be a completely new election, starting from scratch with no router being preferred in any way? These are the questions I would like to answer in the rest of this article.

Two DR routers find each other on one Broadcast domain

Now this is an interesting result of how OSPF resolved this problem, which is called a “split-brain” issue. Let’s look at this output from the R3 router to see who wins.

R3>show ip ospf neighbor

    Neighbor ID     Pri   State           Dead Time   Address         Interface
    1.1.1.1         100   FULL/DR         00:00:33    10.0.0.1        FastEthernet0/0
    2.2.2.2         200   FULL/BDR        00:00:33    10.0.0.2        FastEthernet0/0
    4.4.4.4          25   2WAY/DROTHER    00:00:36    10.0.0.4        FastEthernet0/0

INTERESTING! The two DR routers have met. The DR that discovered another DR with a better priority willingly gave up its DR status. The same happened with the two BDR routers: they identified each other, and only R4, the one with the lower priority, gave up its BDR status. No election from scratch was held when we reconnected the two previously separated parts of the network; only the routers already claiming the DR or BDR role competed with each other.

The explanation of what happened here can be found in the OSPFv2 specification, RFC 2328:

    7.3.  The Designated Router

        The Designated Router is elected by the Hello Protocol.  A
        router's Hello Packet contains its Router Priority, which is
        configurable on a per-interface basis.  In general, when a
        router's interface to a network first becomes functional, it
        checks to see whether there is currently a Designated Router for
        the network.  If there is, it accepts that Designated Router,
        regardless of its Router Priority.  (This makes it harder to
        predict the identity of the Designated Router, but ensures that
        the Designated Router changes less often.  See below.)
        Otherwise, the router itself becomes Designated Router if it has
        the highest Router Priority on the network.  A more detailed
        (and more accurate) description of Designated Router election is
        presented in Section 9.4.

If we go to section 9.4, we find an explanation of how the DR and BDR are elected when one or more routers already declare themselves DR or BDR.

    9.4. Electing the Designated Router

        (2) Calculate the new Backup Designated Router for the network
            as follows.  Only those routers on the list that have not
            declared themselves to be Designated Router are eligible to
            become Backup Designated Router.  If one or more of these
            routers have declared themselves Backup Designated Router
            (i.e., they are currently listing themselves as Backup
            Designated Router, but not as Designated Router, in their
            Hello Packets) the one having highest Router Priority is
            declared to be Backup Designated Router.  In case of a tie,
            the one having the highest Router ID is chosen.  If no routers  
            have declared themselves Backup Designated Router, choose
            the router having highest Router Priority, (again excluding
            those routers who have declared themselves Designated Router),
            and again use the Router ID to break ties.

        (3) Calculate the new Designated Router for the network as
            follows.  If one or more of the routers have declared
            themselves Designated Router (i.e., they are currently
            listing themselves as Designated Router in their Hello
            Packets) the one having highest Router Priority is declared
            to be Designated Router.  In case of a tie, the one having
            the highest Router ID is chosen.  If no routers have declared
            themselves Designated Router, assign the Designated
            Router to be the same as the newly elected Backup Designated
            Router.
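
To make the two quoted steps more concrete, here is a small Python sketch of them. It is a simplification under stated assumptions, not a full OSPF implementation: it runs steps (2) and (3) once and leaves out the rest of section 9.4, the `Router` class and the `claims` field are hypothetical names of mine, and the claimed roles in the merged scenario are read off the neighbor tables captured during the split (R1 and R3 listing themselves as DR, R2 and R4 as BDR).

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Router:
    router_id: str
    priority: int
    claims: str = "none"  # role advertised in Hellos: "dr", "bdr" or "none"


def best(routers):
    """Highest priority wins; the highest router ID breaks ties."""
    return max(
        routers,
        key=lambda r: (r.priority, int(IPv4Address(r.router_id))),
        default=None,
    )


def elect(routers):
    """Run steps (2) and (3) of RFC 2328 section 9.4 once; return (dr, bdr)."""
    # Routers with priority 0 never take part in the election.
    eligible = [r for r in routers if r.priority > 0]

    # Step (2): anyone claiming DR is excluded from the BDR calculation.
    not_dr = [r for r in eligible if r.claims != "dr"]
    claimed_bdr = [r for r in not_dr if r.claims == "bdr"]
    bdr = best(claimed_bdr) if claimed_bdr else best(not_dr)

    # Step (3): existing DR claims compete first; otherwise promote the BDR.
    claimed_dr = [r for r in eligible if r.claims == "dr"]
    dr = best(claimed_dr) if claimed_dr else bdr
    return dr, bdr


def show(label, routers):
    dr, bdr = elect(routers)
    print(label, "DR =", dr.router_id, "BDR =", bdr.router_id)
    return dr.router_id, bdr.router_id


# R2's priority raised to 200 while R1 and R3 already hold their roles.
PRIORITY_CHANGE = [
    Router("1.1.1.1", 100, "dr"),
    Router("2.2.2.2", 200),
    Router("3.3.3.3", 75, "bdr"),
    Router("4.4.4.4", 25),
]

# The two halves reconnected: two claimed DRs and two claimed BDRs meet.
MERGED = [
    Router("1.1.1.1", 100, "dr"),
    Router("2.2.2.2", 200, "bdr"),
    Router("3.3.3.3", 75, "dr"),
    Router("4.4.4.4", 25, "bdr"),
]

show("priority change:", PRIORITY_CHANGE)
show("after merge:", MERGED)
```

In the first scenario R2's priority of 200 is never even looked at for the DR role, because R2 does not claim it, so R1 and R3 keep their roles. In the second scenario only the claimants compete: R1 (100) beats R3 (75) for DR and R2 (200) beats R4 (25) for BDR, which matches what R3 reported after the networks were reconnected.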

 Summary

What I have presented is just part of the hidden “magic” the OSPF protocol can do in situations that you will not find in the basic CCNA/CCIP literature. But as you have just seen, OSPF is ready even for these situations, deep in the RFC standard. This means that you may already be running OSPF in blissful ignorance of the problems your network topology can create for the protocol, while the protocol itself lets you sleep well at night simply by being prepared for many more situations than you are aware of … and I personally appreciate my good night’s sleep.

Once again, thank you for reading, and if you liked this article, please share.

Peter

 

---
Peter Havrila , published on