OSPF Split-Brain Problem and Designated Router (DR) Election
All the CCNA/CCNP guys out there learned during their studies that once an OSPF router is elected Designated Router (DR), it will not willingly give up this role to a new OSPF router that becomes active on the same broadcast network, even if the newcomer has a better priority. Well, this is not completely true. There is a situation in which a DR has to willingly enter a limited form of the election process again, and possibly lose, to prevent problems in the network.
In this post, let me show you this problem, called OSPF split-brain, on a live example topology that anyone can build in the GNS3 simulator if they wish.
First, let’s summarize what most people out there know about OSPF DR election, which is by no means incorrect.
OSPF election of Designated Router (DR)
- The DR is the router with the highest priority; if priorities are equal, the highest router ID wins.
- The Backup Designated Router (BDR) is the router with the second-highest priority/router ID.
- All other routers in the common broadcast domain form full adjacencies only with the DR and BDR.
- DR and BDR elections are non-preemptive. Once a DR and BDR are established, they keep their roles even when new routers with better priorities become active on the same broadcast network. In other words, no new election is held for every OSPF router that comes late.
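For a segment where nobody holds a role yet, the first two rules boil down to sorting by priority and then by router ID. The Python sketch below is a minimal illustration under that assumption, not Cisco's implementation: `Router`, `election_key` and `fresh_election` are hypothetical names, router IDs are compared as 32-bit numbers (so 10.0.0.1 beats 9.255.255.255), and priority-0 routers are treated as ineligible, which RFC 2328 specifies although the list above does not mention it. The priorities are the ones visible in the lab outputs in this post.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Router:
    name: str
    router_id: str  # OSPF router ID, written in dotted-quad form
    priority: int   # interface priority; 0 means "never DR or BDR"


def election_key(r: Router) -> tuple[int, int]:
    # Higher priority wins; ties go to the numerically higher router ID.
    return (r.priority, int(ipaddress.IPv4Address(r.router_id)))


def fresh_election(routers: list[Router]) -> tuple[Optional[Router], Optional[Router]]:
    """Textbook DR/BDR choice on a segment where nobody holds a role yet."""
    ranked = sorted((r for r in routers if r.priority > 0), key=election_key, reverse=True)
    dr = ranked[0] if ranked else None
    bdr = ranked[1] if len(ranked) > 1 else None
    return dr, bdr


# Priorities as they appear in the lab outputs of this post.
lab = [
    Router("R1", "1.1.1.1", 100),
    Router("R2", "2.2.2.2", 50),
    Router("R3", "3.3.3.3", 75),
    Router("R4", "4.4.4.4", 25),
]
dr, bdr = fresh_election(lab)
print(dr.name, bdr.name)  # R1 R3
```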
Let’s start with an OSPF topology on which we will demonstrate an example of DR and BDR election:
So, the question for the network-oriented reader is: which router in this topology is going to be the DR and which the BDR for the common 10.0.0.0/24 network?
The answer is that R1 will be the DR and R3 the BDR, because R1 has the highest priority (100) and R3 the second-highest (75). This can be seen in the “show ip ospf neighbor” output on any of the routers; here is R4:
R4# show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1         100   FULL/DR         00:00:38    10.0.0.1        FastEthernet0/0
2.2.2.2          50   2WAY/DROTHER    00:00:30    10.0.0.2        FastEthernet0/0
3.3.3.3          75   FULL/BDR        00:00:39    10.0.0.3        FastEthernet0/0
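If you rebuild this lab, reading the roles off the table by eye gets tedious after a few topology changes. A minimal Python sketch that extracts them, assuming the broadcast-segment column layout shown above (point-to-point rows such as `FULL/  -` are not handled); `ROW` and `parse_neighbors` are hypothetical helpers, not part of any Cisco tool:

```python
import re

# One neighbor row on a broadcast segment, e.g.
# 1.1.1.1   100   FULL/DR   00:00:38   10.0.0.1   FastEthernet0/0
ROW = re.compile(
    r"(?P<nbr_id>\d+\.\d+\.\d+\.\d+)\s+(?P<pri>\d+)\s+"
    r"(?P<state>\w+)/(?P<role>\w+)\s+(?P<dead>\d+:\d+:\d+)\s+"
    r"(?P<addr>\d+\.\d+\.\d+\.\d+)\s+(?P<intf>\S+)"
)


def parse_neighbors(output: str) -> dict[str, dict]:
    """Map each neighbor router ID to its priority, adjacency state and role."""
    table = {}
    for m in ROW.finditer(output):
        table[m["nbr_id"]] = {
            "priority": int(m["pri"]),
            "state": m["state"],  # e.g. FULL, 2WAY, EXSTART
            "role": m["role"],    # DR, BDR or DROTHER
            "dead_time": m["dead"],
            "address": m["addr"],
            "interface": m["intf"],
        }
    return table


sample = """
Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1         100   FULL/DR         00:00:38    10.0.0.1        FastEthernet0/0
2.2.2.2          50   2WAY/DROTHER    00:00:30    10.0.0.2        FastEthernet0/0
3.3.3.3          75   FULL/BDR        00:00:39    10.0.0.3        FastEthernet0/0
"""
nbrs = parse_neighbors(sample)
print({nid: n["role"] for nid, n in nbrs.items()})
# {'1.1.1.1': 'DR', '2.2.2.2': 'DROTHER', '3.3.3.3': 'BDR'}
```

Because it scans with `finditer` rather than line by line, the same regex also copes with output that has lost its line breaks.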
As already mentioned, OSPF DR election is non-preemptive. This means that even if we change the priorities so that another router becomes preferable, no new election starts as long as the current DR stays active. So let’s raise the priority on R2 to a bigger value, 200 (ip ospf priority 200 on its FastEthernet0/0 interface).
Now, let’s have a look at the election state. Despite the new priority of R2 (neighbor ID 2.2.2.2), the DR and BDR have not changed:
R4# show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1         100   FULL/DR         00:00:35    10.0.0.1        FastEthernet0/0
2.2.2.2         200   2WAY/DROTHER    00:00:36    10.0.0.2        FastEthernet0/0
3.3.3.3          75   FULL/BDR        00:00:35    10.0.0.3        FastEthernet0/0
NO CHANGE!
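The rule protecting R1 here is the one RFC 2328 Section 7.3 (quoted later in this post) describes: a router that sees a DR already advertised in Hellos accepts it, and priorities only matter when there is no DR at all. A hypothetical Python sketch of that rule for the DR only; the full procedure in Section 9.4 also covers the BDR and is more involved:

```python
from typing import Optional


def rid_key(router_id: str) -> tuple[int, ...]:
    # Compare dotted router IDs numerically, octet by octet.
    return tuple(int(octet) for octet in router_id.split("."))


def choose_dr(advertised_dr: Optional[str], priorities: dict[str, int]) -> Optional[str]:
    """Non-preemptive DR choice, loosely following RFC 2328 Section 7.3.

    advertised_dr: router ID already named as DR in received Hellos, or None
    priorities:    router ID -> priority, consulted only when there is no DR yet
    """
    if advertised_dr is not None:
        # An existing DR is accepted regardless of anyone's priority.
        return advertised_dr
    # No DR yet: highest priority wins, then the highest router ID.
    eligible = [rid for rid, pri in priorities.items() if pri > 0]
    return max(eligible, key=lambda rid: (priorities[rid], rid_key(rid)), default=None)


priorities = {"1.1.1.1": 100, "2.2.2.2": 200, "3.3.3.3": 75, "4.4.4.4": 25}
print(choose_dr("1.1.1.1", priorities))  # 1.1.1.1: R1 keeps the role
print(choose_dr(None, priorities))       # 2.2.2.2: only a fresh start would pick R2
```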
So up to this point, the theory most CCNA/CCNP people know holds. Now let’s simulate a connectivity problem in our topology.
I created this topology with two switches specifically to allow splitting the routers into pairs by breaking the connection between the switches. The routers get no direct notification of this failure, because their own physical interfaces stay up. Before the separation, R1 was the DR and R3 the BDR (because the election is non-preemptive, R2’s higher priority made no difference).
When we separated the networks, the dead timers started to expire for the DR/BDR adjacencies of routers trapped on the wrong side of the split. From the perspective of R4, shown below, R4 lost its connection to the old BDR (R3) and also to R2. Afterwards, R4 became the new BDR for the “left” part of the network.
R4# show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1         100   FULL/DR         00:00:35    10.0.0.1        FastEthernet0/0
2.2.2.2         200   2WAY/DROTHER    00:00:06    10.0.0.2        FastEthernet0/0
3.3.3.3          75   FULL/BDR        00:00:04    10.0.0.3        FastEthernet0/0
R4#
*Mar 1 00:42:37.915: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet0/0 from FULL to DOWN, Neighbor Down: Dead timer expired
*Mar 1 00:42:39.299: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on FastEthernet0/0 from EXSTART to DOWN, Neighbor Down: Dead timer expired
R4# show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1         100   FULL/DR         00:00:31    10.0.0.1        FastEthernet0/0
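Nothing on R4 signals the broken switch link directly; R4 simply stops hearing Hellos from R2 and R3, and each neighbor is dropped when its dead timer runs out. A small Python simulation of that countdown, assuming the IOS broadcast defaults of a 10-second hello and 40-second dead interval and idealized Hello arrival times; the function name is hypothetical and the starting values are the dead times from R4’s first table above:

```python
HELLO_INTERVAL = 10  # seconds; IOS default on broadcast networks
DEAD_INTERVAL = 40   # seconds; IOS default (4 x hello)


def simulate_dead_timers(timers_now: dict[str, int], still_heard: set[str], seconds: int):
    """Count dead timers down one second at a time.

    timers_now:  neighbor router ID -> seconds left on its dead timer
    still_heard: neighbors whose Hellos still arrive (idealized: one every
                 HELLO_INTERVAL seconds, which re-arms the timer)
    Returns the remaining timers and the (second, neighbor) expiry events.
    """
    timers = dict(timers_now)
    expired = []
    for t in range(1, seconds + 1):
        for nbr in list(timers):
            timers[nbr] -= 1
            if nbr in still_heard and t % HELLO_INTERVAL == 0:
                timers[nbr] = DEAD_INTERVAL  # a Hello arrived: re-arm
            elif timers[nbr] <= 0:
                del timers[nbr]  # "Dead timer expired": adjacency goes DOWN
                expired.append((t, nbr))
    return timers, expired


# Dead times from R4's first table after the split; only R1 is still heard.
left, gone = simulate_dead_timers(
    {"1.1.1.1": 35, "2.2.2.2": 6, "3.3.3.3": 4}, still_heard={"1.1.1.1"}, seconds=10
)
print(gone)          # [(4, '3.3.3.3'), (6, '2.2.2.2')]
print(sorted(left))  # ['1.1.1.1']
```

The order matches the log above: R3’s adjacency goes down first, then R2’s, while R1 stays in the table.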
On the right side, the DR/BDR roles of R2 and R3 have changed as well:
R3(config-if)#do show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
2.2.2.2         200   FULL/BDR        00:00:37    10.0.0.2        FastEthernet0/0

R2(config-if)#do sh ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
3.3.3.3          75   FULL/DR         00:00:30    10.0.0.3        FastEthernet0/0
So despite having the lower priority, R3 immediately changed its role from BDR to DR once it detected that the old DR (R1) was unreachable. R2 was elected as the new BDR.
In summary, for the “left” part of the network:
R1 – DR
R4 – BDR
In summary, for the “right” part of the network:
R3 – DR
R2 – BDR
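Both halves follow the same pattern: a surviving BDR is promoted to DR before any priorities are compared, and only the vacated BDR seat is filled by priority and router ID. A condensed, hypothetical Python sketch of that outcome (not the full RFC 2328 Section 9.4 procedure), fed with the routers left on each side according to the outputs above:

```python
from typing import Optional


def rid_key(router_id: str) -> tuple[int, ...]:
    # Compare dotted router IDs numerically, octet by octet.
    return tuple(int(octet) for octet in router_id.split("."))


def reelect_after_loss(dr: Optional[str], bdr: Optional[str],
                       drothers: dict[str, int]) -> tuple[Optional[str], Optional[str]]:
    """Roles left on one side of the split once dead timers have expired.

    dr, bdr:  router ID of the surviving DR / BDR, or None if it is on the other side
    drothers: surviving DROTHER router IDs -> priority
    """
    pool = {rid: pri for rid, pri in drothers.items() if pri > 0}

    def take_best() -> Optional[str]:
        # Highest priority, then highest router ID; the winner leaves the pool.
        if not pool:
            return None
        winner = max(pool, key=lambda rid: (pool[rid], rid_key(rid)))
        del pool[winner]
        return winner

    if dr is None:
        # The DR is gone: the BDR is promoted without comparing priorities.
        dr = bdr if bdr is not None else take_best()
        bdr = None
    if bdr is None:
        # Only the empty BDR seat is filled by priority.
        bdr = take_best()
    return dr, bdr


# Right side: R3 (BDR, priority 75) and R2 (DROTHER, priority 200).
print(reelect_after_loss(None, "3.3.3.3", {"2.2.2.2": 200}))  # ('3.3.3.3', '2.2.2.2')
# Left side: R1 (DR) and R4 (DROTHER, priority 25).
print(reelect_after_loss("1.1.1.1", None, {"4.4.4.4": 25}))   # ('1.1.1.1', '4.4.4.4')
```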
Now, who can tell what happens when I put the networks back together? Who will be DR and who will be BDR? Will there be a completely new election in which no router has any advantage? These are the questions I would like to answer in the rest of this article.
The way OSPF resolves this “split-brain” situation is interesting. Let’s look at this output from the R3 router to see who won:
R3>show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1         100   FULL/DR         00:00:33    10.0.0.1        FastEthernet0/0
2.2.2.2         200   FULL/BDR        00:00:33    10.0.0.2        FastEthernet0/0
4.4.4.4          25   2WAY/DROTHER    00:00:36    10.0.0.4        FastEthernet0/0
INTERESTING! The two DRs have met. The DR that discovered another DR with a better priority (R3 with priority 75, against R1 with 100) willingly gave up its DR status and re-entered the election process. The same happened to the BDRs: the two BDRs detected each other, and only R4 (priority 25) willingly gave up its BDR status to R2 (priority 200). No election from scratch was held when we connected the two previously separated parts of the network together; only the conflicting claims were resolved.
The explanation of what happened here can be found in the OSPFv2 specification, RFC 2328:
7.3. The Designated Router

The Designated Router is elected by the Hello Protocol. A router's Hello Packet contains its Router Priority, which is configurable on a per-interface basis. In general, when a router's interface to a network first becomes functional, it checks to see whether there is currently a Designated Router for the network. If there is, it accepts that Designated Router, regardless of its Router Priority. (This makes it harder to predict the identity of the Designated Router, but ensures that the Designated Router changes less often. See below.) Otherwise, the router itself becomes Designated Router if it has the highest Router Priority on the network. A more detailed (and more accurate) description of Designated Router election is presented in Section 9.4.
In Section 9.4, we find an explanation of how the DR and BDR are elected when one or more routers already declare themselves DR or BDR:
9.4. Electing the Designated Router

(2) Calculate the new Backup Designated Router for the network as follows. Only those routers on the list that have not declared themselves to be Designated Router are eligible to become Backup Designated Router. If one or more of these routers have declared themselves Backup Designated Router (i.e., they are currently listing themselves as Backup Designated Router, but not as Designated Router, in their Hello Packets) the one having highest Router Priority is declared to be Backup Designated Router. In case of a tie, the one having the highest Router ID is chosen. If no routers have declared themselves Backup Designated Router, choose the router having highest Router Priority, (again excluding those routers who have declared themselves Designated Router), and again use the Router ID to break ties.

(3) Calculate the new Designated Router for the network as follows. If one or more of the routers have declared themselves Designated Router (i.e., they are currently listing themselves as Designated Router in their Hello Packets) the one having highest Router Priority is declared to be Designated Router. In case of a tie, the one having the highest Router ID is chosen. If no routers have declared themselves Designated Router, assign the Designated Router to be the same as the newly elected Backup Designated Router.
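Steps (2) and (3) translate almost line by line into code. The Python sketch below is a simplified model under stated assumptions, not a protocol implementation: each router’s Hello claims are reduced to two booleans in a hypothetical `Hello` class, priority-0 routers are excluded as in step (1), and step (4) of Section 9.4, which is not quoted above and makes a router repeat steps (2) and (3) when its own role changes, is approximated by one global re-run. Fed with the claims at the moment of reconnection, it reproduces what R3 shows:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hello:
    """What one router claims on the segment (a simplified, hypothetical Hello)."""
    router_id: str
    priority: int
    declares_dr: bool = False   # lists itself as DR
    declares_bdr: bool = False  # lists itself as BDR (but not as DR)


def rank(h: Hello) -> tuple:
    # Highest Router Priority first; ties go to the highest Router ID.
    return (h.priority, tuple(int(octet) for octet in h.router_id.split(".")))


def best(candidates: list[Hello]) -> Optional[Hello]:
    return max(candidates, key=rank, default=None)


def steps_2_and_3(routers: list[Hello]) -> tuple[Optional[Hello], Optional[Hello]]:
    eligible = [r for r in routers if r.priority > 0]
    # (2) Routers declaring themselves DR cannot become BDR; declared BDRs go first.
    bdr_pool = [r for r in eligible if not r.declares_dr]
    bdr = best([r for r in bdr_pool if r.declares_bdr])
    if bdr is None:
        bdr = best(bdr_pool)
    # (3) Declared DRs go first; if there are none, the new BDR becomes DR.
    dr = best([r for r in eligible if r.declares_dr])
    if dr is None:
        dr = bdr
    return dr, bdr


def elect(routers: list[Hello]) -> tuple[Optional[str], Optional[str]]:
    dr, bdr = steps_2_and_3(routers)
    # Simplified step (4): adopt the result as everyone's declaration and run
    # steps (2) and (3) again, so a router just promoted to DR leaves the BDR race.
    for r in routers:
        r.declares_dr = r is dr
        r.declares_bdr = r is bdr and r is not dr
    dr, bdr = steps_2_and_3(routers)
    return (dr.router_id if dr is not None else None,
            bdr.router_id if bdr is not None else None)


# Claims at the moment the two halves were reconnected.
merged = [
    Hello("1.1.1.1", 100, declares_dr=True),   # R1, DR on the left
    Hello("4.4.4.4", 25, declares_bdr=True),   # R4, BDR on the left
    Hello("3.3.3.3", 75, declares_dr=True),    # R3, DR on the right
    Hello("2.2.2.2", 200, declares_bdr=True),  # R2, BDR on the right
]
print(elect(merged))  # ('1.1.1.1', '2.2.2.2')
```

With the right half alone (R3 claiming BDR, R2 claiming nothing), the same function returns ('3.3.3.3', '2.2.2.2'), and with no claims at all it falls back to the textbook R1/R3 result.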
Summary
What I presented here is just part of the hidden “magic” the OSPF protocol can do in situations you will not find in the basic CCNA/CCIP literature. But as you have just seen, the depths of the RFC standard prepare OSPF even for these situations. This means that you may already be running OSPF in blissful ignorance of the problems your network topology may cause for the protocol, while the protocol itself lets you have a good night’s sleep simply by being prepared for many more situations than you are aware of… and I personally appreciate my good night’s sleep.
Once again, thank you for reading, and if you liked this article, please share it.
Peter