Until January 2012 I was routing negative. I didn’t even know what RIP, OSPF, and BGP stood for. I was trying to explain to an employee of L3, a Cisco router geek, why you should never route on a firewall: there are ignorant admins like me who don’t know how to fix routing problems on a firewall, and if routing fails, the whole network fails. He said “It’s not rocket science” (“you idiot” was implied). His routing team then spent the next week trying to get OSPF going and failing miserably. Those telecom calls had me in tears laughing. Now I knew I would never route on a firewall.
Until 2 months ago I barely knew how RIP worked. I just knew that you never dynamically route on a firewall… well, not on a Check Point. WHY? Basically because it’s hard to configure and debug, and sometimes it just doesn’t work. Check Point knew this too; that’s why they bought Nokia IPSO, so they could incorporate Nokia’s version of dynamic routing into GAIA.
Well, it’s been a long journey and I’m not there yet, but this is what I’ve learned in the past two months. Prior to GAIA, dynamic routing was $$$$, so no one did it. Now it’s free and people are turning it on. It’s been an eye-opener for me, I gotta admit. Overall, dynamic routing is not hard, but getting it to work with all the other components, internal and external, has been a battle. Especially clustering.
Why do we dynamically route, Uncle Dreez? Well, my little static-routing grasshopper, let me share this with you. In my diagram below we have a general HA environment: two data centers going to two ISPs. You can run ClusterXL across the data centers as long as the sync line has low, reliable latency. In this diagram some newbie Windows geek decided to add a new subnet (10.5.0.0/24) and put a PC on it. When the user browses the Internet, the packets gleefully follow the default route… but what about the return packets? The firewall and other components don’t know about 10.5.0.0/24, so they have no route back to it.
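Here’s a minimal sketch of the longest-prefix-match lookup a router or firewall does for every packet, to show where those return packets end up. The routing table and the next-hop names (`isp-a-router`, `core-switch`) are hypothetical, assuming the firewall knows the older internal subnets (10.1.0.0/16 here) plus a default route out to one ISP.

```python
import ipaddress

# Hypothetical firewall routing table: (prefix, next hop).
# Nobody told the firewall about 10.5.0.0/24.
ROUTES = [
    ("0.0.0.0/0", "isp-a-router"),   # default route out to the Internet
    ("10.1.0.0/16", "core-switch"),  # internal subnets the firewall already knows
]

def lookup(dst: str) -> str:
    """Return the next hop for dst using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    # The most specific (longest) matching prefix wins; the default route always matches.
    _net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

print(lookup("10.1.2.3"))   # core-switch
print(lookup("10.5.0.23"))  # isp-a-router: the reply heads back out toward the ISP
```

Add a 10.5.0.0/24 entry pointing at the core, by hand or learned from a routing protocol, and the same lookup sends the reply inside. Dynamic routing is about getting that entry there without someone remembering to type it.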
Phase 1: RIP
Back in the days when I had hair, they solved this problem with a protocol called RIP. Really simple. RIP-enabled components broadcast their entire routing tables to their local subnet every 30 seconds. Routing gateways would then merge those updates into their own tables and advertise the result (again, full routing tables) out all of their interfaces. Basically dynamic routing spam. RIP was OK when routing tables were small, but as routing tables grew, this tsunami of routes would overwhelm the core routers as every mom-and-pop RIP component blasted its routes and they propagated into the core. It was slow, too: in the worst case a change moves only one hop per 30-second update, so crossing 15 hops (RIP’s maximum; 16 means “unreachable”) could take around 450 seconds, and by the time the route got to the other side, it might be down again. Slow, memory sucking (but simple). Almost like Windows 8 without the simple. Oh yeah, because it’s so simple, its loop protection is crude (split horizon plus that hop-count limit), so a routing loop can bounce around “counting to infinity” before it dies. Yet another similarity to Windows 8: endless CPU-sucking loops.
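Here’s a minimal sketch of what a RIP router does with each update it hears, assuming a simplified model with no route timeouts, split horizon, or triggered updates. The function names and the neighbor address 10.9.9.1 are hypothetical; only the 16-hop “infinity” and the 30-second interval come from RIP itself.

```python
INFINITY = 16           # RIP's "unreachable" metric: 15 hops is the maximum
UPDATE_INTERVAL_S = 30  # periodic full-table update interval

def merge_rip_update(table, neighbor, advertised):
    """Merge a neighbor's advertised routes into our table, distance-vector style.

    table:      {prefix: (metric, next_hop)}  our current routes (modified in place)
    advertised: {prefix: metric}              the neighbor's entire table
    """
    for prefix, metric in advertised.items():
        new_metric = min(metric + 1, INFINITY)  # reaching it via the neighbor costs one more hop
        current = table.get(prefix)
        if current is None:
            if new_metric < INFINITY:           # only learn routes that are reachable
                table[prefix] = (new_metric, neighbor)
        elif current[1] == neighbor or new_metric < current[0]:
            # Trust the router we already use (even bad news), or switch to a shorter path.
            table[prefix] = (new_metric, neighbor)
    return table

def worst_case_propagation_s(hops):
    """Seconds for news to cross `hops` routers if each waits a full update interval."""
    return hops * UPDATE_INTERVAL_S

table = {"10.1.0.0/24": (1, "direct")}
merge_rip_update(table, "10.9.9.1", {"10.5.0.0/24": 1, "10.1.0.0/24": 3})
print(table["10.5.0.0/24"])          # (2, '10.9.9.1')
print(worst_case_propagation_s(15))  # 450
```

Notice every router only knows “how far” and “which neighbor”, never the whole map. That’s why loops are possible and why news travels one hop per update.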
Phase 2: OSPF
One big improvement OSPF contributed was the Designated Router (DR). Every multi-access subnet (think Ethernet segment) with OSPF routers on it elects a Designated Router (DR), plus a Backup DR (BDR) if there is a second eligible router. The DR is the target of all routing updates on that subnet. Now the firewall may have 10 directly connected routers, but only 2 of them are the DR and BDR. Those two are the OSPF ‘neighbors’ the firewall forms full adjacencies with. The firewall will exchange its link-state database and Link State Advertisements (LSAs) (e.g. “network 10.2.1.0/27 is down”) with those neighbors (well, I watered this down a bit, but it’s close enough to get you going).
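The election itself can be sketched like this, assuming a fresh start where every router shows up at the same time: highest priority wins, ties go to the highest router ID, and priority 0 means “never make me DR or BDR.” The router names and IDs are hypothetical. Real OSPF elections are also non-preemptive (a sitting DR keeps the job when a better router joins later), which this sketch ignores.

```python
from dataclasses import dataclass

@dataclass
class OspfRouter:
    name: str
    router_id: str  # dotted-quad router ID, e.g. "10.0.0.1"
    priority: int   # 0 means "not eligible to be DR/BDR"

def _rank(r: OspfRouter):
    # Higher priority wins; ties go to the higher router ID, compared numerically.
    return (r.priority, tuple(int(octet) for octet in r.router_id.split(".")))

def elect_dr_bdr(routers):
    """Return (dr, bdr) for one segment; either may be None if too few are eligible."""
    eligible = sorted((r for r in routers if r.priority > 0), key=_rank, reverse=True)
    dr = eligible[0] if eligible else None
    bdr = eligible[1] if len(eligible) > 1 else None
    return dr, bdr

segment = [
    OspfRouter("fw", "10.0.0.1", 1),
    OspfRouter("core1", "10.0.0.2", 100),
    OspfRouter("core2", "10.0.0.3", 100),
    OspfRouter("r4", "10.0.0.9", 0),
]
dr, bdr = elect_dr_bdr(segment)
print(dr.name, bdr.name)  # core2 core1
```

Here the firewall ends up as a plain DROther and only has to stay fully adjacent with core2 and core1.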
So instead of having thousands of mothers-in-law yelling at you, whining about their lives since Day 1, all routing updates go to one destination on the subnet: the DR. The DR then floods those updates on to the rest of the network. If the DR fails, the BDR takes over. Updates are limited to (1) a periodic refresh, where each router re-floods its own LSAs every 30 minutes, and (2) small LSA messages, on the order of 75 bytes, sent when a route changes (“Hey, 10.2.1.0/27 is DOWN!”), so the chatter is minimal compared to the RIP mothers-in-law.
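To put numbers on the RIP side of that comparison, here’s a back-of-the-envelope sketch using the RIPv2 message layout (a 4-byte header, 20 bytes per route, at most 25 routes per message). It counts RIP payload only, per interface, ignoring UDP/IP headers; the 1,000-route table is a made-up example.

```python
import math

RIP_HEADER = 4          # bytes: command, version, must-be-zero
RIP_ENTRY = 20          # bytes per route entry
RIP_MAX_ENTRIES = 25    # route entries per RIP message
RIP_INTERVAL_S = 30     # full table re-sent every 30 seconds
OSPF_REFRESH_S = 1800   # each OSPF LSA is re-flooded every 30 minutes

def rip_bytes_per_update(routes):
    """Payload bytes to send a full table of `routes` entries once."""
    messages = math.ceil(routes / RIP_MAX_ENTRIES)
    return messages * RIP_HEADER + routes * RIP_ENTRY

def rip_bytes_per_hour(routes):
    return rip_bytes_per_update(routes) * (3600 // RIP_INTERVAL_S)

print(rip_bytes_per_update(1000))  # 20160
print(rip_bytes_per_hour(1000))    # 2419200
print(3600 // RIP_INTERVAL_S, "RIP updates/hour vs", 3600 // OSPF_REFRESH_S, "OSPF refreshes/hour")
```

Roughly 2.4 MB an hour, per interface, just to say “nothing changed”, versus OSPF re-flooding each LSA twice an hour and otherwise speaking up only when something actually breaks.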
A second way OSPF limits the RIP mothers-in-law is a concept called Areas. Areas are kinda like subnets, where broadcasts are kept within the subnet. Areas are supposed to be separate administrative boundaries where all the routers inside the boundary talk to each other and share a common link-state table. A router can sit on the border between adjacent areas (normally one of them is Area 0, the backbone) and pass routing information between them, but it keeps a separate link-state table for each area. So routine routing updates are small messages, and the larger messages that share the link-state tables are sent infrequently and only within an area.
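The “separate table per area” idea can be sketched as a per-area dictionary, assuming a toy model where area-scoped LSAs are flooded only to routers attached to that area. The router names and the LSA contents are hypothetical. Real ABRs also condense each area’s routes into summary LSAs for the other areas, which this sketch leaves out.

```python
from collections import defaultdict

class Router:
    def __init__(self, name, areas):
        self.name = name
        self.areas = set(areas)
        self.lsdb = defaultdict(dict)  # area -> {lsa_id: lsa}; one database per attached area

    def is_abr(self):
        return len(self.areas) > 1     # attached to more than one area

def flood(routers, area, lsa_id, lsa):
    """Install an area-scoped LSA on every router attached to `area`, and nowhere else."""
    for r in routers:
        if area in r.areas:
            r.lsdb[area][lsa_id] = lsa

core = Router("core", [0])
abr = Router("abr", [0, 1])
branch = Router("branch", [1])
routers = [core, abr, branch]

flood(routers, 1, "router:branch", {"links": ["10.5.0.0/24"]})
print(abr.is_abr(), sorted(abr.lsdb))  # True [1]
print(1 in core.lsdb)                  # False
```

The Area 0 router never stores Area 1’s detailed link-state chatter; only the router straddling both areas does.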
In the diagram below we have two cities with two data centers in each city (4 data centers). Each data center connects to a different ISP for redundancy. Each city has its own Area 0 (sometimes called the Backbone) and its own Area 1; neither is shared with the other city. Area 0 in XXXX keeps its own link state table, Area 1 in XXXXX keeps its own state table, etc. The OSPF areas are connected to the outside world using BGP (discussed next). Any router that sits on the border between areas is known as an Area Border Router (ABR). In our diagram all the routers/firewalls A-M are considered ABRs.
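So the net-net (subtle joke) is: OSPF is more efficient than RIP, and its routing updates converge faster. It’s also smarter: every router in an area runs a shortest-path-first calculation over the same link-state table, so it picks the lowest-cost path (cost is usually derived from link bandwidth) without routing loops.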
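That shortest-path-first calculation is Dijkstra’s algorithm. Here’s a minimal sketch over a toy link-state table, assuming symmetric links and made-up router names and costs; it returns, for each destination, the total cost and the first hop the firewall should use.

```python
import heapq

def spf(lsdb, root):
    """Dijkstra shortest-path-first over a link-state database.

    lsdb: {router: {neighbor: cost}}
    returns {router: (total_cost, first_hop)}; first_hop is None for the root itself
    """
    best = {root: (0, None)}
    queue = [(0, root, None)]
    visited = set()
    while queue:
        cost, node, first_hop = heapq.heappop(queue)
        if node in visited:
            continue                  # already settled via a cheaper path
        visited.add(node)
        for neighbor, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            hop = neighbor if node == root else first_hop  # remember which way we left the root
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, hop)
                heapq.heappush(queue, (new_cost, neighbor, hop))
    return best

lsdb = {
    "fw":  {"r1": 10, "r2": 1},
    "r1":  {"fw": 10, "lan": 1},
    "r2":  {"fw": 1, "r3": 1},
    "r3":  {"r2": 1, "lan": 1},
    "lan": {},
}
print(spf(lsdb, "fw")["lan"])  # (3, 'r2')
```

The direct-looking path through r1 costs 11, so the firewall goes via r2 (cost 3). Because every router computes from the same database, their choices agree with each other, which is what keeps loops out.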
Phase 3: BGP
OK, so I made OSPF sound pretty sexy. Why don’t we run OSPF on everything? Well, the one thing I didn’t tell you about was route summarization. OSPF can be a memory pig as the network gets bigger. Although the ABRs can summarize routes between areas, every router inside an area still has to hold that area’s full link-state table, so on large networks the routers’ memory couldn’t store all the routes. BGP was built to scale to huge routing tables, and summarizing (aggregating) routes is one of the tools it uses to get there.
If a router only had two entries in it:
- 10.1.0.0/24 next hop 10.1.1.2
- 10.2.0.0/24 next hop 10.1.1.2
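It could summarize them as
- 10.0.0.0/8 next hop 10.1.1.2 (a /8 works, but the tightest summary covering both is 10.0.0.0/14, i.e. 10.0.0.0 through 10.3.255.255)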
That saves 1 table entry. OK, so multiply this by thousands of routes across thousands of routers over hundreds of routing hops, and you are going to save a lot of memory and bandwidth: smaller routing tables and faster updates.
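Finding the tightest summary is just “widen the prefix until it covers everything.” Here’s a minimal sketch using Python’s standard `ipaddress` module; the `smallest_summary` helper is hypothetical, and it assumes all the prefixes are the same IP version.

```python
import ipaddress

def smallest_summary(prefixes):
    """Return the smallest single prefix that covers every prefix given."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    summary = nets[0]
    # Widen the first prefix one bit at a time until it contains all the others.
    while not all(n.subnet_of(summary) for n in nets):
        summary = summary.supernet()
    return summary

summary = smallest_summary(["10.1.0.0/24", "10.2.0.0/24"])
print(summary)  # 10.0.0.0/14
# How many times more address space a /8 summary would claim:
print(ipaddress.ip_network("10.0.0.0/8").num_addresses // summary.num_addresses)  # 64
```

The trade-off: the /14 also claims 10.0.0.0/16 and 10.3.0.0/16, which this router may not actually reach, and a /8 claims 64 times more than that. Summaries save entries, but the looser they are, the more traffic they can attract that has nowhere to go.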
BGP is great for the core network, where it can take in routes from all types of routing protocols, summarize them, and redistribute them back out to the edges where OSPF and RIP live.
BGP is also great for routing to external Autonomous Systems (AS): routing domains run by other organizations (your ISPs, partners, and so on) that you don’t control.
I’m going to stop here for BGP because my knowledge of it is limited and I’ll get myself in trouble if I go further.
…….