Monthly Archives: May 2013

Heading to Morocco to get a life

Well everyone, believe it or not I have a life. I take summers off with my hot German girlfriend Gaby to rock climb in Southern France and hang out in Morocco with a camel in the desert. So for the next 3 months you are on your own. I should be back online by September finding new CheckPoint bugs. If you don’t hear from me by then, chances are I fell off a rock or got kidnapped in the Sahara. But hey, I was living large and not sitting in some cold basement installing R76 for the 100th time.

See you on the other side my friends,

Later,

dreez


Snapshots – YES you can clone

Well, CP says you can’t clone devices with snapshots. They suggest cloning using GAIA commands, like Cisco. Well….it’s not there yet. It’s one of these deals where you export the GAIA commands, then import, then manually modify some stuff (hostname, IP, others) and then import again. I’m not totally buying it, because you make so many custom mods to these platforms in the kern_config, or new rpms, or log file sizes, that GAIA does not capture it all.

First, what is this good for?

  1. Appliance A dies. You RMA it and get appliance A-NEW of the same model, disk, etc.
  2. You are deploying 200 appliances of a model similar to Appliance A. Take a snapshot of A and then revert to it on all 200, then update IPs and licenses. Easier than building from scratch.

I still think the best way is through snapshots…but it takes a few steps. In the photo below you can create a snapshot in WebUI. Then you can export. It will try and export to your PC which is good … but if the line is slow you can also grab the export from the /var/log/download file in case you want to archive to an FTP archive site.

On the clone device you can import doing the reverse. It’s probably easiest and fastest if you are directly connected, for example when building a new clone to replace 1 or more devices. So let’s say you get a new replacement appliance for an RMA and it comes with R75.40 and your clone snapshot is at R76.

  1. Export the /etc/sysconfig/network-scripts directory from the new appliance to your laptop
  2. Import the snapshot file from your directly connected laptop
  3. Revert to the snapshot – You are now at R76
  4. WinSCP the /etc/sysconfig/network-scripts directory from your laptop back to the new clone appliance. It has all the MAC addresses from the new hardware’s NICs.
  5. reboot
  6. Done
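
The network-scripts shuffle in steps 1 and 4 can be sketched as a pair of shell functions. This is a minimal sketch, not Check Point tooling: the function names and the default archive path are made up for illustration, and it assumes tar is available on the appliance. Copy the archive off the box (scp, WinSCP, whatever) before you revert, same as in the steps above.

```sh
# Hypothetical helpers; names and default paths are assumptions for illustration.

# Step 1: archive the NEW appliance's NIC config (it carries the new MAC addresses).
# $1 = directory to save, $2 = archive file to write.
save_netscripts() (
    src=${1:-/etc/sysconfig/network-scripts}
    archive=${2:-/var/tmp/network-scripts.tgz}
    # -C changes into the parent so the archive holds just "network-scripts/..."
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
)

# Step 4: after the revert, unpack the saved files over whatever the snapshot brought along.
# $1 = archive file, $2 = parent directory to unpack into.
restore_netscripts() (
    archive=${1:-/var/tmp/network-scripts.tgz}
    parent=${2:-/etc/sysconfig}
    tar -xzf "$archive" -C "$parent"
)
```

Run save_netscripts on the new appliance before step 2 and restore_netscripts after step 3, then reboot as in step 5.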

snapshot

Read this to import and export snapshots from GAIA.

A little trick someone showed me (Thanks Gary!)  was in a cluster to keep snapshots of both members ON both members. Then if member A dies, you can import the snapshot from Member B quickly.

VSX Is Special. The Warp Interfaces need a special seed value in the registry.

https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solutionid=sk55980

https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solutionid=sk80320&js_peid=P-114a7bc3b09-10006&partition=Advanced&product=VSX,

Firewall Dynamic Routing For Dummies – Part Quad

And then there is clustering……

<Work In Progress>

UPDATE 6/4/2013: Just got word that CP has acknowledged that there is a problem with failover and OSPF. Sometimes OSPF uses its real IP and not the VIP. It should ALWAYS use the VIP. This would be a huge impact for all CP customers. CP claims they will fix this. Till then, tread lightly on failing over. Using clusterXL_admin down would be my suggestion.

Sorry to dump this on you, but I have good news and I have bad news.

  • Good news: Once it’s up and running it seems to work
  • Bad news: Once it’s up and running it seems to work

Clustering is a bitch. If you have a standalone system, go for it. Dynamic routing on a clustered system is not there yet. The routing people never talked to the clustering people about life and the reality outside of the lab.

  1. So when doing full connectivity upgrades, recognize this only applies to state tables and not routing tables. So although you can get the state tables to fail over, the routing will not converge immediately so you will lose pings.
  2. Member priority. So member A is active it is advertising on the VIP HELLO packets. If you are upgrading member B and member B is the priority member then when you push policy HA will restart and member B will be the VIP advertising OSPF HELLO packets…without a full routing table. Doooo-ah! What to do. Or what if member B is upgraded but you still need to put a patch to routed/ospf on it? You upgrade, it asks to reboot, you reboot and member B is active but doesn’t have the new routed on it and no routes. The whole member priority prior to upgrades should be thought out.
  3. Wait for it……Should you OSPF route before clustering comes up? So OSPF tells the world “Hey, I’m accepting packets on this VIP”. So packets start coming your way but clustering is slowly starting to get out of bed……
     clustering
  4. PROBLEMO MUCHACHO….Well, not all the time. When you type cphastop on the problem member, the problem member’s routed starts advertising on its own real IP address and not the VIP. I think this is a problem, because where do the packets go – to the VIP or the real IP? You guessed it, the REAL IP. URRRRP. Problem. Yes, the routing people and the cluster people and the documentation people were not talking to each other. The upgrade guide is wrong: you can’t rely on cphastop alone because it does not stop routed. You have to run drouter stop to stop routed too. Routed and clustering are independent of each other and they should be tied hip-to-hip.
  5. So you are single, no family, no dog and you have a death wish. Not a problem, we have that for you. Run VPNs and your firewall and figure out if you should inject kernel routes into OSPF, or let OSPF figure it out and do it for you. I wish I had the answer, let me know if you do before you die.
     kernel routes
  6. Just take it from me. NEVER pull an interface cable to disable member from joining the cluster and then reboot. Your life will change before your eyes.
  7. Learn to write an awk script that converts OSPF routes to static routes in order to do a clean upgrade. If you can’t take any ping losses, then you have to convert the dynamic routing cluster into a static routing cluster. Convert OSPF routes into static routes (do not do a ‘save config’), and upgrade that way. The static routes will continue to route as long as state tables send packets your way. Once things are working, reboot and let OSPF fill the routing tables in place of the GAIA clish static routes, which will disappear on reboot because you didn’t do a ‘save config’.
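
For point 7, here is a minimal sketch of the awk idea. It assumes the clish route listing has lines shaped like "O  10.5.0.0/24  via 10.1.1.2, eth1, cost 20, age 1234"; that layout and the function name ospf_to_static are assumptions, so check them against what your own box prints. It only prints the clish commands, so nothing changes until you paste them in (and remember, no ‘save config’).

```sh
# Hypothetical helper: read a route listing on stdin, print one clish
# static-route command per OSPF route. The input layout is an ASSUMPTION.
ospf_to_static() {
    awk '$1 == "O" {
        dest = ""; gw = ""
        for (i = 2; i <= NF; i++) {
            # first field that looks like a.b.c.d/len is the destination
            if (dest == "" && $i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+$/)
                dest = $i
            # the field after "via" is the next hop; drop its trailing comma
            if ($i == "via" && i < NF) {
                gw = $(i + 1)
                sub(/,$/, "", gw)
            }
        }
        if (dest != "" && gw != "")
            printf "set static-route %s nexthop gateway address %s on\n", dest, gw
    }'
}

# Possible usage from expert mode (review the output before pasting it into clish):
#   clish -c "show route ospf" | ospf_to_static > /var/tmp/static-routes.txt
```

Printing the commands instead of running them is deliberate: you want a routing geek to eyeball the list before a few hundred static routes land on a production cluster.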

Summary: I’m not convinced that clustering and dynamic routing are ready for prime time – full connectivity upgrades. Assume you will take a 1 minute to 3 minute outage as you boot both members and they form a rightful cluster – and then OSPF starts up.

Firewall Dynamic Routing For Dummies – Part Tres

Debugging…

Oh you will know when it dies. You’ll be sleeping at 3am and every router geek in your org will be at your house with torches. When OSPF dies, it goes out with a bang because not only does it die on the firewall, the adjacent neighbors will say to themselves “Hmmmm…I haven’t seen any HELLO packets from the firewall lately let’s assume he’s dead” and pull all the routes and then broadcast to the world YOUR firewall’s dynamic routing sucks.

You laugh now grasshopper.

Debugging OSPF can only really be done with a peer routing geek at your side. So make sure you stay on good terms with this person, or else you are the walking dead. Buy them beer but not guns, they get dangerous when they combine the two.

Right now I don’t have a cool VM setup to show you demos. Sorry. Maybe in the future when I get back from Morocco this summer (my sweetheart wants to go sleep with camels in the desert go figure).

Here are the magic commands:

GAIA clish:

  • show ospf neighbor – These are the DR and BDR where your LSAs go. Look to make sure they are in FULL state and not sucking air in EXSTART trying to start the handshake. Should look something like this. Only 2 routers will be in FULL state on an interface. They are the DR and BDR. If one of them crashes, the others in 2-way state will hold an election and a new DR and BDR will take over.

    neighbor

  • show ospf database – dump what OSPF knows about
  • show ospf summary – short form of database
  • show configuration ospf – OSPF configuration commands
  • show route all – all routes
  • show route ospf – ospf routes – NOTE: sometimes I get random results. Unix SPLAT netstat -nr shows all the routes, but this command comes up blank. Not sure.
  • show route static – just static routes
  • drouter stop/start – when it’s 3am and you just want to go home.
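
If you want a quick sanity check on the neighbor list before the routing geeks show up, something like the sketch below can flag neighbors that are neither FULL nor 2-way. The column layout (neighbor ID, priority, state, ...) and the function name are assumptions for illustration; compare against your own show ospf neighbor output first.

```sh
# Hypothetical helper: read "show ospf neighbor" style output on stdin and
# print any neighbor whose state is not FULL or 2-way.
# ASSUMED columns: 1 = neighbor ID, 2 = priority, 3 = state (e.g. FULL/DR).
stuck_ospf_neighbors() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
        state = toupper($3)
        gsub(/-/, "", state)        # treat "2-Way" and "2WAY" alike
        split(state, parts, "/")    # "FULL/DR" -> "FULL"
        if (parts[1] != "FULL" && parts[1] != "2WAY")
            printf "%s %s\n", $1, $3
    }'
}

# Possible usage from expert mode:
#   clish -c "show ospf neighbor" | stuck_ospf_neighbors
```

Empty output means nothing looks stuck. A neighbor that keeps showing up in EXSTART is the one to take to the routing geeks.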

Basically you sit there with the routing geeks and they will tell you if it’s working after typing in all these commands.

And then there is clustering……

Firewall Dynamic Routing For Dummies – Part Deux

So the best part about OSPF is you don’t have to figure out what the configurations are. In fact, if you try to tell the router geeks what the configurations are they will burn down your house. This is a sensitive area for router geeks. You have the ability to screw with their kingdom and bring their house of cards down. So I could tell you what most of these parameters are, but you can also Read The Friggin Manual and the router geeks will feed them to you.

But I’ll gloss over some of the obvious ones.

  • Router ID is usually the IP address of one of the interfaces, but it is ONLY an ID and does not impact routing. Every router has to have a unique router ID.
  • Cost: Small is faster, big is slower. Router geeks usually have a table they use to calculate the cost of the links.  1 gig links are low cost and dial up are expensive.

ospf 1
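
As a rough illustration of the kind of cost table the router geeks use: many shops start from the Cisco-style rule of cost = reference bandwidth / link bandwidth, with a floor of 1. The 100 Mbps reference below is that convention's default, not something taken from GAIA, and the function name is made up. Your router geeks may well use different numbers, and theirs are the ones that count.

```sh
# Hypothetical helper: OSPF cost under the "reference bandwidth / link bandwidth" rule.
# $1 = link speed in kbps, $2 = reference bandwidth in kbps (ASSUMED default: 100 Mbps).
ospf_cost() (
    link_kbps=$1
    ref_kbps=${2:-100000}
    cost=$(( ref_kbps / link_kbps ))             # integer division, remainder dropped
    if [ "$cost" -lt 1 ]; then cost=1; fi        # anything faster than the reference costs 1
    echo "$cost"
)

# Examples:
#   ospf_cost 1000000   # 1 gig link
#   ospf_cost 10000     # 10 meg link
#   ospf_cost 56        # dial up
```

Note that with a 100 Mbps reference, a 1 gig link and a 100 meg link both come out at cost 1, which is one reason people raise the reference bandwidth on faster networks.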

  • Virtual links you can usually ignore. This is like a VPN tunnel through the firewall when OSPF routers on either side want to talk thru the firewall directly to each other.
  • Areas: this is the fun part where it comes all together. <wait for it….>

areas

So remember in my Part Uno when I said that Windows 8 PC nut created a new network 10.5.0.0 and I wanted all routers in my AS to know about it? This is where it happens. This is where you can tell an interface to join an area and suck in and distribute routes that the firewall knows about. As the firewall gets Link State Advertisements (LSAs) from far and wide (can be from non-adjacent areas) the firewall will enter the LSA into the routing table and forward the LSA to its adjacent areas. How do you prevent internal addresses from leaking out? Well, usually the router geeks will put filters on the perimeter and/or also on the internal BGP domain so that only the right LSAs come your way. You really don’t have to worry about that, that’s why we pay router geeks big bucks.

So in our example the 10.5.0.0/24 is created in some far off router and at some point the LSA makes it to the firewall. Because it traveled a long distance it has a cumulative cost from hopping through all the OSPF and BGP routers to get to the firewall. The firewall enters the route into the routing table and forwards the LSA (after adding in more cost for the adjacent links) to the other areas the firewall is linked into on this page.

Same thing goes when 10.5.0.0 goes down. The Windows 8 geek trips over the network wire and the port goes unlinked. The far off router will detect that the network 10.5.0.0 is down and send an LSA to pull the entry from everyone’s routing table.

Once again, you really don’t have to worry about many of these parameters. They have to match up with the adjacent routers or it won’t work.

The one parameter that you can dink with is the passive option. This tells your local OSPF daemon that the interface (let’s say eth1, 10.1.0.0/27) will participate in the local OSPF calculations BUT will not advertise or suck in LSAs through that port. The subnet of eth1 (10.1.0.0/27) is still part of the OSPF LSAs going out the OTHER interfaces (eth2/3/4/5), assuming they are attached to OSPF areas. If eth1 goes down, the 10.1.0.0/27 entry is removed from the local routing table and an LSA announcing the change goes out eth2/3/4/5. But because eth1 does not accept LSAs, none of the networks attached behind eth1 will be in the local routing table, and they won’t be repeated out eth2/3/4/5.

area config

Firewall Dynamic Routing For Dummies – Part Uno

Until January 2012 I was routing negative. I didn’t even know what RIP, OSPF, BGP stood for. I was trying to explain to an employee of L3 who was a Cisco router geek why you should never route on a firewall…because there are ignorant admins like me who don’t know how to fix routing problems on a firewall. And if routing fails, the whole network fails. He said “It’s not rocket science (you idiot – was implied)”. His routing team then spent the next week trying to get OSPF going and failing miserably. Telecom calls had me in tears laughing. Now I knew I would never route on a firewall.

Until 2 months ago I barely knew how RIP worked. I just knew that you never dynamically route on a firewall….well on a CheckPoint. WHY? Basically because it’s hard to configure and debug, and sometimes it just doesn’t work. Check Point knew this too; that’s why they bought Nokia IPSO so they could incorporate their version of dynamic routing into GAIA.

Well, it’s been a long journey and I’m not there yet, but this is what I’ve learned in the past two months. Prior to GAIA dynamic routing was $$$$ so no one did it. Now it’s free and people are turning it on. It’s been an eye opener for me I gotta admit. Overall dynamic routing is not hard, but getting it to work with all the other components internal and external has been a battle. Especially clustering.

Why Do We Dynamically Route Uncle Dreez? Well, my little static routing grasshopper let me share this with you.  In my diagram below we have a general HA environment. Two data centers going to two ISPs. You can clusterXL across the data centers as long as the sync line has low reliable latency. In this diagram some newbie Windows geek decided to add a new subnet  (10.5.0.0/24) and put a PC on it. When the user browses the Internet the packets gleefully follow the default route….but what about the return packets? The firewall and other components don’t know about 10.5.0.0.

General Routing

Phase 1 RIP:

Back in the days when I had hair they solved this problem with a protocol called RIP. Really simple. RIP enabled components broadcast their entire routing tables to their local subnet every 30 seconds. Routing gateways would then turn around and send the RIP updates (full routing tables) out all of their interfaces. Basically dynamic routing spam. RIP was OK when routing tables were small, but as routing tables grew this tsunami wave of routes would overwhelm the core router tables as every mom and pop RIP component blasted their routes and it propagated into the core. On large networks each hop could add up to 30 seconds, so a route 15 hops away (RIP’s limit) could take 450 seconds to arrive, and by the time routes got to the other side, the route might be down again. Slow, memory sucking (but simple). Almost like Windows 8 without the simple. Oh yeah, because it’s so simple it does very little checking for infinite routing loops. Yet another similarity to Windows 8 – endless CPU sucking loops.

Phase 2: OSPF

The one big improvement OSPF contributed was the Designated Router (DR). Every subnet that had a router gateway (between two subnets) had a DR assigned. The DR (and also a Backup DR, or BDR, if there are two paths) was the target of all routing updates. Now the firewall may have 10 directly connected routers, but only 2 of them are a DR and BDR. The DR and BDR are known as OSPF ‘neighbors’. The firewall will get and send routing tables and Link State Advertisements (LSAs) (e.g. “network 10.2.1.0/27 is down”) to its OSPF neighbors (well, I watered this down a bit but close enough to get you going.)

So instead of having thousands of mother-in-laws yelling at you whining about their life since Day 1, all routing updates go to 1 destination DR on the subnet. The DRs would then talk to each other exchanging routes. If a DR fails, then the BDR takes over. Updates are limited to (1) full routing tables every 30 minutes or so and (2) small 75 byte LSA messages that the DRs pass between them if a route goes down (“Hey 10.2.1.0/27 is DOWN!”), so the chatter is minimal compared to the RIP mother-in-laws.

DR and BDR

A second way OSPF has limited the RIP mother-in-laws was with a concept called Areas. Areas are kinda like subnets where subnet broadcasts are kept within the subnet. Areas are supposed to be separate administrative boundaries where the routers inside the boundary all talk to each other and share a common link state table. A router can span adjacent areas and give/get routing table updates, but the two areas keep separate link state tables. So the routing updates are small messages, and the larger messages that share the routing state tables are sent infrequently and only within an area.

In the diagram below we have two cities with two data centers in each city (4 data centers). Each data center has a different ISP it connects to for redundancy. Each city has its own unique Area 0 (sometimes called Backbone) and an Area 1 that is unique to that city and is not shared with any other cities. Area 0 in XXXX keeps its own link state table, Area 1 in XXXXX keeps its own state table, etc. The OSPF areas are connected to the outside world using BGP (discussed next). Any router that is at the border of an area is known as an Area Border Router (ABR). In our diagram all the routers/firewalls A-M are considered ABRs.

So the net-net (subtle joke) is: OSPF is more efficient and quicker and its routing updates converge faster than RIP. It also is smarter at producing the fastest route between two points without routing loops.

ospf bgp 2

Phase 3: BGP

OK, so I make OSPF sound pretty sexy. Why don’t we run OSPF on everything? Well, the one thing I didn’t tell you about was summarizing routes. OSPF can be a memory pig as the routing tables get bigger. Although the ABRs can summarize routes, the internal routers’ routing tables will explode. So on large networks the routers’ memory couldn’t store all the routes. So BGP was created as an intelligent protocol that would summarize routes.

If a router only had two entries in it:

  • 10.1.0.0/24 next hop 10.1.1.2
  • 10.2.0.0/24 next hop 10.1.1.2

It would summarize them as

  • 10.0.0.0/8 next hop 10.1.1.2

Thus saving 1 table entry. OK  so multiply this times thousands of routes over thousands of routers over hundreds of routing hops, you are going to save a lot of time and space. So routing tables and updates could go faster and take less time.
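
To see how summarizing works mechanically, here is a rough sketch that finds the smallest single prefix covering two networks by comparing their leading bits. The helper names are made up for illustration. Note that for the two routes above the tightest summary is 10.0.0.0/14; 10.0.0.0/8 also covers both, it just claims a lot more address space.

```sh
# Hypothetical helpers for illustration only.

# Dotted quad -> 32-bit integer. Runs in a subshell so the IFS change stays local.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# 32-bit integer -> dotted quad.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Smallest single prefix covering two networks given as a.b.c.d/len.
supernet() (
    a=$(ip_to_int "${1%/*}")
    b=$(ip_to_int "${2%/*}")
    len1=${1#*/}
    len2=${2#*/}
    # start at the shorter of the two prefix lengths and widen until both match
    if [ "$len1" -lt "$len2" ]; then len=$len1; else len=$len2; fi
    mask=0
    while [ "$len" -gt 0 ]; do
        mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
        [ $(( a & mask )) -eq $(( b & mask )) ] && break
        len=$(( len - 1 ))
    done
    if [ "$len" -eq 0 ]; then mask=0; fi
    echo "$(int_to_ip $(( a & mask )))/$len"
)

# Example:
#   supernet 10.1.0.0/24 10.2.0.0/24
```

Real routers only summarize where an admin tells them to, and a summary that is too wide will attract traffic for networks you don't actually have, so the tightest prefix is usually the safer starting point.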

BGP is great for the core network where it can take in all types of routing protocols, summarize them and redistribute them back out to the edges where OSPF and RIP lie.

BGP is also great for external routing to external Autonomous Systems (AS) – routing domains controlled by other administrative entities that you don’t know.

I’m going to stop here for BGP because my knowledge is limited on it and I’ll get myself in trouble if I go further.

…….

In Search of the Model Number

Well, this was pretty obvious!

I just wanted to know the model number for my inventory script. Anyways, obviously it’s in the HKLM_registry.data file. Didn’t you know that??? Duh!

grep -i 'series\|Appliance' $CPDIR/registry/HKLM_registry.data

or

dmidecode | fgrep -i product

and SOMETIMES it will print out the actual Model number. Otherwise you get a weird code:

U-10-00

and have to use the table:

http://blog.lachmann.org/?p=172

Obvious.

Oh yeah, if you need the serial number.

dmidecode | egrep 'Product|Serial Number'
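
For an inventory script you probably want both values on one line. A minimal sketch, assuming dmidecode prints "Product Name: ..." and "Serial Number: ..." lines and that the first of each (the system section) is the one you want; the function name is made up:

```sh
# Hypothetical helper: read dmidecode output on stdin, print "product,serial"
# using the FIRST "Product Name" and "Serial Number" lines it sees.
dmi_inventory() {
    awk -F': *' '
        /Product Name:/  && product == "" { product = $2 }
        /Serial Number:/ && serial  == "" { serial  = $2 }
        END { printf "%s,%s\n", product, serial }
    '
}

# Possible usage (needs root, like dmidecode itself):
#   echo "$(hostname),$(dmidecode | dmi_inventory)"
```

You still have to run the product code through the lookup table when it comes back as something like U-10-00.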

Thanks to Snake Oil for the lead.

http://snakeoilresearch.blogspot.com/2010/12/what-hfashotfixes-have-been-applied-to.html

http://blog.lachmann.org/?p=172

I sure hope the new provisioning tool can do this for me some day.

MDM Architecture – Part IV

After seeing the new MDS environment my brain is just spinning

One thing I was thinking is that now that the backend is in SQL, why limit to global/local? Why can’t you have user based hierarchical?
DOMAIN GLOBAL
        DOMAIN North America
                DOMAIN Midwest
                        ENFORCEMENTPOINT St. Paul
                DOMAIN NY
        DOMAIN ASIA
        DOMAIN EU
DOMAIN PCI
        DOMAIN PCI-NorthAmerica
That way an ENFORCEMENTPOINT can inherit the rules and objects of all the DOMAINs above it, along with its own local ones.
If you migrate St. Paul to a new domain, the migration can just work with local objects named “DOMAINX_DOMAINY_migrated_host_server_1.2.3.4” and its IP address.  Put a search function in so one can search and replace where the migrated objects were localized.
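
The inheritance idea above can be sketched in a few lines of shell. This is purely a thought experiment: the function names and the parent map are invented to mirror the tree above, and nothing here reflects how MDS actually stores domains.

```sh
# Hypothetical parent map mirroring the example hierarchy.
parent_of() {
    case "$1" in
        "St. Paul")                  echo "Midwest" ;;
        "Midwest"|"NY")              echo "North America" ;;
        "North America"|"ASIA"|"EU") echo "GLOBAL" ;;
        "PCI-NorthAmerica")          echo "PCI" ;;
        *)                           echo "" ;;   # top-level or unknown: no parent
    esac
}

# Print every domain a node inherits from, nearest first.
inheritance_chain() {
    node=$1
    while parent=$(parent_of "$node") && [ -n "$parent" ]; do
        echo "$parent"
        node=$parent
    done
}

# Example:
#   inheritance_chain "St. Paul"
```

The enforcement point would apply its own local rules plus those of each domain in the chain; migrating it to another domain just means changing one parent link.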
UPDATE 6/1/2013:
The new management environment is supposed to support millions of objects, which is great. But managing them has me worried. MDS has average support for large numbers of objects (rules, admins, network objects, services, etc).
I feel there should be several group and scoping templates that allow us to group, search, and execute on groups of objects. Off the top of my head I feel there should be hierarchical and relational groups, folder hierarchies, labels and label hierarchies. You may consider using inheritance in these hierarchies. Searching these objects should be like Google – indexed fuzzy searches. Then you can click and drag them into a new folder, for example, or execute a command on them.
This way WE control our scoping rules and CP does not force us into a global/local decision. Large enterprises are not that simple.
dreez

Review of the new Provisioning and Management environment

So I was super excited to have the privilege of reviewing the new provisioning and management environments. These are my cryptic notes that I am fleshing out. Too bad I couldn’t take screenshots.

SmartProvisioning:

Not sure when this is coming out but it looks mostly done and operational. Very very cool changes FINALLY! Here are the ones I tested:

  1. From the MGT console you can launch an SSH shell! Finally. It uses pub/private keys for authentication
  2. From the GATEWAY->Topology properties, you can set routes, DNS, NTP
  3. Setting SIC will automatically retrieve topology, DNS, NTP, routes, etc
  4. I’m saving the best for last. You can run scripts AND finally get the output back. You create your own shell scripts obviously.
  5. The execution status looks like SmartUpdate, where a line oriented menu shows the state (executing, complete). I told them this was inadequate because when something blows up you don’t know what happened. They need the option of providing detailed debugging on these commands.
  6. It gets even better. You can execute all these commands on groups of firewalls! Wow, true provisioning. I suggested that it allow you to build groups of firewalls and apply changes to that group.

I am very excited about the changes, so you can finally go out and buy this blade license.

Management Notes

===============================================================

The management environment will be a dramatic change. No more SmartDashboard and SmartDomain Manager. Only 1 environment. Perfect! It was only a dummy interface and some PowerPoints, so I can’t really say how well it works. Like a demo mode with only 25% of the functions implemented.

  1. The interface has this Windows 8 look to it. Seems OK
  2. The window is blank with 4 icons on the left; gateways, monitor, policy, blades
  3. Domains, rules, objects, etc done in all one GUI as I said above
  4. You can put user defined tags on all objects and search on them. Seems like a good idea. Easier to search through 300 huge databases for special items.
  5. Global/Local split still occurs. I wish they would have Hierarchical domains.
  6. Gateways has all the gateways. Didn’t see much here
  7. Backend is SQL database (finally)
  8. Can have concurrent administrators and one admin can lock out others if one is editing an object/rule. The writer then ‘publishes’ and the other admin can modify the object.
  9. Monitor: Was also blank but I guess you can now do global monitoring of all gateways (finally) and users can modify the view. Also global Smartlog (excellent!)
  10. Policy: This is what I saw the most of. The policy looks the same but with one additional column for application control.
  11. Policy will immediately verify after you enter the data
  12. You click on a rule and below panel has smartlog and all the hits on that rule. Cool.
  13. Provisioning will be incorporated into the GUI (see above)
  14. You can move gateways between domains easily (I was told). Hopefully more drag and drop here
  15. They have layers of policies. Not sure I get this. So you have these tabbed windows. Tab 1 is a policy, Tab 2 is a policy, etc. Admins can be assigned per tab. The tab rules are then executed 1,2,3,4. Seems OK, not sure how to use this. Wish they would have hierarchical domains.
  16. Blades: Did not see what this does

At CPX they said the management would be divided between Access Control and Threat Management. I didn’t see any of that in the demo.

So overall very cool. Looking forward to seeing more. Will be fun doing the migration into this new environment 🙂

MDS or die!

dreez

Snapshots done right: GAIA vs SPLAT

Just figured this out today and it is really cool.

SPLAT snapshots bundle up the world into a single gzip file and you can then export it. On boot, you interrupt the boot and you can suck it in via TFTP and rebuild. This is pretty good except it takes a long time. The good part is you can export the snapshot on non-RAID systems in case the disk corrupts.

GAIA (and R75.40 SPLAT) is different. They have Logical Volume Manager on it, which provides flexibility in creating and destroying logical disk partitions. When you create a snapshot, GAIA creates a whole new disk partition, and then creates a filesystem based backup on the new logical disk partition (not a single gzip file).

To explore this, create an R75.40 snapshot called DREEZSNAPSHOT. Then use logical volume manager to look at the logical volumes. I don’t have a VM to show this exactly but it will look something like this.

Image

Note how lv_log and lv_current are your current log and GAIA partitions that you are currently using? A third one will appear called something like /dev/mapper/vg_splat-lv-DREEZSNAPSHOT. It will be ‘hidden’ because it is NOT mounted onto the Unix file system.

Go ahead and mount it:

mkdir /mnt/dreez
mount /dev/mapper/vg_splat-lv-DREEZSNAPSHOT /mnt/dreez
cd /mnt/dreez

Now go explore around. See it has a full file system in it?

The cool part about all this is when you do upgrades, if (and when) the upgrade blows up GAIA will automatically revert to the old image. Before the upgrade GAIA will create an image that you can revert to if some magic tests are not passed. This is super important for that firewall in Botswana whose local sales rep/IT guru only speaks Zambizi and you are doing the remote upgrade at 2am on Saturday with no KVM/remote console and your kid’s birthday party is at 7am…and the whole thing blows up. GAIA will autorestore and you are off to your kid’s birthday party.

This is very cool. Other firewalls have had this for years (my glorious Sidewinder!), and CP has finally got it right too. This is a prime example of enterprise firewall management…the ability to easily manage large numbers of firewalls remotely. Only downside is you have to pray to the gods the non-RAID disk doesn’t blow up.

Now if they could fix provisioning and get that fancy new mgt out there! Life will be good!

Over and out y’all,

dreez
