Assumptions

The information is based on OneFS v7.2.0.0.  It will likely apply to later versions, but new features and functions may change the behavior described here.

 

What is routing?

Routing is the process of determining how to get from one place to another.  In the networking context, we normally talk about IP packets and how to get them from a source to a destination.  Each IP packet carries a 5-tuple that consists of the source IP address, source port, destination IP address, destination port and the protocol in use, such as TCP or UDP.  Traditional routing makes every routing decision based on the destination IP address of a packet and does not take any of the other fields in the tuple into account.  Alternative routing methods exist, but using the destination IP address is the most common.
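A minimal sketch of destination-only routing, assuming a made-up two-entry routing table (the table contents, port numbers and the names Packet, ROUTES and next_hop are illustrative and do not come from any real router).  It shows that two packets with different sources, ports and protocols get the same next hop when they share a destination.

```python
import ipaddress
from collections import namedtuple

# The 5-tuple described above (field names are illustrative).
Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port protocol")

# Hypothetical routing table: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("10.1.1.0/24"), "10.1.1.1"),
    (ipaddress.ip_network("0.0.0.0/0"), "10.2.1.1"),  # default route
]

def next_hop(packet, routes=ROUTES):
    """Pick the most specific route that contains the destination IP."""
    dst = ipaddress.ip_address(packet.dst_ip)
    matches = [(net, hop) for net, hop in routes if dst in net]
    # Longest prefix wins; the source fields are never consulted.
    net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

a = Packet("10.2.1.50", 40000, "10.1.1.80", 2049, "TCP")
b = Packet("10.9.9.9", 53000, "10.1.1.80", 445, "UDP")
print(next_hop(a), next_hop(b))  # both packets get the same next hop
```

Only dst_ip is read inside next_hop, which is the whole point: nothing else in the tuple can influence the path.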

network_diagram (5).png

 

Let’s first walk through a simple scenario and follow a packet flow from source to destination and back.  In this scenario, Client C1 wants to send a packet to Server A1.  Here are the steps that occur for a packet to travel to Server A1 and for the response to return.  I will ignore name resolution for now and assume we just use IP addresses.

  1. Client C1 wants to talk to Server A1 at IP address 10.1.1.80
    1. Client C1 determines that the destination is not local to it and it does not have a static route defined that can handle 10.1.1.80.
    2. The client will then send the packet to its default gateway for further processing.
  2. Router C receives the packet from Client C1
    1. The router looks at the destination IP of 10.1.1.80 and notices that it has a route to the destination through the router at 10.1.1.1.
    2. The router will send the packet to the router at 10.1.1.1 through its external interface.
    3. The packet may pass through more routers in the network core.
  3. Router A receives the packet on its external interface.  The packet could come from Router C or from any number of other routers.
    1. Router A determines that the destination 10.1.1.80 is directly connected.
    2. The router will send the packet directly to 10.1.1.80 on its internal interface.
  4. Server A needs to send a response packet to Client C1.
    1. Server A determines that the destination is not local to it and it does not have a static route defined for the destination address of 10.2.1.50.
    2. The server will then send the packet to its default gateway for further processing.
  5. Router A receives the packet from Server A
    1. The router looks at the destination address of 10.2.1.50 and notices that it has a route to the subnet through the router at 10.2.1.1.
    2. The router will send the packet to the router at 10.2.1.1 through its external interface.
    3. The packet may pass through more routers in the network core.
  6. Router C receives the packet on its external interface.  The packet could come from Router A or from any number of other routers.
    1. Router C determines that the destination 10.2.1.50 is directly connected.
    2. The router will send the packet directly to 10.2.1.50 on its internal interface.

As you follow the flow, you will notice that all routing decisions are made using the destination IP address.  At each hop, the device in the path looks first at its directly connected networks and then at its routing table to determine the next device that should receive the packet.
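The per-hop decision in the steps above can be sketched as follows.  This is a simplified model: the /24 masks and Client C1's default gateway of 10.2.1.1 are assumptions (the walkthrough does not state them), static routes are matched first-hit rather than by longest prefix, and choose_next_hop is a hypothetical helper name.

```python
import ipaddress

def choose_next_hop(dst_ip, connected, static_routes, default_gateway):
    """Return (reason, next_hop): connected first, then static routes, then default."""
    dst = ipaddress.ip_address(dst_ip)
    for net in connected:
        if dst in ipaddress.ip_network(net):
            return ("directly connected", dst_ip)
    for net, gateway in static_routes:
        if dst in ipaddress.ip_network(net):
            return ("static route", gateway)
    return ("default gateway", default_gateway)

# Step 1: Client C1 has no local or static match, so it uses its default gateway.
# The 10.2.1.0/24 network and 10.2.1.1 gateway are assumed values.
print(choose_next_hop("10.1.1.80", ["10.2.1.0/24"], [], "10.2.1.1"))

# Step 3: on Router A the destination is on a directly connected network.
print(choose_next_hop("10.1.1.80", ["10.1.1.0/24"], [], None))
```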

 

What is Source Based Routing (SBR)?

At the core, SBR is just a different method to determine the path a packet should take to get to its destination.  With SBR, the routing decision is based on the source of the packet instead of, or in addition to, the destination address.

 

Let’s take an example that often occurs on an Isilon system.  You have a cluster that has both 1 GbE and 10 GbE interfaces.  You dedicate the 1 GbE interfaces to management and you want all data access to go over the 10 GbE interfaces.  You again have Client C1 that wants to communicate with Server A1.  This time, however, the client wants to access data at high speed, so it connects to a different subnet on the Isilon, one that is connected to the 10 GbE interfaces.

network_diagram (8).png

 

  1. Client C1 wants to talk to Server A1 at IP address 10.3.1.90
    1. Client C1 determines that the destination is not local to it and it does not have a static route defined that can handle 10.3.1.90.
    2. The client will then send the packet to its default gateway for further processing.
  2. Router C receives the packet from Client C1
    1. The router looks at the destination IP of 10.3.1.90 and notices that it has a route to the destination through the router at 10.1.1.1.
    2. The router will send the packet to the router at 10.1.1.1 through its external interface.
    3. The packet may pass through more routers in the network core.
  3. Router A receives the packet on its external interface.  The packet could come from Router C or from any number of other routers.
    1. Router A determines that the destination 10.3.1.90 is directly connected.
    2. The router will send the packet directly to 10.3.1.90 on its internal interface onto the 10 GbE switch.
  4. Server A needs to send a response packet to Client C1.
    1. Server A determines that the destination is not local to it and it does not have a static route defined for the destination address of 10.2.1.50.
    2. The server needs to determine which gateway should receive the packet.  Gateways with lower priority numbers take precedence over gateways with higher priority numbers.
    3. The server has 2 default gateways that it can use.  10.1.1.1 with a priority of 0 and 10.3.1.1 with a priority of 10.  The server will choose the gateway with priority 0 which is 10.1.1.1.
    4. The server will send the packet to 10.1.1.1 through the 1 GbE interface, not the 10 GbE interface.
  5. Router A receives the packet from Server A, but it arrives through the 1 GbE switch!
    1. The router looks at the destination address of 10.2.1.50 and notices that it has a route to the subnet through the router at 10.2.1.1.
    2. The router will send the packet to the router at 10.2.1.1 through its external interface.
    3. The packet may pass through more routers in the network core.
  6. Router C receives the packet on its external interface.  The packet could come from Router A or from any number of other routers.
    1. Router C determines that the destination 10.2.1.50 is directly connected.
    2. The router will send the packet directly to 10.2.1.50 on its internal interface.

 

This is a problem on Isilon because there is only 1 global routing table.  With a single routing table for all interfaces and routing based only on the destination IP, packets are sent out the wrong interface in many cases.  There are situations where even a static route will not work properly, because the destination is, and should be, reachable over multiple networks, as is the case in this example.
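The gateway choice in step 4 can be sketched as below, using the two gateways and priorities from the example.  The tuple layout and the function name pick_default_gateway are illustrative and are not how OneFS stores its configuration.

```python
# Default gateways from the example: (gateway, priority, interface type).
GATEWAYS = [
    ("10.3.1.1", 10, "10 GbE"),
    ("10.1.1.1", 0, "1 GbE"),
]

def pick_default_gateway(gateways):
    """The gateway with the lowest priority number wins."""
    return min(gateways, key=lambda g: g[1])

gateway, priority, link = pick_default_gateway(GATEWAYS)
print(f"reply to 10.2.1.50 leaves via {gateway} on the {link} interface")
```

The request arrived on the 10 GbE subnet, yet nothing in this selection looks at where the request came in, so the reply leaves on the 1 GbE side.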

 

You may ask why the system doesn't send the packet out through the interface on which it arrived.  That ability is, in a manner, what SBR provides.  Instead of relying on the destination IP to route, the SBR feature on Isilon creates a dynamic forwarding rule.  The system makes note of the client's IP and the subnet on the Isilon on which the packet arrived.  It then creates a reverse rule so packets going to that IP will always be forwarded to the default gateway for that subnet.  As an example, if you have a subnet of 10.3.1.x with a gateway of 10.3.1.1, whenever a packet arrives at the cluster destined for any IP in the 10.3.1.x subnet, a rule is made to send return packets to the gateway 10.3.1.1 regardless of what is in the routing table or the gateway priorities.  As currently implemented, it also bypasses any static routes that you may have configured.
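A rough model of the reverse-rule behavior described above.  It is only a sketch of the idea and not the OneFS implementation: the /24 masks, the dictionary layout and the function names on_request and reply_gateway are all assumptions made for illustration.

```python
import ipaddress

# Cluster subnets and their gateways from the example; /24 masks are assumed.
SUBNET_GATEWAYS = {
    ipaddress.ip_network("10.1.1.0/24"): "10.1.1.1",
    ipaddress.ip_network("10.3.1.0/24"): "10.3.1.1",
}

sbr_rules = {}  # (client IP, cluster subnet) -> gateway for return packets

def subnet_of(cluster_ip):
    """Find the cluster subnet that contains the given cluster-side IP."""
    ip = ipaddress.ip_address(cluster_ip)
    for net in SUBNET_GATEWAYS:
        if ip in net:
            return net
    return None

def on_request(client_ip, cluster_ip):
    """Record a reverse rule when a request arrives on a cluster subnet."""
    net = subnet_of(cluster_ip)
    if net is not None:
        sbr_rules[(client_ip, net)] = SUBNET_GATEWAYS[net]

def reply_gateway(client_ip, cluster_ip, routing_table_gateway):
    """Prefer the reverse rule; fall back to the normal routing decision."""
    key = (client_ip, subnet_of(cluster_ip))
    return sbr_rules.get(key, routing_table_gateway)

on_request("10.2.1.50", "10.3.1.90")
print(reply_gateway("10.2.1.50", "10.3.1.90", "10.1.1.1"))  # uses the 10.3.1.x gateway
```

Without the recorded rule, the same call falls through to the priority 0 gateway at 10.1.1.1, which is the asymmetric path from the walkthrough.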

 

With SBR turned on the packet flow in the above example turns into the following.

network_diagram (9).png

 

Let’s build a lab in VMware that you can use to see how SBR works.  This lab requires a 3 node cluster running OneFS 7.2.0.0 or later and 4 virtual Linux machines.

 

The setup looks like the following diagram.  Instructions on how to reproduce this are at the end of the document.

sbr_network_diagram.png

 

In this case we have a client B1 that wants to communicate with the server A1.  We can test the packet flow using ping.  On client B1 we ping 10.3.1.91 and we do not get a response.  The packets flow through the 5 steps outlined above.  You can verify this by running tcpdump on Router A, where you will see the ping replies arriving.

 

On Router A run: tcpdump -n -e icmp

You should see something similar to this:

root@ubuntu:/home/ubuntu# tcpdump -n -e icmp

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

09:23:37.252429 00:0c:29:48:1a:ba > 00:0c:29:a0:21:e7, ethertype IPv4 (0x0800), length 98: 10.3.1.91 > 10.4.1.50: ICMP echo reply, id 6001, seq 18, length 64

09:23:37.252469 00:0c:29:a0:21:e7 > 00:50:56:ee:67:b3, ethertype IPv4 (0x0800), length 98: 10.3.1.91 > 10.4.1.50: ICMP echo reply, id 6001, seq 18, length 64

Notice the source IP of 10.3.1.91.  This is the IP address of the ext-3 interface on the node.  The node needs to send a response back to 10.4.1.50, but the response is routed to Router A instead of returning through Router B.  You will notice there are 2 echo replies above.  The first is the packet from the node to the router.  The second is the router trying to forward the packet onward using its own default gateway.
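If you want to pick the MAC and IP addresses out of captures like the one above with a script, a small parser along these lines may help.  It is a sketch that only handles lines in the shape shown in this document; the regular expression and the parse function are illustrative and not part of tcpdump.

```python
import re

# Matches "tcpdump -n -e icmp" lines in the shape shown above.
LINE_RE = re.compile(
    r"(?P<time>\S+) (?P<src_mac>[0-9a-f:]{17}) > (?P<dst_mac>[0-9a-f:]{17}), "
    r".*?: (?P<src_ip>[\d.]+) > (?P<dst_ip>[\d.]+): ICMP (?P<kind>echo \w+)"
)

def parse(line):
    """Return a dict of the captured fields, or None if the line does not match."""
    match = LINE_RE.search(line)
    return match.groupdict() if match else None

sample = ("09:23:37.252429 00:0c:29:48:1a:ba > 00:0c:29:a0:21:e7, "
          "ethertype IPv4 (0x0800), length 98: 10.3.1.91 > 10.4.1.50: "
          "ICMP echo reply, id 6001, seq 18, length 64")

fields = parse(sample)
print(fields["src_mac"], "->", fields["dst_mac"])
print(fields["src_ip"], "->", fields["dst_ip"], fields["kind"])
```

Comparing src_mac and dst_mac across the two replies makes it easy to see which hop sent each copy of the packet.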

 

This type of packet flow is called asymmetric routing.  Not all cases of asymmetric routing are problematic, but in many cases it is an issue.  If you have stateful firewalls in the network, the firewall on the return path has not seen the original traffic, so it will not know about the connection and you will get dropped packets.  It can also be considered a security issue, since packets from one network are sent onto another network that should not carry them.  Think of a situation where you have 2 tenants connecting to the cluster and incorrect routing causes one customer’s packets to be sent across to the other erroneously.

 

With SBR enabled on the cluster when Client B1 performs the ping, the packet will return successfully. The packet flow looks like the following diagram. (See Turning on SBR in the instructions section)

sbr_network_diagram 2.png

 

Duplicating subnets

Because SBR creates a forwarding rule based on subnets, an interesting feature opens up when you enable this type of routing.  You can have 2 client networks that have the same subnet and IP addressing scheme and still properly route packets, as long as the destinations those clients use are on different subnets on the Isilon.  Take the following diagram as an example.

sbr_network_diagram 3.png

 

In this diagram, Client A1 and Client B1 have the exact same IP address.  They are, however, on different LAN segments.  In a normal routed network this would cause problems, as you could only ever route packets to the 10.2.1.0 subnet over one router.  With SBR, however, both Client A1 and Client B1 can send and receive data from Server A1.  This particular example uses 2 routers, but you can do the same using VLANs.  (See the Duplicating Subnets instructions below on how to set this up)
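A small sketch of why duplicate client addresses can work, using the cluster subnets and gateways from the lab instructions in this document.  It models the rule as "choose the gateway from the cluster IP the reply is sent from"; the function name sbr_gateway and this exact rule shape are assumptions for illustration, not the OneFS internals.

```python
import ipaddress

# Cluster-side subnets and gateways from the lab setup in this document.
SUBNET_GATEWAYS = {
    ipaddress.ip_network("192.168.100.0/24"): "192.168.100.10",  # toward Client A1
    ipaddress.ip_network("10.3.1.0/24"): "10.3.1.1",             # toward Client B1
}

def sbr_gateway(reply_source_ip):
    """Choose the gateway from the cluster IP the reply is sent from."""
    src = ipaddress.ip_address(reply_source_ip)
    for net, gateway in SUBNET_GATEWAYS.items():
        if src in net:
            return gateway
    return None

# Both clients use 10.2.1.50, but they contacted different cluster IPs.
replies = [
    ("192.168.100.81", "10.2.1.50"),  # reply to Client A1
    ("10.3.1.91", "10.2.1.50"),       # reply to Client B1
]
for src, dst in replies:
    print(f"{src} -> {dst} via {sbr_gateway(src)}")
```

A destination-only table could hold just one route for 10.2.1.0/24, so one of the two clients would never see its replies.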

 

Is SBR the total solution?

SBR is a step in the right direction to support very complex network topologies; however, it is not a complete solution.  SBR works only with incoming packets and their responses.  A packet that originates from the cluster, and is not a response to a client, still requires processing through the standard routing tables.  This means that requests from the cluster such as DNS lookups, LDAP lookups, AD lookups, e-mail, SNMP and other outgoing traffic are not handled by SBR and follow standard routing rules, including static routes.

 

There is still a need to support multiple routing tables.  As well, the current SBR implementation prevents the use of static routes for the return path which can be problematic in some network topologies.

 

What about NIC affinity?

NIC affinity is a sysctl that can be configured in OneFS.  This setting only applies when you have multiple NICs on the same node connected to the same subnet, and it is normally enabled automatically in that case.  Its purpose is to have response packets go out the same NIC on which the request arrived.  It does this by looking at the source IP address of the response packet: whichever interface is currently configured with that IP address is the interface on which the packet will be sent.
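The lookup described above can be pictured with a sketch like this one.  The interface names and addresses are hypothetical, and the function is only a model of the idea, not the OneFS sysctl or its code.

```python
# Hypothetical node with two NICs on the same subnet (names and IPs are made up).
INTERFACE_ADDRESSES = {
    "ext-1": ["10.3.1.91"],
    "ext-2": ["10.3.1.94"],
}

def outgoing_interface(response_source_ip, interfaces=INTERFACE_ADDRESSES):
    """Return the interface that currently owns the response's source IP."""
    for name, addresses in interfaces.items():
        if response_source_ip in addresses:
            return name
    return None

print(outgoing_interface("10.3.1.94"))  # the NIC configured with 10.3.1.94
```

If an IP address moves to another interface, the mapping changes with it, so the response follows the address rather than a fixed NIC.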

 

How do routing tables, SBR and NIC affinity interact?

For data that originates from the cluster, standard routing rules always apply.

 

When SBR is enabled, response packets are sent back using dynamically generated forwarding rules derived from the incoming packets.


With SBR enabled, the routing table and all static routes are currently bypassed for response packets.  When SBR is disabled, the standard routing table and static routes are in use.  Packets that originate from the cluster follow the routing table and static routes regardless of SBR.
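The rules above reduce to a small decision table.  The sketch below only restates them; the function name and the returned labels are illustrative.

```python
def routing_mechanism(is_response, sbr_enabled):
    """Which mechanism decides the path, per the rules stated above."""
    if is_response and sbr_enabled:
        return "SBR forwarding rule"
    return "routing table and static routes"

for is_response in (True, False):
    for sbr_enabled in (True, False):
        kind = "response" if is_response else "originating"
        state = "on" if sbr_enabled else "off"
        print(f"{kind:11} SBR {state:3} -> {routing_mechanism(is_response, sbr_enabled)}")
```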

 

NIC affinity is enabled or disabled normally by OneFS itself.  It does this to balance the outgoing traffic so that not all the traffic leaves on a single interface.

 

All three features have their place and work together to get packets to where they need to go as efficiently as possible.

 

Running your own virtual test bed

  1. Download the OneFS v7.2.0.0 simulator
  2. Uncompress the .zip into a directory of your choice
  3. Modify the .vmx files to add an additional NIC
    1. In the first three clone directories (clone1, clone2 and clone3) edit the b.7.2.0.16r.vga.cloneX.vmx file (replacing X with 1, 2 and 3)
    2. Find the line that shows:
      guestinfo.dongle_sysconfig = "1337-1337-1337"
      Modify the line as follows:
      guestinfo.dongle_sysconfig = "1337-1337-1339"
  4. Add an extra NIC to each virtual machine
    1. Set the NIC to use a LAN Segment called DATA.
  5. Start up the virtual machines as normal and create a cluster.
  6. For the external interface configure it to this subnet:
    192.168.100.81-.83/255.255.255.0, gateway 192.168.100.10 priority 1
    Add the ext-1 interfaces to the pool
  7. For the second external interface, create a new zone, Zone X and configure it to this subnet:
    10.3.1.91-.93/255.255.255.0, gateway 10.3.1.1 priority 10
    Add the ext-2 interfaces to the pool
  8. Download a Linux LiveCD image.  These instructions use an Ubuntu Desktop image but can be adapted to other distributions.
  9. Create 4 virtual Linux machines and use the same LiveCD for each.  2 VMs will act as routers and 2 will act as clients.  After booting up the LiveCD image click on "Try Ubuntu" to quickly bring up a usable machine.
    1. On the 2 router VMs you need to add an additional NIC.
    2. On Router A, leave the primary NIC set to NAT.  On the secondary NIC set it to use a LAN segment called CLIENT1.
    3. On Router B, set the primary NIC to use a LAN segment called DATA and the secondary NIC as a LAN segment called CLIENT2.
    4. On Client A1 VM, set the primary NIC to use LAN segment CLIENT1.
    5. On Client B1 VM, set the primary NIC to use LAN segment CLIENT2.
  10. Boot the Router A VM.  On this machine configure the network interfaces.
    1. Use the network manager to modify the IP addresses.
    2. Configure eth0 to be static and set address to 192.168.100.10/255.255.255.0, GW (Your VMware NAT gateway)
    3. Configure eth1 to be static and set address to 10.2.1.1/255.255.255.0, GW None
    4. Make yourself root to run privileged commands
      sudo su
    5. In a terminal window enable IP forwarding.
      echo 1 > /proc/sys/net/ipv4/ip_forward
  11. Boot the Router B VM.  On this machine configure the network interfaces.
    1. Use the network manager to modify the IP addresses.
    2. Configure eth0 to be static and set address to 10.3.1.1/255.255.255.0
    3. Configure eth1 to be static and set address to 10.4.1.1/255.255.255.0
    4. In a terminal window enable IP forwarding.
      echo 1 > /proc/sys/net/ipv4/ip_forward
  12. Verify interfaces are working.
    1. From a node on the cluster ping 192.168.100.10
    2. From a node on the cluster ping 10.3.1.1
    3. From a node on the cluster ping 10.2.1.1
    4. From a node on the cluster ping 10.4.1.1
      This ping should fail as we do not have any static route set up and the default gateway is 192.168.100.10
    5. From your host ping 192.168.100.10
  13. Boot Client A1 VM
    1. Use the network manager to modify the IP address.
    2. Configure eth0 to be static and set address to 10.2.1.50/255.255.255.0 with a gateway of 10.2.1.1
  14. Boot Client B1 VM
    1. Use the network manager to modify the IP address.
    2. Configure eth0 to be static and set address to 10.4.1.50/255.255.255.0 with a gateway of 10.4.1.1
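Before powering everything on, it can be worth sanity-checking the address plan from the steps above.  The sketch below uses Python's standard ipaddress module to confirm that each gateway sits inside the subnet of the host that uses it; the PLAN list simply restates addresses from this document, and gateway_reachable is a hypothetical helper name.

```python
import ipaddress

# (host, address/mask, gateway) as configured in the steps above.
PLAN = [
    ("cluster ext-1", "192.168.100.81/255.255.255.0", "192.168.100.10"),
    ("cluster ext-2", "10.3.1.91/255.255.255.0", "10.3.1.1"),
    ("Client A1", "10.2.1.50/255.255.255.0", "10.2.1.1"),
    ("Client B1", "10.4.1.50/255.255.255.0", "10.4.1.1"),
]

def gateway_reachable(address, gateway):
    """A gateway is only usable if it sits inside the host's own subnet."""
    iface = ipaddress.ip_interface(address)
    return ipaddress.ip_address(gateway) in iface.network

for host, address, gateway in PLAN:
    status = "ok" if gateway_reachable(address, gateway) else "gateway outside subnet"
    print(f"{host}: {status}")
```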

 

Turning on SBR

  1. On any node of the cluster
    isi networks sbr enable

 

Testing SBR

  1. From Client A1, ping 192.168.100.81.  This should work.
  2. From Client B1, ping 10.3.1.91.  This should work.
  3. Disable SBR
    isi networks sbr disable
  4. See ping from Client B1 fail.
  5. Re-enable SBR
    isi networks sbr enable
  6. See ping from Client B1 succeed again.

 

Duplicating subnets

  1. On Client B1, use the network manager and modify the IP address to 10.2.1.50/255.255.255.0 with a gateway of 10.2.1.1
  2. On Router B, use the network manager and modify the IP address of eth1 to 10.2.1.1/255.255.255.0, GW none
  3. Change the LAN segment so Router B and the cluster only talk on 1 network
    1. On Router B, in the VM settings for the primary Network Adapter, change it to NAT
    2. On the cluster, change the interfaces in 10.3.1.0 subnet from ext-2 to ext-1.  This will make both subnets run in the NAT LAN segment.  However, they are still in 2 different zones.
    3. On the cluster change the 10.3.1.0 subnet from being in Zone X to the System zone.
      Now both subnets live in the same Access Zone and on the same interfaces.

 

Comments are welcome!  Please start a discussion here if you have any other questions on Source Based Routing or routing in general.

 

 

Edit notes:

2014-12-12

Adding additional information and steps on how to reproduce in VM environment