Design and Implementation of a Linux-based Content Switch

C. Edward Chow and Weihong Wang
Department of Computer Science
University of Colorado at Colorado Springs
1420 Austin Bluffs Parkway
Colorado Springs, CO 80933-7150 USA
Email: chow@cs.uccs.edu, wwang@cs.uccs.edu
Tel: 2-1-719-262-3110  Fax: 2-1-719-262-3369
Name of Corresponding Author: C. Edward Chow

Keywords: Internet Computing, Cluster Computing, Content Switch, Network Routing

Abstract

In this paper we present the design of a Linux-based content switch, discuss ways to improve TCP delayed binding, and report the lessons learned from implementing the switch. A content switch routes packets based on their upper-layer protocol headers and their payload content. We discuss the processing overhead and the design of the content switching rules. Our content switch can be configured as the front-end dispatcher of a web server cluster or as a firewall. By implementing HTTP header extraction and XML tag extraction, the content switch can load balance requests based on the file extension in the URL and route large purchase requests expressed in XML to faster servers in an e-commerce system. The rules and the content switching rule matching algorithm are implemented as a kernel module and can therefore be replaced without restarting the system. With additional SMTP header extraction, the switch can be configured as a spam mail filter or a virus detection/removal system.

1 Introduction

With the rapid increase of Internet traffic, the workload on servers is increasing dramatically. Servers are easily overloaded nowadays, especially popular web servers. One solution to the server overloading problem is to build a scalable server on a cluster of servers [1][2]. A load balancer is used to distribute incoming requests among the servers in the cluster. Load balancing can be done at different network layers. A web switch is an application-level (layer 7) switch that examines the headers from layer 3 all the way up to the HTTP header of an incoming request to make its routing decision. By examining the HTTP header, a web switch can provide a higher level of control over incoming web traffic and decide how individual web pages, images, and media files are served from the web site. This level of load balancing is very helpful when the web servers are optimized for specific functions, such as image serving, SSL (Secure Socket Layer) sessions, or database transactions. With a generic header/content extraction module and a rule matching algorithm, a web switch can be extended into a content switch [3, 4] that routes packets of other application layer protocols as well, such as SMTP, IMAP, POP, and RTSP. By specifying different sets of rules, the content switch can easily be configured as a load balancer, a firewall, a policy-based network switch, a spam mail filter, or a virus detection/removal system.

1.1 Goals and Motivation for the Content Switch

Traditional load balancers, known as Layer 4 (L4) switches, examine IP and TCP headers, such as IP addresses or TCP and UDP port numbers, to determine how to route packets. Since L4 switches are content blind, they cannot take advantage of the content of the request messages when distributing load. For example, many e-commerce sites use secure connections for transporting private client information. Using SSL session IDs to maintain server persistence is the most accurate way to bind all of a client's connections during an SSL session to the same server. A content switch can examine the SSL session ID of the incoming packets; if a packet belongs to an existing SSL session, the connection is assigned to the same server that handled the previous portions of that session. If the connection is new, the switch assigns it to a real server according to the configured load balancing algorithm, such as weighted least connections or round robin. Because L4 switches do not examine the SSL session ID, which resides in layer 5, they cannot obtain enough information from the web request to provide persistent connections.
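As a concrete illustration of the session ID examination step, the sketch below shows one way the session ID could be located inside an SSLv3/TLS ClientHello message. The function name ssl_session_id and its defensive checks are our own illustration under that assumption, not the extraction code of LCS.

    /* Illustrative sketch: locate the session ID in an SSLv3/TLS ClientHello.
     * Offsets follow the standard record and handshake layouts; this is not
     * the header/content extraction code used in LCS. */
    #include <stddef.h>
    #include <stdint.h>

    /* Returns the session ID length (0-32) and points *id at the ID bytes,
     * or returns -1 if the buffer does not look like a ClientHello. */
    int ssl_session_id(const uint8_t *p, size_t len, const uint8_t **id)
    {
        size_t off = 5 + 4 + 2 + 32;   /* record hdr + handshake hdr + version + random */
        int id_len;

        if (len < off + 1 || p[0] != 0x16 || p[5] != 0x01)
            return -1;                 /* not a handshake record / ClientHello */
        id_len = p[off];
        if (id_len > 32 || len < off + 1 + (size_t)id_len)
            return -1;
        *id = p + off + 1;             /* id_len == 0 indicates a new session */
        return id_len;
    }

An empty session ID marks a new session; the switch would then fall back to its load balancing algorithm and remember the resulting binding for later packets of the session.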
Web switches can also perform URL-based load balancing. URL-based load balancing looks into the incoming HTTP request and, based on the URL information, forwards the request to the appropriate server according to predefined policies and the dynamic load on the servers. XML has been proposed as the language for describing e-commerce requests. A web switch for an e-commerce system should therefore be able to route requests based on the values of specific tags in an XML document. This allows requests from a specific customer, or requests above a certain purchase amount, to be processed differently. The capability to provide differentiated services is the major function provided by the web switch. The Intel XML distributor is one such example; it can route requests based on the URL and the XML tag sequence [3]. A content switching system can achieve better performance by load balancing requests over a set of specialized web servers, or achieve consistent user-perceived response times through persistent connections, also called sticky connections.

1.2 Related Content Switching Techniques

1.2.1 Proxy Server

Application-level proxies [5, 6] are in many ways functionally equivalent to content switches. They classify incoming requests, match them against predefined classes, and then decide whether to forward a request to the origin server or serve the page directly from the proxy, based on the proxy server's predefined behavior policies. If the data is not cached, the proxy server establishes two TCP connections: one to the source and a separate connection to the destination. The proxy server works as a bridge between the source and the destination, copying data between the two connections. Our proposed Linux-based Content Switch (LCS) is implemented in the kernel IP layer. It reduces protocol processing time and provides more flexible content switching rules and tighter integration with load balancing algorithms.

1.2.2 Microsoft NLB

Microsoft Windows 2000 Network Load Balancing (NLB) distributes incoming IP traffic to multiple copies of a TCP/IP service, such as a web server, each running on a host within the cluster. NLB transparently partitions the client requests among the hosts and lets the clients access the cluster using one or more "virtual" IP addresses. As enterprise traffic increases, network administrators can simply plug another server into the cluster. With NLB, the cluster hosts respond concurrently to different client requests, even multiple requests from the same client. For example, a web browser may obtain the various images within a single web page from different hosts in a load-balanced cluster. This speeds up processing and shortens the response time to clients. LCS provides more flexible content switching rules than NLB.

1.2.3 Linux LVS

Linux Virtual Server (LVS) is a load balancing server built into the Linux kernel [1]. In an LVS server cluster, the front end of the real servers is a load balancer, also called the virtual server, which schedules incoming requests to different real servers and makes the parallel services of the cluster appear as a single virtual service on one IP address. A real server can be added to or removed from the cluster transparently. The load balancer can also detect the failure of a real server and redirect requests to an active real server. LVS is a transport-level load balancer built into the IP layer of the Linux kernel. An incoming request first arrives at the load balancer and is forwarded to one of the real servers chosen by an existing load balancing algorithm. The load balancer uses the client IP address and port number as the key to create an entry in its connection hash table, and stores the IP address and port number of the assigned real server as the value of that entry. When subsequent packets of the connection arrive, the load balancer looks up the hash entry by IP address and port number, retrieves the IP address and port number of the assigned real server, and redirects the packets to it.
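The sketch below illustrates this kind of connection table lookup, keyed by client address and port. The structure and function names (lcs_conn, conn_lookup) and the hash function are illustrative assumptions rather than the actual LVS or LCS code.

    /* Illustrative sketch of a connection table keyed by client address/port.
     * The real LVS/LCS data structures differ; this only shows the lookup idea. */
    #include <stddef.h>
    #include <stdint.h>

    #define CONN_TAB_SIZE 4096

    struct lcs_conn {
        uint32_t caddr;              /* client IP address            */
        uint16_t cport;              /* client port                  */
        uint32_t raddr;              /* assigned real server IP      */
        uint16_t rport;              /* assigned real server port    */
        struct lcs_conn *next;       /* chaining for hash collisions */
    };

    static struct lcs_conn *conn_tab[CONN_TAB_SIZE];

    static unsigned conn_hash(uint32_t caddr, uint16_t cport)
    {
        return (caddr ^ (caddr >> 16) ^ cport) % CONN_TAB_SIZE;
    }

    /* Subsequent packets of an established connection match an entry here and
     * are redirected to the real server recorded in it; a miss means the packet
     * starts a new connection that must first be scheduled to a real server. */
    struct lcs_conn *conn_lookup(uint32_t caddr, uint16_t cport)
    {
        struct lcs_conn *cp;
        for (cp = conn_tab[conn_hash(caddr, cport)]; cp != NULL; cp = cp->next)
            if (cp->caddr == caddr && cp->cport == cport)
                return cp;
        return NULL;
    }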
2 Linux-based Content Switch Design

The Linux-based Content Switch (LCS) is based on the Linux 2.2.16 kernel and the related LVS package. LVS is a layer 4 load balancer that forwards an incoming request to a real server by examining the IP address and port number and applying an existing scheduling algorithm. We modified and extended the LVS source code with new content switching functions. LCS examines the content of the request, e.g., the URL in the HTTP header and the XML payload, in addition to the IP address and port number, and forwards the request to a real server according to predefined content switching rules. Content switching rules are expressed as a set of simple if statements. These if statements contain conditions expressed in terms of fields in the protocol headers or patterns in the payload, and branch statements describing the routing decisions. The details of the content switching rules are presented in Section 4.

2.1 The Architecture and Operation of LCS

Figure 1 shows the overall architecture of LCS. The content switch schedule control module is the main process of the content switch; it manages the packet flow. The routing decision, INPUT rules, FORWARD rules, and OUTPUT rules are original modules in the Linux kernel, modified to work with the content switch schedule control module. The content switch rules module holds the predefined rule table, which the schedule control module uses to control the flow of packets. The connection hash table hashes existing connections to speed up the forwarding process. The LVS configuration and content switch configuration are user-space tools used to define the content switch server clusters and the content switching rules.

Figure 2 shows the main operations of the content switch. The administrator of the content switch uses a content switch (CS) rule editor to create the content switching rules. The rule editor can be a simple text editor or an enhanced editor with a GUI and a built-in conflict detection function that checks for conflicts within the rule set. Examples of conflicts include duplication, two rules with the same condition but different routing actions, two rules whose conditions intersect, and two rules whose conditions are subsets of each other but appear in the improper order. We have built an interactive Java-based editor with rule conflict detection.
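To make the notion of a rule conflict concrete, the following hypothetical pair of rules, written in the if-statement rule notation used by LCS with invented server group names ImageServers and ArchiveServers, has the same condition but different routing actions:

    if (match(url, ".gif")) { routeTo(ImageServers, NONSTICKY); }
    if (match(url, ".gif")) { routeTo(ArchiveServers, NONSTICKY); }

Whichever rule appears first shadows the other, so the editor's conflict detection would flag the pair for the administrator to resolve.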
The rule set is translated into an internal function that is called by the rule matching algorithm and downloaded into the kernel as a module. The header fields and XML tag sequences mentioned in the rule set are saved in an array so that the header content extraction function extracts only the data needed for the execution of the rules. When an incoming packet arrives, the content switching schedule control module calls the header content extraction function to extract the values of the headers or XML tags mentioned in the rule set and places them in an array for rule matching. The content switching schedule control module then calls the rule matching algorithm to evaluate the conditions against the extracted values. If the condition of a rule matches the header and content of the packet, the routing instructions in the action part of the rule, i.e., the branch statement of the if statement, are carried out. A content switching rule has the simple syntax

if (condition) { action1 } [ else { action2 } ]

One example of a content switching rule is

if (match(url, "process.pl") && xml.purchase/totalAmount > 5000) { routeTo(FastServers, NONSTICKY); }

Here, only a packet whose URL field (the absolute path field of the HTTP request) ends with process.pl, whose payload contains an XML document, and whose tag sequence purchase/totalAmount has a value greater than 5000 is routed to one of the FastServers, selected by a load balancing decision. The connection is NONSTICKY, i.e., future requests on the same connection go through content switch rule matching again.

For load balancing performance purposes, the network path information module collects network bandwidth information between the client subnet and those of the servers, and the server load status module collects information from the servers about their current pending queues and processing speed. If the routing decision is to load balance among a set of servers, the load balancing algorithm can retrieve the network path and server load information and make smarter decisions.

3 TCP Delayed Binding

Many upper-layer protocols use TCP for reliable, in-order message delivery. The TCP connection is established via a three-way handshake, and the client does not deliver the upper-layer information until the handshake is complete. The content switch therefore completes the handshake with the client to obtain the request content, then selects the real server, establishes a three-way handshake with the real server, and serves as a bridge that relays packets between the two TCP connections. This is called TCP delayed binding. To hide the complexity of the clustering system, the client deals only with a virtual IP address (VIP); therefore all subsequent packets from the client go through the content switch. It would be more efficient if the return packets from the server could bypass the content switch and go directly to the client. However, since the sequence number committed by the content switch and the sequence number committed by the real server differ by the nature of the TCP protocol, every subsequent packet between the client and the server requires sequence number modification. Because the sequence number field changes, the checksum in the TCP header also needs to be recomputed. This imposes significant processing overhead on the content switch. We will discuss detail
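As an illustration of the per-packet adjustment just described, the sketch below rewrites the sequence number of a server-to-client packet by a fixed delta (the difference between the initial sequence number the content switch committed to the client and the one chosen by the real server) and patches the TCP checksum incrementally instead of recomputing it over the whole segment. The function names, the precomputed delta, and the RFC 1624-style incremental update are our own illustrative choices, not necessarily how LCS performs the adjustment.

    /* Illustrative sketch: translate the TCP sequence number of a
     * server-to-client packet and patch the checksum incrementally.
     * Uses the Linux struct tcphdr; not the actual LCS kernel code. */
    #include <arpa/inet.h>
    #include <netinet/tcp.h>
    #include <stdint.h>

    /* Replace one 16-bit word of checksummed data (one's-complement update). */
    static uint16_t csum_replace2(uint16_t check, uint16_t old16, uint16_t new16)
    {
        uint32_t sum = (uint16_t)~check + (uint16_t)~old16 + new16;
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);   /* fold end-around carries */
        return (uint16_t)~sum;
    }

    /* delta = (ISN committed by the switch to the client) - (ISN of the server).
     * Client-to-server packets would have their ACK numbers adjusted by -delta
     * in the same way. */
    void seq_translate(struct tcphdr *th, uint32_t delta)
    {
        uint32_t new_seq = htonl(ntohl(th->seq) + delta);
        uint16_t *oldw = (uint16_t *)&th->seq;    /* the two 16-bit wire words */
        uint16_t *neww = (uint16_t *)&new_seq;

        th->check = csum_replace2(th->check, oldw[0], neww[0]);
        th->check = csum_replace2(th->check, oldw[1], neww[1]);
        th->seq = new_seq;
    }

Patching the checksum only from the old and new sequence words avoids touching the payload, which keeps the per-packet relaying overhead bounded.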