Keep Connections Alive : Browser - Load Balancer - Web Server - mod jk - Tomcat : Part 1

This series is about how connections are kept alive between different components along the route.
 Browser - Load Balancer - Web Server - mod jk - Tomcat

Part 1 : Browser - Load Balancer
Part 2 : Load Balancer - Apache HTTP Server
Part 3 : Apache HTTP Server - mod_jk
Part 4 : mod_jk - Tomcat

 

Browser -> Load Balancer


How is the connection between web browser and load balancer maintained?

I created a test.jsp that does nothing but sleep for 6 minutes.
Thread.sleep(6 * 60 * 1000); // sleep for 6 minutes (360,000 ms)
I deployed it to the Tomcat server and accessed it from IE. After 5 minutes, I got the "Internet Explorer cannot display the webpage" error.



Troubleshooting

This is a difficult error to troubleshoot because, on the server side, everything looks fine.


In mod_jk.log, the following entry appeared 1 minute after IE timed out.
[Wed Sep 19 10:31:11 2012] 360.003697 APP-INF 200 /Test/test.jsp

In the Apache web server access.log, I found the following entries. The test.jsp entry was written about 6 minutes late compared with the surrounding entries, but its status code is 200, which means OK.

[19/Sep/2012:10:30:59 -0400] "POST /blah"
[19/Sep/2012:10:25:11 -0400] "GET /Test/test.jsp HTTP/1.1" 200 207
[19/Sep/2012:10:31:30 -0400] "POST /blah"
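The two logs can be cross-checked with a little arithmetic. The sketch below is only an illustration, and it rests on two assumptions that are not stated in the logs themselves: that the access.log timestamp records when the request arrived, and that the 360.003697 field in the mod_jk entry is the elapsed time in seconds. The class and method names (LogTimeline, finishedAt) are made up for this example.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class LogTimeline {
    // Timestamp layout used in the access.log entries above,
    // e.g. 19/Sep/2012:10:25:11 -0400
    static final DateTimeFormatter ACCESS_LOG =
            DateTimeFormatter.ofPattern("dd/MMM/uuuu:HH:mm:ss Z", Locale.ENGLISH);

    // Adds the elapsed time (in seconds) to the logged start time.
    // Assumption: the logged timestamp is the moment the request arrived.
    static OffsetDateTime finishedAt(String startStamp, double elapsedSeconds) {
        OffsetDateTime start = OffsetDateTime.parse(startStamp, ACCESS_LOG);
        long elapsedMillis = Math.round(elapsedSeconds * 1000);
        return start.plus(Duration.ofMillis(elapsedMillis));
    }

    public static void main(String[] args) {
        // Start time from access.log, elapsed seconds from mod_jk.log
        OffsetDateTime end = finishedAt("19/Sep/2012:10:25:11 -0400", 360.003697);
        System.out.println(end.format(ACCESS_LOG));
    }
}
```

Under those assumptions, the computed finish time is 10:31:11, which is the timestamp on the mod_jk.log entry. So the request really did run for the full 6 minutes on the server and completed normally.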


Interestingly, when I hit the URL from Google Chrome, everything was fine: after 6 minutes, the page was displayed. So what's the difference between IE and Google Chrome? It turns out that Google Chrome keeps sending keep-alive signals to the server, which keeps the connection alive, and IE doesn't.

http://serverfault.com/questions/275409/keep-alive-and-timeout-behaviours-between-different-browsers-on-windows

So it was the connection between IE and the load balancer that was dropped, and it was dropped by the load balancer. According to the F5 load balancer manual, the TCP profile has an "Idle Timeout" setting that "specifies the number of seconds that a connection is idle before the connection is eligible for deletion". The default is 5 minutes (300 seconds).

http://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/ltm_configuration_guide_10_1/ltm_protocol_profiles.html#1211081

 

The Fix

Following the F5 documentation, we changed the "Keep Alive Interval" setting on the load balancer TCP profile from 30 minutes (1800 seconds) to 60 seconds.

http://support.f5.com/kb/en-us/solutions/public/8000/000/sol8049.html

After the change, IE was able to display the page after 6 minutes, instead of showing the notorious "Internet Explorer cannot display the webpage" error. One can argue that normal pages don't take 6 minutes to display, but I have seen enough applications take that long, either by design or under unusual conditions. Besides, in principle, a load balancer should not time out the connection with the client while keeping the connection with the backend open.
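The reasoning behind the fix can be written down as a small model. This is a deliberately simplified sketch, not a description of how the F5 load balancer is implemented: it assumes that each keep-alive probe counts as activity and resets the idle timer, which is consistent with what we observed. The class and method names (IdleTimeoutModel, survives) are hypothetical.

```java
class IdleTimeoutModel {
    // Returns true if a connection that carries no application traffic for
    // quietSeconds stays open.
    // Assumption: a keep-alive probe resets the idle timer, so the longest
    // silent gap is the shorter of the quiet period and the probe interval.
    static boolean survives(int quietSeconds, int idleTimeoutSeconds,
                            int keepAliveIntervalSeconds) {
        int longestGap = Math.min(quietSeconds, keepAliveIntervalSeconds);
        return longestGap < idleTimeoutSeconds;
    }

    public static void main(String[] args) {
        int request = 6 * 60;     // test.jsp sleeps for 6 minutes
        int idleTimeout = 5 * 60; // idle timeout of 5 minutes

        // Keep Alive Interval of 1800 seconds (before the change)
        System.out.println(survives(request, idleTimeout, 1800));
        // Keep Alive Interval of 60 seconds (after the change)
        System.out.println(survives(request, idleTimeout, 60));
    }
}
```

With the 1800-second interval, no probe is sent during the 6-minute request, so the connection sits idle past the 5-minute timeout and the model reports that it does not survive. With the 60-second interval, the longest silent gap is 60 seconds and the connection stays open.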

More Things to Consider

This takes care of the connection between web browser and load balancer. How about the connection between load balancer and web server? It is discussed in part 2 of this topic.
