This series is about how connections are kept alive between different components along the route.
Browser - Load Balancer - Web Server - mod_jk - Tomcat
Part 1: Browser - Load Balancer
Part 2: Load Balancer - Apache HTTP Server
Part 3: Apache HTTP Server - mod_jk
Part 4: mod_jk - Tomcat
Browser -> Load Balancer
How is the connection between web browser and load balancer maintained?
I created a test.jsp that does nothing but sleep for 6 minutes:
Thread.sleep(6 * 60 * 1000); // sleep for 6 minutes (360,000 ms)
I deployed it to the Tomcat server and accessed it from IE. After 5 minutes, I got the "Internet Explorer cannot display the webpage" error.
Troubleshooting
This is a difficult error to troubleshoot because, on the server side, everything looks fine. In mod_jk.log, I found the following entry, written 1 minute after IE timed out.
[Wed Sep 19 10:31:11 2012] 360.003697 APP-INF 200 /Test/test.jsp
From the Apache web server access.log, I got the following entries. The test.jsp request is stamped 10:25:11 but sits between entries stamped 10:30:59 and 10:31:30, which shows that the page was served about 6 minutes after the request arrived. The status code is still 200, which means OK.
[19/Sep/2012:10:30:59 -0400] "POST /blah"
[19/Sep/2012:10:25:11 -0400] "GET /Test/test.jsp HTTP/1.1" 200 207
[19/Sep/2012:10:31:30 -0400] "POST /blah"
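The out-of-order timestamp in the access.log entries above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the timestamps use Apache's default %t layout (time the request was received) and that entries are written when the response completes. The class name AccessLogGap and its methods are made up for this example.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class AccessLogGap {
    // Assumed layout of Apache's default %t field, e.g. 19/Sep/2012:10:25:11 -0400
    private static final DateTimeFormatter APACHE_TIME =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.US);

    static OffsetDateTime parse(String stamp) {
        return OffsetDateTime.parse(stamp, APACHE_TIME);
    }

    /** Seconds by which an entry's timestamp lags the entry logged just before it. */
    static long lagSeconds(String previousEntry, String thisEntry) {
        return Duration.between(parse(thisEntry), parse(previousEntry)).getSeconds();
    }

    public static void main(String[] args) {
        String before = "19/Sep/2012:10:30:59 -0400";
        String slow = "19/Sep/2012:10:25:11 -0400";
        String after = "19/Sep/2012:10:31:30 -0400";

        // The test.jsp request started 348 seconds before its neighbour was logged.
        System.out.println("lag in seconds: " + lagSeconds(before, slow));

        // Start time plus the 6-minute sleep should land between the two neighbours.
        OffsetDateTime finished = parse(slow).plusSeconds(6 * 60);
        boolean consistent = !finished.isBefore(parse(before)) && !finished.isAfter(parse(after));
        System.out.println("finish time fits between neighbours: " + consistent);
    }
}
```

The computed finish time, 10:31:11, also matches the timestamp of the mod_jk.log entry.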
Interestingly, when I hit the URL from Google Chrome, everything was fine: after 6 minutes, the page was displayed. So what's the difference between IE and Google Chrome? It turns out that Chrome keeps sending keep-alive signals to the server, which keep the connection from going idle, and IE doesn't.
http://serverfault.com/questions/275409/keep-alive-and-timeout-behaviours-between-different-browsers-on-windows
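For readers who want to see what a keep-alive-enabled client connection looks like in code, here is a minimal Java sketch. It assumes the keep-alive signals in question are TCP keep-alive probes (SO_KEEPALIVE); the class KeepAliveClient and the loopback demo are hypothetical and only show the socket option being switched on, not how any browser is actually implemented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class KeepAliveClient {
    /** Opens a client socket and asks the OS to send TCP keep-alive probes on it. */
    static Socket connectWithKeepAlive(InetAddress host, int port) throws IOException {
        Socket socket = new Socket(host, port);
        socket.setKeepAlive(true); // SO_KEEPALIVE; probe timing is decided by the OS
        return socket;
    }

    /** Connects to a throwaway loopback listener and reports whether keep-alive is on. */
    static boolean loopbackDemo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = connectWithKeepAlive(server.getInetAddress(), server.getLocalPort())) {
            return client.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("keep-alive enabled: " + loopbackDemo());
    }
}
```

Note that setKeepAlive only turns the option on. The operating system decides when probes go out, and OS defaults are often much longer than a 5-minute idle timeout, so a client relying on defaults alone may still be dropped.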
So it was the connection between IE and the load balancer that was dropped, and it was the load balancer that dropped it. According to the F5 load balancer manual, the TCP profile has an "Idle Timeout" setting that "specifies the number of seconds that a connection is idle before the connection is eligible for deletion". The default is 5 minutes (300 seconds).
http://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/ltm_configuration_guide_10_1/ltm_protocol_profiles.html#1211081
The Fix
Following the F5 documentation, we changed the "Keep Alive Interval" setting on the load balancer's TCP profile from 30 minutes (1800 seconds) to 60 seconds.
http://support.f5.com/kb/en-us/solutions/public/8000/000/sol8049.html
After the change, IE was able to display the page after 6 minutes instead of showing the notorious "Internet Explorer cannot display the webpage" error. One can argue that normal pages don't take 6 minutes to display, but I have seen enough applications take that long, either by design or under unusual conditions. Besides, in principle, a load balancer should not time out its connection with the client while keeping the connection with the backend open.
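The reasoning behind the fix can be written down as a toy model. This is a deliberately simplified sketch, not F5's actual algorithm: it assumes that every keep-alive exchange counts as activity and resets the idle timer, which is what the observed behavior suggests. The class IdleTimeoutModel and its methods are invented for illustration.

```java
final class IdleTimeoutModel {
    /** Longest stretch, in seconds, with no packets on the client-side connection. */
    static long longestSilentGap(long requestSeconds, long keepAliveIntervalSeconds) {
        // Either the response arrives first or a keep-alive breaks the silence first.
        return Math.min(requestSeconds, keepAliveIntervalSeconds);
    }

    /** True if the connection never stays silent long enough to hit the idle timeout. */
    static boolean survives(long requestSeconds, long idleTimeoutSeconds,
                            long keepAliveIntervalSeconds) {
        return longestSilentGap(requestSeconds, keepAliveIntervalSeconds) < idleTimeoutSeconds;
    }

    public static void main(String[] args) {
        long request = 6 * 60;     // test.jsp sleeps for 6 minutes
        long idleTimeout = 5 * 60; // default Idle Timeout quoted from the F5 manual

        // Before the fix: keep-alive every 30 minutes, so the connection is silent for 360 s.
        System.out.println(survives(request, idleTimeout, 30 * 60)); // false

        // After the fix: keep-alive every 60 s, so the silence never exceeds 60 s.
        System.out.println(survives(request, idleTimeout, 60)); // true
    }
}
```

Under the same assumption, any keep-alive interval shorter than the idle timeout would keep the connection open in this model, and 60 seconds leaves a comfortable margin below 300.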