Production environment: ArcIMS 9.2 SP4 running on Windows 2003 x86 and ArcSDE 9.2 SP4 running on MSSQL 2005 SP2 / Windows 2003 x86. From what I read, I believe the problem exists on later versions of ArcIMS as well.
Essentially, under periods of high load (especially if someone decided to start image tiling our maps), one or more virtual servers (image servers) would stop responding. We then had to gracefully stop all ArcIMS services and manually kill the process for the particular image server that had stopped responding. This also left dead processes hanging on our SDE server, which in turn caused problems there because of the high number of connections. Some days this happened 5-6 times, which quite understandably left our users very frustrated. We weren't able to pinpoint any particular event that caused it: the server had been running fine for two years before it gradually became more and more unstable over the last few months. The only pattern we saw was that it seemed to affect the most heavily accessed image servers most often, but not always the same one. It also only affected image servers (not feature servers, query servers, etc.).
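Detecting a hung image server early is the kind of check that lends itself to a small script. The following is only a minimal sketch, not something from our setup: it assumes each map service can be probed over plain HTTP, and the function names, the probe URLs and the 10-second timeout are all hypothetical. It reports which services failed to answer in time and leaves the restart itself to the operator.

```python
import http.client
import urllib.request


def is_responsive(url, timeout_s=10.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with an HTTP 2xx status within `timeout_s` seconds."""
    try:
        with opener(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Timeouts, refused connections and HTTP errors all count as "not responding".
        return False


def unresponsive_services(urls_by_service, timeout_s=10.0, opener=urllib.request.urlopen):
    """Return the sorted names of services whose probe URL did not answer in time."""
    return sorted(
        name
        for name, url in urls_by_service.items()
        if not is_responsive(url, timeout_s, opener)
    )
```

Run from a scheduled task every few minutes, the returned list could be written to a log or mailed to whoever is on call. The `opener` parameter exists only so the probe can be exercised without a live server.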
Here are some of the things we did:
- Restarted all virtual servers every night (basically all ArcIMS processes).
- Trimmed axl-files to reduce the number of layers and remove any (minor) errors and warnings during image server startup.
- Enforced stricter scaling restrictions for heavy datasets.
- Increased memory for Tomcat.
- Added more CPUs and RAM to provide better load balancing.
- Increased the number of image servers (virtual servers) and gave each image server more instances.
- Gave each image server (virtual server) exclusive access to its own ‘server’.
- Increased the frequency of log file rotation to avoid appending to huge log files, and cleaned all temp folders frequently.
- Increased logging to debug level and tried playing back map requests in order to reproduce the problem, but weren't able to do that consistently.
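The temp-folder cleanup in the list above is easy to automate. Here is a minimal sketch, assuming the temp folders hold only disposable files and that anything older than a chosen age can go; the function name and the age threshold are hypothetical, and the folder path is whatever your installation uses.

```python
import time
from pathlib import Path


def remove_old_files(folder, max_age_hours, now=None):
    """Delete regular files under `folder` last modified more than `max_age_hours` ago.

    Returns the list of deleted paths. Subfolders are searched but never removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    deleted = []
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            try:
                path.unlink()
                deleted.append(path)
            except OSError:
                # The file is probably still held open by a running process; skip it.
                pass
    return deleted
```

Calling `remove_old_files(some_temp_folder, 2)` from a scheduled task would clear everything older than two hours. Files that are still locked are skipped rather than treated as errors, so the script can run while the services are up.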
Well, long story short: I altered the most vulnerable virtual servers to use two servers with one instance each, rather than 2 (or more) instances, as shown below:
What a difference. Our ArcIMS environment became rock solid overnight. We’ve experienced only a couple of issues in the three weeks since. I can finally go home and not have to go over to the computer to restart ArcIMS every few hours. As expected, processing is a little slower than it used to be with more instances, but that’s a small price to pay. Besides, in a year or so we’re going to replace ArcIMS with ArcGIS Server anyway.
I'm going to try this. We have issues with arcims now. Thank you.