What is this?

This is basically where I write down stuff that I work with at my job as a GIS Technical Analyst (previously system administrator). I do it because it's practical for documentation purposes (although I remove anything that might pose a security risk) and I hope it can be of use to someone out there. I frequently search the net for help myself, and this is my way of contributing.

Thursday, October 6, 2011

Installing Visual Studio 2010 (and SQL Server 2008 Express) on non-English Windows 7 x64

I'm running a Norwegian installation of Windows 7 and a few days ago I decided to install the standard edition of Visual Studio 2010. The whole installation went fine except for SQL Server 2008 Express. For some reason it did not want to install. I used to have English Windows 7, so I immediately suspected that it was language related.

The Visual Studio installer log file at C:\Program Files (x86)\Microsoft Visual Studio 10.0\Microsoft Visual Studio 2010 Professional - ENU\Logs\dd_error_vs_procore_100.txt

listed the following:
[09/26/11,00:07:03] Microsoft SQL Server 2008 Express Service Pack 1 (x64): [2] Error code -2067922940 for this component is not recognized.
[09/26/11,00:07:03] Microsoft SQL Server 2008 Express Service Pack 1 (x64): [2] Component Microsoft SQL Server 2008 Express Service Pack 1 (x64) returned an unexpected value.
[09/26/11,00:07:04] VS70pgui: [2] DepCheck indicates Microsoft SQL Server 2008 Express Service Pack 1 (x64) is not installed.
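
As an aside, a negative decimal code like the one in the log above is often easier to search for in hexadecimal. Here is a minimal C# sketch of that conversion. The class name ExitCodeDemo is made up for illustration, and the split into "facility" and "code" assumes the value follows the usual HRESULT bit layout, which the log itself does not confirm.

```csharp
using System;

static class ExitCodeDemo
{
    // Reinterpret the signed 32-bit value as unsigned and format it as 8 hex digits.
    public static string ToHex(int code)
    {
        return "0x" + unchecked((uint)code).ToString("X8");
    }

    static void Main()
    {
        int code = -2067922940; // value copied from dd_error_vs_procore_100.txt
        uint bits = unchecked((uint)code);

        Console.WriteLine(ToHex(code)); // prints 0x84BE0004

        // Assumption: HRESULT-style layout (bits 16-26 = facility, low 16 bits = code).
        Console.WriteLine("facility {0}, code {1}", (bits >> 16) & 0x7FF, bits & 0xFFFF);
        // prints facility 1214, code 4
    }
}
```

Searching for the hex form alongside the decimal one may turn up more hits, since installers do not all report the code in the same notation.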



SQL Server's own install log in C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log had a little more information, including the following:

The performance counter registry hive is corrupted. To continue, you must repair the performance counter registry hive. For more information, see http://support.microsoft.com/kb/300956.

The KB lists a long tedious procedure involving dozens of registry deletions and copying files from the Windows installation media. I did go through it, but it did not do any good.

The next thing I tried did help, however. I grabbed the latest SQL Server 2008 R2 Express x64 (with SP1) from Microsoft and installed it. Voila! Visual Studio works like a charm.

Tuesday, April 19, 2011

Unstable ArcIMS virtual servers and how to deal with it

For some time now we’ve experienced instability in our ArcIMS production environment. It has been quite difficult to deal with, and despite trying to apply all best practices from ESRI we could find, we were not able to stabilize the ArcIMS server.

Production environment: ArcIMS 9.2 SP4 running on Windows 2003 x86 and ArcSDE 9.2 SP4 running on MSSQL 2005 SP2 / Windows 2003 x86. From what I read, I believe the problem exists on later versions of ArcIMS as well.

Essentially what would happen is that during periods of high load (especially if someone decided to start image tiling our maps) one or more virtual servers (image servers) would stop responding. We then had to gracefully stop all ArcIMS services and manually kill the process for the particular image server that had stopped responding. This would also leave dead processes hanging on our SDE server, which in turn would create problems there due to a high number of connections. Some days this would happen 5-6 times and would leave our users very frustrated, quite understandably. We weren't able to pinpoint any particular event that caused it, as the server had been running fine for two years before it gradually got more and more unstable during the last few months. The only pattern we saw was that it seemed to affect the most heavily accessed image servers most often, but not always the same one. It also only affected image servers (not feature servers, query servers etc).

Here are some of the things we did:
  • Restarted all virtual servers every night (basically all ArcIMS processes). 
  • Trimmed axl-files to reduce the number of layers and remove any (minor) errors and warnings during image server startup.
  • Enforced stricter scaling restrictions for heavy datasets. 
  • Increased memory for Tomcat. 
  • Added more CPUs and RAM to provide better load balancing. 
  • Increased the number of image servers (virtual servers) and gave each image server more instances. 
  • Gave each image server (virtual server) exclusive access to its own ‘server’. 
  • Increased the frequency of log file rotation to avoid appending to huge log files, and cleaned all temp folders frequently.
  • Increased logging to debug level and tried playing back map requests in order to reproduce the problem, but weren't able to do that consistently. 
We managed to increase processing speed quite a lot, but nothing we did seemed to make it more stable. In short we were stuck. That was until I happened to read one post by Kelly Watts in this long thread: (http://forums.esri.com/Thread.asp?c=64&f=792&t=247210&ESRISessionID=P0xuaUmvWOi6GSAlHoEDo3TOHx3bdB7j6EBX2YxCMXCtb43wRjC7vkXJDv42HP-armKb) which suggested that all image servers should only use one instance each. Quite the contrary to what ESRI suggests in their best practice documents (the default is two, but for high load servers you’re encouraged to increase that).

Well, long story short – I altered the most vulnerable virtual servers to use two servers with one instance each, rather than two (or more) instances.

What a difference. Our ArcIMS environment became rock solid overnight. We’ve experienced only a couple of issues in the three weeks since. I can finally go home and not have to go over to the computer to restart ArcIMS every few hours. As expected, processing is a little slower than it used to be with more instances, but that’s a small price to pay. Besides – in a year or so we’re going to replace ArcIMS with ArcGIS Server anyway.

Thursday, March 24, 2011

Keep your log folders tidy with movelogs.exe

Do you sometimes need to keep a log folder tidy? Well, I do. All the time. We have some webservers that create an awful lot of logs, and since I didn't really find a Windows replacement for Linux's logrotate I tried to write a batch file to get rid of log files that were older than a certain number of days. Eventually I gave up and decided it was easier to write it in C#. I suppose PowerShell would have been even better, but who cares as long as the job gets done?
I suppose I'm not the only sysadmin out there who might need this from time to time, so here it is: movelogs.exe.

It will search a folder for files with a given extension, then append another extension of your choice to these files, provided they are older than a given number of days.

For instance the following command and parameters:

movelogs.exe c:\temp .log .oldlog 30
will search the folder c:\temp for all files ending with .log and attach .oldlog to the filename if the file is older than 30 days.

Personally I call movelogs from a small batch script that deletes the .oldlog files after movelogs.exe is done. In other situations you might want to move them to an external dump folder or something else.

Example myservice_deletelogs.bat:
movelogs.exe c:\temp\myservice .log .oldlog 30
del /q c:\temp\myservice\*.oldlog

I have provided the executable, but if you want to compile it yourself, here's the C# source (should compile without problems in Visual Studio).


using System;
using System.Text;
using System.IO;

namespace movelogs
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length == 4)
            {
                string strFolder = args[0];
                string strExtension = args[1];
                string strNewExtension = args[2];
                System.DateTime datetimeNow = DateTime.Now;
                try
                {
                    int intFileRenameAge = int.Parse(args[3]);
                    string[] strFileList = Directory.GetFiles(strFolder);
                    foreach (string strFileName in strFileList)
                    {
                        FileInfo fileinfoMyFile = new FileInfo(strFileName);
                        System.DateTime datetimeMyFileDate = fileinfoMyFile.LastWriteTime;
                        TimeSpan timespanFileAge = datetimeNow - datetimeMyFileDate;
                        if (strFileName.EndsWith(strExtension))
                        {
                            int intFileAge = timespanFileAge.Days;
                            if (intFileAge > intFileRenameAge)
                            {
                                Console.WriteLine("{0} is older than {1} days", strFileName, intFileRenameAge);
                                try
                                {
                                    File.Move(strFileName, strFileName + strNewExtension);
                                }
                                catch
                                {
                                    Console.WriteLine("Problem moving {0}. Possible reasons: file in use or permissions.", strFileName);
                                }
                            }
                        }
                    }
                }
                catch (DirectoryNotFoundException)
                {
                    Console.WriteLine("Problem opening folder: {0}", strFolder);
                }
                catch (FormatException)
                {
                    Console.WriteLine("Wrong format for number of days. Try a number");
                }
            }
            else
            {
                Console.WriteLine("movelogs.exe will scan a folder for files with a given extension that are older than a given number of days and then add a new extension to them.");
                Console.WriteLine("v1.0\nkjetil.pettersson@gmail.com\n");
                Console.WriteLine("Usage: ");
                Console.WriteLine("movelogs.exe [foldername] [extension] [new extension] [age in days]");
                Console.WriteLine("Example:");
                Console.WriteLine("movelogs.exe c:\\temp .log .oldlog 30");
            }
        }
    }
}
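
One detail of the age check in the source above is worth spelling out: TimeSpan.Days only counts whole days, and the comparison is a strict "greater than", so with a limit of 30 a file has to be at least 31 full days old before it gets renamed. The small sketch below isolates that rule so you can try it out. The class and method names (AgeCheckDemo, IsOldEnough) are made up for this illustration and are not part of movelogs itself.

```csharp
using System;

static class AgeCheckDemo
{
    // Same rule as in movelogs: the whole-day age (fraction dropped)
    // must be strictly greater than the limit.
    public static bool IsOldEnough(DateTime lastWrite, DateTime now, int maxAgeDays)
    {
        return (now - lastWrite).Days > maxAgeDays;
    }

    static void Main()
    {
        DateTime now = new DateTime(2011, 3, 24, 12, 0, 0);

        // 30.9 days old: Days is 30, and 30 > 30 is false, so the file is left alone.
        Console.WriteLine(IsOldEnough(now.AddDays(-30.9), now, 30)); // prints False

        // Exactly 31 days old: Days is 31, so the file would be renamed.
        Console.WriteLine(IsOldEnough(now.AddDays(-31), now, 30));   // prints True
    }
}
```

If you would rather have files renamed as soon as they pass the limit, comparing TotalDays instead of Days would be one way to do it, but that changes the behaviour described above.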
 

Friday, March 18, 2011

HP StorageWorks D2D4312 SATA-disks failing

We recently acquired an HP StorageWorks D2D4312 G2. It ran smoothly for about 2 months before it started acting strange. We experienced a noticeable decrease in performance and eventually SATA disks started failing. We replaced two drives in about a week, but eventually we could not replace them fast enough to rebuild the RAIDs, and lost data. In addition, some of the virtual libraries and drives would go offline for no apparent reason. Clearly something was wrong, although the limited console and GUI made it hard to pinpoint.

We upgraded the software (to 2.1.01) and used the last supported Firmware CD (8.70). No help there. I opened a case with HP and sent a couple of support tickets, and according to them there had been quite a few predicted disk failures before the SATA disks actually started failing. So they wanted me to download the "D2D G2 Component Firmware" zip file (version 20110217) and use it to upgrade the Firmware CD. This is quite an annoying process to be honest, but I'll spare you the details; let's just say that doing it remotely is more than a little tricky. Looking at the versions, it turns out the only difference between Firmware CD 8.70 and the "D2D G2 Component Firmware" zip file is the SATA disk firmware version (HPG1 vs HPG3).

This took care of the problem immediately. Not only did the disks rebuild and not fail again (at least not yet), but the performance increase was very noticeable: 50% in some cases, although some of that might be because we reduced the stream concurrency per drive from 3 to 1 on our backup jobs in Data Protector. So there you go: upgrading your SATA disk firmware to HPG3 seems to be time well spent.

Wednesday, January 19, 2011

VMWare ESXi 4.1 NICs stop responding when copying database in SQL Server 2008 R2

Had a strange issue on some VMWare ESXi 4.1 (build 260247) hosts running on an HP C7000 Blade enclosure here today.

Blades:
3 BL460c G1 (Broadcom NC373i and NC382m NICs)
3 BL460c G6 (with Broadcom NetXtreme II 57711E/NC532i and NC326m NICs)
Running in EVC mode: "Intel Xeon 45nm Core2"

Interconnect bays:
2 gbe2c
2 gbe2c layer2/layer3

The problem happened on a clean install of Windows Server 2008 R2 with SQL Server 2008 R2. When I tried to copy a database in SQL Server Management Studio using the Wizard, I got to the part where I choose "destination" and selected the network tab. This is when it's supposed to start broadcasting on the network to find other SQL Servers. In my case I just got thrown out of RDP and the VMWare remote console. The issue could be reproduced every time.

At first I thought the VM had crashed, but it was still running (latest VM Tools jftr). What happened was that the interfaces on the ESXi host that were connected to the two gbe2c interconnect bay switches (not the gbe2c layer2/layer3, which is a newer model) died. I could not ping the ESXi host and could not rescue the VMs, since I was not able to connect to the Management network interface. The only way to get back in touch with the ESXi host was to cold boot the blade (from HP Onboard Administrator).

This only happened if the VM was running on an ESXi host on a BL460c G6 blade; it did NOT happen on the BL460c G1 blades. Therefore I updated the firmware on the BL460c G6 blade and the gbe2c switches, but the problem persisted.

/var/log/messages showed a lot of Broadcom-related kernel panic messages, so I looked around for new drivers for the NetXtreme II NICs which I found here:

http://downloads.vmware.com/d/details/esx41_broadcom_netextremeii_dt/ZHcqYnRlaHRiZCVodw

Installing the 1.60.50 drivers on my BL460c G6 blades did the trick!

Tuesday, January 11, 2011

Non-supported network adapter on ESXi 4.1 vm

Got a VMWare vSphere cluster with 6 VMWare ESXi 4.1.0 hosts in it, and today when I was going to migrate a Windows Server 2008 R2 VM to another host within the same cluster I got the warning:

"Virtual ethernet card 'Network adapter 1' is not supported. This is not a limitation of the host in general, but of the virtual machine's configured guest OS on the selected host"

Turns out this vm still had a flexible type NIC installed, which worked fine but still gave me this warning when I migrated the vm. Removing the NIC and installing a VMXNET 3 adapter got rid of the warning!

Thanks to Ed at Linkfish consulting for the tip.