Oddities in Gathering Windows Performance Data

At Zenoss we do quite a bit of remote monitoring of computers running Windows. In the Enterprise edition of the product, we collect raw performance counter data using the conventional remote Windows Registry APIs.

We ran into an issue recently with a customer running Windows 2000 where the data from the remote server was being truncated prematurely. Since we implement our own remote API (so we can run natively on Linux and with Python, rather than requiring Windows), there was some immediate concern that we had run into a low-level bug in our protocol implementation. Thanks to the release of the Windows Communications Protocols (MCPP) last year we have great detail on how our API layer should function.

Reviewing the MCPP in detail compared to our implementation showed no bugs against the specification, but I did notice some odd behavior. Normally when using the RegQueryValue API you specify a NULL buffer pointer and a zero-length buffer size so that the call will provide the actual size of the buffer needed. With this particular customer’s server I noticed that the call wasn’t behaving as documented in the MCPP.

An error code of ERROR_MORE_DATA was being returned. The MCPP says that when this value is returned the server will populate the size output variable with the actual size in bytes of the needed buffer. In this case, the returned size was always the same as the input size. After some experimentation I found that if I passed in a buffer approximately 64 Kbytes larger the call would finally succeed.

While quite odd behavior, this is actually the documented and expected state in the Win32 API documentation for RegQueryValueEx, but not in the MCPP. Specifically, when using the HKEY_PERFORMANCE_DATA key, ERROR_MORE_DATA behaves differently and the caller has more responsibility in guessing an appropriate buffer size.

The following pseudo-code shows the basic flow for how RegQueryValueEx should be used, for either local or remote performance data access.

size = 65536 # starting size, probably computed from a previous registry call
params.in.data = params.out.data = buffer(size)
while 1:
    params.in.size = size
    params.out.size = 0
    RegQueryValueEx(params) # issue the local or remote call; sets params.out.result
    if params.out.result == ERROR_MORE_DATA:
        size = size + 65536 # add another 64 Kbytes of data to the buffer
        params.in.data = params.out.data = buffer(size)
    else:
        break # success or some other error; stop growing the buffer
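
For anyone who wants to play with the retry loop without a Windows box handy, here is a minimal runnable sketch in Python. It is not our collector code: fake_query is a hypothetical stand-in for the remote RegQueryValueEx call, and its thresholds (how big the buffer must be, how much data comes back) are made-up values for illustration. The only real constants are the Win32 error codes.

```python
ERROR_SUCCESS = 0
ERROR_MORE_DATA = 234  # Win32 error code for "more data is available"
CHUNK = 65536          # grow the buffer 64 Kbytes at a time


def fake_query(buf_size, required=293500, payload=195000):
    """Hypothetical stand-in for RegQueryValueEx on HKEY_PERFORMANCE_DATA.

    Mimics the quirk described above: when the buffer is too small the
    call reports ERROR_MORE_DATA but echoes the input size back instead
    of saying how large the buffer needs to be.
    """
    if buf_size < required:
        return ERROR_MORE_DATA, buf_size, b""
    return ERROR_SUCCESS, payload, b"\0" * payload


def query_perf_data(query, start_size=CHUNK, max_size=16 * 1024 * 1024):
    """Grow the buffer until the query succeeds or max_size is exceeded."""
    size = start_size
    attempts = 0
    while size <= max_size:
        attempts += 1
        result, out_size, data = query(size)
        if result == ERROR_SUCCESS:
            # Only the first out_size bytes are meaningful.
            return data[:out_size], size, attempts
        if result != ERROR_MORE_DATA:
            raise OSError(result, "registry query failed")
        size += CHUNK  # the server gave no hint, so guess a bigger buffer
    raise MemoryError("gave up: buffer would exceed %d bytes" % max_size)


if __name__ == "__main__":
    data, size, attempts = query_perf_data(fake_query)
    print("buffer=%d bytes, returned=%d bytes, attempts=%d"
          % (size, len(data), attempts))
```

The upper bound on the buffer is my own addition; without it a misbehaving server could keep the loop growing the buffer forever.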

After fixing that issue I was still left with one oddity. Let’s say, for example, it took a buffer of 293,500 bytes before the RegQueryValueEx call was successful. And yet, the actual amount of returned data would only be 195,000 bytes, or something similar. This behavior seems quite different from what we see on the other Windows operating systems we have tried so far.

This is the first time we’ve tried our data collection against a Windows 2000 server running Exchange locally. Windows 2000 has also been the source of several other key behavior differences in how performance data is returned, so my current speculation is that how the server actually determines what data to return varies greatly between operating system versions. We normally query the performance counter registry for only a subset of values. It may well be that on Windows 2000 a buffer size large enough to retrieve all performance counters is required, even though once the call is complete it actually uses quite a bit less.

Quirky, but another bug gone.

Looking Back on an Old Project

In 2000 I began work on a client application for the ICB chat system. At the time, the best client for the system was the old-school UNIX client that worked using a terminal interface. For things like chat, terminal interfaces are fantastic but at some point they began to stop keeping pace with the rest of technology. For example, a common activity in any chat system is the sharing of URLs for other users to view. Most terminal applications at the time (many of the Linux-based terminal applications have since improved upon this) did not support this, so users were forced to cut & paste URLs – not a fun process.

By 2000, I had been developing with Java full-time professionally for just over a year, with exposure to the language off and on since it was originally released. I wanted to improve upon the existing Java clients out there, while at the same time making an excuse to learn Java’s Swing API (their new and robust graphical user interface API).

Another key was a Java core technology known as Java Web Start. This technology allowed the automatic download, configuration, and updating of Java applications (versus applets) directly from the web. For the users, this meant the application could run locally on their computer without being stuck inside the web browser – generally a poorly performing approach. For the developer (me!) it meant a more robust run-time environment with automatic support for browser integration, making it easy to launch URLs in whatever browser the user has available. This last point was key – with Java Web Start you don’t have to code command-line sequences to launch a browser, you just use a built-in API and it just happens.

The initial development of my client, which I named IcyBee, took a few months of my spare time. A small number of users immediately switched to it and began offering feedback and that, coupled with my own desires, led to several periods of feature creep over the next few years. Support for graphical emoticons, UTF-7 text encoding (the ICB protocol was designed in ’89 and hasn’t changed since, and thus it uses 7-bit US-ASCII only!), and URL shrinking were among the most notable features added.
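
To show why UTF-7 suits a protocol that only carries 7-bit US-ASCII, here is a small sketch using Python’s built-in utf-7 codec. This is purely an illustration and not IcyBee’s actual code (IcyBee is written in Java); the helper names and the sample message are made up.

```python
def to_seven_bit(text: str) -> bytes:
    """Encode text as UTF-7 so every byte fits in 7-bit US-ASCII."""
    return text.encode("utf-7")


def from_seven_bit(raw: bytes) -> str:
    """Decode UTF-7 bytes received over the wire back into text."""
    return raw.decode("utf-7")


if __name__ == "__main__":
    message = "café ☕ naïve"  # hypothetical chat message
    wire = to_seven_bit(message)
    # Every byte stays below 0x80, so a 7-bit-only channel can carry it.
    print(wire, all(b < 0x80 for b in wire))
    print(from_seven_bit(wire) == message)
```

Plain ASCII letters and digits pass through unchanged, so clients that don’t understand UTF-7 still see ordinary messages as they always did.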

The initial development was done using the Java 1.3 platform, but by the time the most stable version was released in September of ’04, Java 1.4 was the minimum requirement. Until January 3, 2009, version 0.92 remained stable and under constant use. While the user base is extremely small, any software product that manages to go over 4 years without a change and still remain useful day-to-day is impressive to me – even more so given I knew of the warts and bugs lurking about in the product.

Fast forward to late ’08 and the holiday break. I felt like doing some coding that was not related to work (programmers really do like to code for fun…) and there were a few bugs in IcyBee that needed fixing. I really had only two goals for the new version: update the code to use the Java 1.5 platform as a minimum and publish fixes for a few miscellaneous bugs that had never made it into a released version.

Part of converting to the Java 1.5 platform meant changing the code to use the new coding constructs that were available. This was not necessary, but rather simplified many parts of the code and allowed me to address several warnings from my editing system. During this conversion process I realized just how much more I had learned about Java development since the original effort. By itself, this is no surprise – good engineers of any sort look back on what they’ve done and typically think about how they would do it differently (even if it was a smashing success!) based upon what they have learned since then.

Since I originally wrote IcyBee I had been through 3 commercial software projects that all used Java. One of those was a Swing GUI application, and clearly I wrote IcyBee’s UI before that experience. Another project was a massive multi-tiered application with a web UI. This project greatly enhanced my knowledge of design patterns in the Java space. The original coding of IcyBee also took place before I had read the Effective Java book, and few of the techniques preached by that text were in the project.

Somewhat surprisingly, I am still quite content with the overall design of the application and its class framework. If I were writing a new chat client today, even in a different language, I would use the same design pattern again. This is rather typical of good design patterns – they’re applicable to several different kinds of problems that share common characteristics and tend to be robust for the long-term.

The Swing UI in IcyBee is the worst part of the application. I had not yet had vetting experience from my day job in that technology, so just getting something functional was all I was really concerned with. If I were coding this application today the approach I would take in the UI implementation would be completely different. The end result for the users would in return be tighter, faster and more visually appealing. I’d likely even take advantage of some custom graphic design work and avoid using the rather stale Swing UI graphics that reek of circa-2000 computing.

There’s a lot left to do to IcyBee to make it “complete” and yet I realized that’s just the perfectionist streak talking. IcyBee has been stable for a long, long time, and users aren’t demanding new features. More spit & polish would really be for my own benefit and not so much for the users. There is always the temptation to keep improving upon projects like this given that they are excellent résumé fodder, but at some point you have to let go and focus your energies on other things. There are simply other projects with broader user bases that would be much more beneficial for me to spend my time on.

On January 3, 2009 I released a new version of IcyBee and labeled it 1.0-stable. The next day I decided to fix another long-standing bug with the URL shortening feature, which had been broken because TinyURL had started to suck. Luckily, a new service called bit.ly is available and provides a wonderful REST/JSON API. On January 4, 2009 1.01-stable was released. I hope this version lasts another 4+ years like 0.92 did.

Smart & Gets Things Done

I finally got around to reading Smart & Gets Things Done: Joel Spolsky’s Concise Guide To Finding The Best Technical Talent, which came out a little over a year ago. The title pretty much sums up the point of the book.

It is a quick, pleasant read, even if you aren’t a big fan of the Joel On Software blog. Joel has some pretty good insight into the software development industry and technical talent in general, so even if you don’t agree with all of his assertions (I certainly don’t), his opinions are worth reading and heeding.

Some key points I took from this book were:

  • The great developers aren’t usually on the job market, so you have to go out and hunt them down.
  • Treat developers like the talent they are; the great ones are rare.
  • Spend the money making sure the developer’s work environment is top-notch; it’s a better way to spend your company’s dollar than with over-the-top salaries.
  • Follow some real software engineering processes or pay the price with a poor quality product and a stressed-out team.
  • If you’re hiring programmers, make sure to actually have them write code during the interview process.

Overall, I’d give it 3.5 out of 5 stars.

Summer Reading List

I just ordered several books from Amazon for work-related reading over the summer. All of these are highly recommended so hopefully they won’t be too boring.

  • Designing Interfaces: Patterns for Effective Interaction Design – Jenifer Tidwell
  • Here Comes Everybody: The Power of Organizing Without Organizations – Clay Shirky
  • Presentation Zen: Simple Ideas on Presentation Design and Delivery (Voices That Matter) – Garr Reynolds
  • Once You’re Lucky, Twice You’re Good: The Rebirth of Silicon Valley and the Rise of Web 2.0 – Sarah Lacy
  • Smart and Gets Things Done: Joel Spolsky’s Concise Guide to Finding the Best Technical Talent – Joel Spolsky