Sunday, August 26, 2007

How To Create 32-bit Import Libraries Without .OBJs or Source

This article is intentionally titled the same as Microsoft KB article 131313 (was Q131313) because that KB article does not give enough information for you to create a .LIB file that works with Windows system DLLs.

There are some functions in the Win32 API that have no import library. One example is the function named RemoveControlByName() in OCCACHE.DLL. According to the documentation, the only way to use the function is to use LoadLibrary() and GetProcAddress(). This is error-prone, requires a lot of code to maintain, and isn't usable with the Delay-Load feature of VC++.

Obviously it would be preferable to have a .LIB to link against, but creating such a library is challenging. If you use Dependency Walker to look at OCCACHE.DLL, you'll see that the function is named simply RemoveControlByName. While that seems obvious, it shouldn't be possible to have this name because it doesn't include any notation for the calling convention.

If the function were __cdecl, then the function name should have started with an underscore, such as _RemoveControlByName. If the function were __stdcall, then the underscore should have been added as well as a suffix giving the number of bytes of arguments it takes on the stack. RemoveControlByName has five 4-byte parameters, so a __stdcall signature should have looked like _RemoveControlByName@20. However, the function name has no decorations at all, which should be impossible according to Microsoft's discussion of name decoration.
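The x86 decoration rules just described are mechanical enough to write down. The following C++ sketch builds the symbol a compiler would expect for each convention. DecoratedName and CallConv are hypothetical names, and the sketch only covers extern "C" functions on 32-bit x86. Prefixing the __stdcall result with __imp_ gives the import-pointer form that appears in the DUMPBIN listing later in this post.

```cpp
#include <cassert>
#include <string>

// Calling conventions covered by this sketch (32-bit x86 only).
enum class CallConv { Cdecl, Stdcall };

// Hypothetical helper: build the linker symbol for an extern "C" function.
// paramBytes is the total size of the stack arguments; on x86 each argument
// occupies a multiple of 4 bytes, so the total does too.
std::string DecoratedName(const std::string& name, CallConv conv, int paramBytes)
{
    assert(paramBytes >= 0 && paramBytes % 4 == 0);
    std::string symbol = "_" + name;                 // both conventions add a leading underscore
    if (conv == CallConv::Stdcall)
        symbol += "@" + std::to_string(paramBytes);  // __stdcall also appends @<bytes>
    return symbol;
}

// Five 4-byte parameters give 20 bytes:
//   DecoratedName("RemoveControlByName", CallConv::Cdecl, 20)   -> "_RemoveControlByName"
//   DecoratedName("RemoveControlByName", CallConv::Stdcall, 20) -> "_RemoveControlByName@20"
```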

The Q131313 article discusses the general case of manually creating a .LIB file for a .DLL file. The discussion under Creating a .DEF File says "The reason for this limitation is based on an assumption made by the LIB utility that all names are automatically exported without a leading underscore." This seems promising because we don't have a leading underscore. However, functions exported from Windows DLLs (almost) always use __stdcall, and the discussion under Creating a .DEF File applies only to __cdecl.

In spite of those warnings, I spent quite a bit of time trying to craft a .DEF file that described what I was trying to do. (You can use LIB.EXE to compile a .DEF file to a .LIB without any .OBJ files.) Although most .DEF files simply list the undecorated function names under the EXPORTS section, the syntax of the .DEF file allows for aliases and other strange constructs. Some of the things I tried included:

; Raw function name
RemoveControlByName

; Alias the undecorated name to __stdcall
RemoveControlByName=RemoveControlByName@20

; Explicit reference to the necessary DLL
RemoveControlByName@20 = OCCACHE.RemoveControlByName

However, none of these generated a .LIB that worked. The first two I couldn't link against and the third would link but fail to run.

I thought I'd learn something by looking at the symbol that the parent program is trying to link against, but that was even worse:

__imp_?RemoveControlByName@20

I knew my header file was correct, so that was definitely the correct name. However, you'll notice the leading "__imp", which indicates that the symbol is being imported from a DLL. This was something else that apparently needed to be included in my hand-crafted .LIB file, and I hadn't seen any discussion anywhere on how to do that.

I've tried to solve this problem two other times in the last several years, and this was the point I gave up in both of those cases. However, this time, failure was not an option. I needed the solution.

I tried the second option in the KB article, described under Stubbing Out Functions. I painstakingly created a dozen functions that mimicked the signatures of the functions listed in the header file. If you are doing this yourself, here's a tip for creating the Visual Studio project. The final file you need is a .LIB file, so it's tempting to use one of the Visual Studio project types that builds a static library. However, that's wrong. If you build a static library, your application will link with the stub functions themselves, which is useless. What you really want is a DLL project, which happens to create an import .LIB as a by-product.

Anyway, I created the functions as described in the KB article. Since my project used .cpp files, all of the declarations in my header file had to be wrapped in extern "C". For example:

extern "C" {
#include "occache.h"
}

If you look in any of the standard Windows include files, such as winbase.h, you'll see this same declaration. Note that this declaration has no relation to __cdecl and therefore has no impact on the calling convention. In other words, it doesn't force all of the functions in occache.h to be called with __cdecl. What this declaration does is modify all of the linker symbols so that they won't include the C++ name decorations, which encode the function's parameter types into the function name.
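The Windows headers put the extern "C" wrapper inside the header itself, guarded so that the same file still compiles as C. Below is a minimal sketch of that pattern; ExampleExport and SKETCH_API are hypothetical names. The calling convention comes from a separate macro, which illustrates that extern "C" controls name decoration, not the calling convention.

```cpp
#include <cassert>

// Calling-convention macro: __stdcall on Windows, nothing elsewhere so the
// sketch still compiles. This is independent of the linkage below.
#if defined(_WIN32)
#define SKETCH_API __stdcall
#else
#define SKETCH_API
#endif

#ifdef __cplusplus
extern "C" {   // C++ only: give the enclosed names C linkage (no C++ mangling)
#endif

// On 32-bit Windows this declaration yields the symbol _ExampleExport@4.
int SKETCH_API ExampleExport(int value);

#ifdef __cplusplus
}              // end of the extern "C" block
#endif

// The definition inherits C linkage from the earlier declaration.
int SKETCH_API ExampleExport(int value) { return value * 2; }
```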

I also updated all of the function signatures in the header file to include __declspec(dllexport).
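Putting the stub pieces together, a source file in the DLL project might look like the sketch below. It is not the author's actual file: the parameter list is assumed from the documented RemoveControlByName signature (five 4-byte parameters) and should be checked against your own occache.h, and the STUB_EXPORT macro and non-Windows stand-in typedefs are hypothetical, present only so the sketch compiles elsewhere. Returning E_NOTIMPL is an arbitrary choice, since the body is never meant to run.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#define STUB_EXPORT __declspec(dllexport)
#else
// Stand-ins so the sketch compiles outside Windows; on Windows these come
// from <windows.h>.
typedef long HRESULT;
typedef const char* LPCTSTR;
typedef int BOOL;
typedef unsigned long DWORD;
#define WINAPI
#define STUB_EXPORT
#define E_NOTIMPL ((HRESULT)0x80004001L)
#endif

// Stub with the same signature as the real export in OCCACHE.DLL.
// extern "C" removes C++ mangling, WINAPI (__stdcall) matches the real
// calling convention, and dllexport makes the linker emit an import library.
extern "C" STUB_EXPORT HRESULT WINAPI RemoveControlByName(
    LPCTSTR lpszFile, LPCTSTR lpszCLSID, LPCTSTR lpszTypeLibID,
    BOOL bForceRemove, DWORD dwIsDistUnit)
{
    // Never called in practice: applications link against the .LIB and
    // load the real OCCACHE.DLL at run time.
    (void)lpszFile; (void)lpszCLSID; (void)lpszTypeLibID;
    (void)bForceRemove; (void)dwIsDistUnit;
    return E_NOTIMPL;
}
```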

I compiled it, linked my main application to the new .LIB file, and it linked! I thought I was done, but when I ran the application, I received the error "Entry Point Not Found: The procedure entry point _RemoveControlByName@20 could not be located in the dynamic link library OCCACHE.DLL".

I examined the .LIB with DUMPBIN. Under Public Symbols, I saw two definitions for RemoveControlByName:

_RemoveControlByName@20
__imp__RemoveControlByName@20

It appears that the "__imp" definition was a result of adding __declspec(dllexport), so that explained why the application linked successfully. One problem solved.

Continuing my examination of the DUMPBIN information, I saw that RemoveControlByName was defined as:

Archive member name at FE0: OCCACHE.DLL/
46D12949 time/date Sun Aug 26 00:18:33 2007
uid
gid
0 mode
38 size
correct header end

Version : 0
Machine : 14C (x86)
TimeDateStamp: 46D12949 Sun Aug 26 00:18:33 2007
SizeOfData : 00000024
DLL name : OCCACHE.DLL
Symbol name : _RemoveControlByName@20
Type : code
Name type : name
Hint : 9
Name : _RemoveControlByName@20

To determine whether or not this was correct, I used DUMPBIN to compare against known-good definitions in USER32.LIB, where I found that the "Name type" in the USER32 records was defined as "undecorate" instead of "name". Obviously, there was a magic incantation to set this flag, presumably in the .DEF file.
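The "Name type" that DUMPBIN reports is a small bit field in each import-library member. According to the import library format section of the PE/COFF specification, each member starts with a 20-byte header followed by two NUL-terminated strings (the public symbol, then the DLL name), and bits 2-4 of the final 16-bit field hold the name type: 0 for ordinal, 1 for name, 2 for no prefix, and 3 for undecorate. The C++ sketch below decodes that header. ImportHeader and ParseImportHeader are hypothetical names, and the layout comes from the specification rather than from anything in this post. As a sanity check, the 23-character symbol _RemoveControlByName@20 plus OCCACHE.DLL, each with its terminating NUL, comes to 36 bytes, which matches the SizeOfData of 0x24 in the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Decoded fields of a short import header (all values little-endian).
struct ImportHeader {
    std::uint16_t version = 0;
    std::uint16_t machine = 0;        // 0x14C = x86
    std::uint32_t timeDateStamp = 0;
    std::uint32_t sizeOfData = 0;     // bytes of name strings after the header
    std::uint16_t hint = 0;
    unsigned type = 0;                // bits 0-1: 0 = code, 1 = data, 2 = const
    unsigned nameType = 0;            // bits 2-4: 0 = ordinal, 1 = name, 2 = no prefix, 3 = undecorate
    std::string symbolName;
    std::string dllName;
};

static std::uint16_t Read16(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t Read32(const unsigned char* p)
{
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical parser: returns false if the buffer is not a short import member.
bool ParseImportHeader(const unsigned char* data, std::size_t size, ImportHeader& out)
{
    // Sig1 must be 0 (IMAGE_FILE_MACHINE_UNKNOWN) and Sig2 must be 0xFFFF.
    if (size < 20 || Read16(data) != 0x0000 || Read16(data + 2) != 0xFFFF)
        return false;
    out.version       = Read16(data + 4);
    out.machine       = Read16(data + 6);
    out.timeDateStamp = Read32(data + 8);
    out.sizeOfData    = Read32(data + 12);
    out.hint          = Read16(data + 16);
    std::uint16_t bits = Read16(data + 18);
    out.type     = bits & 0x3u;          // import type
    out.nameType = (bits >> 2) & 0x7u;   // import name type
    if (size - 20 < out.sizeOfData)
        return false;
    // Two NUL-terminated strings follow: the public symbol, then the DLL name.
    std::string blob(reinterpret_cast<const char*>(data + 20), out.sizeOfData);
    std::string::size_type first = blob.find('\0');
    if (first == std::string::npos)
        return false;
    std::string::size_type second = blob.find('\0', first + 1);
    if (second == std::string::npos)
        return false;
    out.symbolName = blob.substr(0, first);
    out.dllName = blob.substr(first + 1, second - first - 1);
    return true;
}
```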

I added a .DEF file and spent several hours trying various alias combinations, none of which worked. Finally, in desperation, I created a .DEF file that contained just the raw function names. For example:

EXPORTS
RemoveControlByName

I built my application, it linked, and it ran. What happened?

I ran DUMPBIN again on my library. The record describing RemoveControlByName now contained the "undecorate" attribute:

Archive member name at FFC: OCCACHE.DLL/
46D12740 time/date Sun Aug 26 00:09:52 2007
uid
gid
0 mode
38 size
correct header end

Version : 0
Machine : 14C (x86)
TimeDateStamp: 46D12740 Sun Aug 26 00:09:52 2007
SizeOfData : 00000024
DLL name : OCCACHE.DLL
Symbol name : _RemoveControlByName@20
Type : code
Name type : undecorate
Hint : 6
Name : RemoveControlByName

Also, the very last line in the record showed that the "Name" was RemoveControlByName, with no decoration. Exactly what I needed.
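The PE/COFF specification describes how the name the loader looks up is derived from the public symbol for each name type: "name" uses the symbol as-is, "no prefix" skips a leading ?, @, or (optionally) _, and "undecorate" does the same and then truncates at the first @. That rule is consistent with both runs above: with "name" the loader searched OCCACHE.DLL for _RemoveControlByName@20, and with "undecorate" it searches for RemoveControlByName. The sketch below applies the rule; NameType and ImportNameFromSymbol are hypothetical names, and it assumes the optional underscore is stripped, which matches the x86 result shown above.

```cpp
#include <cassert>
#include <string>

// Name types from the short import header (values 0-3 in the PE/COFF spec).
enum class NameType { Ordinal = 0, Name = 1, NoPrefix = 2, Undecorate = 3 };

// Hypothetical helper: the name the loader searches for in the DLL's export
// table, derived from the public symbol stored in the import library.
std::string ImportNameFromSymbol(const std::string& symbol, NameType type)
{
    if (type == NameType::Ordinal)
        return "";                     // imported by ordinal; no name lookup
    if (type == NameType::Name)
        return symbol;                 // used verbatim, decorations and all
    std::string name = symbol;
    // NoPrefix and Undecorate: drop one leading '?', '@', or '_'.
    // (The spec makes '_' optional; this sketch always strips it.)
    if (!name.empty() && (name[0] == '?' || name[0] == '@' || name[0] == '_'))
        name.erase(0, 1);
    if (type == NameType::Undecorate) {
        std::string::size_type at = name.find('@');
        if (at != std::string::npos)
            name.erase(at);            // truncate at the first '@' (drops "@20")
    }
    return name;
}
```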

It's clear that there's quite a bit of undocumented behavior here. Adding the entry to the .DEF file had a rather dramatic effect on the generated library. I couldn't find any mention of this behavior in the documentation on .DEF files. The only relevant reference I could find was in Microsoft's documentation under __cdecl, where it says "Underscore character (_) is prefixed to names, except when exporting __cdecl functions that use C linkage." This statement is true, but it's not the whole truth. To create a .LIB file that can link against such a construct, you must also list the function in a .DEF file.

In summary, to create a .LIB file that will let you link with Windows system DLLs, you need to:
  1. Follow the instructions in Q131313 under Stubbing Out Functions. Make sure you name the project the same as the Windows DLL.
  2. Make sure your dummy functions are defined with __declspec(dllexport) as well as __stdcall.
  3. For the header file used by the parent application, make sure that your function declarations are surrounded with extern "C".
  4. Add a .DEF file to your project that includes the function names with no decoration.
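Step 4 is easy to automate when the header declares a dozen functions. The C++ helper below writes .DEF text from a list of undecorated names; MakeDefFile is a hypothetical name. The LIBRARY statement does not appear in the .DEF file shown earlier; it is standard .DEF syntax for naming the DLL, and the sketch assumes it names the same Windows DLL as the project in step 1.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: build .DEF text that lists undecorated export names.
// dllName should match the Windows DLL (for example "OCCACHE").
std::string MakeDefFile(const std::string& dllName,
                        const std::vector<std::string>& exports)
{
    std::string def = "LIBRARY " + dllName + "\n";
    def += "EXPORTS\n";
    for (const std::string& name : exports)
        def += "    " + name + "\n";   // plain name: no leading '_' and no "@N" suffix
    return def;
}
```

For example, MakeDefFile("OCCACHE", {"RemoveControlByName"}) returns the lines LIBRARY OCCACHE, EXPORTS, and an indented RemoveControlByName.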

Saturday, August 25, 2007

Media Player Slows Network in Vista

If you've read my earlier blog posts about Gigabit Ethernet, you know that I've had my share of difficulty getting good performance out of my GigE network. I've also posted that I've had problems with Vista that I didn't see in Windows XP or Windows Server 2003. Now I know why. Windows Media Player puts a big throttle on GigE network performance, even if it is paused and not playing anything.

The issue was reported at 2CPU.com and a response from Microsoft was reported at ZDNet. The problem is minor on 10/100 Ethernet, but on GigE the network performance can be throttled back to 100Mbps levels. Apparently, there's no registry setting that will resolve the problem; the only solution is to shut down Windows Media Player. There are scattered reports that the problem happens with other media players, such as WinAmp, but I haven't confirmed these reports.

There's another problem I ran into that caused GigE to throttle back to 100Mbps. This one was self-inflicted, but it took several weeks to resolve. I have a Linksys WRT54G router that handles my gateway/firewall/NAT. All of my servers get their IP addresses with DHCP, and the WRT54G is set to always hand out the same IP address to each of those servers. This was done so that the servers would get the DHCP settings from our provider. Anyway, one of the servers was migrated to a new motherboard with a different MAC address. The WRT54G started handing out a random DHCP address to that node but still reported the node name as having the old, reserved IP address, so the network became schizophrenic about which IP address belonged to that node name. As a result, any traffic destined for that node ended up going to the gateway, which didn't have a GigE connection, so traffic was throttled back to 100Mbps speeds.

Update 8/28/2007 - Mark Russinovich has posted a detailed analysis of the network slowdown. It turns out that the more Network Interface Cards (NICs) you have in your system, the worse the problem gets. I have three NICs, including WiFi, which slows my performance to a theoretical maximum of 9MB/second.