The longest, most detailed description in the world of how to successfully use ReadDirectoryChangesW.
This is Part 2 of 2.
Part 1 describes the theory and this part describes the implementation.
Go to the GitHub repo for this article or just
download the sample code.
Getting a Handle to the Directory
Now we'll look at the details of implementing the Balanced solution described in
Part 1. When reading the declaration for
ReadDirectoryChangesW, you'll notice that the first parameter is a handle to a directory. Did you know that you can get a handle to a directory? There is no OpenDirectory function and the
CreateDirectory function doesn't return a handle. Under the documentation for the first parameter, it says “This directory must be opened with the FILE_LIST_DIRECTORY access right.” Later, the Remarks section says, “To obtain a handle to a directory, use the CreateFile function with the FILE_FLAG_BACKUP_SEMANTICS flag.” The actual code looks like this:
HANDLE hDir = ::CreateFile(
    strDirectory,                   // pointer to the file name
    FILE_LIST_DIRECTORY,            // access (read/write) mode
    FILE_SHARE_READ                 // share mode
        | FILE_SHARE_WRITE
        | FILE_SHARE_DELETE,
    NULL,                           // security descriptor
    OPEN_EXISTING,                  // how to create
    FILE_FLAG_BACKUP_SEMANTICS      // file attributes
        | FILE_FLAG_OVERLAPPED,
    NULL);                          // file with attributes to copy
The second parameter, FILE_LIST_DIRECTORY, isn't even mentioned in the
CreateFile() documentation. It's discussed in
File Security and Access Rights, but not in any useful way.
Similarly, FILE_FLAG_BACKUP_SEMANTICS has this interesting note, "Appropriate security checks still apply when this flag is used without SE_BACKUP_NAME and SE_RESTORE_NAME privileges." In past dealings with this flag, it had been my impression that Administrator privileges were required, and the note seems to bear this out. However, attempting to enable these privileges on a Windows Vista system by adjusting the security token does not work if UAC is enabled. I'm not sure if the requirements have changed or if the documentation is simply ambiguous. Others are
similarly confused.
The sharing mode also has pitfalls. I saw a few samples that left out FILE_SHARE_DELETE. You'd think that this would be fine since you do not expect the directory to be deleted. However, leaving out that permission
prevents other processes from renaming or deleting files in that directory. Not a good result.
Another potential pitfall of this function is that the referenced directory itself is now “in use” and so can't be deleted. To monitor files in a directory and still allow the directory to be deleted, you would have to monitor the parent directory and its children.
Calling ReadDirectoryChangesW
The actual call to
ReadDirectoryChangesW is the simplest part of the whole operation. Assuming you are using completion routines, the only tricky part is that the buffer must be DWORD-aligned.
The OVERLAPPED structure is supplied to indicate an overlapped operation, but none of the fields are actually used by
ReadDirectoryChangesW. However, a little-known trick when using Completion Routines is that you can supply a pointer to your own C++ object. How does this work? The documentation says, "The hEvent member of the OVERLAPPED structure is not used by the system, so you can use it yourself." This means that you can store a pointer to your object there. You'll see this in my sample code below:
void CChangeHandler::BeginRead()
{
    ::ZeroMemory(&m_Overlapped, sizeof(m_Overlapped));
    m_Overlapped.hEvent = this;     // not used by the API, so stash our object pointer here

    DWORD dwBytes = 0;
    BOOL success = ::ReadDirectoryChangesW(
        m_hDirectory,                           // directory handle from CreateFile
        &m_Buffer[0],                           // results buffer (DWORD-aligned)
        static_cast<DWORD>(m_Buffer.size()),    // buffer length in bytes
        FALSE,                                  // monitor children?
        FILE_NOTIFY_CHANGE_LAST_WRITE
            | FILE_NOTIFY_CHANGE_CREATION
            | FILE_NOTIFY_CHANGE_FILE_NAME,     // filter conditions
        &dwBytes,                               // bytes returned (undefined for async calls)
        &m_Overlapped,                          // OVERLAPPED carrying our object pointer
        &NotificationCompletion);               // completion routine
}
Since this call uses overlapped I/O, m_Buffer won't be filled in until the completion routine is called.
Dispatching Completion Routines
For the Balanced solution we've been discussing, there are only two ways to wait for Completion Routines to be called. If everything is being dispatched using Completion Routines, then SleepEx is all you need. If you need to wait on handles as well as to dispatch Completion Routines, then you want
WaitForMultipleObjectsEx. The Ex versions of these functions are required to put the thread in an "alertable" state, which means that completion routines will be called.
To terminate a thread that's waiting using SleepEx, you can write a Completion Routine that sets a flag in the
SleepEx loop, causing it to exit. To call that Completion Routine, use
QueueUserAPC, which allows one thread to call a completion routine in another thread.
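Here is a minimal sketch of what that dispatch loop and termination APC might look like. This is my illustration rather than the sample project's code; RunDispatchLoop, TerminateProc, RequestTermination, and s_bTerminate are made-up names.

#include <windows.h>

// Flag set by the termination APC. It is only ever touched on the worker
// thread (APCs run on the thread that called SleepEx), so no locking is needed.
static bool s_bTerminate = false;

// APC routine queued from another thread to request shutdown.
static void CALLBACK TerminateProc(ULONG_PTR /*dwParam*/)
{
    s_bTerminate = true;
}

// Worker thread: sleep in an alertable state so that I/O completion routines
// and queued APCs are dispatched here.
void RunDispatchLoop()
{
    while (!s_bTerminate)
        ::SleepEx(INFINITE, TRUE);   // returns WAIT_IO_COMPLETION after each APC
}

// Called from the primary thread to wake the worker and make it exit.
void RequestTermination(HANDLE hWorkerThread)
{
    ::QueueUserAPC(TerminateProc, hWorkerThread, 0);
}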
Handling the Notifications
The notification routine should be easy. Just read the data and save it, right? Wrong. Writing the Completion Routine also has its complexities.
First, you need to check for and handle the error code ERROR_OPERATION_ABORTED, which means that
CancelIo has been called, this is the final notification, and you should clean up appropriately. I describe
CancelIo in more detail in the next section. In my implementation, I used
InterlockedDecrement to decrease
cOutstandingCalls, which tracks my count of active calls, then I returned. My objects were all managed by the MFC mainframe and so did not need to be deleted by the Completion Routine itself.
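A sketch of that error handling, assuming NotificationCompletion is a static member and m_cOutstandingCalls is a volatile LONG; the member names are illustrative, so adapt them to your own class:

void CALLBACK CChangeHandler::NotificationCompletion(
    DWORD dwErrorCode,                 // completion code
    DWORD dwNumberOfBytesTransfered,   // bytes placed in the buffer
    LPOVERLAPPED lpOverlapped)
{
    // Recover the object pointer stashed in hEvent by BeginRead.
    CChangeHandler* pThis = static_cast<CChangeHandler*>(lpOverlapped->hEvent);

    if (dwErrorCode == ERROR_OPERATION_ABORTED)
    {
        // CancelIo was called; this is the final call for this request.
        ::InterlockedDecrement(&pThis->m_cOutstandingCalls);
        return;
    }

    // ... walk the FILE_NOTIFY_INFORMATION records, then reissue the read ...
    pThis->BeginRead();
}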
You can receive multiple notifications in a single call. Make sure you walk the returned buffer and use each record's
NextEntryOffset field to skip to the next record; an offset of zero marks the last one.
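Here is a sketch of that walk, assuming m_Buffer is the byte buffer handed to ReadDirectoryChangesW and HandleNotification is your own per-record handler (both names are illustrative):

// Walk every FILE_NOTIFY_INFORMATION record packed into the buffer.
BYTE* pRecord = &m_Buffer[0];
for (;;)
{
    FILE_NOTIFY_INFORMATION* fni =
        reinterpret_cast<FILE_NOTIFY_INFORMATION*>(pRecord);

    HandleNotification(fni);            // per-record processing goes here

    if (fni->NextEntryOffset == 0)
        break;                          // zero marks the last record
    pRecord += fni->NextEntryOffset;    // offset is relative to the current record
}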
ReadDirectoryChangesW is a "W" routine, so it does everything in Unicode. There's no ANSI version of this routine. Therefore, the data buffer is also Unicode. The string is not NULL-terminated, so you can't just use
wcscpy. If you are using the ATL or MFC CString class, you can instantiate a wide CString from a raw string with a given number of characters like this:
FILE_NOTIFY_INFORMATION* fni = (FILE_NOTIFY_INFORMATION*)buf;
CStringW wstr(fni->FileName, fni->FileNameLength / sizeof(wchar_t));
Finally, you have to reissue the call to
ReadDirectoryChangesW before you exit the completion routine. You can reuse the same OVERLAPPED structure. The documentation specifically says that the OVERLAPPED structure is not accessed again by Windows after the completion routine is called. However, you have to make sure that the new call uses a different buffer than the one you are still parsing, or you will end up with a race condition.
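One way to avoid that race is to copy the completed results into a scratch buffer, reissue immediately, and then parse the copy. This is only a sketch, assuming m_Buffer is a std::vector<BYTE> and the byte count is non-zero; ProcessAndReissue and ProcessBuffer are illustrative helpers.

// Called from the completion routine with the byte count it was given.
void CChangeHandler::ProcessAndReissue(DWORD dwNumberOfBytesTransfered)
{
    // Copy the completed results aside so the next read can reuse m_Buffer.
    std::vector<BYTE> backup(m_Buffer.begin(),
                             m_Buffer.begin() + dwNumberOfBytesTransfered);

    BeginRead();                  // reissue ReadDirectoryChangesW into m_Buffer

    ProcessBuffer(&backup[0]);    // parse the copy; no race with the new read
}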
One point that isn't clear to me is what happens to change notifications in between the time that your completion routine is called and the time you issue the new call to
ReadDirectoryChangesW.
I'll also reiterate that you can still "lose" notifications if many files are changed in a short period of time. According to the documentation, if the buffer overflows, the entire contents of the buffer are discarded and the
lpBytesReturned parameter contains zero. However, it's not clear to me whether the completion routine will be called with
dwNumberOfBytesTransfered equal to zero, whether an error code will be reported in
dwErrorCode, or both.
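Given that ambiguity, a defensive check near the top of the completion routine costs little. This is my own suggestion rather than behavior guaranteed by the documentation, and RescanDirectory is an illustrative helper:

if (dwErrorCode == ERROR_SUCCESS && dwNumberOfBytesTransfered == 0)
{
    // Most likely a buffer overflow: the queued changes were discarded.
    // Re-enumerate the directory to resynchronize, then reissue the read.
    RescanDirectory();
}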
There are some humorous examples of people trying (and failing) to write the completion routine correctly. My favorite is found on
stackoverflow.com, where, after insulting the person asking for help, the answerer presents his example of how to write the routine and concludes with, "It's not like this stuff is difficult." His code is missing error handling, he doesn't handle ERROR_OPERATION_ABORTED, he doesn't handle buffer overflow, and he doesn't reissue the call to
ReadDirectoryChangesW. I guess it's not difficult when you just ignore all of the difficult stuff.
Using the Notifications
Once you receive and parse a notification, you need to figure out how to handle it. This isn't always easy. For one thing, you will often receive multiple duplicate notifications about the same change, particularly while a large file is being written by another process. If you need the file to be complete, you should process each file only after a timeout period has passed with no further updates. [Update: See the comment below by Wally The Walrus for details on the timeout.]
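A minimal sketch of that timeout logic: collect notifications into a map keyed by filename and sweep it periodically, processing only files that have gone quiet. CQuietFileQueue is a made-up name, not part of the sample code.

#include <windows.h>
#include <map>
#include <string>

class CQuietFileQueue
{
public:
    // Record (or refresh) the time we last heard about this file.
    void OnNotification(const std::wstring& path)
    {
        m_lastSeen[path] = ::GetTickCount64();
    }

    // Call periodically; invokes process(path) for every file that has been
    // quiet for at least quietMs milliseconds, then forgets it.
    template <typename Fn>
    void FlushQuietFiles(ULONGLONG quietMs, Fn process)
    {
        ULONGLONG now = ::GetTickCount64();
        for (auto it = m_lastSeen.begin(); it != m_lastSeen.end(); )
        {
            if (now - it->second >= quietMs)
            {
                process(it->first);
                it = m_lastSeen.erase(it);
            }
            else
            {
                ++it;
            }
        }
    }

private:
    std::map<std::wstring, ULONGLONG> m_lastSeen;
};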
An
article by Eric Gunnerson points out that the documentation for FILE_NOTIFY_INFORMATION contains a critical comment:
"If there is both a short and long name for the file, the function will return one of these names, but it is unspecified which one." Most of the time it's easy to convert back and forth between short and long filenames, but that's not possible if a file has been deleted. Therefore, if you are keeping a list of tracked files, you should probably track both the short and long filename. I was unable to reproduce this behavior on Windows Vista, but I only tried on one computer.
You will also receive some notifications that you may not expect. For example, even if you set the parameters of
ReadDirectoryChangesW so you aren't notified about child directories, you will still get notifications about the child directories themselves. For example, let's assume you have two directories, C:\A and C:\A\B, and you move the file info.txt from the first directory to the second. You will receive FILE_ACTION_REMOVED for the file C:\A\info.txt and FILE_ACTION_MODIFIED for the directory C:\A\B. You will not receive any notifications about C:\A\B\info.txt.
There are some other surprises. Have you ever used hard links in NTFS? Hard links allow you to have multiple filenames that all reference the same physical file. If you have one reference in a monitored directory and a second reference in a second directory, you can edit the file in the second directory and a notification will be generated in the first directory. It's like magic.
On the other hand, if you are using symbolic links, which were introduced in Windows Vista, then no notification will be generated for the linked file. This makes sense when you think it through, but you have to be aware of these various possibilities.
There's yet a third possibility, which is junction points linking one partition to another. In that case, monitoring child directories won't monitor files in the linked partition. Again, this behavior makes sense, but it can be baffling when it's happening at a customer site and no notifications are being generated.
Shutting Down
I didn't find any articles or code (even in open source production code) that properly cleaned up the overlapped call. The documentation on MSDN for canceling overlapped I/O says to call
CancelIo. That's easy. However, my application then crashed when exiting. The call stack showed that one of my third party libraries was putting the thread in an alertable state (which meant that Completion Routines could be called) and that my Completion Routine was being called even after I had called
CancelIo, closed the handle, and deleted the OVERLAPPED structure.
As I was searching various web pages with sample code that called
CancelIo, I found
this page that included the code below:
CancelIo(pMonitor->hDir);
if (!HasOverlappedIoCompleted(&pMonitor->ol))
{
    SleepEx(5, TRUE);
}
CloseHandle(pMonitor->ol.hEvent);
CloseHandle(pMonitor->hDir);
This looked promising. I faithfully copied it into my app. No effect.
I re-read the documentation for
CancelIo, which makes the statement that "All I/O operations that are canceled complete with the error ERROR_OPERATION_ABORTED, and all completion notifications for the I/O operations occur normally." Decoded, this means that all Completion Routines will be called at least one final time after
CancelIo is called. The call to
SleepEx should have allowed that, but it wasn't happening. Eventually I determined that waiting for 5 milliseconds was simply too short. Maybe changing the "if" to a "while" would have solved the problem, but I chose to approach the problem differently since this solution requires polling every existing overlapped structure.
My final solution was to track the number of outstanding requests and to continue calling SleepEx until the count reached zero. In the
sample code, the shutdown sequence works as follows:
- The application calls CReadDirectoryChanges::Terminate (or simply allows the object to destruct).
- Terminate uses QueueUserAPC to send a message to CReadChangesServer in the worker thread, telling it to terminate.
- CReadChangesServer::RequestTermination sets m_bTerminate to true and delegates the call to the CReadChangesRequest objects, each of which calls CancelIo on its directory handle and closes the directory handle.
- Control returns to the CReadChangesServer::Run function. Note that nothing has actually terminated yet.
void Run()
{
    while (m_nOutstandingRequests || !m_bTerminate)
    {
        DWORD rc = ::SleepEx(INFINITE, true);
    }
}
- CancelIo causes Windows to automatically call the Completion Routine for each CReadChangesRequest overlapped request. For each call, dwErrorCode is set to ERROR_OPERATION_ABORTED.
- The Completion Routine deletes the CReadChangesRequest object, decrements m_nOutstandingRequests, and returns without queuing a new request.
- SleepEx returns due to one or more APCs completing. m_nOutstandingRequests is now zero and m_bTerminate is true, so the function exits and the thread terminates cleanly.
In the unlikely event that shutdown doesn't proceed properly, there's a timeout in the primary thread when waiting for the worker thread to terminate. If the worker thread doesn't terminate in a timely fashion, we let Windows kill it during termination.
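A bounded wait in the primary thread might look like the following; m_hThread and the 10-second figure are illustrative rather than taken from the sample.

// Give the worker thread a bounded amount of time to drain its requests.
if (::WaitForSingleObject(m_hThread, 10 * 1000) == WAIT_TIMEOUT)
{
    // The worker is stuck; don't block shutdown. Windows reclaims the thread
    // when the process exits.
}
::CloseHandle(m_hThread);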
Network Drives
ReadDirectoryChangesW works with network drives, but only if the remote server supports the functionality. Drives shared from other Windows-based computers will correctly generate notifications. Samba servers may or may not generate notifications, depending on whether the underlying operating system supports the functionality. Network Attached Storage (NAS) devices usually run Linux, so won't support notifications. High-end SANs are anybody's guess.
ReadDirectoryChangesW fails with ERROR_INVALID_PARAMETER when the buffer length is greater than 64 KB and the application is monitoring a directory over the network. This is due to a packet size limitation with the underlying file sharing protocols.
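If the same code monitors both local and remote directories, one option is to clamp the buffer size when the path is remote. This is a sketch of one detection approach; the helper names and the local buffer size are illustrative.

// True for UNC paths and for drive letters mapped to a network share.
bool IsRemotePath(const CStringW& strDirectory)
{
    if (strDirectory.GetLength() >= 2 &&
        strDirectory[0] == L'\\' && strDirectory[1] == L'\\')
        return true;                                   // \\server\share\...
    return ::GetDriveTypeW(strDirectory.Left(3)) == DRIVE_REMOTE;   // "X:\"
}

DWORD ChooseBufferSize(const CStringW& strDirectory)
{
    const DWORD kLocalSize  = 256 * 1024;   // generous for local volumes
    const DWORD kNetworkCap = 64 * 1024;    // limit imposed over the network
    return IsRemotePath(strDirectory) ? kNetworkCap : kLocalSize;
}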
Summary
If you've made it this far in the article, I applaud your can-do attitude. I hope I've given you a clear picture of the challenges of using
ReadDirectoryChangesW and why you should be dubious of any sample code you see for using the function. Careful testing is critical, including performance testing.
Go to the GitHub repo for this article or just
download the sample code.