Hi,
We have a situation in which we need to be able to open, read and
close about 500 data files in at most 5 seconds (preferably less) on a
relatively new machine (e.g. P4, 2.4GHz, IDE drive) with Win2K or XP
using NTFS.
Here are the relevant facts:
1) The files are organized in a hierarchical way:
\dir\subdir1\file1.dat
\dir\subdir1\file2.dat
.
.
.
\dir\subdir2\file3.dat
\dir\subdir2\file4.dat
.
.
.
\dir\subdir3\file5.dat
\dir\subdir3\file6.dat
etc.
2) We use VC++ .NET
3) The files are typically about 0.5-1.0 MB in size, but we only read
the first 4K of each file, using the CreateFile, ReadFile and
CloseHandle Win32 API calls.
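(For reference, the per-file read in the real application amounts to
roughly the sketch below. It's a simplified illustration, not our
production code; ReadHeader and the fixed 4096-byte read are just
stand-ins.)

#include <windows.h>

// Simplified sketch: open a file, read its first 4K, close it.
// pBuffer must be at least 4096 bytes.
bool ReadHeader(const char *szPath, char *pBuffer, DWORD &rdwBytesRead)
{
    HANDLE hFile = CreateFile(szPath,
                              GENERIC_READ,
                              0,
                              NULL,
                              OPEN_EXISTING,
                              FILE_ATTRIBUTE_NORMAL |
                              FILE_FLAG_SEQUENTIAL_SCAN,
                              NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return false;

    BOOL bOK = ReadFile(hFile, pBuffer, 4096, &rdwBytesRead, NULL);
    CloseHandle(hFile);
    return bOK != FALSE;
}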
4) In our profiling tests, we have observed that 88% of the time is
actually spent in CreateFile. In an effort to narrow down the
problem, we wrote the following test code, which just opens and closes
all the files (we don't bother reading, since reading contributes
negligibly to the total time):
#include <windows.h>
#include <string.h>
#include <string>
using std::string;

void TraverseTree(const string &strSearchPath,
                  int &rnCount,
                  int &rnNumOpenFailed)
{
    WIN32_FIND_DATA find_data;
    string strWildcardPath = strSearchPath + "\\*.*";

    HANDLE hFind = FindFirstFile(strWildcardPath.c_str(), &find_data);
    if (hFind == INVALID_HANDLE_VALUE)
        return;

    do
    {
        // Skip the "." and ".." entries.
        if (strcmp(find_data.cFileName, ".") == 0 ||
            strcmp(find_data.cFileName, "..") == 0)
            continue;

        string strCurrentPath = strSearchPath + "\\" + find_data.cFileName;

        if (find_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
        {
            // Recurse into subdirectories.
            TraverseTree(strCurrentPath, rnCount, rnNumOpenFailed);
        }
        else
        {
            rnCount++;
            HANDLE hFile = CreateFile(strCurrentPath.c_str(),
                                      GENERIC_READ,
                                      0,
                                      NULL,
                                      OPEN_EXISTING,
                                      FILE_ATTRIBUTE_NORMAL |
                                      FILE_FLAG_SEQUENTIAL_SCAN,
                                      NULL);
            if (hFile == INVALID_HANDLE_VALUE)
                rnNumOpenFailed++;
            else
                CloseHandle(hFile);
        }
    } while (FindNextFile(hFind, &find_data));

    FindClose(hFind);
}
5) For 614 data files, the above code takes about 35 seconds to run on
a Dell P4 2.4 GHz with a 70 GB IDE drive, running XP Pro on NTFS.
There were 5 subdirectories in the hierarchy, each containing roughly
120 files on average.
6) We tried another test in which we put all 614 files into a single
directory and executed the following code:
// szSearchPath is the wildcard path for the single directory (e.g. D:\dir\*.*)
// and szRootPath is the same directory with a trailing backslash.
WIN32_FIND_DATA file_data;
char szPathname[MAX_PATH];
int nNumFiles = 0;
int nNumFailed = 0;

HANDLE hFind = FindFirstFile(szSearchPath, &file_data);
if (hFind != INVALID_HANDLE_VALUE)
{
    do
    {
        // Skip the "." and ".." entries.
        if (strcmp(file_data.cFileName, ".") == 0 ||
            strcmp(file_data.cFileName, "..") == 0)
            continue;

        strcpy(szPathname, szRootPath);
        strcat(szPathname, file_data.cFileName);
        nNumFiles++;

        HANDLE hFile = CreateFile(szPathname,
                                  GENERIC_READ,
                                  0,
                                  NULL,
                                  OPEN_EXISTING,
                                  FILE_ATTRIBUTE_NORMAL |
                                  FILE_FLAG_SEQUENTIAL_SCAN,
                                  NULL);
        if (hFile == INVALID_HANDLE_VALUE)
            nNumFailed++;
        else
            CloseHandle(hFile);
    } while (FindNextFile(hFind, &file_data));

    FindClose(hFind);
}
7) Strangely, this took about 7 seconds on the same machine--a
five-fold improvement. (The machine was rebooted between each and
every test to ensure that nothing was being cached. After the reboot,
time was allotted for the disk activity to "settle" down.)
8) To eliminate the possibility that the performance discrepancy had
anything to do with the recursive nature of the first test, we did
another test in which we enumerated all the paths of all 614 files
organized in the manner described in 1), stuck the paths into an
array, then simply iterated through the array, opening and closing the
files. Even in that case, in which no recursion was involved in the
opening of the files, it still took 35 seconds.
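(For clarity, that test was shaped roughly like the sketch below. It's
a reconstruction, not the exact code we ran; CollectPaths and OpenAll
are just illustrative names.)

#include <windows.h>
#include <string.h>
#include <string>
#include <vector>
using std::string;
using std::vector;

// First pass: walk the hierarchy and collect the full paths of all files.
void CollectPaths(const string &strDir, vector<string> &rvecPaths)
{
    WIN32_FIND_DATA find_data;
    HANDLE hFind = FindFirstFile((strDir + "\\*.*").c_str(), &find_data);
    if (hFind == INVALID_HANDLE_VALUE)
        return;
    do
    {
        if (strcmp(find_data.cFileName, ".") == 0 ||
            strcmp(find_data.cFileName, "..") == 0)
            continue;
        string strPath = strDir + "\\" + find_data.cFileName;
        if (find_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
            CollectPaths(strPath, rvecPaths);
        else
            rvecPaths.push_back(strPath);
    } while (FindNextFile(hFind, &find_data));
    FindClose(hFind);
}

// Second pass: iterate and open; no recursion involved in the opening itself.
void OpenAll(const vector<string> &vecPaths, int &rnCount)
{
    for (size_t i = 0; i < vecPaths.size(); ++i)
    {
        HANDLE hFile = CreateFile(vecPaths[i].c_str(),
                                  GENERIC_READ, 0, NULL, OPEN_EXISTING,
                                  FILE_ATTRIBUTE_NORMAL |
                                  FILE_FLAG_SEQUENTIAL_SCAN,
                                  NULL);
        if (hFile != INVALID_HANDLE_VALUE)
        {
            rnCount++;
            CloseHandle(hFile);
        }
    }
}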
9) All of the above leads me to believe that there's something about
the fact that the files are organized in subdirectories (as opposed to
sitting in a single directory) that slows things down.
So, my questions are:
a) Why the discrepancy between the two tests?
b) How does NTFS actually organize files on the disk physically? Are
directories just a logical concept, or do they have some physical
meaning in terms of how the files are organized on the disk? Some
URLs to papers on this subject would be useful.
c) Does the rate of file opening in the hierarchical case (35 seconds
/ 614 files, i.e. roughly 57 milliseconds per file) seem unusually
slow? Is the rate in the non-hierarchical case (7 seconds / 614 files,
i.e. roughly 11 milliseconds per file) about right?
d) Besides getting better hardware, what else can we do to optimize
this process in software? Are we using the optimal flags in
CreateFile?
e) The pathnames that we actually use are quite long--approaching 200
characters. Does this make a difference?
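(To make d) concrete, here is the flag combination we pass today; one
obvious variation, noted in a comment purely for illustration, would
be a non-exclusive share mode (FILE_SHARE_READ). We have not measured
that variant.)

// The call as it stands in our code (szPathname as in the tests above):
HANDLE hFile = CreateFile(szPathname,
                          GENERIC_READ,              // read-only access
                          0,                         // exclusive; FILE_SHARE_READ would allow shared reads
                          NULL,                      // default security
                          OPEN_EXISTING,             // fail if the file does not exist
                          FILE_ATTRIBUTE_NORMAL |
                          FILE_FLAG_SEQUENTIAL_SCAN, // hint: the header is read sequentially
                          NULL);                     // no template file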
Clarification of Question by stormin-ga on 09 Jul 2003 10:47 PDT
Hi mathtalk-ga,
It's unmanaged (native) code.
BTW, we also did another test in which we copied the same 614 files
over 10 subdirectories (i.e. about 60 files per directory), expecting
that it would take longer than 35 seconds, but in fact, it only took
about 13 seconds. Our current theory is that perhaps in that original
directory (the one that took 35 seconds), the files were spread out on
the disk, resulting in longer seek times, whereas in the other cases,
when we copied them, they resided in one place on the disk. But who
really knows...just a guess. :) I'm also wondering whether we're
approaching the theoretical limit of how fast we can actually open
files. I mean, if the average seek time on an IDE drive is about
8-10 ms, can we really expect anything better than 100 files/sec?
(For comparison, our 35-second run works out to roughly 57 ms per
open, i.e. several seeks' worth.)
stormin-ga
Clarification of Question by stormin-ga on 10 Jul 2003 11:28 PDT
Hi mathtalk-ga,
Thanks for taking the time to work on this problem.
Having never actually used a RAMDisk, I'm not familiar with how they
work. Is it actually possible to just load the first 4K of each file
into memory like that? I should probably clarify how our app works.
In our hierarchy of data files, each directory (which contains
subdirectories, which contain the files) represents a set of data that
the user would view during a given session. We don't know ahead of
time which set of data the user will choose to view. So, it would
seem to me that we would have to traverse all our directories and load
the first 4K (to complicate matters, it isn't always just the first
4K--sometimes it's more, and the only way we can tell is by parsing
the data) of all our files into the RAM disk. Since it's not uncommon
for us to have tens of thousands of data files, it would seem that it
would take some time to preload the RAM disk with the 4K segments at
bootup, and I'm not sure how feasible it is.
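(To illustrate the "sometimes it's more than 4K" wrinkle: the read
ends up looking something like the sketch below, where
ParseHeaderLength is a hypothetical stand-in for our parsing logic,
not a real function in our code.)

#include <windows.h>

// Hypothetical stand-in for our parsing logic; the real code inspects
// the data itself to decide how long the header actually is.
DWORD ParseHeaderLength(const char * /*pBuffer*/, DWORD dwBytesRead)
{
    return dwBytesRead;   // placeholder: pretend the first chunk was enough
}

// Read the first 4K, then keep reading if the parsed header runs past it.
bool ReadVariableHeader(HANDLE hFile, char *pBuffer, DWORD dwBufSize)
{
    DWORD dwFirst = dwBufSize < 4096 ? dwBufSize : 4096;
    DWORD dwRead = 0;
    if (!ReadFile(hFile, pBuffer, dwFirst, &dwRead, NULL))
        return false;

    DWORD dwTotal = ParseHeaderLength(pBuffer, dwRead);
    if (dwTotal > dwRead && dwTotal <= dwBufSize)
    {
        DWORD dwMore = 0;
        if (!ReadFile(hFile, pBuffer + dwRead, dwTotal - dwRead, &dwMore, NULL))
            return false;
    }
    return true;
}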
We've been thinking about this some more, and we may end up with a
hybrid of the approach we're taking now and the one we took in the
last version of the software, in which we cached some of the necessary
information from that 4K header in a database.
Database fetches are obviously fast.
With respect to your comment about IDE drives, is it fair to say that
the average seek time represents the theoretical minimum time between
file opens?
Having said that though, we are still interested in answers to the
questions we posed for future reference.
stormin-ga