Q: details of making a physical copy of a harddrive (Answered, 4 out of 5 stars, 2 Comments)
Question  
Subject: details of making a physical copy of a harddrive
Category: Computers > Security
Asked by: grthumongous-ga
List Price: $20.00
Posted: 12 Apr 2004 16:32 PDT
Expires: 12 May 2004 16:32 PDT
Question ID: 329181
details of making an exact physical copy (clone) of a harddrive.

When making an exact bit for bit clone of a harddrive how does one
know with certainty that the source and destination volumes are
identical?
I have read a little about an MD5 hash value.  As I understand it,
this value is generated at the volume level.  A single value for the
whole volume.
If the source volume hash value is identical to the destination volume hash
value then the volumes are identical. 

I received new information today from a purported subject matter
expert that this hash value is of little relevance because it is
rarely used to certify or verify that the source and destination are,
without doubt, identical.  They went on to say that the volume-level
hash is of little importance because it would only expose if whole
files were added or subtracted from the clone, not whether an existing
file was modified.  They concluded that a file level hash value for
each of 100 000 or 200 000 files would be necessary to be of any use
and would be costly.

I used an example of how an internet download of a file concludes with
a report of the exact byte count downloaded.  Whether we write it down
is optional, but the actual transmission software does it
automatically.
But the subject matter expert was unswayed. I had to defer to them,
at least until I can ask my own subject matter expert (aht-ga?) in
the GAT.
   
Is a volume-level hash for source and destination useful?
If not, then what is required to certify that the two volume instances
are identical and remain certifiably so in the future?

Does their proprietary cloner software do an inline "verify" as it
writes each sector, immediately reading it back from the output media?
How can they be so sure it worked?

Clarification of Question by grthumongous-ga on 12 Apr 2004 16:34 PDT
While it should not matter, the file system is NTFS.

Request for Question Clarification by maniac-ga on 12 Apr 2004 17:27 PDT
Hello Grthumongous,

A few years ago (well - maybe 15 by now...) I wrote a disk duplicator
and comparison program and can certainly describe how they worked. I
am also somewhat familiar with how multiple drive duplicators work as
well. If describing those would be acceptable - let me know so I can
produce a proper answer.

Your SME is most likely correct in that an MD5 checksum is not used to
compare disks. It is far simpler to read the source / destination disk
and compare on a block by block basis instead. Not sure how file level
hash values come into the discussion, though that is how tools such as
"tripwire" are used to help detect online modification of key system
files.

You make a comment at the end about "proprietary cloner software";
perhaps if you identify the product I can dig up some data and comment
on that as well.

  --Maniac

Clarification of Question by grthumongous-ga on 12 Apr 2004 18:30 PDT
aht, thanks for popping up.  You succinctly expressed what I was
trying to say when you stated,
"the hash is essentially performed on a single 'file', that 'file'
being the entire linear/sequential contents of the drive".
My contrived term, the "volume-level" hash, was imprecise.
Your link to the Logicube MD5 product was excellent.  That vendor
explains how they produce just such a hash as you described as a means
to certify and verify that the two copies are identical. I appreciate
your input.

Maniac-ga, 
Please proceed with an official Answer.
Based on aht's comments and my review of them I hope it is clear to
you that I would be looking for some means, mechanism, method other
than hashing to certify a destination drive is identical.  If hashing
is the only way....

As for the "proprietary cloner software", the SME wouldn't tell me. Grrr.
Answer  
Subject: Re: details of making a physical copy of a harddrive
Answered By: maniac-ga on 13 Apr 2004 15:42 PDT
Rated: 4 out of 5 stars
 
Hello Grthumongous,

OK. I'll first describe a method that worked "way back when" and then
summarize how more modern disk duplicators work.

I ended up writing the program because a customer test failed. When we
did the analysis, we found that the boot sector was not initialized:
the "copy volume" utility we were using on a system did not
initialize it. Sigh.

The copy program itself was quite simple. It had a look up table of
the disk types we used (300 Mbyte, 13 Mbyte, 5 Mbyte) and based on the
type of disk - looked up the geometry of the disk
 - number of cylinders
 - number of platters
 - number of blocks per cylinder
Then there were three nested loops, one for the cylinder, one for the
platter, and the third for the block within the cylinder / platter.
Within the innermost loop, the steps were
 - write formatting information to the block on destination disk
 - read block from source disk
 - write block to destination disk
The format step was needed since we occasionally received blank disks
which had no formatting information included.
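
For illustration, the copy loop described above might look like the
following in Python. This is a minimal sketch, not the original
program: the block size, the geometry numbers, and the names GEOMETRY
and copy_disk are assumptions, the "format" step is stood in for by
zeroing the destination block, and in-memory streams stand in for the
two disks.

```python
import io

BLOCK_SIZE = 512  # assumed block size in bytes; the answer does not state it

# Hypothetical lookup table: disk type -> (cylinders, platters, blocks
# per cylinder/platter pair). The numbers are illustrative only.
GEOMETRY = {
    "tiny": (4, 2, 8),
}


def copy_disk(src, dst, disk_type):
    """Copy every block from src to dst by walking the disk geometry."""
    cylinders, platters, blocks = GEOMETRY[disk_type]
    copied = 0
    for cyl in range(cylinders):
        for plat in range(platters):
            for blk in range(blocks):
                # Position of this block in a flat image of the disk.
                index = (cyl * platters + plat) * blocks + blk
                offset = index * BLOCK_SIZE
                # Stand-in for the format step: zero the destination block.
                dst.seek(offset)
                dst.write(bytes(BLOCK_SIZE))
                # Read the block from the source disk.
                src.seek(offset)
                data = src.read(BLOCK_SIZE)
                if len(data) != BLOCK_SIZE:
                    raise IOError(f"short read at block {index}")
                # Write the block to the destination disk.
                dst.seek(offset)
                dst.write(data)
                copied += 1
    return copied


if __name__ == "__main__":
    total_blocks = 4 * 2 * 8
    source = io.BytesIO(bytes(i % 251 for i in range(total_blocks * BLOCK_SIZE)))
    target = io.BytesIO()
    print(copy_disk(source, target, "tiny"), "blocks copied")
```

A real duplicator would talk to the drive controller rather than seek
in a flat stream, so treat the offset arithmetic as a convenience of
the sketch.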

The compare disk program was the same code with the innermost loop replaced with:
 - read block from source disk
 - read block from destination disk
 - compare the two blocks, byte by byte
and generating an error message if any byte did not compare.
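
The compare loop can be sketched the same way. Again this is only an
illustration under assumptions: the block size and the name
compare_disks are made up for the example, and any two binary streams
(here in-memory ones) stand in for the source and destination disks.

```python
import io

BLOCK_SIZE = 512  # assumed block size in bytes


def compare_disks(src, dst, block_size=BLOCK_SIZE):
    """Compare two streams block by block.

    Returns a list of (block_number, offset_in_block) pairs, one for the
    first differing byte of each block that does not match.
    """
    mismatches = []
    block = 0
    while True:
        a = src.read(block_size)
        b = dst.read(block_size)
        if not a and not b:
            break  # both streams exhausted
        if a != b:
            # Locate the first byte that differs; if one block is simply
            # shorter, report the position where it ends.
            first = next(
                (i for i, (x, y) in enumerate(zip(a, b)) if x != y),
                min(len(a), len(b)),
            )
            mismatches.append((block, first))
        block += 1
    return mismatches


if __name__ == "__main__":
    original = bytes(range(256)) * 8
    damaged = bytearray(original)
    damaged[1030] ^= 0xFF  # flip one byte in block 2
    for blk, off in compare_disks(io.BytesIO(original), io.BytesIO(bytes(damaged))):
        print(f"mismatch in block {blk} at byte {off}")
```

An empty result means every byte compared equal, which is the
condition the original compare program checked for.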

These programs worked quite efficiently. The maximum data transfer
rate for the disks was about 3 Mbyte/sec and with the largest disk
(300 Mbyte), the programs would take a few minutes to complete. After
we wrote these two programs, we never had the same problem with
customer testing again.

That was in the mid 80's. Today, disks are a lot faster and store a
lot more information but the basic approach is the same. The drive
mechanisms are a lot better now as well - for example, there is "bad
block replacement" where the drive will automatically stop using a bad
(or marginal) disk block and replace it with an unused block. In this
way - operating systems (and hardware duplicators) generally treat the
disk as a sequence of blocks starting at zero and going up to the
maximum block number. That simplifies the design of disk duplication
programs quite a bit.

For example, on a Linux (or Unix) system, you can use a command such as
  dd if=/dev/hda of=/dev/hdb bs=32768
to copy disk "/dev/hda" to "/dev/hdb" and can use a command such as
  cmp /dev/hda /dev/hdb
to compare two disks. Neither program generates any messages unless an
error occurs. I have actually used commands like these within the last
few years and they work just fine.

As another example, you can download a package such as "Tom's Root Boot Disk"
  http://www.toms.net/rb/
which is described briefly at
  http://www.toms.net/rb/tomsrtbt.FAQ
The FAQ explains how to download and create a boot floppy that has the
utilities described above. In this way, you can take any old PC and
turn it into a "poor man's" disk duplicator.

For hardware disk duplicators, the process is basically the same
 - copy the data as fast as you can from one disk to another
 - compare the data as fast as you can (block by block / byte by byte)
though some devices are certainly more capable when they understand
the format of the file system they are copying [but then it won't
necessarily be an identical copy].

Here are a few examples:
  http://www.aberdeeninc.com/abcatg/HDP620.htm
shows a one to two disk duplicator that also understands common
Microsoft Windows disk formats and can "scale" the disk partitions to
fit the destination drive. An option near the bottom also understands
the disk format and only copies the blocks in use.
  http://www.ics-iq.com/show_item_267.cfm
shows a three disk duplicator that runs about 1000 times faster than
the old program I wrote.
  http://www.greystoneds.com/downloads/dat600.pdf
a one to six disk duplicator which can use a PC (nice menus, etc.) to
control the duplication, report errors, etc.

A few software examples include:
  http://www.softforall.com/Utilities/Backup/R-Drive_Image_Hard_Disk_Backup_Software09020020.htm
which is a more general utility that generates "disk image" files
which can be put on any media and then used to clone a disk (or recover
from a disk failure).
  http://www.symantec.com/sabu/ghost/ghost_personal/
Norton "Ghost" which is a very capable utility for duplicating /
comparing disks. Scroll down and see several tutorials if you want to do
some advanced duplicating tasks.

For additional information on two Linux programs I referred to, check out
  http://www.die.net/doc/linux/man/man1/dd.1.html
  http://www.die.net/doc/linux/man/man1/cmp.1.html
or search for
  man page cmp linux
  man page dd linux

To search for more general information on this topic, try phrases such as
  disk image duplicator
  hard disk duplicator software
  hard disk duplicator hardware
  [product name here] features
  [product name here] problems

If this is not enough on the topic or some part is unclear, please
make a clarification request.

  --Maniac

Request for Answer Clarification by grthumongous-ga on 13 Apr 2004 16:08 PDT
Dear maniac,
So to recap, one duplicator tool generates a hash to act as a control number.
If the source and destination have the same control number ==> identical.

Another tool does a copy operation, followed by a compare operation in
series. The compare consists of reading a block from the source disk,
reading a block from the destination disk and comparing them bit by
bit.

Please explain and expand on how a clone made without the Hash can be
certified as "identical" to the satisfaction of an independent 3rd
party.

Clarification of Answer by maniac-ga on 14 Apr 2004 04:57 PDT
Hello Grthumongous,

If a third party needs to confirm the results (without a hash), the choices are:
 - that third party would repeat the comparison of the original disk to the copy
OR
 - that third party would review the process used to make the
duplicate (to ensure it works properly) and
 - the duplication company would follow that approved process (and be audited)
This latter method in brief is what companies that get ISO 9001
certification go through.

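Note however, the use of a hash (e.g., MD5, simple checksum) does not
guarantee that the original matches the copy. It only guarantees that
the hash is the same. It protects against accidental (or casual)
modification of the copy - but not necessarily against a determined
effort by a "bad guy". With a simple checksum, what the bad guy can do
is this (with a cryptographic hash such as MD5 the last step is far
harder, because it means finding different data that produces a chosen
hash value):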
 - compute the hash value before modifying the copy
 - make any modifications he wants to the copy
 - compute the new hash value after modification
 - find a location in the copy that is not used / modify it so the new
hash value matches the original value
This kind of technique is sometimes used [in a positive way] to apply
patches to flight software. If you have to patch (modify) the
executable in an Operational Flight Program (OFP), you also have to
update the checksum value after the patch is applied - so when the OFP
starts the next time, it will get the proper result from its power on
self test (which verifies the OFP is not damaged).
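
To make the compensation trick concrete, here is a minimal sketch for
a simple additive checksum (sum of all bytes modulo 256). The checksum
scheme and the names checksum and patch_and_compensate are assumptions
chosen for the example, not the method of any particular product. The
sketch does not carry over to MD5, where no comparably cheap
compensation step is known.

```python
def checksum(data):
    """Simple additive checksum: sum of all bytes modulo 256."""
    return sum(data) % 256


def patch_and_compensate(image, offset, new_bytes, spare_offset):
    """Apply a patch, then adjust one unused byte so the checksum is unchanged.

    spare_offset must point at a byte outside the patched range whose
    value does not matter (for example, unused filler).
    """
    if offset <= spare_offset < offset + len(new_bytes):
        raise ValueError("spare byte must lie outside the patched range")
    original = checksum(image)
    patched = bytearray(image)
    patched[offset:offset + len(new_bytes)] = new_bytes
    # How far the checksum moved, expressed as the amount to add back.
    delta = (original - checksum(patched)) % 256
    patched[spare_offset] = (patched[spare_offset] + delta) % 256
    return bytes(patched)


if __name__ == "__main__":
    image = bytes(range(64)) + bytes(16)  # 64 data bytes, 16 filler bytes
    modified = patch_and_compensate(image, 10, b"\xde\xad", 70)
    print(modified != image, checksum(modified) == checksum(image))
```

The data differs while the control number stays the same, which is
why a byte by byte comparison against the original is the stronger
check for a checksum this weak.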

So in some ways - the use of a hash as a "control number" may be more
convenient (does not require the original disk) in a comparison but is
not as secure as a complete comparison of the data on disk.

  --Maniac
grthumongous-ga rated this answer: 4 out of 5 stars
Hello maniac,
Thanks.

Comments  
Subject: Re: details of making a physical copy of a harddrive
From: bastian-ga on 12 Apr 2004 16:54 PDT
 
What is so important about copying the entire drive bit for bit?
Is it the exact same make and model?

Search google for:
forensic hard drive copy
"bit for bit" hard drive copy
Subject: Re: details of making a physical copy of a harddrive
From: aht-ga on 12 Apr 2004 17:15 PDT
 
grthumongous-ga:

I would not claim enough knowledge or expertise in this specific
application to question the word of the SME who informed you that MD5
hashes of disk volumes would not guarantee that the volume contents
are identical.

However, I think the difference in opinion comes down to how the MD5
hash is created. For example, this forensic product:

http://www.logicube.com/logicube/pressreleases/md5.asp

uses a standalone system and a bit-by-bit analysis of the source
drive, separate from any reliance on the file system used on the
drive. This treats the data as a stream, instead of individual,
randomly-accessible files. In this way, the hash is essentially
performed on a single 'file', that 'file' being the entire
linear/sequential contents of the drive; any change in the sequence of
the data, the length of the data, and the value of the data, would
therefore alter the hash.
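
As a rough sketch of hashing the drive as one linear stream, the
following Python reads a binary stream in chunks and feeds it to MD5.
The function name stream_md5 and the chunk size are assumptions for
the example; this is not how the Logicube product is implemented.

```python
import hashlib
import io


def stream_md5(stream, chunk_size=1 << 20):
    """Return the MD5 hex digest of everything readable from stream."""
    digest = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    source = bytes(4096)
    clone = bytearray(source)
    clone[100] = 1  # a single changed byte anywhere in the stream
    print(stream_md5(io.BytesIO(source)))
    print(stream_md5(io.BytesIO(bytes(clone))))
```

The same function could be pointed at a raw device or an image file
opened in binary mode. The path to use depends on the system and
usually needs administrator rights, so it is left out here.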

The volume-level hash, if performed only on the file allocation table
contents, could conceivably be insufficient; it is possible to alter
the contents of a file without affecting its entry in the file
allocation tables. To generate an MD5 hash on the entire disk contents
could conceivably take the same amount of time and investment as
generating MD5 hashes for each individual file; in both cases, doing
it properly would require the scanning system to read each bit of data
on the drive.

Regards,

aht-ga
Google Answers Researcher
