Hi axle-ga,
You pose a very interesting question which can become overwhelming if
not approached cautiously :)
Let's consider each component one at a time:
HIGH QUALITY ARCHIVING OF IMAGES
While this may sound very good, it is easy to get carried away with
having the "best possible" image quality retained. To give you a
parallel example, the MP3 phenomenon is based on a filetype that can
drastically reduce the size of a piece of audio, and yet the sound is
virtually the same as the equivalent WAV file to the human ear.
Similarly, images can grow exponentially in size, yet the incremental
increase in quality quickly becomes very small. I haven't forgotten that
you are dealing with negatives...but this matter is still prevalent :)
You mentioned that you want to digitize these images for backup
purposes and that you want to be able to print the images out at 10" x
8". Since you are not going to be doing any intensive photo editing,
the objective will be to maximize quality at that size for printing
with a suitable printer.
The best solution for these requirements would be using the Windows
Bitmap File Format (extension BMP). There are several reasons for
this, of which some are:
1. Up to 24-bit color (2^24 = 16.7 Million colors)
2. Effective compression without compromising quality (unlike TIFF)
3. Device independent
4. Easily converted to other formats if necessary (using software such
as Adobe Photoshop -- this is a large factor in your "future-proof"
and "avoid scanning the images again in the future" requirements)
Further technical specifications of the file format can be found here:
http://www.dcs.ed.ac.uk/home/mxr/gfx/2d/BMP.txt
As far as the size is concerned, here is a process that is always
personally helpful when determining the right DPI level for an archive:
1. Scan at half of the DPI level that your scanner is capable of. In your
case 2000 DPI
2. Increase by 50% (to 3000, then 3500, etc.) until you can no longer
see a significant quality difference in the corresponding 10x8 image
that results. It is difficult to nail this down since it can vary by
the type/brand of negative being considered.
3. Continue using this level of quality with the rest of the images to
maintain consistency.
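As a rough sketch of what each DPI step in the process above means for
a 10" x 8" print, the snippet below estimates the pixels per inch that
end up on paper. The 36 mm x 24 mm frame size is an assumption (a
standard 35mm negative), and the function names are only illustrative.

```python
# Assumption: a standard 35mm frame of roughly 36 mm x 24 mm.
MM_PER_INCH = 25.4
FRAME_MM = (36.0, 24.0)


def scan_pixels(scan_dpi):
    """Pixel dimensions of one frame scanned at scan_dpi."""
    return tuple(round(side / MM_PER_INCH * scan_dpi) for side in FRAME_MM)


def print_ppi(scan_dpi, print_inches=(10.0, 8.0)):
    """Pixels per inch available when the frame fills the whole print."""
    width_px, height_px = scan_pixels(scan_dpi)
    # The frame is 3:2 and the print is 5:4, so filling the paper crops
    # the long side; the tighter axis sets the usable resolution.
    return min(width_px / print_inches[0], height_px / print_inches[1])


for dpi in (2000, 3000, 3500, 4000):
    print(dpi, scan_pixels(dpi), round(print_ppi(dpi)))
```

Comparing the printed figures against what your printer can actually
resolve gives a second opinion alongside the visual test.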
LOW QUALITY IMAGES FOR INDEXING AND BROWSING
We will also need a lower quality format to use for indexing and rapid
browsing purposes. For this, the JPEG file format is sufficient since
you will not be printing any of these pictures. I have found from
experience that 1024 x 768 is fine for scanning negatives (since you
don't want to make it larger than the resolution of your monitor). If
you have a lower resolution monitor you may even want to go lower than
that. Since the JPEG standard is somewhat economical in how it encodes
images, it really is not worth using a Q95 level since, in my opinion
you will be sacrificing quite a bit of space for almost unnoticeable
quality increase. If you are having doubts about the quality level
here, follow the process outlined above to obtain the right DPI level.
Also keep in mind that you may want to build on your archive in the
future, so it's always a good idea to keep a little cushion for that as
well :)
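To illustrate the preview sizing, here is a minimal sketch that works
out the dimensions of a browsing copy that fits inside 1024 x 768
without distorting the picture. The function name and the example scan
size (a 35mm frame at 4000 DPI) are assumptions for illustration only.

```python
def preview_size(width_px, height_px, box=(1024, 768)):
    """Largest size that fits inside box while keeping the aspect ratio."""
    # Never scale up: an image already smaller than the box is left alone.
    scale = min(box[0] / width_px, box[1] / height_px, 1.0)
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


# Hypothetical full-size scan of one frame.
print(preview_size(5669, 3780))
```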
Technical specifications of the JPEG image format can be found here:
http://www.dcs.ed.ac.uk/home/mxr/gfx/2d/JPEG.txt
STORING/INDEXING IMAGES
The ideal way of actually indexing all these images would be to set up
a database system (the sheer amount of information here warrants this
step). This will allow you to organize the images as records such as
this one:
Images Table
Image ID | Small Image (hyperlink) | Large Image (hyperlink) | Comments
// each row represents an image
A program such as MS Access stores information in this manner, but
again, the size and number of entries here may slow down the speed at
which you are able to move through the information. You can perhaps split
this into several databases such that the size of any one file doesn't
get too big and this way you could even sort the images by category if
you like. In any case, you will be able to store any additional
comments or tags you desire by simply adding another column to your
table. This will fulfill your future need for adding additional
attributes to each image, such as "date, time, location, subject
matter, description, caption, etc..." that you mentioned in your
question.
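As a minimal sketch of the table described above, the following uses
SQLite through Python's standard library as a stand-in for Access or a
full DBMS. The table name, column names and file paths are all
hypothetical; the point is only that a new attribute can be added later
with a single statement.

```python
import sqlite3

# In-memory database for the sketch; pass a file path for a real archive.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE images (
        image_id    INTEGER PRIMARY KEY,
        small_image TEXT NOT NULL,   -- path or link to the browsing copy
        large_image TEXT NOT NULL,   -- path or link to the archival scan
        comments    TEXT
    )
""")

# Each row represents one image (hypothetical paths).
conn.execute(
    "INSERT INTO images (small_image, large_image, comments) VALUES (?, ?, ?)",
    ("previews/roll001_frame01.jpg", "archive/roll001_frame01.bmp", "example row"),
)

# Adding a new attribute later does not disturb the existing rows.
conn.execute("ALTER TABLE images ADD COLUMN location TEXT")
conn.execute("UPDATE images SET location = ? WHERE image_id = ?", ("unknown", 1))

rows = conn.execute("SELECT image_id, small_image, location FROM images").fetchall()
print(rows)
```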
If you want a full-scale Database Management System (DBMS), there are
several solutions such as the Microsoft SQL Server, Oracle, IBM's DB2,
and so forth that you should look into. Personally I find the SQL
Server powerful while having a quicker learning curve compared to the
others.
I know this will be a very time-consuming project, but you seem up to
the task! One final suggestion is that you name all of your images
logically so that they can be found/grouped easily in the future :)
Anyhow, it was a pleasure answering your question since I know how
many people struggle with this kind of project...if you have any
problems understanding the information above please post a
clarification.
Cheers and happy archiving!
answerguru-ga
Request for Answer Clarification by axle-ga on 02 Sep 2002 05:17 PDT
Dear answerguru-ga
Thank you for your response, but I must admit I am surprised at your
answer for a number of reasons:
1) You describe the analogy with mp3 (which is a good one) and one I am
well acquainted with as my last 6 month project was to archive my
1500+ CDs to mp3. So we both agree that it is not necessarily vital to
preserve every last bit of information, but you then recommend the BMP
format which is lossless.
2) You say:
> 2. Effective compression without compromising quality (unlike TIFF)
But as far as I know, only the 1, 2, 4 and 8bpp BMPs can be
compressed. The truecolour 24 bits cannot.
from http://www.dcs.ed.ac.uk/home/mxr/gfx/2d/BMP.txt
Windows versions 3.0 and later support run-length encoded (RLE) formats for
compressing bitmaps that use 4 bits per pixel and 8 bits per pixel.
3) For my criteria of a 'manageable' archive, I would hope to fit my
10,000 images onto a single IDE drive. With current technology, this
gives me 160GB or 16MB per image. 42bpp scans are 140MB, 24bpp are
70MB. If I go for 24bpp, I need ~5:1 compression.
If I go for a lossless compression, my choice would appear to be
something like 1.5 for PNG or 2.5-3 for jpeg2000 or bitjazz (neither
of which I really think are widely enough supported)
(in a quick test gzip'ing a 24bpp image reduces to 82%, bz2'ing to 69%)
4) Having read many reviews of negative scanners, the reason I bought
the Nikon was because of its 4000dpi capability, in many tests it was
proved that 4000dpi produced a marked improvement over the older
~2500dpi scanners. I'm not going to scan at a lower resolution than my
equipment allows, although I am fairly sure that the extra bits of
42bpp is spurious accuracy and is not necessary.
5) I am based almost entirely in the Linux world and would prefer to
avoid proprietary standards (such as BMP).
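The storage arithmetic in point 3 can be checked with a short sketch.
The figures are the ones quoted above (160GB drive, 10,000 images, 70MB
and 140MB scans); decimal units and the function names are assumptions.

```python
MB_PER_GB = 1000  # decimal units, as drive capacities are usually quoted


def per_image_budget_mb(drive_gb, image_count):
    """Space available per image if the whole archive must fit on one drive."""
    return drive_gb * MB_PER_GB / image_count


def required_ratio(raw_mb, budget_mb):
    """Compression ratio needed to squeeze a raw scan into the budget."""
    return raw_mb / budget_mb


budget = per_image_budget_mb(160, 10_000)
print(budget)                       # 16.0 MB per image
print(required_ratio(70, budget))   # 4.375, i.e. roughly the ~5:1 quoted for 24bpp
print(required_ratio(140, budget))  # 8.75 for 42bpp scans
```

Set against the lossless ratios of about 1.5 to 3 mentioned in point 3,
this shows why a single drive is tight for a lossless archive.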
Clarification of Answer by answerguru-ga on 02 Sep 2002 07:59 PDT
Hi axle-ga,
Let me address each of your concerns one at a time - I'll use the
numbering you've used in your clarification:
1. I agree with you that the BMP format is lossless, and this is part
of the reason that I chose it. It is important to understand that
although far more information is retained, the algorithm used to store
the data representing the image is extremely effective. So you get the
best of both worlds - "lossless" quality that is stored very
efficiently (relative to the other heavy duty formats such as GIF).
2. We are both thinking of "compression" here in two different ways,
and I realize how easily this misperception can be made. The
compression I'm talking about, again, is embedded into the
algorithm that is used to create BMP files. This is the same for all
levels of quality in BMP files - it is the reason that they are all
classified as BMP files!
3. I would have to strongly advise against using either PNG or
JPEG2000 in your scenario, but your test that revealed that a BMP was
reduced to 82% of its original size is an indicator that its original
compression algorithm is quite efficient (believe it or not) and of
course the fact that it is lossless. You will see that if you try the
same thing with other lossless formats, the BMP will originally be
smaller, but the others will zip more effectively. The sizes of the
final zipped files will be relatively close because they have all
undergone the same level of compression. Personally, I'd rather the
bulk of this take place when the image is created, thus reducing risk
of future problems if the images need to be brought out of zipped
format. To be honest, I don't think you'll get that number of pictures
into 160GB unless you lower your resolution somewhat (which of course
is not reasonable and I don't suggest you do so). My suggestion: you
will take a fair bit of time to fill up 160GB, so just go ahead and do
that, and in a few months you can add a second IDE drive to your
system. You would just be giving up too much quality (zipped or not)
by choosing a lossy format in order to conform to 160GB.
4. I didn't realize that was an issue for your particular scanner...is
there a problem when you try to scan in this format at 4000dpi? If
not, I say just go ahead scanning at that level...
5. Your environment was not made clear in the question...regardless I
know that BMP is fully supported by most Linux-based graphics
programs. I'm not quite sure what you are alluding to here..
I hope this has given you some insight into why I chose the BMP
format. I think we can close the first three questions, since there's
not much more I can say about that, but if you would like to continue
with 4 and 5 I would be happy to oblige :)
Thanks,
answerguru-ga
Request for Answer Clarification by axle-ga on 06 Sep 2002 17:46 PDT
Dear AG,
I still don't agree with your reasoning for choosing BMPs (over other
lossless formats): 24 bit BMPs are *not* compressed and are merely
stored as a [small] header followed by the pixel data in RGB triples -
there is no 'algorithm' for storing them. The fact that I can gzip the
files at all implies that they are not compressed!
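To make the point concrete, here is a minimal sketch that builds a
24-bit BMP by hand and then runs it through zlib (the algorithm behind
gzip). The function name and the flat grey test image are assumptions;
a flat image compresses far better than a real photograph, so only the
direction of the result matters here, not the ratio.

```python
import struct
import zlib


def bmp_24bit(width, height, pixel=(128, 128, 128)):
    """Build an uncompressed 24-bit BMP filled with a single RGB colour."""
    row_bytes = (width * 3 + 3) & ~3   # each row is padded to a multiple of 4 bytes
    image_size = row_bytes * height
    # 14-byte file header: signature, file size, two reserved words, data offset.
    file_header = struct.pack("<2sIHHI", b"BM", 54 + image_size, 0, 0, 54)
    # 40-byte info header; the compression field is 0, meaning "none".
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24,
                              0, image_size, 2835, 2835, 0, 0)
    r, g, b = pixel
    row = bytes((b, g, r)) * width     # triples are stored blue, green, red
    row += b"\x00" * (row_bytes - len(row))
    return file_header + info_header + row * height


data = bmp_24bit(101, 50)
print(len(data), 54 + 304 * 50)        # file size is just header plus padded pixels
print(len(zlib.compress(data)))        # the general-purpose compressor still shrinks it
```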
I also don't understand why you started by confirming that lossy is
not a bad thing, but then recommended a lossless format.
I think it is important to know the answer to the following question:
Given the likely uses for archived photographs, is it possible to
distinguish between a Q90+ JPEG and the original uncompressed image?
If to all intents and purposes it is /not/ possible to distinguish then
I can see no reason to avoid JPG. If it /is/ possible then a lossless
image (of whatever type) is the obvious answer. By this I mean not
just me looking at a couple of images on my laptop - I want to know
if some hypothetical picture editor of The Sunday Glossy would bemoan
the receipt of anything but a pristine scan.
If lossless is the answer, and the quality of the scanned images is
paramount (and more important than the [large!] saving on storage
space) then it would seem foolish to limit the images to 'only' 24bit
- the Nikon is quite capable as a 42bpp scanner. Extending the file in
this way would appear to limit the lossless format choice to
TIFF.
What would be ideal would be a compromise - a 32bpp format with
nominally 10 bits per colour, possibly with a heavier weighting to
green then red then blue.
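As a sketch of what such a compromise could look like, the snippet
below packs three 10-bit channels into one 32-bit word, leaving two
bits spare. The layout and function names are purely hypothetical and
do not describe any existing file format; an uneven split favouring
green would follow the same pattern with different shift widths.

```python
def pack_rgb10(r, g, b):
    """Pack three 10-bit channel values (0..1023) into one 32-bit word."""
    for value in (r, g, b):
        if not 0 <= value <= 1023:
            raise ValueError("channel value out of 10-bit range")
    return (r << 20) | (g << 10) | b


def unpack_rgb10(word):
    """Recover the three 10-bit channel values from a packed word."""
    return (word >> 20) & 0x3FF, (word >> 10) & 0x3FF, word & 0x3FF


word = pack_rgb10(1023, 512, 1)
print(hex(word), unpack_rgb10(word))
```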
As you rightly say, 160GB is going to take a while to fill, although
at 140MB an image I may only get a thousand images per drive. I do not
require the final images to be online at all times, so extra drives
could be swapped in and out of the system as necessary.
To address Gareth's point about DVDs - Even a ~10GB DVD would only be
able to store about 70 140MB images - barely two rolls of film! I
would require maybe 150 discs to store the archive. As a backup of the
HDs this is not a bad idea, although I wouldn't fancy duplicating a
set :)
Most of the time I will not need access to these high-resolution
versions, and the screen-sized previews can be used, at maybe only
~150K per image, I could store the entire browseable archive in less
than a couple of gigabytes on my everyday laptop...
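The disc and preview figures above can be checked with a few lines of
arithmetic. The inputs are the numbers quoted in this thread (10,000
images, 140MB scans, ~10GB discs, ~150K previews); decimal units are an
assumption.

```python
import math

IMAGES = 10_000
SCAN_MB = 140        # one 42bpp scan
DVD_MB = 10_000      # a ~10GB disc, decimal units
PREVIEW_KB = 150     # one screen-sized preview

images_per_disc = DVD_MB // SCAN_MB                # whole images only
discs_needed = math.ceil(IMAGES / images_per_disc)
preview_total_gb = IMAGES * PREVIEW_KB / 1_000_000

print(images_per_disc)    # 71, "about 70" per disc
print(discs_needed)       # 141, in line with "maybe 150 discs"
print(preview_total_gb)   # 1.5, under "a couple of gigabytes"
```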
To summarize:
* If one day I may regret a lossy format, then lossless it is.
* If this is the case, then I might as well go the whole way and scan
at 42bpp to preserve as much of the original detail of the negatives
as possible.
* If 42bpp is used then it looks as if my choice is made for me - TIFF.
Clarification of Answer by answerguru-ga on 06 Sep 2002 18:14 PDT
Greetings again axle-ga,
To avoid another long-winded answer, I'm going to respond to your
summarizing questions:
1. I feel that you will come to regret choosing a lossy format. Put
simply, if I can generalize this class of format, I would say its
ultimate purpose cannot extend beyond screen viewing (be it remotely
or locally) without noticeable quality loss.
2. I was attempting to provide a middle of the road solution by
choosing a 24-bit lossless format, however, I completely understand
where you are coming from when you state that you want to take full
advantage of your scanner and retain as much quality as possible. This
being the case (that I admit I did not consider initially), I do
believe that TIFF is the way to go.
Your decision appears to be made :)
Thanks for using Google Answers.
answerguru-ga