Q: Database Products Focused on Big and Fast (No Answer, 8 Comments)
Question  
Subject: Database Products Focused on Big and Fast
Category: Computers
Asked by: jeffrey_b-ga
List Price: $100.00
Posted: 18 Jul 2004 06:49 PDT
Expires: 17 Aug 2004 06:49 PDT
Question ID: 375703
What are some alternative ways for a small business to process very
large database files quickly? We need to manipulate static datasets
(updated periodically, e.g. weekly or monthly) containing 30 - 100
million records. Records are approximately 300 bytes each. The total
dataset is 10GB - 1TB. We use these databases to perform automated
cross-reference lookups (analysis) against other, smaller databases.
One 'job' could require matching a 250,000-record table against a
50-million-record table on 15 different (but predefined) keys. Speed
is of critical importance: the example job would be allocated less
than a half-hour on a fast, dedicated PC.
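
For concreteness, each job is essentially a large hash join: index
the smaller table in memory, then stream the larger table past it.
A rough Python sketch of one pass (the pipe-delimited layout and key
positions are made up for illustration, not our actual schema):

    # One cross-reference pass: build a hash index on the 250K-row
    # table, then stream the 50M-row table against it sequentially.
    def load_index(path, key_col):
        index = {}                       # key -> small-table rows
        for line in open(path):
            row = line.rstrip("\n").split("|")
            index.setdefault(row[key_col], []).append(row)
        return index

    def stream_matches(path, key_col, index):
        for line in open(path):          # big table, read once
            row = line.rstrip("\n").split("|")
            for hit in index.get(row[key_col], []):
                yield row, hit           # one matched pair

    idx = load_index("small.txt", 0)     # 250K rows fit in RAM
    for big_row, small_row in stream_matches("big.txt", 0, idx):
        pass                             # write the match out, etc.

Repeat per key, or build all 15 indexes in one pass if memory allows.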

Our research uncovered kx.com. This company describes our niche
problem exactly and offers a database product to solve it. Trouble
is, their product costs $60,000 and requires learning a new scripting
language. We need to solve this problem on a $10K budget.

In the past, we've used FoxPro and PostgreSQL. FoxPro is lightning
fast but very limited in database size (I think 1GB is the maximum
dataset size). PostgreSQL is inexpensive (open source) and allows for
huge datasets (virtually unlimited), but it's really slow. (We
believe that traditional RDBMS systems built with real-time
transactional processing applications in mind have too much overhead
to be effective for our analysis application. We don't require all
the transactional integrity, roll-back, etc. functionality of a
traditional RDBMS.)

Is there any case study research available that discusses how others
have solved this problem? We'd like to receive a short list of 3 to 5
products worthy of detailed evaluation.
Answer  
There is no answer at this time.

Comments  
Subject: Re: Database Products Focused on Big and Fast
From: curious7-ga on 18 Jul 2004 19:41 PDT
 
Lots of RAM, Opteron/Itanium, 64-bit Linux, MySQL with MyISAM. Max
table size is 8 million TB, so the OS file size is the limit. Linux
does 2TB files, more with special filesystems. Might be worth a
look, if you haven't already. Current info at the bottom of
http://dev.mysql.com/doc/mysql/en/Table_size.html  Get expert input
on a cruise(!):
http://www.mysql.com/news-and-events/events/swell-2004.html
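
One gotcha: out of the box a MyISAM table stops at 4GB because of
its default 4-byte row pointers; declare MAX_ROWS/AVG_ROW_LENGTH at
creation time to widen them. A sketch using the Python MySQLdb
driver (schema and credentials are made up):

    # Create a MyISAM table sized for ~100M rows of ~300 bytes.
    # MAX_ROWS/AVG_ROW_LENGTH make MySQL allocate row pointers
    # wide enough to exceed the 4GB default.
    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="app",
                           passwd="secret", db="xref")
    cur = conn.cursor()
    cur.execute("""
        CREATE TABLE big_table (
            key1    CHAR(20) NOT NULL,
            payload VARCHAR(255),
            INDEX (key1)
        ) ENGINE=MyISAM MAX_ROWS=200000000 AVG_ROW_LENGTH=300
    """)
    conn.close()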
Subject: Re: Database Products Focused on Big and Fast
From: scubapup-ga on 19 Jul 2004 11:48 PDT
 
If you can get as much memory as you can afford, and assuming the
data would fit, I'd suggest using a main-memory database. Of course,
even the usual commercial databases will try to load and access
their data in memory as much as possible. You could also program
your own efficient hash-table approach to solving this problem; a
sketch of that idea is below.
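
When the keys won't all fit in RAM, Python's dbm module gives you a
disk-backed hash file with almost no code. A sketch (file names are
made up; it stores one record per key, so duplicate keys would need
a delimiter scheme):

    # Build a disk-backed hash index, then look keys up directly.
    import dbm

    def build_index(src, key_col, idx_path):
        idx = dbm.open(idx_path, "n")    # create a new index file
        for line in open(src):
            row = line.rstrip("\n").split("|")
            idx[row[key_col]] = line     # last record per key wins
        idx.close()

    def lookup(idx_path, key):
        idx = dbm.open(idx_path, "r")
        try:
            return idx[key]              # value comes back as bytes
        finally:
            idx.close()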

Your problem also looks like a good candidate for a computational
grid solution.

I wonder if a distributed/cluster-type database like the ones from
MS SQL Server, Oracle, or MySQL could help you, especially with such
a limited budget.
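
Even without a cluster database, the job partitions cleanly:
hash-split the big table by key and let each cheap box join one
shard against the full small table. A sketch (CRC32 keeps the
sharding deterministic across runs; names are made up):

    # Split the big table into N shards by hashing the join key;
    # each machine then processes one shard independently.
    import zlib

    def partition(path, key_col, n_nodes):
        outs = [open("shard_%d.txt" % i, "w") for i in range(n_nodes)]
        for line in open(path):
            key = line.rstrip("\n").split("|")[key_col]
            outs[zlib.crc32(key.encode()) % n_nodes].write(line)
        for out in outs:
            out.close()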
Subject: Re: Database Products Focused on Big and Fast
From: vellmont-ga on 19 Jul 2004 14:37 PDT
 
Have you looked at all into performance tuning PostgreSQL? There are
several parameters in the database that can be adjusted for higher
performance. A good start can be found at
http://www.postgresql.org/docs/aw_pgsql_book/hw_performance/
If you haven't already done so, you should do query optimization:
add indexes where appropriate and/or modify your queries. I would
tend to agree with the other comments that 64-bit AMD chips and gobs
of memory could increase your RDBMS performance substantially
without costing an arm and a leg.
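
A sketch of the check-then-index step with Python's psycopg2 driver
(any DB-API driver works the same way; table and column names are
made up):

    # See whether the planner does a sequential scan for a lookup,
    # and add an index if so.
    import psycopg2

    conn = psycopg2.connect("dbname=xref user=app")
    cur = conn.cursor()

    cur.execute("EXPLAIN ANALYZE "
                "SELECT * FROM big_table WHERE key1 = %s",
                ("ABC123",))
    for (line,) in cur.fetchall():
        print(line)                      # look for "Seq Scan" here

    cur.execute("CREATE INDEX big_table_key1_idx "
                "ON big_table (key1)")
    conn.commit()
    conn.close()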
Subject: Re: Database Products Focused on Big and Fast
From: lynnm-ga on 21 Jul 2004 17:51 PDT
 
Ants software (www.antssoftware.com) is fast but I don't know what its
size limits are.
Subject: Re: Database Products Focused on Big and Fast
From: mbkirk-ga on 31 Jul 2004 07:27 PDT
 
You really need to work on determining where you're spending all the
time in your current product.  Otherwise you're just shooting in the
dark.

Is the machine compute bound (is there any idle time)? Is it disk
bound, spending most of its time waiting for I/O? What OS? What
hardware? What are the revs? There's a very long list of stuff you
can look at.
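
On Linux, vmstat and iostat answer most of this; or sample it
yourself with a few lines of Python (a sketch, assuming a 2.6-series
kernel whose /proc/stat reports an iowait column):

    # Sample the aggregate CPU line in /proc/stat twice: high idle
    # means not compute bound, high iowait means waiting on disk.
    import time

    def cpu_times():
        parts = open("/proc/stat").readline().split()
        return [int(x) for x in parts[1:6]]  # user nice sys idle iowait

    before = cpu_times()
    time.sleep(5)
    after = cpu_times()
    deltas = [b - a for a, b in zip(before, after)]
    total = float(sum(deltas)) or 1.0
    names = ("user", "nice", "system", "idle", "iowait")
    for name, delta in zip(names, deltas):
        print("%-7s %5.1f%%" % (name, 100 * delta / total))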

Once you've done that, then...

First -- as vellmont suggested, tune the database and tune the
application. Many database performance problems are directly related
to application design and database tuning. Sometimes these are
really simple fixes -- for example, generating indices on the
primary search key of a table dramatically improves things, as does
fetching large amounts of data at once to reduce round trips to the
database (i.e. tweaking the application).
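
The round-trip point in DB-API terms (a driver-agnostic sketch; the
batch size is something to tune):

    # Pull rows in large batches, one network round trip per batch,
    # instead of one fetchone() call per row.
    def batched_rows(cursor, sql, batch=10000):
        cursor.execute(sql)
        while True:
            rows = cursor.fetchmany(batch)
            if not rows:
                break
            for row in rows:
                yield row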

Second thing -- you don't mention what your current environment is:
what sort of hardware are you running on, for example? Do you have
enough memory? Single processor or multi-processor? Which one? These
are things that are frequently easier to change than the database
itself. You can get order-of-magnitude improvements just from having
enough memory; multiple processors, faster backplanes, and faster
memory buses all help substantially.

If you're running ancient equipment (e.g. a Pentium or Pentium 2, or
an early Pentium 3) and you've already tuned the database and
application, you can get a substantial improvement out of a 3 GHz P4.
Something like a recent mid-range HP ProLiant DL380 costs far less
than $60K, has up to 5 RAID-organized drives (15K RPM) on a hardware
RAID controller, and a nice fast backplane. It can support 12 GB of
memory.

But beware -- if you don't optimize the application/database then
you'll waste any higher end hardware you buy.
Subject: Re: Database Products Focused on Big and Fast
From: eugene2182-ga on 05 Aug 2004 16:51 PDT
 
Microsoft SQL Server 2000 handles gigabytes of data very
effectively, is easy to use, and deploys rapidly. I recommend it for
your problem.
Subject: Re: Database Products Focused on Big and Fast
From: mbkirk-ga on 12 Aug 2004 16:56 PDT
 
But he doesn't appear to have quantified what his problem actually
is (other than "it's slow"). You can't solve this problem
effectively when you don't know what is actually causing it.
Subject: Re: Database Products Focused on Big and Fast
From: theo_briscoe-ga on 21 Aug 2004 13:56 PDT
 
What is the exact dataset you are working with?
Can you provide some example data somewhere, or an example on the
Internet?
And finally, what is the exact problem, or type of problem, you
would like to solve?

I have experience in operations research and software/database
development. This sounds like a very interesting problem that I
would like to help you solve.
