Subject: Database Products Focused on Big and Fast
Category: Computers
Asked by: jeffrey_b-ga
List Price: $100.00
Posted: 18 Jul 2004 06:49 PDT
Expires: 17 Aug 2004 06:49 PDT
Question ID: 375703
What are some alternative ways for a small business to process very large database files fast?

We need to manipulate static datasets (updated periodically, e.g. weekly/monthly) containing 30-100 million records. Records are approximately 300 bytes each, so the total dataset is 10GB-1TB. We use these databases to perform automated cross-reference lookups (analysis) against other, smaller databases. One 'job' could require matching a 250,000-record table against a 50-million-record table on 15 different (but predefined) keys. Speed is of critical importance: the example job would be allocated less than half an hour on a fast, dedicated PC.

Our research uncovered kx.com. This company describes our niche problem exactly and offers a database product solution. Trouble is, their product costs $60,000 and requires learning a new scripting language. We need to solve this problem on a $10K budget.

In the past, we've used FoxPro and PostgreSQL. FoxPro is lightning fast but very limited in database size (I think 1GB is the maximum dataset). PostgreSQL is inexpensive (open source) and allows for huge datasets (virtually unlimited), but it's really slow. (We believe that traditional RDBMS systems, built with real-time transactional processing applications in mind, have too much overhead to be effective for our analysis application. We don't require all the transactional integrity, roll-back, etc. functionality of a traditional RDBMS.)

Is there any case study research available that discusses how others have solved this problem? We'd like to receive a short list of 3 to 5 products worthy of detailed evaluation.
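To make the workload concrete, one match pass of such a job is essentially a hash join done outside the database: build an in-memory lookup from the small table, then stream the large file once. A minimal Python sketch follows; the file names, the fixed-width record layout, and the key offsets are all hypothetical placeholders, not anything from the actual dataset:

```python
# Minimal hash-join sketch: probe a large fixed-width file against a
# small table held entirely in memory, with no RDBMS involved.
# File names, record layout, and key offsets are hypothetical.

RECLEN = 300                  # fixed-width record size, per the question
KEY_SLICE = slice(0, 20)      # hypothetical offset of one predefined key

def load_small_table(path):
    """Build an in-memory dict keyed on the predefined match key."""
    table = {}
    with open(path, "rb") as f:
        while True:
            rec = f.read(RECLEN)
            if len(rec) < RECLEN:
                break
            table[rec[KEY_SLICE]] = rec
    return table

def match_pass(big_path, small):
    """Stream the large file once, yielding (big_rec, small_rec) matches."""
    with open(big_path, "rb") as f:
        while True:
            rec = f.read(RECLEN)
            if len(rec) < RECLEN:
                break
            hit = small.get(rec[KEY_SLICE])
            if hit is not None:
                yield rec, hit

if __name__ == "__main__":
    small = load_small_table("lookup.dat")   # ~250K records fit easily in RAM
    for big_rec, small_rec in match_pass("master.dat", small):
        pass  # write matches out, tally counts, etc.
```

Since all 15 keys are predefined, a single scan of the big file could probe 15 such dicts at once, so each job would read the large file from disk only once.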
There is no answer at this time.
Subject: Re: Database Products Focused on Big and Fast
From: curious7-ga on 18 Jul 2004 19:41 PDT
Lots of RAM, Opteron/Itanium, 64-bit Linux, MySQL w/MyISAM. Max table size is 8 million TB, so the OS file size is the real limit. Linux does 2TB files, more with special filesystems. Might be worth a look--if you haven't already. Current info at the bottom of http://dev.mysql.com/doc/mysql/en/Table_size.html

Get expert input on a cruise(!): http://www.mysql.com/news-and-events/events/swell-2004.html
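One footnote on the MyISAM size limit: older MySQL builds default MyISAM data files to a 4GB cap, which the MAX_ROWS/AVG_ROW_LENGTH table options raise by widening the internal row pointers. A minimal sketch, assuming the MySQLdb driver and a hypothetical connection and column layout:

```python
# Sketch: create a MyISAM table sized for ~100M fixed-width records.
# Connection details, database, and column layout are hypothetical.
import MySQLdb

conn = MySQLdb.connect(host="localhost", user="analyst", db="crossref")
cur = conn.cursor()

# MAX_ROWS / AVG_ROW_LENGTH tell MyISAM to use larger row pointers,
# lifting the old default 4GB data-file limit.
cur.execute("""
    CREATE TABLE master (
        key1    CHAR(20)  NOT NULL,
        payload CHAR(280) NOT NULL,
        INDEX (key1)
    ) ENGINE=MyISAM
      MAX_ROWS=100000000
      AVG_ROW_LENGTH=300
""")
conn.close()
```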
Subject: Re: Database Products Focused on Big and Fast
From: scubapup-ga on 19 Jul 2004 11:48 PDT
If you could get as much memory as you can afford, and assuming the data would fit, then I'd suggest using main-memory databases. But of course even the usual commercial databases will try to load and access their data in memory as much as possible. You could also program your own efficient hashed-table approach to solving this problem (a sketch follows below). Your problem also looks like a good candidate for a computational grid solution. I wonder if a distributed/cluster-type database like the ones from MS SQL Server, Oracle, or MySQL could help you, especially with such a limited budget.
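To make the hashed-table and grid ideas concrete: because the keys are predefined, the small table can be sharded across worker processes, each holding one shard as an in-memory hash lookup. A rough sketch, reusing the hypothetical 300-byte layout from the sketch above (file names and offsets are still placeholders):

```python
# Sketch: shard the small table across workers; each worker holds one
# shard as an in-memory hashed table and streams the big file against
# it. Record layout and file names are hypothetical.
import multiprocessing

RECLEN = 300
KEY_SLICE = slice(0, 20)
N_WORKERS = 4

def worker(part_id):
    with open("lookup.dat", "rb") as f:
        data = f.read()
    recs = [data[i:i + RECLEN] for i in range(0, len(data), RECLEN)]
    # This worker's shard of the small table, as a hash lookup.
    shard = {r[KEY_SLICE]: r for r in recs[part_id::N_WORKERS]}
    hits = 0
    with open("master.dat", "rb") as f:
        while True:
            rec = f.read(RECLEN)
            if len(rec) < RECLEN:
                break
            if rec[KEY_SLICE] in shard:
                hits += 1
    return hits

if __name__ == "__main__":
    with multiprocessing.Pool(N_WORKERS) as pool:
        counts = pool.map(worker, range(N_WORKERS))
    print("total matches:", sum(counts))
```

Each worker streams the full big file, so this pays off once the job is CPU bound rather than disk bound; the same partitioning extends naturally from one SMP box to a small cluster of cheap PCs, each scanning its own slice of the file.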
Subject: Re: Database Products Focused on Big and Fast
From: vellmont-ga on 19 Jul 2004 14:37 PDT
Have you looked at all into performance-tuning PostgreSQL? There are several parameters in the database that can be adjusted for higher performance. A good start can be found at http://www.postgresql.org/docs/aw_pgsql_book/hw_performance/

If you haven't already done so, you should do query optimization and add indexes where appropriate, and/or modify your queries. I would tend to agree with the other comments that 64-bit AMD chips and gobs of memory could increase your RDBMS performance substantially without costing an arm and a leg.
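For reference, the usual first steps on a slow PostgreSQL join are: index the match keys, refresh planner statistics, and check the plan with EXPLAIN ANALYZE. A minimal sketch, assuming the psycopg2 driver and a reasonably modern server; table and column names are hypothetical:

```python
# Sketch: basic index-and-inspect pass for a slow PostgreSQL join.
# Table/column names and connection string are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=crossref")
conn.autocommit = True
cur = conn.cursor()

# Give sorts and hash joins more per-session memory
# (work_mem on modern versions; sort_mem on the 7.x series).
cur.execute("SET work_mem = '256MB'")

# Index each predefined match key on the big table (one shown here).
cur.execute("CREATE INDEX idx_master_key1 ON master (key1)")

# Refresh the statistics the planner uses to choose join methods.
cur.execute("ANALYZE master")

# EXPLAIN ANALYZE runs the query and reports where the time goes.
cur.execute("""
    EXPLAIN ANALYZE
    SELECT m.*
    FROM lookup l JOIN master m ON m.key1 = l.key1
""")
for row in cur.fetchall():
    print(row[0])
conn.close()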
Subject: Re: Database Products Focused on Big and Fast
From: lynnm-ga on 21 Jul 2004 17:51 PDT
Ants software (www.antssoftware.com) is fast but I don't know what its size limits are.
Subject: Re: Database Products Focused on Big and Fast
From: mbkirk-ga on 31 Jul 2004 07:27 PDT
You really need to work on determining where you're spending all the time in your current product; otherwise you're just shooting in the dark. Is the machine compute bound (is there idle time)? Is it disk bound? Is it spending most of its time waiting for I/O? What OS? What hardware? What are the revs? There's a very long list of stuff you can look at (a crude check is sketched below). Once you've done that, then...

First -- as vellmont suggested, tune the database, and tune the application. Many database performance problems are directly related to application design and database tuning. Sometimes these are really simple fixes -- for example, generating indices on the primary search key of a table dramatically improves things, as does fetching large amounts of data at once to reduce round trips to the database (i.e. tweaking the application). These changes alone can improve performance substantially.

Second -- you don't mention what your current environment is: what sort of hardware are you running on, for example? Do you have enough memory? Single processor or multi-processor? Which one? These are things that are frequently easier to change than the database itself. You can get order-of-magnitude improvements just from having enough memory. Multiple processors, faster backplanes, and faster memory buses all help substantially.

If you're running ancient equipment (e.g. a Pentium or Pentium 2, or an early Pentium 3) and you've already tuned the database and application, you can get a substantial improvement out of a 3 GHz P4 -- something like a recent mid-range HP ProLiant DL-380 costs far less than $60K, has up to 5 RAID-organized drives (15K RPM) on a hardware RAID controller, and a nice fast backplane. It can support 12 GB of memory.

But beware -- if you don't optimize the application/database first, you'll waste any higher-end hardware you buy.
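One crude way to answer the compute-bound vs. I/O-bound question without OS-specific tools is to compare CPU time against wall-clock time around the job. A minimal sketch; run_job here is a placeholder for whatever kicks off the real match run:

```python
# Crude bound check: if CPU time is close to wall time the job is
# compute bound; if wall time is much larger, it is mostly waiting on
# I/O (or a separate database server process is doing the work, whose
# CPU time is not counted here).
import os
import time

def profile(run_job):
    t0_wall = time.time()
    t0_cpu = sum(os.times()[:2])   # user + system CPU of this process
    run_job()                      # placeholder for the real workload
    wall = time.time() - t0_wall
    cpu = sum(os.times()[:2]) - t0_cpu
    print("wall %.1fs  cpu %.1fs  (%.0f%% CPU)" % (wall, cpu, 100 * cpu / wall))

if __name__ == "__main__":
    profile(lambda: sum(i * i for i in range(10**7)))  # dummy CPU-bound job
```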
Subject: Re: Database Products Focused on Big and Fast
From: eugene2182-ga on 05 Aug 2004 16:51 PDT
Microsoft SQL Server 2000 handles gigabytes of data very effectively, is easy to use, and deploys rapidly. I recommend it for your problem.
Subject: Re: Database Products Focused on Big and Fast
From: mbkirk-ga on 12 Aug 2004 16:56 PDT
But he doesn't appear to have quantified what his problem actually is (other than "it's slow"). You can't solve this problem effectively when you don't know what is actually causing it.
Subject: Re: Database Products Focused on Big and Fast
From: theo_briscoe-ga on 21 Aug 2004 13:56 PDT
What is the exact dataset you are working with? Can you provide some example data somewhere, or an example on the Internet? And finally, what is the exact problem, or type of problem, you would like to solve? I have experience in Operations Research and software/database development. This sounds like a very interesting problem that I would like to help you solve.