Google Answers: Millions of Queries / fastest/best database and/or language to implement this.

View Question

Q: Millions of Queries / fastest/best database and/or language to implement this. ( Answered 5 out of 5 stars

, 0 Comments )

Question

Subject: Millions of Queries / fastest/best database and/or language to implement this.
Category: Computers > Programming
Asked by: bnc_user-ga
List Price: $21.00

Posted: 31 Jul 2003 19:57 PDT
Expires: 30 Aug 2003 19:57 PDT
Question ID: 237612

I need to have an application developped, that is calculatinog the
best solution to a given problem. These application will continuesly
execute millions of database querries in
different tables, 1 table contains about 5 to 10'000 records, one
contains 12 Million records (with 3 data fields), and then there are
several tables with just a few records and fields.

The application should be running on one computer with Windows XP and
256+ MB. The application will also contains a GIS (geographical
information system) part (GIS component needed). I also dont want to
pay any licence fees for using the database. In the future I might
consider also a mutli agent approach to speed up the calculations.

My questions now are:

-What programming language or database would be the fastest to have my
application implemented in?

-Can this be done with C++ or Delphi (without a database such as
infobase) just using the programming language's internal
database/record commands? (if so how does this speed compare to using
a database - or why would you recommend still using a database?)

-What is your suggestion to implement this in (if you need more
information just ask me)

I need to know this, because before finding possible companies and
people who can create the application I want to know what these people
or companies need to have knowledge in.

Oh and if you know of some affordable and capable programmers or
companies who could create my software, feel free to list some
websites (this won't count towards my evaluation of the question
though - so you can also just leave it out  :-)

Oh and I prefer a pre-digested answer - meaning your own text, comment
and
opinion after looking through what you found on the net, rather then a
bunch of links, where I have to get the information myself.

Thank you.

Request for Question Clarification by mathtalk-ga on 01 Aug 2003 09:22 PDT

Hi, bnc_user-ga:

The design decision of whether to use a DBMS (database management
system) like Oracle or SQL Server in your application should take into
account some factors besides the information given in your request.

Sure, it would be nice to save on licensing fees, but you hint at one
reason that it might save money in the long run to "buy" database
software when saying that "in the future" you might consider a
multi-agent approach to speed up the calculations.  One of the most
critical purposes of DBMS software is handling the contention among
multiple users trying to access the database.

From the brief description given ("calculating the best solution to a
given problem", "contains a GIS component"), one might speculate that
the application involves a logistics optimization.  In any case an
important "dimension" of your design is whether the database is being
updated in a time-critical way.

A banking application, for example, must serialize "transactions"
against an account to assure that accounts are properly credited and
debited, to prevent overdrawing of deposits.  Your application doesn't
sound particularly "transactional", so perhaps the industrial strength
protections of multi-user transactional DBMS software is not needed.

For sheer speed there is nothing faster than maintaining all the
necessary data in memory, which can be done in C++ or another
programming language.  This would be especially attractive if all the
queries that need to be executed are of a very limited variety.  One
of the things that you "buy" in a relational database engine is the
ability to execute general SQL queries.

Let's make up a hypothetical application, and see how these ideas
might apply to your design decisions.  You have a trucking business,
and you want to schedule deliveries in a large region of the country.

Initially you will be the only user of your software, but if the
business is successful, you will need to "share" the scheduling
function across several offices in remote locations.  Hence it only
makes sense to design the application from the beginning for multiple
users, as the costs of changing such a fundamental design
characteristic will be "big" in relationship to the cost of a total
implementation project.

Some parts of the database will change slowly over time, e.g. the GIS
component, but will nonetheless require maintenance and updates on a
regular basis (perhaps weekly or monthly, depending on the criticality
of the routing information).

Other parts of the database (location of trucks and drivers) will
change more rapidly and (at least in part) as a result of the system
itself functions.  However these data changes take place on a
time-scale much slower than the operation of the optimization engine
itself (assuming that needs to produce a result in seconds rather than
hours), so I don't consider this as a "time-critical transactional"
requirement.

What would form an element of transactional design would be submission
and acceptance of bids to customers.  If you need the application to
track these business elements (and not merely to answer questions
about optimal routing), then the database management function becomes
more critical (since you don't want to book the same job twice or
worse, lose track of a customer commitment entirely).

The "modern" approach to systems architecture of "fastest possible"
applications would draw a distinction between whether the application
will (for now and for always) be executed on a single "local"
computer, or whether the application will become "distributed".  The
database becomes one of several "tiers" within the distributed
architecture, and even though development can take place on one
machine, the "component" structure allows for the ultimate deployment
of the application to use multiple machines.

Therefore the best advice and design decisions will require some
additional information about what functionality is required in your
application.

regards, mathtalk-ga

Clarification of Question by bnc_user-ga on 01 Aug 2003 10:34 PDT

thanks for your clarification request, it showed clearly that I need
to be more specific.

Here is more information to your questions:

1) I am consideirng a mutli agent approach in the future not because
the result is needed at different locations or because parameters are
entered at different locations, but because I want to speed up the
calculations. All Parameters, tables and result etc. will always be
coming from one location and go to this one location (the people
operating the program)... so the application goes more towards a
SETI-type application, or calculating Pi, or fancy equations....

2) The problem of record locking etc... This is not an issue, as
pretty much all tables are read-only. There will be a table with jobs
that need to be calculated, an agent goes and gets a job, gets and
downloads the tables it needs (if it hasnt the tables already),
calculates the result, saves the result in the job-table next to the
job description and gets the next job. So two agents will never alter
the same record. --> so here it is the same like in the SETI project.

3)All tables will have to be stored in the memory.

Your clarification explained that the database commands in a regular
programming language, when all tables are being kept in the memory,
will be the fastest to implement my application - is this true?

Answer

Subject: Re: Millions of Queries / fastest/best database and/or language to implement this.
Answered By: mathtalk-ga on 03 Aug 2003 21:23 PDT
Rated: 5 out of 5 stars

Hi, bnc_user-ga:

It seems to me that your full database can easily be held "in memory".
 Assuming that the table with 12 million records (and 3 fields per
record) is mostly numeric or other "compact" data, it appears that the
total storage required is on the order of 100 Megabytes, well within
the capacity of a server-class desktop computer (or even a current
laptop).

It probably makes sense to manage the data with a database, esp. since
you already have a clear notion of "tables" that comprise it.  But all
the read-only data could be loaded, as the application starts, into
arrays of specific C++ structures that are optimized for the
queries/algorithms to be executed.

Holding the "static" data in a database allows for backups and
maintenance to be carried out with standard tools.

Since your application sounds like it is very computational, C++ is a
natural implementation language.  Of course if the "calculation" is in
fact highly combinatorial in nature, there might be some argument for
using another language, but generally C++ will give the best
performance (at a cost of somewhat more difficult code to maintain
than for other "high level" languages).

Probably you will want to design the application for
"multi-threading".  I think this is what you are getting at with your
notion of multi-agent design.  A single computer with multiple CPUs is
no longer an exotic machine, and the read-only data can be shared
easily among multiple "worker" threads, each grabbing a "job" from the
jobs-table and running with it.

Since the data is being retrieved from memory for the immediate
application, it probably matters little what database software is used
to "retain" the data while the application is not running.  I'd feel
comfortable using MySQL or perhaps even trying GNU SQL Server, both
open source database management software that would be free of
licensing costs.

A database management engine tries to "cache" the most useful
information in memory, in order to optimize its query performance.  In
your case the entire dataset should fit into memory, so the critical
design aspects will be things like creating indexes that make it quick
to locate the information needed.

I feel comfortable with the "hardware" design of putting the
application on a single box; it has the flavor of a mostly "batch"
operation.  If you want some specific ideas about sourcing the GIS
data needed for the application, I'll need to hear something further
about the kind of data required.

The operating system could logically be either a server version of
Windows, or a server version of a GNU licensed operating system, e.g.
Linux.  Here's the "mother" site of many open source projects:

[SourceForge]
http://sourceforge.net/

Consensus seems to be that Linux is much "lighter" in its operating
system overhead and demands on computer memory than is Windows, so
perhaps there's a performance edge there for your highly computational
application.

There are a variety of ways to go about locating programmers or
software development firms on the Internet.  I cannot recommend any
from first-hand experience but I'd be happy to share a few sites that
I've come across:

[eLance]
http://www.elance.com/

For a small business this site allows you to post task/project
descriptions and receive bids on proposed work.  My impression is that
they place funds "in escrow" which are released on satisfactory
completion of milestones, providing a certain amount of protection for
both parties.  Ratings/feedback are kept for both "providers" based on
evaluations completed by "consumers".

[FreelancersDirect.com]
http://www.freelancersdirect.com/

A somewhat similar if less well known outfit.  

[NuAspect]
http://www.nuaspect.com/

Seems to have a slightly different "granularity" in that a project
team may be assembled using a coterie of providers, rather than as a
result of accepting a single provider's bid.

[HotDispatch]
http://www.hotdispatch.com/officefronts

Another place to shop your projects, probably distinguished by its
emphasis on business to business commitments rather than on pick-up
players working a solo angle.

Please let me know if some part of my answer could benefit from
additional clarification.

regards, mathtalk-ga

bnc_user-ga rated this answer: 5 out of 5 stars

Comments

There are no comments at this time.

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.

Search Google Answers for

Google Home - Answers FAQ - Terms of Service - Privacy Policy