Hello,
I'm working on an affiliate marketing project, written in PHP 4 and
driven by a PostgreSQL database. In the past I have worked only with
MS Access (for a really short while) and, for a few years, with MySQL.
Because I wanted to take full advantage of PostgreSQL's features, I
decided to try its OOP feature (table inheritance) for the tracking
tables, which contain all impressions, clicks, leads and sales. The
structure of the transaction tables looks like this (a simplified
sketch of the table definitions follows the list):
- Transaction (14202042 rows)
- - CampaignTransaction (14202042 rows)
- - - CampaignTransactionImpression (13395944 rows)
- - - CampaignTransactionClick (798597 rows)
- - - CampaignTransactionLead (5204 rows)
- - - CampaignTransactionSale (2297 rows)
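Roughly, the definitions look like this. I've simplified heavily (the
real tables have many more columns); only "ID", "campaignID" and
"transactionStatusID" are actually used in the queries below, the rest
is just there to show the inheritance:

CREATE TABLE "Transaction" (
    "ID"    serial PRIMARY KEY,
    "date"  timestamp NOT NULL DEFAULT now()
);

CREATE TABLE "CampaignTransaction" (
    "campaignID"          integer NOT NULL,
    "transactionStatusID" integer NOT NULL
) INHERITS ("Transaction");

-- one child table per transaction type
CREATE TABLE "CampaignTransactionImpression" () INHERITS ("CampaignTransaction");
CREATE TABLE "CampaignTransactionClick"      () INHERITS ("CampaignTransaction");
CREATE TABLE "CampaignTransactionLead"       () INHERITS ("CampaignTransaction");
CREATE TABLE "CampaignTransactionSale"       () INHERITS ("CampaignTransaction");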
The problem occurs when I try to use aggregate functions to count the
number of rows in one of these tables. I wrote a small benchmark tool
to measure the average execution time of the following query:
SELECT COUNT("ID") FROM "CampaignTransactionClick"
The output from the benchmark script (averaged over 25 runs) is a
horrible 9.612 seconds! I tried tweaking the PostgreSQL configuration
file and creating/removing indexes (although that only helps with a
WHERE clause), but I just can't figure out how to solve this problem.
The server I'm running the query on is a Pentium 4 2.66 GHz machine
with just 256 MB of memory and PostgreSQL 8. I also tried running it
on a dual Xeon 2 GHz with 4 GB of memory and PostgreSQL 7.4. The
result on that machine was 1.095 seconds, which is still a lot. I'm
afraid to even try it on the "CampaignTransactionImpression" table
:-(.
Another simple query, with fully indexed conditions:
SELECT COUNT("ID") FROM "CampaignTransactionClick" WHERE
"transactionStatusID" = 2 AND "campaignID" = 62
Takes an average of 12.723 seconds (1.272 on the dual Xeon server).
The count it returns is 66325, by the way.
These execution times are terrible, especially compared to what I was
used to with MySQL. I have already tried increasing "shared_buffers",
"work_mem", "max_fsm_pages", "max_fsm_relations" etc. in the config.
The only way I can think of to solve this is running a cronjob every
hour or so to update the counts, but that's a terrible solution too: a
lot of data would have to be recalculated even though everything needs
to be (near) real-time, and the counts have to be generated per
affiliate as well. With 10,000 affiliate sites, that means an enormous
number of (heavy) queries. :(
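To show what I mean by that workaround, the cronjob would maintain a
summary table roughly like this (the "ClickCount" table is completely
hypothetical, nothing of it exists yet, and a real version would also
need the per-affiliate breakdown that I left out here):

-- hypothetical summary table, refreshed by the hourly cronjob
CREATE TABLE "ClickCount" (
    "campaignID"          integer NOT NULL,
    "transactionStatusID" integer NOT NULL,
    "clicks"              integer NOT NULL,
    PRIMARY KEY ("campaignID", "transactionStatusID")
);

-- what the cronjob would run every hour
BEGIN;
DELETE FROM "ClickCount";
INSERT INTO "ClickCount" ("campaignID", "transactionStatusID", "clicks")
    SELECT "campaignID", "transactionStatusID", COUNT("ID")
    FROM "CampaignTransactionClick"
    GROUP BY "campaignID", "transactionStatusID";
COMMIT;

A lookup would then be a cheap primary-key select on "ClickCount"
instead of a COUNT over ~800k rows, but the numbers would always be up
to an hour stale.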
Is there another way to solve this while still keeping things (near)
real-time?
Thank you so much!
Tim
PS: here is the execution plan for the second query, in case it helps:
Aggregate (cost=31364.85..31364.85 rows=1 width=4) (actual
time=1247.717..1247.718 rows=1 loops=1)
-> Seq Scan on "CampaignTransactionClick" (cost=0.00..31190.72
rows=69652 width=4) (actual time=0.085..1177.993 rows=68907 loops=1)
Filter: (("transactionStatusID" = 2) AND ("campaignID" = 62))
Total runtime: 1248.098 ms
And every ID/date column is indexed; specifically for this query, I
tried this index:
CREATE INDEX "idx_CTC_transactionStatusID_campaignID" ON
"CampaignTransactionClick" USING btree ("transactionStatusID",
"campaignID")
I also tried disabling sequential scans to force an index scan. This
is the new execution plan:
Aggregate (cost=114882.80..114882.80 rows=1 width=4)
-> Index Scan using "idx_CTC_transactionStatusID_campaignID" on
"CampaignTransactionClick" (cost=0.00..114737.00 rows=58319 width=4)
Index Cond: (("transactionStatusID" = 2) AND ("campaignID" = 62))
The query now takes 7.985 seconds on the Pentium 4 server instead of
12 seconds. A slight improvement, but not nearly enough :(.