Google Answers Logo
View Question
 
Q: Database Query ( Answered 4 out of 5 stars,   3 Comments )
Question  
Subject: Database Query
Category: Computers > Programming
Asked by: jimvet-ga
List Price: $25.00
Posted: 18 Dec 2004 09:10 PST
Expires: 17 Jan 2005 09:10 PST
Question ID: 444306
I know enough about Access to be dangerous but know nothing about SQL.
 I do know that I can enter a SQL query into Access and maybe that
would work for me here.

Anyway, I need to know how to create a query in MS Access (either as
SQL or Access UI query) that will do the following:

Within a single table, I need to identify records that have the same
field values as other records for some fields, but have a different
field value than other records for another field in the table.

For example: Find records in a table that have the same Address and
Zipcode but have a differnt UniqueID.

Thanks!

Clarification of Question by jimvet-ga on 18 Dec 2004 16:10 PST
This query must not only identify records with Address and Zip pairs
that appear in more than one row.  Additionally, a third field in each
record must NOT be the same.

So, I want to find records that have the same Address and Zip but
different BusinessName.

How do I do this in Access or in SQL that I can use in Access?

Thank you!
Answer  
Subject: Re: Database Query
Answered By: mathtalk-ga on 19 Dec 2004 08:15 PST
Rated:4 out of 5 stars
 
Hi, jimvet-ga:

What you need is an "EXISTS" clause, a feature of the SQL language
that doesn't fit nicely into the click-and-point methods for building
a query in Access, but which (as you suspected) is easily specifed in
the SQL View of a query.

Let's suppose we have an Access (or other SQL database) table called
Company with three fields:

BusinessName       character strings up to length  50
Address            character strings up to length 200
Zip                character strings up to length  10

The datatypes here don't matter much; I'm specifying them just for the
sake of being able to build an actual Access table to illustrate my
proposed query.  For simplicity we will assume all three fields are
required to have a value; comparing rows with missing values can be
done but needs a careful definition (do rows which are missing a zip
and address "match" or not?).

You mentioned a "unique identifier", and potentially BusinessName
might be such a field (primary key) for the table, but the query
outlined doesn't depend on this fact.  However having a primary key
guarantees that there are no duplicate rows (two rows exactly alike in
all fields).

If I understand the Question, you want all the rows in the table where
the Address and Zip are the same as at least one other row, ie. where
another row exists with Address and Zip the same as this one but a
different BusinessName.

SELECT * from Company
WHERE EXISTS ( SELECT * from Company C
               WHERE  C.BusinessName <> Company.BusinessName
               AND    C.Address       = Company.Address
               AND    C.Zip           = Company.Zip
             );

My formatting of this query is for clarity.  Access will actually
format it a bit differently, but you can use the syntax above as
input.  What you do is start to create a query in design mode, where
you go as far as selecting the table "Company" to use.  After closing
the "select table" dialog box, switch to SQL View of the query (the
lower left drop list on the menu gives Design, DataSheet, and SQL View
as options).  You'll see a fragment like:

SELECT
FROM Company;

Paste in (or retype) the SQL query above.  When you save the query,
Access will reword it to something like this, but it's equivalent:

SELECT *
FROM Company
WHERE (((Exists (SELECT * from Company C
               WHERE  C.BusinessName <>  Company.BusinessName
               AND    C.Address       = Company.Address
               AND    C.Zip           = Company.Zip
             ))<>False));

The "C" which appears next to the table name Company in the EXISTS
clause is called an "alias".  We need a new name to refer to what is
essentially a "second copy" of the table for the purpose of doing the
inner comparisons.

Let me know if I can help by Clarification of this proposed solution. 
The EXISTS clause is very useful in many tough SQL problems.

regards, mathtalk-ga

Request for Answer Clarification by jimvet-ga on 20 Dec 2004 10:54 PST
if my table has 51000 records in it with an expected 1-3k result
records, should a late model PC have its CPU maxed out for more than
15 minutes running this query?

my table is actually called SupplierMaster and the field I need to
have NOT the same between fields is actually called PhysicalHotelID. 
I chose simpler field names to illustrate my question more clearly.

Here is the SQL code in my access query.

SELECT *
FROM SupplierMaster
WHERE (((Exists (SELECT * from SupplierMaster C
               WHERE  C.PhysicalHotelID <> SupplierMaster.PhysicalHotelID
               AND    C.Address       = SupplierMaster.Address
               AND    C.Zip           = SupplierMaster.Zip
             ))<>False));

When I run the query, a datasheet view of the results show up in about
60 seconds, but then the PC processor stays pegged at 100% and I can't
seem to do anything with results window.  Is this kind of query very
complex for the computer to execute?  Or would you suspect something
else is wrong?

Thank you.

Clarification of Answer by mathtalk-ga on 20 Dec 2004 11:25 PST
It seems reasonable if there are no indexes on the table.  Performance
requires an index for SupplierMaster on the combination of Address and
Zip.  Let me figure out the best way to do it, and I'll post further
instructions.

What happens is that without an index, Access has to scan the entire
table for each row that does _not_ have a match, just to be convinced
none exists.  If a match exists, it can stop as soon as one is found
(for that row), but finding the matches also goes more quickly with an
index.

Think of trying to find a number in the phonebook if it were not alphabetized!

regards, mathtalk-ga

Request for Answer Clarification by jimvet-ga on 20 Dec 2004 11:28 PST
I did not have the fields indexed in my table!  I've indexed the
fields and now the queries run in less than one second.  I'll let you
know if I need any further clarification.  I'm looking at results now.

thank you so much.

Clarification of Answer by mathtalk-ga on 20 Dec 2004 11:56 PST
Great!  Without the index the complexity is roughly 51000 * 51000
table lookups, a quadratic order of work, but with the index in place
it should behave pretty close to a linear search (actually linear
times a logarithmic factor).

regards, mathtalk-ga

Request for Answer Clarification by jimvet-ga on 21 Dec 2004 09:23 PST
Now that I have the select query complete, I need to do an update
process on the results of the Select query.  I was going to submit
this as a new question but wanted to keep all of the notes together. 
Let me know if this way of extending the question is ok.  I'll tip you
another $20 for this answer or I could submit as a separate question
if required.

I now need to update some of the records in the result set of the
Select Query we created.  The criteria are easy for me to describe but
I don't know how to translate it into SQL.

Here is a sample of the Select Query results:

HotelID  ChangedFromID  Verified  Address       Zip
9876                    No        123 Main St   45844
8765                    No        123 Main St   45844
1234                    Yes       123 Main St   45844

The criteria for records to be updated is that within any set of
"matching records" - (i.e. those with the same Address and Zip), take
the HotelID from the record with field Verified = "Yes" and use it to
update the HotelID of the other matching records.  AND for each
updated record, update the ChangedFromID field with the value of the
old HotelID before the update AND update the Verified field to "Yes"

I'd like an Update query to create the following result update result:

HotelID  ChangedFromID  Verified  Address       Zip
1234     9876           Yes       123 Main St   45844
1234     8765           Yes       123 Main St   45844
1234                    Yes       123 Main St   45844

Please let me know if this is clear.

Clarification of Answer by mathtalk-ga on 21 Dec 2004 10:07 PST
Hi, jimvet-ga:

Unfortunately having rated this Question foreclosed the opportunity to
add a tip for formulating the update query, as you've proposed.

But let me respond to your request by raising an issue about how that
update is supposed to work.  I suspect you may have situations in
which there are multiple rows, sharing an Address and Zip, which are
Verified = Yes.  Certainly you expect this to be the case after you do
the update.  But could there be some ambiguity in such Verified rows
having different HotelID values?  If there is an ambiguity, should the
update proceed by using one or the other HotelID values (eg. the
smaller one, just to have a definite rule)?  Or should the update be
skipped (for those rows) so that the circumstances can be manually
reviewed?

As a personal favor, if you feel that one of my Answers could have
been better, I'd appreciate knowing how to improve it for future
reference.

regards, mathtalk-ga
jimvet-ga rated this answer:4 out of 5 stars
Clear and concise answer with good examples.  Thank you very much! 
And, I have a follow on question that I'll submit now.

Comments  
Subject: Re: Database Query
From: vladimir-ga on 18 Dec 2004 11:46 PST
 
SELECT ADDRESS, ZIPCODE FROM TABLE GROUP BY ADDRESS, ZIPCODE HAVING COUNT(*) > 1

This query should give you the address/zipcode pairs that appear in
more than one row in the database.

Not sure if this syntax is OK in Access. Hope this helps.

Vladimir
Subject: Re: Database Query
From: jimvet-ga on 18 Dec 2004 15:24 PST
 
Vladimir, your comment does not answer my question because I not only
need to find records that have the same Address and Zip, but a further
criterion is that a third field is NOT the same for the records with
the same Address and Zip.   Thanks!
Subject: Re: Database Query
From: vladimir-ga on 18 Dec 2004 18:58 PST
 
How about something like this:

SELECT DISTINCT t1.address, t1.zipcode, t1.businessname FROM yourtable
t1, yourtable t2 WHERE t1.address = t2.address AND t1.zipcode =
t2.zipcode AND NOT t1.businessname = t2.businessname

Vladimir

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy