Hi, jimvet-ga:
What you need is an "EXISTS" clause, a feature of the SQL language
that doesn't fit nicely into the click-and-point methods for building
a query in Access, but which (as you suspected) is easily specifed in
the SQL View of a query.
Let's suppose we have an Access (or other SQL database) table called
Company with three fields:
BusinessName character strings up to length 50
Address character strings up to length 200
Zip character strings up to length 10
The datatypes here don't matter much; I'm specifying them just for the
sake of being able to build an actual Access table to illustrate my
proposed query. For simplicity we will assume all three fields are
required to have a value; comparing rows with missing values can be
done but needs a careful definition (do rows which are missing a zip
and address "match" or not?).
You mentioned a "unique identifier", and potentially BusinessName
might be such a field (primary key) for the table, but the query
outlined doesn't depend on this fact. However having a primary key
guarantees that there are no duplicate rows (two rows exactly alike in
all fields).
If I understand the Question, you want all the rows in the table where
the Address and Zip are the same as at least one other row, ie. where
another row exists with Address and Zip the same as this one but a
different BusinessName.
SELECT * from Company
WHERE EXISTS ( SELECT * from Company C
WHERE C.BusinessName <> Company.BusinessName
AND C.Address = Company.Address
AND C.Zip = Company.Zip
);
My formatting of this query is for clarity. Access will actually
format it a bit differently, but you can use the syntax above as
input. What you do is start to create a query in design mode, where
you go as far as selecting the table "Company" to use. After closing
the "select table" dialog box, switch to SQL View of the query (the
lower left drop list on the menu gives Design, DataSheet, and SQL View
as options). You'll see a fragment like:
SELECT
FROM Company;
Paste in (or retype) the SQL query above. When you save the query,
Access will reword it to something like this, but it's equivalent:
SELECT *
FROM Company
WHERE (((Exists (SELECT * from Company C
WHERE C.BusinessName <> Company.BusinessName
AND C.Address = Company.Address
AND C.Zip = Company.Zip
))<>False));
The "C" which appears next to the table name Company in the EXISTS
clause is called an "alias". We need a new name to refer to what is
essentially a "second copy" of the table for the purpose of doing the
inner comparisons.
Let me know if I can help by Clarification of this proposed solution.
The EXISTS clause is very useful in many tough SQL problems.
regards, mathtalk-ga |
Request for Answer Clarification by
jimvet-ga
on
20 Dec 2004 10:54 PST
if my table has 51000 records in it with an expected 1-3k result
records, should a late model PC have its CPU maxed out for more than
15 minutes running this query?
my table is actually called SupplierMaster and the field I need to
have NOT the same between fields is actually called PhysicalHotelID.
I chose simpler field names to illustrate my question more clearly.
Here is the SQL code in my access query.
SELECT *
FROM SupplierMaster
WHERE (((Exists (SELECT * from SupplierMaster C
WHERE C.PhysicalHotelID <> SupplierMaster.PhysicalHotelID
AND C.Address = SupplierMaster.Address
AND C.Zip = SupplierMaster.Zip
))<>False));
When I run the query, a datasheet view of the results show up in about
60 seconds, but then the PC processor stays pegged at 100% and I can't
seem to do anything with results window. Is this kind of query very
complex for the computer to execute? Or would you suspect something
else is wrong?
Thank you.
|
Clarification of Answer by
mathtalk-ga
on
20 Dec 2004 11:25 PST
It seems reasonable if there are no indexes on the table. Performance
requires an index for SupplierMaster on the combination of Address and
Zip. Let me figure out the best way to do it, and I'll post further
instructions.
What happens is that without an index, Access has to scan the entire
table for each row that does _not_ have a match, just to be convinced
none exists. If a match exists, it can stop as soon as one is found
(for that row), but finding the matches also goes more quickly with an
index.
Think of trying to find a number in the phonebook if it were not alphabetized!
regards, mathtalk-ga
|
Request for Answer Clarification by
jimvet-ga
on
20 Dec 2004 11:28 PST
I did not have the fields indexed in my table! I've indexed the
fields and now the queries run in less than one second. I'll let you
know if I need any further clarification. I'm looking at results now.
thank you so much.
|
Clarification of Answer by
mathtalk-ga
on
20 Dec 2004 11:56 PST
Great! Without the index the complexity is roughly 51000 * 51000
table lookups, a quadratic order of work, but with the index in place
it should behave pretty close to a linear search (actually linear
times a logarithmic factor).
regards, mathtalk-ga
|
Request for Answer Clarification by
jimvet-ga
on
21 Dec 2004 09:23 PST
Now that I have the select query complete, I need to do an update
process on the results of the Select query. I was going to submit
this as a new question but wanted to keep all of the notes together.
Let me know if this way of extending the question is ok. I'll tip you
another $20 for this answer or I could submit as a separate question
if required.
I now need to update some of the records in the result set of the
Select Query we created. The criteria are easy for me to describe but
I don't know how to translate it into SQL.
Here is a sample of the Select Query results:
HotelID ChangedFromID Verified Address Zip
9876 No 123 Main St 45844
8765 No 123 Main St 45844
1234 Yes 123 Main St 45844
The criteria for records to be updated is that within any set of
"matching records" - (i.e. those with the same Address and Zip), take
the HotelID from the record with field Verified = "Yes" and use it to
update the HotelID of the other matching records. AND for each
updated record, update the ChangedFromID field with the value of the
old HotelID before the update AND update the Verified field to "Yes"
I'd like an Update query to create the following result update result:
HotelID ChangedFromID Verified Address Zip
1234 9876 Yes 123 Main St 45844
1234 8765 Yes 123 Main St 45844
1234 Yes 123 Main St 45844
Please let me know if this is clear.
|
Clarification of Answer by
mathtalk-ga
on
21 Dec 2004 10:07 PST
Hi, jimvet-ga:
Unfortunately having rated this Question foreclosed the opportunity to
add a tip for formulating the update query, as you've proposed.
But let me respond to your request by raising an issue about how that
update is supposed to work. I suspect you may have situations in
which there are multiple rows, sharing an Address and Zip, which are
Verified = Yes. Certainly you expect this to be the case after you do
the update. But could there be some ambiguity in such Verified rows
having different HotelID values? If there is an ambiguity, should the
update proceed by using one or the other HotelID values (eg. the
smaller one, just to have a definite rule)? Or should the update be
skipped (for those rows) so that the circumstances can be manually
reviewed?
As a personal favor, if you feel that one of my Answers could have
been better, I'd appreciate knowing how to improve it for future
reference.
regards, mathtalk-ga
|