|
|
Subject:
Script (?) for Comparing Colums of Data
Category: Computers > Algorithms Asked by: tikal-ga List Price: $10.00 |
Posted:
20 Jul 2003 14:59 PDT
Expires: 22 Jul 2003 11:55 PDT Question ID: 233095 |
Hi, This is my first post to GA. Here it goes :-) I am looking to compare the contents of two sets of data and end up with a third set of data that consists of "terms" matched between the two. Not only should this third row consist of exact matches, but it should also include matches that include two terms from the first set of data as well. Example: "A" "B" Identified Words ("C") one one one two nine onethree three ten ten four onethree ten onenight Please note that: - "one" was an identified word because "one" appeared in both "a" and "b" - "onethree" was an identified word because both "one" and "three" appeared in column "a" and the term "onethree" appeared in column "b" - "ten" was an identified word because "ten" appeared in both "a" and "b" - "onenight" WAS NOT and identified word because even though the term "one" appeared in column "a" the term "night" did not. Other points of note: - The size of the data source "a" is around 100,000 cells - The size of the data source "b" is around 40,000 cells - In terms of software, I really only have Excel and I have no experience with databases - The data I have I receive in text files - I have a P4 2.4GHZ with 1GB RAM machine In order to make this a complete answer, I need the following: - The exact script (or detailed instructions for another solution) that will allow me to do this data extraction - The answer should not be "here is where you find the info..." - If anything other than Excel is required, I will need EXACT instructions as to what to download (it has to be free), and what to do with the software I downloaded - If Excel is used, please take into account the limitation on the number of cells in Excel (i.e. the solution you provide will now have to work flawlessly between two columns of "a" data and basically treat them as one) - I will consider limiting my "a" data source to 65,000 if there is no way of doing this. - The solution should take into account the length of time needed to complete the extraction (i.e. I can't wait a day for the data extraction to be completed :-) - The solution should allow me to update the "b" column data on a frequent basis (daily or weekly) by "cutting and pasting" or by "referencing" a txt file. I thank you very much for your time reading this and would be happy to answer any questions. | |
| |
| |
|
|
There is no answer at this time. |
|
Subject:
Re: Script (?) for Comparing Colums of Data
From: mathtalk-ga on 20 Jul 2003 18:05 PDT |
Hi, tikal-ga: I have read with interest your question. It seems to me that in asking for detailed directions on how to carry out the required matching using some software, you are in fact asking not about algorithms but about programming. Certainly there are software packages that would be more appropriate than Excel for solving this kind of problem. Perhaps the most elegant approach would be a Prolog program. There are Prolog compilers/interpreters which are freely available. Likewise, as I think you perhaps anticipated, database programming would provide another means for tackling this matching. The basic point in both cases is that "declarative" programming languages will allow the programmer to specify the "rules" that govern solutions, and leave it to the language implementation to decide "how" the solutions will be found. You seem to propose two rules, which I'll explain using a Prolog-like syntax. Let InA(X) and InB(X) be two predicates which describe the contents of list A and list B, respectively. That is, InA(X) is true for each item X in list A, and similarly for InB(X) and the items in list B. match(X) if InA(X), InB(X). match(Z) if InA(X), InB(Y), concatenate(X,Y,Z). Once the basic "algorithm" or rules are defined, it is possible to turn one's attention to making the implementation efficient. regards, mathtalk-ga |
Subject:
Re: Script (?) for Comparing Colums of Data
From: tikal-ga on 20 Jul 2003 18:19 PDT |
Hi mathtalk-ga, Thanks very much for your comment. The only reason I phrased the question in this way is because I have struggled with it for a long time. Frustration has mounted! :-) The other reason I am looking for a detailed solution is so that I can pick it apart the answer so I can learn some data extraction methods. I'm seem to learn best by seeing the answer and working back to the question - then reapplying to other tasks... anyway. Maybe the issue with my question is the category? Please advise. I will take a look at prolog right now. Thanks! |
Subject:
Re: Script (?) for Comparing Colums of Data
From: mathtalk-ga on 20 Jul 2003 20:05 PDT |
Hi, tikal-ga: Perhaps SQL is the best approach for the volume of data you are dealing with. There are free (open source) relational databases, such as MySQL and GNU SQL Server. Indeed Microsoft makes its MSDE, a restricted version of SQL Server, freely redistributable, so it is "free" to endusers (though a developer is needed to build an application around it for you). However let me correct what I did in the second Prolog rule for your question; I made a mistake when I wrote: match(Z) if InA(X), InB(Y), concatenate(X,Y,Z). This would mean that a "match" can occur by combining items from column A with column B, which is of course not the rule you outlined. Instead you said that a match could occur by concatenating two items from column A into a matching item in column B. (Question: Did you also mean to allow for the converse, two items in column B combining to make an item in column A?) This would be expressed in Prolog as: match(Z) if InB(Z), InA(X), InA(Y), concatenate(X,Y,Z). Similar criteria can be expressed in SQL. For example, we might have a single column table A and another single column table B. Then the first rule (same item in both columns) would be expressed as a simple "join" query: SELECT A.item from A,B WHERE A.item = B.item The second (more complex) rule would be: SELECT B.item from B WHERE EXISTS ( SELECT * from A as A1, A as A2 WHERE B.item = A1.item + A2.item ) Here the + operator is used in SQL to concatenate two "strings", an item apiece from two logical copies of the table A. Again some work to make the query more "efficient" in an implementation would be the next practical step to take. regards, mathtalk-ga |
Subject:
Re: Script (?) for Comparing Colums of Data
From: tikal-ga on 20 Jul 2003 20:17 PDT |
Hi mathtalk-ga, Thanks again for your comment! To answer the question "Did you also mean to allow for the converse, two items in column B combining to make an item in column A?" - No. Ideally the solution would only work by looking for combinations (2 max) of "a" to make "b". Thanks again for the SQL reco. I plan on digging into SQL/MySQL in a couple of months (when work slows down) but until then, I'm afraid I would need some pretty detailed instructions on how to work with SQL in general. I don't even know how to set up a db, add data etc. once I get SQL installed. Thanks very much! |
Subject:
Re: Script (?) for Comparing Colums of Data
From: stephenvakil-ga on 21 Jul 2003 08:16 PDT |
Assuming you have ms access (instructions for 97) it's pretty simple: 1. Open ms access, create a new db 2. Go to File -> Get External Data -> Import 3. Change "Files of Type" to Microsoft Excel then browse to your file 4. Follow the wizard instructions (you might have to select "First Row Contains Headers". You do not need to create a primary key but may optionally do so. At the end you should have a table called Sheet1 5. Go to view->database objects->queries 6. Click New, Select Design View, OK 7. Click close on the table listing 8. Go to view->Sql View 9. Type in: SELECT s1.a & s2.a FROM Sheet1 AS s1, sheet1 AS s2 WHERE (s1.a & s2.a in (select b from sheet1)) union select b from sheet1 where b in (select a from sheet1) 10. Save this as some query name Now you can double click on it to see the results. If you want to save it to an excel sheet within your existing workbook, select the query you just created, go to save as/export, choose excel 97, and browse to your workbook. |
Subject:
Re: Script (?) for Comparing Colums of Data
From: tikal-ga on 22 Jul 2003 11:55 PDT |
Hi stephenvakil-ga, Thanks for you help. I don't have Access though :-( I am going to take a look at your code and try to figure it out though. Thanks! |
If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you. |
Search Google Answers for |
Google Home - Answers FAQ - Terms of Service - Privacy Policy |