Q: VB - Faster keyword comparison (No Answer, 2 Comments)
Question  
Subject: VB - Faster keyword comparison
Category: Computers > Programming
Asked by: xymox-ga
List Price: $50.00
Posted: 21 Jun 2002 06:27 PDT
Expires: 21 Jun 2002 08:58 PDT
Question ID: 31134
I have a function, listed below (Dim statements and error handling
removed for brevity), that takes a list of keywords and searches a body
of text for each of them. In the example below, the function finds
java, vb and xml as matches.

The function works as it is. I use the VB6 Split function to load the
comma-delimited keyword list into an array, then loop through the array
one element at a time, searching the body text for each keyword with
InStr. Finally, I build another comma-delimited list of the matches.

The problem is that it will not scale well. The body text could be
very large, and the keyword list could eventually hold several thousand
entries. Because every keyword means another InStr pass over the whole
body text, the work grows with the number of keywords multiplied by the
length of the text. What I need is a much faster way to do this in
VB6. Please include code if you have it.


Sub Main()
    Keywords = "programming, java, html, xml, vb"
    BodyText = "How long will it take me to learn Java? I already know VB and XML."

    Debug.Print KeyCompare(Keywords, BodyText)   ' prints: java, xml, vb
End Sub

Function KeyCompare(Keywords, BodyText)
' Load the comma-delimited keyword list into an array
KeyArray = Split(Keywords, ",")

For i = LBound(KeyArray) To UBound(KeyArray)
    ' Split leaves the space that follows each comma, so trim it off
    ThisKey = Trim(KeyArray(i))
    ' InStr returns 0 when ThisKey does not occur anywhere in BodyText
    If ThisKey <> "" And InStr(1, BodyText, ThisKey, vbTextCompare) > 0 Then
        xKey = xKey & ThisKey & ", "
    End If
Next

' Drop the trailing ", " before returning the list of matches
If xKey <> "" Then
    KeyCompare = Left(xKey, Len(xKey) - 2)
End If
End Function
Answer  
There is no answer at this time.

Comments  
Subject: Re: VB - Faster keyword comparison
From: j_philipp-ga on 21 Jun 2002 06:52 PDT
 
Xymox,

For one thing, explicitly use Left$() and Trim$() instead of Left()
and Trim().

To quote Steven R. Hamby at VB Helper (Performance Tuning):
http://www.vb-helper.com/perform.htm
"If you need to do a lot of string/file processing, use mid$ (and
trim$ etc.) rather than mid as the latter treats the data type as a
variant as opposed to a string, which can be up to 3 times slower"

The resource above is a good read on optimizing VB code for speed.
Subject: Re: VB - Faster keyword comparison
From: chuckbo-ga on 21 Jun 2002 07:37 PDT
 
Okay, here's some stuff to consider.

First.
To help the scaling, realize that you're probably looping through the
wrong list. Let's assume that your list of keywords will become very
large (thousands) and that the list of words in the sentence is
relatively small. What you want to do is extract each word from the
sentence and look it up in the list of keywords. That way, you perform
one lookup per word in the text instead of one full search of the text
per keyword.
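The word-driven loop described above can be sketched as follows. It is written in Python rather than VB6 so that it can be run and checked; a Python set (a hash table) stands in for the keyword lookup structure, and the function name key_compare and the word-splitting regular expression are assumptions for illustration. One behavioural difference from the InStr version: this matches whole words only, so "java" no longer matches inside "JavaScript", and a multi-word keyword would never match. Results also come out in the order the words appear in the text.

```python
import re


def key_compare(keywords, body_text):
    """Return the keywords found as whole words in body_text, comma-separated, in text order."""
    # Build the lookup set once: trimmed, lower-cased, empty entries dropped
    keyword_set = {k.strip().lower() for k in keywords.split(",") if k.strip()}

    found = []
    seen = set()
    # Loop over the words of the (short) text, not over the (long) keyword list
    for word in re.findall(r"[A-Za-z0-9]+", body_text):
        w = word.lower()
        if w in keyword_set and w not in seen:  # hash lookups, roughly constant time each
            seen.add(w)
            found.append(w)
    return ", ".join(found)


print(key_compare("programming, java, html, xml, vb",
                  "How long will it take me to learn Java? I already know VB and XML."))
# prints: java, vb, xml
```

Building the set costs one pass over the keyword list; after that, each word in the text costs a single hash lookup, so growing the keyword list to several thousand entries does not multiply the work done per word.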

Second.
You're going to say, "That won't help: I still have to loop through
the large array once for each word I test." This is where I'd suggest
using a Collection instead of an array. A Collection can look an item
up directly by its key, so you take advantage of VB's internal search
code instead of scanning the array yourself, which I'd expect to be
faster. (Note that these are suggestions, not guarantees.)
For instance, with the code

Dim x As New Collection

    ' Add Item, Key: the item (1) is only a placeholder; the key is the keyword
    x.Add 1, "Java"
    x.Add 1, "VB"
    x.Add 1, "REXX"
    x.Add 1, "language"
    x.Add 1, "Spanish"

Now you can test a word with x.Item(strTestword). Be careful, though: a
Collection does not return Null for a key it doesn't have; it raises
run-time error 5 (Invalid procedure call or argument). So turn on error
trapping around the lookup (On Error Resume Next, then check Err.Number
and call Err.Clear): if Err.Number is 0, the word is in the keyword
list; otherwise it isn't. Collection keys are also matched without
regard to case, so "java" finds the "Java" entry.
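The Collection lookup and its error-trapping test can be mirrored in Python, where a dict with lower-cased keys plays the role of the case-insensitive Collection and KeyError plays the role of run-time error 5. The helper names build_keyword_table and is_keyword are hypothetical; this is a sketch of the pattern, not VB code.

```python
def build_keyword_table(words):
    """Build a lookup table keyed by lower-cased keyword (case-insensitive, like Collection keys)."""
    # The value 1 is only a placeholder, just as in x.Add 1, "Java"
    return {w.lower(): 1 for w in words}


def is_keyword(table, word):
    """Return True if word is a key in table, trapping the missing-key error."""
    try:
        table[word.lower()]  # like x.Item(word): a missing key raises instead of returning Null
        return True
    except KeyError:  # the counterpart of trapping run-time error 5 in VB
        return False


table = build_keyword_table(["Java", "VB", "REXX", "language", "Spanish"])
print(is_keyword(table, "java"), is_keyword(table, "Python"))  # True False
```

In everyday Python you would simply write `word.lower() in table`; the try/except form is shown only because it maps directly onto On Error Resume Next followed by an Err.Number check.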

Third.
Now the difficulty and bottleneck becomes parsing the sentence and
extracting each word to test. This will take some experimentation.
I'd make a string of word separators, strSeparators = " ,.?!();:", and
either
   a) search through the string, character by character, converting
each separator to a space, and then loop through a second time using
InStr to find the spaces and extract each word to test;
   b) search through the string, character by character, building the
next testword until you hit a separator; or
   c) another idea, to avoid so much searching for separators: go
through the string character by character, and if the character's
ASCII code is 65-90 (A-Z), 97-122 (a-z) or 48-57 (0-9) (maybe also 39,
the apostrophe, but we don't want this to get out of hand), append it
to the next testword. When you reach a character outside these ranges,
you know you've hit some kind of separator, so it's time to look up
the testword you've been building in the collection.
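Options (a) and (c) can be sketched side by side in Python so their behaviour can be compared; the function names are hypothetical, and the separator string and ASCII ranges are taken from the description above. They agree on ordinary sentences but differ on characters that are not in strSeparators: option (a) keeps "e-mail" as one word, while option (c) splits it at the hyphen.

```python
SEPARATORS = " ,.?!();:"  # strSeparators from the comment above


def words_by_replacing(text):
    """Option (a): turn every separator into a space, then split on spaces."""
    table = str.maketrans(SEPARATORS, " " * len(SEPARATORS))
    # split() with no argument also skips runs of spaces, so no empty words appear
    return text.translate(table).split()


def is_word_char(ch):
    """Option (c): ASCII letters, digits and the apostrophe count as word characters."""
    code = ord(ch)
    return 65 <= code <= 90 or 97 <= code <= 122 or 48 <= code <= 57 or code == 39


def words_by_char_class(text):
    """Option (c): build each word character by character; any other character ends it."""
    words = []
    current = []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:  # a separator ends the word being built
            words.append("".join(current))
            current = []
    if current:  # flush the last word when the text ends mid-word
        words.append("".join(current))
    return words


sentence = "How long will it take me to learn Java? I already know VB and XML."
print(words_by_replacing(sentence) == words_by_char_class(sentence))  # True
```

Each word produced this way would then be looked up in the keyword collection, as described in the second point.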
