Hello lighthousedweller-ga,
Okay, I have put code with a simple
hash table and a simple data file here:
http://nms.lcs.mit.edu/~gch/google/hashtable.tar.gz
Please let me know how it works for you.
dogbite-ga |
Clarification of Answer by
dogbite-ga
on
01 May 2003 20:34 PDT
Hi again lighthousedweller-ga,
I thought I should mention that I explained
how to compile and run the program at the
top of "hashtable.cpp." Also, there are
comments in the code to help you understand it.
dogbite-ga
|
Request for Answer Clarification by
lighthousedweller-ga
on
02 May 2003 10:27 PDT
hi there-
thanks for the quick response.
i use metrowerks codewarrior for my programs, but also have ms visual
c++. how can i compile this program with these two compilers?
also what is the hashtable file without any extension for? its in the
tar file, just named 'hashtable'.
thanks for the help!
|
Clarification of Answer by
dogbite-ga
on
02 May 2003 10:33 PDT
Hello lighthousedweller-ga,
The "hashtable" file without an extention
is an executable that I created on my
Linux machine. Since you're not on Linux
(unless you have a Mac) you can ignore it.
I am not very familiar with either codwarior
or visual c++. But in either case, I imagine
you open the hashtable.cpp source code file.
Then select "compile" or "execute" or something
like that from the menu.
Does that sound okay to you?
dogbite-ga
|
Request for Answer Clarification by
lighthousedweller-ga
on
03 May 2003 02:37 PDT
OK got it to run. Usually I do compile/execute, but in this case i
just needed to compile it to an exe and run it from dos with the
command line arguments.
a few questions.
1) in the output it says hashed 28 into 28, is it not hashing them
into different values?
2) also what happens if there are two or more with the same id? how
would you access it then?
3) what is the memcpy function doing? and why is id always referred to
as a reference?
4) do you think you could convert the ? : statment to an if/else? i'm
always confused by that syntax.
thanks for the help
|
Clarification of Answer by
dogbite-ga
on
03 May 2003 10:12 PDT
Hi lighthousedweller-ga,
I'm glad you were able to compile
and run the program. Here are
answers to your questions:
) in the output it says hashed 28 into 28, is it not hashing them
into different values?
the hashtable is simple -- it holds a fixed
number of elements. it's set at 2003 elements,
but you could change that by changing the
MAX define. also, the hash function is simple --
it just takes the id modulo the size to produce
the hash code. so anything with an id under 2003
will have a hash code equal to its id.
when you have a collision we use linear probing
to find a spot for the item. that means you go
down the list looking for an open spot.
2) also what happens if there are two or more with the same id? how
would you access it then?
id's are assumed to be unique. think of
them as a person's social security number.
note that two elements with different ids
can have identical hash codes. having unique
ids is essential so we can tell them apart.
3) what is the memcpy function doing? and why is id always referred to
as a reference?
memcpy copys two chunks of memory. so it is
copying all the information from the given
student structure into the hashtable.
memcpy needs needs pointers to its arguments.
so we take a reference of those structures so
that memcpy can do its work.
4) do you think you could convert the ? : statment to an if/else? i'm
always confused by that syntax.
sure thing -- i updated the hashtable.tar.gz
file with an if/else block.
dogbite-ga
|
Request for Answer Clarification by
lighthousedweller-ga
on
03 May 2003 17:02 PDT
thanks for those clarifications...
heres a few more small questions
1) you said that having unique ids is essential so we can tell them
apart and this makes sense, but then if they are two ids that are the
same you also say we do linear probing to find a place for it. instead
of doing linear probing, should it be an error? cause whats the point
of finding a place for it, if its not going to be able to be reached
by it's id number?
2) also could you suggest a way to change the hash function if i only
had a list of words that i wanted to put into the hash table, cause i
dont believe %max would word if it was a string i was hashing...
thanks alot for the help!
|
Clarification of Answer by
dogbite-ga
on
03 May 2003 18:58 PDT
Hi lighthousedweller-ga,
Sure, I can answer those two questions:
1) you said that having unique ids is essential so we can tell them
apart and this makes sense, but then if they are two ids that are the
same you also say we do linear probing to find a place for it. instead
of doing linear probing, should it be an error? cause whats the point
of finding a place for it, if its not going to be able to be reached
by it's id number?
Sorry for not being clear. Two items cannot
have the same ID. Two items can have ids that
map to the same hashcode though.
2) also could you suggest a way to change the hash function if i only
had a list of words that i wanted to put into the hash table, cause i
dont believe %max would word if it was a string i was hashing.... . .
For strings, the easiest thing to do would be:
1) write a function that returns a unique
id (a number) for any string. you can
use something like:
int stringtonum(char *s, int len) {
int i;
int total = 0;
for (i = 0; i < len; i++) {
total += 3*i*((int)index(s,i));
}
return total;
}
2) use the student structure again, but
put your string in the "first name"
field.
Sound good?
dogbite-ga
|
Request for Answer Clarification by
lighthousedweller-ga
on
03 May 2003 23:28 PDT
hi again-
i'm having alot of problems converting the code from an int hash to a
string hash. is there a reason to use char* instead of just using
strings?
i'm trying to convert it to something that hashes a word and then
stores the word and its definition in the hash table. so i only two
inputs instead of three.
then you're able to retrieve whether or not the word exists, and what
its definition is, using a function similar to getStudent. can you
help me rework this code using strings instead of char?
also in your previous post you said you can have "Two items can have
ids that map to the same hashcode though." OK, so if they did how
would you be able to find the correct one you're looking for when
using the function getStudent? For example, say the words run and ruin
mapped to the same hashcode. ruin would need to be put in a place
using linear probing. now if you did something like getStudent(ruin)
wouldn't it return the info for run? if you could also clarify this a
bit for me i would appreciate it.
thanks alot once again.
|
Clarification of Answer by
dogbite-ga
on
04 May 2003 08:25 PDT
Hi lighthousedweller-ga,
Those are two good questions.
First, if two items have different ids but
the same hash code, you find the correct item
using linear probing. You start at the
hash code's location in the table and then
move down, checking the id of each item
until it matches. Does that make sense?
Regarding changing the table to handle
strings, I feel this is too far beyond
the scope of your origional question to
answer (for $10). I suggest that you
submit it as a follow-up question. If
you want me to answer it, just indicate
that in the subject line.
dogbite-ga
|
Request for Answer Clarification by
lighthousedweller-ga
on
04 May 2003 14:15 PDT
OK, I will post a followup question and I really do appreciate the
help. But also notice that your answer also wasnt really what i
requested in the intial post. I mean it works, but it requires some
modification, ie working with hashing strings instead of hasing ints,
which is what I was looking for. If you re-read the intial post you'll
probably agree.
I do realize you have put in alot of work on this problem regardless
so I will post a followup. Thanks again for the help.
|
Clarification of Answer by
dogbite-ga
on
04 May 2003 15:43 PDT
Hi lighthousedweller-ga,
I appologize -- I did not properly interpret
"dictionary" in your question. Thank you
for reposting your question.
dogbite-ga
|