![]() |
|
![]() | ||
|
Subject:
DB and Algorithm solution for Degree of seperation between friends
Category: Computers Asked by: x86guru-ga List Price: $150.00 |
Posted:
18 Oct 2005 01:42 PDT
Expires: 17 Nov 2005 00:42 PST Question ID: 581587 |
![]() | ||
|
There is no answer at this time. |
![]() | ||
|
Subject:
Re: DB and Algorithm solution for Degree of seperation between friends
From: ssrivast-ga on 19 Oct 2005 10:52 PDT |
The latest research in this field is: http://kdl.cs.umass.edu/papers/simsek-jensen-in-progress.pdf http://kdl.cs.umass.edu/papers/simsek-jensen-ijcai2005.pdf A University of Massachusetts press release claims researchers may have developed a breakthrough algorithm for searching decentralized networks such as collaboration graphs. Their algorithm, called Expected-Value Navigation (EVN) could make life easier for developers working with robot swarms by providing the most efficient message passing route from one robot to another within the swarm. EVN may also have important implications for signal paths within neural networks. On the downside, it could make life easier for the authors of virus and worm writers (and harder for MS Windows users). Still not sure what a collaboration graph is? Everyone is probably familiar with at least one well known one: the Six Degrees of Kevin Bacon game, which is based on the ideas of the Erd?s Number project and the Six Degrees of Separation. In fact some people get a little obsessed with such ideas. An article published by the Air Force Research Lab connects these ideas with classical physics and modern Open Source software development methods. A few more details about the new algorithm are available in an Engineering Online article but for the math itself, take a look at the actual research paper, "Decentralized Search in Networks Using Homophily and Degree Disparity". |
Subject:
Re: DB and Algorithm solution for Degree of seperation between friends
From: omedhabib-ga on 19 Oct 2005 12:14 PDT |
ssrivast, good research but i dont think that is what he is looking for. i think he wants to know how to actually perform it. x86guru, i always wondered how friendster does it. i havent done much research on the subject, but off the top of my head im thinking of something along the lines of creating a table in your database such as : table name: user_network table fields: un_id | un_user_id | un_friend_id un_id = unique identifer for each record un_user_id = the user id of the user him/herself un_friend_id = the friends that her/she is friends with so you have two subjects, subject A and B. we want to see how they're connected, right? so you would perform a mysql query on each one and load their friends into an array called A[] and B[], respectively. if there are any matches, load them into the array 1[]. this array, 1[], holds your 1 degree of separated friends. now for 2nd degree friends. take array A[] and perform a mysql query on EACH of those values, load them all into 2P[]. take 2P[] and check it against B[]. if you find any matches, load them into 2[]. since we have two in this case, it becomes a two-dimensional array. the purpose of this? to maintain the structural integrity of the link. for example: Array 1[] = 1 dimensional array this array will contain, FOR EXAMPLE, the following values: 1[] ( [3] [5] [9] ) this means 1 is friends with user 3, 5, and 9. take again A[], perform a query for each of its values and load them into array 2P[] (2 preliminary), check THESE matches against array B[] and if you find any matches, load them into 2[]. your array 2[] might look something like this now: 2[] ( 91[] ( 45[] 33[] ) 57[] ( 13[] ) 16[] ( 26[] ) ) this array, 2[], says that users A is linked to user B by 2 degrees of separation via the following links: A -> 91 -> 45 -> B A -> 91 -> 33 -> B A -> 57 -> 13 -> B A -> 16 -> 26 -> B following this logic, take it step by step up to your 6th array 6[], which will be a 6th dimentional array and will contain information about your 6 degrees of separation. this method will work, but im sure you already see the flaw in it. if friendster incorporated it, their servers will crash within seconds. for EACH time this is performed on someone, you're looking at not only thousands of lines of code being processed (which is ok), but you're looking at THOUSANDS of mysql queries being performed strictly for this code. with a table of perhaps 1,000,000+ users and 1,000,000,000,000+ records for them and their friends, its probally unrealistic. and since you're going up 6 degrees, i don't even want to imagine the numbers... probally somewhere around 1,000,000,000,000,000,000,000 queries. SOMEONE ELSE SHOULD ANSWER THIS QUESTION WHO KNOWS HOW TO DO THIS! |
Subject:
Re: DB and Algorithm solution for Degree of seperation between friends
From: breflection-ga on 19 Oct 2005 14:47 PDT |
Six Degrees of Wikipedia does what you ask. It uses iterative deepening depth first search shortest path http://kohl.wikimedia.org/~kate/cgi-bin/six_degrees Here's the source code: http://www.qwikly.com/code/Six%20Degrees%20of%20Wikipedia/linksc.cc http://www.qwikly.com/code/Six%20Degrees%20of%20Wikipedia/linksd.cc |
Subject:
Re: DB and Algorithm solution for Degree of seperation between friends
From: x86guru-ga on 19 Oct 2005 17:48 PDT |
breflection Isn't that pretty much the same algorithm state by the other user. It looks like its a recursive algorithm going through all possiblities. I think the reason why its fast on wiki site is because they have around 10 linkes per page so we are talking around 100 links to sort through before you get to the final destination. In my case we are talking thousands so it needs to be faster than that. Unless something I missed? |
Subject:
Re: DB and Algorithm solution for Degree of seperation between friends
From: breflection-ga on 21 Oct 2005 16:09 PDT |
I posted it principally because the source code is there. It's a decent algorithm for what you want to do, but 1000 degrees of separation is just gigantic no matter how you think of it! |
Subject:
Re: DB and Algorithm solution for Degree of seperation between friends
From: x86guru-ga on 22 Oct 2005 14:56 PDT |
Well its not a thousand degrees of seperation its more like 1000's of nodes in the 6 degrees of seperation. |
Subject:
Re: DB and Algorithm solution for Degree of seperation between friends
From: crazyloco-ga on 04 Nov 2005 12:16 PST |
I strongly suggest an industrial grade database (Oracle), it supports trigger as well as stuff like "select friendid from friends where personid in (select friendid from friends where personid=XXXX", and Union/Intersect/Minus, it is also able to cache most of the database in memory. You would have two tables, for first and second degrees create friends(personid number ,friendid number); create friends2(personid number ,friendid1 number,friendid2 number); friends2 is derivative from friends, it contains multiple rows for each friendid1 with their friends in friendid2. you would have a trigger on insert,update,delete of friends that inserts/updates in friends2. for undirected you would always say stuff like: select friendid f from friends where personid=XXXX union select personid f from friends where friendid=XXXX how to find degree of separation between A and B: This algorith is faster with close degrees of separation and gets slower if they are farther away from eachother. we will work from both sides, the basic queries are quite simple (select friendid from friends where personid=XX and select friendid2 from friends2 where personid=XX) 1. check if any of them is a first degree friend of the other from table friends if yes --> finished, degree=1 2. check if any of them is a second degree friend of the other from table friends2 if yes --> finished, degree=2 3. ((query for 1st degree friends of A) intersect (2nd degree friends of B)) union ((query for 1st degree friends of B) intersect (2nd degree friends of A)) if found --> finished, degree=3 4. ((query for 2nd degree friends of A) intersect (2nd degree friends of B)) union ((query for 2nd degree friends of B) intersect (2nd degree friends of A)) if found --> finished, degree=4 5. Complex queries can also be done for degrees 3 on both sides. query for 3rd degree is (select friendid2 from friends2 where personid in (select friendid from friends where personid=XXX) ((query for 2nd degree friends of A) intersect (3nd degree friends of B)) union ((query for 2nd degree friends of B) intersect (3nd degree friends of A)) if found --> finished, degree=5 6. ((query for 3nd degree friends of A) intersect (3nd degree friends of B)) union ((query for 3nd degree friends of B) intersect (3nd degree friends of A)) if found --> finished, degree=3 |
Subject:
Re: DB and Algorithm solution for Degree of seperation between friends
From: crazyloco-ga on 04 Nov 2005 12:16 PST |
sorry, for the last line: degree=6 |
Subject:
Re: DB and Algorithm solution for Degree of seperation between friends
From: crazyloco-ga on 04 Nov 2005 12:56 PST |
another idea would be to have your own db-like daemon, you have a (huge) process written in a compiled language (maybe C/C++), it handles all of the friend data, it uses a database for redundancy only, but has a complete copy of the friends data : struct sfriend { int personid, friend *friend }; the process is the only one to deal with filling the database, from php you would only select, never insert or update, these will pass through this program, it adds it to the memory structure and then reflects it in the database. when the program starts (probably as a daemon), it will fill up it's memory structures from the database. doing all the "shortest path" algorithms should be ultra fast since it's all in memory (even with 1000000 nodes, should take milliseconds on a modern computer). the only thing you need to worry about is the communication between php and this daemon, you need to handle: remove_friend(personid, friendid); add_friend(personid, friendid); find_degrees(personid1,personid2); find_degrees could return ready to use HTML for the PHP script to display. find_degrees could cache the last XXX requested relations in the database for a specified amount of time table lastdegree(personid1 int,personid2 int,degreeHTML text,expirytimestamp int); you would delete all expired caches, then look for a ready to use result, and return it if found, otherwise run the algorithm. |
Subject:
Re: DB and Algorithm solution for Degree of seperation between friends
From: crazyloco-ga on 06 Nov 2005 09:13 PST |
I am bored, so I have more comments: If you want to store all the relations and distances/paths in your database, here's a calculation of the required data size: Since you don't need to store the distance from both sides, and you would not store the distance between a node and itself, so you have to use N*(N-1)/2 cells of an adjecency matrix (you would not store more than half of it). You would store the distance,and path between every two person nodes Pi and Pj. create table distances (node1,node2,distance,path1,path2,path3,path4) since you will study up to 6 degrees, we may need up to 4 intermediate people. You would store a distance of 1 to 5 if there a no relational path of 6 degrees or less. You would store a distance of 0 if there is no relational path. You would store a distance of -1 if the distance is invalid/unknown (still to be calculated). You simply lookup the distance and path for easy displaying in your php script. When you add a new relation between two people, all paths that contain any of them will become invalid (set to -1), and will have to be re-calculated. simple for a new relation beween A,B: update distances set distance=-1 where node1=A or node1=B or node2=A or node2=B or path1=A or path1=B or path2=A or path2=B or path3=A or path3=B or path4=A or path4=B; update distances set distance=1,path1=0,path2=0,path3=0,path4=0 where (node1=A and node2=B) or (node1=B and node2=A); A batch process (back-end) would constantly update the relations database, and fix all those -1 relations (when recalculated they may even pass through another path that is shorter, you never know) this could be a quick/optimized C/C++ program. when you ask for the relation between you and a node, and it gives -1, you would tell the user that the info is temporarily unavailable. let's assume a database of 1,000 users and up to 6 degrees of separation: every relation would have two people and up to 4 other people, so if a person's is stored in 6 bytes: you would need 6* (1000*999)/2 = 2997000 bytes (3MB) for 1,000,000 people: 6*1000000*999999/2 = 2999997000000 bytes =2999997 MB = 2999GB = 3TB so i't not really scalable to 1,000,000 people. |
Subject:
Re: DB and Algorithm solution for Degree of seperation between friends
From: x86guru-ga on 08 Nov 2005 23:39 PST |
Some good ideas. However I dont think its scalable to keep a database of every path and relationship between users because updating that database will take forever. And the database size will grow exponentially not linearly. Because for every 1 user added could result in 1,000 extra records and actually with your schema i think it would result in more than that because of how 1 new user can affect other peoples paths. What do you think about that? Or am I missing something? |
If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you. |
Search Google Answers for |
Google Home - Answers FAQ - Terms of Service - Privacy Policy |