Google Answers: DB and Algorithm solution for Degree of seperation between friends

View Question

Q: DB and Algorithm solution for Degree of seperation between friends ( No Answer, 11 Comments )

Question

Subject: DB and Algorithm solution for Degree of seperation between friends
Category: Computers
Asked by: x86guru-ga
List Price: $150.00

Posted: 18 Oct 2005 01:42 PDT
Expires: 17 Nov 2005 00:42 PST
Question ID: 581587

I need a solution that is similar to friendster in its ability to find
the degree of seperation between people. Friendster is a social
networking site and when viewing a persons profile will return
something like You -> Person1 -> Person2 -> Person3 -> Them. I need to
implement something similar and have researched several solutions.
One solution is that you can query all of your friends then all of
their friends and all of their friends until you find the userid you
are looking for then return the path. However this is not very
efficient because some people can have 1000's of people in their
friends and they can have 1000's of people too.
Another solution I looked into is that create two db's. One for normal
usage where just level 1 of each persons friends are kept. Then have
another db (or something like a map) that has everyones relationship
with everyone else and when you view someones page it will go and do a
dijkstra algorithm to find the shortest path between the two people.
However the problem with this is not sure the exact schema to use and
exactly how to go about this.
I think that there is probably a better solution that I am not
considering because I am not smart enough or experienced enough to
know exactly how to go about it. The friendster site seems to be very
responsive and looks like the level of seperation is not a bottleneck
at all.
The solution I am looking for can not be unrealistic in a production
enviorement and should include some basic schema examples and
algorithm usage examples if any. Source code is not neccessary if the
solution is explained fully and explained right. However source code
especially in php might get a tip if its useful. Money is really not a
issue and if someone answered this question very good I dont mind
leaving a very good tip for them because your saving me major time.

Clarification of Question by x86guru-ga on 22 Oct 2005 14:58 PDT

One of the most important aspects of this question is how to create a
datastructure to efficiently handle this data. This all maps to a
undirected relationship graph but how does that graph map into a good
data structure?

Answer

There is no answer at this time.

Comments

Subject: Re: DB and Algorithm solution for Degree of seperation between friends
From: ssrivast-ga on 19 Oct 2005 10:52 PDT

The latest research in this field is:
http://kdl.cs.umass.edu/papers/simsek-jensen-in-progress.pdf
http://kdl.cs.umass.edu/papers/simsek-jensen-ijcai2005.pdf

A University of Massachusetts press release claims researchers may
have developed a breakthrough algorithm for searching decentralized
networks such as collaboration graphs. Their algorithm, called
Expected-Value Navigation (EVN) could make life easier for developers
working with robot swarms by providing the most efficient message
passing route from one robot to another within the swarm. EVN may also
have important implications for signal paths within neural networks.
On the downside, it could make life easier for the authors of virus
and worm writers (and harder for MS Windows users). Still not sure
what a collaboration graph is? Everyone is probably familiar with at
least one well known one: the Six Degrees of Kevin Bacon game, which
is based on the ideas of the Erd?s Number project and the Six Degrees
of Separation. In fact some people get a little obsessed with such
ideas. An article published by the Air Force Research Lab connects
these ideas with classical physics and modern Open Source software
development methods. A few more details about the new algorithm are
available in an Engineering Online article but for the math itself,
take a look at the actual research paper, "Decentralized Search in
Networks Using Homophily and Degree Disparity".

Subject: Re: DB and Algorithm solution for Degree of seperation between friends
From: omedhabib-ga on 19 Oct 2005 12:14 PDT

ssrivast, good research but i dont think that is what he is looking
for. i think he wants to know how to actually perform it.

x86guru, i always wondered how friendster does it. i havent done much
research on the subject, but off the top of my head im thinking of
something along the lines of creating a table in your database such as
:

table name:  user_network

table fields:  un_id   |  un_user_id  |  un_friend_id

un_id = unique identifer for each record
un_user_id = the user id of the user him/herself
un_friend_id = the friends that her/she is friends with

so you have two subjects, subject A and B. we want to see how they're
connected, right? so you would perform a mysql query on each one and
load their friends into an array called A[] and B[], respectively. if
there are any matches, load them into the array 1[]. this array, 1[],
holds your 1 degree of separated friends.

now for 2nd degree friends. 

take array A[] and perform a mysql query on EACH of those values, load
them all into 2P[]. take 2P[] and check it against B[]. if you find
any matches, load them into 2[]. since we have two in this case, it
becomes a two-dimensional array. the purpose of this? to maintain the
structural integrity of the link.

for example: 

Array 1[] = 1 dimensional array

this array will contain, FOR EXAMPLE, the following values: 

1[]
(
   [3]
   [5]
   [9]
)

this means 1 is friends with user 3, 5, and 9.

take again A[], perform a query for each of its values and load them
into array 2P[] (2 preliminary), check THESE matches against array B[]
and if you find any matches, load them into 2[]. your array 2[] might
look something like this now:

2[]
(
  91[]
  (
    45[]
    33[]
  )  
  57[]
  (
    13[]
  )
  16[]
  (
    26[]
  )
)

this array, 2[], says that users A is linked to user B by 2 degrees of
separation via the following links:

A -> 91 -> 45 -> B
A -> 91 -> 33 -> B
A -> 57 -> 13 -> B
A -> 16 -> 26 -> B

following this logic, take it step by step up to your 6th array 6[],
which will be a 6th dimentional array and will contain information
about your 6 degrees of separation.

this method will work, but im sure you already see the flaw in it. if
friendster incorporated it, their servers will crash within seconds.

for EACH time this is performed on someone, you're looking at not only
thousands of lines of code being processed (which is ok), but you're
looking at THOUSANDS of mysql queries being performed strictly for
this code. with a table of perhaps 1,000,000+ users and
1,000,000,000,000+ records for them and their friends, its probally
unrealistic. and since you're going up 6 degrees, i don't even want to
imagine the numbers... probally somewhere around
1,000,000,000,000,000,000,000 queries.


SOMEONE ELSE SHOULD ANSWER THIS QUESTION WHO KNOWS HOW TO DO THIS!

Subject: Re: DB and Algorithm solution for Degree of seperation between friends
From: breflection-ga on 19 Oct 2005 14:47 PDT

Six Degrees of Wikipedia does what you ask. It uses iterative
deepening depth first search shortest path

http://kohl.wikimedia.org/~kate/cgi-bin/six_degrees

Here's the source code:
http://www.qwikly.com/code/Six%20Degrees%20of%20Wikipedia/linksc.cc
http://www.qwikly.com/code/Six%20Degrees%20of%20Wikipedia/linksd.cc

Subject: Re: DB and Algorithm solution for Degree of seperation between friends
From: x86guru-ga on 19 Oct 2005 17:48 PDT

breflection
Isn't that pretty much the same algorithm state by the other user. It
looks like its a recursive algorithm going through all possiblities.
I think the reason why its fast on wiki site is because they have
around 10 linkes per page so we are talking around 100 links to sort
through before you get to the final destination. In my case we are
talking thousands so it needs to be faster than that. Unless something
I missed?

Subject: Re: DB and Algorithm solution for Degree of seperation between friends
From: breflection-ga on 21 Oct 2005 16:09 PDT

I posted it principally because the source code is there. It's a
decent algorithm for what you want to do, but 1000 degrees of
separation is just gigantic no matter how you think of it!

Subject: Re: DB and Algorithm solution for Degree of seperation between friends
From: x86guru-ga on 22 Oct 2005 14:56 PDT

Well its not a thousand degrees of seperation its more like 1000's of
nodes in the 6 degrees of seperation.

Subject: Re: DB and Algorithm solution for Degree of seperation between friends
From: crazyloco-ga on 04 Nov 2005 12:16 PST

I strongly suggest an industrial grade database (Oracle), it supports
trigger as well as stuff like "select friendid from friends where
personid in (select friendid from friends where personid=XXXX", and
Union/Intersect/Minus, it is also able to cache most of the database
in memory.

You would have two tables, for first and second degrees

create friends(personid number ,friendid number);
create friends2(personid number ,friendid1 number,friendid2 number);

friends2 is derivative from friends, it contains multiple rows for
each friendid1 with their friends in friendid2.

you would have a trigger on insert,update,delete of friends that
inserts/updates in friends2.

for undirected you would always say stuff like:
select friendid f from friends where personid=XXXX 
union
select personid f from friends where friendid=XXXX 

how to find degree of separation between A and B:
This algorith is faster with close degrees of separation and gets
slower if they are farther away from eachother.
we will work from both sides, the basic queries are quite simple
(select friendid from friends where personid=XX and select friendid2
from friends2 where personid=XX)

1. check if any of them is a first degree friend of the other from
table friends if yes --> finished, degree=1
2. check if any of them is a second degree friend of the other from
table friends2 if yes --> finished, degree=2
3. 
((query for 1st degree friends of A) intersect (2nd degree friends of B))
union
((query for 1st degree friends of B) intersect (2nd degree friends of A))
if found --> finished, degree=3

4.
((query for 2nd degree friends of A) intersect (2nd degree friends of B))
union
((query for 2nd degree friends of B) intersect (2nd degree friends of A))
if found --> finished, degree=4

5. Complex queries can also be done for degrees 3 on both sides.
query for 3rd degree is 
(select friendid2 from friends2 where personid in (select friendid
from friends where personid=XXX)

((query for 2nd degree friends of A) intersect (3nd degree friends of B))
union
((query for 2nd degree friends of B) intersect (3nd degree friends of A))
if found --> finished, degree=5


6.
((query for 3nd degree friends of A) intersect (3nd degree friends of B))
union
((query for 3nd degree friends of B) intersect (3nd degree friends of A))
if found --> finished, degree=3

Subject: Re: DB and Algorithm solution for Degree of seperation between friends
From: crazyloco-ga on 04 Nov 2005 12:16 PST

sorry, for the last line: degree=6

Subject: Re: DB and Algorithm solution for Degree of seperation between friends
From: crazyloco-ga on 04 Nov 2005 12:56 PST

another idea would be to have your own db-like daemon, you have a
(huge) process written in a compiled language (maybe C/C++), it
handles all of the friend data, it uses a database for redundancy
only, but has a complete copy of the friends data :

struct sfriend {
   int personid,
   friend *friend
};

the process is the only one to deal with filling the database, from
php you would only select, never insert or update, these will pass
through this program, it adds it to the memory structure and then
reflects it in the database.

when the program starts (probably as a daemon), it will fill up it's
memory structures from the database.

doing all the "shortest path" algorithms should be ultra fast since
it's all in memory (even with 1000000 nodes, should take milliseconds
on a modern computer).

the only thing you need to worry about is the communication between
php and this daemon, you need to handle:
remove_friend(personid, friendid);
add_friend(personid, friendid);
find_degrees(personid1,personid2);

find_degrees could return ready to use HTML for the PHP script to display.

find_degrees could cache the last XXX requested relations in the
database for a specified amount of time table lastdegree(personid1
int,personid2 int,degreeHTML text,expirytimestamp int); you would
delete all expired caches, then look for a ready to use result, and
return it if found, otherwise run the algorithm.

Subject: Re: DB and Algorithm solution for Degree of seperation between friends
From: crazyloco-ga on 06 Nov 2005 09:13 PST

I am bored, so I have more comments:
If you want to store all the relations and distances/paths in your
database, here's a calculation of the required data size:

Since you don't need to store the distance from both sides, and you
would not store the distance between a node and itself, so you have to
use N*(N-1)/2 cells of an adjecency matrix (you would not store more
than half of it).

You would store the distance,and path between every two person nodes Pi and Pj.

create table distances (node1,node2,distance,path1,path2,path3,path4) 
since you will study up to 6 degrees, we may need up to 4 intermediate people.

You would store a distance of 1 to 5 if there a no relational path of
6 degrees or less.
You would store a distance of 0 if there is no relational path.
You would store a distance of -1 if the distance is invalid/unknown
(still to be calculated).

You simply lookup the distance and path for easy displaying in your php script.

When you add a new relation between two people, all paths that contain
any of them will become invalid (set to -1), and will have to be
re-calculated.

simple for a new relation beween A,B: 

update distances set distance=-1 where node1=A or node1=B or node2=A
or node2=B or path1=A or path1=B or path2=A or path2=B or path3=A or
path3=B or path4=A or path4=B;

update distances set distance=1,path1=0,path2=0,path3=0,path4=0 where
(node1=A and node2=B) or (node1=B and node2=A);


A batch process (back-end) would constantly update the relations
database, and fix all those -1 relations (when recalculated they may
even pass through another path that is shorter, you never know) this
could be a quick/optimized C/C++ program.

when you ask for the relation between you and a node, and it gives -1,
you would tell the user that the info is temporarily unavailable.

let's assume a database of 1,000 users and up to 6 degrees of separation:
every relation would have two people and up to 4 other people, so if a
person's is stored in 6 bytes: you would need 6* (1000*999)/2 =
2997000 bytes (3MB)

for 1,000,000 people: 6*1000000*999999/2 = 2999997000000 bytes
=2999997 MB = 2999GB = 3TB

so i't not really scalable to 1,000,000 people.

Subject: Re: DB and Algorithm solution for Degree of seperation between friends
From: x86guru-ga on 08 Nov 2005 23:39 PST

Some good ideas. However I dont think its scalable to keep a database
of every path and relationship between users because updating that
database will take forever. And the database size will grow
exponentially not linearly. Because for every 1 user added could
result in 1,000 extra records and actually with your schema i think it
would result in more than that because of how 1 new user can affect
other peoples paths. What do you think about that? Or am I missing
something?

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.

Search Google Answers for

Google Home - Answers FAQ - Terms of Service - Privacy Policy