Google Answers: affinity grouping for lists?

View Question

Q: affinity grouping for lists? ( Answered, 1 Comment )

Question

Subject: affinity grouping for lists?
Category: Computers > Algorithms
Asked by: mxnmatch-ga
List Price: $20.00

Posted: 18 Feb 2003 12:19 PST
Expires: 20 Mar 2003 12:19 PST
Question ID: 163100

I have a bunch of user accounts, each of which has a list of games
that they have in their collection. I would like to find other users
with similar gaming interests.

However, many users have 100s of games in their collection and there
could be hundreds of thousands of users. How can I find users with
similar gaming interests given a list of games in the current user's
account?

One idea I had was to simply group games by genre. The number of
genres isn't too large (maybe 100 to 200). I could count the number of
games every user has in each genre and save that in a table. However,
given that information I don't know how I would find similar users. If
that's too much info to start with I could just count the top 10
genres in each collection (by the number of items in each genre) and
then use that. But I still wouldn't know how to use that information.

Another possibility is to create some sort of hash code for the top 20
games in a user's collection. Users can give their games a rating from
1 to 10, so I could use that information to determine what that user's
favorite 20 games are.

However, again, I don't know how I would use that hash code or even
how I would generate that hash code.

Anyone have any ideas? I know that there could be many ways to look at
a collection of games to see if another user has similar interests.
However, I don't know how I would implement any of them.

Answer

Subject: Re: affinity grouping for lists?
Answered By: maniac-ga on 18 Feb 2003 19:31 PST

Hello Mxnmatch,

I cannot be sure if the suggestions I am making will scale to the
range of hundreds of thousand users. The items listed below address
perhaps 10 to 100 times less relationships as described but may still
run efficent enough for your application. I can also make some
suggestions related to encoding (if by hash or another method) with
references to further information.

The first reference is at
https://doc.telin.nl/dscgi/ds.py/Get/File-22786/iadis_setten_final.pdf
which has the following abstract
"An important step in providing personalized information is
predicting the level of interest in information for a specific user.
This paper describes a technique that predicts this level of interest
for information that is described by a set of categories. The
technique is tested in a movie recommendation system and compared with
the social filtering prediction technique. The results show that the
category-based prediction technique outperforms social filtering."

This may be the most direct fit to your application - it describes the
ratings of movies and help predict the interests of users to new
movies based on assignment to one or more genres.

This paper describes predictions of user interest based on the
MovieLens data set which included 100,000 ratings of 1682 movies by
943 users. The algorithms are described (including equations), a
number of references to other papers, and results of the method they
recommend as it compares with "social filtering".

This next article describes an application in use at Boeing to help
users find "experts" in a specific application area. In your
application - you can take the game concepts (e.g., role playing,
action, magic, modern, historical) and develop a network of concepts.
Assign each game to a set of those concepts and allow the user to
describe the kind of game they are looking for and then provide
matches based on the "best fit" (see the paper for more details). The
Boeing application has a huge set of concepts (over 30,000) and
relationships (about 100,000). Also note the distance of relationships
should be strictly limited - providing an aid in performance. The
paper is available on Citeseer at
http://citeseer.nj.nec.com/309406.html
in PDF, postscript, or links to the original site at
http://www.cs.utexas.edu/users/pclark/papers/
This is included because it takes a different look at the "encoding"
problem by using a network instead of specific classifications.

An interesting site describing "collaborative filtering" is at
http://cmc.dsv.su.se/select/rating-choices.html
Scroll down to figure 1 for an illustration of data flow in a system
defined to collect rating information and feed back to users
information based on those ratings. There are also tables that
describe the kind of data to be exchanged and a top level description
of each module to be implemented. A pretty extensive set of links at
the bottom provide more references to follow up as needed (though a
few I tried were broken - sigh).

Another possibility to gather information is to track users as they go
through your site. For example, a "Personal Web Watcher" is described
in a postscript file at
http://www.cs.cmu.edu/afs/cs/project/theo-4/text-learning/www/pww/papers/PWW/pwwACAI99.ps.gz
or in text in Google's cache at
http://216.239.53.100/search?q=cache:dikpTRt-WB0C:www.cs.cmu.edu/afs/cs/project/theo-4/text-learning/www/pww/papers/PWW/pwwACAI99.ps.gz+source+code+compare+interests+people+social+filtering&hl=en&ie=UTF-8

which tries to categorizes interest by watching the user browse sites
and be able to make recommendations based on that browsing.

Those are a few of a number of very interesting sites that describe
methods or tools used to analyze information to provide guidance to
users.

There are some other ideas you may want to consider including features
at amazon.com such as:
- recommendations (based on what other people bought)
- sponsored reviews (by Amazon)
- "highly rated" reviews (by users - with high rankings by other
users)
and so on. Google of course provides information in a variety of ways
- such as search and directory. Looking at some "best of breed"
companies would be another way to get ideas for features to implement.

Search phrases used included:
algorithm compare interests people
source code compare interests people "social filtering"
a few other search phrases that could be mixed in include:
"rating system"
"cooperative filtering"
and so on.

If there is one of these you want me to research in more detail or
search harder for source code that you may be able to draw on, please
use a clarification request.

--Maniac

Comments

Subject: Re: affinity grouping for lists?
From: mathtalk-ga on 18 Feb 2003 19:45 PST

Hi, mxnmatch-ga:

I was working on this comment while maniac-ga was providing the
detailed answer above. I wanted to follow up on your idea of counting
the users' games in each genre.  This gives a nonnegative vector of
length 100 or 200 (depending on the number of genres).

The issue is then one of assessing how "close" two such vectors are. 
Obviously if two vectors are equal, then their "distance apart" is
zero.

But one user may simply be more affluent than another, and the former
generally purchases more games in each genre than the latter, while
the overall tastes and proportions bought in each genre are quite
similar.

A simple approach would be to "normalize" the vectors as to the total
number of games.  A Euclidean distance can then be used to measure the
closeness of two results.  A more complex approach would be to weight
each genre inversely according to the average number of games owned by
all users in that genre, and then normalize.

You might also be interested in "clustering" as an algorithm that
builds on this notion.  One finds the closest two points (in that
high-dimensional space), and replaces the two points by one (two
point) cluster located at the average (of the two vectors).  The
process repeats until all the individual points eventually are grouped
into a single cluster.  The output is a kind of tree that shows at
what point individuals and clusters combined with one another on their
way to final unification.

I guess this would be overkill if your objective were merely to
identify for a given user which other users are most like that one. 
But it might be useful in mapping out the "genres" of users.

regards, mathtalk-ga

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.

Search Google Answers for

Google Home - Answers FAQ - Terms of Service - Privacy Policy