Hello Mxnmatch,
I cannot be sure if the suggestions I am making will scale to the
range of hundreds of thousand users. The items listed below address
perhaps 10 to 100 times less relationships as described but may still
run efficent enough for your application. I can also make some
suggestions related to encoding (if by hash or another method) with
references to further information.
The first reference is at
https://doc.telin.nl/dscgi/ds.py/Get/File-22786/iadis_setten_final.pdf
which has the following abstract
"An important step in providing personalized information is
predicting the level of interest in information for a specific user.
This paper describes a technique that predicts this level of interest
for information that is described by a set of categories. The
technique is tested in a movie recommendation system and compared with
the social filtering prediction technique. The results show that the
category-based prediction technique outperforms social filtering."
This may be the most direct fit to your application - it describes the
ratings of movies and help predict the interests of users to new
movies based on assignment to one or more genres.
This paper describes predictions of user interest based on the
MovieLens data set which included 100,000 ratings of 1682 movies by
943 users. The algorithms are described (including equations), a
number of references to other papers, and results of the method they
recommend as it compares with "social filtering".
This next article describes an application in use at Boeing to help
users find "experts" in a specific application area. In your
application - you can take the game concepts (e.g., role playing,
action, magic, modern, historical) and develop a network of concepts.
Assign each game to a set of those concepts and allow the user to
describe the kind of game they are looking for and then provide
matches based on the "best fit" (see the paper for more details). The
Boeing application has a huge set of concepts (over 30,000) and
relationships (about 100,000). Also note the distance of relationships
should be strictly limited - providing an aid in performance. The
paper is available on Citeseer at
http://citeseer.nj.nec.com/309406.html
in PDF, postscript, or links to the original site at
http://www.cs.utexas.edu/users/pclark/papers/
This is included because it takes a different look at the "encoding"
problem by using a network instead of specific classifications.
An interesting site describing "collaborative filtering" is at
http://cmc.dsv.su.se/select/rating-choices.html
Scroll down to figure 1 for an illustration of data flow in a system
defined to collect rating information and feed back to users
information based on those ratings. There are also tables that
describe the kind of data to be exchanged and a top level description
of each module to be implemented. A pretty extensive set of links at
the bottom provide more references to follow up as needed (though a
few I tried were broken - sigh).
Another possibility to gather information is to track users as they go
through your site. For example, a "Personal Web Watcher" is described
in a postscript file at
http://www.cs.cmu.edu/afs/cs/project/theo-4/text-learning/www/pww/papers/PWW/pwwACAI99.ps.gz
or in text in Google's cache at
http://216.239.53.100/search?q=cache:dikpTRt-WB0C:www.cs.cmu.edu/afs/cs/project/theo-4/text-learning/www/pww/papers/PWW/pwwACAI99.ps.gz+source+code+compare+interests+people+social+filtering&hl=en&ie=UTF-8
which tries to categorizes interest by watching the user browse sites
and be able to make recommendations based on that browsing.
Those are a few of a number of very interesting sites that describe
methods or tools used to analyze information to provide guidance to
users.
There are some other ideas you may want to consider including features
at amazon.com such as:
- recommendations (based on what other people bought)
- sponsored reviews (by Amazon)
- "highly rated" reviews (by users - with high rankings by other
users)
and so on. Google of course provides information in a variety of ways
- such as search and directory. Looking at some "best of breed"
companies would be another way to get ideas for features to implement.
Search phrases used included:
algorithm compare interests people
source code compare interests people "social filtering"
a few other search phrases that could be mixed in include:
"rating system"
"cooperative filtering"
and so on.
If there is one of these you want me to research in more detail or
search harder for source code that you may be able to draw on, please
use a clarification request.
--Maniac |