I need some help using the kNN (k Nearest Neighbor) classifier to run
some tests and analyze the results. I would prefer to run the tests in
Matlab if possible, because it has been suggested as the easiest way
to produce the graphs and simulate everything, but other tools will
work as well.
Here is exactly what I am trying to do:
1. Create a synthetic domain which:
a) Has 1000 two-dimensional examples
b) According to some geometric figure (can be any shape), label these
examples with positive and negative classes, about half and half.
c) Create 5 different files, each of them with a different level of
added class label noise: flip p% of the class labels, for p =
{5,10,15,20,25}
2. Next, implement the k-NN classifier.
3. Also implement an algorithm to remove potentially noisy and
borderline examples using Tomek links.
4. Run tests
a) For each data file, create 10 random training/testing splits, with
50% of the examples for training and 50% for testing.
b) For different values of p (noise level), plot average
classification accuracies on the testing sets.
c) For the same subsampling, use the algorithm for the removal of
borderline and noisy examples.
d) Again, with different values of p, plot average classification
accuracies on the testing sets.
5. Summarize the data, algorithms and results and provide some graphs
to help discuss results.
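To make the steps above concrete, here is a rough sketch of steps 1 to 4
as I understand them, written in plain Python (standard library only)
rather than Matlab. Several things in it are my own assumptions and not
part of the task: the geometric figure is a circle of area 0.5 centred in
the unit square, k defaults to 3, both members of every Tomek link are
removed in a single pass over the training set, and all function names
are made up for illustration. It keeps the noisy data in memory instead
of writing the 5 files, and it does no plotting.

```python
import math
import random


def make_domain(n=1000, seed=0):
    """Sample n points in the unit square; a point is positive (1) if it
    lies inside a centred circle of area 0.5, so roughly half are positive."""
    rng = random.Random(seed)
    radius = math.sqrt(0.5 / math.pi)  # assumed shape: circle covering half the square
    points = [(rng.random(), rng.random()) for _ in range(n)]
    labels = [1 if math.hypot(x - 0.5, y - 0.5) <= radius else 0
              for x, y in points]
    return points, labels


def flip_labels(labels, p, seed=0):
    """Return a copy of labels with p percent of them flipped."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i in rng.sample(range(len(noisy)), round(len(noisy) * p / 100)):
        noisy[i] = 1 - noisy[i]
    return noisy


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def remove_tomek_links(xs, ys):
    """Drop both members of every Tomek link, i.e. every pair of
    opposite-class points that are each other's nearest neighbour."""
    n = len(xs)
    if n < 2:
        return list(xs), list(ys)
    nn = [min((j for j in range(n) if j != i),
              key=lambda j: math.dist(xs[i], xs[j]))
          for i in range(n)]
    drop = {i for i in range(n) if nn[nn[i]] == i and ys[i] != ys[nn[i]]}
    keep = [i for i in range(n) if i not in drop]
    return [xs[i] for i in keep], [ys[i] for i in keep]


def average_accuracy(xs, ys, k=3, runs=10, clean=False, seed=0):
    """Average test accuracy over `runs` random 50/50 splits.
    With clean=True the training half is filtered with Tomek links first.
    Accuracy is measured against the (noisy) labels of the test half."""
    rng = random.Random(seed)  # same seed -> same splits with and without cleaning
    total = 0.0
    for _ in range(runs):
        idx = list(range(len(xs)))
        rng.shuffle(idx)
        half = len(idx) // 2
        train, test = idx[:half], idx[half:]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        if clean:
            tx, ty = remove_tomek_links(tx, ty)
        hits = sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in test)
        total += hits / len(test)
    return total / runs


def run_experiment(noise_levels=(5, 10, 15, 20, 25), n=1000, k=3, runs=10):
    """Map each noise level p to (accuracy without cleaning, accuracy with)."""
    points, labels = make_domain(n)
    results = {}
    for p in noise_levels:
        noisy = flip_labels(labels, p, seed=p)
        results[p] = (average_accuracy(points, noisy, k, runs, clean=False),
                      average_accuracy(points, noisy, k, runs, clean=True))
    return results
```

Calling run_experiment() would give, for each p, the two average
accuracies that steps 4b and 4d ask to plot. The brute-force distance
search is quadratic, so the full run with 1000 examples is slow in pure
Python; a smaller n is enough to try it out.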
I do not have too much experience with Matlab but I do have access to
it. If you would like to help using a different method just suggest it
and we can discuss it.
Thanks for any help!