I am trying to build a statistical model for improvements to my
building. My goal is to look at suggestions from a few residents and
then pick the one that is most popular - statistically speaking.
Think of the problem as a filtering or funnelling system: there are a lot
of ideas at the top, and you keep filtering until you reach a single winner.
The process can be broken down into the following steps:
Step 1: Run a survey with 20 residents, asking them to suggest 1
improvement to the building. I am expecting 1-20 unique improvement
ideas from this survey.
Step 2: Feed the ideas from Step 1 to another set of 20 residents, and
have them pick the top 3 that they like. Say Improvement # 7, 8 & 9
were picked the most.
Step 3: Feed the ideas from Step 2 to another set of 20 residents, and
have them pick the top 1 improvement. Say improvement # 7 was picked
the most. That would be our winner improvement.
If there is a tie at any of these steps, I can either re-run that
entire step, OR re-run that step with just the tied results, until I
find the winning improvement idea. So, as you can see, the number of
steps can grow quite large.
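To make the funnel concrete, here is a rough simulation of the process as I picture it. The residents' preferences are drawn uniformly at random purely for illustration, and the helper `run_round` and its tie-keeping rule are my own assumptions, not fixed requirements:

```python
import random
from collections import Counter

def run_round(candidates, n_voters, n_picks):
    """Each voter independently picks n_picks distinct candidates at
    random; return the top n_picks candidates by vote count, keeping
    anything tied with the cutoff (so ties can survive a round)."""
    votes = Counter()
    for _ in range(n_voters):
        for choice in random.sample(candidates, min(n_picks, len(candidates))):
            votes[choice] += 1
    ranked = votes.most_common()
    cutoff = ranked[min(n_picks, len(ranked)) - 1][1]
    return [c for c, n in ranked if n >= cutoff]

random.seed(1)

# Step 1: 20 residents each suggest one improvement; duplicates collapse.
ideas = sorted(set(random.choice(range(1, 21)) for _ in range(20)))

# Step 2: 20 fresh residents each pick their top 3.
shortlist = run_round(ideas, n_voters=20, n_picks=3)

# Step 3 onward: 20 fresh residents pick their top 1; on a tie, re-run
# with just the tied ideas, stopping at the 6-step cap.
steps = 3
finalists = run_round(shortlist, n_voters=20, n_picks=1)
while len(finalists) > 1 and steps < 6:
    finalists = run_round(finalists, n_voters=20, n_picks=1)
    steps += 1

print(steps, finalists)
```

With uniform preferences the votes carry no real signal, so this only demonstrates the mechanics (and how often ties force extra steps), not the statistics I am asking about.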
How can I build this model correctly? Is this problem similar to any
other statistical problem? I have the flexibility to change the number
of residents, the logic used to pick the winners, or the number of
solicited improvements. But I have some ground rules & assumptions:
1. The responders at any step can never be more than 20, but they can be fewer.
2. The winning improvement must be reached sometime within 6 steps.
3. No resident can contribute twice to the survey results.
4. There is an effectively unlimited number of residents in the community.
One more concern: is it good enough to just pick the most frequently
chosen improvements at every step -- the MODE? It sounds logical when I
say that #7, 8 & 9 were picked the most in Step 2 and move on to Step 3
for further review, but I'm wondering if there is a more principled
statistical approach to picking the winners at each step.
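Concretely, by "pick the MODE" I mean nothing fancier than tallying votes and keeping the most common entries; the ballot numbers below are invented just to illustrate:

```python
from collections import Counter

# Hypothetical Step 2 ballots: each resident's 3 picks, flattened into
# one list (made-up data matching the example where 7, 8 & 9 win).
ballots = [7, 8, 9, 7, 8, 7, 9, 12, 8, 9, 7, 3]

counts = Counter(ballots)
top3 = [idea for idea, _ in counts.most_common(3)]
print(top3)  # [7, 8, 9]
```

Note that `most_common(3)` silently cuts off anything tied with the 3rd place, which is exactly where my tie re-run rule would have to kick in.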