 View Question
Q: buckets, balls, marking, replacement -- what is the distribution? ( No Answer,   8 Comments ) Question
 Subject: buckets, balls, marking, replacement -- what is the distribution? Category: Science > Math Asked by: ken13-ga List Price: \$25.00 Posted: 18 Sep 2006 19:15 PDT Expires: 18 Oct 2006 19:15 PDT Question ID: 766437
 Consider a bucket with initially unmarked balls in it (say, M balls). We now do the following: 1) Randomly choose a ball from the bucket 2) Mark it (if the ball is already marked, we can mark it again) 3) Put the marked ball back in the bucket 4) Repeat (1) thru (3) N times At the end of the procedure above, what is the mean number of marked balls? This is easy, I can answer it. What is the distribution of marked balls (or, at the least, the variance of the number of marked balls)? I cannot answer this and seek help from google answers. The mean number of marked balls is as follows: (1 ? (1/M)) is the probability that a randomly chosen ball is not marked after the first trial. After N trials, the probability of any randomly chosen ball being not marked is (1 ? (1/M))^N. So, the probability of a given ball being marked is 1 ? (1 ? (1/M))^N. And, thus the mean number of marked balls is M *(1 ? (1 ? (1/M))^N). I first thought that the distribution of marked balls (or of unmarked balls) would be binomial, but appears not to be. The distribution (experimentally validated) is normal-shaped and with a variance less than that of the binomial distribution. A 5 star answer will tell me 1) an expression for the distribution of marked balls, and/or 2) how to analytically compute the variance. I need this result for a paper I am working on. I will be happy to acknowledge the answerer in the paper if he/she desires this. Any ideas? There is no answer at this time. Subject: Re: buckets, balls, marking, replacement -- what is the distribution? From: ansel001-ga on 23 Sep 2006 15:37 PDT
 You can compare your question to rolling dice. Marking a ball that is drawn is equivalent to rolling a number. After a given number of rolls, how many numbers have been rolled at least once? To find the distribution, first create a state transition matrix. For example, if you have already rolled two different numbers, what is the probability of rolling a third number on your next roll? There is a 2/3 chance of rolling a new number and a 1/3 chance of rolling a number you already rolled before. State transition matrix To From 0 1 2 3 4 5 6 Total 0 0 1 1 1 1/6 5/6 1 2 1/3 2/3 1 3 1/2 1/2 1 4 2/3 1/3 1 5 5/6 1/6 1 6 1 1 Now apply these probabilities to get the distribution below. Number Probability Distribution of Number of Different Numbers Of Dice That Will Come Up at Least Once Rolled 0 1 2 3 4 5 6 Total 0 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 1 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 2 0.0000 0.1667 0.8333 0.0000 0.0000 0.0000 0.0000 1.0000 3 0.0000 0.0278 0.4167 0.5556 0.0000 0.0000 0.0000 1.0000 4 0.0000 0.0046 0.1620 0.5556 0.2778 0.0000 0.0000 1.0000 5 0.0000 0.0008 0.0579 0.3858 0.4630 0.0926 0.0000 1.0000 6 0.0000 0.0001 0.0199 0.2315 0.5015 0.2315 0.0154 1.0000 7 0.0000 0.0000 0.0068 0.1290 0.4501 0.3601 0.0540 1.0000 8 0.0000 0.0000 0.0023 0.0690 0.3646 0.4501 0.1140 1.0000 9 0.0000 0.0000 0.0008 0.0360 0.2776 0.4966 0.1890 1.0000 10 0.0000 0.0000 0.0003 0.0185 0.2031 0.5064 0.2718 1.0000 As you can see, after 4 rolls of the die, the distribution of the number of different numbers rolled is: 1 .0046 2 .1620 3 .5556 4 .2778 1.0000 Hope this helps.
 Subject: Re: buckets, balls, marking, replacement -- what is the distribution? From: berkeleychocolate-ga on 25 Sep 2006 14:44 PDT
 This is very similar to an earlier question that ansel001 and I and others worked on. I don't remember its number. Perhaps ansel001 does. The transition matrix is diagonalizable, and the matrix Q diagonalizing it is essentially Pascal's triangle, made into a triangular matrix. Then one can theoretically compute the distribution after n draws since one can write down a closed expression for any entry of the nth power of the matrix.
 Subject: Re: buckets, balls, marking, replacement -- what is the distribution? From: ansel001-ga on 25 Sep 2006 16:50 PDT
 I also remember the similar question, but not its number. If there is a closed form for the equation as berkeleychocolate surmises, I am unaware of it.
 Subject: Re: buckets, balls, marking, replacement -- what is the distribution? From: barneca-ga on 26 Sep 2006 10:18 PDT
 in case it helps anyone, the question was, i believe, #736681. just searched for berkeleychocolate and ansel001 and this was the only one that looked similar. -cab
 Subject: Re: buckets, balls, marking, replacement -- what is the distribution? From: berkeleychocolate-ga on 28 Sep 2006 17:33 PDT
 Actually the mean number of balls marked is M-(M-1)*(1-1/M)^N, not what you originally calculated. The reason your formula is close but not quite correct is that the trials are not independent. So you can't just raise (1-1/M) to a power. The formula above was determined just for M=3 and M=4 by diagonalizing the transition matrix of ansel001 along the lines discussed in the earlier problem #736681. The pattern was so clear that I jumped to the conclusion of the above formula. I looked at the standard deviation, but the pattern was not so clear.
 Subject: Re: buckets, balls, marking, replacement -- what is the distribution? From: berkeleychocolate-ga on 28 Sep 2006 22:32 PDT
 I computed the variance (square of standard deviation) for M = 3, 4, and 5 and the pattern now seems clear: The variance is (M - 1)* (1 - 1/M)^N + (M - 1)*(M - 2)*(1 - 2/M)^N - (M - 1)^2 * (1 - 1/M)^(2N). Perhaps someone wants to check this for a few M's and N's to confirm the pattern.
 Subject: Re: buckets, balls, marking, replacement -- what is the distribution? From: qresq-ga on 29 Sep 2006 19:14 PDT
 Let X_i be 1 if i'th ball is unmarked. You ask for information on N-X where X=\sum X_i. as you note, E(X) = \sum E(X_i) = N E(X_1), and the expectation is known since E(X_1) = (1-1/M)^N. For the variance, E(X^2) = \sum_{i,j} E(X_i X_j) = N E(X_1^2) + {N \choose 2} E(X_1 X_2). note that E(X_1 X_2) = (1-2/M)^N, which gives a closed formula for the variance. The third and other moments can be calculated similarly. As for the complete distribution, there is no obvious distribution, but there are several cases. If N<
 Subject: Re: buckets, balls, marking, replacement -- what is the distribution? From: berkeleychocolate-ga on 02 Oct 2006 15:55 PDT
 Using ansel001's numbers for M=6, N=4 we get a mean of 3.1066. Using my formula for the mean 6-5*(1-1/6)^3 we get 3.1065 (My N is 3, not 4 since the first draw doesn't count. A ball gets marked.) Using the formula of ken13 and qresq we get 6*( 1-(1-1/6)^4 ) = 5.4823 or 6*( 1-(1-1/6)^3 ) = 5.5787. If you accept ansel001's matrix, then their formula is not correct. As I said, they are incorrectly assuming independence. (This was already discussed in problem #736681.` 