Google Answers Logo
View Question
 
Q: Probability Equation ( No Answer,   2 Comments )
Question  
Subject: Probability Equation
Category: Science > Math
Asked by: jims_nickname-ga
List Price: $30.00
Posted: 08 Jun 2005 17:29 PDT
Expires: 15 Jun 2005 14:57 PDT
Question ID: 531141
I am doing pay per click advertising and a probability question
occurred to me. I am looking for a mathematical equation that will
help me optimize my spending efforts.

For example, let?s assume that I have a banner ad. I will only keep
paying for this banner ad if it can sustain a 20% or greater
conversion rate for the next 100 clicks. This means that for the next
100 clicks, 1 out of five of them will result in a purchase from the
web site. Let?s call the conversions ?successes? and the clicks that
do not convert ?failures?.

So I decide to test the banner ad. Let?s assume that the banner
converts at the bare minimum of exactly 20%. I know that there will be
20 successes and 80 failures in the total 100 clicks. So my question
is what is the probability that X failures in a row will happen, where
X is any number between 1 and 100?

If we spread the successes out evenly among the failures, there will
be four failures in a row, then a success, then four more failures in
a row, then another success, etc. Therefore, we can deduce that the
probability that one, two, three, or four failures in a row will occur
is 100%, because no matter how you rearrange the successes and
failures, there will always be at a minimum this many failures in a
row. Conversely, the probability that 81 or more failures will happen
in a row is 0%, because there are only 100 clicks and twenty of them
are successes.

What I can?t seem to figure out is an equation that will tell me the
probability of the other numbers. For example, the probability that 6
failures will occur in a row is very close to 100%, but the
probability that 72 failures will occur in a row is probably close to
1%.

The way I see it, there are three variables for this equation. The
conversion rate (20%), the size of the sample set (100), and the
number of failures in a row (1 ? 100). I need to find an equation that
you can plug these three numbers into and it will spit out the
probability.

Request for Question Clarification by mathtalk-ga on 08 Jun 2005 17:47 PDT
Hi, jims_nickname-ga:

Are you asking for the probability that the _next_ K click throughs
will be failures (in which case the sample size of 100 doesn't really
enter the picture), or the probability that _somewhere_ in a sample of
100 consecutive click throughs, you will be able to find a string of K
consecutive failures?

regards, mathtalk-ga

Request for Question Clarification by leapinglizard-ga on 08 Jun 2005 18:02 PDT
This is not really a probability question but an exercise in
combinatorics. The conversion rate just tells you how many events in
the sample set are successes, so with a sample size of 100 and a
conversion rate of 20%, you have .20*100 = 20 successes and 100-20 =
80 failures. We can now model your problem by supposing that we have
80 black marbles and 20 white marbles. The question is this: how many 
ways are there to arrange these marbles in a row, and how many
arrangements feature K consecutive black marbles? If we divide the
latter number by the former number, we arrive at the probability that
a random arrangement will feature K consecutive black marbles.

So the problem is to compute these two numbers. There is a
conceptually simple way to do this, and the resulting formula in each
case is a series. In other words, it's a sum of multiple terms such
that each term has a similar form and the number of terms depends on
the constant values in your experiment. The value of each term in the
series, as well as that of the series on the whole, is typically so
big that you won't be able to compute them with a simple calculator.
You'll need either a fancy programmable calculator, a specialized
mathematical software package, or a program written in a
general-purpose programming language that can handle large integers. I
could write you such a program: it would be a Python script which you
could run with the freely available Python interpreter. You would plug
in your constants -- number of black marbles, number of white marbles,
number of consecutive black marbles -- and it would produce the
answer.

Are you interested? If so, I'll provide a description of the necessary
series, an implementation in Python, and whatever help you need to get
the script running on your computer. Let me know if this would
constitute a satisfactory answer.

leapinglizard

Clarification of Question by jims_nickname-ga on 08 Jun 2005 20:21 PDT
Thank you for the responses.

LeapingLizard explained the situation perfectly. Excellent work. I
need to plug this equation into my C# program, could you provide the
implementation in C++, C#, or Java? I'm not terribly familiar with
Python...

Request for Question Clarification by leapinglizard-ga on 09 Jun 2005 08:34 PDT
Yes, I can do it in Java. I'll get to it later this afternoon.

leapinglizard

Request for Question Clarification by leapinglizard-ga on 09 Jun 2005 17:53 PDT
I'm afraid this problem has turned out to be more difficult than I had
anticipated. I have written a program that counts the arrangements in
a certain algorithmic style -- top-down recursion with memoization, if
you care to know -- that can solve the problem for small cases but
turns out to be far too slow for larger ones such as you envision. I'm
afraid I don't have the time right now to develop a more efficient
algorithm, so I'll step back and let other Researchers take on your
question if they wish. My apologies. It would be helpful, by the way,
if you were to mention how urgently you need to have this problem
solved.

leapinglizard

Request for Question Clarification by mathtalk-ga on 10 Jun 2005 18:48 PDT
Let me suggest an interpretation of the Question that leads to an
easier calculation for leapinglizard-ga to implement.

In the next 100 clicks, each has an independent chance of success of 20%.

The probability that some run of 6 "failures" in a row will occur can
be calculated by constructing a 7 by 7 matrix, let's call it M:

  0.2  0.8  0.0  0.0  0.0  0.0  0.0
  0.2  0.0  0.8  0.0  0.0  0.0  0.0
  0.2  0.0  0.0  0.8  0.0  0.0  0.0
  0.2  0.0  0.0  0.0  0.8  0.0  0.0
  0.2  0.0  0.0  0.0  0.0  0.8  0.0
  0.2  0.0  0.0  0.0  0.0  0.0  0.8
  0.0  0.0  0.0  0.0  0.0  0.0  1.0

This is a so-called probability transition matrix, which represents
the chances of going from one state to another:

  Never had 6 failures in a row & immediate past is 0 failures in a row.
  Never had 6 failures in a row & immediate past is 1 failures in a row.
  Never had 6 failures in a row & immediate past is 2 failures in a row.
  Never had 6 failures in a row & immediate past is 3 failures in a row.
  Never had 6 failures in a row & immediate past is 4 failures in a row.
  Never had 6 failures in a row & immediate past is 5 failures in a row.
  Had 6 failures in a row at least once in the past.

The last state is an "absorbing" state; once you've attained six
failures in a row once, it defines the overall outcome for all the 100
clicks.

Now one only has to raise M to the power 100 and examine the upper
right hand entry of the result, which will be the chance of attaining
six failures in a row at least once during a run of 100 clicks.

regards, mathtalk-ga

Clarification of Question by jims_nickname-ga on 13 Jun 2005 07:12 PDT
Wow, lots of intense thinking going on! Thank you all. I didn't
realize this was such a tricky problem :)

I think manuka-ga has come very close to the answer, but I could not
deduce an actual equation that I can plug into my program (perhaps I
just missed it).

Can someone state an explicit equation (in math or preferably a c
based programming language) that I can use? I am increasing the payout
an additional 10 dollars.
Answer  
There is no answer at this time.

Comments  
Subject: Re: Probability Equation
From: bonhommeenmousse-ga on 08 Jun 2005 18:49 PDT
 
if I understand your problem correctly, what you re looking for is the
bernouilli equation.
and btw, yor reasoning is false concerning the number you give. when
you throw a dice, you have one chance out of 6 to get a 1, that does
NOT mean that if you get 5 times something else than 1 then you will
get a 1 at the 6th time, so the probability is not 100% then, you
could get 100 times something else than a 1 even though the
probability indicates hat it should happen in average every 6 times.

anyway.

let say p is the probability to have a click converted (here p=0.2)
let say n is the set of clicks you want to work on (here the next 100
clicks, n=100)
and k is the number of clicks converted in the set

then the probability p(X=k) of having k clicks converted is
p(X=k)=n(n-1)...(n-k+1)/(1*2*...*k)  *  P^k * (1-p)^(n-k)
if k=0 p(X=k)= (1-p)^(n)

examples (trying to get different scenario depending on what you re looking for)

* probability of having EXACTLY 20 clicks on the next 100 clicks
assuming my conversion rate is 20%
P(X=20)=100*99*98*...81/(1*2*...*20)  * (.20)^20 * (.80)^80=0.0993= 10%

* probability of having exactly or less than 20 clicks in the next 100:
P(x<=20)= p(X=20)+P(X=19)+...+P(X=0) =...

*probability of having 0 clicks in the next 100 (still with the same assumptions)
P(X=0)= .8 ^ 100 = very close to 0% but not EQUAL (like the example of the dice)

NOW, what I think interests you more:

* probability of having  the next 4 clicks not converted for a global
rate of 20% conversion:
- we consider the next 4 clicks: n=4  and p=20%
- we want P(X=0)=.8^4=0.40 = 40%

* prob of havig the next 10 clicks not converted
P(X=0) = .8^ 10 =  10%

* prob of having 3 clicks or more in the next 5(n=5) (still with 20%)
P(X>=3)=P(X=3)+P(X=4)+P(X=5)
Subject: Re: Probability Equation
From: manuka-ga on 10 Jun 2005 02:46 PDT
 
bonhommeenmousse-ga, I think you misunderstand jims_nickname-ga's
intent. He has specified that we do get a test set with 20 successes
and 80 failures, rather than taking 100 samples from a random variable
with p=0.2 - but actually I think this is a minor issue (it would
probably only have a small impact on the results). More importantly,
he's interested in the distribution of constant sequences among the
100 results - how many of the 2^100 (approx 10^30) outcomes (random
variable) or 100 choose 20 (approx 5 * 10^20) outcomes (combinatorial)
 have a maximum consecutive string of failures of a given length
somewhere. The Bernoulli equation is not going to be much help.

For instance, the probability of getting 80 failures in a row (I'm
operating on the assumption that there are exactly 80 failures and 20
successes, as jims_nickname-ga indicated) is given by
21 / (100, 20) (the string of failures can start at positions 1
through 21, finishing at 80 through 100; one that position is picked,
the result of every trial is determined) ~ 4 * 10^-20. [I'm using (a,
b) to represent a choose b.)

For 79 down to 41 failures, we can use the following logic (I'll go
through the case for 78 first (79 doesn't illustrate it well) and then
give the general formula).
We can have 78 failures at either the beginning or the end, with one
success after or before respectively, and (21, 2) combinations for the
places to put the other failures. We can also have 78 failures in the
middle (21 possible locations) with a success at each end of the
block, and (20, 2) combinations for the places to put the other two
failures. The total number of combinations is 2*(21, 2) + 21*(20, 2) =
420 + 3990 = 4410 and the probability is 8 * 10^-18. In general, for n
between 41 and 80 inclusive, the number of combinations is 2*(99-n,
80-n) + (99-n)*(98-n, 80-n) which simplifies to 21 * (99-n, 19) [note
that (99-n, 80-n) is the same as (99-n, 19)].

Here's a partial table for you:
   80        79        78        77        76
4*10^-20  8*10^-19  8*10^-18  6*10^-17  3*10^-16

   75        74        73        72        71
2*10^-15  7*10^-15  3*10^-14  9*10^-14  3*10^-13

...

   45        44        43        42        41
7*10^-6   1*10^-5   2*10^-5   2*10^-5   4*10^-5


Note that jims_nickname-ga actually wants the totals of these down to each number. 

As you can see we're still in pretty minor territory here. The chance
of getting 41 or more consecutive failures is only 1.1*10^-4, or
0.011%. Jims_nickname was out quite a bit with his estimate of the
probability of 72 or more consecutive failures - this is not "close to
1%" but 0.00000000001%.

However, once we get down to 40 and less it gets harder, because we
have to cope with the possibility that there are two (or, from 26
down, more) blocks of that length. If we put 40 into the above formula
we will get an overestimate because combinations like ... (40
failures) (1 or more successes) (40 failures) ... will be counted
twice.

For n = 40 down to 27, we can have at most two blocks of that size.
How many ways can we do this? We could have:
 - one block at each end, with a success in the n+1th position from each end;
 - one block at the beginning or the end, with a success after or
before it respectively, with another block in the middle
 - two blocks in the middle, with a single success separating them
 - two blocks in the middle, each with a success before and after, and
arbitrary results in between.
The number of possible combinations in each sub-case are:
 - (98-2n, 18)
 - 2*(98-2n)*(97-2n, 17)
 - (98-2n)*(97-2n, 17)
 - (98-2n, 2) * (96-2n, 16)
totalling 667 * (98-2n, 18). This needs to be subtracted from the
formula given in the first part.

The total probability of getting 27 or more consecutive failures is
still only 1.68%, however.

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy