We have to put some assumptions around the bare bones of what you've
stated in order to make mathematical sense of the question.
Suppose you have a sample of 500 balls, drawn independently from an
inexhaustible supply of balls both red and non-red. The supply has 3%
red balls, so that the "expected value" of the number of red balls in
such a sample is a priori 15. However in some particular sample you
see only 5 red balls.
With these assumptions we have defined a binomial distribution. Each
ball sampled is red with probability 0.03 and non-red with probability
0.97. Using the familiar binomial expansion of:
( 0.03 + 0.97 )^500 = 1
we can identify the probability of: 0 red balls, 1 red ball, 2 red balls, etc.
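For readers who want to check such numbers, here is a minimal Python sketch that evaluates every term of that expansion under the same assumptions (500 independent draws, each red with probability 0.03). The helper name binom_pmf is illustrative only; it simply writes out one term of the binomial expansion.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k red balls in n independent draws,
    each red with probability p: one term of the expansion of (p + q)^n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 500, 0.03
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

# The terms of (0.03 + 0.97)^500 should add up to 1 ...
print(sum(pmf))
# ... and the expected number of red balls should be n * p = 15.
print(sum(k * pk for k, pk in enumerate(pmf)))
```

Both printed values agree with 1 and 15 up to floating-point rounding.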
The terminology of "significance values" is confusing to most people
in large part because it is a well-intentioned attempt to put the cart
before the horse.
Instead of saying, gee, I wonder what the chance is of getting 5 or fewer
red balls in this situation, where 15 would be the average, the
professional statistician will seize on that same (presumably small)
likelihood and proclaim that it is the "p value" (significance level)
of the null hypothesis.
The reason for this circumlocution is that the statistician has been
asked to apply what is known about the single sample drawn to the
larger question of whether or not we should accept the claim that the
balls were drawn in the manner described, aka the null hypothesis.
The logical answer is that we have no idea. It's not like a case
where we were promised that jelly beans would be drawn from a supply
with no licorice ones, only to find that there are 5 out of 500 of
them in the sample. Instead it's a "shades of grey" situation,
possible for the sample to occur, but perhaps so unlikely as to shake
our confidence in the original assumptions.
In this case the chance of getting 5 or fewer red balls in a random
sample of 500 is calculated like this:
Pr( 0 red out of 500 ) = C(500,0) * (0.97)^500
Pr( 1 red out of 500 ) = C(500,1) * (0.03) * (0.97)^499
Pr( 2 red out of 500 ) = C(500,2) * (0.03)^2 * (0.97)^498
Pr( 3 red out of 500 ) = C(500,3) * (0.03)^3 * (0.97)^497
Pr( 4 red out of 500 ) = C(500,4) * (0.03)^4 * (0.97)^496
Pr( 5 red out of 500 ) = C(500,5) * (0.03)^5 * (0.97)^495
The probability of 5 or fewer red balls is then:
Pr( 5 or less red out of 500 ) [total of six terms above]
or roughly 0.25%. In particular the difference between what was
observed (5 red balls) and what was expected (15 red balls) appears to
be significant at the 5% level.
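The six terms and their total can be reproduced with a short Python sketch under the same assumptions (independent draws, 3% red). The helper name binom_cdf is illustrative, not taken from any library.

```python
from math import comb

def binom_cdf(k_max: int, n: int, p: float) -> float:
    """Pr( k_max or fewer red out of n ): the sum of binomial terms 0..k_max."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))

# The individual terms Pr( k red out of 500 ) for k = 0..5.
for k in range(6):
    term = comb(500, k) * 0.03**k * 0.97 ** (500 - k)
    print(f"Pr( {k} red out of 500 ) = {term:.10f}")

# Their total is the p value for observing 5 or fewer red balls.
p_value = binom_cdf(5, 500, 0.03)
print(f"Pr( 5 or fewer red out of 500 ) = {p_value:.6f}")
```

The printed total comes out at about 0.0025, the "roughly 0.25%" quoted above.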
Common sense tells us our assumption that the sample was independently
drawn from a "population" containing 3% red balls is shaken. However
we cannot simply say "the chance the population contains 3% red balls
is about one in four hundred", because logically this is a quite
different statement from what we know (that _if_ the population
contained 3% red balls, then the chance of randomly sampling 500 and
getting at most 5 red ones is only about one in four hundred).
It is for this reason that the statistician adopts the terminology of
significance values in describing the relationship between a sample
and a hypothesis about the underlying population, which reasonably
puts a burden on us to stop and think about what is meant.
* * * * * * * * * * * * * * * * * *
To determine what sample size would be sufficient for a p value of 5% "if
[you] had observed 5" we must also specify what is to be assumed about
the expected number of red balls. That is, if you keep as constant
the expected number of 15 red balls, regardless of sample size, this
calls for different calculations than if we were keeping a 3% fraction
of red balls, independent of sample size. Generally experimental
design must deal with the second sort of question, i.e. assuming the
expected fraction of observations rather than the absolute expected
number of observations is independent of sample size, and then
choosing a sample size based on that.
But allow me the liberty of first taking the interpretation in which
only the sample size changes, keeping the observed 5 and expected
15. In this case we see that decreasing the sample size only makes
the observation increasingly significant! Imagine the limiting case of
a sample of 15, in which all the balls were expected to be red, but
only one-third turned out to be so!
To confirm this intuition, let's run through the calculation with a
sample size of 100 (in place of 500 before), so that each ball must now
be red with probability 0.15 for 15 red balls still to be expected:
Pr( 0 red out of 100 ) = 0.85^100 = 0.0000000874767...
Pr( 1 red out of 100 ) = 100*0.15*0.85^99 = 0.0000015437071...
Pr( 2 red out of 100 ) = C(100,2)*0.15^2*0.85^98 = 0.0000134847356...
Pr( 3 red out of 100 ) = C(100,3)*0.15^3*0.85^97 = 0.0000777355349...
Pr( 4 red out of 100 ) = C(100,4)*0.15^4*0.85^96 = 0.0003326623626...
Pr( 5 red out of 100 ) = C(100,5)*0.15^5*0.85^95 = 0.0011271383581...
Pr( 5 or less red out of 100) = 0.0015526521750...
or roughly 0.16%, i.e. more significant (less probable) than the
result previously considered of observing 5 out of 500 when 15 were
expected.
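The sample-size-100 figures can be re-derived with a few lines of Python, assuming as above that each ball is red with probability 0.15 so that 15 remain expected.

```python
from math import comb

n, p = 100, 0.15  # 100 * 0.15 = 15 red balls still expected
terms = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(6)]

for k, t in enumerate(terms):
    print(f"Pr( {k} red out of {n} ) = {t:.13f}")
print(f"Pr( 5 or less red out of {n} ) = {sum(terms):.13f}")
```

The printed total should agree with the 0.0015526521750... listed above, up to floating-point rounding.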
On the other hand a different sort of complication ensues if we keep 5
observed red balls and lower the sample size while maintaining a
population assumption of 3% red balls, namely that pretty quickly 5
red balls will meet or exceed the expected number. For example, a
sample size of 200 means that 6 red balls are expected, while a sample
size of 100 gives 3 red balls expected! In other words the breakpoint
sample size N, below which 5 observed red balls is no longer significant
(at the 5% level), will not be terribly less than 500, because the
expected number now drops in proportion to the sample size.
A good way of gauging where the 5% significance "break" occurs is to
look at the single largest term, namely Pr( 5 red out of N ). Since
all six terms are positive, the total can be less than 5% only if
this term by itself is less than 5%.
Assuming 3% of the underlying population is red (and independently sampled):
N Pr( 5 red out of N )
Now we see that 5 observed red balls would not be significant at the
5% level for a sample size of 300 (relative to a population assumption
of 3% red balls), but that it might be significant for a sample size
of 400 (we need to add in the other terms, chances of observing less
than 5 red balls, to be certain). But we can proceed in this way to
narrow down the smallest N for which observing 5 red balls would be
significant at the 5% level relative to a "null hypothesis" that the
sample is independently drawn from a population with 3% red balls.
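One way to carry out that narrowing down is sketched below in Python, assuming the same binomial model with 3% red balls. The names pmf, cdf, OBSERVED and ALPHA are illustrative. It prints the single largest term next to the full tail for N = 100, 200, ..., 500, then scans upward for the first N at which observing 5 or fewer is significant.

```python
from math import comb

P_RED = 0.03   # null hypothesis: 3% of the population is red
OBSERVED = 5   # red balls actually seen
ALPHA = 0.05   # significance level

def pmf(k: int, n: int, p: float = P_RED) -> float:
    """Pr( k red out of n ) for independent draws."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def cdf(k_max: int, n: int, p: float = P_RED) -> float:
    """Pr( k_max or fewer red out of n )."""
    return sum(pmf(k, n, p) for k in range(k_max + 1))

# Largest single term versus the full tail, for a few sample sizes.
print("  N   Pr(5 red)   Pr(5 or fewer)")
for n in range(100, 600, 100):
    print(f"{n:3d}   {pmf(OBSERVED, n):.6f}    {cdf(OBSERVED, n):.6f}")

# The tail can only shrink as N grows, so the first N that passes
# the test is the breakpoint.
n_min = next(n for n in range(OBSERVED, 1001) if cdf(OBSERVED, n) < ALPHA)
print("smallest N at which 5 red is significant:", n_min)
```

The upward scan is enough because adding one more draw never makes a low count more likely, so Pr( 5 or fewer red out of N ) decreases as N grows. Under these assumptions the breakpoint falls between 300 and 400, in line with the reasoning above.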
I think however that the "right" question to ask about a breakpoint in
the sample sizes is this: At what sample size N is an observation of
1% or fewer red balls significant at the 5% level, relative to an
assumed population of 3% red balls?
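A rough Python sketch of that design question follows. Its assumptions: the same binomial model, "1% or fewer" read as at most N // 100 red balls (rounded down), and a hypothetical upper limit N_MAX = 1000 on the sample sizes examined. It reports both the first N that passes and the N from which every larger sample in range passes.

```python
from math import comb

P_RED = 0.03   # null hypothesis: 3% of the population is red
ALPHA = 0.05   # significance level
N_MAX = 1000   # hypothetical upper limit on the sample sizes examined

def p_value(n: int) -> float:
    """Pr( 1% or fewer red out of n ) under the 3% null hypothesis,
    with the 1% count rounded down to a whole number of balls."""
    k_max = n // 100
    return sum(comb(n, k) * P_RED**k * (1 - P_RED) ** (n - k)
               for k in range(k_max + 1))

# First sample size at which the observation would be significant.
first_n = next(n for n in range(1, N_MAX + 1) if p_value(n) < ALPHA)

# Walk down from N_MAX (assumed significant) while the result stays
# significant; every N from stable_n to N_MAX then passes.
stable_n = N_MAX
while stable_n > 1 and p_value(stable_n - 1) < ALPHA:
    stable_n -= 1

print("first significant N:", first_n)
print("significant for every N from", stable_n, "to", N_MAX)
```

Because the allowed count grows by only one ball per 100 draws, the p value is a sawtooth rather than steadily decreasing. Under these assumptions the first hit comes at N = 99, where "1% or fewer" means no red balls at all, while N = 200 (allowed count 2) is again not significant. That is why the sketch also reports the point beyond which every sample size in range is significant.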
I would choose a design question of this form, rather than the two we
considered earlier, because it sets both the observation and
expectation values as percents (fractions) relative to the sample
size. You may have good reasons, however, for asking the question as
you did, about the smallest N that will make an observation of 5 red balls
significant (at the 5% level), but one should be leery of "shopping"
data looking for significance. At one level an exploratory study will
of course involve an exercise of this kind, but an experimental study
should have a clear hypothesis and sample size formulated before the
statistical analysis is performed. Otherwise a "reporting" bias can
be created in which only the unlikeliest aspects of one's experimental
observations are publicized.