Q: Epidemiological Statistics ( Answered 1 out of 5 stars,   11 Comments )
Question  
Subject: Epidemiological Statistics
Category: Health > Conditions and Diseases
Asked by: garyresearcher-ga
List Price: $50.00
Posted: 28 Dec 2002 22:29 PST
Expires: 27 Jan 2003 22:29 PST
Question ID: 134523
In an initial study the incidence of a specific disease was 145 cases
per 100,000 person-years (or 18/12,457 person-years).  In a different
study, using different methodology five or six years following the
initial study, the incidence rate was 307 cases per 100,000
person-years (or 122/39,676 person-years). Is it proper to state the
incidence rate ratio is 2.1 (307/145) with 95% C.I., 1.3 to 3.5?  What
is the proper manner to compare rates based on two different studies?
We can assume that ascertainment of cases was 100% in each case.
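
For reference, the quoted ratio and interval can be reproduced with a short calculation. The sketch below assumes both case counts are independent Poisson counts and uses the common large-sample (Wald) interval on the log rate ratio. That method is an assumption, since the question does not say how the interval was obtained, and the function name is illustrative only.

```python
import math

def rate_ratio_ci(cases_a, py_a, cases_b, py_b, z=1.96):
    """Rate ratio (study b vs study a) with a Wald interval on the log scale.

    Assumes independent Poisson counts; z=1.96 gives a 95% interval.
    """
    ratio = (cases_b / py_b) / (cases_a / py_a)
    se_log = math.sqrt(1.0 / cases_a + 1.0 / cases_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper

# Figures from the question: 18 cases in 12,457 person-years,
# then 122 cases in 39,676 person-years.
ratio, lower, upper = rate_ratio_ci(18, 12457, 122, 39676)
print(f"rate ratio = {ratio:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

With these figures the ratio comes out near 2.1 with limits near 1.3 and 3.5. The interval reflects sampling variation only; it says nothing about whether two studies with different methodology are comparable.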
Answer  
Subject: Re: Epidemiological Statistics
Answered By: answerguru-ga on 29 Dec 2002 01:17 PST
Rated:1 out of 5 stars
 
Hi garyresearcher-ga,

The short answer to your question is that the method you have used to
compare rates based on two different studies is accurate under
certain, perhaps unrealistic, conditions.

I came across an excellent resource that gives a great outline of
epidemiology and its basic practices for measurement. These come in
the form of a PowerPoint slideshow, but the Google-generated HTML
version is also available:

PowerPoint version:
http://www.iihe.org/education/lectures/epi1.ppt

HTML version:
://www.google.ca/search?q=cache:tF4Dzl1lDZMC:www.iihe.org/education/lectures/epi1.ppt+Epidemiology+definition&hl=en&ie=UTF-8

The first piece of key information concerns the definition of
epidemiology (slide 2):

"Study of the distribution and determinant of diseases and injuries in
human populations ":
- Concerned with frequencies and types of injuries and illness in
groups of people
- Concerned with factors that influence the distribution of illness
and injuries

Next, it is important to consider the fundamental assumptions
associated with epidemiology (slide 4):

 1. Disease doesn’t occur at random
 2. Disease has causal and preventive factors
 3. Disease is not randomly distributed throughout a population

With the information from both the problem statement and the facts
above, the following conclusions can be made regarding the accuracy of
a method attempting to compare rates based on two different studies
(in addition to the 100% ascertainment assumption):

1. The two groups being considered in each of the studies must be
judged as being "similar" demographically and perhaps geographically
if it is relevant to the disease being considered.
2. The time differential between when the two studies were conducted
must not affect the data that has been collected. For example, if the
two groups used are not mutually exclusive, then it is reasonable to
believe that the drastic increase could be a result of the disease
spreading. This is not desirable, since we are clearly comparing the
rates on the basis of the methodologies used in each study.
3. The non-random nature of disease is a threat to any statistical
method that assumes randomness in some way or form during the course of
analysis.

On the basis of these conclusions, a "perfect scenario" for obtaining
this type of data would be to perform both methods at the same time on
the same group (this will satisfy the first two conclusions). Next,
calculating the incidence ratio would be fine since it makes no
assumptions as to where in the data set each case was likely to occur.
In order to calculate the confidence intervals, a specific
distribution type (such as the normal distribution) must be specified
along with a value for standard deviation. This appears to have been
done in the question, though this information seems to have been left
out of the problem statement.

If you have any problems understanding the information above, please
feel free to post a clarification and I will respond to it in a timely
manner.

Cheers!

answerguru-ga

Request for Answer Clarification by garyresearcher-ga on 30 Dec 2002 07:55 PST
When two different methodologies are utilized in epidemiology,
generally the comparison that I attempt to make is considered invalid.
Also, due to the rare events given (and associated very low
probabilities involved), it is standard practice to use the Poisson
distribution.  A response that demonstrates experience beyond
elementary epidemiology would recognize the above factors as a given. 
Therefore, in view of the desire to use two different studies, and the
fact that there is likely an increased incidence of disease, what does
it take to convince an audience with vast epidemiological experience
that a true increase is indeed taking place?

Request for Answer Clarification by garyresearcher-ga on 30 Dec 2002 07:57 PST
When two different methodologies are utilized in epidemiology,
generally the comparison that I attempt to make is considered invalid
unless the initial study can be used as a surrogate for baseline
incidence rate. Also, due to the rare events given (and associated
very low probabilities involved), it is standard practice to use
the Poisson distribution.  A response that demonstrates experience
beyond elementary epidemiological principles would recognize the above
factors as a given.  Therefore, in view of the desire to use two
different studies, and the fact that there is likely an increased
incidence of disease, what does it take to convince an audience with
vast epidemiological experience that a true increase is indeed taking
place?

Request for Answer Clarification by garyresearcher-ga on 30 Dec 2002 07:58 PST
Please see comments above.

Clarification of Answer by answerguru-ga on 30 Dec 2002 09:16 PST
Hi again,

Your follow-up question is a good one, but very difficult to answer
definitively when you are dealing with an analysis that does not provide
proof directly (i.e., the data needs to be modified prior to analysis).
In any case, here is a suggestion of what you can attempt to do:

Simulate the second methodology using the group from the first study
(or vice versa). This would involve using some sort of modeling tool
(MS Excel will suffice) whereby you emulate a methodology with
predefined data. Alternatively (if data from both studies are
unavailable), you can always use data from a third group and simulate
the result with both methodologies. I'm not certain if this is
possible in your case but definitely a valid technique if you have
access to the group's data.

The goal, of course, is to show that the differences between the two
methods are statistically insignificant. This will then lead to the
implication that there actually has been a rise in the spread of this
disease.

Statistically speaking, I think what you are attempting to achieve is
likely to receive some criticism regardless of the technique you
employ. This type of scenario makes it very difficult to make a sound
statistical argument, and so I would not be surprised if you hear from
a few "nay-sayers". A water-tight argument would need comparability,
which by the nature of your problem, there doesn't seem to be here.
However, if you can obtain more studies that use each of the two
methods, you can use the (in)consistent ratios to show that there has
or hasn't been a true increase.

Let me know if there is anything further that I can do for you :)

answerguru-ga
Google Answers Researcher

Request for Answer Clarification by garyresearcher-ga on 30 Dec 2002 15:33 PST
What I am specifically looking for is the following:  Given the
probability of occurrence of disease in the initial study, what would
be the probability of obtaining the disease incidence in the second
study?  By use of conditional probabilities, it should be possible to
show how rare a two to three-fold increase in disease incidence really
is.
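
One standard way to phrase this as a conditional probability is the exact conditional test for two Poisson rates: given the total number of cases across both studies, and assuming the two rates are equal, each case falls in the second study with probability equal to that study's share of the person-years. The sketch below is an illustration under that assumption, not the method used in either study, and the helper name is hypothetical.

```python
import math

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return total

cases_1, py_1 = 18, 12457      # initial study
cases_2, py_2 = 122, 39676     # later study

# Under equal rates, each of the 140 cases lands in study 2 with this probability.
share_2 = py_2 / (py_1 + py_2)
p_value = binom_upper_tail(cases_2, cases_1 + cases_2, share_2)
print(f"share of person-years in study 2 = {share_2:.3f}")
print(f"one-sided p-value = {p_value:.2g}")
```

Unlike a calculation that treats the first study's rate as known exactly, this version allows for the fact that 18 cases is itself a small sample. With these figures the one-sided p-value is on the order of 0.001.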

Clarification of Answer by answerguru-ga on 30 Dec 2002 16:16 PST
While this is an interesting (yet quite different) approach in
comparison to what we were discussing earlier, it is not statistically
possible to combine conditional probabilities with a
distribution-related problem such as the one being considered. The
reason for this is that conditional probabilities are of the form:

P(A | B) = the probability of A given B

These types of probabilities are such that the entire set of objects
being considered is mutually exclusive and collectively exhaustive.
For instance, the union of A and its complement must contain all
objects being considered (i.e., each object must be in exactly one set).

However, if you have taken any introductory statistics courses you
will recall that a ratio comparison such as the one you are
describing follows a specific distribution whereby a drastic
increase is highly improbable whereas a slight increase is fairly
reasonable to expect.

So how can two arbitrary ratio values be placed into a conditional
probability model? I hope that you can see from the above that these
are two completely separate and unrelated statistical methods, and
therefore it is impossible to come up with an "integrated solution"
such as the one you are seeking.

Statistics can be used as a powerful tool in solving a wide array of
problems, however you must appreciate two drawbacks of statistical
analysis:
1. Real-life problems often only fit into one statistical model
2. Assumptions must be made when analyzing less-than-perfect scenarios

My own recommendation is that you follow what I have suggested in
prior correspondences and accept that the situation you are analyzing
does not perfectly fit a statistical model. The worst thing you can do
in this case is to try to force an inappropriate method of analysis on the
problem. Don't be frustrated - your analysis will be more highly regarded
if you do the best you can with statistical tools and then identify
the shortcomings of the results.

answerguru-ga
garyresearcher-ga rated this answer:1 out of 5 stars
Given P(a)=18/12,457=0.00144 and P(b)=122/39676=0.00307. Based on P(a)
we would expect 57 cases (computed as 18x39676/12457).  The question
is what are the chances (the probability) of getting 122 cases (or an
additional 65 cases) given the expectation is just 57 cases?  This
cannot be that difficult a computation and would be very meaningful to
me.
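
A minimal sketch of that computation, assuming the count in the second study is Poisson with mean equal to the expected number of cases, and treating the first study's rate as if it were known exactly (a simplification, since 18 cases is itself a sample). The function names are illustrative.

```python
import math

def poisson_log_pmf(k, mu):
    """Log of P(X = k) for X ~ Poisson(mu)."""
    return -mu + k * math.log(mu) - math.lgamma(k + 1)

def poisson_upper_tail(k, mu, extra_terms=500):
    """P(X >= k), summing the upper tail directly to avoid 1 - cdf round-off."""
    return sum(math.exp(poisson_log_pmf(i, mu))
               for i in range(k, k + extra_terms))

expected = 18 * 39676 / 12457          # about 57 cases
tail = poisson_upper_tail(122, expected)
print(f"expected cases = {expected:.2f}")
print(f"P(X >= 122) = {tail:.2g}")
```

Under these assumptions the probability of 122 or more cases is well under 10^-12.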

Comments  
Subject: Re: Epidemiological Statistics
From: krobert-ga on 30 Dec 2002 19:15 PST
 
I have to go with the GA researcher on this one.  The argument that
you would expect 57 more cases in the other set based on computation
of ratios would only be valid if the incidence was completely random
which, by the definition of epidemiology above, is not true.

The answer of "what is the probability of this" depends on the data
given. You could play around with a make-believe set of data to try to
find this out... get standard deviations and such, or use a gaussian
distribution... but that depends on the factor of the set being
random... which isn't the case. So, basically other than the "this is
different than that" factor, you really can't prove anything by
statistically comparing these studies.

krobert-ga
Subject: Re: Epidemiological Statistics
From: answerguru-ga on 31 Dec 2002 01:11 PST
 
Hello again Gary,

I still do disagree, and the following deductive steps should help you
understand where your methodology is flawed:

1. Your ultimate goal is to perform a hypothesis test
2. Hypothesis tests require confidence intervals
3. Confidence intervals can only be obtained when specific
distribution information is known (such as distribution type, mean,
and standard deviation)
4. Data within a given distribution MUST be random
5. Epidemiology assumes that disease is NOT randomly distributed

Since the field of study restricts the assumption of randomness, and
the nature of distributions requires randomness, it is impossible to
fit this problem into a distribution without making assumptions such
as those mentioned earlier. Since a distribution is now no longer
possible, confidence intervals cannot be calculated and hypothesis
testing cannot occur.

Do you now understand why this is not so clear cut?

Under the rules of the Google Answers agreement, I cannot have contact
with you outside of this forum so I will not be able to provide my
email address. You will need to communicate through this board.

answerguru-ga
Subject: Re: Epidemiological Statistics
From: garyresearcher-ga on 31 Dec 2002 09:31 PST
 
I have a Statistics book, "Introduction to Statistics" by Ronald E.
Walpole.
In a worked example on page 6.16 of the book, there is a problem very
similar to the one I have been asking.  It states, "Suppose that on
the average 1 person in every 1000 is an alcoholic.  Find the
probability that a random sample of 8000 people will yield fewer than
7 alcoholics."  The worked solution states, "Since p is very close to
zero and n is quite large, we shall approximate with the Poisson
distribution using mu=(8000)(0.001)=8, Hence if X represents the
number of alcoholics, we have Pr(X<7)=0.3134."
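
The textbook figure is easy to check by summing the Poisson probabilities for 0 through 6 directly. This is a small sketch using only the standard library; the function name is illustrative.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k + 1))

mu = 8000 * 0.001                  # expected number of alcoholics in the sample
p_fewer_than_7 = poisson_cdf(6, mu)
print(f"Pr(X<7) = {p_fewer_than_7:.4f}")   # prints Pr(X<7) = 0.3134
```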

My problem has almost an exact analogy to this problem.  It is common
in epidemiological analyses of rare events to also assume a Poisson
distribution. This is not a problem and is an acceptable practice. The
book uses Chebyshev's theorem to compute confidence intervals.  What
is your criticism of this approach?
Subject: Re: Epidemiological Statistics
From: answerguru-ga on 01 Jan 2003 11:48 PST
 
Gary,

I believe I now understand why you believe so strongly that using this
method is correct. I have no criticism regarding the example you cited
because it seems to me that alcoholism can be seen as a random
occurrence (though some may beg to differ). The reason for this is that
alcoholism is not seen as contagious nor hereditary...I believe that
this is the assumption that the example has made. In this case, I have
no problem with the use of the Poisson distribution nor the use of
Chebyshev's theorem for confidence intervals.

The disease that you are considering, however, may very well be
contagious and/or hereditary (you have not specified the disease so I
cannot say for sure). These two components define whether or not
randomness can be assumed...if either are true then you cannot assume
a random nature. Under this scenario, your example about alcoholism
would not be transferable to the problem at hand.

answerguru-ga
Subject: Re: Epidemiological Statistics
From: garyresearcher-ga on 01 Jan 2003 13:29 PST
 
The disease is Herpes Zoster (shingles), which is regarded as a
disease that is not acquired by exposure to others.  The onset of
herpes zoster occurs when a person's cell-mediated immunity
declines, usually in old age (after around 50), and the latent VZV
virus (acquired as a child through chickenpox, i.e., varicella)
finally reactivates and HZ results.

My data shows that rather than a gradual age-related decline in
immunity which causes increasing HZ incidence with advancing age, it
is in reality, the loss of exogenous (outside) exposures to wild-type
chickenpox that causes the increasing incidence with age.  Exogenous
exposures to natural chickenpox in the community and the frequent
re-exposures actually boosted individuals' immune systems to suppress
the reactivation of HZ.  Now that a varicella vaccine is universally
given to children, I am finding that children with natural disease are
experiencing shingles at a rate two-fold to three-fold higher than in
the prelicensure era.  Appreciate your further comments on this
scenario.
I have three different studies using various methodologies that all
consistently yield the hypothesis given above.
Subject: Re: Epidemiological Statistics
From: answerguru-ga on 01 Jan 2003 14:36 PST
 
Although this question has clearly gone off on a tangent from its
original statistical nature, your last comment leads me to believe that
if an external factor is present (in this case exposure to natural
chickenpox), randomness cannot be assumed. In order for this
assumption to be made validly one must consider the following:

1. The external factor (natural chickenpox) must occur randomly
2. There is no correlation between age and the exposure to this
external factor

Clearly neither of these is true since, as far as I know, chickenpox
spreads in a contagious manner and is not randomly spread through an
arbitrary region. We already know that there IS a correlation between
age and exposure to natural chickenpox (in fact, the data you have
provided states it is a negative correlation).

answerguru-ga
Subject: Re: Epidemiological Statistics
From: garyresearcher-ga on 01 Jan 2003 20:12 PST
 
The incidence rate of HZ is computed only among those individuals that
have previously had varicella (chickenpox).  While varicella is
contagious, HZ is not contagious.  Individuals in the community that
have had chickenpox receive a boost when exposed to a child with
chickenpox; however, those with chickenpox are most infectious 2 to 3
days before the rash breaks out. Therefore, the individuals that
provide the exogenous boost or exposure are generally unknown.
There is a 2nd mechanism that also provides a boost in immunity--it is
called asymptomatic endogenous reactivation--this is what limits the
occurrence of HZ to about 500/100,000 person-years in the community,
even if there is no boosting from exogenous exposures.  Therefore, it
is the overall effect of varicella in the community that plays a role
in providing exogenous boosting in adults, and the overall effect of
varicella in schools that influence the reactivation of herpes zoster
in children.

There is a correlation between age and chickenpox (i.e.,
children usually have onset of chickenpox in Grade K or 1st grade
(first exposure in school), or at a younger age due to exposure in
pre-school).  Herpes zoster, however, among those that have a prior
history of chickenpox is due to a decline in cell-mediated immunity
which is an individual body process that is accelerated in the absence
of varicella (chickenpox) disease in the school or community.  I have
as yet found no increase in the 10 to 19 year olds (who have a more
mature immune system and have sufficient CMI to suppress reactivation
of HZ at this time). Children aged <10 years, however, with immature
immune systems received repeated exogenous boosts due to exposure to
varicella in the community.

In view of the above explanation, "the external factor chickenpox" is
reduced throughout the entire community with universal vaccination of
children. Children previously had more exogenous boosts in the school
environment than, say, 30-40 year-olds, who received only occasional
exposures when participating in activities with their children or when
shopping and coming into close proximity to other children that were
about to have onset of varicella (but had not as yet broken out), so they
would not even recognize such a contact.

I am honestly trying to comprehend these assumptions and applicability
of statistical analysis myself and do not wish to offend anyone and am
trying to fully understand the scope of others' comments. In view of
the above explanation, since I have stratified my analysis to a
specific age group and am not comparing across other age groups, I
feel I have not invalidated the statistical approach.  All children
presently receive fewer boosts due to fewer varicella cases in the
community since the introduction of varicella vaccine. There are
actually 70-80% fewer cases of chickenpox today than there were in
1995. Those children that remain that have had natural (wild-type)
disease are the ones that are affected by the reduction of exogenous
boosts they previously received from other children in the community
that had natural chickenpox.

In view of the above, could you please cite where the assumptions
could still be in error. The issue has major consequences since
instead of a cost-benefit savings, I have computed a U.S. annual cost
of $90 million due to increases in morbidity and mortality of HZ
disease in adults for the next 30 years, rather than the $80 million
in medical savings due to universal varicella vaccination.
These figures assume varicella vaccination is 100% successful in
eliminating varicella disease; but this assumption was based on the
premise that there was no immunologically-mediated link between
varicella incidence prevalence and HZ incidence.
Subject: Re: Epidemiological Statistics
From: answerguru-ga on 02 Jan 2003 12:18 PST
 
Hi again Gary,

As is often the case with these types of analyses, the devil is truly
in the detail!

After considering the details of the disease as well as those
pertaining to your study, I can conclude the following about the
validity of the assumptions you would need to make:

1. With the new information you provided regarding HZ, I can see
validity in the assumption that occurrences within the population you
are studying are truly random.
2. Since it is clear that chickenpox is a prerequisite for getting
HZ, your population must consist entirely of individuals who have
already had chickenpox. Furthermore, you must disprove that there is
any correlation between age and HZ (this only needs to be done for the
age ranges being considered in your study).

Since I can validate that these two elements have been conclusively
resolved (based on the sum of the information you have provided
throughout our correspondence), your use of an analysis similar to the
example you provided is now justified.

We really needed to dig deep to accurately answer this question, but I
think you now have a true understanding of the validity of your
analysis. This should help you when you face the group who will be
evaluating your methodology. I believe I have provided you with all
the information you required. If you now feel satisfied that your
question has been answered sufficiently, I would appreciate if you
could revise your rating/tip decision. If you are unable to do this on
the website, you can contact the GA Editors at:

answer-editors@google.com

Please cite Question ID #134523 and provide the information you would
like to update. I thank you for your patience, and I hope you are
satisfied with your Google Answers experience.

Best Regards,

answerguru-ga
Google Answers Researcher
Subject: Re: Epidemiological Statistics
From: garyresearcher-ga on 02 Jan 2003 13:08 PST
 
My only comment would be that some statistical expert out there
should have taken my original data and considered a binomial trial
with n=39676 and probability of success p=0.001445, and computed that
the probability of observing 122 cases would be (combination 39676 and
122) x 0.001445 raised to the 122 power x 0.998555 raised to the 39,554
power, which is approx. equal to e (2.718...) raised to the negative
57.33 power x 57.33 raised to the 122 power, divided by 122
factorial (where 57.33 = np = 39676 x 0.001445) = approx. 4 x 10
raised to the -14, which is approximately zero.
Thus, the interpretation is that in the 1st study the chance of
observing a case is p=0.001445. If we assume that the chance of
observing a case in the new study remains the same, then the
probability of observing 122 cases in 39,676 person-years would be
virtually zero. But the fact is that we have observed 122 cases in the
new study. Logically, this means that the chance of observing cases in
the new study is significantly greater than in the old study. In other
words, we can conclude that the cases have increased significantly. 
To further argue about the significance of the new study, we can
formulate the following statistical hypotheses:
Null Hypothesis:  There is no difference between the two studies in
terms of the chance of observing cases, i.e., H0:  p=0.001445
Alternative Hypothesis:  The chance of observing cases is greater in
the new study than in the old study, i.e., H1:  p>0.001445.

Let X be the random variable that follows a binomial distribution with
39676 trials and the probability of success is some p (0<p<1). Under
the null hypothesis, p=0.001445.

P-value = P(X>=122) ~= P((X-np)/sqrt(np(1-p)) >= (122-39676*0.001445)/sqrt(39676*0.001445*(1-0.001445)))
= P(Z>=8.547), which is approx. 0.0000.

Where Z follows the standard normal distribution. We have applied the
Central Limit Theorem for the computation above. Since p-value is
extremely small, we can reject the null hypothesis by concluding that
p>0.001445. That is to say, we can conclude that the chance of
observing cases is greater in the new study than in the old study.
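
The steps in this calculation can be sketched in a few lines of Python using only the standard library. This is an illustration, assuming the null value p = 18/12457 is treated as fixed and using the normal approximation for the test; the variable names are not from the original post.

```python
import math

n, k = 39676, 122          # person-years and cases in the new study
p = 18 / 12457             # null rate taken from the first study

# Exact binomial probability of exactly k cases, computed in log space
# so the huge binomial coefficient does not overflow.
log_binom = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(p) + (n - k) * math.log1p(-p))
binom_pmf = math.exp(log_binom)

# Poisson approximation with mean mu = n * p.
mu = n * p
poisson_pmf = math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

# One-sided z test of H0: p = 18/12457 (normal approximation).
z = (k - mu) / math.sqrt(n * p * (1.0 - p))
p_value = 0.5 * math.erfc(z / math.sqrt(2.0))

print(f"mu = {mu:.2f}")
print(f"binomial P(X=122) = {binom_pmf:.2g}, Poisson approx = {poisson_pmf:.2g}")
print(f"z = {z:.3f}, one-sided p-value = {p_value:.2g}")
```

With p = 18/12457 the mean n*p comes to about 57.3 cases and the z statistic to roughly 8.5, so both the point probability and the p-value are effectively zero.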

This is essentially what I was looking for which is not much different
from the original problem statement.
Subject: Re: Epidemiological Statistics
From: answerguru-ga on 02 Jan 2003 13:38 PST
 
I agree that the formulation you provided in your last comment is
accurate, though it is worth noting that your specific study is an
exception within the field of epidemiology...the details necessary to
undertake a meaningful analysis came later. Any statistician would
follow a similar protocol - a full understanding of the problem is
always needed before a statistical method can be applied.

answerguru-ga
Subject: Re: Epidemiological Statistics
From: garyresearcher-ga on 02 Jan 2003 14:01 PST
 
I am updating my rating from 1 Star to 5 Stars.  Lastly, does anyone
have a computer program listing of the software to do such similar
calculations as the one I provided above? Thanks again.

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.
