Hi garyresearcher-ga,
The short answer to your question is that the method you have used to
compare rates based on two different studies is accurate under
certain, perhaps unrealistic, conditions.
I came across an excellent resource that gives a great outline of
epidemiology and its basic practices for measurement. It comes in
the form of a PowerPoint slideshow, but a Google-generated HTML
version is also available:
PowerPoint version:
http://www.iihe.org/education/lectures/epi1.ppt
HTML version:
http://www.google.ca/search?q=cache:tF4Dzl1lDZMC:www.iihe.org/education/lectures/epi1.ppt+Epidemiology+definition&hl=en&ie=UTF-8
The first piece of key information concerns the definition of
epidemiology (slide 2):
"Study of the distribution and determinant of diseases and injuries in
human populations ":
- Concerned with frequencies and types of injuries and illness in
groups of people
- Concerned with factors that influence the distribution of illness
and injuries
Next, it is important to consider the fundamental assumptions
associated with epidemiology (slide 4):
1. Disease doesn't occur at random
2. Disease has causal and preventive factors
3. Disease is not randomly distributed throughout a population
With the information from both the problem statement and the facts
above, the following conclusions can be made regarding the accuracy of
a method attempting to compare rates based on two different studies
(in addition to the 100% ascertainment assumption):
1. The two groups being considered in each of the studies must be
judged as being "similar" demographically and perhaps geographically
if it is relevant to the disease being considered.
2. The time differential between when the two studies were conducted
must not affect the data that has been collected. For example, if the
two groups used are not mutually exclusive, then it is reasonable to
believe that the drastic increase could be a result of the disease
spreading. This is not desirable, since we are clearly comparing the
rates on the basis of the methodologies used in each study.
3. The non-random nature of disease is a threat to any statistical
method that assumes randomness in some form during the course of the
analysis.
On the basis of these conclusions, a "perfect scenario" for obtaining
this type of data would be to perform both methods at the same time on
the same group (this will satisfy the first two conclusions). Next,
calculating the incidence ratio would be fine since it makes no
assumptions as to where in the data set each case was likely to occur.
In order to calculate the confidence intervals, a specific
distribution type (such as the normal distribution) must be specified
along with a value for standard deviation. This appears to have been
done in the question, though this information seems to have been left
out of the problem statement.
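To make the calculation concrete, here is a minimal sketch in Python of an incidence ratio with a normal-approximation confidence interval computed on the log scale; the case counts and person-year totals below are invented placeholders, not figures from your studies:

import math
from scipy.stats import norm

def rate_ratio_ci(cases1, py1, cases2, py2, conf=0.95):
    # Incidence rates (cases per person-year) in each study.
    rate1 = cases1 / py1
    rate2 = cases2 / py2
    rr = rate2 / rate1
    # Standard error of log(RR) for two independent Poisson counts.
    se = math.sqrt(1.0 / cases1 + 1.0 / cases2)
    z = norm.ppf(1 - (1 - conf) / 2)   # 1.96 for a 95% interval
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Invented placeholder counts, purely for illustration:
print(rate_ratio_ci(cases1=50, py1=100000, cases2=120, py2=80000))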
If you have any problems understanding the information above, please
feel free to post a clarification and I will respond to it in a timely
manner.
Cheers!
answerguru-ga
Request for Answer Clarification by garyresearcher-ga on 30 Dec 2002 07:55 PST
When two different methodologies are utilized in epidemiology,
generally the comparison that I attempt to make is considered invalid.
Also, due to the rare events given (and associated very low
probabilities involved), it is standard practice to use the Poisson
distribution. A response that demonstrates experience beyond
elementary epidemiology would recognize the above factors as a given.
Therefore, in view of the desire to use two different studies, and the
fact that there is likely an increased incidence of disease, what does
it take to convince an audience with vast epidemiological experience
that a true increase is indeed taking place?
Request for Answer Clarification by garyresearcher-ga on 30 Dec 2002 07:57 PST
When two different methodologies are utilized in epidemiology,
generally the comparison that I attempt to make is considered invalid
unless the initial study can be used as a surrogate for the baseline
incidence rate. Also, due to the rare events given (and the associated
very low probabilities involved), it is standard practice to use
the Poisson distribution. A response that demonstrates experience
beyond elementary epidemiological principles would recognize the above
factors as a given. Therefore, in view of the desire to use two
different studies, and the fact that there is likely an increased
incidence of disease, what does it take to convince an audience with
vast epidemiological experience that a true increase is indeed taking
place?
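For reference, the Poisson version of that comparison can be sketched as follows (the baseline rate, person-years, and case count below are placeholders only, not figures from either study):

from scipy.stats import poisson

# Placeholder figures, for illustration only: treat the first study's rate as
# a surrogate baseline and ask how surprising the second study's count would be.
baseline_rate = 0.0005      # baseline probability of a case per person-year
person_years = 40000        # person-time accumulated in the second study
observed = 50               # cases observed in the second study

expected = baseline_rate * person_years
p_value = poisson.sf(observed - 1, expected)   # P(X >= observed) under the baseline
print(expected, p_value)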
Request for Answer Clarification by garyresearcher-ga on 30 Dec 2002 07:58 PST
Please see comments above.
Clarification of Answer by answerguru-ga on 30 Dec 2002 09:16 PST
Hi again,
Your follow-up question is a good one, but very difficult to answer
definitively when you are dealing with an analysis that does not provide
proof directly (i.e., the data need to be modified prior to analysis).
In any case, here is a suggestion of what you can attempt to do:
Simulate the second methodology using the group from the first study
(or vice versa). This would involve using some sort of modeling tool
(MS Excel will suffice) whereby you emulate a methodology with
predefined data. Alternatively (if data from both studies are
unavailable), you can always use data from a third group and simulate
the result with both methodologies. I'm not certain if this is
possible in your case but definitely a valid technique if you have
access to the group's data.
The goal, of course, is to show that the differences between the two
methods are statistically insignificant. This will then lead to the
implication that there actually has been a rise in the spread of this
disease.
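As a rough sketch of what such a simulation might look like (every figure here is invented for illustration - the true rate, cohort size, and each method's detection sensitivity are hypothetical, and Excel or any other tool would do the same job):

import random

random.seed(1)

def simulate_cohort(n_people, true_rate):
    # One True/False flag per person: does this person truly develop the disease?
    return [random.random() < true_rate for _ in range(n_people)]

def apply_method(cohort, sensitivity):
    # Emulate a case-finding methodology that detects each true case
    # with probability 'sensitivity'.
    return sum(1 for is_case in cohort if is_case and random.random() < sensitivity)

cohort = simulate_cohort(n_people=40000, true_rate=0.003)
cases_method_a = apply_method(cohort, sensitivity=0.80)
cases_method_b = apply_method(cohort, sensitivity=0.95)
print(cases_method_a / 40000.0, cases_method_b / 40000.0)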
Statistically speaking, I think what you are attempting to achieve is
likely to receive some criticism regardless of the technique you
employ. This type of scenario makes it very difficult to make a sound
statistical argument, and so I would not be surprised if you hear from
a few "nay-sayers". A water-tight argument would need comparability,
which by the nature of your problem, there doesn't seem to be here.
However, if you can obtain more studies that use each of the two
methods, you can use the (in)consistent ratios to show that there has
or hasn't been a true increase.
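As a tiny illustration of that last point (all figures invented), compute the method-B-to-method-A ratio for each pair of studies and see whether it stays roughly constant:

# Each pair: (rate from method A, rate from method B) for one study setting,
# in cases per 100,000 person-years. All figures are invented placeholders.
study_pairs = [(140.0, 150.0), (145.0, 152.0), (150.0, 158.0)]
ratios = [b / a for a, b in study_pairs]
print(ratios)
# Roughly constant ratios point to a methodological difference;
# a pair whose ratio breaks the pattern points to a true change in incidence.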
Let me know if there is anything further that I can do for you :)
answerguru-ga
Google Answers Researcher
Request for Answer Clarification by garyresearcher-ga on 30 Dec 2002 15:33 PST
What I am specifically looking for is the following: given the
probability of occurrence of disease in the initial study, what would
the probability be of obtaining the disease incidence observed in the
second study? By use of conditional probabilities, it should be possible to
show how rare a two to three-fold increase in disease incidence really
is.
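To put a number on how rare such an increase would be, here is a minimal sketch (placeholder rate and person-time only) of the Poisson tail probability computed at the initial study's rate:

from scipy.stats import poisson

baseline_rate = 0.001     # placeholder: probability of a case per person-year in the initial study
person_years = 40000      # placeholder: person-time in the second study
expected = baseline_rate * person_years

for fold in (2, 3):
    threshold = round(fold * expected)           # case count corresponding to a fold-increase
    tail = poisson.sf(threshold - 1, expected)   # P(X >= threshold) if the baseline rate still held
    print(fold, threshold, tail)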
Clarification of Answer by answerguru-ga on 30 Dec 2002 16:16 PST
While this is an interesting (yet quite different) approach in
comparison to what we were discussing earlier, it is not statistically
possible to combine conditional probabilities with a
distribution-related problem such as the one being considered. The
reason for this is that conditional probabilities are of the form:
P(A | B) = the probability of A given B
These types of probabilities are such that the entire set of objects
being considered is mutually exclusive and collectively exhaustive.
For instance, the union of A and its complement must contain all
objects being considered (i.e., each object must be in exactly one set).
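For a toy illustration (numbers invented): if 30% of the population falls in B and 6% falls in both A and B, then:

# P(A | B) = P(A and B) / P(B); A and its complement partition the population,
# so every individual belongs to exactly one of the two sets.
p_b = 0.30
p_a_and_b = 0.06
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)   # 0.2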
However, if you have taken any introductory statistics courses you
will recall that a ratio comparison such as the one you are
describing follows a specific distribution whereby a more radical
increase is only minutely possible whereas a slight increase is fairly
reasonable to expect.
So how can two arbitrary ratio values be placed into a conditional
probability model? I hope that you can see from the above that these
are two completely separate and unrelated statistical methods, and
therefore it is impossible to come up with an "integrated solution"
such as the one you are seeking.
Statistics can be used as a powerful tool in solving a wide array of
problems; however, you must appreciate two drawbacks of statistical
analysis:
1. Real-life problems often fit into only one statistical model
2. Assumptions must be made when analyzing less-than-perfect scenarios
My own recommendation is that you follow what I have suggested in
prior correspondences and accept that the situation you are analyzing
does not perfectly fit a statistical model. The worst thing you can do
in this case is to try to force an inappropriate method of analysis on the
problem. Don't be frustrated - your analysis will be more highly regarded
if you do the best you can with statistical tools and then identify
the shortcomings of the results.
answerguru-ga
|
The incidence rate of HZ is computed only among those individuals that
have previously had varicella (chickenpox). While varicella is
contagious, HZ is not contagious. Individuals in the community that
have had chickenpox receive a boost when exposed to a child with
chickenpox; however, those with chickenpox are most infectious 2 to 3
days before the rash breaks out. Therefore, the individuals that
provide the exogenous boost or exposure are generally unknown.
There is a 2nd mechanism that also provides a boost in immunity--it is
called asymptomatic endogenous reactivation--this is what limits the
occurrence of HZ to about 500/100,000 person-years in the community,
even if there is no boosting from exogenous exposures. Therefore, it
is the overall effect of varicella in the community that plays a role
in providing exogenous boosting in adults, and the overall effect of
varicella in schools that influences the reactivation of herpes zoster
in children.
There is a correlation between age and chickenpox: children usually
have onset of chickenpox in kindergarten or 1st grade (first exposure
in school), or at a younger age due to exposure in pre-school. Herpes
zoster, however, among those that have a prior history of chickenpox,
is due to a decline in cell-mediated immunity, an individual body
process that is accelerated in the absence of varicella (chickenpox)
disease in the school or community. I have
as yet found no increase in the 10 to 19 year olds (who have a more
mature immune system and have sufficient CMI to suppress reactivation
of HZ at this time). Children aged <10 years, however, with immature
immune systems received repeated exogenous boosts due to exposure to
varicella in the community.
In view of the above explanation, "the external factor chickenpox" is
reduced throughout the entire community with universal vaccination of
children. Children previously had more exogenous boosts in the school
environment than, say, 30-40 year-olds, who received only occasional
exposures when participating in activities with their children or when
shopping and coming into close proximity to other children that were
about to have onset of varicella (but had not yet broken out), so they
would not even recognize such a contact.
I am honestly trying to comprehend these assumptions and the
applicability of statistical analysis myself; I do not wish to offend
anyone and am trying to fully understand the scope of others' comments.
In view of the above explanation, since I have stratified my analysis
to a specific age group and am not comparing across other age groups, I
feel I have not invalidated the statistical approach. All children
presently receive fewer boosts due to fewer varicella cases in the
community since the introduction of varicella vaccine. There are
actually 70-80% fewer cases of chickenpox today than there were in
1995. Those children that remain that have had natural (wild-type)
disease are the ones that are affected by the reduction of exogenous
boosts they previously received from other children in the community
that had natural chickenpox.
In view of the above, could you please cite where the assumptions
could still be in error? The issue has major consequences since,
instead of a cost-benefit savings, I have computed a U.S. annual cost
of $90 million due to increases in morbidity and mortality of HZ
disease in adults for the next 30 years, rather than the $80 million
in medical savings due to universal varicella vaccination.
These figures assume varicella vaccination is 100% successful in
eliminating varicella disease; but this assumption was based on the
premise that there was no immunologically-mediated link between
varicella incidence/prevalence and HZ incidence. |
My only comment would be that some statistical expert out there
should have taken my original data and considered a binomial trial
with n = 39676 and probability of success p = 0.001445, and computed
that the probability of observing 122 cases would be
C(39676, 122) * (0.001445)^122 * (0.999855)^39554,
which is approximately equal to
e^(-5.75302) * (5.75302)^122 / 122! (where e = 2.718...), which equals
1.6377738710 x 10^(-113), which is approximately zero.
Thus, the interpretation is that in the 1st study the chance of
observing a case is p = 0.001445. If we assume that the chance of
observing a case in the new study remains the same, then the
probability of observing 122 cases in 39,676 person-years would be
virtually zero. But the fact is that we have observed 122 cases in the
new study. Logically, this means that the chance of observing cases in
the new study is significantly greater than in the old study. In other
words, we can conclude that the cases have increased significantly.
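For anyone wishing to reproduce this arithmetic, here is a minimal sketch; n and the observed count are taken as stated above, and p should be set to the baseline probability the first study actually gives:

from scipy.stats import binom, poisson

n, k = 39676, 122
p = 0.001445        # baseline probability of success, as stated above

# Exact binomial probability of exactly k cases: C(n, k) * p^k * (1 - p)^(n - k)
print("binomial P(X = 122):", binom.pmf(k, n, p))
# Poisson approximation with mean n * p
print("Poisson  P(X = 122):", poisson.pmf(k, n * p))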
To further argue about the significance of the new study, we can
formulate the following statistical hypotheses:
Null Hypothesis: There is no difference between the two studies in
terms of the chance of observing cases, i.e., H0: p = 0.001445
Alternative Hypothesis: The chance of observing cases is greater in
the new study than in the old study, i.e., H1: p > 0.001445.
Let X be the random variable that follows a binomial distribution with
39676 trials and probability of success p (0 < p < 1). Under
the null hypothesis, p = 0.001445.
P-value = P(X >= 122)
        = P( (X - np)/sqrt(np(1-p)) >= (122 - 39676*0.001445)/sqrt(39676*0.001445*(1 - 0.001445)) )
        which is approximately P(Z >= 4.84691), i.e., approximately 0.0000,
where Z follows the standard normal distribution. We have applied the
Central Limit Theorem for the computation above. Since the p-value is
extremely small, we can reject the null hypothesis and conclude that
p > 0.001445. That is to say, we can conclude that the chance of
observing cases is greater in the new study than in the old study.
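The corresponding one-sided test can be sketched the same way, again using the figures as quoted above (the exact binomial tail is shown alongside the normal approximation):

import math
from scipy.stats import binom, norm

n, k = 39676, 122
p0 = 0.001445        # null-hypothesis baseline probability, as stated above

# Exact one-sided p-value: P(X >= k) under Binomial(n, p0)
p_exact = binom.sf(k - 1, n, p0)

# Normal (Central Limit Theorem) approximation
mean = n * p0
sd = math.sqrt(n * p0 * (1 - p0))
z = (k - mean) / sd
p_normal = norm.sf(z)

print("z =", z, "exact p-value =", p_exact, "normal approx =", p_normal)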
This is essentially what I was looking for, which is not much different
from the original problem statement.