View Question
Q: Simulation of two correlated data series ( Answered ,   0 Comments )
 Question
 Subject: Simulation of two correlated data series Category: Science > Math Asked by: asiatechnicals-ga List Price: \$14.00 Posted: 20 Nov 2002 19:13 PST Expires: 20 Dec 2002 19:13 PST Question ID: 111686
 ```I am simply trying to model two correlated data series. Please forgive the mechanical and highly-structured nature of this question. I am trying to make myself as clear as possible, so as to avoid wasting your time with endless clarifications. Anyway, it should be fun (for the more twisted of us), quick and easy to give a definitive answer. Clarifications are welcome, if needed. - AsiaTechnicals Background ========== I am building a simulation model. I wish to generate two data series in MS-Excel with a given correlation of deviation from the mean. Specifically: Both series are stationary (the Mean does not change with progression of the dataseries) Both series are psuedo-random, with white noise (uniform distribution) Series Y is partially explained by series X Where: The explanatory power of X is given by a coefficient of determination, constrained by -1
 ```Dear asiatechnicals-ga, Thank you for your exceptionally well-articulated question. I will write the answer, but not in Excel notation since I do not use Excel normally and would probably use erroneous syntax. I hope you don't mind just plain mathematical notation... ** What's wrong: As you correctly state, the observed coefficient of determination (cell D3), which I will denote by CD, is related to, but not equal to, the original "Weighting" (cell D1), which I will denote by W. The relation between them is: W^2 CD = ------------- W^2 + (1-W)^2 ( in words, in case I mess up the formatting: CD is W squared over [W squared plus (1-W) squared] ) The only thing that's "wrong" here is your belief that what you call the "weighting" is the same as the coefficient of determination; it is not. I will explain how CD is computed from W, and after that I will explain "how to fix it". I hope you don't mind a bit of algebra: X is the first sample Y is the second sample, which you define as Y = W*X + (1-W)*Z, where Z is a random sample which is independent of Z. Denote by sd(X) the standard deviation of X, sd(Y) that of Y etc. The CD is defined as the square of the "correlation coefficient" (CC). CC(X,Y) is defined as the normalized covariance between X and Y, that is, the covariance, divided by the product of the standard deviations: Cov(X,Y) CC(X,Y) = ----------- sd(X)*sd(Y) In layman's terms, the covariance measures correlation between X,Y, but you have to choose the appropriate scale, or units of measure; the correct units of measure are those of the standard deviations of X and Y. This gives you as CC(X,Y) a "pure" (=without units) measure of correlation, which is a number between -1 and 1. Now, to compute CC(X,Y), we write Cov(X,Y) = Cov(X, W*X + (1-W)*Z) = W*Cov(X,X) + (1-W)*Cov(X,Z) since Cov behaves linearly in the second argument. Now, Cov(X,X) is just the variance of X, or the square of sd(X). And Cov(X,Z) is 0 since X and Z are just independent random samples. Therefore Cov(X,Y) = W * sd(X)^2 (W times the square of sd(X)) To get CC, we have to divide by the standard deviations. Notice that sd(Y)^2 = W^2 * sd(X)^2 + (1-W)^2 * sd(Z)^2 since variance of a sum of independent samples is the sum of the variances. sd(Z)=sd(X) because they are both independent random samples chosen from the sample RAND() function, so (taking square roots in the above equation) sd(Y) = sd(X) * square root of [W^2 + (1-W)^2] Therefore Cov(X,Y) W*sd(X)^2 CC(X,Y) = ----------- = ----------------------------- = sd(X)*sd(Y) sd(X)*sd(X)*sqrt[W^2+(1-W)^2] W = ----------------- sqrt[W^2+(1-W)^2] (sqrt means square root; "^2" means squared) Taking the square of both sides of the equation gives the formula for CD that I wrote at the beginning. *** How to fix it I assume that by "fix", you mean how to change this code to generate two random samples with given CD. Well, you thought you were specifying CD by specifying the "weighting" W, but this led to a CD which was related to W by the equation I wrote above. All you need to do is, starting with a given CD, find the correct W that gives you the coefficient of determination that you want. In other words, you want to solve the equation W^2 CD = ------------- W^2 + (1-W)^2 for W, when CD is given. This can be done with a little bit of high-school algebra. I won't bother writing the calculation, (if you want me to write it please ask) but the result is CD + sqrt[CD*(1-CD)] W = -------------------- 2*CD - 1 except when CD=1/2, in which case W = 1/2. So what you should do is add two lines at the beginning of the program which are, in approximate EXCEL notation: CoefficientDetermination (Cell E1) = 80% Weighting (Cell D1) = (\$E\$1+SQRT(\$E\$1*(1-\$E\$1)))/(2*\$E\$1-1) Like I said I don't use Excel so I can't give you a worked example, but it should be easy enough to implement the changes that I suggested. If I were to try to explain in layman's terms what led to your mistake, I would say that it was the introduction of the third independent random sample Z which affected the correlation coefficient (and hence the CD) in a way that you did not expect. I can't explain it much better than that - I guess you have to stick to the formulas. For your information, I have been working on this question for about an hour.(It was 3-4 minutes to understand your mistake, then another 7-8 minutes to work out the algebra, and the rest of the time just to write the solution in a readable and friendly format...) I will leave it at your discretion to decide which rating to give me and whether you want to tip. In any case, I hope this answer has been helpful, and if you still need clarification please ask and I will do my best to help. Regards, dannidin.``` Request for Answer Clarification by asiatechnicals-ga on 21 Nov 2002 18:48 PST ```dannidin, Thank you for your excellent and very rapid response. It answered all I asked for, was easily comprehendible and you have already comfortably earned a 5-star answer rating. I will be happy to tip according to the charge rate promised: an additional US\$10 for a half hour well spent. I have not yet entered a rating and closed out the question, as I need clarification of your answer. It’s beyond the scope of my original question, but Google Answers won’t allow me to request a specific researcher. Therefore I propose to continue to pay for your time at the current rate, through the tip. If you aren’t able to respond, I’ll post your 5-star rating and US\$10 tip on Monday and submit the clarification as a new question. Clarification ============= (1) Have I misunderstood the meaning or use of CD? I understood, from elementary textbooks, that CD is the explanatory power of one data series in the other. (2) If I want to produce two paired data series, where Y was W% explained by X, should I continue to use the weighting procedure, normalise (adjust) it with your formula, or is there a more appropriate technique? (Please see question below before answering) (3) The purpose of these data series is to feed a relatively simple Monte-Carlo simulation model I am building. I need data series bound within 0> Thanks very much for getting back to me so quickly. FYI, RAND() produces a uniformly distributed psuedorandom number between 0 and 1. That's just what I need to input into NORMINV(Percentile,Mean,SD), which converts it to a normally distributed number. The weighting method does not produce an appropriate correlation, but it does keep the result within bounds. I will apply a linear scalar as you suggested, which will produce the data I need. It's a bit contrived, but it's practical and adequate for my purposes. I will close out this question now and hand over your rating and tip. Given the professional quality of the responses I have seen, I aim to offer interesting, adequately priced questions with specific objectives and scoring. If you see more of my questions posted, I hope you'll take time to look them over. Kindest regards, AsiaTechnicals``` Clarification of Answer by dannidin-ga on 24 Nov 2002 00:46 PST `thanks man ;-)`
 asiatechnicals-ga rated this answer: and gave an additional tip of: \$20.00 ```Thank you for your complete answer and rapid response to my question and clarification. I enclose a tip equivalent to an extra hour's work well spent.```