|
|
Subject:
Liner regression? Least square? Simultaneous equations?
Category: Science > Math Asked by: xm-ga List Price: $10.00 |
Posted:
15 Nov 2004 22:13 PST
Expires: 15 Dec 2004 22:13 PST Question ID: 429586 |
I sell 6 different items to my clients. Each client chooses when to pay, and they choose how much they want to pay. I divide my data up into days, so for each day I have a set of data that tells me how many of each item my client bought (let's call the items A-F), and how much the client paid that day (which I'll call P). Since my clients only pay me for the sum of all the products they purchase, and they pay me irregular amounts at irregular times, I don't know what the value of each of my items A-F is, and that's what I want to find out. This is complicated by the fact that some clients may buy nothing of one or two products. Some clients may buy a lot of A and a little D and pay a lot, some clients may buy a lot of B and C and pay a little, etc... So what I'd like to know how to do is calculate what I'm calling the 'fair market value' of each of my items A-F. And (I think) I'd like to be able to do this over a specific period of time. I'm not a strong mathemetician, but I'm a strong computer programmer. I'm looking to make sure that I use the correct math to calculate this, and I'm looking to find out as easily as possible how to implement it in a computer program. So what I'm looking for is: 1) What mathematical model/equation is the solution to this problem, and why is it the correct solution? If there is not one correct solution, what are the available solutions and how may I decide which one is best for my needs 2) How can I implement the mathematical solution to the problem in a programming language. (Any programming language can be used as an example, although I'll be implementing this in C. If libraries are a fast way to do this then it's fine to include an example using a library.) |
|
There is no answer at this time. |
|
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: frde-ga on 16 Nov 2004 05:28 PST |
More info on unit costs is required. |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 16 Nov 2004 06:36 PST |
Please clarify your comment. What other information is there other than the number that is the cost of the item? |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: frde-ga on 17 Nov 2004 02:14 PST |
You say nothing about the /cost/ of each item. ie: what each item costs /you/ to supply You also talk about the 'value' and 'fair market value' when in fact you mean the selling price. I suspect there is quite a lot more to the story than you have told us so far, if it is what I think, then it makes a real difference to the way one looks at things. I have a suspicion that this is like the costing and pricing of airline seats over a season. |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 17 Nov 2004 09:23 PST |
My cost of each item is not relevant (and not included) in the original problem (as it stands at the moment. If I've made an error in my planning to date and need to take into account my costs, I'll do so in a future Question). The terms I use are also not relevant to the problem. I can possibly rephrase this in more simple mathematical terms. If I sum the past 30 days history for each client, I will have the total amount of each product A-E that they've purchased, and a total amount P that they've decided to pay over that period. If I use a lower case 'a' for the quantity of each product purchased, and upper case 'A' for the unit cost of each, then this will give me an equation like the following for each customer: P=aA+bB+cC+dD+eE I.e., lots of lines of data like: Customer 1: 10=1A+2B+0C+4D+10E Customer 2: 50=5A+10B+1C+3D+9E ... So I'd like to calculate what each A, B, C, D, E and F is worth to my whole customer group. I'd be happy to discuss further the background to this question, but it's outside the scope of this Google Answer request, which is summarized in this comment. If anyone needs any further information to solve the above problem, please ask. |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: bcrounse-ga on 17 Nov 2004 13:35 PST |
You really don't need to write any code to solve this. This is a basic linear algebra problem; please don't re-invent the wheel! Use a freely available program like Octave (http://www.octave.org/) that can solve matrix equations of the form Ax=b, A is a 2-D matrix (more on that in a second), x is a vector of unknowns (which in your case consists of Pa, Pb, Pc... Pf, the price of each product), and b is a vector of the total amount paid by each client in a certain time period (to follow you example, this would be ($10, $50, etc.) The matrix A is of the form Qa1 Qb1 ... Qe1 Qa2 Qb2 ... Qe2 where Qa1 is the quantity of product A purchased by client #1 in a certain time frame. You can get an exact answer for the price of each product if your matrix has 6 rows. These rows may represent one month of purchases for 6 different clients, 6 different months (or weeks, or days) for the same client, or some combination of the two. The only danger is if 2 of the rows are not linearly independent: if one row of A is just a combination of multiples of other rows, you are hosed. But that's easy to check for with any linear algebra code. If you use more than 6 rows, that's great, but the result of solving the matrix will essentially be a linear regression- you'll get 'best fit' prices. This is probably a better thing to do, b/c you can also get a sense of how much scatter there is in the price across products and clients. Heck, you can do this in Excel: Starting in cell A1, type this: (blank) Qa Qb Qc Qd Qe Qf Rev C1 1 1 1 1 1 1 3000 C2 0 0 0 1 0 1 1000 C3 0 0 1 0 1 0 1000 C4 0 0 2 1 0 0 1000 C5 0 1 0 1 0 0 1000 C6 2 0 0 0 1 0 1000 where the columns (Qa - Qf) are the quantity of each product purchased by customer 1 in a month (or week, or whatever), etc. Rev is your total revenue for each customer. Then, in a cell below that, type: =LINEST($H$2:$H$7,$B$2:$G$7,FALSE,TRUE) Actually, you want to apply that equation to a whole range to get each price, the r-squared stat (should be zero in this case), etc. Just read the excel documentation on LINEST to figure this one out. I can send you a spreadsheet, need be. I only suggest Excel b/c I am most familiar with it. But it's not a real stats/linear algebra program, like Octave/Matlab or R/S-Plus. |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 17 Nov 2004 15:54 PST |
bcrounse-ga, thanks for that. (I do need to write code for this, because eventually it will be integrated into a code component, however for now Excel is fine to get the theory correct) I entered your example in Excel, and got the LINEST function seemingly returning a value for each of the columns Qa-Qf as per your example and the Excel manual. I also added an additional column which used the output from the LINEST function and the input paramaters for each row to check what the LINEST function thought the total price would be. But, problems: 1) In your example figures, my checking column was pretty far out 2) When I put in some potential 'real life' figures, my checking column was WAYYY out (e.g., in the price column I had 15,5,6,3,15,10 , and the check calculation (Qa * LINEST for Qa column + Qb * LINEST for Qb column...) returned 403,52,67,185,782,95 . Which was interesting because the row that returned 782 had used __approximately__ twice as many items as the row that returned 403, even though they'd both paid 15. 3) There were negative results returned for the prices on some columns. In the manual there was information to how to get at the standard error, etc. The value r2 was defined as 'If it is 1, there is a perfect correlation in the sample', and it was 1. So what's going on here that I'm not understanding. Also could you clarify your comment on there being some types of data that won't work with this method. It's likely in my data set that a fair proportion of my customers will purchase similar ratios of some products. |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 17 Nov 2004 16:57 PST |
Also with the Excel LINEST method, if I choose to measure item A in bags of 10 instead of single units, it affects the outcome, when it shouldn't, as this implies that the way I measure item A is relative somehow to the rest of the items, when you can't measure item A in terms of item B. |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: frde-ga on 18 Nov 2004 00:33 PST |
My questions about 'cost of production' and tightening up on terminology were an attempt to get an idea of what you want to /do/ with the data. As you can see : 2A + 3B = 10 has many multiple solutions And it gets odder when also : 2A + 3B = 5 - you said your customers choose what they want to pay Of course you can 'solve' a load of equations, but the chances are that there will be multiple solutions - something I think you have already realized. You could start 'making assumptions' like: A = B * 1.1 But all you will get is the result of your assumptions - a rather circular and pointless exercize. It might be interesting to find the combinations that give you the highest revenue - and to do some trend spotting on that Similarly just looking at the total sales of each item. From what you later said, it looks as if you are trying to find the revealed preferences of your individual customers. You may be able to infer that from simply looking at the data. I really do not think that there is a 'mathematical solution' to this problem, my suspicion is that you need to make up a theory and then test it out. |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: hfshaw-ga on 18 Nov 2004 16:44 PST |
Despite what others have written, your's is not a simple linear algebra or least-squares problem. The complication arises because you presumably want to impose constraints on the possible solutions such that the "fair market values" you are seeking are greater than zero. As you have found out, an unconstrained model can give rise to negative coefficients, which would imply that your customers place a negative value on some of your products and that you should be paying them to take them off your hands! Incorporation of constraints like A>=0 make this a quadratic programming problem, which is a very important class of optimization problems that is the subject of active research. (The least squares optimization problem is actually a special case of a quadratic programming problem that can be dealt with because of some unique properties of the problem. Because of this, it is quite easily solved. The constraints in your problem make this a harder problem.) For your purposes, however, you can probably get by without enlisting the full arsenal of techniques that have been developed to attack such problems. Your model assumes that the money each customer sends you each month is a linear function of the number of each item they bought that month. I assume that you have N products, and M customers, where M > N (preferably *much* greater). In a given time period, each customer buys p_j of product j (j = 1 to N), and pays you D_i (i = 1 to M) dollars. D_i = c_1*p_1 + c_2*p_2 + c_3*p_3 + .......c_N*p_N (i = 1 to M) You want to find the coefficients c_j ( j = 1 to N) that minimize: X^2 = Sum from i = 1 to M {Sum j = 1 to N (c_j*p_j - D_j)^2) } At this point this is the standard least squares, or multiple linear regression model, with a linear constraint that the constant term be equal to zero -- i.e., that somsone who buys nothing from you in a given period will pay you nothing that period. Your problem actually involves N additional constraints: c_j >=0 (j = 1 to N) You can actually use Excel to solve this problem, if you have the "Solver" add-in installed in Excel. The "Solver" add-in can be used to solve the minimization problem given above, subject to a variety of user-defined constraints, including inequality or "bounds" constraints such as those in this problem. If, as you state in one of your comments, you want to incorporate an algorithm into a stand-alone code, then I suggest you use the Nelder-Mead simplex method to solve the optimization problem. Do a Google search for 'Nelder Mead' and you'll get lots of hits. This method is very easy to program, and it naturally lends itself to the inclusion of inequality constraints in the problem. There are many published examples of code for this method (some C and FORTRAN examples are given at http://home.earthlink.net/~mfhutt/neldermead.htm, and books like "Numerical Recipies (see http://www.nr.com/) have code you can use or purchase on-line). Something to be aware of is that if you use the raw item counts and total dollar amounts for each customer, customers who buy a lot from you (i.e., those with large D_j's) will have a larger "weight" in determining the value you assign to each product. This may or may not be appropriate, depending on your philosophy. You can force each customer to have an equal weight by "normalizing" your data before you do the optimization analysis. This would simply entail dividing each term in the first equation above by D_i: 1 = c_1*p_1/D_i + c_2*p_2/D_i + c_3*p_3/D_i + .......c_N*p_N/D_i (i = 1 to M) and the resulting optimization problem involves minimizing: X^2 = Sum from i = 1 to M {Sum j = 1 to N (c_j*p_j/D_i - 1)^2) } You can also do the normalization by dividing each term in each customer's equation by the total number of products (i.e., the sum of the p_j's for each customer). Each of these three alternatives will give different results. To see how this affects your results, let's consider a really simple, and somewhat exaggerated, case of your problem. Say that you only make one product and you have 3 customers. Two of customers buy just one copy of your product and pay you $1 each. The third customer buys 10 copies of your item and pays you $100. Obviously, the third customer values the item at $10/copy while the other two only value it at $1/copy. What would you say would be the "best" estimate of the worth your entire customer base places on the item? If you use the raw numbers, the least squares solution to the "value" of the item is $9.82/item (rounded to the nearest cent). The "answer" is heavily weighted by the high-volume buyer. If you normalize the problem by dividing the third customer's equation by $100 (the amount you received from them), the the least squares solution to the "value" is $1.04/item. Finally, if, you "normalize" the problem by dividing the third customer's equation by 10 (the number of items bought), then the least squares solution to the "value" is $4.00/item (equal to the simple average of the individual customer's valuations). Which of these is the correct? Any of them could be, depending on how you want to use the results. All this having been said, I have a number of reservations and cautions about what you are trying to do. First off is the fact that your customers may all have very different "value systems", and assign quite different values to each of your products. In this case, I don't see what useful conclusions you will be able to draw from this sort of analysis, other than some sense of the "average" value of the product assigned by your set of customers. This could easily be the reason for the large values of your "check" column (values that a mathematician would call "residuals") in the calculation you did. In addition, it's not clear to me that the amount of money you receive from each customer should be modeled as a linear function of the number of each product they purchase. If I'm a customer, and I buy 100 of an item, I may think I should pay less for each item than if I only bought 1 of that item (i.e., I might think I should be entitled to a volume discount). In that case, the value I assign to an item is a function of the number of items purchased, which would mean the coefficients you are solving for (the c_j's) are a function of the independent variables (the p_j's), and your model (which assumes the c_j's are independent of the p_j's) is incorrect. Your model also assumes that your customer's "value systems" don't vary over time, yet there is a good possibility that they will, either for external reasons (the price charged by your competitors for similar products is going up), or internal reasons (a customer might be short of cash one month, and decide to pay you less than they might otherwise). In fact, I find your pricing strategy, in which your customers decide how much to pay, a rather strange one overall! |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: bcrounse-ga on 19 Nov 2004 10:50 PST |
Two things: I agree with hfshaw-ga's comments regarding constraints. I was naively assuming that the prices would end up being positive- although I'd argue if some prices are coming back negative, that's pretty darn interesting. Second, if you are using an n by n matrix (6 by 6 in your case), the R value will always be 1, if the matrix is not indeterminant. That's b/c you are solving the equations exactly. It's when the matrix has more than 6 rows that R can be less than 1. Please note that linest returns the coefficients in reverse order, going left to right. So the price indicated in column 1 is actually the price for column 6, and so forth. Don't ask me why they chose to do it this way. If you are using a 6x6 matric, the predicted amount paid for each customer should be exactly what they did pay. Again, we are solving the equations exactly in this case. If prices come back negative, then either customers hate that product (but buy it anyway?), or the perceived value of products varies enough between customers that you need to use more data and solve the (constrained) optimization problem. If you have data from the same customer for 6 days/weeks/months/years, you may want to see if you get any negative prices (longer time periods would be better). Also, I tend to agree with the rest of hfshaw's comments- esp., why do you price this way?? :) |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 19 Nov 2004 18:42 PST |
hfshaw - thanks very much for your detailed comment! I had a go with the Solver add-in in Excel, it did not find a solution when I set that the target value must be 0, but did find a reasonably acceptable solution when I chose to minimise the target value. I set it up like this: - Set 6 cells for the prices of my products, and initiated them with vague guesses - Columns B-G for each row were the usage of 6 items - Column H on corresponding row was the price paid - Column I was the price according to usage times the 6 cells with current price settings - Columm J was the error (i.e. I-H) - And because the Solver needed a single value to work on, I used the STDDEV function calculate the standard deviation - thinking the lower this number was the better, with 0 an ideal value. I then ran solver asking it to change the 6 cells for the values of the product, and to minimise the final STDDEV calculation cell. Is this the correct way to run the Solver add-in? It came back with some reasonably acceptable answers, and set prices to 0 for two items that I personally perceive as least valuable but are not necessarily, however was still pretty far off target. With it's item prices the total price paid was initially half what the test-set-data total price paid was. I re-did a little of my test data as it had got a bit mangled with constant fiddling and Solver came into a line a bit, but was still 20% off when checked. I presume this is related to the small data set (I was testing with ~10 customers), and the constraints messing with the output (see following comment relating to Excel LINEST, which I don't know how to implement constraints with, but gives a more accurate answer numerically, although possibly not a 'better' answer). With the Nelder Mead, I vaguely understand the code, but I'm not sure how to translate the problem into the input for the functions that are available - they all seem to only have one input for data? E.g. from the page you suggested the function prototype is: double simplex(double (*func)(double[]), double start[],int n, double EPSILON, double scale As far as I can tell start[] being all the data? and n being the number of items? Or is func() in this code their way of getting the test value into the code..? Out of your examples of different ways to normalize (?) the data, the case that I'm most interested in is the one where an average cost for the product is returned, unweighted by the quantity purchased or amount spent (in your example $4). I'd expect a bell-curve type range of payments from my customers. How can I normalize of set of data like mine by number of products purchased, when each customer purchases different amounts of each? The point you made that customers who purchase more expect to buy it more cheaply is very valid and something I hadn't considered. At first thought, I think I'd solve this by after an initial period, diving the customers up say 3 groups with High, Medium or Low usage, on which calculations can then independently be run. Also with the extra information brcounse supplied about the Excel function LINEST returning data in reverse order, the output from LINEST now actually makes some sense (minus the negative price values), so I'd like to know (in lay-man's terms, if this is possible) why something like the Nelder Mead solution to this problem is more valid. The information supplied is definitely very useful and I'd be prepared to accept an answer for my first question for just for your first comment if you'd care to repost it as an answer, but in summary I now need to also like to: 1) How do I implement this problem with the C function as supplied on the page http://home.earthlink.net/~mfhutt/crosen.c . 2) Since I'm most interested in the case you gave with normalized product quantities, how do I normalize product quantities in my data? 3) Why is one particular model more valid than another? Thanks! |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 20 Nov 2004 10:49 PST |
bcrounse - thanks for notifying me about the LINEST function returning data in reverse order.. I'd noticed in the help page that the order of the returned data went n,n-1,n-2, etc., but just assumed it was something I didn't understand as it makes no sense to return data in reverse order to me! Anyway, with my check functions the LINEST function now returns data that actually isn't a factor of 1000 from being correct, and is within a reasonable margin of error. However it's not clear that this is the best method, and the negative values make the results much less usable. I don't have a direct answer for questions about this, I think it's an interesting idea, but probably in need of some refinement to be useful. |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 20 Nov 2004 18:35 PST |
I'm looking at functions available in the GSL (GNU Scientific Library) at this page: http://www.gnu.org/software/gsl/manual/gsl-ref_34.html The examples provided appear to use a function that takes constant parameters for the solution as well as the current vector which is what changes through the iterations. In my case, I don't have a 'continuous' function that can give a reply for any input vector and parameters f(x, params)= { .. }, I have a set of data where I know the solution but not the parameters, like P=aA+bB+cC, so do I have to work this into a function something like f(P,a,b,c)=(aA+bB+cC)-P ? Also I have a set of rows of data I want to work through, but in examples shown the parameters stay fixed and only the vector changes, so presumably with each iteration I have to cycle through the available data? I can stumble through this, but would prefer a piece of working code based on available GSL functions (which I'd prefer over any other code source for various reasonsfor) for a set of say, 10 rows of data of the style P=aA+bB+cC+dD+eE+fF (P and a-f being procided, A-F being the unit costs I'm trying to find). If anyone can help me with this inside the scope of this question, great, otherwise let me know and I'll post another question for this. |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: hfshaw-ga on 22 Nov 2004 11:16 PST |
Sorry for the delay in getting back to you -- I was away for the weekend. > I set it up like this: > - Set 6 cells for the prices of my products, and initiated them > with vague guesses > - Columns B-G for each row were the usage of 6 items > - Column H on corresponding row was the price paid > - Column I was the price according to usage times the > 6 cells with current price settings > - Columm J was the error (i.e. I-H) > - And because the Solver needed a single value to work on, > I used the STDDEV function to calculate the standard > deviation - thinking the lower this number was the better, > with 0 an ideal value. > I then ran solver asking it to change the 6 cells for > the values of the roduct, and to minimise the final STDDEV > calculation cell. Is this the correct way to run the Solver add-in? It's the right way to use SOLVER, but you are minimizing the wrong quantity. You are doing a constrained least-squares problem. The "least-squares" part of the name refers to the quantity that you are trying to minimize, which is the sum of the squares of the differences between the observed values of the independent variable (in your case, the actual prices paid, which you have in colume H), and the "predicted" values (the values in your column I). You need to add a new column (or simply modify column J) so that you calculate the *square* of the values in column J. You then add these up, and minimize that sum. That is the essence of doing a least squares fit. > I'm looking at functions available in the GSL (GNU Scientific Library) at > this page: <http://www.gnu.org/software/gsl/manual/gsl-ref_34.html> > The examples provided appear to use a function that takes constant > parameters for the solution as well as the current vector which is what > changes through the iterations. In my case, I don't have a 'continuous' > function that can give a reply for any input vector and parameters > f(x, params)= { .. }, I have a set of data where I know the solution > but not the parameters, like P=aA+bB+cC, so do I have to work > this into a function something like f(P,a,b,c)=(aA+bB+cC)-P ? Again, you simply need to recognize that you are trying to minimize the sum of the squares of the deviations between the actual and the predicted "costs paid". Going back to my original comment, if you have N products, and M customers, where M > N (preferably *much* greater), and in a given time period, each customer buys p_j of product j (j = 1 to N), and pays you D_i (i = 1 to M) dollars. D_i = c_1*p_1 + c_2*p_2 + c_3*p_3 + .......c_N*p_N (i = 1 to M) You want to find the coefficients c_j ( j = 1 to N) that minimize: X^2 = Sum from i = 1 to M {Sum j = 1 to N (c_j*p_j - D_j)^2) } To use any of the "canned" optimization routines, you will need to write a function or subroutine that calculates and returns the value of X^2 for a given input (the next "guess") of the unknown parameters, c_j. X^2 is the continuous function you are looking for. I suggested the simplex method only because it is easy to code in constraints like "all c_j >=0". One does this by simply setting c_j = 0 for any candidate solutions that have c_j < 0. This page may help you visualize exactly what's going on in the simplex search: http://www.grabitech.com/algorithm.htm |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 22 Nov 2004 14:05 PST |
hfshaw - After your correction, Excel Solver now returns a result only 3% away from my test price data, which is excellent! I'm testing with only 10 customers and 6 products, and Solver is returning zero (or rather 'zeroed') costs for 3 of the products, but their prices are definitely less significant than the other 3, so I'm happy with zero price on these for the time being. > Again, you simply need to recognize that you are trying to minimize > the sum of the squares of the deviations between the actual and the > predicted "costs paid". I understand this, however I don't understand in: X^2 = Sum from i = 1 to M {Sum j = 1 to N (c_j*p_j - D_j)^2) } what D_j is and why D_i isn't included, as from my understanding D_j would represent the total cost paid for quantity p of item j by customer i, which by definition of the problem we don't know. To translate from the Excel, aren't I looking for something like: X^2 = Sum from i = 1 to M (D_i - {Sum j = 1 to N (c_j*p_j)} )^2 Or am I getting confused... Would you be able to help me with the question regarding normalizing the data on quantity, which you alluded to earlier, as it's important for this not to be skewed by people who happen to purchase more. Thanks again |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: hfshaw-ga on 22 Nov 2004 15:06 PST |
> I understand this, however I don't understand in: > X^2 = Sum from i = 1 to M {Sum j = 1 to N (c_j*p_j - D_j)^2) } > what D_j is and why D_i isn't included, as from my understanding D_j > would represent the total cost paid for quantity p of item j by > customer i, which by definition of the problem we don't know. Sorry, I was sloppy with my subscripts and parentheses. Your restatement is correct. To be more correct, we should use a double subscript on the "p" parameters: N = # products, M = # customers, p_ij = amount of product j purchased by customer i D_i = cash paid you by customer i c_j = unknown parameters (value/item for each of N products) you want to minimize: x^2 = Sum i = 1 to M of { ( [Sum j = 1 to N ( c_j*p_ij)] - D_i )^2 } > Would you be able to help me with the question regarding normalizing > the data on quantity, which you alluded to earlier, as it's important > for this not to be skewed by people who happen to purchase more... Normalize the data for each customer by the total number of items that customer bought. That is, divide the dollar amount the customer paid paid you by the total number of items he/she bought, and divide each of the individual item counts for that customer by the total bought by that customer. Do the same for each customer. Use these normalized values instead of the "raw" values in your calculations. The "normalized item counts" (the P_ij's) can now be interpreted as the fraction of the customer's order made up of each item, and each order will have equal weight (as opposed to using the un-normalized data, in which would result in a weighting based on the number of items purchased). For example, suppose you have 3 items, and a customer bought 2 of item A, 3 of item B and 5 of item C, and paid you $100. The customer bought a total of 10 items. Divide 2, 3, 5, and 100 by 10. The normalized order would then be a $10 order in which 20% (0.2) of the items were A, 30% (0.3) were B, and 50% (0.5) were C. |
Subject:
Re: Liner regression? Least square? Simultaneous equations?
From: hfshaw-ga on 22 Nov 2004 21:22 PST |
Just one more suggestion... Once you get your Excel spreadsheet and data set up correctly, it'd be a good idea to try solving the problem using a couple of different initial "guesses" for the unknown parameters just to make sure the algorithm converges to the same (or very close to the same) answers. |
If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you. |
Search Google Answers for |
Google Home - Answers FAQ - Terms of Service - Privacy Policy |