Google Answers Logo
View Question
 
Q: Liner regression? Least square? Simultaneous equations? ( No Answer,   17 Comments )
Question  
Subject: Liner regression? Least square? Simultaneous equations?
Category: Science > Math
Asked by: xm-ga
List Price: $10.00
Posted: 15 Nov 2004 22:13 PST
Expires: 15 Dec 2004 22:13 PST
Question ID: 429586
I sell 6 different items to my clients. Each client chooses when to
pay, and they choose how much they want to pay. I divide my data up
into days, so for each day I have a set of data that tells me how many
of each item my client bought (let's call the items A-F), and how much
the client paid that day (which I'll call P).

Since my clients only pay me for the sum of all the products they
purchase, and they pay me irregular amounts at irregular times, I
don't know what the value of each of my items A-F is, and that's what
I want to find out.

This is complicated by the fact that some clients may buy nothing of
one or two products. Some clients may buy a lot of A and a little D
and pay a lot, some clients may buy a lot of B and C and pay a little,
etc...

So what I'd like to know how to do is calculate what I'm calling the
'fair market value' of each of my items A-F. And (I think) I'd like to
be able to do this over a specific period of time.

I'm not a strong mathemetician, but I'm a strong computer programmer.
I'm looking to make sure that I use the correct math to calculate
this, and I'm looking to find out as easily as possible how to
implement it in a computer program. So what I'm looking for is:

1) What mathematical model/equation is the solution to this problem,
and why is it the correct solution? If there is not one correct
solution, what are the available solutions and how may I decide which
one is best for my needs

2) How can I implement the mathematical solution to the problem in a
programming language. (Any programming language can be used as an
example, although I'll be implementing this in C. If libraries are a
fast way to do this then it's fine to include an example using a
library.)
Answer  
There is no answer at this time.

Comments  
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: frde-ga on 16 Nov 2004 05:28 PST
 
More info on unit costs is required.
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 16 Nov 2004 06:36 PST
 
Please clarify your comment. What other information is there other
than the number that is the cost of the item?
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: frde-ga on 17 Nov 2004 02:14 PST
 
You say nothing about the /cost/ of each item.
ie: what each item costs /you/ to supply

You also talk about the 'value' and 'fair market value' when in fact
you mean the selling price.

I suspect there is quite a lot more to the story than you have told us so far, 
if it is what I think, then it makes a real difference to the way one
looks at things.
I have a suspicion that this is like the costing and pricing of
airline seats over a season.
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 17 Nov 2004 09:23 PST
 
My cost of each item is not relevant (and not included) in the
original problem (as it stands at the moment. If I've made an error in
my planning to date and need to take into account my costs, I'll do so
in a future Question).

The terms I use are also not relevant to the problem.

I can possibly rephrase this in more simple mathematical terms.

If I sum the past 30 days history for each client, I will have the
total amount of each product A-E that they've purchased, and a total
amount P that they've decided to pay over that period.

If I use a lower case 'a' for the quantity of each product purchased,
and upper case 'A' for the unit cost of each, then this will give me
an equation like the following for each customer:

P=aA+bB+cC+dD+eE

I.e., lots of lines of data like:

Customer 1: 10=1A+2B+0C+4D+10E
Customer 2: 50=5A+10B+1C+3D+9E
...

So I'd like to calculate what each A, B, C, D, E and F is worth to my
whole customer group.

I'd be happy to discuss further the background to this question, but
it's outside the scope of this Google Answer request, which is
summarized in this comment.

If anyone needs any further information to solve the above problem, please ask.
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: bcrounse-ga on 17 Nov 2004 13:35 PST
 
You really don't need to write any code to solve this.  This is a
basic linear algebra problem; please don't re-invent the wheel!

Use a freely available program like Octave (http://www.octave.org/)
that can solve matrix equations of the form Ax=b, A is a 2-D matrix
(more on that in a second), x is a vector of unknowns (which in your
case consists of Pa, Pb, Pc... Pf, the price of each product), and b
is a vector of the total amount paid by each client in a certain time
period (to follow you example, this would be ($10, $50, etc.)

The matrix A is of the form 

Qa1 Qb1 ... Qe1
Qa2 Qb2 ... Qe2

where Qa1 is the quantity of product A purchased by client #1 in a
certain time frame.

You can get an exact answer for the price of each product if your
matrix has 6 rows.  These rows may represent one month of purchases
for 6 different clients, 6 different months (or weeks, or days) for
the same client, or some combination of the two.  The only danger is
if 2 of the rows are not linearly independent: if one row of A is just
a combination of multiples of other rows, you are hosed.  But that's
easy to check for with any linear algebra code.

If you use more than 6 rows, that's great, but the result of solving
the matrix will essentially be a linear regression- you'll get 'best
fit' prices.  This is probably a better thing to do, b/c you can also
get a sense of how much scatter there is in the price across products
and clients.

Heck, you can do this in Excel:

Starting in cell A1, type this:

(blank)	Qa	Qb	Qc	Qd	Qe	Qf	Rev
C1	1	1	1	1	1	1	3000
C2	0	0	0	1	0	1	1000
C3	0	0	1	0	1	0	1000
C4	0	0	2	1	0	0	1000
C5	0	1	0	1	0	0	1000
C6	2	0	0	0	1	0	1000

where the columns (Qa - Qf) are the quantity of each product purchased
by customer 1 in a month (or week, or whatever), etc.  Rev is your
total revenue for each customer.

Then, in a cell below that, type:

=LINEST($H$2:$H$7,$B$2:$G$7,FALSE,TRUE)

Actually, you want to apply that equation to a whole range to get each
price, the r-squared stat (should be zero in this case), etc.  Just
read the excel documentation on LINEST to figure this one out.  I can
send you a spreadsheet, need be.

I only suggest Excel b/c I am most familiar with it.  But it's not a
real stats/linear algebra program, like Octave/Matlab or R/S-Plus.
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 17 Nov 2004 15:54 PST
 
bcrounse-ga, thanks for that.

(I do need to write code for this, because eventually it will be
integrated into a code component, however for now Excel is fine to get
the theory correct)

I entered your example in Excel, and got the LINEST function seemingly
returning a value for each of the columns Qa-Qf as per your example
and the Excel manual.

I also added an additional column which used the output from the
LINEST function and the input paramaters for each row to check what
the LINEST function thought the total price would be.

But, problems:

1) In your example figures, my checking column was pretty far out
2) When I put in some potential 'real life' figures, my checking
column was WAYYY out (e.g., in the price column I had 15,5,6,3,15,10 ,
and the check calculation (Qa * LINEST for Qa column + Qb * LINEST for
Qb column...) returned 403,52,67,185,782,95 . Which was interesting
because the row that returned 782 had used __approximately__ twice as
many items as the row that returned 403, even though they'd both paid
15.
3) There were negative results returned for the prices on some columns.

In the manual there was information to how to get at the standard
error, etc. The value r2 was defined as 'If it is 1, there is a
perfect correlation in the sample', and it was 1.

So what's going on here that I'm not understanding. Also could you
clarify your comment on there being some types of data that won't work
with this method. It's likely in my data set that a fair proportion of
my customers will purchase similar ratios of some products.
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 17 Nov 2004 16:57 PST
 
Also with the Excel LINEST method, if I choose to measure item A in
bags of 10 instead of single units, it affects the outcome, when it
shouldn't, as this implies that the way I measure item A is relative
somehow to the rest of the items, when you can't measure item A in
terms of item B.
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: frde-ga on 18 Nov 2004 00:33 PST
 
My questions about 'cost of production' and tightening up on
terminology were an attempt to get an idea of what you want to /do/
with the data.

As you can see :
    2A + 3B = 10    has many multiple solutions
And it gets odder when also :
    2A + 3B = 5     - you said your customers choose what they want to pay

Of course you can 'solve' a load of equations, but the chances are
that there will be multiple solutions - something I think you have
already realized.

You could start 'making assumptions' like:  A = B * 1.1
But all you will get is the result of your assumptions
- a rather circular and pointless exercize.

It might be interesting to find the combinations that give you the
highest revenue - and to do some trend spotting on that
Similarly just looking at the total sales of each item.

From what you later said, it looks as if you are trying to find the
revealed preferences of your individual customers. You may be able to
infer that from simply looking at the data.

I really do not think that there is a 'mathematical solution' to this
problem, my suspicion is that you need to make up a theory and then
test it out.
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: hfshaw-ga on 18 Nov 2004 16:44 PST
 
Despite what others have written, your's is not a simple linear
algebra or least-squares problem.  The complication arises because you
presumably want to impose constraints on the possible solutions such
that the "fair market values" you are seeking are greater than zero. 
As you have found out, an unconstrained model can give rise to
negative coefficients, which would imply that your customers place a
negative value on some of your products and that you should be paying
them to take them off your hands!  Incorporation of constraints like
A>=0 make this a quadratic programming problem, which is a very
important class of optimization problems that is the subject of active
research.  (The least squares optimization problem is actually a
special case of a quadratic programming problem that can be dealt with
because of some unique properties of the problem.  Because of this, it
is quite easily solved.  The constraints in your problem make this a
harder problem.)  For your purposes, however, you can probably get by
without enlisting the full arsenal of techniques that have been
developed to attack such problems.

Your model assumes that the money each customer sends you each month
is a linear function of the number of each item they bought that
month.   I assume that you have N products, and M customers, where M >
N (preferably *much* greater).  In a given time period, each customer
buys p_j of product j (j = 1 to N), and pays you D_i (i = 1 to M)
dollars.

     D_i = c_1*p_1 + c_2*p_2 + c_3*p_3 + .......c_N*p_N  (i = 1 to M)


You want to find the coefficients c_j ( j = 1 to N) that minimize:

     X^2 = Sum from i = 1 to M {Sum j = 1 to N    (c_j*p_j - D_j)^2) }

At this point this is the standard least squares, or multiple linear
regression model, with a linear constraint that the constant term be
equal to zero -- i.e., that somsone who buys nothing from you in a
given period will pay you nothing that period.  Your problem actually
involves N additional constraints:

     c_j >=0 (j = 1 to N)


You can actually use Excel to solve this problem, if you have the
"Solver" add-in installed in Excel.  The "Solver" add-in can be used
to solve the minimization problem given above, subject to a variety of
user-defined constraints, including inequality or "bounds" constraints
such as those in this problem.

If, as you state in one of your comments, you want to incorporate an
algorithm into a stand-alone code, then I suggest you use the
Nelder-Mead simplex method to solve the optimization problem.  Do a
Google search for 'Nelder Mead' and you'll get lots of hits.  This
method is very easy to program, and it naturally lends itself to the
inclusion of inequality constraints in the problem.  There are many
published examples of code for this method (some C and FORTRAN
examples are given at
http://home.earthlink.net/~mfhutt/neldermead.htm, and books like
"Numerical Recipies (see http://www.nr.com/) have code you can use or
purchase on-line).

Something to be aware of is that if you use the raw item counts and
total dollar amounts for each customer, customers who buy a lot from
you (i.e., those with large D_j's) will have a larger "weight" in
determining the value you assign to each product.  This may or may not
be appropriate, depending on your philosophy.  You can force each
customer to have an equal weight by "normalizing" your data before you
do the optimization analysis.  This would simply entail dividing each
term in the first equation above by D_i:

     1 = c_1*p_1/D_i + c_2*p_2/D_i + c_3*p_3/D_i + .......c_N*p_N/D_i  (i = 1 to M)

and the resulting optimization problem involves minimizing:

     X^2 = Sum from i = 1 to M {Sum j = 1 to N    (c_j*p_j/D_i - 1)^2) }

You can also do the normalization by dividing each term in each
customer's equation by the total number of products (i.e., the sum of
the p_j's for each customer).

Each of these three alternatives will give different results.

To see how this affects your results, let's consider a really simple,
and somewhat exaggerated, case of your problem.  Say that you only
make one product and you have 3 customers.  Two of customers buy just
one copy of your product and pay you $1 each.  The third customer buys
10 copies of your item and pays you $100.  Obviously, the third
customer values the item at $10/copy while the other two only value it
at $1/copy.  What would you say would be the "best" estimate of the
worth your entire customer base places on the item?  If you use the
raw numbers, the least squares solution to the "value" of the item is
$9.82/item (rounded to the nearest cent).  The "answer" is heavily
weighted by the high-volume buyer.  If you normalize the problem by
dividing the third customer's equation by $100 (the amount you
received from them), the the least squares solution to the "value" is
$1.04/item.   Finally, if, you "normalize" the problem by dividing the
third customer's equation by 10 (the number of items bought), then the
least squares solution to the "value" is $4.00/item (equal to the
simple average of the individual customer's valuations).  Which of
these is the correct?  Any of them could be, depending on how you want
to use the results.

All this having been said, I have a number of reservations and
cautions about what you are trying to do.  First off is the fact that
your customers may all have very different "value systems", and assign
quite different values to each of your products.  In this case, I
don't see what useful conclusions you will be able to draw from this
sort of analysis, other than some sense of the "average" value of the
product assigned by your set of customers.  This could easily be the
reason for the large values of your "check" column (values that a
mathematician would call "residuals") in the calculation you did.

In addition, it's not clear to me that the amount of money you receive
from each customer should be modeled as a linear function of the
number of each product they purchase.  If I'm a customer, and I buy
100 of an item, I may think I should pay less for each item than if I
only bought 1 of that item (i.e., I might think I should be entitled
to a volume discount).  In that case, the value I assign to an item is
a function of the number of items purchased, which would mean the
coefficients you are solving for (the c_j's) are a function of the
independent variables (the p_j's), and your model (which assumes the
c_j's are independent of the p_j's) is incorrect.  Your model also
assumes that your customer's "value systems" don't vary over time, yet
there is a good possibility that they will, either for external
reasons (the price charged by your competitors for similar products is
going up), or internal reasons (a customer might be short of cash one
month, and decide to pay you less than they might otherwise).  In
fact, I find your pricing strategy, in which your customers decide how
much to pay, a rather strange one overall!
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: bcrounse-ga on 19 Nov 2004 10:50 PST
 
Two things:

I agree with hfshaw-ga's comments regarding constraints.  I was
naively assuming that the prices would end up being positive- although
I'd argue if some prices are coming back negative, that's pretty darn
interesting.

Second, if you are using an n by n matrix (6 by 6 in your case), the R
value will always be 1, if the matrix is not indeterminant.  That's
b/c you are solving the equations exactly. It's when the matrix has
more than 6 rows that R can be less than 1.

Please note that linest returns the coefficients in reverse order,
going left to right.  So the price indicated in column 1 is actually
the price for column 6, and so forth.  Don't ask me why they chose to
do it this way.

If you are using a 6x6 matric, the predicted amount paid for each
customer should be exactly what they did pay.  Again, we are solving
the equations exactly in this case.

If prices come back negative, then either customers hate that product
(but buy it anyway?), or the perceived value of products varies enough
between customers that you need to use more data and solve the
(constrained) optimization problem.  If you have data from the same
customer for 6 days/weeks/months/years, you may want to see if you get
any negative prices (longer time periods would be better).

Also, I tend to agree with the rest of hfshaw's comments- esp., why do
you price this way?? :)
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 19 Nov 2004 18:42 PST
 
hfshaw - thanks very much for your detailed comment! I had a go with
the Solver add-in in Excel, it did not find a solution when I set that
the target value must be 0, but did find a reasonably acceptable
solution when I chose to minimise the target value. I set it up like
this:

- Set 6 cells for the prices of my products, and initiated them with vague guesses
- Columns B-G for each row were the usage of 6 items
- Column H on corresponding row was the price paid
- Column I was the price according to usage times the 6 cells with
current price settings
- Columm J was the error (i.e. I-H)
- And because the Solver needed a single value to work on, I used the
STDDEV function calculate the standard deviation - thinking the lower
this number was the better, with 0 an ideal value.

I then ran solver asking it to change the 6 cells for the values of
the product, and to minimise the final STDDEV calculation cell.

Is this the correct way to run the Solver add-in?

It came back with some reasonably acceptable answers, and set prices
to 0 for two items that I personally perceive as least valuable but
are not necessarily, however was still pretty far off target. With
it's item prices the total price paid was initially half what the
test-set-data total price paid was. I re-did a little of my test data
as it had got a bit mangled with constant fiddling and Solver came
into a line a bit, but was still 20% off when checked. I presume this
is related to the small data set (I was testing with ~10 customers),
and the constraints messing with the output (see following comment
relating to Excel LINEST, which I don't know how to implement
constraints with, but gives a more accurate answer numerically,
although possibly not a 'better' answer).

With the Nelder Mead, I vaguely understand the code, but I'm not sure
how to translate the problem into the input for the functions that are
available - they all seem to only have one input for data? E.g. from
the page you suggested the function prototype is:

double simplex(double (*func)(double[]), double start[],int n, double
EPSILON, double scale

As far as I can tell start[] being all the data? and n being the
number of items? Or is func() in this code their way of getting the
test value into the code..?

Out of your examples of different ways to normalize (?) the data, the
case that I'm most interested in is the one where an average cost for
the product is returned, unweighted by the quantity purchased or
amount spent (in your example $4). I'd expect a bell-curve type range
of payments from my customers.

How can I normalize of set of data like mine by number of products
purchased, when each customer purchases different amounts of each?

The point you made that customers who purchase more expect to buy it
more cheaply is very valid and something I hadn't considered. At first
thought, I think I'd solve this by after an initial period, diving the
customers up say 3 groups with High, Medium or Low usage, on which
calculations can then independently be run.

Also with the extra information brcounse supplied about the Excel
function LINEST returning data in reverse order, the output from
LINEST now actually makes some sense (minus the negative price
values), so I'd like to know (in lay-man's terms, if this is possible)
why something like the Nelder Mead solution to this problem is more
valid.

The information supplied is definitely very useful and I'd be prepared
to accept an answer for my first question for just for your first
comment if you'd care to repost it as an answer, but in summary I now
need to also like to:

1) How do I implement this problem with the C function as supplied on
the page http://home.earthlink.net/~mfhutt/crosen.c .
2) Since I'm most interested in the case you gave with normalized
product quantities, how do I normalize product quantities in my data?
3) Why is one particular model more valid than another?

Thanks!
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 20 Nov 2004 10:49 PST
 
bcrounse - thanks for notifying me about the LINEST function returning
data in reverse order.. I'd noticed in the help page that the order of
the returned data went n,n-1,n-2, etc., but just assumed it was
something I didn't understand as it makes no sense to return data in
reverse order to me!

Anyway, with my check functions the LINEST function now returns data
that actually isn't a factor of 1000 from being correct, and is within
a reasonable margin of error. However it's not clear that this is the
best method, and the negative values make the results much less
usable.

I don't have a direct answer for questions about this, I think it's an
interesting idea, but probably in need of some refinement to be
useful.
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 20 Nov 2004 18:35 PST
 
I'm looking at functions available in the GSL (GNU Scientific Library)
at this page:

http://www.gnu.org/software/gsl/manual/gsl-ref_34.html

The examples provided appear to use a function that takes constant
parameters for  the solution as well as the current vector which is
what changes through the iterations. In my case, I don't have a
'continuous' function that can give a reply for any input vector and
parameters f(x, params)= { .. }, I have a set of data where I know the
solution but not the parameters, like P=aA+bB+cC, so do I have to work
this into a function something like f(P,a,b,c)=(aA+bB+cC)-P ?

Also I have a set of rows of data I want to work through, but in
examples shown the parameters stay fixed and only the vector changes,
so presumably with each iteration I have to cycle through the
available data?

I can stumble through this, but would prefer a piece of working code
based on available GSL functions (which I'd prefer over any other code
source for various reasonsfor) for a set of say, 10 rows of data of
the style P=aA+bB+cC+dD+eE+fF (P and a-f being procided, A-F being the
unit costs I'm trying to find). If anyone can help me with this inside
the scope of this question, great, otherwise let me know and I'll post
another question for this.
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: hfshaw-ga on 22 Nov 2004 11:16 PST
 
Sorry for the delay in getting back to you -- I was away for the weekend.

> I set it up like this:
> - Set 6 cells for the prices of my products, and initiated them
>   with vague guesses
> - Columns B-G for each row were the usage of 6 items
> - Column H on corresponding row was the price paid
> - Column I was the price according to usage times the 
>   6 cells with current price settings
> - Columm J was the error (i.e. I-H)
> - And because the Solver needed a single value to work on, 
>   I used the STDDEV function to calculate the standard 
>   deviation - thinking the lower this number was the better, 
>   with 0 an ideal value. 
>   I then ran solver asking it to change the 6 cells for 
>   the values of the roduct, and to minimise the final STDDEV 
>   calculation cell. Is this the correct way to run the Solver add-in? 

It's the right way to use SOLVER, but you are minimizing the wrong
quantity.  You are doing a constrained least-squares problem.  The
"least-squares" part of the name refers to the quantity that you are
trying to minimize, which is the sum of the squares of the differences
between the observed values of the independent variable (in your case,
the actual prices paid, which you have in colume H), and the
"predicted" values (the values in your column I).  You need to add a
new column (or simply modify column J) so that you calculate the
*square* of the values in column J.  You then add these up, and
minimize that sum.  That is the essence of doing a least squares fit.

> I'm looking at functions available in the GSL (GNU Scientific Library) at
> this page: <http://www.gnu.org/software/gsl/manual/gsl-ref_34.html>
> The examples provided appear to use a function that takes constant 
> parameters for the solution as well as the current vector which is what
> changes through the iterations. In my case, I don't have a 'continuous' 
> function that can give a reply for any input vector and parameters 
> f(x, params)= { .. }, I have a set of data where I know the solution 
> but not the parameters, like P=aA+bB+cC, so do I have to work 
> this into a function something like f(P,a,b,c)=(aA+bB+cC)-P ? 

Again, you simply need to recognize that you are trying to minimize
the sum of the squares of the deviations between the actual and the
predicted "costs paid".  Going back to my original comment, if you
have N products, and M customers, where M > N (preferably *much*
greater), and  in a given time period, each customer buys p_j of
product j (j = 1 to N), and pays you D_i (i = 1 to M) dollars.

      D_i = c_1*p_1 + c_2*p_2 + c_3*p_3 + .......c_N*p_N (i = 1 to M) 

You want to find the coefficients c_j ( j = 1 to N) that minimize: 

     X^2 = Sum from i = 1 to M {Sum j = 1 to N (c_j*p_j - D_j)^2) } 

To use any of the "canned" optimization routines, you will need to
write a function or subroutine that calculates and returns the value
of X^2 for a given input (the next "guess") of the unknown parameters,
c_j.  X^2 is the continuous function you are looking for.

I suggested the simplex method only because it is easy to code in
constraints like "all c_j >=0".  One does this by simply setting c_j =
0 for any candidate solutions that have c_j < 0.  This page may help
you visualize exactly what's going on in the simplex search:
http://www.grabitech.com/algorithm.htm
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: xm-ga on 22 Nov 2004 14:05 PST
 
hfshaw - After your correction, Excel Solver now returns a result only
3% away from my test price data, which is excellent! I'm testing with
only 10 customers and 6 products, and Solver is returning zero (or
rather 'zeroed') costs for 3 of the products, but their prices are
definitely less significant than the other 3, so I'm happy with zero
price on these for the time being.

> Again, you simply need to recognize that you are trying to minimize
> the sum of the squares of the deviations between the actual and the
> predicted "costs paid".

I understand this, however I don't understand in:

X^2 = Sum from i = 1 to M {Sum j = 1 to N (c_j*p_j - D_j)^2) }

what D_j is and why D_i isn't included, as from my understanding D_j
would represent the total cost paid for quantity p of item j by
customer i, which by definition of the problem we don't know.

To translate from the Excel, aren't I looking for something like:

X^2 = Sum from i = 1 to M (D_i - {Sum j = 1 to N (c_j*p_j)} )^2 

Or am I getting confused...

Would you be able to help me with the question regarding normalizing
the data on quantity, which you alluded to earlier, as it's important
for this not to be skewed by people who happen to purchase more.

Thanks again
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: hfshaw-ga on 22 Nov 2004 15:06 PST
 
> I understand this, however I don't understand in:
> X^2 = Sum from i = 1 to M {Sum j = 1 to N (c_j*p_j - D_j)^2) }

> what D_j is and why D_i isn't included, as from my understanding D_j
> would represent the total cost paid for quantity p of item j by
> customer i, which by definition of the problem we don't know.

Sorry, I was sloppy with my subscripts and parentheses.  Your
restatement is correct.  To be more correct, we should use a double
subscript on the "p" parameters:

N  = # products, 
M = # customers, 
p_ij  = amount of product j purchased by customer i
D_i = cash paid you by customer i
c_j = unknown parameters (value/item for each of N products)

you want to minimize:

  x^2 = Sum i = 1 to M of  {  ( [Sum j = 1 to N ( c_j*p_ij)] - D_i )^2  }

> Would you be able to help me with the question regarding normalizing
> the data on quantity, which you alluded to earlier, as it's important
> for this not to be skewed by people who happen to purchase more...

Normalize the data for each customer by the total number of items that
customer bought.  That is, divide the dollar amount the customer paid
paid you by the total number of items he/she bought, and divide each
of the individual item counts for that customer by the total bought by
that customer.  Do the same for each customer.  Use these normalized
values instead of the "raw" values in your calculations.  The
"normalized item counts" (the P_ij's) can now be interpreted as the
fraction of the customer's order made up of each item, and each order
will have equal weight (as opposed to using the un-normalized data, in
which would result in a weighting based on the number of items
purchased).

For example, suppose you have 3 items, and a customer bought 2 of item
A, 3 of item B and 5 of item C, and paid you $100.  The customer
bought a total of 10 items.  Divide 2, 3, 5, and 100 by 10.  The
normalized order would then be a $10 order in which 20% (0.2) of the
items were A, 30% (0.3) were B, and 50% (0.5) were C.
Subject: Re: Liner regression? Least square? Simultaneous equations?
From: hfshaw-ga on 22 Nov 2004 21:22 PST
 
Just one more suggestion...

Once you get your Excel spreadsheet and data set up correctly, it'd be
a good idea to try solving the problem using a couple of different
initial "guesses" for the unknown parameters just to make sure the
algorithm converges to the same (or very close to the same) answers.

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy