Hi,
You would be using C++ because you have the words "data warehouse" in
the question. With transactions, tight normalization, dimension
scopes, multiple hierarchies and, just for spice, a few hundred gigs of
data on a few machines to drill through, overhead isn't something you
have a lot of room for.
Your scripting languages are out. Even though Perl (or PHP, for that
matter) would probably be faster to develop with, they would just cost
too much in overhead. Your other languages, Pascal and so forth, get
kicked out because of a lack of available talent. There just aren't
that many Pascal experts who do data warehousing. It's a practical
'real-world' constraint, not an IT reason. I'm assuming we aren't
making this in a vacuum. The personnel at the data warehouse level are
mainly C and C++ people.
Now with that in mind, you would choose C++ because of a preference for
objects, yes. Google, for example, is C++, with some C in there as well
(probably; I don't really know that, but it's a good guess, in areas
where C++ overhead was just too much, such as some hashing code).
://www.google.com/programming-contest/
I guess it didn't have to be C++, but it was, and has been for so
long that, given the weight of real-world resources and the fine
control over overhead that performance requires, it's really quite
difficult to get away from at this point. A good object lesson in this
reality is Java. A fine object language, plenty of people, and a
database-driven
company backing it tooth and nail. Oracle ships plenty of Java with its
database, though as far as I know the core engine itself is C.
So, why not that? Well, again the real world steps in, for the
practically minded. Fact is, Oracle only runs on Oracle. Oracle is very
well supported and a good system, but... that's a heck of a limitation
to put on yourself with a large system like a data warehouse.
Of course, when I first read this I thought, "It's not, it's written in SQL."
A data warehouse is the design structure of data, into groupings which
are 'most likely to be required'. Then there is the engine which adjusts
those groupings and indexes and dimensions as it 'learns' from the
needs of the clients. That's the C++ part (or, with a vendor engine like
Oracle's, whatever the vendor built it in). Then there are the clients
working to see the data. Those are
made from just about anything these days, from C++ to Visual Basic.
Heck there's even a few Perl clients that helped with the Genome
project, and SETI.
But moving that amount of data around is no small task. Your language
can't be the cause of the slowdowns, or the thing that is limiting
what you can do with your data. Just as an example, one where you
probably "understand too much" to really see the humor, here is a
headline I saw recently.
"FedEx Lowers Inventory Management Costs by Linking Shipping Receipt
Data to Customer Records"
Between the lines there, you just know that it was a no-brainer to put
those two things together, and they probably wanted to do this years
ago. Shipping receipts tied to customer data... seems pretty obvious to
me. So, what took them so long? Weight, that's what.
http://www.dw-institute.com/research/display.asp?id=7070
Quoted from the article: "To launch FedEx InSight, the company needed
to be able to match name and address data from shipping receipts with
records in its business-customer database. Creating cleaner, more
consistent data was vital for this to work."
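To make "cleaner, more consistent data" a little more concrete, here is a minimal sketch in Python of the kind of matching the article is talking about. It is only an illustration, not how FedEx did it: the field names, the tiny abbreviation table and the `link_receipts` helper are all made up for the example, and a real job would need fuzzy matching and a far bigger cleanup step.

```python
import re

# Hypothetical abbreviation table; a real cleansing job would use a far larger one.
ABBREVIATIONS = {
    "st": "street",
    "ave": "avenue",
    "rd": "road",
    "co": "company",
    "inc": "incorporated",
}

def normalize(text):
    """Lowercase, replace punctuation with spaces, and expand known abbreviations."""
    words = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)

def match_key(name, address):
    """Build the cleaned (name, address) pair used to compare records."""
    return (normalize(name), normalize(address))

def link_receipts(receipts, customers):
    """Map each receipt id to a customer id, or None when no cleaned key matches."""
    index = {match_key(c["name"], c["address"]): c["id"] for c in customers}
    return {r["id"]: index.get(match_key(r["name"], r["address"])) for r in receipts}

# Made-up sample data, purely for illustration.
customers = [{"id": "C1", "name": "Acme Co.", "address": "12 Main St."}]
receipts = [
    {"id": "R1", "name": "ACME CO", "address": "12 main street"},
    {"id": "R2", "name": "Acme Company", "address": "14 Main St"},
]
links = link_receipts(receipts, customers)
```

The first receipt links up once both sides are cleaned the same way; the second does not, because the street number differs. Multiply that by a few hundred gigs and you can see where the weight comes from.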
Matching sounds simple to little guys like me, who deal mainly with
websites these days. But the trick here is, if I have to do something
really complicated and it blows up, I can always reload. That's
simple. Not for these guys. A crash on that level is monstrous.
Whole sections go into blackout. So you've got a backup? Cool, but it
was useless five minutes after it was made, really.
The point (yes, I have one) is that the engine has to be stable. For
it to be stable you need three basic things: flexibility, history, and
personnel with experience. A data warehouse existing in the real world
is a huge thing, a juggernaut moving through cyberspace. They don't
fall well. Currently, C++ is the master of those three areas.
I was talking to a guy who said he came up with a better SQL. A thing
that blew SQL away: objects, speed and just about everything else. I
thought that was fantastic; not new, but fantastic. I'll probably
never use it in the real world, but it is great. New, faster, and
better are the least of my worries. Dependable and supportable are my
worries. I know there are faster, better ways to develop. But what
good are they if only 15 people in the world are capable of creating a
data warehouse with them? That's not worthless, that's dangerous.
As far as your thoughts on OLAP.. well.
http://www.olapreport.com/fasmi.htm
Sure, they could do that, whatever the heck that is. But probably
not, because you don't need it, and it's expensive to code in C++.
That's the price you pay for the rest of the stuff I was talking about.
I believe any software company developing at that level is looking for
that fast, easy stuff. Not at risk to themselves, but they are willing
to experiment and move around. Again, the FedEx article comes to mind.
New ways to do the old sins.
Same with ETL. Again, we are working "with" a data warehouse at this
level, not supporting it, or maintaining it or, eek, moving it around
in a real way. We are extracting, transforming and loading: SQL
statements with a bit of sugar. Again, we can experiment and keep costs
down by stepping away from C++ and working with something that is more
pliable, easier to change fast and adapt to customer and marketing
needs.
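Here is roughly what I mean by "SQL statements with a bit of sugar", as a small Python sketch using the standard `sqlite3` module. Everything specific in it is an assumption for the sake of the example: the `shipments_raw` and `shipments_clean` tables, their columns, and the `run_etl` function are hypothetical, and an in-memory SQLite database stands in for the real warehouse.

```python
import sqlite3

LB_TO_KG = 0.45359237  # exact pounds-to-kilograms conversion factor

def run_etl(conn):
    """Extract raw shipment rows, clean them, and load them into a reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS shipments_clean (customer TEXT, weight_kg REAL)"
    )
    # Extract: plain SQL against the (hypothetical) source table.
    rows = conn.execute("SELECT customer, weight_lb FROM shipments_raw").fetchall()
    # Transform: tidy the customer name, convert units, skip rows with no weight.
    cleaned = [
        (customer.strip().upper(), round(weight_lb * LB_TO_KG, 2))
        for customer, weight_lb in rows
        if weight_lb is not None
    ]
    # Load: bulk insert into the reporting table.
    conn.executemany("INSERT INTO shipments_clean VALUES (?, ?)", cleaned)
    conn.commit()
    return len(cleaned)

# Made-up source data in an in-memory database, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments_raw (customer TEXT, weight_lb REAL)")
conn.executemany(
    "INSERT INTO shipments_raw VALUES (?, ?)",
    [(" acme ", 10.0), ("Globex", None), ("acme", 2.5)],
)
loaded = run_etl(conn)
totals = conn.execute(
    "SELECT customer, SUM(weight_kg) FROM shipments_clean GROUP BY customer"
).fetchall()
```

The SQL does the extracting and loading and the scripting language is the sugar in the middle, which is exactly the part you want to be able to change on a Tuesday afternoon when marketing asks for something new.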
At this level there's probably a C++ guy or guru Java gal sitting in
the back rooms, keeping everyone safe, but I doubt that anyone is
developing with C++ as the primary language at this stage in the game. We are
back to that practicality thing again.
From your question I feel that you've already done a few nights of
research on this subject, so I'm really wondering what to give you as
far as hyperlinks here. Is there a specific thing you need looked
into? I can pile on hyperlinks about data warehousing; marketing has
seen to that over the last few years. :-) But I don't want to just
fill a page with links that aren't going to do you any good. So why
don't you use the Clarification button there and let me know where to
spend a bit of time doing some research. Normally we don't do it this
way, but mind reading isn't my strong point today.
Thanks,
webadept-ga