Thank you for posing this question. Issues about speedup and
vectorization are often quite controversial and emotional; I used to
do
a lot of work with vectorized and parallel codes, and researchers (and
marketers) would argue endlessly about how much speedup was achieved.
I did not receive a clarification as to whether you intended a speedup
of 10 or 20 in vectorizable code. Therefore, I will simply answer your
question as stated. If you would like it re-solved for a speedup of 20
instead of 10, just use the "request clarification" button to do so.
In this particular case, the solution to your problem, with a speedup
for vectorized code of 10 and not 20, is available as a PDF from:
http://www.cs.ucf.edu/courses/cda5106/summer02/hw1_sol.pdf . You need
a PDF reader to access this, which you can get here:
http://www.adobe.com/products/acrobat/readstep.html .
This solution is also useful for defining the key terms involved.
a:
Let S be the net speedup, let F be the fraction of code that is
vectorizable, and let P be the speedup achieved on the vectorizable
code.
Then S is time_without_vectorization / time_with_vectorization.
Suppose the time without vectorization is 1.
Then using vectorization, the fraction 1-F of the code runs at the
same speed, taking time 1-F, while the fraction F runs in time F/P.
Hence the total time for the vectorized code is 1-F + (F/P). The net
speedup is then:
S = 1/(1 - F + (F/P))
Here we are using a value of P of 10, so the speedup is:
S = 1/(1 - F + (F/10))
This can be rewritten:
S = 1/(1 - .9F)
Now remember that F is going to be the percentage of vectorization
divided by 100, so if G is the percentage of vectorization, the graph
is:
Speedup = 1/(1-.009 G)
G here ranges from 0% to 100%.
You can graph this yourself using your favorite graphing software.
The X-axis should be "percent vectorization" and the Y-axis should be
"speedup" (as in the solution link posted above).
Specific values are:
Percent Vectorization      Speedup
----------------------------------
   0                        1
    .1                      1.0009
   1                        1.0091
  10                        1.0989
  50                        1.8182
  90                        5.2632
 100                       10
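If you would like to reproduce this table (or generate data for the
graph) yourself, here is a small Python sketch; the function name
speedup is my own, not something from the solution PDF:

```python
# Net speedup when G percent of the code is vectorized and the
# vector unit gives a P-fold speedup (here P = 10).
def speedup(G, P=10):
    F = G / 100.0                  # fraction of code that is vectorizable
    return 1.0 / (1 - F + F / P)

# Reproduce the table above.
for G in [0, 0.1, 1, 10, 50, 90, 100]:
    print(f"{G:>6}  {speedup(G):.4f}")
```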
Oddly enough, an example on a page about how to use gnuplot includes
this very graph! The page itself was unavailable, so I viewed it
through Google's cache at:
http://216.239.51.100/search?q=cache:BqeynDIQvEMC:www.cs.virginia.edu/helpnet/Authoring_Tools/gnuplot/+gnuplot+percent*+vectorization&hl=en&ie=UTF-8
b:
Speedup = 1/(1-.009 G)
1-.009G = 1/Speedup
G=(1-1/Speedup)/.009
For a speedup of 2, we get:
G = (1 - 1/2)/.009
  = .5/.009
  = 55.5556
Thus, a percent vectorization of 55.6% will be necessary to achieve a
speedup of 2.
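To check part b numerically, here is a short Python sketch that
inverts the formula for G; the function name percent_vectorization is
my own:

```python
# Invert Speedup = 1/(1 - .009*G) to get the percent vectorization G
# required for a given net speedup S (with P = 10).
def percent_vectorization(S):
    return (1 - 1.0 / S) / 0.009

print(percent_vectorization(2))   # about 55.56
```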
c:
If a speedup of 2 is achieved, then 55.6% of the code is vectorized.
Suppose the original code takes 100 seconds. Then 100-55.6 = 44.4 % of
the code will run unvectorized, which will take 44.4 seconds. We know,
since a speedup of 2 is achieved that the total run time is 50
seconds. Hence, 50 - 44.4 = 5.6 seconds are spent in vectorized code.
Thus, 5.6/50, or about 11.1%, of the runtime will be spent running
vectorized code.
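Here is a quick Python check of part c (my own sketch, using the same
hypothetical 100-second program as above):

```python
P = 10
G = (1 - 1.0/2) / 0.009            # percent vectorized, from part b
original = 100.0                   # seconds, without vectorization

scalar_time = original * (1 - G/100)   # about 44.4 s, runs unchanged
vector_time = original * (G/100) / P   # about 5.6 s after 10x speedup
total = scalar_time + vector_time      # about 50 s, i.e. a speedup of 2

print(vector_time / total)             # about 0.111, i.e. 11.1% of runtime
```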
d:
The maximum speedup is 10, as we saw above, so half the maximum
speedup is 5.
As in part b, we have:
G = (1 - 1/Speedup)/.009
  = (1 - 1/5)/.009
  = .8/.009
  = 88.89%
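And a quick check of part d in the same style (again, my own sketch):

```python
# Half the maximum speedup of 10 is 5; solve for G and then verify
# by plugging back into the forward formula.
G = (1 - 1.0/5) / 0.009          # about 88.89 percent
check = 1.0 / (1 - 0.009 * G)    # should recover a speedup of 5
print(G, check)
```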
e:
Let P' be the new speedup on vectorizable code.
If we double the hardware speed, then P' = 20, so the speedup is:
1/(1 - F + (F/20))
Consider a program that runs unoptimized in 100 seconds.
In the current hardware/compiler configuration, the speedup is:
1/(1 - .7 + .7/10)
= 1/(.3 + .07)
= 2.7
Hence, this program takes
100/2.7 seconds, or 37 seconds to run currently.
On the other hand if we use the hardware speedup, then the speedup is:
1/(1 - .7 + .7/20)
= 1/(.3 + .035)
= 2.9851
So that the total time is:
100/2.9851 = 33.5 seconds .
The percent improvement in runtime is (37 - 33.5)/37, or about 9.5%.
Now, we want to know: what percent vectorization would result in such
a speedup with the current hardware?
We just plug in our formula from part b:
G = (1 - 1/Speedup)/.009
  = (1 - 1/2.9851)/.009
  = 73.9%
Originally our percent vectorization was 70%. Hence, the percent
increase in vectorization is:
3.9/70 = about 5.6%
So trying to improve the compiler a little bit seems like a natural
choice, and is the answer they are looking for.
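The whole comparison in part e can be checked with a few lines of
Python (my own sketch; the numbers match the hand calculation above):

```python
def speedup(F, P):
    # Net speedup when fraction F of the code is vectorized P-fold.
    return 1.0 / (1 - F + F / P)

current  = speedup(0.70, 10)   # about 2.70: the current configuration
hardware = speedup(0.70, 20)   # about 2.99: with doubled vector hardware

# Percent vectorization needed to match the hardware option with P = 10:
G_needed = (1 - 1.0 / hardware) / 0.009
print(current, hardware, G_needed)   # G_needed is about 73.9
```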
(In real life, things are more complicated. For example, sometimes
"important" codes, codes that rich customers really need or want or
will pay to have sped up, are the most highly vectorizable; and it is
more important to speed these up than to speed up other codes. For
example, suppose you are a supercomputer vendor who sells 10 machines
each year at a cost of 20 million dollars each. If two of your
customers are using your machines to do a particular computational
fluid dynamics simulation that is 99.9% vectorizable, then maybe
investing in hardware IS the best choice.
Furthermore, remember that those customers who really want to improve
vectorizability can often tweak codes by hand; but the fluid dynamics
people have already done all the hand-tuning to increase vectorization
they need).