Hello Questionaskcomputers,
Perhaps the most complete and straightforward reference is at
http://packetstormsecurity.nl/programming-tutorials/Assembly/fpuopcode.html
this appears to be duplicated at
http://tdenis.tripod.com/download/fpuopcode.txt
and
http://pheatt.emporia.edu/courses/1997/cs561f97/hand25.htm
They all appear to have been created by "John Allen" (view the web page
source), perhaps from Northrop Grumman - see
http://packetstormsecurity.nl/0410-advisories/FakeRedhatPatchAnalysis.txt
for a message on one of those sites that mentions that name, but that
is speculation on my part.
Let me explain the first reference and how I matched it to some
technical articles written in the 1980's to verify that the timing
appears to be correct.
Scroll down to FMUL for a complete example. The values in the table
are in CPU cycles - so for an 8 MHz 8087, each cycle takes 0.125
microseconds. Converting to microseconds, the values for the 8087
would be calculated as:
8087 in microseconds
fmul reg s     11.250 to 13.125
fmul reg       16.250 to 18.125
fmul mem32     13.750 to 15.625 + EA
fmul mem64     19.250 to 21.000 + EA
fmulp reg s    11.750 to 13.500
fmulp reg      16.750 to 18.500
where the "s" suffix indicates a value with 40 trailing zeros in the
fraction and EA refers to the time to calculate the effective address.
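If you would rather script the conversion than do it by hand, here is a minimal Python sketch of the same arithmetic. The cycle counts 90/105 and 130/145 are the ones quoted elsewhere in this answer; the other pairs are back-computed from the microsecond values above assuming 8 MHz, and the names FMUL_CYCLES, cycles_to_microseconds and fmul_table_us are my own, not from the reference page.

```python
# Cycle ranges (min, max) for the 8087 FMUL variants.
# Assumption: values other than 90/105 and 130/145 are back-computed from the
# microsecond table at 8 MHz (e.g. 13.750 us * 8 cycles/us = 110 cycles).
FMUL_CYCLES = {
    "fmul reg s": (90, 105),
    "fmul reg": (130, 145),
    "fmul mem32": (110, 125),   # plus EA
    "fmul mem64": (154, 168),   # plus EA
    "fmulp reg s": (94, 108),
    "fmulp reg": (134, 148),
}


def cycles_to_microseconds(cycles, clock_hz):
    """Time in microseconds taken by `cycles` clock cycles at `clock_hz`."""
    return cycles / clock_hz * 1_000_000


def fmul_table_us(clock_hz=8_000_000):
    """Convert every cycle range in FMUL_CYCLES to a microsecond range."""
    return {
        name: (cycles_to_microseconds(lo, clock_hz),
               cycles_to_microseconds(hi, clock_hz))
        for name, (lo, hi) in FMUL_CYCLES.items()
    }


for name, (lo, hi) in fmul_table_us().items():
    print(f"{name:12s} {lo:7.3f} to {hi:7.3f}")
```

Passing a different clock_hz (for example 4_700_000) gives the same table for a slower part.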
From the integer page at
http://packetstormsecurity.nl/programming-tutorials/Assembly/opcode.html
the time for effective address calculations is as follows.
EA = cycles to calculate the Effective Address
8088/8086:
base  = 5    BP+DI or BX+SI = 7    BP+DI+disp or BX+SI+disp = 11
index = 5    BX+DI or BP+SI = 8    BX+DI+disp or BP+SI+disp = 12
disp  = 6    segment override = +2
A reasonable value to use is 8-10 cycles on average (1 to 1.25
microseconds at 8 MHz), so adding a microsecond to the values in my
table would be a quick approximation to the typical execution time.
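As a rough illustration of how the EA cost adds to a memory-operand timing, here is a small Python sketch. The EA_CYCLES values are copied from the table above; the function names and the 154-cycle lower bound for fmul mem64 (back-computed from 19.250 microseconds at 8 MHz) are my own assumptions.

```python
# 8088/8086 effective-address costs in cycles, from the table above.
EA_CYCLES = {
    "base": 5,
    "index": 5,
    "disp": 6,
    "BP+DI": 7, "BX+SI": 7,
    "BX+DI": 8, "BP+SI": 8,
    "BP+DI+disp": 11, "BX+SI+disp": 11,
    "BX+DI+disp": 12, "BP+SI+disp": 12,
}
SEGMENT_OVERRIDE_CYCLES = 2


def ea_cycles(mode, segment_override=False):
    """Cycles needed to compute the effective address for `mode`."""
    cycles = EA_CYCLES[mode]
    if segment_override:
        cycles += SEGMENT_OVERRIDE_CYCLES
    return cycles


def mem_operand_time_us(base_cycles, mode, clock_hz=8_000_000,
                        segment_override=False):
    """Instruction time in microseconds including the EA calculation."""
    total_cycles = base_cycles + ea_cycles(mode, segment_override)
    return total_cycles / clock_hz * 1_000_000


# Assumed lower bound for fmul mem64 (154 cycles) with a BX+SI operand.
print(mem_operand_time_us(154, "BX+SI"))
```

With the 7-cycle BX+SI mode this adds a little under a microsecond at 8 MHz, in line with the quick approximation above.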
To convert to FLOPS, do a calculation like
1/.000020 (for a typical 64 bit multiply)
to get 50,000 FLOPS for 64 bit multiplies.
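The same conversion as a one-line Python helper, in case it is handy; the function name is hypothetical and the 20 microsecond figure is just the typical value used above.

```python
def flops_from_microseconds(time_us):
    """Operations per second if each operation takes `time_us` microseconds."""
    return 1_000_000 / time_us


# A typical 64 bit multiply at about 20 microseconds.
print(round(flops_from_microseconds(20.0)))  # 50000
```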
If you have Microsoft Excel, I suggest a simple spreadsheet; something like
A1 = 'HZ'
B1 = 8000000
(select B1, use Insert -> Name... -> Define and define the name HZ to
refer to cell B1)
A2 = 'CycleTime'
B2 = =1/HZ
Then you can define rows like
A5 = (Name of instruction) [e.g., fmul reg s]
B5 = (lower value of cycles) [e.g., 90]
C5 = (upper value of cycles) [e.g., 105]
D5 = =B5*CycleTime
E5 = =C5*CycleTime
F5 = =1/D5
G5 = =1/E5
to compute the values needed. You can then vary the HZ value to get
results for another CPU speed (e.g., 4700000 for 4.7 MHz).
To make it look nice, I formatted columns D and E to be a number with
10 digits after the decimal point and columns F and G to be a number
with 0 digits after the decimal point.
I have a spreadsheet built like this but cannot post it directly on
Google Answers. Let me know if you want a copy and I can put it in a
public location for you to download.
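If you do not have Excel, the same layout can be sketched in a few lines of Python. This is only an approximation of my spreadsheet, not a copy of it; the function name make_row and the dictionary keys are made up for illustration, and the comments show which cell each value corresponds to.

```python
def make_row(name, min_cycles, max_cycles, hz=8_000_000):
    """Build one spreadsheet-style row for an instruction at clock rate `hz`."""
    cycle_time = 1 / hz                      # B2: =1/HZ
    min_seconds = min_cycles * cycle_time    # D5: =B5*CycleTime
    max_seconds = max_cycles * cycle_time    # E5: =C5*CycleTime
    return {
        "name": name,                        # A5
        "min_cycles": min_cycles,            # B5
        "max_cycles": max_cycles,            # C5
        "min_seconds": min_seconds,
        "max_seconds": max_seconds,
        "flops_at_min_cycles": 1 / min_seconds,  # F5: =1/D5
        "flops_at_max_cycles": 1 / max_seconds,  # G5: =1/E5
    }


row = make_row("fmul reg s", 90, 105)
# Seconds with 10 decimal places and FLOPS with none, as in the spreadsheet.
print(f"{row['name']}: {row['min_seconds']:.10f} {row['max_seconds']:.10f} "
      f"{row['flops_at_min_cycles']:.0f} {row['flops_at_max_cycles']:.0f}")
```

Changing the hz argument plays the same role as editing cell B1.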
The technical articles I found were from a series titled "DTACK GROUNDED",
which ran several articles in the early 1980s that describe the
development of the 8087 and include some timing information. Search for
"DTACK GROUNDED"
to find archives at
http://www.amigau.com/68K/dg/dg.htm
and
http://linux.monroeccc.edu/~paulrsm/dg/dg.htm
I find it interesting to note that this series of journals is for
"simple 68000 systems" yet has quite a bit of performance data for
Intel processors as well.
See #17
http://linux.monroeccc.edu/~paulrsm/dg/dg17.htm
and scroll down to page 1, column 2 for an article titled "Math Chips
Revisited". It talks about a claim made by Intel that a floating point
multiply can be done in 19 microseconds (and scoffs at the claim).
Note if you set HZ to 4.7 MHz and use the 90 cycles from fmul reg s,
you get 19.149 microseconds - pretty close to the claimed 19
microseconds.
The article claims that an fpmul takes 27 microseconds. You can get
this result by setting HZ to 4.7 MHz and using the 130 cycles from fmul
reg to get 27.660 microseconds - again close to the claimed 27
microseconds.
As a cross check, see #19
http://linux.monroeccc.edu/~paulrsm/dg/dg19.htm
and scroll down to page 8, column 1 for an article titled "Truth In
Advertising". It indicates an Intel advertisement for 1.7 microseconds
(should be 17 microseconds) for an 8 MHz 8087 for a double precision
multiply. Adjusting HZ to 8 MHz and looking at the fmul values, I see
a range of 16.250 to 18.125 (as I showed above), so 17 microseconds
matches the values I calculated.
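These cross checks are easy to repeat in Python. The sketch below is my own, with a hypothetical helper name and list layout; the cycle counts, clock rates and claimed times are the ones discussed above.

```python
def time_us(cycles, clock_hz):
    """Execution time in microseconds for `cycles` cycles at `clock_hz`."""
    return cycles / clock_hz * 1_000_000


# (description, cycles, clock in Hz, claimed microseconds)
CHECKS = [
    ("Intel claim quoted in #17", 90, 4_700_000, 19),
    ("DTACK figure in #17", 130, 4_700_000, 27),
    ("Intel ad in #19, fmul lower bound", 130, 8_000_000, 17),
    ("Intel ad in #19, fmul upper bound", 145, 8_000_000, 17),
]

for description, cycles, clock_hz, claimed_us in CHECKS:
    computed = time_us(cycles, clock_hz)
    print(f"{description}: computed {computed:.3f} us, claimed {claimed_us} us")
```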
Another comparative reference is at
http://www.geocities.com/SiliconValley/Bay/9187/
which describes "FLOPS" for the 8087 in 1983 as roughly 30,000 - the
data I calculated for timing can justify that kind of value.
To answer your specific question for performance of an 8087 for 32 bit
floating point add, multiply, and divide, some typical values are:
                Cycles                     Seconds             FLOPS      Microseconds
       Minimum Maximum       Minimum       Maximum  Minimum  Maximum      Min      Max
fadd        70     100  0.0000087500  0.0000125000   114286    80000     8.75     12.5
fdiv       193     203  0.0000241250  0.0000253750    41451    39409   24.125   25.375
fmul       130     145  0.0000162500  0.0000181250    61538    55172    16.25   18.125
The above are for an 8 MHz 8087. The FLOPS columns correspond to the
minimum and maximum cycle counts, so the "Minimum" column holds the
higher rate. Similar results can be calculated for other execution rates.
The searches I used for this answer included:
8087 floating point multiply timing
8087 cycles floating point multiply add divide
8087 fmul fdiv fadd microseconds
8087 fmul fdiv fadd cycles
"FPU instruction timings" (to find multiple sources of the reference data)
[a couple of fruitless searches... had some interesting information but
no complete results]
8087 timing site:intel.com
8087 cycles site:intel.com
If some part of the answer is unclear, you need further information,
or want the spreadsheet I prepared, please make a request for
clarification. I would be glad to help you as needed.
--Maniac