Q: Controlling compilation by gcc of C code (Answered, 5 out of 5 stars, 0 Comments)
Question  
Subject: Controlling compilation by gcc of C code
Category: Computers > Programming
Asked by: ignorant-ga
List Price: $10.00
Posted: 01 Oct 2004 06:14 PDT
Expires: 31 Oct 2004 05:14 PST
Question ID: 408843
I would like to optimize out the kazoo (i.e., -O3 plus whatever other
optimizations I can indicate) EXCEPT there are three expressions which
I positively need to have executed in a particular order and all memory
side effects of each must be complete before the next.  The first is a while
statement, the second is a switch statement, and the third is a multiple
assignment (via the comma operator).  Moreover, the structure of the code
is

t:
  statement1;
  statement2;
  statement3;
  goto t;

Unfortunately, 
  gcc -O3 -mmmx -msse -fmove-all-movables -funroll-loops
-fschedule-insns2 -fexpensive-optimizations
-fdelete-null-pointer-checks -frerun-loop-opt -fstrength-reduce
-finline-limit=100000

has the nasty effect of violating my requirement for sequential execution
and completion of side effects between the three statements.

Can someone explain:
1) What about the compilation is messing with the ordering and side
   effects as defined above? (And please, don't say "just use -O"; I
   already know that works. I want to know what the rules are so that
   I can deduce how a given level of optimization behaves!)
2) How, at any level of optimization, can one explicitly specify that
   certain expressions are to be completed in a specific order?

Unfortunately, I cannot provide specific details as to what exactly
the code is.

Request for Question Clarification by maniac-ga on 01 Oct 2004 12:00 PDT
Hello Ignorant,

I am still looking for a more complete answer to the first question but
  http://freshmeat.net/articles/view/730/
has a concise summary of what -O2 and -O3 do and do not do. Providing
a code example would help me provide the "right" answer, but if you
can't post that, I understand. Note that -O3 may not be "faster" than
-O2 due to system cache effects. You really need to measure the
application to see which works best for you.

The easiest method I can think of to answer your second question is to
call an externally defined function between each of the statements
whose execution you want to serialize. Something like:
  statement1;
  f();
  statement2;
  f();
  statement3;
where f is a function that is neither pure nor const (i.e., it has
possible side effects). Since you are also asking for memory side
effects to be complete, f() should act as some kind of memory barrier.
For example, in the Linux kernel, a call to one of rmb(), wmb(), or
mb() will serialize access to memory in various ways (reads, writes,
or both).
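
For concreteness, here is a minimal sketch of such a function in its
own file (the names serialize.c, f, and serialize_sink are just
placeholders, not anything from your code):

  /* serialize.c -- compiled separately, so the optimizer cannot see
     into f() at the call site */
  int serialize_sink;

  void f(void)
  {
      /* a genuine global side effect, so f() is neither pure nor const */
      serialize_sink = !serialize_sink;
  }

In the file containing the loop you would declare "extern void f(void);"
and call f() between the three statements exactly as shown above.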

Please use a question clarification to indicate if this kind of
solution is acceptable so I can prepare a more complete answer. If it
is not suitable, please indicate the additional constraints that make
this approach inappropriate.

  --Maniac

Clarification of Question by ignorant-ga on 04 Oct 2004 06:21 PDT
The url http://freshmeat.net/articles/view/730/ is a nice general place to
start, but I am specifically interested in when (i.e., under what
conditions) the order of statements gets permuted and/or their side
effects do not take place in the expected order (which is the order
given in the source).  Also,
even if that article helps, I may be too ignorant to realize it... for
example, it speaks of "reorder blocks", but what precisely does that
mean, and when exactly does it take place (as opposed to being
*eligible* to take place, which I presume is what -O3 does: it enables
certain optimizations)?

I am ignorant of "memory barriers", but suspect they might be an answer for me.
I was hoping there were some sort of flags or compiler directives or
something (and perhaps these "memory barriers" are that something)
which allow programmer control over reordering and side effects.

Interleaving function calls like
  statement1;
  f();
  statement2;
  f();
  statement3;
would be acceptable if there were no overhead involved with f(), since
it is not, strictly speaking, f() that is the point of the computation
and since speed is a concern.  Perhaps there is some very fast executing
f() which suffices for my serialization requirement, and in that case
knowing what it was would definitely be useful.

Request for Question Clarification by maniac-ga on 04 Oct 2004 18:09 PDT
Hello Ignorant,

Well, there is a nice summary of block reordering at
  http://www.gnu.org/software/gcc/news/reorder.html
which shows the generated code for a complex set of if statements
within a loop, both with and without block reordering. The reordering
is done to improve branch prediction and to help prevent unused code
from being loaded into the instruction cache [both improve code
performance].

Basically, the optimizer will often reorder code that it determines
is not "dependent", based on its knowledge of how the CPU executes
instructions. For example, if you have code doing both floating-point
and integer arithmetic, it may be better to interleave the two types
like this:
  Fop  Iop  Iop  Fop  Iop  Iop  Fop
[Fop refers to a floating-point operation, Iop to an integer operation]
since that spreads the work across multiple parts of the CPU. Note
that if you do a series of operations on several variables that are
relatively independent, those are also candidates for reordering.
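
As a small illustration (the names here are made up), gcc is free to
interleave or reorder the first two statements below because neither
depends on the other, but the return expression cannot move above the
computation of a:

  double scale(double x, int j, int *out_i)
  {
      double a = x * 2.0;   /* floating-point work                   */
      int    i = j + 1;     /* integer work, independent of a, so the
                               scheduler may interleave the two      */

      *out_i = i;
      return a + 1.0;       /* depends on a, so it must follow the
                               statement that computes a             */
  }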

A technical paper on the kinds of optimizations that can be performed
(not limited to gcc) is at
  http://www.cs.washington.edu/research/projects/cecil/www/pubs/pldi03.html
which discusses both methods used in typical compilers as well as how
to "prove" the methods preserve the semantics of original C code.

If you are using an X86 processor, the main difference between -O2 and
-O3 is the function inlining. The other major effect (register
renaming) is beneficial on machines with a LOT of registers; the
freshmeat article makes that point. As for function inlining, I do not
recommend it unless:
 - you have measured the performance before and after inlining
 - you have a CPU with large instruction caches
The code bloat you get with inlining can be quite sizeable. I have
even seen C++ coding standards that make similar statements, such as:
  http://www.doc.ic.ac.uk/lab/cplus/c++.rules/
or more specifically:
  http://www.doc.ic.ac.uk/lab/cplus/c++.rules/chap6.html#sect1
  http://www.doc.ic.ac.uk/lab/cplus/c++.rules/chap7.html#sect2
  http://www.doc.ic.ac.uk/lab/cplus/c++.rules/chap9.html#sect5
  http://www.doc.ic.ac.uk/lab/cplus/c++.rules/chap4.html#Example1
which recommend function inlining only in very restricted cases [where
the function call overhead exceeds the cost of performing the
function's work inline].

Note also that with gcc, function inlining may cause the compiler to
consider many more code blocks for reordering (making your ordering
problem "worse").

Going to the second part of the question, a "memory barrier" is
necessary whenever you need to serialize references to memory.
Adding a memory barrier can slow down code execution by some amount
but is sometimes required to guarantee the proper memory sequence.
There are a number of CPUs or memory systems that do not guarantee
that the order of memory writes will match the order of statement
execution. The DEC Alpha comes to mind; see
  http://h71000.www7.hp.com/doc/72final/6459/6459pro_009.html#rwo_sec
which has a short explanation of the effects of two Alpha processors
reading and writing the same location in memory.
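
The classic two-CPU pattern that document describes looks roughly like
this (just a sketch; __sync_synchronize() is a full memory barrier
builtin provided by recent gcc releases, and the names are invented):

  int data;                  /* written by one CPU, read by the other */
  volatile int ready;

  void producer(int value)   /* runs on CPU 1 */
  {
      data = value;
      __sync_synchronize();  /* make the store to data visible before
                                the store to ready                    */
      ready = 1;
  }

  int consumer(void)         /* runs on CPU 2 */
  {
      while (!ready)
          ;                  /* volatile forces a re-read each time   */
      __sync_synchronize();  /* on CPUs like the Alpha, a read barrier
                                is also needed before using data      */
      return data;
  }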

You can get similar problems when working with I/O devices. A common
operation is to "set up" the I/O device by writing to a few I/O
locations and then starting the operation with another write to a
control location. If you really want to guarantee the set up is done
before the I/O start, add a write memory barrier before the I/O start
write. There is a reasonably short explanation of this type of effect at
  http://www.cs.pdx.edu/~jrb/ui/linux/driver8.txt
Scroll down to "I/O registers and conventional memory" which
recommends use of both volatile declarations and memory barrier
function calls. This appears to be a set of lecture notes related to
the Linux Device Drivers book, which is online at
  http://www.xml.com/ldd/chapter/book/
or more specifically referring to
  http://www.xml.com/ldd/chapter/book/ch08.html
This latter reference, for example, indicates that on the x86 memory
reads can be freely reordered but memory writes are always ordered.
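
In code, the set-up-then-start pattern looks something like this
sketch (the register addresses are invented, and __sync_synchronize()
again stands in for the kernel's wmb()):

  #include <stdint.h>

  /* hypothetical memory-mapped device registers; real addresses would
     come from the hardware documentation */
  #define DEV_SETUP ((volatile uint32_t *)0x40000000)
  #define DEV_START ((volatile uint32_t *)0x40000004)

  void start_transfer(uint32_t parameters)
  {
      *DEV_SETUP = parameters;   /* write the set up registers        */
      __sync_synchronize();      /* write barrier: the set up must
                                    reach the device before the start */
      *DEV_START = 1;            /* start the operation               */
  }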

You should also see
  http://gcc.gnu.org/onlinedocs/gcc/Volatiles.html#Volatiles
which describes when gcc will access a volatile variable. This can
have some effects on the instruction stream as well. Since this refers
to sequence points, see
  http://publications.gbdirect.co.uk/c_book/chapter8/sequence_points.html
for an explanation of that concept as well. I also suggest a look at
  http://publications.gbdirect.co.uk/c_book/chapter8/const_and_volatile.html
which describes volatile variables in more detail.
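
A short example of why the volatile qualifier matters (the textbook
signal handler case, nothing from your code):

  #include <signal.h>

  /* done is written from a signal handler; without volatile, gcc at
     -O2/-O3 may keep it in a register and the loop below would never
     observe the handler's store */
  volatile sig_atomic_t done = 0;

  static void handler(int sig)
  {
      (void)sig;
      done = 1;
  }

  int main(void)
  {
      signal(SIGINT, handler);
      while (!done)
          ;                      /* re-reads done on every iteration  */
      return 0;
  }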

Please use another question clarification to indicate if your problem
is solved or other conditions I need to explain.

  --Maniac

Clarification of Question by ignorant-ga on 05 Oct 2004 15:44 PDT
I think you have got me on the right track.
Question answered.
Thanks.
Answer  
Subject: Re: Controlling compilation by gcc of C code
Answered By: maniac-ga on 05 Oct 2004 16:49 PDT
Rated: 5 out of 5 stars
 
Hello Ignorant,

Good. Let's recap the original question and the follow-ups in the clarifications.

#1 - I want to know what the rules are so that I can deduce how a
given level of optimization behaves.

The link in the original clarification request:
  http://freshmeat.net/articles/view/730/
has a brief explanation of what gcc does with both -O2 and -O3
optimization. The main difference between the two is the function
inlining and register renaming. Inlining can have a significant effect
on the scope of optimization, allowing the compiler to move more code
around to be "more efficient", but based on your original statement it
may be causing the order of operations on memory to be incorrect.
Another link that provides further information is
  http://gcc.gnu.org/onlinedocs/gcc/index.html
which is the top level of the gcc online manual or more specifically
  http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
which lists the optimization options and which ones are enabled by -O, -O2 and -O3.
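
Since each -O level is largely a bundle of individual -f flags, you can
also start from -O3 and turn selected passes back off with their -fno-
forms, for example (the file name is a placeholder):

  gcc -O3 -fno-inline -fno-schedule-insns2 -c mycode.c

Measuring before and after is still the only reliable way to tell
which pass is disturbing your ordering or your performance.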

There are some additional references available including:
  http://www.gnu.org/software/gcc/news/reorder.html
a summary of code reordering in gcc
  http://www.cs.washington.edu/research/projects/cecil/www/pubs/pldi03.html
a technical paper that describes optimization in general and presents
a validation method to ensure the optimizations preserve the semantic
meaning of the code. Searching with phrases like:
  compiler optimizer method
  compiler optimize description
will bring up several additional references.

#2 - How at any level of optimization can one explicitly specify that
certain expressions are to be completed in a specific order.

There are some steps that can be performed including:
 - adding function calls
 - declaring key variables as volatile
 - adding memory barrier function calls (which may be inlined)

I mentioned the first one as a "quick and dirty" way to ensure code is
executed in order. The only real requirements are that the function:
 - is in a separate compilation unit
 - is neither pure nor const
 - is not inlined
The second requirement means the function has some global side effect.
It could be something as simple as negating a global value. Adding the
function call forces the compiler to complete all statements before
the function call, then call the function, then start the statements
after the function call.
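
Since you asked earlier about a "very fast" f(): one gcc-specific idiom
worth knowing (it is not from your code, and it is a compiler barrier
only, not a CPU memory barrier) is an empty inline assembly statement
with a "memory" clobber. It generates no instructions, but it tells gcc
that memory may have changed, so values cached in registers are
reloaded and memory accesses are not moved across it:

  #define compiler_barrier()  __asm__ __volatile__("" : : : "memory")

    statement1;
    compiler_barrier();
    statement2;
    compiler_barrier();
    statement3;

On processors that reorder memory operations in hardware, you would
still need the real barrier described as the third method below.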

The second method (volatile declarations) is described at:
  http://gcc.gnu.org/onlinedocs/gcc/Volatiles.html#Volatiles
for the gcc compiler specifically or at
  http://publications.gbdirect.co.uk/c_book/chapter8/const_and_volatile.html
for a more general reference. Since there is also some interaction
with "sequence points", I also suggest a look at
  http://publications.gbdirect.co.uk/c_book/chapter8/sequence_points.html
for an explanation of that concept. The idea behind a volatile
variable (or type) is that each access (read or write) really goes to
memory each time the variable is referenced in the code. The compiler
is not allowed to cache the value in a register (or as an intermediate
value in a common subexpression).

The third method (memory barrier) addresses a situation that can occur
with some high-performance processors. These processors do not
guarantee that the order of reads/writes to memory (or an I/O device)
is the same as the order of reads/writes in the code. The reference
describing the Alpha processor at
  http://h71000.www7.hp.com/doc/72final/6459/6459pro_009.html#rwo_sec
is an example where two CPUs in a system can get inconsistent results
depending on subtle timing issues.

The general solution for out-of-order reads/writes is to add a memory
barrier (read, write, or both) between two statements. A read barrier
tells the CPU to complete pending read accesses before completing the
read barrier instruction. A write barrier causes a similar effect for
write accesses. The need for this in Linux device drivers is described
at
  http://www.xml.com/ldd/chapter/book/ch08.html
which discusses how to use both volatile variables and memory barriers
to ensure proper access to I/O devices.

If you have any further problems in this area, please do not hesitate
to make a clarification request. Good luck with your work.

  --Maniac
ignorant-ga rated this answer: 5 out of 5 stars
As usual, maniac-ga delivers!

Comments  
There are no comments at this time.
