I would like to optimize out the kazoo (i.e., -O3 plus whatever other
optimizations I can indicate) EXCEPT that there are three expressions which
I positively need to have executed in a particular order, with all memory
side effects of each complete before the next begins. The first is a while
statement, the second is a switch statement, and the third is a multiple
assignment (via the comma operator). Moreover, the structure of the code
is
t:
statement1;
statement2;
statement3;
goto t;
Unfortunately,
gcc -O3 -mmmx -msse -fmove-all-movables -funroll-loops
-fschedule-insns2 -fexpensive-optimizations
-fdelete-null-pointer-checks -frerun-loop-opt -fstrength-reduce
-finline-limit=100000
has the nasty effect of violating my requirement for sequential execution
and completion of side effects between the three statements.
Can someone explain:
1) What about the compilation is messing with the order and side effects
as defined above? (And please, don't say "just use -O"; I already know
that works. I want to know what the rules are so that I can deduce
how a given level of optimization behaves!)
2) How, at any level of optimization, can one explicitly specify that
certain expressions are to be completed in a specific order?
Unfortunately, I cannot provide specific details as to what exactly
the code is.
Request for Question Clarification by maniac-ga on 01 Oct 2004 12:00 PDT
Hello Ignorant,
I am still looking for a more complete answer to the first question but
http://freshmeat.net/articles/view/730/
has a concise summary of what -O2 and -O3 do and do not do. A code
example would help me give the "right" answer, but if you can't
post that - I understand. Note that -O3 may not be "faster" than -O2
due to system cache effects. You really need to measure the
application to see which works best for you.
The easiest method I can think of to answer your second question is to
call an externally defined function between the statements whose
execution you want to serialize. Something like:
statement1;
f();
statement2;
f();
statement3;
where f is a function that is neither pure nor const (has possible
side effects). Since you are also asking for memory side effects to be
complete, f() should be some kind of memory barrier. For example, in
the Linux kernel, a call to one of rmb(), wmb(), mb() will serialize
access to memory in various ways (read, write, both).
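To make that concrete, here is a minimal C sketch of the arrangement.
Since I do not know your real statements, the globals (flag, a, b, c)
are hypothetical stand-ins; the point is only the placement of f().
Because f() is defined in a separate translation unit, gcc must assume
it can read or write any global memory, so it cannot move the three
statements across the calls. (Locals whose address never escapes may
still stay in registers, and this constrains the compiler only, not
the hardware.)

  extern void f(void);       /* defined in another .c file; gcc cannot
                                inline it or prove it side-effect free  */
  extern volatile int flag;  /* hypothetical data, stand-ins only       */
  extern int a, b, c;

  void run(void)
  {
  t:
      while (flag == 0)      /* "statement1": the while statement       */
          ;
      f();                   /* stores to globals must be complete here */
      switch (a) {           /* "statement2": the switch statement      */
      case 0:  b = 1; break;
      default: b = 2; break;
      }
      f();
      c = b, a = c + 1;      /* "statement3": comma-operator assignment */
      f();
      goto t;
  }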
Please use a question clarification to indicate if this kind of
solution is acceptable so I can prepare a more complete answer. If it
is not suitable, please indicate the additional constraints that make
this approach inappropriate.
--Maniac
Clarification of Question by ignorant-ga on 04 Oct 2004 06:21 PDT
The url http://freshmeat.net/articles/view/730/ is a nice general place to
start, but I am specifically interested in when (that is, under what
conditions) the order of statements gets permuted and/or their side effects
do not take place in the expected order (which is the order given in the
source). Also, even if that article helps, I may be too ignorant to realize
it... for example it speaks of "reorder blocks", but what precisely does
that mean, and when exactly does that take place (as opposed to being
*eligible* to take place, which I presume is what -O3 does; it enables
certain optimizations)?
I am ignorant of "memory barriers", but suspect they might be an answer for me.
I was hoping there were some sort of flags or compiler directives or something
(and perhaps these "memory barriers" are that something) which allow programmer
control over reordering/side effects.
Interleaving function calls like
statement1;
f();
statement2;
f();
statement3;
would be acceptable if there were no overhead involved with f(), since
it is not, strictly speaking, f() that is the point of the computation and
since speed is a concern. Perhaps there is some very fast executing f()
which suffices for my serialization requirement, and in that case knowing
what it was would definitely be useful.
Request for Question Clarification by maniac-ga on 04 Oct 2004 18:09 PDT
Hello Ignorant,
Well, there is a nice summary of block reordering at
http://www.gnu.org/software/gcc/news/reorder.html
which shows the generated code for a complex set of if statements within a
loop, both with and without block reordering. The reordering is done to
improve branch prediction and to help keep unused code from being loaded
into the instruction cache [both improve code performance].
Basically, the optimizer will often reorder operations that it
determines are not "dependent" on one another, based on its knowledge
of how the CPU processes the work. For example, if you have code doing
both floating and
integer arithmetic, it may be better to interleave the two types like
this:
Fop Iop Iop Fop Iop Iop Fop
[Fop refers to Floating operation, Iop refers to Integer operation]
since that spreads out the work to multiple parts of the CPU. Note
that if you do a series of operations with several variables that are
relatively independent, those are also candidates for reordering.
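As a small illustration (hypothetical code, not from your program): the
floating-point chain and the integer chain below have no data dependence
on each other, so with -O2/-O3 the instruction scheduler is free to
interleave or reorder them even though the source lists them one after
the other:

  /* Two independent dependence chains; the scheduler may interleave
     the float and integer operations in the emitted code. */
  double mix(double x, double y, int i, int j)
  {
      double d = x * y + 2.0;   /* floating-point chain      */
      double e = d * x;
      int    k = i + j;         /* integer chain             */
      int    m = k << 2;
      return e + (double)m;     /* the chains join only here */
  }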
A technical paper on the kinds of optimizations that can be performed
(not limited to gcc) is at
http://www.cs.washington.edu/research/projects/cecil/www/pubs/pldi03.html
which discusses both the methods used in typical compilers and how
to "prove" those methods preserve the semantics of the original C code.
If you are using an X86 processor, the main difference between -O2 and
-O3 is the function inlining. The other major effect (register
renaming) is beneficial on machines with a LOT of registers; the
freshmeat article makes that point. As for function inlining, I
do not recommend it unless:
- you have measured the performance before and after inlining
- you have a CPU with large instruction caches
The code bloat you get with inlining can be quite sizeable. I have
even seen C++ coding standards that basically make similar statements
such as:
http://www.doc.ic.ac.uk/lab/cplus/c++.rules/
or more specifically:
http://www.doc.ic.ac.uk/lab/cplus/c++.rules/chap6.html#sect1
http://www.doc.ic.ac.uk/lab/cplus/c++.rules/chap7.html#sect2
http://www.doc.ic.ac.uk/lab/cplus/c++.rules/chap9.html#sect5
http://www.doc.ic.ac.uk/lab/cplus/c++.rules/chap4.html#Example1
which recommend function inlining only for very restricted uses [those
where the function call overhead is more than the cost of performing
the function inline].
Note also that with gcc, function inlining may cause the compiler to
consider many more code blocks for reordering (making your ordering
problem "worse").
Going to the second part of the question, a "memory barrier" is
necessary whenever you need to serialize the references to memory.
Adding a memory barrier can slow down the code execution by some
amount but is sometimes required to guarantee the proper memory
sequence. There are a number of CPUs or memory systems that do not
guarantee that the order of memory writes will be the same as the order
of statement execution. The DEC Alpha comes to mind; see
http://h71000.www7.hp.com/doc/72final/6459/6459pro_009.html#rwo_sec
which has a short explanation of the effects of two Alpha processors
reading / writing the same location in memory.
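A common way this shows up, as a hypothetical sketch written in Linux
kernel style (since that is where the rmb()/wmb() calls mentioned above
live): one CPU fills in a value and then sets a flag, while the other
CPU waits for the flag and then reads the value. On a machine like the
Alpha the flag store can become visible before the data store unless a
write barrier sits between them, and the reader needs a read barrier
for the same reason:

  #include <asm/system.h>   /* rmb(), wmb() - per the LDD chapter below */

  int shared_data;          /* shared between two CPUs (sketch only)    */
  volatile int shared_ready;

  void producer(int value)
  {
      shared_data = value;  /* fill in the data first                   */
      wmb();                /* force the data write to be visible
                               before the flag write                    */
      shared_ready = 1;     /* then publish the flag                    */
  }

  int consumer(void)
  {
      while (shared_ready == 0)
          ;                 /* wait for the flag                        */
      rmb();                /* on the Alpha the reader also needs a
                               read barrier before using the data       */
      return shared_data;
  }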
You can get similar problems when working with I/O devices. A common
operation is to "set up" the I/O device by writing to a few I/O
locations and then starting the operation with another write to a
control location. If you really want to guarantee the setup is done
before the I/O start, add a write memory barrier before the I/O start
write. There is a reasonably short explanation of this type of effect at
http://www.cs.pdx.edu/~jrb/ui/linux/driver8.txt
Scroll down to "I/O registers and conventional memory" which
recommends use of both volatile declarations and memory barrier
function calls. This appears to be a set of lecture notes related to
the Linux Device Drivers book which is on line at
http://www.xml.com/ldd/chapter/book/
or more specifically referring to
http://www.xml.com/ldd/chapter/book/ch08.html
This latter reference, for example, indicates that on the X86 memory
reads can be freely reordered but memory writes are always ordered.
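Put together for the I/O case, the set-up / barrier / start pattern from
those driver notes looks roughly like this in C. It is a sketch only:
the register offsets and the base pointer are made-up names, and wmb()
is the kernel write barrier from <asm/system.h> used above.

  #define DEV_SRC   0   /* hypothetical set-up registers */
  #define DEV_LEN   1
  #define DEV_START 2   /* hypothetical control register */

  void start_io(volatile unsigned int *regs,
                unsigned int src, unsigned int len)
  {
      regs[DEV_SRC] = src;   /* set up the transfer                  */
      regs[DEV_LEN] = len;
      wmb();                 /* make sure the set-up writes reach
                                the device before the start write    */
      regs[DEV_START] = 1;   /* only then kick off the operation     */
  }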
You should also see
http://gcc.gnu.org/onlinedocs/gcc/Volatiles.html#Volatiles
which describes when gcc will access a volatile variable. This can
have some effects on the instruction stream as well. Since it refers
to sequence points, see
http://publications.gbdirect.co.uk/c_book/chapter8/sequence_points.html
for an explanation of that concept. I also suggest a look at
http://publications.gbdirect.co.uk/c_book/chapter8/const_and_volatile.html
which describes volatile variables in more detail.
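As a last small illustration (hypothetical register names): declaring an
object volatile tells gcc that every access written in the source must
actually be performed, and that accesses to volatile objects stay in
source order with respect to one another across sequence points. It says
nothing about what the hardware then does with those accesses, which is
why the driver notes above pair volatile with the barrier calls.

  extern volatile unsigned int status_reg;    /* hypothetical status    */
  extern volatile unsigned int a_reg, b_reg;  /* hypothetical controls  */

  void wait_then_program(void)
  {
      /* volatile: gcc must re-read status_reg on every iteration
         instead of caching it in a register and spinning forever      */
      while ((status_reg & 0x1) == 0)
          ;
      /* volatile: these two stores are kept in this order by gcc,
         separated as they are by a sequence point                     */
      a_reg = 1;
      b_reg = 2;
  }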
Please use another question clarification to indicate if your problem
is solved or other conditions I need to explain.
--Maniac
Clarification of Question by ignorant-ga on 05 Oct 2004 15:44 PDT
I think you have got me on the right track.
Question answered.
Thanks.