Posted on 2009-11-9 08:49:05
http://www.linuxfromscratch.org/ ... October/023106.html
=> http://nickclifton.livejournal.com/
=> http://nickclifton.livejournal.com/4128.html
GNU Toolchain Update, October 2009
- Oct. 19th, 2009 at 10:46 AM

Hi Guys,

Well the major news this month is that a big new feature has been
added to gcc: Link-Time Optimization.

When this feature is enabled (via the -flto command line option) gcc
interrupts the processing of a source file after it has converted
it into the GIMPLE format (one of GCC's internal representations).
Then, before carrying on with its optimizations, gcc writes the
GIMPLE out into special sections in the output object file.
After that gcc carries on as normal to optimize the GIMPLE and then
convert it into machine instructions which go into the normal
sections in the object file.

When object files containing these special GIMPLE sections are
linked together they can be read in and optimized before the final
link actually takes place. This allows for greater optimization
opportunities, especially with inter-procedural optimizations.

To use the link-time optimizer, -flto needs to be specified at both
compile time and during the final link. For example:

    gcc -c -O2 -flto foo.c
    gcc -c -O2 -flto bar.c
    gcc -o myprog -flto -O2 foo.o bar.o

Another (simpler) way to enable link-time optimization is:

    gcc -o myprog -flto -O2 foo.c bar.c

Note that when a file is compiled with -flto, the generated object
file will be larger than a regular object file because it will
contain GIMPLE bytecodes and the usual final code. This means that
object files with LTO information can be linked as a normal object
file. So, in the previous example, if the final link is done with:

    gcc -o myprog foo.o bar.o

the only difference will be that no inter-procedural optimizations
will be applied to produce "myprog". The two object files foo.o and
bar.o will be simply sent to the regular linker.

Additionally, the optimization flags used to compile individual
files are not necessarily related to those used at link-time. For
instance:

    gcc -c -O0 -flto foo.c
    gcc -c -O0 -flto bar.c
    gcc -o myprog -flto -O3 foo.o bar.o

This will produce individual object files with unoptimized assembler
code, but the resulting binary "myprog" will be optimized at -O3.
Now, if the final binary is generated without -flto, then "myprog"
will not be optimized.

When producing the final binary with -flto, GCC will only apply
link-time optimizations to those files that contain bytecodes.
Therefore, you can mix and match object files and libraries with
GIMPLE bytecodes and final object code. GCC will automatically
select which files to optimize in LTO mode and which files to link
without further processing.

There are some code generation flags that GCC will preserve when
generating bytecodes, as they need to be used during the final link
stage. Currently, the following options are saved into the GIMPLE
bytecode files: -fPIC, -fcommon and all the -m target flags.
At link time, these options are read-in and reapplied. Note that
the current implementation makes no attempt at recognizing
conflicting values for these options. If two or more files have a
conflicting value (e.g., one file is compiled with -fPIC and another
isn't), the compiler will simply use the last value read from the
bytecode files. It is recommended, then, that all the files
participating in the same link be compiled with the same options.

Another feature of LTO is that it is possible to apply
interprocedural optimizations on files written in different
languages. This requires some support in the language front end.
Currently, the C, C++ and Fortran front ends are capable of emitting
GIMPLE bytecodes, so something like this should work:

    gcc -c -flto foo.c
    g++ -c -flto bar.cc
    gfortran -c -flto baz.f90
    g++ -o myprog -flto -O3 foo.o bar.o baz.o -lgfortran

Notice that the final link is done with g++ to get the C++ runtime
libraries and -lgfortran is added to get the Fortran runtime
libraries. In general, when mixing languages in LTO mode, you
should use the same link command used when mixing languages in a
regular (non-LTO) compilation. This means that if your build
process was mixing languages before, all you need to add is
-flto to all the compile and link commands.

If object files containing GIMPLE bytecode are stored in a library
archive, say libfoo.a, it is possible to extract and use them
in an LTO link if you are using gold as the linker (which, in turn,
requires GCC to be configured with --enable-gold). To enable this
feature, use the command line option -use-linker-plugin at
link-time. E.g.:

    gcc -o myprog -O2 -flto -use-linker-plugin a.o b.o -lfoo

With the linker plugin enabled, gold will extract the needed GIMPLE
files from libfoo.a and pass them on to the running GCC to make them
part of the aggregated GIMPLE image to be optimized.

If you are not using gold and/or do not specify -use-linker-plugin
then the objects inside libfoo.a will be extracted and linked as
usual, but they will not participate in the LTO optimization
process.

Link time optimizations do not require the presence of the whole
program to operate. If the program does not require any symbols to
be exported, it is possible to combine -flto with -fwhole-program to
allow the interprocedural optimizers to use more aggressive
assumptions which may lead to improved optimization opportunities.

Regarding portability: the current implementation of LTO makes no
attempt at generating bytecode that can be ported between different
types of hosts. The bytecode files are versioned and there is a
strict version check, so bytecode files generated in one version of
GCC will not work with an older/newer version of GCC.

One problem with link time optimization is that it can require a lot
of computer resources (memory and processing time). For large
programs this can be a problem. One solution is to use the new
-fwhopr command line option. This option is identical in
functionality to -flto but it differs in how the final link stage is
executed. Instead of loading all the function bodies in memory, the
callgraph is analyzed and optimization decisions are made (whole
program analysis or WPA). Once optimization decisions are made, the
callgraph is partitioned and the different sections are compiled
separately (local transformations or LTRANS).

Cheers
Nick