GHC HEAD compiled with -fllvm (milestone 4!)

After milestone 3 it is natural to proceed to milestone 4. Unfortunately this was not so easy, and in fact it is not reached yet because of an LLVM issue. Anyway, I started with this mk/build.mk file:

GhcUnregisterised=YES
GhcWithNativeCodeGen=NO
SplitObjs=NO

GhcLibWays=v

GhcStage1HcOpts    =
GhcStage2HcOpts    = -fllvm -opta=-march=armv7a
GhcLibHcOpts       = -fllvm -opta=-march=armv7a
SplitObjs          = NO
HADDOCK_DOCS       = NO
BUILD_DOCBOOK_HTML = NO
BUILD_DOCBOOK_PS   = NO
BUILD_DOCBOOK_PDF  = NO

This basically means that I build the stage1 compiler with the C backend, but the stage2 compiler and its libraries with the LLVM backend. Everything went well until the stage1 compiler attempted to build the stage2 compiler’s Parser.hs file. This file is kind of tricky: it is a Haskell file generated by Happy from the Haskell grammar definition in Happy format. So, well, an automatically generated Haskell file. And this makes LLVM scream. Well, perhaps LLVM was not screaming, but the board surely was, since llc spent 7 hours compiling this file and still produced no result! I’ve submitted this to the LLVM bug database as “Infinite loop in llc on ARMv7”. Eli Friedman’s note in comment 4 makes it sound a little more optimistic (hopefully no infinite loop!), but O(N^3) complexity is still not something I’d like to stress my board with. Also keep in mind that I’m using unoptimized LLVM builds on ARM, since this is the best build I can get with the available GNU C++ compiler. In the end I worked around this issue by switching back to the C backend just for this file.
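A minimal sketch of what cubic scaling would mean for a large generated file like Parser.hs. It assumes the cost grows exactly with the cube of the input size, which is a simplification of the O(N^3) remark; cubicSlowdown is a made-up name used only for illustration.

```haskell
module Main where

-- | Relative slowdown when the input grows by the given factor,
-- assuming (hypothetically) that the cost is exactly cubic in input size.
cubicSlowdown :: Integer -> Integer
cubicSlowdown factor = factor ^ (3 :: Int)

main :: IO ()
main = mapM_ report [2, 10, 100]
  where
    -- Print one line per growth factor.
    report k =
      putStrLn (show k ++ "x input -> " ++ show (cubicSlowdown k) ++ "x time")
```

Under that assumption an input 10 times larger costs 1000 times more, which is why one huge function can dominate the whole build.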

Anyway, this was the only issue in the build. Otherwise it went fine, and the testsuite gave me these results:

OVERALL SUMMARY for test run started at Sat Jun 25 10:12:25 CEST 2011
    2802 total tests, which gave rise to
    7180 test cases, of which
       0 caused framework failures
    3289 were skipped

    3721 expected passes
     102 expected failures
       0 unexpected passes
      68 unexpected failures

Unexpected failures:
   2014(normal)
   2228(normal)
   2636(normal)
   3171(normal)
   3586(normal)
   3890(normal)
   4850(normal)
   T2615(llvm,normal)
   T3016(llvm,normal)
   T3736(normal)
   T3807(normal)
   T3953(normal)
   T4801(normal)
   T4891(normal)
   T4978(normal)
   T706(normal)
   ann01(llvm,normal)
   annfail03(normal)
   annfail04(normal)
   annfail05(normal)
   annfail06(normal)
   annfail07(normal)
   annfail08(normal)
   annfail09(normal)
   annfail10(normal)
   annfail12(normal)
   annrun01(llvm,normal)
   apirecomp001(normal)
   cabal04(normal)
   dph-diophantine-opt(normal)
   dph-dotp-fast(normal)
   dph-dotp-opt(normal)
   dph-primespj-opt(normal)
   dph-quickhull-opt(normal)
   dph-sumnats(normal)
   dph-words-opt(normal)
   ghc-e001(normal)
   ghc-e002(normal)
   ghc-e003(normal)
   ghc-e004(normal)
   ghc-e005(normal)
   ghci024(normal)
   ghci037(normal)
   ghcpkg05(normal)
   hpc_ghc_ghci(normal)
   jmp_tbl(llvm)
   layout007(normal)
   massive_array(llvm,normal)
   qq001(normal)
   qq002(normal)
   qq003(normal)
   qq004(normal)
   qq007(llvm,normal)
   qq008(llvm,normal)
   recomp007(normal)
   tcrun006(llvm,normal)
   tcrun007(llvm,normal)
   tcrun029(llvm,normal)

If you compare this with the Milestone 3 results you will see that the results are identical except for two “solved” timed-out testcases (solved by increasing the timeout value) and testcase 3424, which no longer fails. Previously it failed because GCC’s cc1 process was killed for a reason I don’t know, though I suspect Linux’s OOM killer. So it seems I got the same, expected results, which makes me very optimistic. I still need to wait and see how and when the LLVM team solves the reported issue before milestone 4 is properly reached, but in parallel I’ll of course already work on milestone 5…
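The comparison between two runs can be sketched as a small list difference. This is only an illustration, not the way the comparison was actually done: it assumes the two “Unexpected failures” lists have been pasted into two text files, one entry per line, and the file names and the program name FailDiff.hs are made up.

```haskell
module Main where

import Data.List ((\\))
import System.Environment (getArgs)

-- | One trimmed entry per non-empty line, e.g. "3424(normal)".
entries :: String -> [String]
entries = filter (not . null) . map (unwords . words) . lines

main :: IO ()
main = do
  args <- getArgs
  case args of
    [oldFile, newFile] -> do
      old <- fmap entries (readFile oldFile)
      new <- fmap entries (readFile newFile)
      -- Entries that failed before but are absent from the new list.
      putStrLn "Only in the old run:"
      mapM_ (putStrLn . ("  " ++)) (old \\ new)
      -- Entries that appear only in the new list.
      putStrLn "Only in the new run:"
      mapM_ (putStrLn . ("  " ++)) (new \\ old)
    _ -> putStrLn "usage: runghc FailDiff.hs OLD.txt NEW.txt"
```

Run on the two lists from these posts, it should report 3424(normal), barton-mangler-bug(llvm) and joao-circular(llvm) as present only in the old run, and nothing as new.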

Milestone (3) reached…

Milestone (3) has been reached. This means I’m running GHC HEAD (from 1st June, plus 3 of my patches) compiled with the C backend, and its testsuite compiled with both the C and LLVM backends for comparison. For better definitions of the milestones see the document here. For the build, based on experience with several broken previous builds and testsuite runs, I’ve installed LLVM HEAD (2.9 was buggy, as described here). I also needed to install binutils 2.21, since the GNU assembler from the Ubuntu-provided binutils 2.20.1 does not recognize some ARM instructions (VFPv3/NEON). The symptoms look like:

/tmp/ghc16874_0/ghc16874_0.s:1010:0:
     Error: bad instruction `vmrs apsr_nzcv,fpscr'

Anyway, after 20 hours of GHC compilation and 10 hours of testsuite run I got the following results:

OVERALL SUMMARY for test run started at Wed Jun 22 14:17:06 CST 2011
    2802 total tests, which gave rise to
    7180 test cases, of which
       0 caused framework failures
    3289 were skipped

    3718 expected passes
     102 expected failures
       0 unexpected passes
      71 unexpected failures

Unexpected failures:
   2014(normal)
   2228(normal)
   2636(normal)
   3171(normal)
   3424(normal)
   3586(normal)
   3890(normal)
   4850(normal)
   T2615(llvm,normal)
   T3016(llvm,normal)
   T3736(normal)
   T3807(normal)
   T3953(normal)
   T4801(normal)
   T4891(normal)
   T4978(normal)
   T706(normal)
   ann01(llvm,normal)
   annfail03(normal)
   annfail04(normal)
   annfail05(normal)
   annfail06(normal)
   annfail07(normal)
   annfail08(normal)
   annfail09(normal)
   annfail10(normal)
   annfail12(normal)
   annrun01(llvm,normal)
   apirecomp001(normal)
   barton-mangler-bug(llvm)
   cabal04(normal)
   dph-diophantine-opt(normal)
   dph-dotp-fast(normal)
   dph-dotp-opt(normal)
   dph-primespj-opt(normal)
   dph-quickhull-opt(normal)
   dph-sumnats(normal)
   dph-words-opt(normal)
   ghc-e001(normal)
   ghc-e002(normal)
   ghc-e003(normal)
   ghc-e004(normal)
   ghc-e005(normal)
   ghci024(normal)
   ghci037(normal)
   ghcpkg05(normal)
   hpc_ghc_ghci(normal)
   jmp_tbl(llvm)
   joao-circular(llvm)
   layout007(normal)
   massive_array(llvm,normal)
   qq001(normal)
   qq002(normal)
   qq003(normal)
   qq004(normal)
   qq007(llvm,normal)
   qq008(llvm,normal)
   recomp007(normal)
   tcrun006(llvm,normal)
   tcrun007(llvm,normal)
   tcrun029(llvm,normal)

Which IMHO looks quite nice for an unregisterised build. Anyway, the testcases where only LLVM fails are barton-mangler-bug, which fails due to a compilation timeout; jmp_tbl, which fails because of the -fPIC flag, as GHC complains that -fPIC and -fllvm conflict (it’s a wonder to me that this test does not fail with the C backend, as that probably does not support -fPIC correctly either on an unregisterised build!); and joao-circular, which also fails due to a compilation timeout. I’ve run the testsuite with:

make WAY="normal llvm" EXTRA_HC_OPTS="-opta=-march=armv7a" TIMEOUT=1500

in case you wonder whether I ever increased the timeout. I did, and it solved a lot of other timeouts, but the two tests mentioned above probably take too long even for a 25-minute timeout. Part of LLVM’s slowness here is clearly caused by the fact that I’m using a debug+asserts build. The LLVM project prints a nice warning at the end of the build claiming that such a build might be 10x slower than a common release build. OK, it probably is. I’ve also used the -opta=-march=armv7a option to make the invoked GNU assembler correctly recognize the ARM assembly produced by LLVM. If this option is omitted I get errors like:

=====> 2047(llvm) 449 of 2802 [0, 4, 0]
cd ./rts && '/export/home/karel/vcs/ghc-src/ghc-arm-test-build-tree/inplace/bin/ghc-stage2' -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-conf -rtsopts -opta=-march=armv7 -o 2047 2047.hs -fllvm -package containers  >2047.comp.stderr 2>&1
Compile failed (status 256) errors were:
[1 of 1] Compiling Main             ( 2047.hs, 2047.o )
/tmp/ghc12273_0/ghc12273_0.s: Assembler messages:

/tmp/ghc12273_0/ghc12273_0.s:5947:0:
     Error: thumb conditional instruction should be in IT block -- `moveq r1,#1'

/tmp/ghc12273_0/ghc12273_0.s:5969:0:
     Error: thumb conditional instruction should be in IT block -- `movlt r2,#1'

/tmp/ghc12273_0/ghc12273_0.s:5988:0:
     Error: thumb conditional instruction should be in IT block -- `movlt r2,#1'

/tmp/ghc12273_0/ghc12273_0.s:6005:0:
     Error: thumb conditional instruction should be in IT block -- `movlt r2,#1'

/tmp/ghc12273_0/ghc12273_0.s:6070:0:
     Error: thumb conditional instruction should be in IT block -- `moveq r1,#0'

/tmp/ghc12273_0/ghc12273_0.s:6071:0:
     Error: thumb conditional instruction should be in IT block -- `movne r1,#1'

*** unexpected failure for 2047(llvm)

Porting GHC using LLVM backend

I think it might be a good idea to have a porting plan with milestones, to get better motivation and a better feeling for the work being done. Fortunately David Terei (the GHC developer who implemented the GHC LLVM backend) suggested such a plan to me in a private email conversation and even agreed to let me put it on the web. I’ve entered it on the GHC/LLVM Trac Wiki here. Please keep in mind that this is a really high-level plan of what to do. For “how to do it” you should consult at least the linked design & documentation page and other related GHC and LLVM documents.

And if you are curious about where I am with my GHC/LLVM/ARM port, I’m at point (3) now.

LLVM 2.9 buggy for the task…

And now, finally, the fun might start. I’ve hacked GHC HEAD a little bit to support -fllvm even on an unregisterised build, so I’m able to try compiling some examples that are as simple as possible. My testing code is this naive fib example:

module Main where

-- Naive doubly recursive Fibonacci, used only as a small compiler test.
fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

main :: IO ()
main = putStrLn (show (fib 7))

The way I compile this is:

 /home/karel/vcs/ghc/inplace/ghc-stage1 -v -fllvm -c fib.hs

To my delight this runs well on x86, and to my unpleasant surprise it crashes on ARM with:

*** CodeOutput:
 *** LLVM Optimiser:
 opt /tmp/ghc4275_0/ghc4275_0.ll -o /tmp/ghc4275_0/ghc4275_0.bc -mem2reg
 *** LLVM Compiler:
 llc -O1 -relocation-model=static /tmp/ghc4275_0/ghc4275_0.bc -o /tmp/ghc4275_0/ghc4275_0.lm_s
 UNREACHABLE executed!
 Stack dump:
 0. Program arguments: llc -O1 -relocation-model=static /tmp/ghc4275_0/ghc4275_0.bc -o /tmp/ghc4275_0/ghc4275_0.lm_s
 1. Running pass 'Function Pass Manager' on module '/tmp/ghc4275_0/ghc4275_0.bc'.
 2. Running pass 'ARM Instruction Selection' on function '@sau_entry'

so I started to experiment with various optimization options and such, but the LLVM ARM codegen still crashes on this. Finally I attempted to generate ARM code on an x86 machine by running the llc compiler manually:


$ llc -march=arm -O1 -relocation-model=static ghc5638_0.bc -o ghc5638_0.lm_s
 UNREACHABLE executed!
 0 llc 0x08fd21af PrintStackTrace(void*) + 41
 Stack dump:
 0. Program arguments: llc -march=arm -O1 -relocation-model=static ghc5638_0.bc -o ghc5638_0.lm_s
 1. Running pass 'Function Pass Manager' on module 'ghc5638_0.bc'.
 2. Running pass 'ARM Instruction Selection' on function '@sav_entry'
 Abort (core dumped)

So it looks like the ARM codegen is broken on x86 too, not only on ARM! To verify this idea, I ran the following on ARM (generating code for x86):


llc -march=x86 -O1 -relocation-model=static ghc5638_0.bc -o ghc5638_0.lm_s

and to my surprise this runs well!

Conclusion: LLVM 2.9 is not well suited for ARM code generation, at least when the input is an LLVM assembler file produced by GHC. Anyway, this is not the end of the show! A minute later I also tested LLVM HEAD compiled and run on x86, and to my surprise it compiles to ARM assembler fine, so there is still a way forward and hope of achieving the task…

LLVM on ARM

Since I’m going to try to use LLVM for a GHC registerised build, I need to make sure it’s the best build I can get on the target hardware. To explain more: LLVM is a really picky project from the C++ compiler point of view. If you read parts of the LLVM Getting Started manual (here) you will see a lot of references to “miscompilation” and such. Fortunately LLVM comes, even in the standard distribution, with a nice testsuite which you can use to verify the quality of the build.

The Freescale-recommended image I installed provides Ubuntu’s gcc 4.4.3-1 compiler (I don’t know the exact version now, since I changed the compiler later, see below). When I tried to compile LLVM 2.9 with it I got quite a lot of failing tests. To be precise, 58 tests failed, which is not so nice. I looked around and found that this gcc/Ubuntu version defaults to Thumb code generation, so the next step was to enforce ARM code generation (the -marm option), and voilà, I was down to 18 test failures! Much better, but still not the best, since I remember that in the past I was able to get down to just 1 failure, though probably on an older Ubuntu. Anyway, this Ubuntu compiler was tough, so I started to downgrade the compiler, and to make a long story short I ended up with:

$ gcc -v
Using built-in specs.
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.4.1-4ubuntu9' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv6 --with-tune=cortex-a8 --with-float=softfp --with-fpu=vfp --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi
Thread model: posix
gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9)

Note this version, since it is able to compile LLVM 2.9 into a build where only one test fails!

$ make check
llvm[0]: Running test suite
make[1]: Entering directory `/export/home/karel/src/obj-llvm/test'
Making a new site.exp file...
Making LLVM 'lit.site.cfg' file...
Making LLVM unittest 'lit.site.cfg' file...
( ulimit -t 600 ; ulimit -d 512000 ; ulimit -m 512000 ; ulimit -v 1024000 ; \
/export/home/karel/src/llvm-2.9/utils/lit/lit.py -s -v . )
FAIL: LLVM :: CodeGen/Thumb/select.ll (1593 of 5840)
******************** TEST 'LLVM :: CodeGen/Thumb/select.ll' FAILED ********************
Script:
--
/export/home/karel/src/obj-llvm/Release/bin/llc < /export/home/karel/src/llvm-2.9/test/CodeGen/Thumb/select.ll -march=thumb | grep beq | /export/home/karel/src/obj-llvm/Release/bin/count 1
/export/home/karel/src/obj-llvm/Release/bin/llc < /export/home/karel/src/llvm-2.9/test/CodeGen/Thumb/select.ll -march=thumb | grep bgt | /export/home/karel/src/obj-llvm/Release/bin/count 1
/export/home/karel/src/obj-llvm/Release/bin/llc < /export/home/karel/src/llvm-2.9/test/CodeGen/Thumb/select.ll -march=thumb | grep blt | /export/home/karel/src/obj-llvm/Release/bin/count 3
/export/home/karel/src/obj-llvm/Release/bin/llc < /export/home/karel/src/llvm-2.9/test/CodeGen/Thumb/select.ll -march=thumb | grep ble | /export/home/karel/src/obj-llvm/Release/bin/count 1
/export/home/karel/src/obj-llvm/Release/bin/llc < /export/home/karel/src/llvm-2.9/test/CodeGen/Thumb/select.ll -march=thumb | grep bls | /export/home/karel/src/obj-llvm/Release/bin/count 1
/export/home/karel/src/obj-llvm/Release/bin/llc < /export/home/karel/src/llvm-2.9/test/CodeGen/Thumb/select.ll -march=thumb | grep bhi | /export/home/karel/src/obj-llvm/Release/bin/count 1
/export/home/karel/src/obj-llvm/Release/bin/llc < /export/home/karel/src/llvm-2.9/test/CodeGen/Thumb/select.ll -mtriple=thumb-apple-darwin | grep __ltdf2
--
Exit Code: 1
Command Output (stderr):
--
Expected 3 lines, got 1.
--

********************
Testing Time: 928.90s
********************
Failing Tests (1):
LLVM :: CodeGen/Thumb/select.ll

Expected Passes : 5249
Expected Failures : 48
Unsupported Tests : 542
Unexpected Failures: 1
make[1]: *** [check-local-lit] Error 1
make[1]: Leaving directory `/export/home/karel/src/obj-llvm/test'
make: *** [check] Error 2

and in addition this is just a Thumb code generator error, and I’m certainly not going to target Thumb! I.e. rather ARM or Thumb2, of course…

GHC 7.0.3 unregisterised build testsuite results

It would be kind of unlucky to build GHC 7.0.3 and, after all those hours of building, forget to run the testsuite on it. Since I’ve omitted the profiling libraries from the build, I also needed to omit the profiling way from the testsuite. Hence the testsuite was run with

cd testsuite/tests/ghc-regress
make WAY="normal optc"

I hope this pretty much covers everything I can test on this simplified unregisterised build. If not, just complain in a comment below. Thanks!
It took another 7 hours before I got the full results. A GHC unregisterised build uses the C code backend, which means compiling anything takes about 2x longer than with the native code backend. Please keep this in mind: we have a not so powerful computer and yet we use the worst GHC combination on it, an unregisterised build providing slow runtime libs plus the slow C code backend!
Anyway, the results are here:

OVERALL SUMMARY for test run started at Sun Jun 12 03:09:10 CST 2011
    2694 total tests, which gave rise to
    6093 test cases, of which
       0 caused framework failures
    2384 were skipped

    3529 expected passes
     116 expected failures
       0 unexpected passes
      64 unexpected failures

Unexpected failures:
   2014(normal)
   2228(normal)
   2636(normal)
   3171(normal)
   3424(normal)
   3586(normal)
   3890(normal)
   4850(normal)
   DoParamM(normal)
   T2615(normal,optc)
   T3016(normal)
   T3064(normal)
   T3330a(normal)
   T3391(normal,optc)
   T3736(normal)
   T3738(normal)
   T3807(normal)
   T3953(normal)
   T4801(normal)
   ann01(normal,optc)
   annfail03(normal)
   annfail04(normal)
   annfail05(normal)
   annfail06(normal)
   annfail07(normal)
   annfail08(normal)
   annfail09(normal)
   annfail10(normal)
   annfail12(normal)
   annrun01(normal,optc)
   apirecomp001(normal)
   barton-mangler-bug(normal)
   cabal04(normal)
   ghc-e001(normal)
   ghc-e002(normal)
   ghc-e003(normal)
   ghc-e004(normal)
   ghc-e005(normal)
   ghci024(normal)
   ghci037(normal)
   hpc_ghc_ghci(normal)
   hpc_markup_multi_001(normal)
   hpc_markup_multi_002(normal)
   hpc_markup_multi_003(normal)
   joao-circular(normal,optc)
   layout007(normal)
   openFile008(normal,optc)
   qq001(normal)
   qq002(normal)
   qq003(normal)
   qq004(normal)
   qq007(normal,optc)
   qq008(normal,optc)
   recomp006(normal)
   space_leak_001(normal,optc)
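Judging by the numbers in this post, the summary seems to count every test/way pair separately, so an entry such as T2615(normal,optc) stands for two failing test cases. A minimal sketch of that bookkeeping follows; waysIn and caseCount are names invented here, and the one-way default for entries without parentheses is an assumption.

```haskell
module Main where

-- | Number of ways listed in an entry such as "T2615(normal,optc)".
-- An entry without a parenthesised list is assumed to mean one way.
waysIn :: String -> Int
waysIn entry =
  case dropWhile (/= '(') entry of
    []         -> 1
    (_ : rest) -> 1 + length (filter (== ',') (takeWhile (/= ')') rest))

-- | Total number of test cases represented by a list of entries.
caseCount :: [String] -> Int
caseCount = sum . map waysIn

main :: IO ()
main = do
  -- Three entries taken from the list above: 1 + 2 + 2 ways.
  print (caseCount ["2014(normal)", "T2615(normal,optc)", "qq007(normal,optc)"])
  -- The summary categories should add up to the 6093 test cases.
  print (2384 + 3529 + 116 + 0 + 64 == (6093 :: Int))
```

This prints 5 and True. Applied to the full list above, caseCount should give 64: there are 55 entries, 9 of which list two ways.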

It takes quite some time…

Indeed, it takes quite some time to build an unregisterised build of GHC on an ARM machine. But let’s start from the beginning. I’ve always been quite ignorant of the major computer architecture, x86. To be honest, this is probably caused by my laziness to read more about it since my university studies, where I was hit by all those segments, the chaotic memory model and such. I must admit that AMD did a really good job on AMD64, finally a flat address space, 64 bit etc., but still the platform is so boring, running everywhere… 🙂

So what’s more interesting to me are all those other platforms: PowerPC, MIPS, ARM, IA64, etc. Generally speaking I quite like the load-store CPU model, and since last year I’ve focused my attention more on IA64 and ARM: IA64 since it is quite interesting from the assembler point of view, and ARM since it is the conqueror of the x86 world and my bet is that it’s also the future architectural winner. So ARM. I’ve been its user for more than five years (running it in my mobile phone), but I’m more and more curious to learn a little bit more about this architecture and somehow connect this to my still-to-be-performed Haskell learning, as Haskell is another project which makes me wonder what its outcome will be. Quite an interesting language indeed. So Haskell and ARM, that’s it. For Haskell I’ve chosen GHC, as it seems to be the most widespread in the community, although to be honest this is not the easiest choice if I consider the ARM architecture. Anyway, the project evolves quickly and one is even able to perform an unregisterised build (read: a build which produces not so fast binaries) of GHC on an ARM machine. I did this from time to time on the EfikaMX hosts of GCC’s compile farm, but I was still considering buying my own machine for better GHC/ARM hacking, which I hope this blog will be about.
Anyway, big thanks to Freescale and big thanks to my friend working for Freescale who lent me a nice i.MX53 Quick Start Board. I’m now able to start actual “local” tests and even hacks as time permits.

I’ve installed the Freescale-recommended Ubuntu/Debian-based distro, installed the provided GHC 6.12.1 and was able to build an unregisterised build of GHC 7.0.3. I’ve used the following mk/build.mk file to perform the build; just to save time I’ve disabled the profiling libs, which I don’t need. I need GHC 7.0.3 just to be better prepared to build GHC HEAD.

GhcUnregisterised=YES
GhcWithNativeCodeGen=NO
SplitObjs=NO
GhcLibWays=v

And finally I get back to the post title. The funny thing is, it took 16 hours 30 minutes to perform this build! Interestingly, from what I observed in top, the majority of the time was spent in the C compiler, which gives hope that a future GHC with either LLVM or NCG ARM support might run quite a bit faster… Err, if you wonder, I don’t keep the build tree on a (micro)SD card, a SATA drive or a USB flash drive connected to the i.MX53 board. I use NFS to mount some space from my main Solaris workstation on the board. This certainly causes some slowness, but not that much, believe me, and I’d rather trust ZFS on mirrored drives than any consumer flash storage on the board.