Unregisterised GHC HEAD build for ARM64 (ARMv8/AArch64) platform.

I thought I’d have some ARM/GHC fun again after a while and give the ARM64 port a try. By ARM64 I mean, of course, the AArch64 mode of the ARMv8 platform.
The GHC Wiki contains a nice page describing how to port GHC using the LLVM backend. I’ll point out right away that I decided to jump directly to (4) and attempt to build the GHC compiler using the LLVM backend. As you probably know, the first thing you need to do when porting GHC to another, unsupported platform is a so-called unregisterised build of GHC for that platform. Thanks to a lot of cross-compilation improvements in GHC 7.6/HEAD, it is even possible to build GHC as an unregisterised cross-compiler. That’s exactly what I did. I did not expect this to work smoothly, but in the end I was quite surprised how easily I got a stage1 cross-compiler for ARM64. Well, there were a few issues, but they are solvable, as you will see.
If you are going to follow my path just to have some fun (not that many people compile Haskell for ARM64 these days :-), prepare an Ubuntu 13.10 amd64 box for it. I used a VirtualBox image for that. Inside it you can install Ubuntu 13.10 core running on an ARM64 emulator. Concretely, you will have Ubuntu 13.10 core running on the ARM Foundation Model, which is a free (as in beer) tool that lets you run systems for the ARM64 platform. Ubuntu is really helpful here: just follow their wiki page and you should have it running quickly.
Ubuntu 13.10 is also a really nice platform for ARM64 cross-compiling since it offers the ARM64 GNU C/C++ cross-compilers as native packages. So all you need to do is:

$ sudo apt-get install gcc-aarch64-linux-gnu

After this you should have the GNU C cross-compiler for ARM64 ready:

$ aarch64-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=aarch64-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc-cross/aarch64-linux-gnu/4.8/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.8.1-10ubuntu7' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,d,fortran --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/aarch64-linux-gnu/include/c++/4.8.1 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libssp --disable-libmudflap --disable-libitm --disable-libsanitizer --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-arm64-cross/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-arm64-cross --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-arm64-cross --with-arch-directory=arm64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libgcj --enable-multiarch --disable-werror --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --includedir=/usr/aarch64-linux-gnu/include
Thread model: posix
gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu7)

You will also need to build an LLVM HEAD which contains revision r199265. This is the fix for an LLVM bug I ran into while building the GHC cross-compiler for ARM64. I reported the issue to the llvm-dev mailing list and Tim Northover was kind enough to fix it really quickly.
So, obtain LLVM HEAD and compile it on your Ubuntu box. Now, let’s assume you have the GNU C aarch64 cross-compiler, Ubuntu 13.10 core running on the ARM Foundation Model, and LLVM HEAD compiled. Is that enough for building the GHC cross-compiler? Unfortunately not yet! Until the GHC developers fix issue #8864, you will need to update GHC’s libffi manually. I’ve tested 3.0.13 and it works fine. The reason is simple: the configure script of the libffi 3.0.11 bundled with GHC does not recognize the aarch64 platform. Anyway, this is quite easy to fix. Just delete libffi-3.0.11.tar.gz in GHC’s libffi-tarballs subdirectory and put libffi-3.0.13.tar.gz there.
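The tarball swap can also be scripted. A minimal sketch, assuming you have already downloaded libffi-3.0.13.tar.gz yourself; the function name and both paths in the usage comment are only illustrative:

```shell
#!/bin/bash
# Replace the bundled libffi 3.0.11 tarball in a GHC source tree with a
# newer one whose configure script knows about aarch64.
#   $1 = top of the GHC source tree
#   $2 = path to the downloaded libffi-3.0.13.tar.gz
swap_libffi_tarball() {
    local ghc_tree=$1 new_tarball=$2
    # Fail early if the replacement is missing, so the tree is never
    # left without any libffi tarball at all.
    [ -f "$new_tarball" ] || { echo "missing: $new_tarball" >&2; return 1; }
    rm -f "$ghc_tree/libffi-tarballs/libffi-3.0.11.tar.gz"
    cp "$new_tarball" "$ghc_tree/libffi-tarballs/"
}

# Usage (hypothetical paths):
# swap_libffi_tarball ~/vcs/ghc ~/Downloads/libffi-3.0.13.tar.gz
```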
Now you are nearly ready. Nearly, because as part of its build process GHC also builds some of its libraries, and here the terminfo library causes a problem. The library needs libncurses installed on the target platform as a development package. That means not only the libncurses shared library but also its C header files, e.g. ncurses.h. The trouble is that the GNU C cross-compiler ships with just the core libraries for cross-compilation, i.e. libc. Nobody expected that you would need libncurses for cross-compiling.
The issue is solvable by using a different cross-compiler sysroot. Explanation: the GNU C cross-compiler looks in a certain path to find its target libraries and header files for compilation. This path is known as the sysroot. Fortunately for us, Ubuntu’s aarch64 cross-compiler supports an option for setting the sysroot to a user-defined directory. Now, you may ask where you will get all the required target libraries and header files, besides the ncurses ones you need for the terminfo library. Well, you already have them in the form of the ubuntu-core-13.10-core-arm64.tar.gz file which you downloaded to install Ubuntu core on the ARM64 Foundation Model. Just unpack the file somewhere and call the directory holding the unpacked tree, for example, sysroot.
If you look through the directory structure resulting from unpacking this tarball, you will quickly find that no libncurses development package is installed there either. There is, however, one hackish way to get libncurses-dev there. You already have Ubuntu 13.10/ARM64 running on the ARM Foundation Model, right? So start it, give it time to boot and then install ssh there, since you will need it to transfer files between your Ubuntu host and the ARM64 target (unless you use, for example, NFS for the same purpose).
Debian/Ubuntu’s apt-get also supports an option to download a package without actually installing it. On the ARM64 target, download these two packages:

# apt-get download libncurses5-dev
# apt-get download libtinfo-dev

The libtinfo-dev package is a dependency of libncurses5-dev. Now move those two packages to your Ubuntu host. If you have not done so before, install the mc package there to get the nice Midnight Commander application. It’s very handy, for example, for browsing the content of packages. Just start it by typing “mc” in your shell, navigate to the directory which holds the two packages above and press Enter on one of the package files. Midnight Commander will show you the content of the selected package and you will also be able to copy its files to your sysroot directory. Do this with both packages.
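If you would rather stay on the command line than use Midnight Commander, dpkg-deb (part of dpkg, so it should already be on the Ubuntu host) can unpack a package’s file tree into a directory without installing it. A sketch: the function name and the DPKG_DEB override variable are my own inventions, and the .deb file names in the usage comment are placeholders since they depend on the versions apt-get fetched.

```shell
#!/bin/bash
# Unpack the file trees of the given .deb packages into a sysroot directory.
#   $1      = sysroot directory
#   $2...   = .deb files to unpack
# DPKG_DEB may be set to use a different dpkg-deb binary (hypothetical knob).
extract_debs_into_sysroot() {
    local sysroot=$1 deb
    shift
    for deb in "$@"; do
        # dpkg-deb -x extracts the package contents without installing them
        "${DPKG_DEB:-dpkg-deb}" -x "$deb" "$sysroot" || return 1
    done
}

# Usage (file names depend on what apt-get downloaded):
# extract_debs_into_sysroot /home/karel/arm64/sysroot libncurses5-dev_*.deb libtinfo-dev_*.deb
```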
The last step before configuring the GHC cross-compiler is to create a custom C compiler script which invokes the aarch64 C cross-compiler with the required --sysroot parameter. I’ve named mine aarch64-linux-gnu-gccsysroot, with the following content:

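#!/bin/bash
# Forward all arguments, quoted so that ones with spaces survive, to the real cross-compiler.
exec /usr/bin/aarch64-linux-gnu-gcc --sysroot=/home/karel/arm64/sysroot "$@"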

Just modify the sysroot parameter value. I guess you don’t have your ARM64 sysroot located in the same directory as I do. 🙂 Of course, make this file executable too!
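Before configuring GHC it may be worth checking that the wrapper really sees the ncurses headers. A small sketch: it pipes a one-line C file through the compiler’s preprocessor (-E) and reports whether the header was found. The function name is made up for this post; the wrapper path in the usage comment is the one used here.

```shell
#!/bin/bash
# Succeed if compiler $1 can find header $2 on its include path.
# Only the preprocessor runs (-E), so nothing is compiled or linked.
has_header() {
    local cc=$1 header=$2
    printf '#include <%s>\n' "$header" | "$cc" -E -x c - >/dev/null 2>&1
}

# Usage:
# has_header /home/karel/bin/aarch64-linux-gnu-gccsysroot ncurses.h \
#     && echo "ncurses.h found" || echo "ncurses.h missing"
```

If the header is reported missing, the usual suspects are a wrong --sysroot path in the wrapper or packages that were not copied into the sysroot.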

And now the time has finally come to configure GHC:

$ perl boot
$ ./configure --target=aarch64-linux-gnu --with-gcc=/home/karel/bin/aarch64-linux-gnu-gccsysroot --enable-unregisterised --with-llc=/export/home/karel/vcs/llvm-head/Release+Asserts/bin/llc --with-opt=/export/home/karel/vcs/llvm-head/Release+Asserts/bin/opt

Again, please adjust the paths to the cross-compiler script and to LLVM’s llc and opt to match your host.
For the compilation you will need a build.mk makefile, so please do:

$ cd mk
$ cp build.mk.sample build.mk

and edit the build.mk file in your preferred editor. You will need to uncomment the quick-cross build flavour line to enable the cross-compiler build:

# Fast build configured for a cross compiler
BuildFlavour = quick-cross
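The same edit can be done non-interactively. A sketch, assuming the sample file has the line commented out as “#BuildFlavour = quick-cross” (possibly with different spacing, so check your copy) and that you have GNU sed, whose -i option edits the file in place. The function name is my own:

```shell
#!/bin/bash
# Uncomment the quick-cross flavour line in the given build.mk.
# Other (commented) BuildFlavour lines are left untouched.
enable_quick_cross() {
    sed -i 's/^#[[:space:]]*\(BuildFlavour[[:space:]]*=[[:space:]]*quick-cross[[:space:]]*\)$/\1/' "$1"
}

# Usage, from the top of the GHC tree:
# enable_quick_cross mk/build.mk
```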

Now, if you type

$ make

then after some time you will get your ghc-stage1 cross-compiler for the ARM64 platform! Have fun with it! As a first test you can use GHC’s HelloWorld example:

$ cd bindisttest/
$ ../inplace/bin/ghc-stage1 --make HelloWorld.lhs
[1 of 1] Compiling Main ( HelloWorld.lhs, HelloWorld.o )
Linking HelloWorld ...
$ file HelloWorld
HelloWorld: ELF 64-bit LSB executable, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 3.7.0, BuildID[sha1]=0xaafd52827fa12be43b564b11b96d9131d8f6a498, not stripped

Testing LLVM 3.1 on Ubuntu 11.10/12.04 ARM

A few weeks ago the LLVM project released LLVM 3.1 and I decided to give it a try on my ARM boards. I’ve tested on Ubuntu 11.10 on the Freescale-donated i.MX53 Quick Start Board and on Ubuntu 12.04 on a Pandaboard. The results are pretty interesting, as shown in the table below. The table lists the number of unexpected failures of the basic LLVM testsuite when LLVM is compiled with a specific optimization option and with a specific GNU C compiler (on the appropriate Ubuntu). Please note that Ubuntu 11.10 is the last soft-float ABI Ubuntu and it provides GNU C 4.6.1:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabi/4.6.1/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.1-9ubuntu3' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi
Thread model: posix
gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) 

On the other hand, Ubuntu 12.04 is the first release which provides the hard-float ABI and it comes with GNU C 4.6.3:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.6/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) 

And now, finally, those interesting results. The numbers are clickable and link to the testsuite log files for your reference.

                           -O0   -O1   -O2   default
GCC 4.6.1 (Ubuntu 11.10)     6     6    54        54
GCC 4.6.3 (Ubuntu 12.04)     6     6     6         6

So it looks like GCC 4.6.3 did a very nice job here. Honestly, I’m not sure whether this is due to GCC or to the ABI switch from soft-float to hard-float, and I’m not able to verify it since Ubuntu 12.04 is hard-float ABI only, but my bet is on GCC.

Testing LLVM 3.0 on Ubuntu 11.10 ARM

LLVM 3.0 was released some time ago and I thought it would be good to give it a try on stock Ubuntu 11.10 ARM. That means I’ve tested LLVM 3.0 with the Ubuntu-provided GNU C++ 4.6.1 and Clang 2.9. The GNU C++ configuration looks like this:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabi/4.6.1/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.1-9ubuntu3' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi
Thread model: posix
gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) 

I’ve compiled LLVM 3.0 with those two compilers using the default configuration and then with optimization flags set to -O0, -O1, -O2 and -O3. The table below lists the sum of unexpected failures and unexpected passes, with links to the test output files. What surprises me is that Clang on ARM performs so well even in version 2.9. I know, Clang depends on LLVM, LLVM by default checks for Clang as its preferred compiler, and both projects are mainly developed by Apple’s engineers, but still, this is a native ARM/Linux system, so nothing like cross-compilation from MacOSX/x64 to iOS/ARM!

             -O0   -O1   -O2   -O3   default
GCC 4.6.1      6     6    51    51        51
Clang 2.9   1147     8    12    12        12

So as you can see, GCC still wins on the lowest number of failures when using -O0/-O1, but Clang performs very well on the -O2/-O3/default optimization levels. Please note that the excessive number of failures at -O0 with Clang is probably caused by the fact that LLVM code requires some optimization to be performed on it to behave correctly, and it looks like Clang does not perform such optimization at -O0 while GCC does.
Another surprise for me was Clang’s compilation performance. I did not record hard numbers since that was not the goal of this testing, but I was surprised to see that what took GCC around 700 minutes, Clang did in about 400 minutes. I’m talking about the default compilation on the i.MX53 Quick Start Board here.

Now the questions are: how does Clang-compiled LLVM affect GHC tests and GHC compilation speed (i.e. I may use the -O3-compiled LLVM for this)? And how would the numbers look when testing the latest and greatest Linaro GCC and Clang 3.0? Perhaps material for another post or two…

ARMv8: a few details.

It looks like a few details about ARMv8 are starting to appear on the net. The source is a presentation and videos about ARMv8 by Richard Grisenthwaite, which are now linked from the ARM ISAs page. Just scroll down and select the ARMv8 Resources tab.
Anyway, I’d like to list a few details here as well, focusing on those which affect the user-land application writer. A small table should do the job, I hope. Please note that with ARMv8, ARM started to name the various ISAs: A32 is classical ARM, T32 is Thumb2 and A64 is the new ISA for ARM 64-bit computing. So far ARMv7’s and ARMv8’s A32 and T32 ISAs look similar.

                                ARMv7                    ARMv8
32 bit ISAs                     A32, T32                 A32, T32
64 bit ISAs                     -                        A64
Number of GPs                   13*                      13* (A32, T32), 31** (A64)
ISNs encoding length (bits)     16-32 (T32), 32 (A32)    16-32 (T32), 32 (A32), 32 (A64)
NEON 64 bit regs                32                       32
NEON 128 bit regs               16                       32
Crypto ISNs (using NEON regs)   -                        AES, SHA-1, SHA-256

*: I count only R0-R12
**: PC and SP are no longer considered GPs

So as you can see, we get more than twice the general purpose registers (31 versus 13 by my count), twice the number of 128 bit NEON registers and also some additional instructions supporting common cryptography operations. Besides this, A64 also provides new load-acquire/store-release instructions to better support the ARM weak memory model in higher-level programming languages.

Well, from GHC’s point of view this might indeed be fun. The only pity is that we still depend on LLVM gaining A64 support first; only then will we be able to use it in GHC.

LLVM patch is merged for inclusion in LLVM 3.0 release

Good news for those too shy to patch the LLVM source code and build it from scratch. 🙂 The patch which adds the GHC calling convention for the ARM platform has been merged for inclusion in the LLVM 3.0 release. This is mainly due to David Terei’s persistence and constant push on Apple engineering to get it in, since I submitted the patch for inclusion only on the last day and was not able to answer all the questions arising from it. David not only replied with all the needed information, but also kept emailing the LLVM 3.0 release engineer and asking for inclusion. Thanks David!

Current status: merged into GHC HEAD!

I thought it might be a good idea to post some information about how the project is going.
So yes, thanks to the help provided by David Terei and Manuel M T Chakravarty, our project results were merged into GHC HEAD. The last commit (so far!) went in during August 20/21 2011. If you have an ARM system, then please give it a try! You will need your own build of LLVM, which is described here. If you are curious and would just like to see the test results, then look here:

OVERALL SUMMARY for test run started at Tue Aug 23 22:59:36 CEST 2011
    2927 total tests, which gave rise to
    7123 test cases, of which
       1 caused framework failures
    2646 were skipped

    4260 expected passes
     148 expected failures
       0 unexpected passes
      68 unexpected failures

Unexpected failures:
   ../../libraries/random/tests  rangeTest [bad exit code] (normal,threaded1,threaded2,optllvm)
   annotations/should_run        annrun01 [exit code non-0] (normal,threaded1,threaded2,optllvm)
   cabal                         ghcpkg05 [bad stderr] (normal)
   cabal/cabal04                 cabal04 [bad exit code] (normal)
   codeGen/should_compile        jmp_tbl [exit code non-0] (normal)
   codeGen/should_compile        massive_array [exit code non-0] (normal)
   dph/dotp                      dph-dotp-fast [exit code non-0] (normal,threaded1,threaded2)
   dph/dotp                      dph-dotp-opt [exit code non-0] (normal,threaded1,threaded2)
   dph/primespj                  dph-primespj-fast [exit code non-0] (normal,threaded1,threaded2)
   dph/primespj                  dph-primespj-opt [exit code non-0] (normal,threaded1,threaded2)
   dph/quickhull                 dph-quickhull-fast [exit code non-0] (normal,threaded1,threaded2)
   dph/quickhull                 dph-quickhull-opt [exit code non-0] (normal,threaded1,threaded2)
   dph/sumnats                   dph-sumnats [exit code non-0] (normal,threaded1,threaded2)
   dph/words                     dph-words-fast [exit code non-0] (normal)
   dph/words                     dph-words-opt [exit code non-0] (normal)
   driver                        5313 [exit code non-0] (normal,threaded1,threaded2,optllvm)
   driver/recomp009              recomp009 [bad exit code] (normal)
   dynlibs                       T3807 [bad exit code] (normal)
   ghc-api/T4891                 T4891 [bad exit code] (normal)
   ghc-api/apirecomp001          apirecomp001 [bad exit code] (normal)
   ghci/linking                  ghcilink001 [bad exit code] (normal)
   ghci/linking                  ghcilink002 [bad exit code] (normal)
   ghci/linking                  ghcilink003 [bad exit code] (normal)
   ghci/linking                  ghcilink004 [bad exit code] (normal)
   ghci/linking                  ghcilink005 [bad exit code] (normal)
   ghci/linking                  ghcilink006 [bad exit code] (normal)
   ghci/scripts                  ghci024 [bad exit code] (normal)
   perf/compiler                 T1969 [stat not good enough] (normal)
   perf/compiler                 T3064 [stat not good enough] (normal)
   perf/compiler                 T5030 [stat not good enough] (normal)
   quasiquotation/qq007          qq007 [exit code non-0] (normal)
   quasiquotation/qq008          qq008 [exit code non-0] (normal)
   rts                           T2615 [exit code non-0] (normal,threaded1,threaded2,optllvm)
   rts                           derefnull [bad exit code] (threaded2)
   rts                           testblockalloc [bad exit code] (normal,threaded1)
   safeHaskell/flags             Flags02 [exit code non-0] (normal)
   simplCore/should_compile      T3016 [exit code non-0] (normal)
   typecheck/should_run          T4809 [exit code non-0] (normal,threaded1,threaded2,optllvm)

The majority of the failures are caused by the missing GHCi support, which is also the next item on the project’s TODO list.
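As a quick sanity check on the summary, the four result categories should account for every test case that actually ran. The sketch below just redoes that arithmetic with the numbers from the log; treating the one framework failure as separate from the four categories is my reading of the output, not something the log states.

```shell
#!/bin/bash
# Numbers copied from the OVERALL SUMMARY of the test run.
total=7123 framework=1 skipped=2646
exp_pass=4260 exp_fail=148 unexp_pass=0 unexp_fail=68

# Test cases that ran versus test cases that landed in a result category.
ran=$((total - framework - skipped))
classified=$((exp_pass + exp_fail + unexp_pass + unexp_fail))
echo "ran=$ran classified=$classified"

# Share of unexpected failures among the test cases that ran, in percent.
LC_ALL=C awk -v f="$unexp_fail" -v r="$ran" 'BEGIN { printf "%.1f%%\n", 100 * f / r }'
```

Both counts come out at 4476, and the unexpected failures are about 1.5% of the test cases that ran.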

Nofib benchmarking

I’ve decided to do some nofib benchmarking on the trees I have here. Big thanks to Simon Marlow who helped me fix bugs in my benchmarking process (initially I was comparing builds with different optimization options and getting strange results). I’ve compared the unregisterised build using -fvia-C and using -fllvm with two registerised builds, one without the tables-next-to-code functionality enabled and another with it enabled. The results are summarized in the table below. I’m using the via-C build as the baseline.

               unregisterised   unregisterised   registerised   registerised LLVM with
               viaC             LLVM             LLVM           tables next to code enabled
binary sizes   baseline         +0.1%            -31.3%         -33.3%
allocations    baseline         -0.0%            -0.9%          -0.9%
run time       baseline         -9.9%            -47.5%         -51.4%
gc time        baseline         -0.3%            -1.6%          -2.5%

IMHO -51.4% run time for the registerised LLVM build with tables next to code enabled, compared with the via-C unregisterised build (which is currently the only available build on ARM/Linux!), is a nice outcome of the project. Click here to see the whole results.
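To put the run time figures in more familiar terms: a change of p percent relative to the baseline corresponds to a speedup factor of 1 / (1 + p/100). A small sketch of that conversion; the function name is made up, and it only redoes arithmetic on the summary numbers, so it says nothing about individual benchmarks.

```shell
#!/bin/bash
# Convert a relative run time change in percent (e.g. -51.4) into a
# speedup factor over the baseline.
speedup_from_change() {
    LC_ALL=C awk -v pct="$1" 'BEGIN { printf "%.2f\n", 1 / (1 + pct / 100) }'
}

speedup_from_change -9.9    # unregisterised LLVM:                      prints 1.11
speedup_from_change -47.5   # registerised LLVM:                        prints 1.90
speedup_from_change -51.4   # registerised LLVM, tables next to code:   prints 2.06
```

So, going by the summary figure, the registerised build with tables next to code runs roughly twice as fast as the via-C baseline.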