Just a quick status update: I’ve dedicated the Freescale i.MX53 Quick Start Board I received to serve as a GHC builder machine. It has already been running for some time. The installed OS is Ubuntu 11.10, and I’ve installed Ubuntu 12.04’s LLVM 3.0 packages on it. The builder already caught one issue at the end of February, which I’m still trying to solve in my currently very limited free time… If you’d like to see the builder’s results, have a look at the cvs-ghc mailing list and search for the kgardas-linux-arm-head string in email subjects.
Tag Archives: GHC
ARMv8: a few details
It looks like a few details about ARMv8 are starting to appear online. The source is a presentation and videos about ARMv8 by Richard Grisenthwaite, which are now linked from the ARM ISAs page. Just scroll down and select the ARMv8 Resources tab.
Anyway, I’d like to list a few details here too, focusing on those that affect user-land application writers. A small table should do the job, I hope. Please note that with ARMv8, ARM started naming its ISAs: A32 is the classic ARM ISA, T32 is Thumb-2, and A64 is the new ISA for 64-bit ARM computing. So far, the A32 and T32 ISAs of ARMv7 and ARMv8 look similar.
| | ARMv7 | ARMv8 |
|---|---|---|
| 32-bit ISAs | A32, T32 | A32, T32 |
| 64-bit ISAs | — | A64 |
| Number of GP registers | 13* | 13* (A32, T32), 31** (A64) |
| Instruction encoding length (bits) | 16-32 (T32), 32 (A32) | 16-32 (T32), 32 (A32), 32 (A64) |
| NEON 64-bit regs | 32 | 32 |
| NEON 128-bit regs | 16 | 32 |
| Crypto instructions (using NEON regs) | — | AES, SHA-1, SHA-256 |
*: I count only R0-R12
**: PC and SP are no longer considered GPs
So as you can see, we get more than twice as many general-purpose registers (31 vs. 13), twice as many 128-bit NEON registers, and some additional instructions supporting common cryptography operations. Besides this, A64 also provides new load-acquire/store-release instructions to better support ARM’s weak memory model in higher-level programming languages.
From GHC’s point of view this might indeed be fun. The only pity is that we depend on LLVM gaining A64 support first; only then will we be able to use it in GHC.
LLVM patch merged for inclusion in the LLVM 3.0 release
Good news for those reluctant to patch the LLVM source code and build it from scratch.
The patch that adds the GHC calling convention for the ARM platform has been merged for inclusion in the LLVM 3.0 release. This is mainly thanks to David Terei’s persistence and his constant pushing of Apple engineers to get it in, since I submitted the patch for inclusion only on the last day and was not able to answer all the questions it raised. David not only replied with all the needed information, but also kept emailing the LLVM 3.0 release engineer asking for inclusion. Thanks David!
Current status: merged into GHC HEAD!
I thought it might be a good idea to post an update on how the project is going.
So yes, thanks to help from David Terei and Manuel M T Chakravarty, our project results have been merged into GHC HEAD. The last commit (so far!) went in on August 20/21, 2011. If you have an ARM system, please give it a try! You will need your own build of LLVM, which is described here. If you are curious and would just like to see the test results, look here:
OVERALL SUMMARY for test run started at Tue Aug 23 22:59:36 CEST 2011
2927 total tests, which gave rise to
7123 test cases, of which
1 caused framework failures
2646 were skipped
4260 expected passes
148 expected failures
0 unexpected passes
68 unexpected failures
Unexpected failures:
../../libraries/random/tests rangeTest [bad exit code] (normal,threaded1,threaded2,optllvm)
annotations/should_run annrun01 [exit code non-0] (normal,threaded1,threaded2,optllvm)
cabal ghcpkg05 [bad stderr] (normal)
cabal/cabal04 cabal04 [bad exit code] (normal)
codeGen/should_compile jmp_tbl [exit code non-0] (normal)
codeGen/should_compile massive_array [exit code non-0] (normal)
dph/dotp dph-dotp-fast [exit code non-0] (normal,threaded1,threaded2)
dph/dotp dph-dotp-opt [exit code non-0] (normal,threaded1,threaded2)
dph/primespj dph-primespj-fast [exit code non-0] (normal,threaded1,threaded2)
dph/primespj dph-primespj-opt [exit code non-0] (normal,threaded1,threaded2)
dph/quickhull dph-quickhull-fast [exit code non-0] (normal,threaded1,threaded2)
dph/quickhull dph-quickhull-opt [exit code non-0] (normal,threaded1,threaded2)
dph/sumnats dph-sumnats [exit code non-0] (normal,threaded1,threaded2)
dph/words dph-words-fast [exit code non-0] (normal)
dph/words dph-words-opt [exit code non-0] (normal)
driver 5313 [exit code non-0] (normal,threaded1,threaded2,optllvm)
driver/recomp009 recomp009 [bad exit code] (normal)
dynlibs T3807 [bad exit code] (normal)
ghc-api/T4891 T4891 [bad exit code] (normal)
ghc-api/apirecomp001 apirecomp001 [bad exit code] (normal)
ghci/linking ghcilink001 [bad exit code] (normal)
ghci/linking ghcilink002 [bad exit code] (normal)
ghci/linking ghcilink003 [bad exit code] (normal)
ghci/linking ghcilink004 [bad exit code] (normal)
ghci/linking ghcilink005 [bad exit code] (normal)
ghci/linking ghcilink006 [bad exit code] (normal)
ghci/scripts ghci024 [bad exit code] (normal)
perf/compiler T1969 [stat not good enough] (normal)
perf/compiler T3064 [stat not good enough] (normal)
perf/compiler T5030 [stat not good enough] (normal)
quasiquotation/qq007 qq007 [exit code non-0] (normal)
quasiquotation/qq008 qq008 [exit code non-0] (normal)
rts T2615 [exit code non-0] (normal,threaded1,threaded2,optllvm)
rts derefnull [bad exit code] (threaded2)
rts testblockalloc [bad exit code] (normal,threaded1)
safeHaskell/flags Flags02 [exit code non-0] (normal)
simplCore/should_compile T3016 [exit code non-0] (normal)
typecheck/should_run T4809 [exit code non-0] (normal,threaded1,threaded2,optllvm)
The majority of the failures are caused by missing GHCi support, which is also the next item on the project’s TODO list.
Nofib benchmarking
I decided to do some nofib benchmarking on the trees I have here. Big thanks to Simon Marlow, who helped me fix bugs in my benchmarking process (initially I was comparing builds with different optimization options and getting strange results). I compared the unregisterised build using -fvia-C and using -fllvm with two registerised LLVM builds: one without the tables-next-to-code feature and one with it enabled. The results are summarized in the table below, with the via-C build as the baseline.
| | unregisterised via-C (baseline) | unregisterised LLVM | registerised LLVM | registerised LLVM, tables next to code |
|---|---|---|---|---|
| binary sizes | — | +0.1% | -31.3% | -33.3% |
| allocations | — | -0.0% | -0.9% | -0.9% |
| run time | — | -9.9% | -47.5% | -51.4% |
| gc time | — | -0.3% | -1.6% | -2.5% |
IMHO, a -51.4% run time for the registerised LLVM build with tables next to code enabled, compared with the unregisterised via-C build (currently the only build available on ARM/Linux!), is a nice outcome of the project. Click here to see the full results.
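To put the -51.4% run-time figure in perspective, here is the arithmetic, assuming it is read as a plain relative change against the via-C baseline (nofib's summary may aggregate per-program results differently, so treat this as a rough reading):

$$
t_{\text{LLVM,TNTC}} = (1 - 0.514)\, t_{\text{viaC}} = 0.486\, t_{\text{viaC}}
\quad\Longrightarrow\quad
\frac{t_{\text{viaC}}}{t_{\text{LLVM,TNTC}}} = \frac{1}{0.486} \approx 2.06
$$

In other words, programs run roughly twice as fast as with the unregisterised via-C build.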
Fun with ARM barriers and GHC RTS
While reviewing part of the ARM support code in the RTS, I found that some barriers were not yet implemented for ARM. This led me to investigate whether they are really needed, and I found a nice little testcase, rts/testwsdeque, which fails. The testcase tests WSDeque, which is basically the lock-free work-stealing deque implementation in the GHC RTS. Since the test fails, something is badly wrong here.
I decided to implement the missing barriers and found a very useful reference to The JSR-133 Cookbook for Compiler Writers in the include/stg/SMP.h header file, the same header where all the barriers are implemented. The document contains a nice table listing the various kinds of barriers together with the instructions used to implement them on various CPU architectures, ARMv7 among them. Doug Lea did really nice work writing it. The recommended instruction was dmb, which I already knew from various ARM documentation. ARM actually provides two data barrier instructions, dmb and dsb, and I was not 100% sure which to use, so Doug’s document was really useful for me.
Anyway, even after this, rts/testwsdeque still failed, so the search started again. This time I found the really nice, although quite complex, Barrier Litmus Tests and Cookbook, which uses a few examples to recommend best practice for when and how to use barrier instructions in common programming problems (spin-locks etc.). I learned that although the LDREX/STREX instructions provide a kind of synchronization primitive, they do not enforce any barrier, so I also added the dmb instruction to GHC’s xchg and cas functions.
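For illustration, here is a minimal sketch of what such barrier helpers can look like in C, assuming GCC-style inline assembly: on ARMv7 each helper emits dmb, and on other targets it falls back to a C11 fence. The sketch_ function names are hypothetical; GHC’s real definitions (write_barrier(), store_load_barrier(), load_load_barrier()) live in include/stg/SMP.h and differ in detail.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helpers, not GHC's SMP.h. On ARMv7 all of them use dmb. */

/* Orders earlier stores before later stores (StoreStore). */
static inline void sketch_write_barrier(void)
{
#if defined(__ARM_ARCH_7A__)
    __asm__ __volatile__("dmb" ::: "memory");  /* full data memory barrier */
#else
    atomic_thread_fence(memory_order_release); /* portable fallback */
#endif
}

/* Orders earlier loads before later loads (LoadLoad). */
static inline void sketch_load_load_barrier(void)
{
#if defined(__ARM_ARCH_7A__)
    __asm__ __volatile__("dmb" ::: "memory");
#else
    atomic_thread_fence(memory_order_acquire);
#endif
}

/* Orders earlier stores before later loads (StoreLoad), the most expensive kind. */
static inline void sketch_store_load_barrier(void)
{
#if defined(__ARM_ARCH_7A__)
    __asm__ __volatile__("dmb" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}
```

The "memory" clobber matters as much as the instruction itself: it also stops the C compiler from moving loads and stores across the barrier.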
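A sketch of xchg and cas with full-barrier semantics follows. It uses GCC’s __atomic builtins instead of the hand-written LDREX/STREX assembly GHC used; with __ATOMIC_SEQ_CST, GCC typically emits an LDREX/STREX loop bracketed by dmb on ARMv7. Word is a stand-in for StgWord, and the sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Word; /* stand-in for GHC's StgWord */

/* Atomically store w into *p and return the previous value (full barrier). */
static inline Word sketch_xchg(Word *p, Word w)
{
    return __atomic_exchange_n(p, w, __ATOMIC_SEQ_CST);
}

/* If *p == expected, store desired. Either way, return the value that was
   in *p, so the caller detects success by comparing it with expected. */
static inline Word sketch_cas(Word *p, Word expected, Word desired)
{
    Word seen = expected;
    /* On failure the builtin writes the current *p into seen;
       on success seen keeps expected, which equals the old value. */
    __atomic_compare_exchange_n(p, &seen, desired, 0 /* strong */,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return seen;
}
```

Returning the old value rather than a boolean matches the usual "compare the result with what you expected" style of CAS used in lock-free code.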
I reran the test, and it still failed sometimes. I used a simple script to run it in a loop and watch for failures:
while (true); do ./testwsdeque; echo -n .; done
An example of the failing output:
........internal error: FAIL: 6706788 3 13
(GHC version 7.1.20110701 for arm_unknown_linux)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Aborted
................internal error: FAIL: 5463172 1 12
(GHC version 7.1.20110701 for arm_unknown_linux)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Aborted
...................internal error: FAIL: 6496304 1 11
(GHC version 7.1.20110701 for arm_unknown_linux)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Aborted
.........internal error: FAIL: 6192568 3 13
(GHC version 7.1.20110701 for arm_unknown_linux)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Aborted
So the testcase passes several times for every failure, but it still fails.
What now?
I looked into the testcase and printed it. I also found the corresponding rts/WSDeque.[c|h] sources and printed them too.
Side note: I don’t have a multi-monitor setup here; I’m just using a single 23″ LG W2220P in portrait mode, and the viewing surface is still too small for such manual “debugging”. So I usually print all the relevant code, lay it out on the desk or even on the floor, and then read it step by step and think about it.
So I ended up with the relevant source files printed, and half an hour later I was more and more convinced that my ARMv7-specific barriers and the barriers in the xchg/cas functions were all right, and that the issue might really be in the RTS work-stealing deque implementation. I had a feeling pointing that way… Well, you know, the GHC team usually works on x86/x64 boxes. Some team members are on MacOSX/x64 and some are even using Niagara, i.e. the former Sun UltraSPARC Tx processors, with Solaris. Both hardware platforms are quite nice when it comes to load/store reordering, since both use fairly strong TSO-style ordering. On the other hand, I found this nice note in a blog post dealing with barriers in the Linux kernel on ARM:
Since the supported architecture with the weakest memory model (effectively the one that permits the most reordering) was the DEC Alpha, this was used as the reference architecture. No other architectures have since surpassed the DEC Alpha in this regard, but ARMv7-A comes pretty close.
My reasoning from this was simple: if Alpha was the weakest and ARMv7 is pretty close, then ARMv7 is probably weaker (i.e. permits more memory-access reordering) than the usually tested x86/x64 or UltraSPARC, and a bug might really have slipped into the RTS deque implementation. The deque code itself was written in 2008-2009. I thought the chance that it had been tested on Alpha was really low, even though GHC still contains some Alpha code (which looks quite dead now; I mean both the CPU and GHC’s support for it). So the idea of a bug in the deque implementation looked more and more real, and I was quite curious whether I would find it. Well, some time later I got to it!
The pushWSDeque function, which pushes the specified data onto the deque for consumption by stealing threads, contained the following code:
rtsBool
pushWSDeque (WSDeque* q, void * elem)
{
StgWord t;
StgWord b;
StgWord sz = q->moduloSize;
[...]
b = q->bottom;
[...]
q->elements[b & sz] = elem;
q->bottom = b + 1;
[...]
I’ve deleted the code that is not important for explaining the bug. The bug happens on these two lines, or rather between them!
q->elements[b & sz] = elem;
q->bottom = b + 1;
What you see is the assignment of elem into the deque, followed by the increment of the deque’s bottom variable that lets stealing threads know there is new data in the deque. As I learned during the bug hunt, there is no guarantee at all that other cores will observe the two writes in this order. In fact, a modern CPU may very well reorder them to:
q->bottom = b + 1;
q->elements[b & sz] = elem;
If this happens, then when (1) the deque is empty, (2) some eager stealing thread is waiting (or polling) for new data, and (3) the sequence is reordered as above, the stealing thread may get to run just between the two writes, conclude that there is data in the deque, and consume it. It will get random data, because the intended element has not been stored into the deque yet: the second line of the code above has not been executed. So the stealing thread gets something it should not.
The solution is quite simple: modify the code sequence to:
q->elements[b & sz] = elem;
write_barrier();
q->bottom = b + 1;
The write_barrier(), which translates to the dmb instruction on ARMv7, ensures that the store of elem (and, in fact, every other store issued before the barrier) becomes visible before the store that increments the deque’s bottom variable.
Did it solve the issue? I hope so: the while loop running testwsdeque ran several hundred times without a single failure. I’m running nearly the full GHC testsuite now to see the results, but this will take another few hours, so I’ll need to wait and see whether I broke something. Anyway, there is at least some chance that this really was the bugfix. And if so, I’m going to push the patch upstream, of course…
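The same publish pattern can be sketched portably with C11 atomics. This assumes a hypothetical, simplified single-producer/single-consumer array rather than GHC’s actual WSDeque (no top index, no CAS-based stealing, no wrap-around): the producer stores the element and then publishes the new bottom with a release store, the portable counterpart of write_barrier() followed by a plain store. In the C11 model the polling reader must pair that with an acquire load (or a read barrier) to be guaranteed to see the element. The names elements, bottom, sketch_push and run_publish_demo are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SPSC array, NOT GHC's WSDeque. */
#define SKETCH_ITEMS 4096

static intptr_t elements[SKETCH_ITEMS]; /* plain, non-atomic payload slots */
static atomic_size_t bottom;            /* number of published elements */
static size_t bad_reads;                /* written only by the consumer */

/* Push side: mirrors the fixed pushWSDeque sequence. */
static void sketch_push(size_t b, intptr_t elem)
{
    elements[b] = elem;                                          /* q->elements[b & sz] = elem; */
    atomic_store_explicit(&bottom, b + 1, memory_order_release); /* write_barrier(); q->bottom = b + 1; */
}

static intptr_t expected_value(size_t i) { return (intptr_t)(i * 7 + 1); }

static void *producer(void *arg)
{
    (void)arg;
    for (size_t b = 0; b < SKETCH_ITEMS; b++)
        sketch_push(b, expected_value(b));
    return NULL;
}

/* Reader side: polls bottom and checks every newly published element. */
static void *consumer(void *arg)
{
    (void)arg;
    size_t seen = 0;
    while (seen < SKETCH_ITEMS) {
        /* The acquire load pairs with the release store in sketch_push. */
        size_t b = atomic_load_explicit(&bottom, memory_order_acquire);
        for (; seen < b; seen++)
            if (elements[seen] != expected_value(seen))
                bad_reads++;
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the number of stale reads. */
static size_t run_publish_demo(void)
{
    pthread_t p, c;
    atomic_store(&bottom, 0);
    bad_reads = 0;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return bad_reads;
}
```

Call run_publish_demo() from main (link with -pthread on older toolchains); it should return 0. Replacing the release/acquire pair with relaxed operations reintroduces exactly the race described above, although on a strongly ordered x86 box it may never show up in practice.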
Conclusion: ARM is nicely RISCy and I learn some new stuff about barriers. I’ve known about them, but I’ve not had a chance to touch the stuff till now although I’m already quite some time from the college…
Erm, someone was faster than me making GHC/ARM registerised…
Erm, someone beat me to making the GHC/ARM port registerised! Fellow GHC hacker Stephen Blackheath sent an email to the LLVM-dev mailing list asking for a review of his patches, and it looks like he really got a registerised GHC/ARM port up and running. You can see his conversation with David Terei starting here.
Stephen contacted me and was kind enough to provide all his patches along with guidance on what to do with them. I fixed a few issues in them that were needed for a more recent GHC HEAD and for ARMv7 hardware, and I got his work up and running here too! Congratulations, Stephen, and thanks a lot for your nice work!
If you are curious, I used the following mk/build.mk:
SRC_HC_OPTS = -H64m -optc=-marm -opta=-march=armv7a -opta=-mfpu=vfpv3
GhcLibWays = v
GhcStage2HcOpts = -opta=-march=armv7a -opta=-mfpu=vfpv3
GhcLibHcOpts = -opta=-march=armv7a -opta=-mfpu=vfpv3
TABLES_NEXT_TO_CODE = NO
SplitObjs = NO
HADDOCK_DOCS = NO
BUILD_DOCBOOK_HTML = NO
BUILD_DOCBOOK_PS = NO
BUILD_DOCBOOK_PDF = NO
and I was then able to build a registerised GHC. I also ran the testsuite to check the quality of the port; the results are here:
OVERALL SUMMARY for test run started at Sun Jul 3 09:43:57 CEST 2011
2828 total tests, which gave rise to
7775 test cases, of which
3 caused framework failures
5212 were skipped
2377 expected passes
55 expected failures
0 unexpected passes
129 unexpected failures
Unexpected failures:
../../../libraries/hpc/tests/ghc_ghci hpc_ghc_ghci [bad exit code] (normal)
annotations/should_compile ann01 [exit code non-0] (normal)
annotations/should_fail annfail12 [stderr mismatch] (normal)
annotations/should_run annrun01 [exit code non-0] (normal)
cabal ghcpkg05 [bad stderr] (normal)
cabal/cabal04 cabal04 [bad exit code] (normal)
codeGen/should_compile jmp_tbl [exit code non-0] (normal)
codeGen/should_compile massive_array [exit code non-0] (normal)
codeGen/should_run cgrun044 [exit code non-0] (normal)
dph/diophantine dph-diophantine-opt [exit code non-0] (normal)
dph/dotp dph-dotp-fast [exit code non-0] (normal)
dph/dotp dph-dotp-opt [exit code non-0] (normal)
dph/primespj dph-primespj-fast [exit code non-0] (normal)
dph/primespj dph-primespj-opt [exit code non-0] (normal)
dph/quickhull dph-quickhull-fast [exit code non-0] (normal)
dph/quickhull dph-quickhull-opt [exit code non-0] (normal)
dph/sumnats dph-sumnats [exit code non-0] (normal)
dph/words dph-words-fast [exit code non-0] (normal)
dph/words dph-words-opt [exit code non-0] (normal)
driver T706 [bad exit code] (normal)
dynlibs T3807 [bad exit code] (normal)
ghc-api/T4891 T4891 [bad exit code] (normal)
ghc-api/apirecomp001 apirecomp001 [bad exit code] (normal)
ghc-e/should_run 2228 [bad exit code] (normal)
ghc-e/should_run 2636 [bad stderr] (normal)
ghc-e/should_run 3890 [bad stderr] (normal)
ghc-e/should_run ghc-e001 [bad exit code] (normal)
ghc-e/should_run ghc-e002 [bad exit code] (normal)
ghc-e/should_run ghc-e003 [bad exit code] (normal)
ghc-e/should_run ghc-e004 [bad stderr] (normal)
ghc-e/should_run ghc-e005 [bad stderr] (normal)
ghci/prog004 ghciprog004 [bad exit code] (normal)
ghci/scripts ghci024 [bad exit code] (normal)
ghci/scripts ghci037 [bad exit code] (normal)
ghci/should_run 3171 [bad stderr] (normal)
layout layout007 [bad stdout] (normal)
numeric/should_run arith005 [bad stdout] (normal)
perf/compiler T1969 [stat not good enough] (normal)
perf/compiler T3064 [stat not good enough] (normal)
perf/compiler T4007 [bad stderr] (normal)
perf/compiler T5030 [stat not good enough] (normal)
perf/should_run 3586 [stat not good enough] (normal)
perf/should_run MethSharing [stat not good enough] (normal)
perf/should_run T3738 [stat not good enough] (normal)
perf/should_run T4321 [bad exit code] (normal)
perf/should_run T4830 [stat not good enough] (normal)
perf/should_run T4978 [stat not good enough] (normal)
perf/should_run T5113 [stat not good enough] (normal)
perf/should_run T5205 [stat not good enough] (normal)
perf/should_run lazy-bs-alloc [stat not good enough] (normal)
perf/space_leaks space_leak_001 [stat not good enough] (normal)
plugins plugins01 [bad exit code] (normal)
plugins plugins05 [exit code non-0] (normal)
quasiquotation/qq007 qq007 [exit code non-0] (normal)
quasiquotation/qq008 qq008 [exit code non-0] (normal)
rename/should_fail rnfail043 [stderr mismatch] (normal)
rts 3424 [exit code non-0] (normal)
rts atomicinc [exit code non-0] (normal)
simplCore/should_compile EvalTest [bad stdout] (normal)
simplCore/should_compile T3016 [exit code non-0] (normal)
simplCore/should_compile T3055 [bad stdout] (normal)
simplCore/should_compile T3772 [bad stdout] (normal)
simplCore/should_compile T4306 [bad stdout] (normal)
simplCore/should_compile T4945 [bad stdout] (normal)
th T1835 [exit code non-0] (normal)
th T2386 [bad exit code] (normal)
th T2597a [exit code non-0] (normal)
th T2597b [stderr mismatch] (normal)
th T2674 [stderr mismatch] (normal)
th T2685 [exit code non-0] (normal)
th T2700 [exit code non-0] (normal)
th T2713 [stderr mismatch] (normal)
th T2817 [exit code non-0] (normal)
th T3100 [exit code non-0] (normal)
th T3177 [exit code non-0] (normal)
th T3319 [exit code non-0] (normal)
th T3395 [stderr mismatch] (normal)
th T3600 [exit code non-0] (normal)
th T3899 [exit code non-0] (normal)
th T3920 [exit code non-0] (normal)
th T4188 [exit code non-0] (normal)
th T4436 [exit code non-0] (normal)
th T5037 [exit code non-0] (normal)
th T5217 [exit code non-0] (normal)
th TH_1tuple [stderr mismatch] (normal)
th TH_NestedSplices [exit code non-0] (normal)
th TH_class1 [exit code non-0] (normal)
th TH_dupdecl [stderr mismatch] (normal)
th TH_emptycase [stderr mismatch] (normal)
th TH_exn1 [stderr mismatch] (normal)
th TH_exn2 [stderr mismatch] (normal)
th TH_fail [stderr mismatch] (normal)
th TH_foreignInterruptible [exit code non-0] (normal)
th TH_genEx [exit code non-0] (normal)
th TH_mkName [exit code non-0] (normal)
th TH_pragma [exit code non-0] (normal)
th TH_recover [exit code non-0] (normal)
th TH_reifyDecl1 [exit code non-0] (normal)
th TH_reifyDecl2 [exit code non-0] (normal)
th TH_reifyMkName [exit code non-0] (normal)
th TH_repE2 [exit code non-0] (normal)
th TH_repGuard [exit code non-0] (normal)
th TH_repGuardOutput [exit code non-0] (normal)
th TH_repPrim [exit code non-0] (normal)
th TH_repPrim2 [exit code non-0] (normal)
th TH_repPrimOutput [exit code non-0] (normal)
th TH_repPrimOutput2 [exit code non-0] (normal)
th TH_repUnboxedTuples [exit code non-0] (normal)
th TH_runIO [stderr mismatch] (normal)
th TH_sections [exit code non-0] (normal)
th TH_spliceD1 [stderr mismatch] (normal)
th TH_spliceD2 [exit code non-0] (normal)
th TH_spliceDecl1 [exit code non-0] (normal)
th TH_spliceDecl2 [exit code non-0] (normal)
th TH_spliceDecl3 [exit code non-0] (normal)
th TH_spliceDecl4 [exit code non-0] (normal)
th TH_spliceE1 [exit code non-0] (normal)
th TH_spliceE3 [exit code non-0] (normal)
th TH_spliceE4 [exit code non-0] (normal)
th TH_spliceE5 [exit code non-0] (normal)
th TH_spliceE6 [exit code non-0] (normal)
th TH_spliceExpr1 [exit code non-0] (normal)
th TH_spliceGuard [exit code non-0] (normal)
th TH_spliceInst [exit code non-0] (normal)
th TH_tf1 [exit code non-0] (normal)
th TH_tf3 [exit code non-0] (normal)
th TH_tuple1 [exit code non-0] (normal)
th/2014 2014 [bad exit code] (normal)
th/TH_spliceViewPat TH_spliceViewPat [exit code non-0] (normal)
GHC 7.0.3 unregisterised build testsuite results
It would be a pity to build GHC 7.0.3 and, after all those hours of building, forget to run the testsuite on it. Since I omitted the profiling libraries from the build, I also needed to omit the profiling way from the testsuite. Hence the testsuite was run with:
cd testsuite/tests/ghc-regress
make WAY="normal optc"
I hope this covers pretty much everything I can test on this simplified unregisterised build. If not, just complain in a comment below. Thanks!
It took another 7 hours to get the full results. (The unregisterised GHC build uses the C code backend, which means compiling anything takes about twice as long as with the native code backend. Please keep in mind that we have a not-so-powerful computer and yet use the worst GHC combination on it: an unregisterised build providing slow runtime libs plus the slow C code backend!)
Anyway, the results are here:
OVERALL SUMMARY for test run started at Sun Jun 12 03:09:10 CST 2011
2694 total tests, which gave rise to
6093 test cases, of which
0 caused framework failures
2384 were skipped
3529 expected passes
116 expected failures
0 unexpected passes
64 unexpected failures
Unexpected failures:
2014(normal)
2228(normal)
2636(normal)
3171(normal)
3424(normal)
3586(normal)
3890(normal)
4850(normal)
DoParamM(normal)
T2615(normal,optc)
T3016(normal)
T3064(normal)
T3330a(normal)
T3391(normal,optc)
T3736(normal)
T3738(normal)
T3807(normal)
T3953(normal)
T4801(normal)
ann01(normal,optc)
annfail03(normal)
annfail04(normal)
annfail05(normal)
annfail06(normal)
annfail07(normal)
annfail08(normal)
annfail09(normal)
annfail10(normal)
annfail12(normal)
annrun01(normal,optc)
apirecomp001(normal)
barton-mangler-bug(normal)
cabal04(normal)
ghc-e001(normal)
ghc-e002(normal)
ghc-e003(normal)
ghc-e004(normal)
ghc-e005(normal)
ghci024(normal)
ghci037(normal)
hpc_ghc_ghci(normal)
hpc_markup_multi_001(normal)
hpc_markup_multi_002(normal)
hpc_markup_multi_003(normal)
joao-circular(normal,optc)
layout007(normal)
openFile008(normal,optc)
qq001(normal)
qq002(normal)
qq003(normal)
qq004(normal)
qq007(normal,optc)
qq008(normal,optc)
recomp006(normal)
space_leak_001(normal,optc)
It takes quite some time…
Indeed, it takes quite some time to do an unregisterised build of GHC on an ARM machine. But let’s start from the beginning. I have always been rather indifferent to the dominant computer architecture, x86. To be honest, this is probably caused by my reluctance to read more about it ever since my university studies, where I was hit by all those segments, the chaotic memory model and such. I must admit that AMD did a really good job with AMD64 (finally a flat address space, 64 bits, etc.), but the platform is still so boring, running everywhere…
So what’s more interesting to me are all those other platforms: PowerPC, MIPS, ARM, IA64, etc. Generally speaking, I quite like the load-store CPU model, and since last year I’ve focused my attention more on IA64 and ARM: IA64 because it is quite interesting from the assembler point of view, and ARM because it is the conqueror of the x86 world and my bet is that it is also the future architectural winner. So, ARM. I’ve been its user for more than five years… running it in my mobile phone, but I’m more and more curious to learn a bit more about this architecture and somehow connect it with my still-to-be-done Haskell learning, as Haskell is another project whose outcome makes me wonder. Quite an interesting language indeed. So Haskell and ARM, that’s it. For Haskell I’ve chosen GHC, as it seems to be the most widespread compiler in the community, although to be honest it is not the easiest choice when it comes to the ARM architecture. Anyway, the project evolves quickly, and one is even able to perform an unregisterised (read: producing not-so-fast binaries) build of GHC on an ARM machine. I did this from time to time on the GCC Compile Farm EfikaMX hosts, but I was still considering buying my own machine for better GHC/ARM hacking, which I hope this blog will be about.
Anyway, big thanks to Freescale, and big thanks to my friend working for Freescale who lent me a nice i.MX53 Quick Start Board. I’m now able to start actual “local” tests and even hacks as time permits.

I installed the Freescale-recommended Ubuntu/Debian-based distro along with its provided GHC 6.12.1, and was able to do an unregisterised build of GHC 7.0.3. I used the following mk/build.mk file for the build; to save time, I disabled the profiling libs, which I don’t need. I need GHC 7.0.3 only to be better prepared to build GHC HEAD.
GhcUnregisterised=YES
GhcWithNativeCodeGen=NO
SplitObjs=NO
GhcLibWays=v
And finally back to the post title: the funny thing is, it took 16 hours 30 minutes to perform this build! Interestingly, from what I observed in top, the majority of the time was spent in the C compiler, which gives hope that a future GHC with either LLVM or NCG ARM support might build quite a bit faster… Err, in case you wonder, I don’t keep the build tree on a (micro)SD card, a SATA drive or a USB flash drive connected to the i.MX53 board. I use NFS to mount some space from my main Solaris workstation on the board. This certainly causes some slowness, but not much, believe me, and I trust ZFS on mirrored drives more than any consumer flash storage on the board.