Fun with ARM barriers and GHC RTS

While reviewing part of the ARM support code in the RTS, I found that some barriers are not implemented for ARM yet. This led me to investigate whether they are really needed, and I found a nice little testcase, rts/testwsdeque, which fails. The testcase exercises WSDeque, which is basically the lock-less work-stealing deque implementation in the GHC RTS. Since the test fails, something is badly wrong here.
I decided to implement the missing barriers and found a very useful reference to The JSR-133 Cookbook for Compiler Writers in the include/stg/SMP.h header file, the same header where all the barriers are implemented. The document contains a nice table listing the various kinds of barriers together with the instructions used to implement them on various CPU architectures, ARMv7 among them. Doug Lea did really nice work in writing it. The recommended instruction was dmb, which I already knew from various ARM documentation. ARM in fact provides two instructions for implementing barriers: dmb and dsb. I was not 100% sure which one to use, so Doug’s document was really useful for me.
Anyway, even after this, rts/testwsdeque still failed. Let’s start searching again. This time I found the really nice, although quite complex, Barrier Litmus Tests and Cookbook, which uses a few examples to recommend best practice on when and how to use barrier instructions in solving common programming problems (spin-locks etc.). I learned that although the LDREX/STREX instructions provide a kind of synchronization primitive, they do not enforce any barrier, so I also added a dmb instruction to GHC’s xchg and cas functions.
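The idea of an atomic operation that is additionally bracketed by barriers can be sketched in portable C11. This is only an illustration under stated assumptions, not the RTS code: the real xchg and cas are written as ARM inline assembly around LDREX/STREX, and the names cas_with_barrier and xchg_with_barrier are hypothetical. The relaxed atomic operations stand in for the bare LDREX/STREX loop and the sequentially consistent fences stand in for dmb.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper, not the RTS's cas(): compare-and-swap that returns
   the value found in *p. The relaxed CAS gives atomicity only; the fences
   before and after it provide the ordering. */
static uintptr_t cas_with_barrier(_Atomic uintptr_t *p,
                                  uintptr_t expected, uintptr_t desired)
{
    uintptr_t old = expected;
    atomic_thread_fence(memory_order_seq_cst);      /* role of dmb */
    atomic_compare_exchange_strong_explicit(p, &old, desired,
                                            memory_order_relaxed,
                                            memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);      /* role of dmb */
    return old;  /* equals expected on success, the current value otherwise */
}

/* Hypothetical helper, not the RTS's xchg(): atomic exchange with the same
   fence placement. Returns the previous value of *p. */
static uintptr_t xchg_with_barrier(_Atomic uintptr_t *p, uintptr_t v)
{
    atomic_thread_fence(memory_order_seq_cst);
    uintptr_t old = atomic_exchange_explicit(p, v, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return old;
}
```

On a strongly ordered machine the fences cost little and change nothing observable, which is one reason their absence can go unnoticed there.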
Let’s rerun the test: it still fails sometimes. I used a simple script to run it in a loop and see if it fails:

while (true); do ./testwsdeque; echo -n .; done

An example of the wrong output:

........internal error: FAIL: 6706788 3 13
    (GHC version 7.1.20110701 for arm_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Aborted
................internal error: FAIL: 5463172 1 12
    (GHC version 7.1.20110701 for arm_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Aborted
...................internal error: FAIL: 6496304 1 11
    (GHC version 7.1.20110701 for arm_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Aborted
.........internal error: FAIL: 6192568 3 13
    (GHC version 7.1.20110701 for arm_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Aborted

So the testcase passes several times for each failure, but it still fails.

What now?
I looked into the testcase and printed it. I also found the corresponding rts/WSDeque.[c|h] sources and printed them too.
Side note: I don’t have a multi-monitor setup here. I’m using just a single 23″ LG W2220P in portrait mode, but the viewing surface is still too small for such manual “debugging”. So I usually print all the relevant code, lay it out either on the desk or even on the floor, and then read the code step by step and think about it.
So I ended up with the relevant source files printed, and half an hour later I was more and more convinced that my ARMv7-specific barriers and the use of a barrier in the xchg/cas functions were all right, and that the issue really might be in the RTS work-stealing deque implementation. I had a hunch leading to it… Well, you know, the GHC team usually works on x86/x64 boxes. Some of the team members are on MacOSX/x64 and some of them are even using Niagara, i.e. the former Sun’s UltraSPARC Tx processors, and Solaris. Both hardware platforms are quite nice when it comes to load/store reordering, i.e. they permit relatively little of it. On the other hand, I found this nice note in a blog post dealing with barriers in the Linux kernel on ARM:

Since the supported architecture with the weakest memory model (effectively the one that permits the most reordering) was the DEC Alpha, this was used as the reference architecture. No other architectures have since surpassed the DEC Alpha in this regard, but ARMv7-A comes pretty close.

And the idea that came from this was simple: if Alpha was the weakest and ARMv7 comes pretty close, then ARMv7 is probably weaker (i.e. permits more memory access reordering) than the usually tested x86/x64 or UltraSPARC, and so some bug might really have slipped into the RTS’ deque implementation. The deque code itself was written in 2008-2009. I thought there was only a low chance that it had been tested on Alpha, even though GHC still contains some Alpha code (which looks quite dead now; I mean both the CPU and GHC support for it). So the idea of a bug in the deque implementation looked more and more real, and I was quite curious whether I would find it or not. Well, some time later I got to it! 🙂 The pushWSDeque function, which pushes the specified data to the deque for consumption by stealing threads, contained the following code:

rtsBool
pushWSDeque (WSDeque* q, void * elem)
{
    StgWord t;
    StgWord b;
    StgWord sz = q->moduloSize; 
    
[...]
    b = q->bottom;
[...]
    q->elements[b & sz] = elem;
    q->bottom = b + 1;
[...]

I’ve deleted the code which is not important for explaining the bug. The bug happens on these two lines, or rather, I should say, between them!

    q->elements[b & sz] = elem;
    q->bottom = b + 1;

What you see is the assignment of elem into the deque, followed by the increment of the deque’s bottom variable to let stealing threads know that there is new data in the deque. As I learned during the bug hunt, I cannot be sure at all that the sequence will be executed in this order. In fact, a modern CPU might very well make the two stores visible in the reverse order:

    q->bottom = b + 1;
    q->elements[b & sz] = elem;

If this happens, it means the following: if (1) there is no data in the deque, (2) some eager stealing thread is waiting (or polling) for new data, and (3) the sequence is reordered as above, then right between the execution of the two lines the stealing thread might get its turn and think: hey, there is some data in the deque, let’s consume it. It will then get some random data, since the intended data has not been assigned to the deque yet; the second line of the code above has not been executed at that point. So the stealing thread gets something which it should not.
The solution is quite simple: modify the code sequence to:

    q->elements[b & sz] = elem;
    write_barrier();
    q->bottom = b + 1;

The write_barrier(), which effectively translates to a dmb instruction on ARMv7, forces the assignment of elem (and not only this one, but all other pending writes issued before the barrier) to actually happen before the CPU gets to execute the code incrementing the deque’s bottom variable.
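The same store-then-publish ordering can be sketched in portable C11 atomics. This is a sketch under stated assumptions, not the RTS code: SketchDeque, sketch_init, sketch_push and sketch_steal are hypothetical names, there is no owner-side pop, and the capacity is fixed. The release fence plays the role that write_barrier() plays above; the acquire load of bottom on the stealing side is my assumption of what has to pair with it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SIZE 8u  /* hypothetical capacity; must be a power of two */

typedef struct {
    void *elements[SKETCH_SIZE];
    atomic_size_t bottom;   /* next free slot, written only by the owner */
    atomic_size_t top;      /* next slot to steal */
} SketchDeque;

static void sketch_init(SketchDeque *q)
{
    atomic_init(&q->bottom, 0);
    atomic_init(&q->top, 0);
}

/* Owner side: store the element first, then publish it via bottom. */
static bool sketch_push(SketchDeque *q, void *elem)
{
    size_t b = atomic_load_explicit(&q->bottom, memory_order_relaxed);
    size_t t = atomic_load_explicit(&q->top, memory_order_acquire);
    if (b - t >= SKETCH_SIZE)
        return false;                               /* deque is full */
    q->elements[b & (SKETCH_SIZE - 1)] = elem;
    atomic_thread_fence(memory_order_release);      /* role of write_barrier() */
    atomic_store_explicit(&q->bottom, b + 1, memory_order_relaxed);
    return true;
}

/* Stealing side: only read a slot after observing that bottom covers it. */
static void *sketch_steal(SketchDeque *q)
{
    size_t t = atomic_load_explicit(&q->top, memory_order_relaxed);
    size_t b = atomic_load_explicit(&q->bottom, memory_order_acquire);
    if (t >= b)
        return NULL;                                /* nothing published */
    void *elem = q->elements[t & (SKETCH_SIZE - 1)];
    size_t next = t + 1;
    if (!atomic_compare_exchange_strong_explicit(&q->top, &t, next,
                                                 memory_order_seq_cst,
                                                 memory_order_relaxed))
        return NULL;                                /* lost the race to another thief */
    return elem;
}
```

Without the release fence in sketch_push, nothing would stop the store to bottom from becoming visible before the store to elements, which is exactly the reordering described above.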
Did it solve the issue? I hope so: the while loop running testwsdeque went through several hundred iterations without any failure. I’m running nearly the full GHC testsuite now to see the results, but this will take another few hours, so I’ll need to wait and see whether I broke something or not. Anyway, there is at least some chance that this really was the bugfix. And if so, I’m going to push the patch upstream, of course…

Conclusion: ARM is nicely RISCy and I learned some new stuff about barriers. I had known about them, but I had not had a chance to touch this stuff until now, although I’ve been out of college for quite some time already…

Goodbye Freescale i.MX53!

I’m returning the i.MX53 Quick Start Board to Freescale today, and I must say this nice little board is stable as a rock. I installed it nearly a month ago and since then it has been compiling nearly all the time without any problems. The uptime today is nearly 28 days.

lucid@lucid-desktop:~$ uptime
 17:04:36 up 27 days, 19:04,  2 users,  load average: 0.00, 0.00, 0.00
lucid@lucid-desktop:~$ 

As I already said, this board basically started my part of the project. So thank you Freescale for this generous loan! And I’m looking forward to seeing your brand new i.MX6 Quad on my desk as soon as possible. 🙂

ARMv7, Thumb, VFPv3 support and GitHub.com

So I took Stephen’s work and tested it on my Pandaboard, which revealed several issues that needed to be worked around. The first was the deprecation of the swp instruction as of ARMv6; the GNU assembler complained about it. This was quite easy, so I provided an ARMv6/7-specific xchg function. The next issue showed up as an Illegal instruction error and was caused by wrong interworking between Thumb and ARM code. To be precise, Ubuntu’s GCC compiles to Thumb by default while LLVM compiles to ARM by default. That would not be such a big issue if we didn’t have any hand-written assembler, but we do! In fact, it forms the *glue* between the C world (Thumb compiled!) and the Haskell world (ARM compiled!), and it looks like this glue, i.e. the StgRun/StgReturn functions, was not Thumb-friendly enough. The last issue was that Stephen, having just an ARMv5 machine, completely omitted floating point support. As I do have ARMv7 hardware, where VFPv3 (at least in the “crippled” form of VFPv3-D16) is broadly supported, I went ahead and added support for it too.
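One quick way to see which instruction set a given C compiler targets by default is to look at its predefined macros. A minimal sketch, assuming a GCC-compatible compiler that defines __thumb__ and __arm__ on 32-bit ARM targets; the function name target_isa is hypothetical and not part of GHC:

```c
/* Hypothetical helper: reports the instruction set this translation unit
   was compiled for, based on GCC-style predefined macros. */
static const char *target_isa(void)
{
#if defined(__thumb__)
    return "Thumb";            /* e.g. a compiler configured with --with-mode=thumb */
#elif defined(__arm__)
    return "ARM";              /* e.g. when compiling with -marm */
#else
    return "not 32-bit ARM";   /* any other target */
#endif
}
```

Printing the result from a file built once with the distribution’s default flags and once with -marm should show whether the C side of the glue ends up as Thumb or ARM code.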

If you are curious, just grab the latest LLVM HEAD, apply the following patch, compile, install it somewhere, and then test GHC from my fork on GitHub.com. Please remember that I have probably completely broken the ARMv5 support of Stephen’s original patches (Stephen, I’m sorry for this), as I have not carefully #ifdefed the ARMv7 bits out, so make sure you have ARMv7 hardware ready.

For your reference I’m using this mk/build.mk file:

SRC_HC_OPTS = -H64m -opta=-march=armv7a -opta=-mfpu=vfpv3
GhcLibWays = v

GhcStage2HcOpts    = -opta=-march=armv7a -opta=-mfpu=vfpv3
GhcLibHcOpts       = -opta=-march=armv7a -opta=-mfpu=vfpv3

TABLES_NEXT_TO_CODE = NO

SplitObjs          = NO
HADDOCK_DOCS       = NO
BUILD_DOCBOOK_HTML = NO
BUILD_DOCBOOK_PS   = NO
BUILD_DOCBOOK_PDF  = NO

Erm, someone was faster than me making GHC/ARM registerised…

Erm, someone was faster than me in making the GHC/ARM port registerised! A nice GHC fellow, Stephen Blackheath, sent an email to the LLVM-dev mailing list asking for a review of his patches, and it looks like he really got the registerised GHC/ARM port up and running. You can see his conversation with David Terei starting here.
Stephen contacted me and was kind enough to provide me with all his patches and guidance on what to do with them. I fixed a few issues in them, which was needed for the more modern GHC HEAD and also for ARMv7 hardware, and I got his work up and running here too! Congratulations, Stephen, and thanks a lot for your nice work!
If you are curious, I used the following mk/build.mk:

SRC_HC_OPTS = -H64m -optc=-marm -opta=-march=armv7a -opta=-mfpu=vfpv3
GhcLibWays = v

GhcStage2HcOpts    = -opta=-march=armv7a -opta=-mfpu=vfpv3
GhcLibHcOpts       = -opta=-march=armv7a -opta=-mfpu=vfpv3

TABLES_NEXT_TO_CODE = NO

SplitObjs          = NO
HADDOCK_DOCS       = NO
BUILD_DOCBOOK_HTML = NO
BUILD_DOCBOOK_PS   = NO
BUILD_DOCBOOK_PDF  = NO

and with it I was able to compile GHC registerised. I also ran the testsuite to see the quality of the port; the results are here:

OVERALL SUMMARY for test run started at Sun Jul  3 09:43:57 CEST 2011
    2828 total tests, which gave rise to
    7775 test cases, of which
       3 caused framework failures
    5212 were skipped

    2377 expected passes
      55 expected failures
       0 unexpected passes
     129 unexpected failures

Unexpected failures:
   ../../../libraries/hpc/tests/ghc_ghci  hpc_ghc_ghci [bad exit code] (normal)
   annotations/should_compile             ann01 [exit code non-0] (normal)
   annotations/should_fail                annfail12 [stderr mismatch] (normal)
   annotations/should_run                 annrun01 [exit code non-0] (normal)
   cabal                                  ghcpkg05 [bad stderr] (normal)
   cabal/cabal04                          cabal04 [bad exit code] (normal)
   codeGen/should_compile                 jmp_tbl [exit code non-0] (normal)
   codeGen/should_compile                 massive_array [exit code non-0] (normal)
   codeGen/should_run                     cgrun044 [exit code non-0] (normal)
   dph/diophantine                        dph-diophantine-opt [exit code non-0] (normal)
   dph/dotp                               dph-dotp-fast [exit code non-0] (normal)
   dph/dotp                               dph-dotp-opt [exit code non-0] (normal)
   dph/primespj                           dph-primespj-fast [exit code non-0] (normal)
   dph/primespj                           dph-primespj-opt [exit code non-0] (normal)
   dph/quickhull                          dph-quickhull-fast [exit code non-0] (normal)
   dph/quickhull                          dph-quickhull-opt [exit code non-0] (normal)
   dph/sumnats                            dph-sumnats [exit code non-0] (normal)
   dph/words                              dph-words-fast [exit code non-0] (normal)
   dph/words                              dph-words-opt [exit code non-0] (normal)
   driver                                 T706 [bad exit code] (normal)
   dynlibs                                T3807 [bad exit code] (normal)
   ghc-api/T4891                          T4891 [bad exit code] (normal)
   ghc-api/apirecomp001                   apirecomp001 [bad exit code] (normal)
   ghc-e/should_run                       2228 [bad exit code] (normal)
   ghc-e/should_run                       2636 [bad stderr] (normal)
   ghc-e/should_run                       3890 [bad stderr] (normal)
   ghc-e/should_run                       ghc-e001 [bad exit code] (normal)
   ghc-e/should_run                       ghc-e002 [bad exit code] (normal)
   ghc-e/should_run                       ghc-e003 [bad exit code] (normal)
   ghc-e/should_run                       ghc-e004 [bad stderr] (normal)
   ghc-e/should_run                       ghc-e005 [bad stderr] (normal)
   ghci/prog004                           ghciprog004 [bad exit code] (normal)
   ghci/scripts                           ghci024 [bad exit code] (normal)
   ghci/scripts                           ghci037 [bad exit code] (normal)
   ghci/should_run                        3171 [bad stderr] (normal)
   layout                                 layout007 [bad stdout] (normal)
   numeric/should_run                     arith005 [bad stdout] (normal)
   perf/compiler                          T1969 [stat not good enough] (normal)
   perf/compiler                          T3064 [stat not good enough] (normal)
   perf/compiler                          T4007 [bad stderr] (normal)
   perf/compiler                          T5030 [stat not good enough] (normal)
   perf/should_run                        3586 [stat not good enough] (normal)
   perf/should_run                        MethSharing [stat not good enough] (normal)
   perf/should_run                        T3738 [stat not good enough] (normal)
   perf/should_run                        T4321 [bad exit code] (normal)
   perf/should_run                        T4830 [stat not good enough] (normal)
   perf/should_run                        T4978 [stat not good enough] (normal)
   perf/should_run                        T5113 [stat not good enough] (normal)
   perf/should_run                        T5205 [stat not good enough] (normal)
   perf/should_run                        lazy-bs-alloc [stat not good enough] (normal)
   perf/space_leaks                       space_leak_001 [stat not good enough] (normal)
   plugins                                plugins01 [bad exit code] (normal)
   plugins                                plugins05 [exit code non-0] (normal)
   quasiquotation/qq007                   qq007 [exit code non-0] (normal)
   quasiquotation/qq008                   qq008 [exit code non-0] (normal)
   rename/should_fail                     rnfail043 [stderr mismatch] (normal)
   rts                                    3424 [exit code non-0] (normal)
   rts                                    atomicinc [exit code non-0] (normal)
   simplCore/should_compile               EvalTest [bad stdout] (normal)
   simplCore/should_compile               T3016 [exit code non-0] (normal)
   simplCore/should_compile               T3055 [bad stdout] (normal)
   simplCore/should_compile               T3772 [bad stdout] (normal)
   simplCore/should_compile               T4306 [bad stdout] (normal)
   simplCore/should_compile               T4945 [bad stdout] (normal)
   th                                     T1835 [exit code non-0] (normal)
   th                                     T2386 [bad exit code] (normal)
   th                                     T2597a [exit code non-0] (normal)
   th                                     T2597b [stderr mismatch] (normal)
   th                                     T2674 [stderr mismatch] (normal)
   th                                     T2685 [exit code non-0] (normal)
   th                                     T2700 [exit code non-0] (normal)
   th                                     T2713 [stderr mismatch] (normal)
   th                                     T2817 [exit code non-0] (normal)
   th                                     T3100 [exit code non-0] (normal)
   th                                     T3177 [exit code non-0] (normal)
   th                                     T3319 [exit code non-0] (normal)
   th                                     T3395 [stderr mismatch] (normal)
   th                                     T3600 [exit code non-0] (normal)
   th                                     T3899 [exit code non-0] (normal)
   th                                     T3920 [exit code non-0] (normal)
   th                                     T4188 [exit code non-0] (normal)
   th                                     T4436 [exit code non-0] (normal)
   th                                     T5037 [exit code non-0] (normal)
   th                                     T5217 [exit code non-0] (normal)
   th                                     TH_1tuple [stderr mismatch] (normal)
   th                                     TH_NestedSplices [exit code non-0] (normal)
   th                                     TH_class1 [exit code non-0] (normal)
   th                                     TH_dupdecl [stderr mismatch] (normal)
   th                                     TH_emptycase [stderr mismatch] (normal)
   th                                     TH_exn1 [stderr mismatch] (normal)
   th                                     TH_exn2 [stderr mismatch] (normal)
   th                                     TH_fail [stderr mismatch] (normal)
   th                                     TH_foreignInterruptible [exit code non-0] (normal)
   th                                     TH_genEx [exit code non-0] (normal)
   th                                     TH_mkName [exit code non-0] (normal)
   th                                     TH_pragma [exit code non-0] (normal)
   th                                     TH_recover [exit code non-0] (normal)
   th                                     TH_reifyDecl1 [exit code non-0] (normal)
   th                                     TH_reifyDecl2 [exit code non-0] (normal)
   th                                     TH_reifyMkName [exit code non-0] (normal)
   th                                     TH_repE2 [exit code non-0] (normal)
   th                                     TH_repGuard [exit code non-0] (normal)
   th                                     TH_repGuardOutput [exit code non-0] (normal)
   th                                     TH_repPrim [exit code non-0] (normal)
   th                                     TH_repPrim2 [exit code non-0] (normal)
   th                                     TH_repPrimOutput [exit code non-0] (normal)
   th                                     TH_repPrimOutput2 [exit code non-0] (normal)
   th                                     TH_repUnboxedTuples [exit code non-0] (normal)
   th                                     TH_runIO [stderr mismatch] (normal)
   th                                     TH_sections [exit code non-0] (normal)
   th                                     TH_spliceD1 [stderr mismatch] (normal)
   th                                     TH_spliceD2 [exit code non-0] (normal)
   th                                     TH_spliceDecl1 [exit code non-0] (normal)
   th                                     TH_spliceDecl2 [exit code non-0] (normal)
   th                                     TH_spliceDecl3 [exit code non-0] (normal)
   th                                     TH_spliceDecl4 [exit code non-0] (normal)
   th                                     TH_spliceE1 [exit code non-0] (normal)
   th                                     TH_spliceE3 [exit code non-0] (normal)
   th                                     TH_spliceE4 [exit code non-0] (normal)
   th                                     TH_spliceE5 [exit code non-0] (normal)
   th                                     TH_spliceE6 [exit code non-0] (normal)
   th                                     TH_spliceExpr1 [exit code non-0] (normal)
   th                                     TH_spliceGuard [exit code non-0] (normal)
   th                                     TH_spliceInst [exit code non-0] (normal)
   th                                     TH_tf1 [exit code non-0] (normal)
   th                                     TH_tf3 [exit code non-0] (normal)
   th                                     TH_tuple1 [exit code non-0] (normal)
   th/2014                                2014 [bad exit code] (normal)
   th/TH_spliceViewPat                    TH_spliceViewPat [exit code non-0] (normal)

LLVM on ARM testing

In my LLVM on ARM post I claimed that the best GCC for compiling LLVM on ARM natively (no cross compiling from x86/amd64 to ARM!) is GCC 4.4.1. Last week I kept all my available hardware running to test LLVM 2.9 and LLVM HEAD more thoroughly, so that I would have some proof that this claim is actually true. Well, it is indeed true, at least so far, but there are also other GCC versions that reach the same level of quality as 4.4.1 when compiling the LLVM source base on ARM.
Anyway, I tested 6 GCC versions on both LLVM 2.9 and LLVM HEAD as of the following commit:

commit ca42299619cf47371a42c2bda87d067e003657ea
Author: Eric Christopher <echristo@apple.com>
Date:   Wed Jun 29 17:53:29 2011 +0000

    Move XCore from getRegClassForInlineAsmConstraint to
    getRegForInlineAsmConstraint.
    
    Part of rdar://9643582
    
    
    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@134080 91177308-0d34-0410-b5e6-96231b3b80

I’ll list the results in two tables, one for LLVM 2.9 and one for LLVM HEAD. The compilation parameters were -O0, -O1, -O2, and the default, which is -O3. I ran only the LLVM tests which are distributed with LLVM itself (in the same package, not the separate huge LLVM testsuite!) and which are invoked by make check. The tables list the tested GCC versions, the compile parameters and the number of unexpected failures I got from each combination. The interested reader can click on a number to download the output of the make check command and see exactly which tests failed and why. I hope someone from the LLVM community and also from the Linaro community will actually do so; I did this testing as a service to both communities… Below the tables you can also find the gcc -v output for all the GCC versions involved, to give an idea of how the compilers were configured and built.

LLVM 2.9 results:

GCC version              -O0  -O1  -O2  default
GCC 4.3.4 (1)              1    1   45       39
GCC 4.4.1 (2)              1    1   45       39
GCC 4.4.5 (3)             54   54   98       92
GCC 4.5.2 (4)             54   54  112      112
GCC 4.6.1/2011.05 (5)      1    1   59       59
GCC 4.6.1/2011.06 (6)      1    1   59       59

LLVM HEAD results:

GCC version              -O0  -O1  -O2  default
GCC 4.3.4 (1)             29   29   82       82
GCC 4.4.1 (2)             29   29   82       82
GCC 4.4.5 (3)             29   29   82       82
GCC 4.5.2 (4)             29   29  100      100
GCC 4.6.1/2011.05 (5)     29   29  100      100
GCC 4.6.1/2011.06 (6)     29   29  100      100

Interesting is the number of regressions present in LLVM HEAD in comparison with LLVM 2.9. The compilers were configured as follows:

(1) GCC 4.3.4: run on Ubuntu 10.04.2 LTS on i.MX53 Quick Start Board together with binutils 2.20.1-system.20100303

Using built-in specs.
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.3.4-10ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --disable-sjlj-exceptions --with-arch=armv6 --with-tune=cortex-a8 --with-float=softfp --with-fpu=vfp --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi
Thread model: posix
gcc version 4.3.4 (Ubuntu 4.3.4-10ubuntu1) 

(2) GCC 4.4.1: run on Ubuntu 11.04 on Pandaboard together with binutils 2.21.0.20110327

Using built-in specs.
Target: arm-linux-gnueabi
Configured with: /export/home/karel/src/gcc-4.4.1-t2/gcc-4.4-4.4.1/src/configure -v --with-pkgversion='Ubuntu 4.4.1-4ubuntu9' --enable-languages=c,c++ --prefix=/export/home/karel/arm-sfw/gcc-4.4.1-4ubuntu9 --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --without-included-gettext --enable-threads=posix --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --disable-sjlj-exceptions --with-arch=armv6 --with-tune=cortex-a8 --with-float=softfp --with-fpu=vfp --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi
Thread model: posix
gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9) 

(3) GCC 4.4.5: run on Ubuntu 11.04 on Pandaboard together with binutils 2.21.0.20110327

Using built-in specs.
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.4.5-15ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.4 --enable-shared --enable-multiarch --with-multiarch-defaults=arm-linux-gnueabi --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib/arm-linux-gnueabi --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --libdir=/usr/lib/arm-linux-gnueabi --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi
Thread model: posix
gcc version 4.4.5 (Ubuntu/Linaro 4.4.5-15ubuntu1) 

(4) GCC 4.5.2: run on Ubuntu 11.04 on Pandaboard together with binutils 2.21.0.20110327

Using built-in specs.
COLLECT_GCC=gcc-4.5
COLLECT_LTO_WRAPPER=/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.2/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.5.2-8ubuntu4' --with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.5 --enable-shared --enable-multiarch --with-multiarch-defaults=arm-linux-gnueabi --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib/arm-linux-gnueabi --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.5 --libdir=/usr/lib/arm-linux-gnueabi --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi
Thread model: posix
gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) 

(5) GCC 4.6.1 Linaro 2011.05: compiler compiled by me from Linaro 2011.05 source distribution. Run on Ubuntu 11.04 on Pandaboard together with binutils 2.21.0.20110327

Using built-in specs.
COLLECT_GCC=/export/home/karel/arm-sfw/gcc-4.6-linaro-2011.05-0/bin/gcc
COLLECT_LTO_WRAPPER=/export/home/karel/arm-sfw/gcc-4.6-linaro-2011.05-0/libexec/gcc/arm-linux-gnueabi/4.6.1/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../gcc-linaro-4.6-2011.05-0/configure -v --enable-languages=c,c++ --prefix=/export/home/karel/arm-sfw/gcc-4.6-linaro-2011.05-0 --enable-shared --enable-multiarch --with-multiarch-defaults=arm-linux-gnueabi --enable-linker-build-id --with-system-zlib --without-included-gettext --enable-threads=posix --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3 --with-mode=arm --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi
Thread model: posix
gcc version 4.6.1 20110506 (prerelease) (Linaro GCC 4.6-2011.05-0) 

(6) GCC 4.6.1 Linaro 2011.06: compiler compiled by me from Linaro 2011.06 source distribution. Run on Ubuntu 11.04 on Pandaboard together with binutils 2.21.0.20110327

Using built-in specs.
COLLECT_GCC=/export/home/karel/arm-sfw/gcc-4.6-linaro-2011.06-0/bin/gcc
COLLECT_LTO_WRAPPER=/export/home/karel/arm-sfw/gcc-4.6-linaro-2011.06-0/libexec/gcc/arm-linux-gnueabi/4.6.1/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../gcc-linaro-4.6-2011.06-0/configure -v --enable-languages=c,c++ --prefix=/export/home/karel/arm-sfw/gcc-4.6-linaro-2011.06-0 --enable-shared --enable-multiarch --with-multiarch-defaults=arm-linux-gnueabi --enable-linker-build-id --with-system-zlib --without-included-gettext --enable-threads=posix --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3 --with-mode=arm --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi
Thread model: posix
gcc version 4.6.1 20110526 (prerelease) (Linaro GCC 4.6-2011.06-0) 

I hope someone will find these results useful. If anything here is wrong or requires correction, or if someone needs an additional test performed, please ask in the comments.

New hardware: Pandaboard

First of all, *big* thanks to Freescale for lending me the i.MX53 Quick Start Board. This was the move which basically started my project, and although the days of the i.MX53 in my office are numbered and I need to return the board soon, it has done a lot of work here and also motivated me to continue with the project! So thanks a lot, Freescale!
As I wouldn’t like to stop the project, I decided to purchase a Pandaboard myself and at least give it a try, since the general feeling you get about the board from the Pandaboard discussion group is kind of mixed. Let’s compare both boards in a table, and then I’ll explain it a little bit.

                               i.MX53 Quick Start Board  Pandaboard
CPU speed                      single-core Cortex A8     dual-core Cortex A9 (twice and a bit faster than single-core Cortex A8)
CPU/MEM temperature stability  stable as a rock          easily overheated, e.g. make -j2 fails several times per week
SD/microSD card stability      stable as a rock          very weak and very sensitive to card in use
Ethernet/NFS mount stability   stable as a rock          kernel sometimes crashes on memory allocation in USB/Eth driver

i.MX53 Quick Start Board and Pandaboard together on top of SMC GS8P-Smart switch


With the Pandaboard, the first problem I hit was its sensitivity to the SD card. I originally purchased a Kingston SDHC 8GB Class 10 (UltimateX 100X), but it does not work well. The second card purchase went much better: this time I used the recommended Transcend SDHC 8GB Class 10, without any issue (so far, knock on wood! :-)). I reported this issue and complained loudly about it. I thought this was really a board design issue (weak hardware), but some people think it might be a software issue. See the thread SD card compatibility issue on the panda discussion group. Also search the group and you will see a lot of other reports about incompatible cards or card issues. So if you are going the Pandaboard route, be very careful with your choice of card!
Another issue you will easily hit with the Panda, especially if you plan to use it like me for building some huge software package, is overheating. I’m using this nice temperature monitoring tool for temperature measurements on OMAP4. When stress testing the Panda with make -j2, I was able to get up to 92°C without any trouble; I mean, it’s really easy to reach this temperature. Anyway, this temperature is quite high for my taste, and also for the Panda, since it failed several times during the stress testing. So in the end I decided to start another discussion thread on the group and ask what the recommended heatsink for the Panda is. I ended up purchasing a Primecooler PC-NB1 for my Panda, and it really does its job very well. It basically decreases the top temperature from 92°C to just 52°C, which is much more acceptable, for me at least.
The last issue with the Panda which I’ve observed so far is its mediocre Ethernet controller, which is connected over USB. Its current software support is quite weak, but I have not had time to test the proposed solution yet.
I hope this last kernel/NFS/Eth/USB issue will be solved in the next Ubuntu release for the Pandaboard and will finally make my Panda really stable.

Primecooler PC-NB1 on top of Pandaboard


Primecooler PC-NB1 on top of Pandaboard from the top