ARMv7 rebuild

Post by kmihelich » Wed Jun 08, 2011 3:32 am

Just looking for any community comments on what I'm working on.

So the first build-up of the v7 repo was interesting and exposed me to some of the issues out there, and got the autobuild system refined to handle multiple architectures. But what I'd like to see, what it seems like a lot of people want to see, is better performance.

The current v7 repo is built soft-float, though capable of using hardware FPUs if available and targeted (such as in multimedia apps). In short, even with this setup, floating-point values are still passed between functions through the integer registers under the soft-float ABI, so they have to be shuffled between the ARM core and the FPU around every call. This increases the number of instructions the ARM core must process before the work is finally offloaded, and as the instructions and FPUs get better and faster, this overhead becomes a larger share of the total. This is opposed to building hard-float, where floating-point arguments are passed directly in FPU registers with no such overhead. This has been shown in test-case comparisons to be on average 40% faster, and in FP-heavy cases an improvement of up to 200% has been seen.

What I'm currently working on is rebuilding the v7 repo to be purely hard-float, supporting the lowest-common-denominator system: the Tegra. Beagle and Panda both support the full 32-register vfpv3 FPU along with NEON. nVidia chose not to implement all the features available with the Cortex-A9, and thus the Tegra has a 16-register vfpv3 FPU (vfpv3-d16) and no NEON support (though in theory the GPU could do NEON-like operations if code targeted it). Additionally, since only a couple of the proprietary drivers for graphics chips on the boards have hard-float variants, I'll be splitting off a small subset of packages to be available as soft-float, a la Arch's lib32-* pure-32bit variant of packages. In the end, this will allow maximum performance from the hardware while remaining ABI-compatible with binary xorg graphics driver releases by manufacturers. Later on down the road I plan on separating out another small subset of multimedia packages to be built with full vfpv3+neon hard-float support, which will allow for blazing fast multimedia experiences on Beagle, Panda, and other OMAP3/4 variants (sorry, TrimSlice).
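To make the three package groups above a bit more concrete, here is a rough sketch of how they might map onto compiler settings. The options -march, -mfloat-abi and -mfpu are standard GCC ARM flags; the group names, the helper function, and the choice of softfp for the compatibility subset are assumptions for illustration only, not the actual build configuration of the repo.

```python
# Hypothetical mapping of the package groups described above to GCC ARM flags.
# Group names and the softfp choice are illustrative assumptions.
TARGETS = {
    # main repo: hard-float ABI, 16-register VFPv3 (works on Tegra too)
    "hard-base": {"float_abi": "hard", "fpu": "vfpv3-d16"},
    # small compatibility subset for soft-float-only binary drivers
    "soft-compat": {"float_abi": "softfp", "fpu": "vfpv3-d16"},
    # multimedia subset for OMAP3/4: 32-register VFPv3 plus NEON
    "hard-neon": {"float_abi": "hard", "fpu": "neon"},
}


def cflags(target):
    """Return the list of GCC flags for one of the hypothetical groups."""
    settings = TARGETS[target]
    return [
        "-march=armv7-a",
        "-mfloat-abi=" + settings["float_abi"],
        "-mfpu=" + settings["fpu"],
    ]


if __name__ == "__main__":
    for name in TARGETS:
        print(name, " ".join(cflags(name)))
```

Objects built with the hard and soft/softfp ABIs cannot be linked together, which is why the soft-float subset has to be kept as a separate set of packages rather than mixed into the main repo.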

For comparison among other distributions: Ubuntu/Linaro is soft-float optimized for vfpv3-d16; Debian is toying with a hard-float variant, though their main armel branch is ARMv4T for the widest compatibility; MeeGo has committed to being pure hard-float for ARMv7-A; and the rest are either doing the ARMv4T thing or lining up behind the Linaro standards.

The current armv7 pacman repo will remain as-is for the time being, with everything remaining available. Once I have a solid base to release it will likely be replaced with the quickness.
Arch Linux ARM exists and continues to grow through community support, please donate today!
kmihelich
Developer
 
Posts: 1133
Joined: Tue Jul 20, 2010 6:55 am
Location: aka leming #archlinuxarm

Re: ARMv7 rebuild

Post by pepedog » Wed Jun 08, 2011 7:59 am

I trust you on this.
Is the TrimSlice lacking something hardware-wise?

Incidentally, on the TrimSlice, the Gentoo guy talks about compiling the kernel with make -j5.
I tried this, and although the build went fast (20 minutes), the modules install (and there are only 3) showed they were not built, and make uImage rebuilt lots of modules and still failed. Plus I can't concentrate with sciatica.
pepedog
Developer
 
Posts: 2431
Joined: Mon Jun 07, 2010 3:30 pm
Location: London UK

Re: ARMv7 rebuild

Postby kmihelich » Wed Jun 08, 2011 2:58 pm

TrimSlice is only lacking FPU parts, which is the case for all Tegra and Marvell Dove boards. make -j5 just says to run 5 jobs at the same time, which follows the standard wisdom of '# of processors * 2 + 1'. But that wisdom is debatable on ARM, in my opinion. The idea is to have enough jobs that all processors are saturated, and thus things go as fast as possible, but with limited RAM and a severe USB bottleneck I just stick with -j2 if compiling locally and -j3 for distcc cross-compiling. This still saturates both cores, doesn't max out I/O wait as USB tries to keep up, keeps RAM usage out of swap when it runs into big pre-processing and linking jobs, and gives the device some time to breathe. As for the TrimSlice kernel, I'm waiting on them to get more of the code committed to mainline. There were a great number of advancements for Cortex in general in .38 and .39, so using .36 just seems silly to me -- borderline hackish.
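As a quick illustration of the two job-count choices above, here is a small sketch. The "processors * 2 + 1" rule and the -j2/-j3 numbers for a dual-core board come from the post; generalizing the latter to "one job per core, plus one when using distcc" is my own assumption, as are the function names.

```python
import os


def classic_jobs(cpus):
    """The usual rule of thumb: number of processors * 2 + 1."""
    return cpus * 2 + 1


def conservative_jobs(cpus, distcc=False):
    """Assumed generalization of -j2 local / -j3 distcc on a dual-core board:
    one job per core, plus one extra when remote compilers share the load."""
    return cpus + 1 if distcc else cpus


if __name__ == "__main__":
    cpus = os.cpu_count() or 1  # cpu_count() can return None
    print("classic:      make -j%d" % classic_jobs(cpus))
    print("local:        make -j%d" % conservative_jobs(cpus))
    print("with distcc:  make -j%d" % conservative_jobs(cpus, distcc=True))
```

On a dual-core board the classic rule gives -j5, which is where the make -j5 figure comes from, while the conservative variant gives -j2 and -j3.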

Re: ARMv7 rebuild

Post by kmihelich » Wed Jun 15, 2011 4:08 am

I'm still in the first stages of getting the new toolchain and core filesystem in place, but thought I'd share some quick Whetstone benchmark comparisons.

Original soft setup:
Loops: 100000, Iterations: 1, Duration: 117 sec.
C Converted Double Precision Whetstones: 85.5 MIPS

New hard setup:
Loops: 100000, Iterations: 1, Duration: 20 sec.
C Converted Double Precision Whetstones: 500.0 MIPS
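For anyone who wants to check the arithmetic behind those two results, here is a short sketch. It assumes the rating is computed the way the classic C Whetstone source does it (KIPS = 100 * loops * iterations / seconds, then divided by 1000 for MIPS); that formula reproduces both posted figures, but the function name is mine.

```python
def whetstone_mips(loops, iterations, seconds):
    """MIPS rating as in the classic C Whetstone (assumed formula):
    KIPS = 100 * loops * iterations / seconds, MIPS = KIPS / 1000."""
    kips = 100.0 * loops * iterations / seconds
    return kips / 1000.0


soft = whetstone_mips(100000, 1, 117)  # original soft setup
hard = whetstone_mips(100000, 1, 20)   # new hard setup
speedup = hard / soft                  # same ratio as 117 s / 20 s

if __name__ == "__main__":
    print("soft: %.1f MIPS" % soft)      # soft: 85.5 MIPS
    print("hard: %.1f MIPS" % hard)      # hard: 500.0 MIPS
    print("speedup: %.2fx" % speedup)    # speedup: 5.85x
```

Whetstone is almost entirely floating-point work, so this ratio is a best case and well above the 40% average quoted for general workloads.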

Re: ARMv7 rebuild

Post by pepedog » Sun Jun 26, 2011 11:30 pm

How is this going?
Need help? If you have a rootfs and tell me what to set for the FP, I could compile my TrimSlice kernel and then get it building apps for you natively. V7 stuff for hard is quite small in number, and you are not developing any more.
I've got xdm, lxde and icewm working, and would have liked to see kde too. X can only hold my attention for so long; I want to get the benefit of the power for headless use as quickly as possible.

Re: ARMv7 rebuild

Post by kmihelich » Mon Jun 27, 2011 12:20 am

I have base and base-devel done, and I'm working on getting the toolchain the way I want it right now. GCC 4.6 keeps giving me issues. I could stick with 4.5.2 for now, but I'd like to at least give it my best shot for 4.6. After the toolchain it's just a matter of doing a recompile of everything again. Getting to this point has been far more hackish than proper. Then it will be ready for prime time.

Compiling goes much quicker for me using the panda instead of the beagle, but if you wanted to get a crosstool-NG setup running it would go that much faster with some help. It will be a little bit yet, but soon.

Re: ARMv7 rebuild

Post by pepedog » Mon Jun 27, 2011 12:33 am

Would I have to cross compile if doing it on a TrimSlice? Can't I just set some options or flags and run the build scripts? Have the plug gcc and libs got the capability?
I'm totally ignorant on this, plus I don't want to have something drinking the electric.

Re: ARMv7 rebuild

Post by kmihelich » Mon Jun 27, 2011 12:50 am

You don't have to, it's just faster. I have actually been very impressed with how fast the panda is just on its own, many times faster than what I was getting on the beagle. But yeah, compiles can all be run natively, I just distcc to a cross compiler to go faster.

Re: ARMv7 rebuild

Post by pepedog » Mon Jun 27, 2011 7:45 am

What settings, or what files, do I need to change to get it compiling hard-float (hfp)?

Re: ARMv7 rebuild

Post by kmihelich » Sun Jul 10, 2011 8:37 pm

The new OMAP rootfs is up now, you can get it through the main site or the download thread here after I update it. No TrimSlice version yet, since they seem to be lost in creating a kernel that works.
