I'm trying to get the cycle counter running on my ARM Cortex-A9 (a PandaBoard running Arch Linux). I've written and loaded a kernel module that enables user-mode access, but reading the cycle counter from userland is unreliable: sometimes it works, and sometimes the program dies with "Illegal instruction". For example, here is the result of running a cycle-reading program three times (the program is below):
    panda> a.out
    Illegal instruction
    panda> a.out
    function took exactly 700059 cycles (27 overhead)
    panda> a.out
    Illegal instruction
Has anybody seen similar behavior and tracked it down?
-Ted
----------
The kernel module does nothing but this:
    /* enable user-mode access to the performance counter */
    asm volatile("mcr p15, 0, %0, c9, c14, 0" :: "r"(1));
    /* enable all counters */
    asm volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(0x8000000f));
----------
Here's the cycle-reading program I used to test (from http://stackoverflow.com/questions/3247373). Other programs along the same lines have been just as flaky.
    #include <stdio.h>
    #include <stdint.h>

    static inline unsigned int get_cyclecount (void)
    {
        unsigned int value;
        // Read CCNT Register
        asm volatile ("MRC p15, 0, %0, c9, c13, 0\t\n": "=r"(value));
        return value;
    }

    static inline void init_perfcounters (int32_t do_reset, int32_t enable_divider)
    {
        // in general enable all counters (including cycle counter)
        int32_t value = 1;
        // perform reset:
        if (do_reset)
        {
            value |= 2; // reset all counters to zero.
            value |= 4; // reset cycle counter to zero.
        }
        if (enable_divider)
            value |= 8; // enable "by 64" divider for CCNT.
        value |= 16;    // export enable (PMCR bit 4).
        // program the performance-counter control-register:
        asm volatile ("MCR p15, 0, %0, c9, c12, 0\t\n" :: "r"(value));
        // enable all counters:
        asm volatile ("MCR p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x8000000f));
        // clear overflows:
        asm volatile ("MCR p15, 0, %0, c9, c12, 3\t\n" :: "r"(0x8000000f));
    }

    int main(void)
    {
        // init counters:
        init_perfcounters (1, 0);
        // measure the counting overhead:
        unsigned int overhead = get_cyclecount();
        overhead = get_cyclecount() - overhead;
        unsigned int t = get_cyclecount();
        // do some stuff here..
        for (int i = 0; i<100000; i++)
            i=i+1;
        t = get_cyclecount() - t;
        // t and overhead are unsigned, so print them with %u
        printf ("function took exactly %u cycles (%u overhead)\n", t, overhead);
        return 0;
    }