Hi, I am new to this forum. I have some NEON assembly code written for the Cortex-A8. When the same code was ported to a Cortex-A9, it took 40% longer to run. I had assumed that the Cortex-A9 outperforms the Cortex-A8, so I expected the code to run faster on the A9, but in my case it ran slower.

However, when I replaced the NEON assembly function with equivalent C code, the Cortex-A9 performed better (it took 10% less time than the Cortex-A8). So the Cortex-A9 shows poor performance only when the NEON assembly code is used, and I don't know why.
All suggestions or comments are welcome.
