Using NEON with WinCE 6.0 on OMAP3530

Hi,

Could you tell me whether there is a WinCE 6.0 library for using NEON on the OMAP3530? We would like to use it to perform vector/matrix floating-point operations in a few cycles.

Thanks for your help.

  • WinCE 6.0 has no "native" support for NEON. When multiple threads use either NEON or VFP, the kernel has to save and restore the registers during context switches, when a VFP/NEON exception is raised. The OMAP3 uses VFPv3 and, as a consequence, shares the same set of registers as NEON. See the ARM website for details: http://infocenter.arm.com/help/topic/com.arm.doc.dht0002a/DHT0002A_introducing_neon.pdf

    • Problem:
      • The WinCE 6.0 R2 (and R3) kernel saves only registers D0 to D15, and only when a VFP instruction occurs. Microsoft provides kernel hooks, OEMSaveVFPCtrlRegs() and OEMRestoreVFPCtrlRegs(), for saving extra VFP registers, but implementing them won't work because they only allow you to save eight additional 32-bit registers. We need at least sixteen more 64-bit registers, i.e. D16 to D31. See MSDN for details: http://msdn.microsoft.com/en-us/library/aa913547.aspx
      • If you are interested and have access to the Microsoft WinCE kernel source code, look at OEMSaveVFPCtrlRegs/OEMRestoreVFPCtrlRegs in %_WINCEROOT%\Private\WINCEOS\COREOS\NK\KERNEL\ARM\armtrap.s.
      • You might think you could simply increase the number of VFP registers in the %_WINCEROOT%\PUBLIC\COMMON\SDK\INC\winnt.h file. However, the kernel would no longer match that definition unless it were rebuilt from the %_WINCEROOT%\PRIVATE source tree, which nobody other than Microsoft is allowed to modify.
    • Solution to support both VFP + NEON:
      • I don't know whether a reliable one exists, as Microsoft has no support for it in its kernel code and its official position is NOT to support it. The Microsoft Shared Source program only allows people to look at the kernel code; no modifications are allowed.
      • The only available option with WinCE 6 is to use either the ARM-supplied VFPv2 library OR NEON. NEON is used by the accelerated GDI functions in the display driver of the current WinCE BSP, so you would need to disable or remove that part in order to use the VFPv2 library.
        • Neon context is saved/restored using the co-proc kernel callbacks in the BSP.
        • If you use the ARM supplied VFPv2 library then you must disable the co-proc kernel callback and allow the Microsoft kernel to save/restore the VFP context.
        • Enabling both the Neon co-proc callback functions for save and restore context, and using VFPv2 library from ARM will lead to corruption of graphics or incorrect floating point results.
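
To make the register arithmetic from the "Problem" section concrete, here is a minimal sketch in C. It only restates the figures quoted above (D0 to D15 saved by the kernel, eight extra 32-bit registers available through the OEM hooks, D16 to D31 still unsaved). The enum and function names are made up for illustration and do not come from the WinCE kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Figures taken from the discussion above. The names are illustrative
   only and are NOT WinCE kernel definitions. */
enum {
    D_REG_BYTES         = 8,  /* one D register is 64 bits wide */
    TOTAL_D_REGS        = 32, /* D0..D31 on VFPv3/NEON */
    KERNEL_SAVED_DREGS  = 16, /* D0..D15, saved by the CE 6.0 kernel */
    OEM_EXTRA_REGS      = 8,  /* extra registers the OEM VFP hooks can save */
    OEM_EXTRA_REG_BYTES = 4   /* each of those is 32 bits wide */
};

/* Bytes of NEON state (D16..D31) that the kernel does not save itself. */
size_t unsaved_neon_bytes(void)
{
    assert(TOTAL_D_REGS > KERNEL_SAVED_DREGS);
    return (size_t)(TOTAL_D_REGS - KERNEL_SAVED_DREGS) * D_REG_BYTES;
}

/* Bytes of extra state the OEM VFP control-register hooks can hold. */
size_t oem_hook_capacity_bytes(void)
{
    return (size_t)OEM_EXTRA_REGS * OEM_EXTRA_REG_BYTES;
}

/* Non-zero only if the hooks could hold the whole upper register bank. */
int oem_hooks_can_cover_neon(void)
{
    return oem_hook_capacity_bytes() >= unsaved_neon_bytes();
}
```

With these figures the hooks offer 32 bytes against the 128 bytes needed for D16 to D31, which is why implementing them is not enough.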
     
    Hope this helps. 
    Regards
    David.

  • In reply to DavidBercovitz:

    This means the current WinCE 6.0 toolchain has no support for NEON, and there is no emulated library like the VFPv2 one from ARM. Your only chance is to get access to WinCE 7.0, either through a special agreement with Microsoft such as the ADP program or by waiting for the WinCE 7.0 RTM.

    Regards, David.

  • In reply to DavidBercovitz:

    Hi David,

    Thanks for your detailed answer. I have some further questions:

    • If we choose to disable the use of NEON for GDI display acceleration, what are the potential impacts? Is it going to slow down our OpenGL ES rendering, which uses a window handle to build its surface? If not, would it be possible to use a toolchain like cegcc, which supports the NEON instruction set and matching C functions, to generate a compliant WinCE library?
    • You mention that WinCE 7.0 is our only chance to access NEON. Do you know whether the current BSQUARE BSPs for the OMAP3530 are compatible with CE 7.0, or whether a dedicated WinCE 7.0 BSP for the OMAP3530 board is already being worked on?

     

    Thanks.

    Best Regards.

    Bertrand.

  • In reply to Bertrand Forestier:

    Bertrand,

    • NEON disabled: it means GDI won't be accelerated. However, it has no impact on OGLES, which doesn't run on the ARM Cortex-A8 but on the Imagination SGX530 core. I'm not familiar with cegcc or with what level of WinCE support it has, so I would consider it experimental. The problem with WinCE is its binary format, which is COFF/PE, i.e. non-standard COFF.
    • There are very few things I'm allowed to share concerning any future development. I'd recommend contacting your local TI sales representative.

    Best regards

    David.

  • In reply to DavidBercovitz:

    Hi all,

    I am using VFPv2. TI's/BSquare's documentation mentions: "If you use the ARM supplied VFPv2 library then you must disable the co-proc kernel callback and allow the Microsoft kernel to save/restore the VFP context." How can the co-proc kernel callback be disabled?

    Thanks,

    Eugen

  • In reply to Eugen Feraru:

    I may be wrong, but I thought there was another issue with VFPv2 under CE 6? I'll have to dig around; maybe it was the NEON interaction. I do remember having it working in an earlier drop of the BSP.

  • In reply to Eugen Feraru:

    In vfphandler.c, in OALVFPInitialize(), change these lines as follows:

     

        pOemGlobal->pfnInitCoProcRegs    = NULL;
        pOemGlobal->pfnSaveCoProcRegs    = NULL;
        pOemGlobal->pfnRestoreCoProcRegs = NULL;
        pOemGlobal->cbCoProcRegSize      = 0;
        pOemGlobal->fSaveCoProcReg       = FALSE;
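
For readers without the BSP at hand, the either/or choice described earlier in the thread can be sketched as a small stand-alone C mock. MockOemGlobal is a hypothetical stand-in that only borrows the five field names from the lines above; it is not the real OEMGLOBAL structure, and the field types, the stub callback and the 256-byte context size are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed callback shape; the real BSP prototypes may differ. */
typedef void (*MockCoProcFn)(unsigned char *pArea);

/* Hypothetical stand-in for the co-proc part of OEMGLOBAL. */
typedef struct {
    MockCoProcFn  pfnInitCoProcRegs;
    MockCoProcFn  pfnSaveCoProcRegs;
    MockCoProcFn  pfnRestoreCoProcRegs;
    unsigned long cbCoProcRegSize;
    int           fSaveCoProcReg;
} MockOemGlobal;

/* Placeholder for the BSP's NEON init/save/restore routines. */
static void MockNeonContextStub(unsigned char *pArea)
{
    (void)pArea; /* a real routine would touch the NEON registers here */
}

/* NEON option: the BSP callbacks own the co-processor context. */
void MockEnableNeonCallbacks(MockOemGlobal *g)
{
    assert(g != NULL);
    g->pfnInitCoProcRegs    = MockNeonContextStub;
    g->pfnSaveCoProcRegs    = MockNeonContextStub;
    g->pfnRestoreCoProcRegs = MockNeonContextStub;
    g->cbCoProcRegSize      = 32UL * 8UL; /* assumed: D0..D31, 8 bytes each */
    g->fSaveCoProcReg       = 1;
}

/* VFPv2 library option: clear the callbacks so the kernel's own
   VFP save/restore path is the only one in use. */
void MockDisableCoProcCallbacks(MockOemGlobal *g)
{
    assert(g != NULL);
    g->pfnInitCoProcRegs    = NULL;
    g->pfnSaveCoProcRegs    = NULL;
    g->pfnRestoreCoProcRegs = NULL;
    g->cbCoProcRegSize      = 0;
    g->fSaveCoProcReg       = 0;
}

/* Non-zero when no co-proc callback is registered at all. */
int MockCoProcCallbacksDisabled(const MockOemGlobal *g)
{
    assert(g != NULL);
    return g->pfnInitCoProcRegs == NULL
        && g->pfnSaveCoProcRegs == NULL
        && g->pfnRestoreCoProcRegs == NULL
        && g->cbCoProcRegSize == 0
        && g->fSaveCoProcReg == 0;
}
```

The point of the mock is only that the two configurations are mutually exclusive: either all five fields are populated for NEON, or all five are cleared for the VFPv2 library.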