
RM57 performance changes after write to 0x0FF87000

I'm completely baffled at this point, and I don't really expect any good responses to this post since the results are so vague, but I thought I'd put this out there anyway.

We have been running on the Hercules for a few months now. We created a custom RTOS-based SW solution referencing a lot of the HALCoGen base code, and it has worked well to date, but we've run into a snag. When the system initialization code was created, there was initially a mistake: the base address of the flash registers was missing an F (0x0FF87000 vs. 0xFFF87000). So essentially, the routine below was writing to a RESERVED area of memory. The clock settings we were using worked fine with the default wait states on the flash, but when we started trying to use the EEPROM, we noticed this problem because the defaults were not sufficient.
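
Roughly, the offending write looks like this (a stripped-down, illustrative sketch, not our actual init routine; the FRDCNTL offset and bitfield layout are per my reading of the TRM, so treat the details as assumptions):

    #include <stdint.h>

    /* L2FMC flash wrapper base per the TRM -- the typo dropped the leading F,
     * so the write landed in reserved space and was silently lost. */
    #define FLASH_BASE_CORRECT  0xFFF87000U
    #define FLASH_BASE_TYPO     0x0FF87000U   /* RESERVED region */

    /* FRDCNTL (flash read control) sits at offset 0x00 of the wrapper. */
    #define FRDCNTL(base)  (*(volatile uint32_t *)(uintptr_t)((base) + 0x00U))

    /* With the typo'd base, the flash controller keeps its reset-default
     * wait states, which happened to work for our clock settings until
     * the EEPROM was brought up. */
    static void setup_flash_waitstates(uint32_t base, uint32_t rwait)
    {
        FRDCNTL(base) = (rwait << 8) | 1U;   /* RWAIT field + pipeline enable */
    }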

Here's the strange part. When we made this update, the CPU loading went up by a factor of 40 in I/O-heavy routines (e.g., many writes to the UART)! I found this really odd, so I went through many tests, including:

- Removed the calls that set the flash controller timings
- Removed all calls to the flash controller
- Disabled the MPU, cache, etc. (in combination with the two items above)
- Added synchronization barriers between all calls (see the sketch after this list)
- Relocated the code into RAM and executed it from there
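
The barriers were along these lines (a minimal sketch for the Cortex-R5F; GCC inline-asm syntax shown, your toolchain's intrinsics may differ):

    /* DSB/ISB pair inserted between calls */
    static inline void sync_barriers(void)
    {
        __asm__ volatile("dsb" ::: "memory");  /* drain the write buffer      */
        __asm__ volatile("isb" ::: "memory");  /* flush the pipeline/prefetch */
    }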

None of these had any effect. However, adding a single write to the RESERVED region between 0x08480000 and 0x30000000 causes the I/O to speed up. It doesn't matter when I make this write; it always has this effect. I boot, profile an algorithm that does I/O, writel(0, 0x08480000), then run the algorithm again and see this delta. Effectively, the I/O is in super-snail mode until I make this random write into this specific memory space.
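
In code form, the repro sequence is basically this (writel() is just a volatile 32-bit store; the profiling call is a hypothetical stand-in for our real instrumentation):

    #include <stdint.h>

    extern void run_and_profile_io_algorithm(void);  /* stand-in for the profiled I/O-heavy routine */

    /* writel() as used above: a plain volatile 32-bit store */
    static inline void writel(uint32_t val, uint32_t addr)
    {
        *(volatile uint32_t *)(uintptr_t)addr = val;
    }

    void repro(void)
    {
        run_and_profile_io_algorithm();   /* slow: roughly 40x the expected CPU load */
        writel(0U, 0x08480000U);          /* one write anywhere in the reserved window */
        run_and_profile_io_algorithm();   /* fast from this point on */
    }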

I have verified that I'm not getting any spurious IRQs or exceptions in this scenario, so I'm just completely befuddled as to what might be happening here. It is a bit hard to sell this random line of code in our system as "Enable fast I/O" when we don't know what is going on. Anyone have any thoughts?


  • Hmm. Doesn't make any sense.

    Can you provide a simple test program that reproduces this behavior (stripped down to the bare minimum)?

    Just to make sure I understand, you're saying:
    1) one write to that reserved range at any time during startup
    2) I/O reads/writes speed up by more than a factor of 40
    (because the function they are in speeds up by a factor of 40, so the I/O speedup has to be more than 40x)

    That's a huge speedup, which is one reason it's very difficult to believe.

    -Anthony
  • Your description is accurate. CPU-intensive algorithms (like a Whetstone benchmark routine) are not impacted, but any operations that write to the SCI are hugely impacted.

    - Run foo() algorithm at 2Hz, CPU usage ~20%
    - Write to 0x08480000
    - Run foo() algorithm at 2Hz, CPU usage ~0.58%

    Unfortunately, the foo() algorithm is not simple enough to list here; it involves floating-point math and use of the SCI UART for writing out data.

    We have a control application with a PID loop running at 1 kHz that writes telemetry out the SCI UART. It shows a similar result.

    - Run control() algorithm at 1 kHz, CPU usage ~70-80%
    - Write to 0x08480000
    - Run control() algorithm at 1 kHz, CPU usage ~2-3%

    The Whetstone output is unaffected, so I've made the assumption that the problem is I/O related.

  • Hah, I saw you corrected one question I had before I had a chance to ask :)

    So let me just comment: I think this is too high a level of measurement to conclude that *all* I/O activity speeds up.

    If you're just looking at two routines that contain UART code, and you're using blocking SCI calls, then simply messing with the SCI baud rate would greatly impact the time you measure at this high level of profiling.

    That's the most likely scenario, as you could get a 40x effect just by changing the baud rate.
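
    To put rough numbers on that (example rates, not necessarily yours): at 10 bits per character on the wire, a 100-byte message blocks for ~104 ms at 9600 baud but only ~2.2 ms at 460800 baud, about a 48x swing:

        #include <stdio.h>

        /* Rough blocking-transmit time: 10 bits per character on the wire
         * (start + 8 data + stop). Example baud rates only. */
        static double tx_time_ms(unsigned bytes, unsigned baud)
        {
            return (double)bytes * 10.0 / (double)baud * 1000.0;
        }

        int main(void)
        {
            printf("100 B @ 9600   baud: %6.2f ms\n", tx_time_ms(100U, 9600U));    /* ~104.17 ms */
            printf("100 B @ 460800 baud: %6.2f ms\n", tx_time_ms(100U, 460800U));  /* ~2.17 ms   */
            return 0;
        }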

    Are you actually looking at the SCI output when all this happens to verify that the baud rate didn't change?

    -Anthony
  • So for foo(), the application is actually a shell-based app, and the output is printed to the console. The data is all as expected (human readable).

    For control(), the data is encoded into a proprietary protocol, sent to the PC, and displayed in a GUI, and that data is also coming in just fine. Additionally, the data carries a CRC, and there are no errors.

    Also, the SCI is configured for interrupt-driven transfers from a ring buffer, not polled.
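
    For reference, the TX path follows the usual ring-buffer pattern, roughly like this (a simplified sketch, not our actual driver; the SCI hooks and register are hypothetical stand-ins):

        #include <stdint.h>

        #define RB_SIZE 256U  /* power of two so the index masks work */

        static volatile uint8_t  rb[RB_SIZE];
        static volatile uint32_t rb_head, rb_tail;

        /* Hypothetical stand-ins for the real SCI driver hooks/register */
        extern void sci_enable_tx_int(void);
        extern void sci_disable_tx_int(void);
        extern volatile uint32_t SCI_TD;

        /* Application side: queue a byte and make sure the TX interrupt is armed */
        void sci_queue_byte(uint8_t b)
        {
            rb[rb_head & (RB_SIZE - 1U)] = b;
            rb_head++;
            sci_enable_tx_int();
        }

        /* TX ISR: move one byte from the ring into the SCI transmit register,
         * disabling the interrupt once the ring drains */
        void sci_tx_isr(void)
        {
            if (rb_tail != rb_head) {
                SCI_TD = rb[rb_tail & (RB_SIZE - 1U)];
                rb_tail++;
            } else {
                sci_disable_tx_int();
            }
        }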
  • Ok, well there still could be something you've inadvertently done that affects timing, or perhaps corrupts the program in a weird way.

    I think you'd need to either profile or trace a level deeper to figure out what's going on. There's no way you've simply sped up the I/O by a factor of 40 without making a change that would have shown up as a difference in the baud rate. It's got to be something more subtle, like you've knocked out some other function that was taking time and now isn't executing, due to a corruption caused by writing to that reserved area...

    If you can boil this down to a small test program you can send, we could look at it with ETM trace, but that's about all I can offer given this high-level description. For the ETM trace, it would be best if you have both the source code and the object file.

    That tool can give us a histogram view of the PC, though.

    -Anthony