The C6748 on-chip boot loader can load code over the HPI bus. From what I can tell, the boot loader introduces delays at two points. The first is between the time the boot loader is told to "allow" loading and the time the host can actually begin loading.
The second delay appears between the time the boot loader is told to "launch" the application and the time the host is "allowed" to manipulate on-chip memory. It appears that the boot loader must do some housekeeping between being told to launch and the actual launch.
Does TI have figures for how long these delays are?
Also, the C6748 boot loader document (SPRAAT2F) states that the boot loader "uses all of L1D cache". However, as near as I can tell, that is not true: it leaves the C6748 cache settings at their power-on values, so 1/2 of L1D remains configured "as RAM". Should the document be corrected?