How to determine stack/RAM usage

Sarah Weinberger

Expert 1915 points

Other Parts Discussed in Thread: UNIFLASH, HALCOGEN, RM57L843

Chip: RM48L952ZWT

Environment: Halcogen, CCS (Code Composer Studio), UniFlash, TI compiler

Memory: 16MB SDRAM plus internal to the chip; stack uses internal, while heap uses the external

How can I determine how much of the stack and memory I am using and if I am getting close to filling or exceeding one of the sections?

MEMORY
{
    VECTORS (X)  : origin=0x00000000 length=0x00000020
    FLASH0  (RX) : origin=0x00000020 length=0x0017FFE0
    FLASH1  (RX) : origin=0x00180000 length=0x00180000
    STACKS  (RW) : origin=0x08000000 length=0x0003fe00
    RAM     (RW) : origin=0x0803fe00 length=0x00000200

/* USER CODE BEGIN (2) */
    RAM2    (RW) : origin=0x80000000 length=0x01000000
/* USER CODE END */
}

SECTIONS
{
    .intvecs : {} > VECTORS
    .text    : {} > FLASH0 | FLASH1
    .const   : {} > FLASH0 | FLASH1
    .cinit   : {} > FLASH0 | FLASH1
    .pinit   : {} > FLASH0 | FLASH1
    .bss     : {} > RAM2
    .data    : {} > RAM2
    .sysmem  : {} > RAM2

    FEE_TEXT_SECTION : {} > FLASH0 | FLASH1
    FEE_CONST_SECTION : {} > FLASH0 | FLASH1
    FEE_DATA_SECTION : {} > RAM2
#endif
/* USER CODE END */
}

over 9 years ago

0 Anthony F. Seely over 9 years ago

TI__Guru 68920 points

Sarah,

I suppose you could put a fill in the linker command file, but for this particular processor the hardware memory initialization performed on the SRAM during startup [which you need to do in order to initialize all the ECC anyway] will fill the RAM with '0' so you can basically just look at the stack after your code has run and get a good idea of how much stack you are using.

The stack is full descending, so the stack pointer starts at the highest (numerical) address and is decremented as values are pushed onto the stack. Just to orient you when you look at the memory: if your linker file and HalCoGen say a particular stack starts at 0x0803ff00 with a length of 0x100, then the stack pointer should be initialized to 0x08040000, and the first register will be pushed to address 0x0803fffc, since a 'full' stack means the pointer is decremented before the push.

So as this example stack grows, it'll push downward to 0x0803fff8, 0x0803fff4, 0x0803fff0, 0x0803ffec, etc.
But at some point (hopefully) you'll start to see 0's between the lowest SP address and the bottom of your stack which is usually also the top of the next stack. You can use the # of 0's to get an idea of how much stack is being used.
Of course it may be in error if you actually *push* 0's but then any sort of magic # you use to initialize the stack RAM for stack usage is subject to that sort of problem.
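As a rough sketch of the zero-counting idea above: the helper names below are hypothetical, and the code assumes the stack region was zero-filled at startup and that the region bounds are known from the linker file and HalCoGen. It scans upward from the bottom of the region and treats the first non-zero word as the deepest point the stack has reached.

```c
#include <stddef.h>
#include <stdint.h>

/* Initial SP for a full-descending stack: one past the highest address
 * of the region (hypothetical helper, not a HalCoGen function). */
uint32_t stack_initial_sp(uint32_t base, uint32_t length)
{
    return base + length;
}

/* Estimate how many bytes of a full-descending stack have been written.
 * 'bottom' is the lowest address of the region and 'words' is its length
 * in 32-bit words. Untouched words are assumed to still hold fill_value,
 * so a pushed value equal to fill_value can make the estimate too low. */
size_t stack_bytes_used(const volatile uint32_t *bottom, size_t words,
                        uint32_t fill_value)
{
    size_t untouched = 0;

    while (untouched < words && bottom[untouched] == fill_value) {
        untouched++;
    }
    return (words - untouched) * sizeof(uint32_t);
}
```

With the example addresses from this post, a call on the target might look like stack_bytes_used((const volatile uint32_t *)0x0803ff00, 0x100 / 4, 0). It reports a high-water mark since reset, not the current depth.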

Ok so once you get an IDEA of how much stack you think you are using you can do one of two things:

a) configure the MPU such that the 'spare' space in your stack - whatever margin you want to leave in your design for ?? purpose, will trap (so make it some area that is not accessible by setting a high priority MPU region covering this range). Then you can leave this MPU setting on all the time as you are testing and if you get the trap - you can increase the stack size till you stop getting the traps. You could also consider whether you want to leave this trap in place as you deploy (maybe though narrowing the window) for stack overflow detection online all the time.

b) if you want to use the debugger watchpoint feature - you can get a similar function where the CPU halts when it reads/writes memory in that 'spare' space that you want to have. Of course this will only work if an emulator is always attached during testing, and it can't work as an online check for you.
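When picking the 'spare' window for option a), it may help to check the candidate range first. This is a minimal sketch with a hypothetical helper name, assuming the usual Cortex-R4 MPU constraints (region size is a power of two of at least 32 bytes, and the base is aligned to the size); the TRM for the part is the authority on the exact rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if (base, size) could be described by a single MPU region,
 * assuming: size is a power of two, size >= 32 bytes, and base is aligned
 * to size. These constraints are an assumption to verify against the TRM. */
bool mpu_region_is_valid(uint32_t base, uint32_t size)
{
    bool power_of_two = (size != 0u) && ((size & (size - 1u)) == 0u);

    return power_of_two && (size >= 32u) && ((base & (size - 1u)) == 0u);
}
```

For example, a 0x100-byte guard window at 0x08000000 passes this check, while a 0x200-byte window starting at 0x08000100 does not, because the base is not aligned to the size.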

Those are the two easiest ways I can think of that are empirical / based on testing. Probably some software company out there that can do a more formal analysis of stack usage for you but I cannot point you to one (not educated enough on the topic to know).

Sorry: I think I answered your STACK question but only for internal memory.

And also didn't address HEAP.

The SDRAM doesn't get initialized by hardware like the internal RAM. Also you cannot access the SDRAM until you call the HalCoGen init() functions. So if you want to initialize the SDRAM like the internal RAM you need to do a memory fill in your own code as your application starts up. Just write 0's to all the locations; a C library function like memset() would do this for you.
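A minimal sketch of such a fill, using the standard memset(). The wrapper name ram_fill is hypothetical; the SDRAM base and length shown in the comment are the ones from the memory map posted in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Fill a RAM region with a known value so that later inspection can tell
 * touched memory from untouched memory (hypothetical helper). */
void ram_fill(void *base, size_t length, unsigned char value)
{
    memset(base, value, length);
}

/* On the target, after the EMIF has been initialized, this might be used as:
 *
 *     emif_SDRAMInit();
 *     ram_fill((void *)0x80000000u, 0x01000000u, 0);
 *
 * The addresses are taken from the posted memory map, not verified here. */
```

The fill has to run before anything that lives in SDRAM is initialized. If sections such as .data or .bss are linked into SDRAM, filling the whole range after the startup code has populated them would wipe them out.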

HEAP I think is different because there are different HEAPs out there. For example if you were using FreeRTOS it has I think at least three different heap algorithms. Are you using the C Compiler standard 'heap' and 'malloc' or are you using another? In any case I think with the heap you need to make sure you check the return value as you try to allocate memory to make sure it's correct. And look into the implementation to see if it provides any API's for checking how many free blocks or bytes are available. Heap can be fragmented so you can have 100,000 free bytes but not be able to allocate memory for 1000 if it cannot find 1000 contiguous .. and it doesn't always grow linearly like a stack. So I think you have to look at the manuals for the library where you are getting the heap functions to understand the algorithm and whether there's anything you can do to monitor usage or fragmentation...
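If the heap is the standard C library one, a thin wrapper is one way to make sure every return value gets checked and to keep simple usage counters. This is only a sketch with hypothetical names; it uses nothing beyond malloc() and free(), and it says nothing about fragmentation, which depends on the allocator's internals.

```c
#include <stddef.h>
#include <stdlib.h>

static size_t g_live_allocations;   /* successful allocations not yet freed */
static size_t g_failed_allocations; /* malloc() calls that returned NULL    */

/* Allocate memory and record whether the request succeeded. */
void *checked_malloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL) {
        g_failed_allocations++;
    } else {
        g_live_allocations++;
    }
    return p;
}

/* Free memory obtained from checked_malloc() and update the counter. */
void checked_free(void *p)
{
    if (p != NULL) {
        g_live_allocations--;
        free(p);
    }
}

size_t heap_live_allocations(void)   { return g_live_allocations; }
size_t heap_failed_allocations(void) { return g_failed_allocations; }
```

Printing the two counters periodically gives a coarse view of heap activity. A non-zero failure count while plenty of memory is nominally free would be consistent with the fragmentation case described above.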

Sorry 1 more edit: For any sort of STACK you might have in SDRAM, assuming you manually fill you can still then use either the MPU or the watchpoint mechanisms to trap or halt the CPU as it goes into the 'spare' region. These mechanisms check the address at the CPU itself so it doesn't matter where the memory is located (on/off chip). It's just the hardware initialization that is missing from SDRAM. And unless you were to move the EMIF initialization into the startup code prior to main, anything you do as far as linker fills also wouldn't be effective on the SDRAM. That's why I think the simplest solution for SDRAM is to explicitly initialize the SDRAM after you init the EMIF.

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

I tried to change things around, however now I see nothing, so the code did not boot. The EE told me that the SDRAM takes 275ms or so after initialization before it is ready to use or something like that. I wonder if my current memory map is the cause? If so, how best to address the problem? I changed things around so as to increase my stack space.

MEMORY
{
    VECTORS (X)  : origin=0x00000000 length=0x00000020
    FLASH0  (RX) : origin=0x00000020 length=0x0017FFE0
    FLASH1  (RX) : origin=0x00180000 length=0x00180000
    STACKS  (RW) : origin=0x08000000 length=0x0003fe00
    RAM     (RW) : origin=0x0803fe00 length=0x00000200

/* USER CODE BEGIN (2) */
    RAMBSS    (RW) : origin=0x80000000 length=0x00080000
    RAMDATA    (RW) : origin=0x80080000 length=0x00080000
    RAMFEE    (RW) : origin=0x80100000 length=0x00080000
    RAMSYSMEM    (RW) : origin=0x80180000 length=0x00E80000
/* USER CODE END */
}

SECTIONS
{
    .intvecs : {} > VECTORS
    .text    : {} > FLASH0 | FLASH1
    .const   : {} > FLASH0 | FLASH1
    .cinit   : {} > FLASH0 | FLASH1
    .pinit   : {} > FLASH0 | FLASH1
    .bss     : {} > RAMBSS
    .data    : {} > RAMDATA
    .sysmem  : {} > RAMSYSMEM

    FEE_TEXT_SECTION : {} > FLASH0 | FLASH1
    FEE_CONST_SECTION : {} > FLASH0 | FLASH1
    FEE_DATA_SECTION : {} > RAMFEE
#endif
/* USER CODE END */
}

/* USER CODE BEGIN (3) */
/* Use the remaining 14.5MB of SDRAM (the RAMSYSMEM region) for the heap */
--heap_size=0x00E80000

#if 0
/* USER CODE END */
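A quick host-side sanity check of a split like the one above can rule out overlapping or oversized regions before looking elsewhere. The sketch below uses hypothetical names and simply checks that a list of regions is in ascending order, does not overlap, and stays inside a given window (here the 16MB SDRAM window at 0x80000000 from the posted map).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;   /* region name as written in the linker file */
    uint32_t    origin;
    uint32_t    length;
} mem_region_t;

/* Return true when the regions are listed in ascending address order,
 * do not overlap, and all lie inside [window_base, window_base + window_len). */
bool regions_fit(const mem_region_t *regions, size_t count,
                 uint32_t window_base, uint32_t window_len)
{
    uint64_t window_end = (uint64_t)window_base + window_len;
    uint64_t cursor = window_base;

    for (size_t i = 0; i < count; i++) {
        uint64_t end = (uint64_t)regions[i].origin + regions[i].length;

        if (regions[i].origin < cursor || end > window_end) {
            return false;
        }
        cursor = end;
    }
    return true;
}
```

Fed with RAMBSS, RAMDATA, RAMFEE, and RAMSYSMEM as posted, the check passes: the four regions end exactly at 0x81000000, so the layout itself is consistent.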

0 Sarah Weinberger over 9 years ago in reply to Sarah Weinberger

(My EE had me disable ECC, so I am not sure how that modifies your first answer.)

Does my C/C++ code primarily/only use the "User Stack"?

What uses the "Undefined Stack"?

Do I need to concern myself with the "Supervisor Stack"?

Is the stack relegated to the internal RAM? Halcogen hard codes the "RAM Base Address" field.

Assuming that the SDRAM startup delay is the cause of my code no longer executing, how do I add a delay, say 275ms before the CPU accesses SDRAM for reads/writes?

From my C++ code, how do I access and then print out the content of the stack pointers? I already have a routine, modified from the Halcogen example, of sciDisplayText(), which emits characters, plus there is sprintf(), so the question is just how to get the value of the stack pointer(s).

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

Hi Sarah,

Initialized sections in your linker command file, like .data, get their values copied from flash to RAM by the call to __TI_auto_init() in sys_startup.c. This is done just prior to sys_startup.c calling main().

So if you are initializing the EMIF in main, the call to __TI_auto_init() will be trying to write values from copy tables to SDRAM prior to main.

One solution might be to move the call to the emif_SDRAMInit() function into sys_startup.c but I don't know for sure what resources this would use .. you would need to make sure that it's not using anything in the sections that need to be initialized by __TI_auto_init()... It looks like this would work as the code just appears to be using constants and register addresses which should all be stuff that's in flash.

I think that the HalCoGen emif_SDRAMInit() might have this line of code:

buffer = *PTR;

I think its purpose is to prevent exiting from the function until the EMIF SDRAM initialization completes: it's a read from the EMIF SDRAM bank, and when the SDCR write kicks off the auto-initialization sequence I believe that's not interruptible. But I cannot find anything stating this in the TRM, so it would not be a bad idea to confirm, maybe by inserting two reads of the PMU cycle counter around this. Otherwise I can research but it may take a while.

But - your EE is right it does take something on the order of a few hundred us normally to initialize the DRAM before it's ready to use.

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

Sarah,

I would definitely recommend allocating some minimal stack space to all the stacks unless you are really tight on internal RAM.

How much is minimal - well that is going to depend on whether you're pushing those floating point registers onto the stack..

but I would start with the minimal stacks as set by HalCoGen.

Sarah Weinberger said:
Does my C/C++ code primarily/only use the "User Stack"?

Yes normally unless you are using an RTOS that creates a separate stack for each task.

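This is the stack that is used when the processor is running in either USER mode or SYSTEM mode (the two share a stack pointer). SVC mode has its own banked stack pointer, which is the "Supervisor Stack".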

Sarah Weinberger said:
What uses the "Undefined Stack"?

You shouldn't really need much of an Undefined stack at all. You don't have anything like that 386 software floating pt. emulation code we were talking about, right? That's the type of code that would run in the Undefined mode.

Sarah Weinberger said:
Do I need to concern myself with the "Supervisor Stack"?

Yes I would. You may switch into SVC with an SVC call for example. It depends on what you are doing.
If you are using USER mode to avoid all your code running with privilege, then to enable/disable the IRQ
you need to make an SVC call to temporarily get privilege. Might also occur if ever nesting interrupts.
I think you could start with a very small SVC mode stack though.

Sarah Weinberger said:
From my C++ code, how do I access and then print out the content of the stack pointers? I already have a routine, modified from the Halcogen example, of sciDisplayText(), which emits characters, plus there is sprintf(), so the question is just how to get the value of the stack pointer(s).

I think you need to write an assembly function that simply copies R13 to R0 and returns. You can call this 'get_SP' and invoke like: sp = get_SP(); the return value should be in the variable 'sp' if you move it to R0 in assembly.

Unfortunately I don't see an intrinsic for this. Something like _get_MSP() only works for Cortex-M; it would be nice if the equivalent existed for Cortex-R.

Also I just noticed that there is an 'armofd' utility documented in the Assembly Language tools manual.

It will output XML that includes the worst-case stack usage of each function - so that could be used as an input to a more formal 'stack usage' study I suppose.

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Hi Anthony,

Do you have a link to the assembly PDF?

I do not know the RM48L952ZWT assembly, so I have to look that up to write the get_SP() function.

/* USER CODE BEGIN (74) */
	// Initialize: 16MB (128Mb) SDRAM
	// Must be done prior to __TI_auto_init(), as that function uses section of EEPROM.
	emif_SDRAMInit();
/* USER CODE END */

    /* Configure system response to error conditions signaled to the ESM group1 */
    /* This function can be configured from the ESM tab of HALCoGen */
    esmInit();
    /* initialize copy table */
    __TI_auto_init();
/* USER CODE BEGIN (75) */
/* USER CODE END */

I placed the emif_SDRAMInit() just prior to the __TI_auto_init() and laid out the memory map as above, basically with .bss, .data, .sysmem, and FEE_DATA_SECTION all located in SDRAM. I wanted to give 512K to .bss, .data, and FEE with the heap (sysmem) getting the rest.

I gather that .sysmem and --heap_size are one and the same, so the length of one is the length of the other, just .sysmem states the origin. True? False?

This approach caused the processor to not start properly, however thankfully I was still able to use Uniflash to erase the chip. Reverting back to most things in internal RAM brought back the display and execution.

-> 386/Floating Point

I have the floating point stuff enabled, so that everything would work, as Halcogen uses FP, as explained in that other thread, however my code has no math functions whatsoever and does not use FP.

-> RTOS/privilege

No. I have a home grown OS that operates, once I get that up and running, in a cooperative tasking. main() in C transfers at the end to pmMain(), which is the C++ entry point. I initialize classes and then proceed into an infinite while loop: different tasks get called and then loops back to the beginning. The code does not switch to protected mode, as I did not know this chip has a protected mode, similar to the 386. The code does absolutely nothing with processor anything once in C++.

Let me know if I forgot to answer a question of yours.

What can be the cause of things not starting, when using SDRAM?

You never answered my query about possibly relocating the stack origin to SDRAM. I am thinking that might be a bit better, as there is more space and C++ does tend to use a lot of stack, not to mention I do have variables declared on the stack.

(Yes, I still want to get a feel for how much stack I use, so that task is a work in progress.)

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

Hi Sarah,

Sarah Weinberger said:
Do you have a link to the assembly PDF?

It should be something like this:

    .text
    .arm
    .def     get_SP
    .asmfunc

get_SP

mov r0, r13

bx lr

.endasmfunc
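On the C side, the assembly routine above would be declared as a function returning a 32-bit value, and the result can be formatted for the serial port. The sketch below shows only the formatting step; format_sp is a hypothetical helper, and the get_SP() declaration is left in a comment so the snippet also builds on a host without the assembly file.

```c
#include <stdint.h>
#include <stdio.h>

/* Provided by the assembly routine when building for the target:
 *
 *     extern uint32_t get_SP(void);
 *
 * From C++ code the declaration would need C linkage (extern "C"). */

/* Format a stack pointer value as text, ready to be sent out over the
 * serial port. Returns the number of characters written, excluding the
 * terminating NUL. */
int format_sp(char *buf, size_t buf_len, uint32_t sp)
{
    return snprintf(buf, buf_len, "SP = 0x%08lX\r\n", (unsigned long)sp);
}
```

On the target the call would be along the lines of format_sp(buf, sizeof buf, get_SP()), with the resulting buffer handed to whatever routine emits characters, such as the sciDisplayText() routine mentioned earlier. The value returned is the stack pointer of the CPU mode that is active at the time of the call.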

Sarah Weinberger said:
-> 386/Floating Point

Yes this was just a response to your question of what the Undefined mode is for. It is for emulating instructions like floating point, when there is no floating point coprocessor. So theoretically you could take your RM48 code and run it on our RM42 (no VFP) and link in a floating point emulator library. You would have the code all compiled with VFP instructions - and every one of them would trap into Undefined mode. Then the undefined mode code (which uses the undefined mode stack) would look up the instruction opcode that trapped, and if it determines it is a floating point opcode it would emulate that opcode with a software function. This is not really much though - therefore you probably don't need much of an Undefined mode stack. I'd just make sure there's a minimal one in case for some reason you get into this mode unintentionally (say during development, branching into the 'weeds' and executing data; the MPU can protect you from executing data, by the way, if you configure it appropriately).

Sarah Weinberger said:
What can be the cause of things not starting, when using SDRAM?

I don't know the answer to this one. Would need to see what is going on.

Can you get as far as running to __TI_auto_init()? How about main()? Or are you finding that it crashes earlier?

When you move the SDRAM init did you *remove* it from main() as well? I don't think calling it 2x is a good idea.

If you can get as far as main, then I'd try breaking at __TI_auto_init() then resuming. If it doesn't crash when you do that, then probably that read of *PTR doesn't actually cause the CPU to wait until the DRAM init is complete, which would then be an issue when followed by auto-init.

The other thing to check is the MPU setting - to make sure that the SDRAM area of the memory map is accessible.

Sarah Weinberger said:
You never answered my query about possibly relocating the stack origin to SDRAM. I am thinking that might be a bit better, as there is more space and C++ does tend to use a lot of stack, not to mention I do have variables declared on the stack.

I would recommend against this. External SDRAM is not protected by ECC. Assuming you have high diagnostic coverage requirements?

If you need more than 256K on chip SRAM we have the RM57L843 with 512K ECC protected SRAM.

Sarah Weinberger said:
No. I have a home grown OS that operates, once I get that up and running, in a cooperative tasking. main() in C transfers at the end to pmMain(), which is the C++ entry point. I initialize classes and then proceed into an infinite while loop: different tasks get called and then loops back to the beginning. The code does not switch to protected mode, as I did not know this chip has a protected mode, similar to the 386. The code does absolutely nothing with processor anything once in C++.

OK so you SHOULD be running everything in System mode then - but maybe you can tell me your CPSR value to confirm.

The mode is encoded in the M[4:0] field of CPSR.
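For reference, decoding that field is a small lookup. The function name below is hypothetical; the mode encodings are the standard ARM ones for this class of core.

```c
#include <stdint.h>

/* Translate the M[4:0] field of a CPSR value into a mode name. */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case 0x10u: return "User";
    case 0x11u: return "FIQ";
    case 0x12u: return "IRQ";
    case 0x13u: return "Supervisor";
    case 0x17u: return "Abort";
    case 0x1Bu: return "Undefined";
    case 0x1Fu: return "System";
    default:    return "Unknown";
    }
}
```

A CPSR whose low five bits read 0x1F means System mode, which shares its stack pointer with User mode; 0x13 would mean the code is running in Supervisor mode and using the SVC stack.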

0 Sarah Weinberger over 9 years ago in reply to Sarah Weinberger

Hi Anthony,

void main(void)
{
/* USER CODE BEGIN (3) */
	// Initialize: Modules
	// Enable the MPU unit to avoid extra writes.
//	emif_SDRAMInit();	// Initialize: 16MB (128Mb) SDRAM
	void * lpKludge = malloc(1);
	lpKludge = lpKludge;

I put back the memory map to use the SDRAM for BSS, DATA, SYSMEM, and FEE. I usually use UniFlash with a serial port hookup, however I ran via CCS to see where I stop. Code stops at the 1-byte malloc().

There is another open ticket dealing with SDRAM and malloc(). I found a workaround to my problem by allocating 1-byte at the very beginning. After that, I am able to allocate any size space.

NOTE: TI dropped the ball, did not respond, left the issue hanging.

e2e.ti.com/support/microcontrollers/hercules/f/312/t/519401

With the default RAM configuration, I am able to allocate the first 1-byte and then all other malloc requests. I just leave the 1-byte allocated and do not use that memory. It is a kludge, but... Sadly, my kludge does not work with the memory map using SDRAM (non ECC mode).

Thoughts on why the new SDRAM memory map configuration does not work?

0 Sarah Weinberger over 9 years ago in reply to Sarah Weinberger

I have done some tests and have a new data point.

This build fails, namely with .DATA in SDRAM and .BSS uses internal RAM:

MEMORY
{
    VECTORS (X)  : origin=0x00000000 length=0x00000020
    FLASH0  (RX) : origin=0x00000020 length=0x0017FFE0
    FLASH1  (RX) : origin=0x00180000 length=0x00180000
    STACKS  (RW) : origin=0x08000000 length=0x0002c800
    RAM     (RW) : origin=0x0802c800 length=0x00013800

/* USER CODE BEGIN (2) */

    RAMDATA    (RW) : origin=0x80080000 length=0x00080000
    RAMFEE    (RW) : origin=0x80100000 length=0x00080000
    RAMSYSMEM    (RW) : origin=0x80180000 length=0x00E80000
/* USER CODE END */
}

--heap_size=0x00E80000

SECTIONS
{
    .intvecs : {} > VECTORS
    .text    : {} > FLASH0 | FLASH1
    .const   : {} > FLASH0 | FLASH1
    .cinit   : {} > FLASH0 | FLASH1
    .pinit   : {} > FLASH0 | FLASH1

    .bss     : {} > RAM
    .data    : {} > RAMDATA
    .sysmem  : {} > RAMSYSMEM

    FEE_TEXT_SECTION : {} > FLASH0 | FLASH1
    FEE_CONST_SECTION : {} > FLASH0 | FLASH1
    FEE_DATA_SECTION : {} > RAMFEE
}

This build succeeds where .DATA uses internal RAM and .BSS uses SDRAM.

MEMORY
{
    VECTORS (X)  : origin=0x00000000 length=0x00000020
    FLASH0  (RX) : origin=0x00000020 length=0x0017FFE0
    FLASH1  (RX) : origin=0x00180000 length=0x00180000
    STACKS  (RW) : origin=0x08000000 length=0x0002c800
    RAM     (RW) : origin=0x0802c800 length=0x00013800

    RAMBSS    (RW) : origin=0x80000000 length=0x00080000
    RAMFEE    (RW) : origin=0x80100000 length=0x00080000
    RAMSYSMEM    (RW) : origin=0x80180000 length=0x00E80000
}

--heap_size=0x00E80000

SECTIONS
{
    .intvecs : {} > VECTORS
    .text    : {} > FLASH0 | FLASH1
    .const   : {} > FLASH0 | FLASH1
    .cinit   : {} > FLASH0 | FLASH1
    .pinit   : {} > FLASH0 | FLASH1

    .bss     : {} > RAMBSS
    .data    : {} > RAM
    .sysmem  : {} > RAMSYSMEM

    FEE_TEXT_SECTION : {} > FLASH0 | FLASH1
    FEE_CONST_SECTION : {} > FLASH0 | FLASH1
    FEE_DATA_SECTION : {} > RAMFEE
/*    FEE_DATA_SECTION : {} > RAM*/
}

I already know that EVERYTHING using internal RAM works, so the question to myself was: "Does EVERYTHING really have to use internal RAM, or is it one or more components that cannot use SDRAM, and if so, which ones and why?"

To answer this question, I started off placing the sections in SDRAM one at a time and seeing whether I can get past the malloc of 1-byte or whether execution stops there. I found out that .SYSMEM and .BSS are both okay using SDRAM, at least as far as getting to my C-style command prompt goes (think MS-DOS for a user interface). I still cannot boot to the main code, so only the test mode command prompt shows.

The 1-byte malloc at the beginning of the main() entry point does not like the .DATA section placed in SDRAM.

Execution still stops, as expected, going into the main code.

(The following is more for me for future reads, as I will forget and nice to have here.) Here is the Wiki definition of the .DATA section.

In computing, a data segment (often denoted .data) is a portion of an object file or the corresponding virtual address space of a program that contains initialized static variables, that is, global variables and static local variables.

I saw this link:

e2e.ti.com/support/microcontrollers/hercules/f/312/p/1895349/reply?tsid=3e3c7a0d-68be-49da-9015-6fa210c58b10&ReplyToContentTypeID=1

Initialized Data Segment:
Initialized data segment, usually called simply the Data Segment. A data segment is a portion of virtual address space of a program, which contains the global variables and static variables that are initialized by the programmer.

Note that, data segment is not read-only, since the values of the variables can be altered at run time.

This segment can be further classified into initialized read-only area and initialized read-write area.

For instance the global string defined by char s[] = “hello world” in C and a C statement like int debug=1 outside the main (i.e. global) would be stored in initialized read-write area. And a global C statement like const char* string = “hello world” makes the string literal “hello world” to be stored in initialized read-only area and the character pointer variable string in initialized read-write area.

Uninitialized Data Segment:
Uninitialized data segment, often called the “bss” segment, named after an ancient assembler operator that stood for “block started by symbol.” Data in this segment is initialized by the kernel to arithmetic 0 before the program starts executing

uninitialized data starts at the end of the data segment and contains all global variables and static variables that are initialized to zero or do not have explicit initialization in source code.

For instance a variable declared static int i; would be contained in the BSS segment.
For instance a global variable declared int j; would be contained in the BSS segment.

I do have quite a bit of initialized data, mainly used for debugging, such as char * lpszDebug1 = "Hello World: Point 1\r\n"; which according to my research gets placed in the .DATA section. I believe that I do have a few uninitialized static variables (.bss), but nowhere near as many as initialized static/const variables (.data). Still, I do not see what static variables have to do with my malloc call, which goes into the heap/sysmem, which is located in SDRAM.
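To make the distinction concrete, here is a small sketch of how declarations like these are typically placed. The variable and function names are hypothetical, and the section named in each comment is the usual placement for this kind of toolchain rather than something verified against this project's map file, which is the place to confirm it.

```c
#include <stdint.h>

/* Initialized, writable: typically .data (initial value copied from flash
 * at startup). */
static uint32_t g_counter_init = 1u;

/* No initializer: typically .bss (zeroed at startup). */
static uint32_t g_counter_zero;

/* Read-only: typically .const, which stays in flash. */
static const uint32_t g_limit = 100u;

/* The pointer variable is writable and initialized, so it typically lands
 * in .data; the string literal it points to is read-only and typically
 * lands in .const. */
static const char *g_message = "Hello World\r\n";

uint32_t section_demo_sum(void)
{
    return g_counter_init + g_counter_zero + g_limit;
}

const char *section_demo_message(void)
{
    return g_message;
}
```

So for a debug string declared through a pointer, only the pointer itself (4 bytes) would be expected in the .DATA section; the text would be expected with the constants in flash.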

I might still be chasing my original problem of memory allocations using malloc and my kludge to work around the problem is not really a work around. Something seems to be afoot with my memory map and/or configuration in Halcogen, not that I see what at the moment.

What is wrong with the .DATA section being in SDRAM? Thoughts?

0 Sarah Weinberger over 9 years ago in reply to Sarah Weinberger

Note: Commands do not get recognized, which tells me that .bss data does not work in SDRAM either. I would have to debug to see the details. That makes me wonder whether SDRAM can be used for these sections at all.

I would say the SDRAM has issues, possibly with the address, but I already did an SDRAM test, several times over, where I wrote binary data from the base of SDRAM and walked it through the entire region verifying every word, so I know from that that 0x80000000 is right for the RM48L952ZWT and Halcogen as the SDRAM base. The offset is fine, just data has issues. Maybe the 275ms (or whatever the number is) is the cause, and "buffer = *PTR" does not really wait for SDRAM, as thought earlier.

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

Sarah,

Hmm. if you see the SDRAM not taking writes - then it could be a PCB issue, a clocking issue, or an SDRAM configuration issue.

Can you please confirm whether or not the SDRAM clock is toggling and if so at what frequency? I think there is an extra bit in HalCoGen's pinmux needed to enable the clock.

Also would be good to check the parameters in the HalCoGen SDRAM timing & geometry against the SDRAM datasheet.
Things like # of rows, columns, CAS, all really need to match exactly... If you need help there let us know.

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Actually, SDRAM DOES take writes, IF, and here is the kicker, ALL segments (BSS, DATA, HEAP, and FEE) use the internal RAM. I can then launch my SDRAM tester and have it reliably verify all of SDRAM. The tester parses SDRAM into uint32 cells starting off with 0xFFEEFFEE in the first cell at 0x80000000, then reducing the count by one and placing the next data value, 0xFFEEFFED into the second cell, and so on.
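For anyone reproducing this, a test of the kind described (a decrementing 32-bit pattern written across the region, then verified) can be sketched as follows. The function name is hypothetical and this is not the actual tester, only an illustration of the approach.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a decrementing pattern (seed, seed - 1, seed - 2, ...) across a
 * region of 32-bit words, then read it back. Returns the index of the
 * first mismatching word, or 'words' if every word verified. */
size_t ram_pattern_test(volatile uint32_t *base, size_t words, uint32_t seed)
{
    uint32_t value = seed;

    for (size_t i = 0; i < words; i++) {
        base[i] = value;
        value--;
    }

    value = seed;
    for (size_t i = 0; i < words; i++) {
        if (base[i] != value) {
            return i;
        }
        value--;
    }
    return words;
}
```

On the target this would be pointed at the SDRAM base with the region length in words and a seed of 0xFFEEFFEE. The test itself overwrites the region, so it cannot be run over memory that holds live sections.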

I would say a timing issue, however I wrote a simple function, which I placed immediately after the initialize SDRAM call, which increments a variable to 0xFFFFF before continuing, basically adding in a 1-second delay before the TI function gets called. Code still does not work.

What I said about the code coming up with .BSS in SDRAM was not true. The code comes up, but variable assignments do not take, basically my uninitialized static variables holding my commands. I would have to verify that specifically, but that seems obvious, as what else? Basically, my code simply does not like SDRAM for whatever reason, even though I can test SDRAM perfectly and repeatedly if all segments use internal RAM.

I can honestly tell you that I have no clue where or how to answer your question about clocks in Halcogen. Can you tell me what tab you would like me to capture. I can tell you that I had the EE, when he was here, double and triple check the SDRAM values against the datasheet and he said that everything is fine. Like I said, the SDRAM tests are fine, as long as the linker does not have to touch/use/know about SDRAM. ;-)

The EMIF General Setup | EMIF Clock is set to 10.000 MHz. Our main clock is set to 16 MHz, so as to be 100% compatible with the previous chip, the Intel 80386EX. Why do we have to be compatible? I have no clue. I got EE non-layperson speak as an answer.

Here are the clock tree settings.

I hope that answers your questions on clocks. If not, please let me know what you would like.

The EE checked the table in the SDRAM, see above, against the datasheet several times. We also walked data, per test described above. He was satisfied with the result and said that future issues are software (CCS memory map, etc.), not hardware. I briefly looked at the datasheet and as I recall the settings were fine. Had the settings not been fine, my SDRAM test would not have worked.

It seems to me the issue is a chicken and the egg type of problem. The SDRAM has to be initialized and operational prior to anything being done that uses it. I thought that was done with the "buffer = *PTR" assignment, not to mention my do-while-loop after that, but obviously not.

Thoughts on next steps?

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

Hi Sarah,

So if your SDRAM writes do take, and you can make SDRAM functional tests pass with all of the linker initialized sections placed in on-chip RAM, I think the problem is likely in the timing of the SDRAM initialization code versus when the TI auto-init function is called.

I wish it were documented with some sort of comment but my interpretation of the 'PTR' assignment is that it appears to be not something functional but simply a 'barrier' of sorts that is there to make sure the code doesn't continue until that long initialization cycle of SDRAM completes.

One test you could perform is to simply add another delay loop between the SDRAM init and the call to auto-init... maybe give it another 200us there. If that makes the system work with .DATA mapped to SDRAM then I think it points to just a timing issue during initialization.
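A crude delay of that kind can be sketched as below. The names are hypothetical, and the 16 MHz figure used in the note afterwards is just the main clock value quoted earlier in the thread; the actual CPU clock should be substituted.

```c
#include <stdint.h>

/* Number of clock cycles that span the given time at the given clock. */
uint32_t cycles_for_us(uint32_t clock_hz, uint32_t microseconds)
{
    return (uint32_t)(((uint64_t)clock_hz * microseconds) / 1000000u);
}

/* Spin for the given number of loop iterations. The volatile counter keeps
 * the compiler from removing the loop. Each iteration takes at least one
 * cycle, so using a cycle count as the iteration count waits at least as
 * long as the requested time. */
void busy_wait(uint32_t iterations)
{
    volatile uint32_t i;

    for (i = 0u; i < iterations; i++) {
        /* intentionally empty */
    }
}
```

At 16 MHz, 200us works out to 3200 cycles, so busy_wait(cycles_for_us(16000000u, 200u)) placed between the SDRAM init and the auto-init call would be one way to run the experiment. The loop must not depend on any variable that lives in SDRAM.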

Regarding the SDRAM chip timings -- you copied the correct HalCoGen page - but would need the part # of the SDRAM to make sure. Still if your EE already checked this then it's probably ok.

The 10 MHz clock frequency is also low for SDRAM. Normally it's closer to 100 MHz. So we should be a little on alert for any sort of calculation that might be invalid for a 10 MHz clock.

Although - my understanding of SDR SDRAM is that unlike DDR, DDR2... etc there isn't a minimum clock frequency because there's no PLL involved.

For your next step --- I think just inserting the 'extra' delay between the SDRAM init and the auto-init is the probable next step...

For us - I can also try to get a timing measurement on one of our HDKs to confirm the effectiveness of that "PTR" access as a barrier... If it's not working effectively as a barrier then I think it's probably something we need a HalCoGen ticket for.

Best Regards,

Anthony

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Hi Anthony,

Apologies for the slight delay, but was doing tests and gathering information/data points.

You can see the variable declarations at the top of the screenshot above. My BSS (uninitialized global/static data, namely testBss) test passed, no issues.

My DATA test (global/static initialized data, namely testData) failed.

testData at function entry point, line 705.

Name : testData
	Default:2884567022
	Hex:0xABEEFFEE
	Decimal:2884567022
	Octal:025373577756
	Binary:10101011111011101111111111101110

Notice that the data should be 0xFFEEFFEE but I get 0xABEEFFEE. A is 1010 and B is 1011, where both should be F (1111), so the difference is confined to the top byte: bits 26, 28, and 30 read back as 0.
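A quick way to pin down exactly which bits differ is to XOR the expected and observed values. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Bits that differ between the expected and the observed value. */
uint32_t bit_diff(uint32_t expected, uint32_t observed)
{
    return expected ^ observed;
}

/* Number of set bits in v, i.e. how many bit positions differ. */
int count_set_bits(uint32_t v)
{
    int n = 0;

    while (v != 0u) {
        v &= v - 1u;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

For the values above, bit_diff(0xFFEEFFEE, 0xABEEFFEE) is 0x54000000, which is three bits (26, 28, and 30), all in the most significant byte.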

*************************

I am trying, as I write this post, to set a breakpoint and receiving the error "Breakpoint Manager: Error enabling this function: There is no AET resource to support this job."

Translate that into simple language and what is the cure? I can set breakpoints earlier. Maybe entering this function and touching BSS and DATA caused heartburn.

*************************

I got lucky the first time around, as now I am having step and breakpoint issues. Anyways, here is a screenshot from my earlier attempt.

Adding 1 (one) to testData, my global initialized data (DATA segment) should have been 0xFFEEFFEF, however I get 0x4AEEFFEF. There is one more bit, bit 15 off, but what is interesting is that the highest order nibble has issues.

I would say a hardware data line issue, but my SDRAM test, which starts at 0xFFFFFFFF and decrements 1 (one) for each 32-bit word in SDRAM, works major greatness when BSS and DATA are in RAM. I find it very interesting that the upper nibble has issues, but not the lower 3 nibbles. Hmm.

I stepped through (miraculously, given that on subsequent attempts I cannot step or get past the first line without tripping the no AET resource issue, at least on system reset; I did not try reloading) on the second pass through my for-loop and got 0xAB54AB54, which is totally off, but an interesting pattern nonetheless.

My last step on the first time around was at the malloc(150) call. The processor stopped. I expected as much, because I had no earlier 1-byte malloc(), see my other thread. The system when executing BSS and DATA out of RAM needs to have the very first malloc be less than 64-bytes and not released, then subsequent allocations at 64-bytes or more work. I know from previous tests that when BSS and DATA are in SDRAM, a malloc of 1-byte hangs the system too; so much for my kludge. Heap was/is in SDRAM.

I put in a wait loop after the emifInit() call, not to mention I am using CCS too, so time after EMIF init and before the TI Auto is not an issue.

The way the problem behaves, it appears to be a timing issue, not a boot/initialization issue. Can it be a refresh/clock issue? If so, would locating BSS and DATA in RAM make a difference?

I plan to do a testSdram(), where I walk an increment, now with BSS and DATA located in SDRAM. Do not worry, as I do my SDRAM test in blocks copying existing data. Also, I will step through with CCS. More to come on this test shortly.

0 Sarah Weinberger over 9 years ago in reply to Sarah Weinberger

Expert 1915 points

Hi Anthony, I forgot to answer your SDRAM chip question.

TI – PN: RM48L952ZWTT 337P NFBGA 16-BIT/32-BIT 3072KB FLASH MICROCONTROLLER

MICRON - MT48LC8M16A2TG-75 SDRAM 128MBX16 TSOP

The EE already confirmed the SDRAM settings, however I wonder if the clock frequency to the SDRAM is the culprit.

My testSdram1() test, which starts off with 0xFFFFFFFF and reduces the count by one (1) for each 32-bit uint32 word in SDRAM, has nothing to do with BSS or DATA segments. I do have a variable to hold my data, but that is local to the function and gets placed on the stack, which operates out of RAM, not SDRAM. Furthermore, there is no such thing as RAM to SDRAM. My local variable first presumably goes to a register and then that register in turn gets placed in SDRAM. I can imagine 2 mv statements. Still, BSS/DATA do have something to do with my test, not that I see why at the moment.
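For readers following along, a countdown test of the kind described can be sketched as below. This is not the actual testSdram1() code, which is not shown in the thread; the function names are hypothetical, and this version simply overwrites the region rather than preserving its contents in blocks. It runs over an ordinary buffer here; on the target the base pointer would be the SDRAM region.

```c
#include <stddef.h>
#include <stdint.h>

/* Write 0xFFFFFFFF, 0xFFFFFFFE, ... to consecutive 32-bit words, then
 * read everything back. Returns the index of the first mismatching word,
 * or `words` if every word matched. */
size_t sdram_countdown_test(volatile uint32_t *base, size_t words)
{
    size_t i;
    uint32_t value = 0xFFFFFFFFu;

    for (i = 0u; i < words; i++)
    {
        base[i] = value--;
    }

    value = 0xFFFFFFFFu;
    for (i = 0u; i < words; i++)
    {
        if (base[i] != value--)
        {
            return i;
        }
    }
    return words;
}

/* Host-side self-check over a plain static buffer (stand-in for SDRAM). */
int sdram_countdown_selftest(void)
{
    static uint32_t buffer[256];
    return sdram_countdown_test(buffer, 256u) == 256u;
}
```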

There is definitely enough of a wait after the EMIF Init and before the TI auto. Not only is there the built in "buffer = *PTR" instruction, but there is my do-while loop. Added to both of those are my tests through CCS, where I stepped over the instructions thereby giving even more, lots more, time after EMIF SDRAM initialization and prior to the TI auto call.

I also think that the data values are way too coincidental, even with DATA + 2, where I got the 0xAB54AB54 and the first nibble alteration on the DATA and DATA+1.

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

TI__Guru 68920 points

Hi Sarah,

I doubt this is the issue you are facing but while trying to get setup to check out the PTR write as a barrier, I noticed something wrong in SDRAM init - see this post: .

You will probably want to put a workaround in for this in any case.

Sarah Weinberger said:
Notice that the data should be 0xFFEEFFEE but I get 0xABEEFFEE. A is 1010 and F is 1111, so the difference here is bits 12 and 14.

I don't exactly know what to make of this other than the obvious couple of bit flips as you mention. (figuring out what causes the flips is what I can't really make anything of ..)

With your test - are you:

a) comparing the .data values to what got written to SDRAM

b) writing a pattern to SDRAM in the area where .data would be and reading back.

And,

Is this particular fail the first fail or the only fail?

Sarah Weinberger said:

*************************

I am trying, as I write this post, to set a breakpoint and receiving the error "Breakpoint Manager: Error enabling this function: There is no AET resource to support this job."

Translate that into simple language and what is the cure? I can set breakpoints earlier. Maybe entering this function and touching BSS and DATA caused heartburn.

*************************

In simple language, to set a breakpoint in FLASH (basically read-only memory) the debugger sets up an address comparator that is included in the CPU's debug logic.
There are only so many of these resources, [I believe about 8 max, but you can read the # from the CP15 registers]. When you run out - this is the error you get.

You can view the breakpoints window and disable the ones you don't need at a particular time to set one where you need it.

Now, I may have made a mistake telling you where to put the emifSDRAMInit() function.

I was only thinking of the initialized sections like .data.

If you put .bss in the SDRAM then we need to check everything in HalCoGen to see if it uses .BSS before you get to emifSDRAMInit().

That could be another culprit.

Have you tried *only* putting the initialized section .DATA into SDRAM + moving emifSDRAMInit() to before the _TI_autoInit() call?
(but all other RAM based sections including .BSS would go into internal SRAM..)?

-Anthony

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

TI__Guru 68920 points

Sarah,

Aside from the issue w. the SDRAM refresh rate calculation I mentioned... all of the other TIMINGs basically turn into '0's in the SDRAM configuration register because all the values even 60+ns are < 100ns you get from a 10MHz clock.
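To see why the timing fields collapse to zero, here is a small sketch of the conversion. It assumes the common "round up to whole clock cycles, then subtract one" register encoding; that encoding and the function name are assumptions for illustration, so check the TRM for the exact definition of each field. The 66ns figure is only an example value.

```c
#include <stdint.h>

/* Convert a datasheet timing in nanoseconds into a register field,
 * ASSUMING the field holds (cycles - 1) with cycles rounded up. */
uint32_t sdram_timing_field(uint32_t t_ns, uint32_t emif_hz)
{
    /* ceil(t_ns * emif_hz / 1e9) using 64-bit integer arithmetic */
    uint64_t cycles = ((uint64_t)t_ns * emif_hz + 999999999u) / 1000000000u;
    return (cycles > 0u) ? (uint32_t)(cycles - 1u) : 0u;
}
```

At 10MHz one clock period is 100ns, so any timing of 100ns or less needs a single cycle and the field becomes 0; the same 66ns timing at 100MHz would need 7 cycles, giving a field value of 6.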

The SDRAM refresh rate calculation shouldn't cause a problem - it just kills performance because it makes the SDRAM busy delaying accesses to it 50x more frequently than needed. And at 10MHz CAS latency 2 is a no brainer.

That leaves the geometry (# banks, page size) but these are correct as well as far as I can tell by looking at the Micron datasheet.
So I think your emif SDRAM settings in HalCoGen are fine.

Checking the pinmux wouldn't hurt... just to make sure all of the required pins are enabled.

Do make sure that the EMIF_WE pin is set for _WE and not set for EMIF_RNW... they are on the same line of pinmux but they are different signals.

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Expert 1915 points

I am reading your previous comments, however a quick reply.

Yes, the EE did confirm all the PINMUX settings. I just now confirmed D17. I did link EMIF_nWE (first column).

It is highly annoying that I cannot search by name and D17 does not roll off the tongue from EMIF_nWE or the second column entry EMIF_RNW.

0 Sarah Weinberger over 9 years ago in reply to Sarah Weinberger

Expert 1915 points

Is it possible to place the stack in the external SDRAM?

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

TI__Guru 68920 points

Hi Sarah,

Yes but it may need a lot of manipulation to the existing HalCoGen startup code.
You'd absolutely need to have some stack in internal SRAM until the SDRAM is initialized.
Then there would be a stack switch.

Plus I really recommend against this as you lose ECC completely and you'll really slow down even the 10MHz device.

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Expert 1915 points

My EE had me very early on disable ECC and insisted that I keep that off.

0 Anthony F. Seely over 9 years ago in reply to Anthony F. Seely

TI__Guru 68920 points

Here is a trace captured on the emif_SDRAMInit() function.

I need to think through this - but while it appears that the buffer=*PTR took some time to execute compared to say the register reads/writes, 665 cycles is not in the range of 100us.. it's more like 16us with a 40MHz CPU clock that I was running this test with.

So it's odd - seems like there is some blocking but probably not enough for the entire SDRAM initialization time requirement.
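The arithmetic behind that estimate, as a small sketch (helper names are made up for the example): 665 cycles at 40MHz is about 16.6us, whereas a 100us delay at the same clock would be 4000 cycles.

```c
#include <stdint.h>

/* Duration of `cycles` clock cycles, in nanoseconds (truncated). */
uint64_t cycles_to_ns(uint64_t cycles, uint32_t clock_hz)
{
    return (cycles * 1000000000u) / clock_hz;
}

/* Number of whole clock cycles that fit in `ns` nanoseconds. */
uint64_t ns_to_cycles(uint64_t ns, uint32_t clock_hz)
{
    return (ns * clock_hz) / 1000000000u;
}
```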

That init sequence is supposed to be something like 8 refresh periods - so perhaps the buffer=*PTR read is getting in between the first and 2nd of these.

Need to dig into the details - but this one data point makes me begin to think that one recipe might be:

- only auto-init sections in SDRAM

- move emif_SDRAMInit() to just prior to _TI_autoInit()

- insert additional time delay and probably remove that buffer=*PTR access from the end of emif_SDRAMInit().

Here's the image where you can see the cycle count of 665 cycles..

To be honest though - our rationale for SDRAM was really limited to buffer space for USB and Ethernet and/or data that gets paged into/out of internal SRAM for processing. For two reasons: poor performance compared to on chip RAM, and because of the lack of ECC protection. So you are venturing into a bit of a new area by putting parts of the actual 'program' and its data tables in SDRAM.

Mostly again the SDRAM is where something like the ethernet packet buffers get stowed.. That's why this is so rough going..

Best Regards,

-Anthony

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Expert 1915 points

I have not yet read the post you just made directly above, but I did complete the following test:

    .bss     : {} > RAM
    .data    : {} > RAM2
    .sysmem  : {} > RAM2

I still get the exact same behavior with testData and malloc with the same values.

0 Sarah Weinberger over 9 years ago in reply to Sarah Weinberger

Expert 1915 points

I just read the SDRCR post and have some questions.

For example, normally an SDRAM datasheet will specify something like "4096 refresh cycles every 64 ms" ; and HalCoGen will compute the value of the SDRCR to be (64ms / (4096 * emif clock period)). For a 100MHz EMIF clock the result would be 1500 cycles, and the EMIF would make sure at least 1 refresh occurs during this time period.

What is the specific calculation: I see 64/4096 = 0.015625. There is more to the calculation, as the article states should be above 1000.

How does one get from 100MHz to 1500 cycles?

emif.c has a value of 250.

 /* configure refresh rate*/
   emifREG->SDSRETR = (uint32)0U;  
   emifREG->SDRCR   = 250U;

The SDRAM is a Micron MT48LC8M16A2TG-75 SDRAM 128MBX16 TSOP. I downloaded the datasheet. I see a refresh count of 4K in all 3 columns. I know the EMIF clock frequency is 10 MHz. What does that make my equation and timing? Is 250 correct?

What did you want me to do with this register? (I have to know the calculation before I can adjust.)

Do you think the refresh rate is partly the cause?

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

TI__Guru 68920 points

I don't think the refresh rate is the cause of the issue directly, if the symptom is that your application crashes.

Now if your application could crash *indirectly* due to running much slower than expected [such that events accumulate/backup, memory overflows, etc] then this could cause the issue.

I haven't figured it out yet, but setting the refresh to every ~32 cycles might use up a large percentage of the available EMIF bandwidth, because the refreshes are not instantaneous. I just don't know if we're eating 6/32 or 16/32 or 24/32 of the bandwidth... Don't know enough about exactly how long refreshes take to answer off the top of my head. But you can see the potential there for eating a huge chunk of bandwidth if this issue isn't corrected.. even if it's 6/32 that's too much.

The TRM for the chip actually has a good explanation of the calculation:

I grabbed this from the Micron online datasheet for your part but you should check that it matches your copy:

64ms, 4096-cycle refresh (commercial and industrial)

16ms, 4096-cycle refresh (automotive)

If you are going to run at the temp range where you need the automotive spec make sure you use that faster refresh rate, because the charges do leak away faster at higher temperatures.

Then your calculation is either:

RR = 10MHz * 64ms / 4096 = 152

RR = 10MHz * 16ms / 4096 = 39

NB: interesting that if you are using the high-temp 16ms spec - because of running at 10MHz instead of 100MHz you almost have the same problem as *with* the HalCoGen issue. Didn't think about that yesterday because I did my own calculation w. 13.33Mhz and a 64ms number and got 208. The note I posted was for a more typical SDRAM clock at 100MHz but to be honest I need to fix that because you really cannot run the SDRAM that fast on these chips like RM48.

So the typical value for everyone else will be more like 760'ish for 64ms / 4096 at 50MHz.

But your value at 10MHz EMIF clock is going to be pretty low. If you need to use the 16ms number - a good conversation to have with your EE could be whether to even use SDRAM. At your EMIF speeds an async SRAM may be more appropriate; and these devices are not nearly as sensitive to correct initialization as SDRAM.

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Expert 1915 points

I need the industrial, -40C to +80C, so 156 (not 152) above.

What does that mean compared to 250, which is what Halcogen calculated?

What does NB mean, as in "NB: interesting that..."?

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

TI__Guru 68920 points

Sarah,

Ok 152 will hopefully not completely kill your performance. Will know better once we get a measurement on the refresh cycles.

If 250 is the first number that you see written to SDRCR in HalCoGen then it's computed by some formula that I do not understand: clock freq * 0.0002 / 8. I think we may have picked this as a 'default' to use prior to the PLL being enabled but it doesn't belong in this file.

if(refresh_period>31)
{
    refresh_period = 31
}

To truncate to 8191 instead of 31 .. then I think you would see the correct result in the second write in your generated output file.

NB: is kind of like Note:

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Expert 1915 points

Apologies for the delay Anthony, but I have been busy running tests. I have some interesting with a capital INTERESTING data points.

Clarification:

/**  -general clearing of register
*    -for NM for setting 16 bit data bus
*    -cas latency
*    -BIT11_9CLOCK to allow the cl field to be written
*    -selecting the banks
*    -setting the pagesize
*/   
    emifREG->SDCR   = (uint32)((uint32)0U << 31U)|                               	
                      (uint32)((uint32)1U << 14U)|                               	
                      (uint32)((uint32)2U << 9U)|  	
                      (uint32)((uint32)1U << 8U)|                                	
                      (uint32)((uint32)2U << 4U)|              	
                      (uint32)((uint32)elements_512);         	
/* wait for a read to happen*/
   buffer           = *PTR;                                	
   buffer           = buffer;
   emifREG->SDRCR   = 31U;	

/* USER CODE BEGIN (3) */
   // The temperature range is "industrial". Our temperature range is -15C to +65C.
   // Specification is therefore -40C to +80C (industrial), which means 64ms/4096.
   // Equation is SDRCR = 10 MHz * 64 * 1000 / 4096 or 156.25 or 156.
   // See June 15, 2016 post from Anthony
   //	  e2e.ti.com/.../1897844
   emifREG->SDRCR   = 156U;
/* USER CODE END */

The code above shows what Halcogen calculated and what I set. I used 156, not 152, as the calculation yields 156.25. How did you get 152?
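The calculation can be written out as a small helper. This is only a sketch: the function name is invented for the example, and the 8191 ceiling is the maximum refresh counter value mentioned elsewhere in this thread. Integer division rounds down, which errs on the side of refreshing slightly more often than strictly required.

```c
#include <stdint.h>

#define SDRCR_RR_MAX 8191u  /* largest refresh count quoted in the thread */

/* Refresh count = EMIF clock * refresh window / number of refresh cycles.
 * refresh_ms is the datasheet window (e.g. 64), refresh_cycles the number
 * of refreshes required in that window (e.g. 4096). */
uint32_t sdrcr_refresh_count(uint32_t emif_hz, uint32_t refresh_ms,
                             uint32_t refresh_cycles)
{
    uint64_t rr = ((uint64_t)emif_hz * refresh_ms)
                  / (1000u * (uint64_t)refresh_cycles);
    return (rr > SDRCR_RR_MAX) ? SDRCR_RR_MAX : (uint32_t)rr;
}
```

For a 10MHz EMIF clock this gives 156 for the 64ms window and 39 for the 16ms window; a 13.33MHz clock with 64ms gives 208.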

Next, I did the following:

    .bss     : {} > RAM2
    .data    : {} > RAM2
    .sysmem  : {} > RAM2

    FEE_TEXT_SECTION : {} > FLASH0 | FLASH1
    FEE_CONST_SECTION : {} > FLASH0 | FLASH1
/*    FEE_DATA_SECTION : {} > RAMFEE*/
    FEE_DATA_SECTION : {} > RAM2

Basically, I set all sections: .bss, .data, FEE, and .sysmem to use SDRAM rather than internal RAM.

Mind you Anthony, prior to setting the SDRCR to 156 (default is 31), I was unable to malloc anything with all sections in SDRAM, the .data section tests failed, and only the first .bss section test passed.

I am now able to successfully complete the first malloc and the .data tests; however, I have issues with the .bss section. That works sometimes on my Hello World test but fails on my new .bss test.

A refresher on the .bss code.

// TEST
char testBss[25];
uint32 testData = 0xFFEEFFEE;
void *testHeap;
uint32 m_dwTaskSlice[32];

boolean testSdram2(sciBASE_t *sci, boolean doDisplay)
{
	char szDebug[111];
	strcpy(testBss, "Hello World!");
	if (0 == strcmp(testBss, "Hello World!"))
	{
		sciDisplayTextExAll("    -> BSS 1 Success\r\n");
	}
	else
	{
		sciDisplayTextExAll("BSS 1 Failure");
	}

	memset((void *)m_dwTaskSlice, 0, sizeof (m_dwTaskSlice));
...
}

Here is a screenshot of CCS just past the copy operation.

Notice the result of the strcpy? Some characters are there, namely the first 3, but then the 4th character is null, the sixth character is a space (0x20), and the 8th character is a null. I switched to 8-bit hex view to look at the hex values, so that I know the exact value of the . character.

The problem seems to be one of timing.

I like your comment about asynchronous DRAM vs SDRAM. I had to look up the difference between the two. My research stated that asynchronous DRAM uses the /CAS line to synchronize the data to/from the processor, while SDRAM has two banks of memory, where one bank prefetches the data. Yes, that does seem much more timing and configuration sensitive. Maybe the slow 10 MHz clock is too slow for the part.

JUST ONE MORE THING ;-), less the cigar and trench coat, of course

Prior to adding in the last new SDRAM2 test, the memset(), I let the code switch to C++. The reason is that I ran the SDRAM2 test and everything passed. That never happened with all sections using RAM2. Whether I ran with CCS or uploaded the code via UniFlash, I always got to my test mode command prompt.

I switched to C++ to see what happens, and I got an interesting result. Code execution ALWAYS stopped at the exact same point, namely the line where I perform a memset. I allocated in one of my classes the following:

class myClass
{
     ...

     uint32 myBssVar[32];

     ...

     void doIt();
};

void myClass::doIt()
{
     memset(myBssVar, 0, sizeof (myBssVar));
}

Code execution always stopped while executing the memset. I decided to do the same test in my SDRAM2 test. I found out that execution stopped in the exact same way. I checked the location of myBssVar and everything is fine, the variable in the class and the global uninitialized one in SDRAM2 both exist towards the end of the SDRAM section.

Now after adding the memset() line to the SDRAM2 test, I see that my testBss variable does not always have "Hello World!". Sometimes it does, but many times it does not.

We definitely affected the problem, but no Columbo (er. cigar).

Yes, I still have to give you some sample code.

I will read your last response and respond.

0 Sarah Weinberger over 9 years ago in reply to Sarah Weinberger

Expert 1915 points

Anthony, Yelp!

The last change toasterified the board, because I placed the testSdram2() just after the main() entry point, but the memset() call hangs the CPU.

I tried erasing the board with UniFlash and programming/debugging with CCS, but to no avail. I am not quick enough to catch the CPU before it gets to main. How do I erase/debug/program the board?

Yes, I have "system reset on connect" on both CCS and Uniflash.

Thoughts?

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

TI__Guru 68920 points

Hi Sarah,

When the part 'bricks' or turns into a toaster -- this is usually because some error is occurring right after reset.
Unfortunately the debug request *isn't* the highest priority exception so it can be tricky to take control of the part.

If you didn't move anything in your setup - and didn't do anything to 'damage' the part/board like over-voltage.. then the TEST CONNECTION should pass... But it's still something you should check off the list before proceeding.

Double click the .ccxml file that is your active target configuration. You should see a button 'Test Connection' on the 'Basic' tab. Click this button.

It will pop up a window and print a bunch of stuff - but scroll to the bottom. If it says "The JTAG DR Integrity scan-test has succeeded." you should be good to continue on to these next steps.

There are some good instructions on this post: e2e.ti.com/.../927980

But there's a lot of replies so here is an excerpt of the steps:
1] Start CCS and launch your target configuration.
2] In the debug window, right click on your emulator and select "Show all Cores"
3] Expand Non-Debuggable Devices.
4] Right click on emulator/IcePick and select connect target. (This should work)
5] Right click on emulator/Dap and select connect target.
5.1] If this is working, go to 6
5.2] If this is not working, there is a good chance you have locked the part with AJSM.
6] Open a memory window at address 0xFFFF_FF00 (system module)
7] write 0x00 at address 0xFFFF_FFFE This will generate a system reset.
8] In the debug window, right click on emulator/CortexR0 and select Connect Target.
9] If this is ok, go to Tools->On-Chip Flash. Erase Option "Entire Flash" and click on "Erase Flash"
10] Disconnect, reset and restart your debugger as usual. It should work now.

11] At that point, the code in flash is forcing the device in some kind of lock mode. A deep debug is necessary.

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

TI__Guru 68920 points

Hi Sarah,

Sorry actually I meant Asynchronous SRAM, for example this 2MByte chip from ISSI

www.digikey.com/.../4862174

Static RAM doesn't need to be refreshed - unlike Dynamic RAM. It costs more because it's usually 6 transistors per bit but you are not at the bleeding edge of density and it may not be an important price difference at all.

VERY interesting that the SDRCR value actually made an impact. You're right no Columbo moment yet, but it's a good clue.
I have been working on some microscopic soldering to get probe points on the SDRAM chip on one of our boards.
Should be able to get a logic analyzer on it tomorrow to verify that the code for the init isn't violating the requirements in the SDRAM datasheet.

There is a step in that initialization where a command is given to the SDRAM to configure itself. (for example type of burst, CAS latency, etc.) You don't see code for it, it's done in a hardware state machine after writing to the upper bits of SDCR, but if that were disrupted by the *PTR access it could really explain quite a lot of this.

I'll need to study the last post more carefully to see if I can spot anything in the code examples - thanks for posting.

Best Regards,
Anthony

0 Anthony F. Seely over 9 years ago in reply to Anthony F. Seely

TI__Guru 68920 points

Hi Sarah,

So not Columbo yet.. but this is also interesting.

I'd tried to correct the emif_SDRAMInit() code to:

/* SourceId : EMIF_SourceId_001 */
/* DesignId : EMIF_DesignId_001 */
/* Requirements: HL_SR334 */
void emif_SDRAMInit(void)
{
/* USER CODE BEGIN (2) */
/* USER CODE END */

   uint32 buffer;

   emifREG->SDTIMR  = (uint32)((uint32)0U << 27U)|
                      (uint32)((uint32)0U << 24U)|
                      (uint32)((uint32)0U << 23U)|
                      (uint32)((uint32)0U << 20U)|
                      (uint32)((uint32)0U << 19U)|
                      (uint32)((uint32)0U << 16U)|
                      (uint32)((uint32)0U << 12U)|
                      (uint32)((uint32)0U << 8U)|
                      (uint32)((uint32)0U << 7U)|
                      (uint32)((uint32)0U << 4U)|
                      (uint32)((uint32)0U << 3U);

 /* configure refresh rate*/
   emifREG->SDSRETR = (uint32)0U;  
   emifREG->SDRCR   = 208U;

/**  -general clearing of register
*    -for NM for setting 16 bit data bus
*    -cas latency
*    -BIT11_9CLOCK to allow the cl field to be written
*    -selecting the banks
*    -setting the pagesize
*/   
    emifREG->SDCR   = (uint32)((uint32)0U << 31U)|                               	
                      (uint32)((uint32)1U << 14U)|                               	
                      (uint32)((uint32)2U << 9U)|  	
                      (uint32)((uint32)1U << 8U)|                                	
                      (uint32)((uint32)2U << 4U)|              	
                      (uint32)((uint32)elements_256);         	
/* wait for a read to happen*/
   buffer           = *PTR;                                	
   buffer           = buffer;
/* USER CODE BEGIN (3) */
/* USER CODE END */
}

Where the 208 in SDRCR is my value to get 15.6us which is the time between one pair of refreshes, if you want to get all 4096 done in 64ms.

This is all based on a 13.3MHz clock so the number is about 30% larger than your 152...

Ok so here is a screenshot now of what occurs on the bus. I added CMD to decode the nCS,nCAS,nRAS,nWE,DQM,CKE lines.

I triggered the analyzer on the "Load Mode Register" command which should only be performed once, near the end of the hardware initialization sequence performed by the EMIF.

Then we can look back in time to see what occurred prior, and a bit ahead to see what occurs next..

This is what I captured:

Hopefully you can enlarge the image because I uploaded it as 1024 wide.

Now if I read the SDRAM datasheet (for the ISSI part on our HDK... but should be the same for Micron) it says:

1) A 100us delay is required prior to issuing any command other than COMMAND INHIBIT or a NOP.

I put a (1) on the drawing for this step; although below you will see it's not 100us...

2) with at least 1 command inhibit/nop command applied, and once the 100us delay has been satisfied, a PRECHARGE
command should be applied.

I put a (2) on the drawing for this step..

3) this will leave all banks in an idle state after which at least 2 AUTO REFRESH cycles must be performed.

I put a (3). We give 8 auto-refresh commands but I think that is what our TRM says we do...

4) After the auto-refresh cycles are complete, the SDRAM is ready for mode register programming. the mode register should be loaded prior to applying any operational command . I put a (4) on the figure for LMR.

5) After the Load Mode Register Command at least TWO NOP commands must be asserted prior to any command.

an oops on this one - because it seems that we issue REF twice here. [marked 5 on the drawing]

6) ok so now we have one NOP ... then ACT which I think may be due to the read from *PTR

7) I think that the RD at 7 is the read from *PTR because it's a burst that is started and terminated after the first data word is returned..

Ok so just from the above, it looks like we have a problem in step 5 because we refresh instead of NOP after the LMR.

Not sure how critical this is or whether we can control it but more on this later...
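The step 5 rule can be expressed as a simple check over a decoded command list. This is only a sketch: the enum, the function names and the example traces are made up for illustration, the "observed" trace is paraphrased from the description above rather than read off the capture, and treating COMMAND INHIBIT as equivalent to NOP for this rule is an assumption.

```c
#include <stddef.h>

typedef enum { CMD_NOP, CMD_INH, CMD_PRE, CMD_REF,
               CMD_LMR, CMD_ACT, CMD_RD, CMD_WR } sdram_cmd_t;

/* ASSUMPTION: NOP and COMMAND INHIBIT both count as idle slots. */
static int is_idle(sdram_cmd_t c)
{
    return (c == CMD_NOP) || (c == CMD_INH);
}

/* Returns 1 if the first LOAD MODE REGISTER in the trace is followed by
 * at least two idle commands, 0 otherwise (including when the trace has
 * no LMR or is too short to tell). */
int lmr_followed_by_two_nops(const sdram_cmd_t *seq, size_t n)
{
    size_t i;
    for (i = 0u; i < n; i++)
    {
        if (seq[i] == CMD_LMR)
        {
            if (i + 2u >= n)
            {
                return 0;
            }
            return is_idle(seq[i + 1u]) && is_idle(seq[i + 2u]);
        }
    }
    return 0;
}

/* Trace as described in the post: LMR, REF, REF, NOP, ACT, RD. */
int check_observed_trace(void)
{
    static const sdram_cmd_t t[] = { CMD_PRE, CMD_REF, CMD_REF, CMD_LMR,
                                     CMD_REF, CMD_REF, CMD_NOP, CMD_ACT, CMD_RD };
    return lmr_followed_by_two_nops(t, sizeof t / sizeof t[0]);
}

/* Trace that follows the datasheet wording: LMR, NOP, NOP, ACT, RD. */
int check_datasheet_trace(void)
{
    static const sdram_cmd_t t[] = { CMD_PRE, CMD_REF, CMD_REF, CMD_LMR,
                                     CMD_NOP, CMD_NOP, CMD_ACT, CMD_RD };
    return lmr_followed_by_two_nops(t, sizeof t / sizeof t[0]);
}
```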

Then the other issue is if we look PRIOR to the LMR zoomed out there are refreshes in the stream of NOPs

(where we are supposed to have 100us continuous w.o. any other command than NOP or INH)

But look at the time for the last NOP command - it matches the refresh period that I programmed into the updated emif_SDRAMInit().

EDIT: also it is slightly less time than the 665 cycles (about 16us) I measured for the delay on that 'write' to *PTR which again makes some sense.
It looks like the value programmed into SDRCR *prior* to the SDCR write controls the delay that we want to be 100us...

This is what I'm thinking:

1) the initial write to SDRCR might need to be in the code to set a *long* refresh period - so that there is 100us at least without anything other than a NOP.

the problem I have with this is that for a 100MHz/10ns clock frequency, a value of 8191 for the refresh counter would only get you to 81.9us instead of 100us.

2) need to figure out if we can somehow suppress the refresh right after LMR

In any case this sheds some light onto two problems with the current startup code plus my attempted 'fixes' with values that are calculated correctly. And it also may explain why SDRCR is written twice in that code. So I don't think we're too far off now from a 'legal' initialization sequence but need to do some more thinking and testing of fixed code.
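The limit mentioned in point 1 can be checked with a couple of lines. A minimal sketch, assuming the refresh counter tops out at 8191 as stated above (the helper names are invented for the example):

```c
#include <stdint.h>

#define SDRCR_RR_MAX 8191u  /* maximum refresh counter value per the thread */

/* Longest interval the refresh counter can represent, in nanoseconds. */
uint64_t max_refresh_interval_ns(uint32_t emif_hz)
{
    return ((uint64_t)SDRCR_RR_MAX * 1000000000u) / emif_hz;
}

/* 1 if the counter can stretch to at least delay_ns at this clock. */
int can_cover_delay(uint32_t emif_hz, uint64_t delay_ns)
{
    return max_refresh_interval_ns(emif_hz) >= delay_ns;
}
```

At 100MHz the longest interval is 81.91us, short of the 100us requirement, while at 10MHz the same counter value reaches about 819us.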

-Anthony

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Expert 1915 points

Hey Anthony, I will read the post above shortly.

In creating a small project, which replicates the problem for you, I stumbled onto the linchpin of the problem. There may be other problems. The SDRCR may or may not be changed from the defaults, but the following is definitely a problem.

(I am still researching this topic, but I saw that you just posted, so wanted to update you, even though I am still in the midst of things. I got my SDRAM2 test to pass successfully and the C++ class, which perpetually failed to succeed.)

void doInitDiscrete()
{
	// Initialize ASYNC1, 0x60000000, for discrete input with order the same as old CCM; method ReadDiscrete().
	emif_ASYNC1Init();

	// Initialize ASYNC2, 0x64000000, for discrete output with order the same as old CCM; method WriteDiscrete().
	emif_ASYNC2Init();

	// Enable the MPU unit to avoid extra writes.
	_mpuInit_();
}

I called the following inside main:

initialize GIO
initialize RTI
initialize SCI
initialize discrete: doInitDiscrete()
TI_Fee_Init()
_enable_IRQ()

I have not yet had a chance to see which of the lines in the discrete initialization caused SDRAM to go haywire, but at the moment it seems to be either something inside that function or the placement of the call. ALSO, I am still using the 156 value for SDRCR, which definitely made a difference, although I have NOT yet had a chance to comment out the 156 and use the Halcogen value while keeping the discrete initialization commented out.

Like I said, a lot of things to test, so these are interim results.

You might have some more insight to the individual discrete lines shown above.

(I took a brief look at what you wrote, looks interesting, will comment more later.)

Thanks,

Sarah

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

TI__Guru 68920 points

Hi Sarah,

Interesting. So it may be the async init functions - because the EMIF is shared between the two modes.
Where did you call emif_SDRAMInit() in relation to initialize discrete: doInitDiscrete() ?

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Expert 1915 points

Hey again Anthony,

Directly after vimInit() (line 298, User Code Begin 74) in sys_startup.c, which is where we moved the call to.

The discrete initialization gets called in main() at the location mentioned. Here is a code snapshot.

void main(void)
{
/* USER CODE BEGIN (3) */
	// Initialize: Modules
	doInitVim();		// Call first, so that other peripherals do not fire interrupts *before* the VIM is reconfigured.
//	emif_SDRAMInit();	// Initialize: 16MB (128Mb) SDRAM
	gioInit();			// Initialize: GIO
	doInitRti();		// Initialize: RTI (Real Time Interrupt, timer)
	doInitSci();		// Initialize: SCI (Serial Communication Interface)
	doInitAcbusok();	// Initialize: ACOK/BUSOK lines
	doInitAdc();		// Initialize: AD/C
//	doInitDiscrete();	// Initialize: Discrete In (ASYNC1, CE2CFG) and Discrete Out (ASYNC2, CE3CFG)
	doInitEeprom();		// Initialize: EEPROM

I can say that the discrete initialization function being part of the problem threw me for a loop. Never did I suspect that. I simply took it out along with all the other unnecessary initialization calls and code, so that we could concentrate on the SDRAM issue. I made a bloodbath of the main code. What I found out, though, was that the totally stripped-down code worked, and worked repeatedly. I called that SDRAM2 test what seemed like a zillion times with no issues whatsoever.

I then had the unenviable task of putting things back, so I took things out one-by-one on yet another project. I can safely tell you that the discrete stuff was either the absolute last stuff to go or close to it. I figured the EEPROM, ADC, and the GIO lines caused grief. I was thinking stack issues. I thought of flash code size issues.

I did not realize the direct relationship between discrete and EMIF until you just told me, although testing proved it. My system, independent of board, does not like the discrete.

BTW, I still need to go through the de-toasterifying steps that you gave. I have a brick on my hands with that one board. To answer your question from there, I did not toast any component or the like. I merely called the discrete initialization function directly after the main() call, as you see, and then immediately after that I did the memset(), basically the SDRAM2 test, which threw a processor exception. The calls happen too quickly for either CCS or UniFlash to initiate the erase prior to the main() code executing. Hopefully what you wrote will work. My manager is not happy with possibly having to replace the CPU on the 3 boards that I have brickified so far.

0 Anthony F. Seely over 9 years ago in reply to Anthony F. Seely

TI__Guru 68920 points

Hi Sarah,

So I was wrong about the first calculation of the RR value and now I understand the formula.

The TRM states for 'procedure B' [which is to be done in case the automatic 'release emif from reset' timings violate the SDRAM spec]:

Procedure B - Following is the procedure to be followed if the SDRAM Power-up constraint was violated:

1. Configure the desired EMIF_CLK clock frequency. The frequency of the memory clock must meet the timing requirements in the SDRAM manufacturer's documentation and the timing limitations shown in the electrical specifications of the device datasheet.

2. Program SDTIMR and SDSRETR to satisfy the timing requirements for the attached SDRAM device. The timing parameters should be taken from the SDRAM datasheet.

3. Program the RR field of SDRCR such that the following equation is satisfied: (RR × 8)/(fEMIF_CLK) > 200 μs (sometimes 100 μs). For example, an EMIF_CLK frequency of 100 MHz would require setting RR to 2501 (9C5h) or higher to meet a 200 μs constraint.

4. Program SDCR to match the characteristics of the attached SDRAM device. This will cause the auto-initialization sequence in Section 17.2.5.4 to be re-run with the new value of RR.

5. Perform a read from the SDRAM to assure that step 5 of this procedure will occur after the initialization process has completed. Alternatively, wait for 200 μs instead of performing a read.

6. Finally, program the RR field to match that of the attached device's refresh interval. See Section 17.2.5.6.1 details on determining the appropriate value.

After following the above procedure, the EMIF is ready to perform accesses to the attached SDRAM device. See for an example of configuring the SDRAM interface.

We were getting step 6 wrong because of the truncation to 31. So that needs to be fixed.

I think I mentioned though that my value 334 and your value 250? were some calculation I didn't understand, based on SDRAM Clock Frequency * 0.0002 / 8.

So now reading Step 3 this makes sense - 0.0002 is 200us and the /8 is so that 8 * RR / fEMIF_CLK = 200us.

Also step 5 explains the *PTR read.
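
For reference, the step 3 inequality can be turned into a small helper. This is only a sketch: the function name rr_for_powerup_delay and its parameters are made up for illustration and are not part of HalCoGen or the TRM. It returns the smallest RR for which (RR × 8)/fEMIF_CLK is strictly greater than the requested delay.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a HalCoGen API): smallest RR that satisfies
 * (RR * 8) / f_emif_hz > delay_us, i.e. the TRM step 3 inequality. */
static uint32_t rr_for_powerup_delay(uint32_t f_emif_hz, uint32_t delay_us)
{
    uint64_t cycles;

    assert(f_emif_hz > 0u);

    /* EMIF_CLK cycles contained in the delay, rounded down. */
    cycles = ((uint64_t)f_emif_hz * delay_us) / 1000000u;

    /* floor(cycles / 8) + 1 is the first integer strictly above cycles / 8. */
    return (uint32_t)(cycles / 8u) + 1u;
}
```

Under these assumptions, rr_for_powerup_delay(100000000u, 200u) gives 2501, matching the TRM example, and a 10 MHz EMIF_CLK gives 251. The result still has to fit in the RR field.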

-Anthony

0 Anthony F. Seely over 9 years ago in reply to Anthony F. Seely

TI__Guru 68920 points

Sarah,

For your discrete IO, didn't you have very *long* setup, hold, strobe counts? I wonder if they are long enough that you are missing refreshes. If you have them set for the max setup, strobe, hold, that's on the order of 1/2 a refresh interval. So you could miss the refresh target by a lot if a long async access begins right before the time where a refresh should have begun.

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Expert 1915 points

No clue Anthony. Which Halcogen tab(s) would you like me to take a screenshot of, look at?

0 Anthony F. Seely over 9 years ago in reply to Anthony F. Seely

TI__Guru 68920 points

Here is the result following the above steps from the TRM more closely.

It *appears* that the result is that only a single refresh period of 25us is 'waited' for before the LMR command.

Whereas the TRM indicates the LMR will be delayed for 8 refresh periods.

I think because you have such a low frequency of the SDRAM clock we'll be able to come up with a number that meets 200us without any refreshes in between and still be below the 8192 max for RR. That may be a good stop-gap till this gets sorted.

Long term the 'alternate' in step 5 of waiting for 200us may be what's needed.

This is what I changed the code to for 13.3MHz.

I am going to try 334u*8 for the first value in SDRCR next. That will be 2672.

At 10MHz you could program this first value to 2000. Then still use 152 for the 2nd write to SDRCR near the end of the function...

/* SourceId : EMIF_SourceId_001 */
/* DesignId : EMIF_DesignId_001 */
/* Requirements: HL_SR334 */
void emif_SDRAMInit(void)
{
/* USER CODE BEGIN (2) */
/* USER CODE END */

   uint32 buffer;

   emifREG->SDTIMR  = (uint32)((uint32)0U << 27U)|
                      (uint32)((uint32)0U << 24U)|
                      (uint32)((uint32)0U << 23U)|
                      (uint32)((uint32)0U << 20U)|
                      (uint32)((uint32)0U << 19U)|
                      (uint32)((uint32)0U << 16U)|
                      (uint32)((uint32)0U << 12U)|
                      (uint32)((uint32)0U << 8U)|
                      (uint32)((uint32)0U << 7U)|
                      (uint32)((uint32)0U << 4U)|
                      (uint32)((uint32)0U << 3U);

 /* configure refresh rate*/
   emifREG->SDSRETR = (uint32)2U;
   emifREG->SDRCR   = 334U;

/**  -general clearing of register
*    -for NM for setting 16 bit data bus
*    -cas latency
*    -BIT11_9CLOCK to allow the cl field to be written
*    -selecting the banks
*    -setting the pagesize
*/   
    emifREG->SDCR   = (uint32)((uint32)0U << 31U)|                               	
                      (uint32)((uint32)1U << 14U)|                               	
                      (uint32)((uint32)2U << 9U)|  	
                      (uint32)((uint32)1U << 8U)|                                	
                      (uint32)((uint32)2U << 4U)|              	
                      (uint32)((uint32)elements_256);         	
/* wait for a read to happen*/
   buffer           = *PTR;                                	
   buffer           = buffer;
   emifREG->SDRCR   = 208U;

   /* USER CODE BEGIN (3) */
/* USER CODE END */
}

It's only 25us of solid "NOP" below before the trigger (red cursor) right after the end of the NOP at yellow cursor...

So it's still what RR is programmed to ...

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

TI__Guru 68920 points

Sarah,

These tabs, for the two asynch chip selects that you are using for discrete IO:

If the delays are really long - like above where they are maximized - then combined with the very slow EMIF clock frequency but the *constant* SDRAM refresh interval of 15.6us ... could be a conflict.

If you are not using HET or GIO these could probably be used for your discrete IO function. Would just need to understand it better.

Even if the SDRAM and ASYNC wind up working together at this 10MHz clock - it may be so bad performance-wise that it makes sense to move discrete IO to another peripheral just so the SDRAM isn't crippled.

0 Anthony F. Seely over 9 years ago in reply to Anthony F. Seely

TI__Guru 68920 points

Ok here is now 200us of nothing but NOP before the PRECHARGE, the 8 refreshes, and the LMR.

So please try this for your emif_SDRAMInit().

I have already changed the values for you to reflect 10MHz:

void emif_SDRAMInit(void)
{
/* USER CODE BEGIN (2) */
/* USER CODE END */

   uint32 buffer;

   emifREG->SDTIMR  = (uint32)((uint32)0U << 27U)|
                      (uint32)((uint32)0U << 24U)|
                      (uint32)((uint32)0U << 23U)|
                      (uint32)((uint32)0U << 20U)|
                      (uint32)((uint32)0U << 19U)|
                      (uint32)((uint32)0U << 16U)|
                      (uint32)((uint32)0U << 12U)|
                      (uint32)((uint32)0U << 8U)|
                      (uint32)((uint32)0U << 7U)|
                      (uint32)((uint32)0U << 4U)|
                      (uint32)((uint32)0U << 3U);

 /* configure refresh rate*/
   emifREG->SDSRETR = (uint32)0U;
   emifREG->SDRCR   = 2000U;

/**  -general clearing of register
*    -for NM for setting 16 bit data bus
*    -cas latency
*    -BIT11_9CLOCK to allow the cl field to be written
*    -selecting the banks
*    -setting the pagesize
*/   
    emifREG->SDCR   = (uint32)((uint32)0U << 31U)|                               	
                      (uint32)((uint32)1U << 14U)|                               	
                      (uint32)((uint32)2U << 9U)|  	
                      (uint32)((uint32)1U << 8U)|                                	
                      (uint32)((uint32)2U << 4U)|              	
                      (uint32)((uint32)elements_256);         	
/* wait for a read to happen*/
   buffer           = *PTR;                                	
   buffer           = buffer;
   emifREG->SDRCR   = 155U;

/* USER CODE BEGIN (3) */
/* USER CODE END */
}

(I recalculated 10MHz * 64ms / 4096 and got 155 not 152..)

Give this a shot for the SDRAM initialization..

NOTE: I don't know if you have this as an *option* available to you - but if you are stuck with the current architecture of Async & SDRAM sharing the bus, and if these problems are all due to timing - then simply increasing the device clock frequency [pure software change in PLL setting] might be enough to get you moving again with current hardware..

Also - if you didn't damage the 3 bricked boards - you should be able to recover them. Once you recover the 1st you'll be able to get the others unbricked more easily.

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Expert 1915 points

Anthony, I have values of 2, 5, 3, 2, 5, 3, 1 for the w setup, w strobe, w hold, r setup, r strobe, r hold, and TA fields respectively. I see these are cycles, so 2 cycles for write setup based on a 10 MHz clock. Are these long? What constitutes long, medium, or short?

What are the max setup, strobe, and hold cycles for my SDRAM? Where on the datasheet do I read maximum and minimum values?

You calculated 251 for the RR rate in the above post. Is the 251 value (or 31, or 152 or 156 or whatever value finally goes there) cycles? If so, then 2/5/3 is much less than 251. No?

What is the connection between async access and SDRAM refresh? Explain targets?

Here is the code that I have. Yes, I should change 152 to 251 and update the equation. You never gave me a value in the 300s. You just said use either commercial/industrial or automotive for the value.

   // The temperature range is "industrial". Our temperature range is -15C to +65C.
   // Specification is therefore -40C to +80C (industrial), which means 64ms/4096.
   // Equation is SDRCR = 10 MHz * 64 * 1000 / 4096 or 156.25 or 156.
   // See June 15, 2016 post from Anthony
   //	  e2e.ti.com/.../1897844
   emifREG->SDRCR   = 152U;

Sarah

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Expert 1915 points

Anthony, I am finally catching up on all that you wrote. I got busy yesterday (Friday) with Raytheon most of the day.

You wrote:

Ok 152 will hopefully not completely kill your performance. Will know better once we get a measurement on the refresh cycles.

You keep writing 152, but the calculation according to the TRM, which you copied is

10 MHz * 64 ms / 4096 = 156.25, which should therefore be rounded up to 157.

Where does 152 come from?

EMIF.c sets SDRCR to 2 different values: first 250 and then 31.

I know that you understand, where Halcogen got the 250 value from based on yesterday's post.

How many different formulas for the calculation of SDRCR are there anyways?

What final value for SDRCR do I set at the end of the initialization section for EMIF SDRAM: 157 or 251?

Do I do anything after setting SDRCR, such as reading SDRAM again with another buffer = *PTR call?

Do I STILL need to update the SDRCR now that we know that the discrete signal caused issues with the SDRAM?

You wrote:

The second computation is actually performed correctly by HalCoGen but then the result is always truncated to 31 instead of 8191. If you edit the XML file C:\ti\Hercules\HALCoGen\v04.05.02\drivers\TMS570LS3137ZWT\EMIFv000.xml (make sure the version matches your version) and change the 'if' statement at line 104, if(refresh_period>31) { refresh_period = 31 }, to truncate to 8191 instead of 31, then I think you would see the correct result in the second write in your generated output file.
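
To see why the generated code ends up with 31, here is a sketch of the clamp as quoted above. The names are hypothetical and this is not the actual generator source, only an illustration that assumes the logic is a plain upper clamp.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the generator's clamp on the refresh value. */
static uint32_t clamp_refresh_period(uint32_t refresh_period, uint32_t limit)
{
    assert(limit > 0u);

    /* Anything above the limit is cut down to the limit. */
    return (refresh_period > limit) ? limit : refresh_period;
}
```

With a computed refresh value of 156 (10 MHz * 64 ms / 4096, rounded down), a limit of 31 returns 31, while a limit of 8191 returns 156 unchanged.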

What version do we talk about? I am using the RM48L952ZWT processor, but if I go into that folder, I do not see the EMIFv000.xml file, only in the TMS570LS3137ZWT folder do I see that file.

What do I change what to and why? I do not follow you here. Is this comment necessary?

What is TMS570LS3137ZWT? Is that another processor like the RM48L952ZWT?

Why does the TMS570LS3137ZWT have many files and sub-directories inside of it, but the RM48L952ZWT only has one XML file?

Sarah

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Expert 1915 points

Anthony, on 6/17 @ 12:04 AM you wrote:

There is a step in that initialization where a command is given to the SDRAM to configure itself. (for example type of burst, CAS latency, etc.) You don't see code for it, it's done in a hardware state machine after writing to the upper bits of SDCR, but if that were disrupted by the *PTR access it could really explain quite a lot of this.

Which initialization, and where? Do you mean in emif.c::emif_SDRAMInit()?

Since TI wrote the initialization function and added the "buffer = *PTR" command (as opposed to waiting 200us) to initialize the part after setting the values, such as SDRCR, how can anything disrupt that initialization, especially since you indicated to me that interrupts are not enabled and that the *PTR read cannot get interrupted? Are you saying that now it can?

(Yes, I understand the problem is discrete signals, but I am finally catching up on your posts and want to understand.)

Sarah

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Expert 1915 points

Slowly but surely I am catching up. On June 17 @ 8:51 PM, you wrote

Where the 208 in SDRCR is my value to get 15.6us which is the time between one pair of refreshes, if you want to get all 4096 done in 64ms. This is all based on a 13.3MHz clock so the number is about 30% larger than your 152...

I see how you obtained 15.6us (0.064 / 4096), so that makes yet another (third) method to obtain the SDRCR value?

By overriding the value in the first instance, your change will not stay across a Halcogen save and regeneration of the code, but I understand that this is just for testing.

If you are testing my configuration and issue, not that you were able to replicate the problem (you did not enable discrete), then why did you use the 13.3 MHz clock, when the EE on my project requested I use 10 MHz?

Also, more importantly, is there really a difference between the default value of 250 and your 208? I can see setting something like 7000 or 2, which would be a huge change from 250, but 208 and 250 seem almost identical to each other.

Thank you for the screenshots of the logic analyzer. I would need to study details of SDRAM and signals to get a better grasp. Memory for me is like salami. I just buy it, never involved in the intricacies of making of it and how one particle needs to be next to another pinkish particle and how this spice interacts with that spice and at such and such temperature. One has to use such and such tool, because a wooden spoon does that.

My EE would probably get more out of the graph, but I do understand a bit. I learned yesterday that in asynchronous RAM the /CAS line controls data moving in and out of RAM, whereas SDRAM has two banks and does not need the /CAS line. I presume that is meant by the strobe parameter in Halcogen or clocking data in and out of RAM.

I am trying to understand your 2 problems.

Problem 1:

After the Load Mode Register Command at least TWO NOP commands must be asserted prior to any command. An oops on this one - because it seems that we issue REF twice here. [marked 5 on the drawing]

6) ok so now we have one NOP ... then ACT which I think may be due to the read from *PTR

7) I think that the RD at 7 is the read from *PTR because it's a burst that is started and terminated after the first data word is returned.

Ok so just from the above, it looks like we have a problem in step 5 because we refresh instead of NOP after the LMR.

You expected in step 5 two clocks of NOP instead of the two clocks of REF (refresh).

Problem 2:

You said "look PRIOR to the LMR zoomed out", but the LMR is step 4.

Just prior to step 4 are your "2 AUTO REFRESH cycles". Does that mean 4 clocks per AUTO REFRESH cycle? How does one arrive at that number, given 13.3 MHz (in your case) clock frequency, 64 ms refresh time, and 4096 cycles of something?

What you really speak of are the 100us of NOP instructions just prior to the PRE command?

I do not see any zoom out possible on the cycles to the left of PRE. You show distinctly 6 clocks of NOP. How long is one clock? Would that just be 1/13.3MHz, or 75.19 ns? That would mean roughly 1,330 clocks of NOP in 100us (100us / 75.19 ns). True?

Thinking 1:

What equation do you use? So far, I know:

RR = f[EMIF_CLK] * t[Refresh Period] / n[cycles].

As the SDRAM datasheet dictates the refresh period and cycles, what equation do you use? Please refresh my memory.

Thinking 2:

"need to figure out if we can somehow suppress the refresh right after LMR"

Since we are talking about your HDK and test, the better question to ask is does your SDRAM work? Are you just creating technical speak for something that works perfectly on your board? That is the real question. If your SDRAM tests perform flawlessly, then your board is fine, no issues anywhere except your understanding. If your SDRAM does not work fine, then the question is why, and that is when maybe a discussion using a logic analyzer comes in handy.

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Expert 1915 points

On June 17th @ 9:19 PM, you updated your equation to "RR * 8 / fEMIF_CLK". Plugging in my numbers yields exactly 250, which is the value Halcogen placed in the first SDRCR.

Are you just saying that Halcogen was correct here?

If so, what about the 31 that Halcogen calculated in the second setting of SDRCR? Does that still need to go to 157 based on the TRM formula?

As I asked earlier, there seem to be an awful lot of formulas, procedures, and whatnot governing the refresh rate.

0 Sarah Weinberger over 9 years ago in reply to Anthony F. Seely

Expert 1915 points

Hi Anthony,

On Jun 17, 2016 9:55 PM you said:

If you are not using HET or GIO these could probably be used for your discrete IO function. Would just need to understand it better.

Even if the SDRAM and ASYNC wind up working together at this 10MHz clock - it may be so bad performance-wise that it makes sense to move discrete IO to another peripheral just so the SDRAM isn't crippled.

I am NOT using HET, but I am using GIO.

Given current hardware, how do I move the discrete IO to another peripheral, so that SDRAM stays working and performance does not suffer? What did you have in mind?

On Jun 17, 2016 10:04 PM you said:

So please try this for your emif_SDRAMInit(). I have already changed the values for you to reflect 10MHz:

You keep changing the emif_SDRAMInit() function altering the refresh rate register, SDRCR. How do these changes affect the SDRAM issue caused by the discrete IO peripheral initialization?

Changing the value of SDRCR to 2000 at the first instance will get erased, whenever I regenerate code in Halcogen.

What option is available to me? I can obviously make whatever software changes you want. With respect to hardware, that is a physical object, already manufactured and everything. I am using it. Changing hardware is a lengthy and costly procedure that will not go over well, especially with the customer demanding completion of the project in short order. I can probably make a quick change, such as changing the value of a resistor or something. The EE has done that on this project. You must be talking of rewiring the discrete IO to use HET.

Also - if you didn't damage the 3 bricked boards - you should be able to recover them. Once you recover the 1st you'll be able to get the others unbricked more easily.

One of the 3 boards, #2, sits on my desk. I will get to that either later today or more than likely tomorrow. I want to get a bit further on another task and get to a sort of working command prompt, which means through initialization and with serial communication working in both directions, getting there. There is definitely no hardware damage on the boards of any kind. The problem was that I wrote code, which shortly after reset causes a crash. Had the crash occurred 10-seconds after reset, then that would have been enough time to erase the chip. Hopefully, your procedure will work. I will let you know.

0 Sarah Weinberger over 9 years ago in reply to Sarah Weinberger

Expert 1915 points

Hi Anthony, on Jun 17, 2016 9:55 PM,

If the delays are really long - like above where they are maximized - then combined with the very slow EMIF clock frequency but the *constant* SDRAM refresh interval of 15.6us ... could be a conflict.

I believe that I now understand. My EE has me using short delays 2, 5, 3, 2, 5, 3, 1 going down the first column, so I am using nowhere near the long cycles that you showed (15, 63, 7, 15, 63, 7, 0). The only exception is that I am using 1 for TA, whereas your example uses 0.
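
A rough way to compare the two sets of delays is to add up the cycles of one async access and convert them to time. The sketch below assumes each field value N means N + 1 EMIF_CLK cycles and ignores turnaround and any extended wait; that encoding should be checked against the TRM, and the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: approximate duration of one async access in ns.
 * ASSUMPTION: a field value of N means N + 1 EMIF_CLK cycles. */
static uint32_t async_access_ns(uint32_t setup, uint32_t strobe, uint32_t hold,
                                uint32_t f_emif_hz)
{
    uint64_t cycles;

    assert(f_emif_hz > 0u);

    /* Setup + strobe + hold phases of a single read or write access. */
    cycles = (uint64_t)(setup + 1u) + (strobe + 1u) + (hold + 1u);

    /* Convert cycles to nanoseconds at the given EMIF clock. */
    return (uint32_t)((cycles * 1000000000u) / f_emif_hz);
}
```

Under that assumption, 2/5/3 at 10 MHz comes to about 1.3us per access and 15/63/7 to about 8.8us, against a 15.6us refresh interval.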

As mentioned earlier, discrete IO on the HET, not that I asked the EE on this question, would mean a new spin of the board, something that would cause huge delays, so if a new spin is what you mean, then that is not an option.

The board does not use discrete IO all that often, only once in a while. Is it possible to toggle between discrete IO and just not use SDRAM, not that I see that possible with .bss, .data, .sysmem, and .fee all on the SDRAM. I would not be allocating new memory. Maybe a short block of discrete saving values to the stack. That is just a thought.

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

TI__Guru 68920 points

Hi Sarah,

I'll try to summarize here a few key points:

1) There are 2 different formulae for the RR field of SDRCR with two different meanings depending on the context.

a) prior to SDRAM initialization
the RR value programmed here is used for a different purpose. It isn't tied to refresh rate calculations.
It is tied to giving the SDRAM 200us of NOP time before issuing the LMR command.

b) after SDRAM configuration and after the *PTR read.
here the RR value is actually tied to the refresh rate calculation

2) The first formula for RR, prior to initialization - is:
clock frequency * 0.0002 / 8
in HalCoGen and in the TRM.

the meaning is: 0.0002 = 200us. multiply by the clock frequency to get a cycle count. divide by 8 because the TRM
says to do so.. the controller will wait 8 * the value of RR.

3) the problem I found with this first formula is that the controller doesn't wait 8*RR like the TRM indicates.
I actually dug up the IP spec and found a clause that it only waits 8 * RR on the initialization that happens automatically
after a reset.

A software triggered initialization - as is being done by HalCoGen emif_SDRAMInit().. bypasses the 8*RR delay.
(but appears to insert 1*RR)

4) This means that a workaround for you is to program the first value as:
clock frequency * 0.0002 ... it will give you 200us of NOP this way. And with your clock frequency of 10MHz, the value
you get will be 2000 and fit inside the RR register's maximum value of 8191.

for others who run EMIF faster we need to do more investigation...

5) The second value programmed into RR after initialization is tied to the SDRAM refresh rate.
There is only one way to compute this:

clock frequency * refresh interval / # of refreshes

Now 64ms for a refresh interval and 4096 refreshes needed = 1 refresh per 15.6us.

So sometimes as a shorthand I may have written 15.6us * clock frequency but that's
just a shorthand.

6) I will check the datasheet but I may have missed that there is a -1 term to the calculation... sometimes a value of 0 = 1 cycle, value of 1 = 2 cycles, etc.

7) If your async delays for setup, hold, strobe are short (2, 3, ...) then yes I agree these would be short compared to the refresh interval and should be fine.
In the internal IP spec I also found mention of a problem if the async access time is long compared to the refresh interval. So it's a known hazard but very unusual to occur in the real world.

8) There is also a counter mentioned in the spec that keeps track of the # of missed refreshes. When this count gets too high, it raises the priority of the refresh versus other accesses until the SDRAM controller catches up. So it does try to allow for bursts of other accesses (async) where maybe you miss the 15.6us refresh - as long as in between it can catch up. The SDRAM could work fine if you did all 4096 refreshes at once as long as you do them every 64ms. So that 15.6us is just trying to space the refreshes evenly over time---a nice target but not itself a requirement.
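
Points 3) and 4) can be sketched as follows. This assumes, as observed above, that a software-triggered initialization waits roughly 1 * RR cycles, and that RR is a 13-bit field (maximum 8191). The function name and the return convention are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RR_FIELD_MAX 8191u  /* ASSUMPTION: RR is a 13-bit field */

/* Hypothetical helper: RR giving delay_us of NOP when the controller
 * waits 1 * RR cycles. Returns 0 if the value does not fit in RR. */
static uint32_t rr_workaround(uint32_t f_emif_hz, uint32_t delay_us)
{
    uint64_t num;
    uint64_t cycles;

    assert(f_emif_hz > 0u);

    /* Round up so the delay is never shorter than requested. */
    num = (uint64_t)f_emif_hz * delay_us;
    cycles = (num + 999999u) / 1000000u;

    return (cycles <= RR_FIELD_MAX) ? (uint32_t)cycles : 0u;
}
```

At 10 MHz this gives 2000. At 100 MHz the result (20000) does not fit in the field, which is the case that needs more investigation.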

I am a bit doubtful that dotting the i's and crossing t's here on meeting the SDRAM spec will give you much more improvement. I hope it will but suspect maybe it won't. But we should get this into spec compliance even if it doesn't fix the problem so we can be sure it's not contributing in any way.

0 Anthony F. Seely over 9 years ago in reply to Sarah Weinberger

TI__Guru 68920 points

Sarah Weinberger said:

What version do we talk about? I am using the RM48L952ZWT processor, but if I go into that folder, I do not see the EMIFv000.xml file, only in the TMS570LS3137ZWT folder do I see that file.

What do I change what to and why? I do not follow you here. Is this comment necessary?

What is TMS570LS3137ZWT? Is that another processor like the RM48L952ZWT?

Why does the TMS570LS3137ZWT have many files and sub-directories inside of it, but the RM48L952ZWT only has one XML file?

Sarah - this is an artifact of HalCoGen and of our part numbering schemes.

The TMS570LS3137ZWT is the automotive grade, big endian version of the RM48L952ZWT.

HalCoGen inherits most of the files for RM48L952ZWT from the TMS570LS3137ZWT driver folder. The EMIF configuration file is one that is inherited. It is the same IP and same silicon so instead of duplicating the files from one folder to another, the RM48 folder references the files in the TMS570 folder.

You normally shouldn't see this - it's behind the curtains stuff...

You are absolutely correct that the value for SDRCR should be 157 cycles. Just need to check the datasheet to see if 157 cycles means RR=157 or 156... as mentioned sometimes the register value of 0 means 1 cycle, register value 1 means 2 cycles, and so on.

I'm not sure where 152 got into my head. Was using a calculator but I probably did something stupid.

It's clear though that 64ms/4096 = 15.6us so multiplying by 10.000MHz is going to give you a number on the order of 156 not 152.
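
The refresh-rate formula can be checked with integer arithmetic. This is a sketch with a made-up function name; whether the register should hold the rounded-down value, the rounded-up value, or one less is exactly the open datasheet question, so the helper only reports the rounded-down result and whether there was a remainder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: floor(f_emif_hz * period_ms / 1000 / n_refreshes).
 * *has_remainder is set to 1 when the exact result is not an integer. */
static uint32_t refresh_rr_floor(uint32_t f_emif_hz, uint32_t period_ms,
                                 uint32_t n_refreshes, int *has_remainder)
{
    uint64_t num;
    uint64_t den;

    assert(n_refreshes > 0u);
    assert(has_remainder != 0);

    num = (uint64_t)f_emif_hz * period_ms;  /* EMIF_CLK cycles per period, times 1000 */
    den = (uint64_t)n_refreshes * 1000u;

    *has_remainder = (num % den) != 0u;
    return (uint32_t)(num / den);
}
```

For 10 MHz, 64 ms, and 4096 refreshes this returns 156 with a remainder (the exact value is 156.25). For 13.3 MHz it returns 207 with a remainder (about 207.8), which rounds to the 208 used in the 13.3MHz test.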

I tried to explain in the last mail but again:

a) prior to SDRAM initialization - this means before the write to SDCR - you need to have RR set to 2000 to give you an initialization delay of 200us of NOP.

b) after the SDRAM initialization and after the read of *PTR, then you set the RR field of SDRCR to the actual 'refresh' value which will be 157 or 156..

The logic controller uses that RR field for two different purposes - which is why it's confusing.
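
Put together, the order of operations in a) and b) looks like the sketch below. It uses a made-up two-register struct and function name so that it can be compiled and exercised off-target; the real generated code writes emifREG and reads *PTR, and also programs SDTIMR and SDSRETR first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down register view; the real emifREG has more fields. */
typedef struct {
    volatile uint32_t SDCR;
    volatile uint32_t SDRCR;
} emif_sketch_regs_t;

/* Sketch of the ordering only: init RR, SDCR write, dummy read, refresh RR. */
static uint32_t sdram_init_order_sketch(emif_sketch_regs_t *regs,
                                        const volatile uint32_t *sdram,
                                        uint32_t sdcr_value,
                                        uint32_t rr_init,
                                        uint32_t rr_refresh)
{
    uint32_t dummy;

    assert(regs != 0 && sdram != 0);
    assert(rr_init <= 8191u && rr_refresh <= 8191u);

    regs->SDRCR = rr_init;      /* a) e.g. 2000 at 10 MHz for 200us of NOP */
    regs->SDCR  = sdcr_value;   /* on hardware this starts the init sequence */
    dummy       = *sdram;       /* on hardware the read waits for init to finish */
    regs->SDRCR = rr_refresh;   /* b) actual refresh value, e.g. 156 or 157 */

    return dummy;
}
```

Off-target, the dummy read simply returns whatever the test buffer holds; the point of the sketch is that the refresh value is written only after that read.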

-Anthony
