TMS320C6678: There is a certain delay between the CPU and external devices, which may be due to internal limitations of the CPU

?? ?

Prodigy 95 points

Part Number: TMS320C6678
Other Parts Discussed in Thread: AMC7812

if (cs == 0) {
    spiRegs->SPIDELAY = (8 << CSL_SPI_SPIDELAY_C2TDELAY_SHIFT) |
                        (8 << CSL_SPI_SPIDELAY_T2CDELAY_SHIFT); /* (8+1)*6ns = 54ns */

    /* default chip select register */
    spiRegs->SPIDEF = CSL_SPI_SPIDEF_RESETVAL; /* T2C: end of transfer to chip select; C2T: chip select to start of transfer */
} else if (cs == 1) {
    spiRegs->SPIDELAY = (6 << CSL_SPI_SPIDELAY_C2TDELAY_SHIFT) | /* 7*6ns = 42ns */
                        (3 << CSL_SPI_SPIDELAY_T2CDELAY_SHIFT);  /* 4*6ns = 24ns */
}

The C2T and T2C delays can be understood as the chip-select setup time (chip select active to the first clock) and the chip-select hold time (last clock to chip select inactive).
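
The arithmetic in the code comments above can be sketched as follows. This is only an illustration under the same assumption those comments make: each delay lasts (field value + 1) periods of a roughly 6 ns SPI module clock. The helper name spidelay_ns is hypothetical (it is not part of the PDK), and the exact offset for each field should be checked against the KeyStone SPI user guide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the PDK): convert a SPIDELAY field value
 * to nanoseconds, ASSUMING delay = (field + 1) module clock periods, which is
 * what the comments in the snippet above imply. */
static uint32_t spidelay_ns(uint32_t field, uint32_t module_clk_period_ns)
{
    assert(module_clk_period_ns > 0u);
    return (field + 1u) * module_clk_period_ns;
}
```

With a 6 ns period this gives 54 ns for a field value of 8, 42 ns for 6 and 24 ns for 3, matching the comments. None of these values is close to the 780 ns gap described next.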

My concern is why the WDELAY time in the timing chart is so large: about 780 ns, as shown in the picture.
In the demo program, SPIFMT.WDELAY is never set, so the register keeps its default value of 0. SPIDAT1.WDEL is also at its default value of 0. Why, then, does the CS toggle between consecutive data frames take 780 ns?
The core board circuit is fully encapsulated and not disclosed to us, so it cannot be modified. We brought the SPI signals out directly from the core board for the timing measurements. Why does the hardware not behave according to the software settings?
Our project must complete several SPI transfers within a 3.3 us interrupt period, so this delay is unacceptable for our design.

We suspect a delay when the CPU accesses the external device: every time a transfer is started, there is a certain delay between the CPU and the external device. Is this due to an internal limitation of the CPU?

over 1 year ago

0 Shankari G over 1 year ago

TI__Mastermind 25535 points

Dear Customer,

Let me experiment with the SPI on the C6678 and come back with the number of CPU cycles and the time taken during SPI transmission.

Expect results in a day or two.

Regards

Shankari G

0 XING CUI over 1 year ago in reply to Shankari G

Prodigy 161 points

HI Shankari G,

pdk_c6678_1_1_2_6/packages\ti\platform\env6678l\platform_lib\src

spi_xfer
(
    uint32_t nbytes,
    uint8_t* data_out,
    uint8_t* data_in,
    Bool terminate
)
{
    uint32_t i, buf_reg;
    uint8_t* tx_ptr = data_out;
    uint8_t* rx_ptr = data_in;

    /* Clear out any pending read data */
    SPI_SPIBUF;

    for (i = 0; i < nbytes; i++)
    {
        /* Wait until the TX buffer is not full */
        while( SPI_SPIBUF & CSL_SPI_SPIBUF_TXFULL_MASK );
        /* Set the TX data to SPIDAT1 */
        data1_reg_val &= ~0xFFFF;
        if(tx_ptr)
        {
            data1_reg_val |= *tx_ptr & 0xFF;
            tx_ptr++;
        }

        /* Write to SPIDAT1 */
        if((i == (nbytes -1)) && (terminate))
        {
            /* Release the CS at the end of the transfer when terminate flag is TRUE */
            SPI_SPIDAT1 = data1_reg_val & ~(CSL_SPI_SPIDAT1_CSHOLD_ENABLE << CSL_SPI_SPIDAT1_CSHOLD_SHIFT);
        } else
        {
            SPI_SPIDAT1 = data1_reg_val;
        }

        /* Read SPIBUF, wait until the RX buffer is not empty */
        while ( SPI_SPIBUF & ( CSL_SPI_SPIBUF_RXEMPTY_MASK ) );
        /* Read one byte of data */
        buf_reg = SPI_SPIBUF;
        if(rx_ptr)
        {
            *rx_ptr = buf_reg & 0xFF;
            rx_ptr++;
        }
    }
    return SPI_EOK;
}

1. What is the purpose of while (SPI_SPIBUF & CSL_SPI_SPIBUF_TXFULL_MASK) and while (SPI_SPIBUF & (CSL_SPI_SPIBUF_RXEMPTY_MASK)) in this function?
2. Is it necessary to add this waiting time to keep SPI data synchronized when reading and writing?

0 ?? ? over 1 year ago in reply to Shankari G

Prodigy 95 points

Is there a delay between the bytes of an SPI data frame?

0 Praveen Rao over 1 year ago in reply to ?? ?

TI__Mastermind 48433 points

The engineer responsible is currently out of the office. Please expect a 1-2 day delay in response.

Thanks.

0 Shankari G over 1 year ago in reply to ?? ?

TI__Mastermind 25535 points

Dear Customer,

I have finished my experiments on SPI READ and SPI WRITE.

Calculated the number of CPU cycles consumed.

Please find the details below.

SPI READ:-

========

Experimented with 65536 bytes of data-read. It takes "0.066 seconds"

Started at cpu cycle = 599574093 cycles

Ended at cpu cycle = 666070777 cycles

Total_cpu_cycle of [SPI Read of 65536 bytes ] = 66496684 cycles

In my C6678 TI EVM - The DSP is running at a core frequency of 1000 MHz.

(DSP core frequency = 1000 MHz)

Time = 1 / Freq ( General formula )

= 1 / 1000 MHz ( DSP core frequency )

= 0.001 us ( Micro seconds)

1000000000 cycles = 1 sec

=>66496684 cycles = 0.066 sec

SPI WRITE:-

=========

Experimented with 65536 bytes of data-write. It takes "1.007 seconds"

Started at cpu cycle = 1421661660 cycles

Ended at cpu cycle = 2429013489 cycles

Total_cpu_cycle of [ SPI Write of 65536 bytes ] = 1007351829 cycles
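
The conversion used above can be written as a small helper, shown here only as a sketch. It assumes the cycle counts come from a counter that ticks at the stated core frequency (1000 MHz); the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a CPU cycle count to nanoseconds for a given core clock.
 * Integer arithmetic, truncated toward zero. The intermediate product
 * overflows for counts above roughly 1.8e10 cycles. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t core_hz)
{
    assert(core_hz > 0);
    return cycles * 1000000000ULL / core_hz;
}

/* Average time per transferred byte, in nanoseconds (truncated). */
static uint64_t ns_per_byte(uint64_t cycles, uint64_t core_hz, uint64_t nbytes)
{
    assert(nbytes > 0);
    return cycles_to_ns(cycles, core_hz) / nbytes;
}
```

For the figures above this works out to roughly 1014 ns per byte for the read and roughly 15370 ns per byte for the write.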

Regards

Shankari G

0 XING CUI over 1 year ago in reply to Shankari G

Prodigy 161 points

The test above read 65536 bytes in 0.066 s.
Is a byte 8 bits?
Does the above data test whether CS has been consistently maintained? Switching time without CS.

0 Shankari G over 1 year ago in reply to XING CUI

TI__Mastermind 25535 points

XING CUI said:
Is a byte 8 bits?

yes.

XING CUI said:
Does the above data test whether CS has been consistently maintained? Switching time witho

CS, you mean chip select? ---> yes. It is.

Interfaced with NOR memory------> PLATFORM_DEVID_NORN25Q128 0xBB18 /**< 16MB NOR Flash */

SPI Flash --- "NOR 128M-bit N25Q128A21BSF40F"

NOR Device:
p_device->device_id = 47896
p_device->manufacturer_id = 32
p_device->width = 8
p_device->block_count = 256
p_device->page_count = 256
p_device->page_size = 256
p_device->spare_size = 0
p_device->handle = 47896
p_device->flags = 0
p_device->bboffset = 0
platform_device_close(handle=0xbb18) called
platform_device_open(deviceid=0x50,flags=0x0) called

Regards

Shankari G

0 XING CUI over 1 year ago in reply to Shankari G

Prodigy 161 points

Hi,Shankari G

Thank you for your test results.
From your numbers, the estimated time to read one byte is 66000000 ns / 65536 = 1007 ns.
The NOR flash clock in the code is 25 MHz, i.e. a period of 40 ns. In theory, reading one byte (8 bits) takes (8+2)*40 = 400 ns, so why does the CPU take 1007 ns?
Where does the extra time go? Is it a delay between bytes? In our tests, we found not only a delay between bytes but also a CS switching delay (780 ns).

Test method: we emulated the NOR flash access by continuously writing frames with the command 0x9F and read the timing from the oscilloscope waveform.

0 Shankari G over 1 year ago in reply to XING CUI

TI__Mastermind 25535 points

XING,

I have not looked into the theoretical value.

Let me relook.

Regards

Shankari G

0 XING CUI over 1 year ago in reply to Shankari G

Prodigy 161 points

Shankari G,

Thank you very much.

Our product requires the SPI (30 MHz SPI CLK) to control the TPIC2060 (TI motor IC) five times within each interrupt period (3.3 us), and in testing we found that the interrupt routine overruns this period.
If this issue is not resolved, the subsequent functionality cannot be implemented and the project cannot proceed. Please help us analyze and solve it.
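
A rough budget check makes the constraint concrete. The sketch below only uses numbers quoted in this thread (3.3 us period, five transfers, 30 MHz SPI clock, 780 ns measured CS gap). The function names are hypothetical, and the frame length of each transfer is not stated here, so it is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Average time available per transfer when n transfers share one period. */
static uint32_t budget_per_transfer_ns(uint32_t period_ns, uint32_t n_transfers)
{
    assert(n_transfers > 0u);
    return period_ns / n_transfers;
}

/* Number of whole SPI clock periods that fit into a time budget. */
static uint32_t clocks_in_budget(uint32_t budget_ns, uint32_t spi_clk_hz)
{
    return (uint32_t)((uint64_t)budget_ns * spi_clk_hz / 1000000000ULL);
}

/* Nonzero if a fixed per-transfer gap alone already exceeds the budget. */
static int gap_exceeds_budget(uint32_t gap_ns, uint32_t budget_ns)
{
    return gap_ns > budget_ns;
}
```

With these inputs each transfer gets 660 ns on average, which is about 19 clock periods at 30 MHz, and a 780 ns gap per transfer would exceed that budget by itself.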

Regards

Xing

0 Shankari G over 1 year ago in reply to XING CUI

TI__Mastermind 25535 points

XING,

I am planning to do the following in the upcoming days. You can also try the same.

1. Raise the DSP frequency from 1000 MHz to 1200 MHz.

2. Increase the "SPI_MAX_FREQ 25000000" setting.

3. Alter the SPI parameters, such as the delay settings.

I will let you know, if I observe any improved data rate in SPI read and write.

Thanks for your patience.

Regards

Shankari G

0 XING CUI over 1 year ago in reply to Shankari G

Prodigy 161 points

Shankari G,

The holiday has ended and I will start working today.

Because this is my first time working with DSP ICs, there are many things I do not yet understand.

The following is only my own test method and results, and it may not be correct. Please bear with me!

In the C6678 data manual:

Eight TMS320C66x DSP Core Subsystems (C66x CorePacs), Each with – 1.0 GHz, 1.25 GHz, or 1.4 GHz C66x

May I ask why the CPU frequency is not set to 1250MHz?

In code testing:

Device Speed Register (DEVSPEED)

#define DEVSPEED (*(unsigned int *)0x026203F8u)

Device speed grade = 1250 MHz.

Software settings:

DSP frequency set to 1250 MHz

"SPI_MAX_FREQ 10000000"

scalar = ((spi_iclk / freq) - 1) & 0xFF;

freq = 10 MHz, spi_iclk = gDSP_Core_Speed_Hz / 6,

unsigned int gDSP_Core_Speed_Hz = 1250000000;

SPI PRESCALE = 19

spiRegs->SPIFMT[0] = (8 << CSL_SPI_SPIFMT_CHARLEN_SHIFT) |
(scalar << CSL_SPI_SPIFMT_PRESCALE_SHIFT)

On the oscilloscope, the SPI CLK measures 8.3333 MHz. It does not approach 10 MHz.

Regards

Xing

0 XING CUI over 1 year ago in reply to XING CUI

Prodigy 161 points

DSP frequency set to 1200 MHz

"SPI_MAX_FREQ 25000000"

SPI_init: SPI PRESCALE = 7

On the oscilloscope, the SPI CLK measures 20.833 MHz.
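
The PRESCALE values and clock readings reported here and in the previous post can be cross-checked with a short calculation. The sketch below assumes the SPI module clock is the core clock divided by 6 and that the SPI clock is the module clock divided by (PRESCALE + 1), as the scalar formula quoted earlier implies; the function names are illustrative. One possible reading, which nobody in this thread has confirmed, is that the core was still running at 1000 MHz during both measurements, because 166.67 MHz / 20 is about 8.33 MHz and 166.67 MHz / 8 is about 20.83 MHz.

```c
#include <assert.h>
#include <stdint.h>

/* PRESCALE field value for a requested SPI clock, following the
 * ((module_clk / freq) - 1) & 0xFF formula quoted in this thread. */
static uint32_t spi_prescale(uint32_t module_clk_hz, uint32_t target_hz)
{
    assert(target_hz > 0u && module_clk_hz >= target_hz);
    return ((module_clk_hz / target_hz) - 1u) & 0xFFu;
}

/* Resulting SPI clock, ASSUMING spi_clk = module_clk / (PRESCALE + 1). */
static uint32_t spi_clk_hz(uint32_t module_clk_hz, uint32_t prescale)
{
    return module_clk_hz / (prescale + 1u);
}
```

Under these assumptions, a 1250 MHz core with PRESCALE = 19 would give about 10.42 MHz and a 1200 MHz core with PRESCALE = 7 would give 25 MHz, whereas a 1000 MHz core reproduces the measured 8.333 MHz and 20.833 MHz.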

0 XING CUI over 1 year ago in reply to XING CUI

Prodigy 161 points

The delay between bytes (8 bits) is 690 ns, and the delay between data frames (the CS switching delay) is also about 690 ns.

0 Shankari G over 1 year ago in reply to XING CUI

TI__Mastermind 25535 points

XING,

As mentioned in my previous post, I am altering the SPI delay and the module clock and experimenting...

Will let you know the results soon.

Regards

Shankari G

0 Shankari G over 1 year ago in reply to Shankari G

TI__Mastermind 25535 points

Hey XING,

Of course, we can set the DSP core frequency to 1250 MHz.

By the way, good news for you.

I could get SPI READ down to about 490 ns.

I will share the details shortly, after running a few more tests to confirm.

Regards

Shankari G

0 Shankari G over 1 year ago in reply to Shankari G

TI__Mastermind 25535 points

Hey XING,

Theoretical calculation :-

SPI module clk in the code is 25M, i.e, period ----> 40ns

1 / 25000000 Hz = 40 Nano seconds for 1 bit of transmission.

Theoretically, for 65536 bytes ---> 65536 x 8 x40 ns = 20971520 ns

1250000000 cycles = 1 second ----> when the DSP core runs at 1250 MHz.

26214400 cycles = 20971520 ns

i.e., 26214400 CPU cycles for 65536 bytes of transmission - OR - 40 Nano seconds for 1 bit

Practical Data Rate:-

SPI READ TEST - CPU CLOCK CYCLE - Measurement
==============================================

Started at cpu cycle = 598005431 cycles
Ended at cpu cycle = 647161553 cycles
Total_cpu_cycle of [SPI Read of 65536 bytes ] = 49156122 cycles

1250000000 cycles = 1 second ----> when the DSP core runs at 1250 MHz.

49156122 cycles = 0.039 seconds for 65536 bytes of SPI read.

======> ( 1 / 1250000000 ) x 49156122 = 0.039 seconds. ---- > 65536 bytes

=====> 0.039 seconds for 524288 bits.

=============> 0.039 / 524288 = 74 ns for 1 bit --- > practically

Here, the 49156122 cycles = 26214400 cycles of pure bit time + the CPU cycles consumed executing the software code (the instructions that carry out the SPI transfer).

At best, I can achieve 0.039 seconds for 65536 bytes, in other words about 1.6 MB/sec or 1680 KB/sec, for SPI READ with a character length of 8 bits on the NOR device N25Q128A21BSF40F.
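
The figures above can be reproduced with integer arithmetic. This is a sketch that uses only the numbers quoted in this post (65536 bytes, 49156122 measured cycles, 26214400 theoretical cycles, 1250 MHz core); the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in bytes per second (truncated). */
static uint64_t throughput_bytes_per_s(uint64_t nbytes, uint64_t cycles,
                                       uint64_t core_hz)
{
    assert(cycles > 0);
    return nbytes * core_hz / cycles;
}

/* CPU cycles per byte spent beyond the pure bit time (truncated). */
static uint64_t overhead_cycles_per_byte(uint64_t measured,
                                         uint64_t theoretical,
                                         uint64_t nbytes)
{
    assert(measured >= theoretical && nbytes > 0);
    return (measured - theoretical) / nbytes;
}
```

This gives roughly 1.67 MB/s, in line with the figure quoted above within rounding, and about 350 cycles of overhead per byte, which is 280 ns at 1250 MHz.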

---

This is as far as I could improve it. That does not mean it is the ideal figure.

There may be room to optimize the driver code, which I leave to the customer.

---

List of changes I did....

#define SPI_MODULE_CLK 250000000
#define SPI_MAX_FREQ 65000000 /* SPI Max frequency in Hz */

1. Removed the spi_delay() function throughout the code.

2. Removed the register settings for Delay in spi_claim()

// SPI_SPIDELAY = (8 << CSL_SPI_SPIDELAY_C2TDELAY_SHIFT) |
// (8 << CSL_SPI_SPIDELAY_T2CDELAY_SHIFT);

3. Changed the following in spi_claim() (please note that I used chip select 0)

if ( cs == 0) {

// SPI_SPIFMT0 = (8 << CSL_SPI_SPIFMT_CHARLEN_SHIFT) |
// (scalar << CSL_SPI_SPIFMT_PRESCALE_SHIFT) |
// (CSL_SPI_SPIFMT_PHASE_DELAY << CSL_SPI_SPIFMT_PHASE_SHIFT) |
// (CSL_SPI_SPIFMT_POLARITY_LOW << CSL_SPI_SPIFMT_POLARITY_SHIFT) |
// (CSL_SPI_SPIFMT_SHIFTDIR_MSB << CSL_SPI_SPIFMT_SHIFTDIR_SHIFT);

SPI_SPIFMT0 = (8 << CSL_SPI_SPIFMT_CHARLEN_SHIFT) |
(scalar << CSL_SPI_SPIFMT_PRESCALE_SHIFT) |
(CSL_SPI_SPIFMT_PHASE_NO_DELAY << CSL_SPI_SPIFMT_PHASE_SHIFT) |
(CSL_SPI_SPIFMT_POLARITY_LOW << CSL_SPI_SPIFMT_POLARITY_SHIFT) |
(CSL_SPI_SPIFMT_SHIFTDIR_MSB << CSL_SPI_SPIFMT_SHIFTDIR_SHIFT);

----

-------

For SPI write, I could achieve at best 0.75 seconds for 65536 bytes, i.e., 87 KB/sec.

(PS: SPI write was not given as much attention as SPI read.)

SPI WRITE TEST - CPU CLOCK CYCLE - Measurement
==============================================
Started at cpu cycle = 1385879698 cycles
Ended at cpu cycle = 2331075093 cycles
Total_cpu_cycle of [ SPI Write of 65536 bytes ] = 945195395 cycles

1250000000 cycles = 1 second ----> when the DSP core runs at 1250 MHz.

945195395 cycles = seconds for 65536 bytes of SPI read.

======> ( 1 / 1250000000 ) x 945195395 = 0.75 seconds.

List of changes:

Removed the delay in nor_write() function.

// loopCount = 4000; //shankari
// loopCount = 400;
// while (loopCount--) {
// asm(" NOP");
// }

---

Regards

Shankari G

0 XING CUI over 1 year ago in reply to Shankari G

Prodigy 161 points

Hi Shankari G,

Thank you very much for testing the data.

Please check below whether I have understood the test results correctly.

Debugging conditions:

when the DSP core runs at 1250 MHz.

SPI module clk in the code is 25M, i.e, period ----> 40ns

Final optimization test results:

1. Total_cpu_cycle of [SPI Read of 65536 bytes ] = 49 156 122 cycles

49156122 cycles = 0.039 seconds for 65536 bytes of SPI read.

======> ( 1 / 1250000000 ) x 49156122 = 0.039 seconds. ---- > 65536 bytes

=====> 0.039 seconds for 524288 bits.

=============> 0.039 / 524288 = 74 ns for 1 bit --- > practically

2. Total_cpu_cycle of [ SPI Write of 65536 bytes ] = 945195395 cycles

945195395 cycles = seconds for 65536 bytes of SPI read.

(I feel like there was a typo in this part. Should it be like this:

945195395 cycles = 0.75 seconds for 65536 bytes of SPI write. )

======> ( 1 / 1250 000 000 ) x 945 195 395 = 0.039 seconds.

( ( 1 / 1250 000 000 ) x 945 195 395 = 0.075 seconds. )

=============> 0.075 / 524288 = 143 ns for 1 bit --- > practically(Should this be the case here?)

Regards

Xing

0 Shankari G over 1 year ago in reply to XING CUI

TI__Mastermind 25535 points

XING,

(1/1250000000 ) x 945195395 = 0.75 seconds. ==>

It cannot be 0.075 seconds.

:-)

Please re-do the calculation.

Regards

Shankari G

0 XING CUI over 1 year ago in reply to Shankari G

Prodigy 161 points

Hi Shankari G,

I'm really sorry, I miscalculated.

Is the test result like this?

1. Total_cpu_cycle of [SPI Read of 65536 bytes ] = 49 156 122 cycles

49156122 cycles = 0.039 seconds for 65536 bytes of SPI read.

======> ( 1 / 1250000000 ) x 49156122 = 0.039 seconds. ---- > 65536 bytes

=====> 0.039 seconds for 524288 bits.

====> 0.039 / 524288 = 74 ns for 1 bit --- > practically

2. Total_cpu_cycle of [SPI Write of 65536 bytes ] = 945 195 395 cycles

945195395 cycles = 0.75 seconds for 65536 bytes of SPI write.

======> ( 1 / 1250000000 ) x 945195395 = 0.75 seconds. ---- > 65536 bytes

=====> 0.75 seconds for 524288 bits.

====> 0.75 / 524288 = 1430 ns for 1 bit

Regards

Xing

0 Shankari G over 1 year ago in reply to XING CUI

TI__Mastermind 25535 points

YES, XING .

0 XING CUI over 1 year ago in reply to Shankari G

Prodigy 161 points

Hi Shankari G,

The tests on the C6678 platform show that SPI transfers work correctly, but the transfer speed falls short of the theoretical value. Our project needs this SPI transfer delay resolved, so I hope TI can either suggest a solution or continue analyzing the problem. Once the cause is pinpointed, we will find a way to work around it.

According to the above test results:

Read Data Bytes (READ) (SPI_NOR_CMD_READ) timing from the N25Q128A21BSF40F datasheet:

According to the figure above, CS does not toggle while the DSP sends data continuously. The measured 75 ns per bit is therefore probably caused by a delay after every 8 bits of data, which matches the SPI timing waveform in the picture earlier in this thread.

In theory, the delay for each byte is: 75 - 40 = 35 ns per bit, 35 * 8 = 280 ns

In spi_claim(), SPIFMT[0].WDELAY is not set and should hold the register default of 0, so there should be no delay between bytes.

The DSP runs at 1 GHz while the SPI peripheral runs from that clock divided by 6, so the two operate at different frequencies. To keep them synchronized, spi_xfer() waits on the SPI_SPIBUF.TXFULL and SPI_SPIBUF.RXEMPTY bits before each SPI write and read. Is that the mechanism?

0 Shankari G over 1 year ago in reply to XING CUI

TI__Mastermind 25535 points

XING,

I understand your point.

There is no internal SPI expert at TI now for this older device (C6678) who could work with you on further analysis.

Is the current SPI write rate of 1.44 microseconds not sufficient for you, given that your project's interrupt period requirement is 3.3 us (as per your original post above)?

Regards

Shankari G

0 XING CUI over 1 year ago in reply to Shankari G

Prodigy 161 points

Hi Shankari G,

Because the product has strict real-time requirements, a 3.3 us periodic interrupt is generated by the DSP internal timer, and within each period four TPIC2060 data writes and one AMC7812 read or write must be completed.

Our test results for this problem now agree with yours. The delay is essentially the time spent waiting for the SPI_SPIBUF.TXFULL and SPI_SPIBUF.RXEMPTY bits to change. However, because this wait is needed to keep the C6678 synchronized with the SPI module, the user has no way to remove it.

Is TI C6678 still being shipped? If it is still being shipped, could you please help to consult an expert in this field?
Are there products with the same architecture as C6678 Keystone 1? If yes, could you help me consult an expert for this problem?

Regards

Xing

0 Shankari G over 1 year ago in reply to XING CUI

TI__Mastermind 25535 points

XING,

XING CUI said:
Is TI C6678 still being shipped? If it is still being shipped, could you please help to consult an expert in this field?

Yes. It is still shipping to date. But the support is almost nil.

I am looping in another expert, Praveen Rao, who will help you understand the situation better.

XING CUI said:
Are there products with the same architecture as C6678 Keystone 1? If yes, could you help me consult an expert for this problem?

Yes, Keystone II devices like K2E and K2H use the same C66x architecture.

But Keystone III moved to the C7x architecture.

Praveen Rao, whom I have looped in, may point you to an internal expert, or he may help you look at other TI processors with high-speed SPI, QSPI, etc.

Regards

Shankari G

0 Praveen Rao over 1 year ago in reply to Shankari G

TI__Mastermind 48433 points

Hi Xing,

As Shankari explained, there is limited support for the C6678. Please note that we have no development support for this device, as all the experts have moved on to the newer J7 devices and/or are no longer with the company. I suggest contacting your local TI sales team so that they can suggest a new device family that you can evaluate for your product.

Thanks.

0 XING CUI over 1 year ago in reply to Praveen Rao

Prodigy 161 points

Hi Praveen Rao,

Weak support for an old product is understandable. However, the product is still shipping. Both your technical support team's tests and mine found a timeout phenomenon in the SPI module's transmission speed.
The SPI module has a LOOPBACK mode. Can you test whether the SPI transmission speed is normal in this mode? If it is not, the product is not suitable for us, and I will consider changing platforms.

Thanks.

Regards

Xing

0 Shankari G over 1 year ago in reply to XING CUI

TI__Mastermind 25535 points

XING,

We only measured the SPI data rate when interfaced with one particular NOR flash memory.

There is no timeout phenomenon.

The C6678 SPI transmission is successful with data verification and the SPI transmission is consistent and reliable.

:-)

Regards

Shankari G

0 XING CUI over 1 year ago in reply to Shankari G

Prodigy 161 points

Hi Shankari G,

Sorry, I may have expressed this incorrectly.
Perhaps it should be put this way: the underlying SPI code provided by TI can realize NOR Flash data reading and writing, but users feel that there are problems with the transmission speed or throughput rate, and request TI for technical support or performance verification.
The reasons are as follows:
In the SPI control code provided by TI, we disabled the C2TDELAY and T2CDELAY settings of the C6678, following TI's KeyStone Architecture Serial Peripheral Interface user guide. The default value of SPIFMT.WDELAY is 0. Yet the SPI module (SPI CLK = 25 MHz, 40 ns period) sends consecutive bytes at a throughput of only 13.5 Mbps (test result above: 74 ns for 1 bit in practice).

Regards

Xing

0 Shankari G over 1 year ago in reply to XING CUI

TI__Mastermind 25535 points

XING,

OK, Fine.

XING CUI said:
but users feel that there are problems with the transmission speed

I think there is no problem with the transmission speed/throughput rate. That is the maximum achievable with the current sample SPI driver and the NOR device selected.

---

Perhaps the SPI driver can be optimized and improved by the customer.

Usually the sample drivers (SPI) are provided just as a "proof of concept" to demonstrate the functionality of interfaces such as SPI.

---

In general, customers take TI's sample code and modify, optimize, and improve it to fit their project needs and requirements.

---

Please contact your local TI sales team so that they can suggest new device family that you can evaluate for your product.

Regards

Shankari G
