
DM8148 peripheral access is slower than expected

Other Parts Discussed in Thread: SYSBIOS, SYSCONFIG

I have a DM8148 dev kit project with SYS/BIOS and am noticing more CPU cycles spent than expected to access some of the peripherals.  For example, accessing a GPIO set register:

*(volatile int *)(0x48032190)=0x00000008;

I'm finding that the fastest I can toggle this discrete is 5 MHz, or 200 ns per register access.  My L3 is running at 200 MHz and my ARM core is running at 720 MHz.  I'm new to this processor, so is there something that can be done to improve this kind of access time for GPIO?  Taking 144 clock cycles to access an on-die discrete seems excessively long; 30-50 would be more reasonable to me.
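The measurement is essentially this loop (a minimal sketch; on hardware the two pointers would be the GPIO_SETDATAOUT/GPIO_CLEARDATAOUT registers at offsets 0x194/0x190 from the module base, and the mask selects the pin under the scope probe):

```c
#include <stdint.h>

/* On hardware these would be the GPIO set/clear data registers,
 * e.g. (volatile uint32_t *)0x48032194 and 0x48032190. For a
 * host-side sanity check, any pair of 32-bit locations works. */
static void toggle_burst(volatile uint32_t *set_reg,
                         volatile uint32_t *clr_reg,
                         uint32_t pin_mask, int n)
{
    for (int i = 0; i < n; i++) {
        *set_reg = pin_mask;   /* pin high: one strongly ordered L4 write */
        *clr_reg = pin_mask;   /* pin low:  a second L4 write             */
    }
}
```

On the scope, each high or low phase is then one store's worth of latency, which is how the 200 ns figure is read off.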

Can anyone shed any light on this topic?

  • Justin,

    GPIO is connected to the L4 interconnect. Assuming you are running the SYS/BIOS code from the M3, which runs at a lower speed than 720 MHz, 200 ns seems reasonable.

  • I don't understand why I would assume my code is running from the M3.  My code is running from the A8, and then needs to go through the interconnect to get to the GPIO.  I understand the GPIO is on the L4, which is running at 100 MHz.  How many clock cycles does it take to simply write a GPIO value over the L4?  Like I said, right now that adds up to about 15-20 clock cycles on the L4 just to write a GPIO value.  This does not seem right, or at the very least it seems like I should be able to configure the processor to improve this time.  Is there any configuration of the L3 or L4 arbitration scheme that might help my access times from the ARM core?

  • You can check the registers for configuring the L3 and L4 in sections 1.12.2.5 and 1.12.3.5 of document SPRUGZ8B.

  • Hi Justin,

    With this piece of code, *(volatile int *)(0x48032190)=0x00000008;, you are writing to the GPIO_CLEARDATAOUT register (offset 0x190). How exactly do you measure the 200 ns write access time? Is it with some debug tool? Do you see the same 200 ns write access time with the GPIO_SETDATAOUT register (offset 0x194)?

    Please note that the DSP has a 128-bit read/write port, while the Cortex-A8 has a 64-bit read/write port, so DSP read/write accesses should be faster. Here is one thread explaining the DSP interconnect bandwidth:
    http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/717/t/177428.aspx

    If you increase the Cortex-A8 clock speed to 1GHz, do you have better (than 200ns) write time? This thread is explaining how to increase the Cortex-A8 clock speed:
    http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/204555.aspx

    I have checked the L3 interconnect and L4 interconnect descriptions and registers. The only section explaining the interconnect bandwidth is 1.12.2.3.3, Bandwidth Regulators.
    Bandwidth regulators are mainly used to give priority to the following masters: HDVICP, TPTC_RD2/4, TPTC_WR2/4, MMU, ISS and SGX. So we need to check whether these registers (L3_BW_R_BANDWIDTH and L3_BW_R_WATERMARK) are programmed to give some of these masters priority over the Cortex-A8.  Can you provide me the register values?

    Another thing we can try is to increase the L3/L4 slow interconnect speed and the GPIO peripheral speed.

    Best Regards,

    Pavel

  • Pavel,

    Yes, I have a string of code that alternates writes to the clear and set registers, and I am watching on an oscilloscope how fast the GPIO is toggling.  If I do the same thing with GPIO_0_SET, GPIO_1_SET, GPIO_0_CLR, GPIO_1_CLR, etc., a given discrete then only toggles every 400 ns.

    Having a wider bus on the DSP vs. the ARM shouldn't impact access time to a 32-bit resource.  I agree that the bandwidth could be different, but single-word access time shouldn't be affected by the width of the bus.

    Yes, if I increase the A8 core clock it gets marginally better.  If I increase the L3 clock speed it gets better as well, but I'm hoping there is a better solution than using a sledgehammer to solve a performance issue.  There are obviously other issues caused by increasing clock speeds beyond the specified frequency.

    We are not setting any bandwidth registers, so they are at reset/default values.  I was having a hard time understanding what the default state/behavior would be, and whether it would even affect single-access latencies or just high-bandwidth data movement.  Is there arbitration involved here that is configured by these bandwidth registers?  I should also mention that, at this time, I don't have anything else active on any other core.

    Is there a way to configure interconnect speeds beyond just the L3 PLL?

  • Justin,

    Given the details in the thread, I would like you to configure the “pressure input to interconnect” as follows:

    • Set bits 1:0 of INIT_PRIORITY_0 register in the control module to “11”

     This should help prioritize ARM traffic within the interconnect. Please let me know if this helps in reducing the response latency for your test case.
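    A sketch of that read-modify-write (the register pointer here is a placeholder; check the TRM for the actual control-module address of INIT_PRIORITY_0, and note that the bits 1:0 field position is taken from the suggestion above):

```c
#include <stdint.h>

/* Set a 2-bit priority field to 0b11 in a control-module register.
 * On hardware 'reg' would point at INIT_PRIORITY_0; the ARM
 * initiator's field is assumed to be bits 1:0 per the post above. */
static void set_init_priority(volatile uint32_t *reg)
{
    uint32_t v = *reg;
    v &= ~0x3u;      /* clear bits 1:0            */
    v |=  0x3u;      /* "11" = highest "pressure" */
    *reg = v;
}
```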

    Thanks and Regards,

    Rahul

  • Justin,

    I had missed reading in your response earlier that there is "no other" concurrent traffic in your system from any other core. The above configuration parameter would help only if there is competing traffic within the interconnect. 

    Thanks and Regards,

    Rahul

     

  • Rahul,

    Just to confirm, changing the priority did not change the time required to access a peripheral.

    This wouldn't be a huge concern if the core were capable of out-of-order execution, but my understanding of the A8 is that a strongly ordered access like this is going to stall the pipeline, sacrificing about 150 clock ticks to set a peripheral register. 

    Also, gpio is my simple scapegoat of an example.  My real problem is Ethernet register access times.  

  • Here is some timing data I have for a few different peripherals, with non-cached, non-buffered, non-shared properties in the MMU.

    [CortexA8] EMAC CPPI Write ( 1 0x4a102000): 174ticks 241ns Adj:230ns
    [CortexA8] EMAC CPPI Read ( 2 0x4a102000): 180ticks 251ns Adj:239ns
    [CortexA8] DDR3 uncached Write ( 3 0x82000000): 100ticks 140ns Adj:128ns
    [CortexA8] DDR3 uncached Read ( 4 0x82000000): 116ticks 161ns Adj:150ns
    [CortexA8] OCM RAM write ( 5 0x40300000): 119ticks 165ns Adj:154ns
    [CortexA8] OCM RAM read ( 6 0x40300000): 108ticks 150ns Adj:139ns
    [CortexA8] A8 SRAM write ( 7 0x402f1000): 44ticks 62ns Adj:50ns
    [CortexA8] A8 SRAM read ( 8 0x402f1000): 42ticks 59ns Adj:47ns
    [CortexA8] GPIO write ( 9 0x481ae13c): 159ticks 221ns Adj:209ns
    [CortexA8] GPIO read (10 0x481ae138): 152ticks 211ns Adj:199ns

  • The simple question is: how many A8 clock cycles does it take to get an access onto the L3 interconnect, then how many L3 clock cycles to issue a write request to the L4, and finally how many L4 clock cycles to complete the write to a register resource?

    A8 = 720

    L3 = 200

    L4 = 100 (implicitly L3/2 I think ?)

    I might guess 4 L4 cycles, maybe 5 L3, and maybe 10 on the A8, adding up to ~(30+16+10) = 56 A8 cycles.  That would be a number I could easily accept, but 150+ cycles seems like I might have something configured less than optimally.

    I have all caches enabled, and the cache, buffer, and share properties on the previous entries are disabled.  My understanding is that this makes these strongly ordered accesses.  And since this is an A8, all accesses to strongly ordered resources will stall the pipeline for those 150+ clock cycles.
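    Writing the conversion out explicitly (a quick host-side sanity check of the arithmetic above, using the clock values listed in this post):

```c
#include <stdint.h>

/* Convert a cycle count in a slower clock domain into A8 cycles.
 * Frequencies in MHz, per the numbers above: A8=720, L3=200, L4=100. */
static int to_a8_cycles(int cycles, int domain_mhz)
{
    return (cycles * 720) / domain_mhz;
}

static int estimated_write_latency(void)
{
    int l4 = to_a8_cycles(4, 100);  /* 4 L4 cycles -> 28 A8 cycles */
    int l3 = to_a8_cycles(5, 200);  /* 5 L3 cycles -> 18 A8 cycles */
    int a8 = 10;                    /* A8-side overhead guess      */
    return l4 + l3 + a8;            /* ~56 A8 cycles total         */
}
```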

  • Justin,

    I'm not 100% sure, but could it be because of idle mode? I'm not an expert in the micro-architecture, but could there be a delay in the transaction because of idle-mode support? Can you set GPIO_SYSCONFIG to no-idle and try?
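    A sketch of that change (assuming the OMAP-style GPIO_SYSCONFIG layout, with the register at offset 0x10 and the IDLEMODE field at bits 4:3; verify both against the DM8148 TRM before relying on this):

```c
#include <stdint.h>

#define GPIO_SYSCONFIG_WORD       (0x10u / 4u)  /* offset 0x10 as a word index */
#define SYSCONFIG_IDLEMODE_MASK   (0x3u << 3)   /* IDLEMODE field, bits 4:3    */
#define SYSCONFIG_IDLEMODE_NOIDLE (0x1u << 3)   /* 0x1 = no-idle               */

/* Force the GPIO module to no-idle. On hardware 'base' would be the
 * GPIO instance base address, e.g. (volatile uint32_t *)0x48032000. */
static void gpio_force_no_idle(volatile uint32_t *base)
{
    uint32_t v = base[GPIO_SYSCONFIG_WORD];
    v &= ~SYSCONFIG_IDLEMODE_MASK;
    v |= SYSCONFIG_IDLEMODE_NOIDLE;
    base[GPIO_SYSCONFIG_WORD] = v;
}
```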

  • Renjith,

    Thanks for the reply, but no, the interface is not in idle mode.  If I let it go to idle mode, the access time is closer to 500 ns.

  • Justin,

    This is interesting info. But if you look at the table you've posted for the different peripherals, GPIO is not that bad compared to un-cached DDR access :)

    Can you try another experiment? Run the A8 on the bypass clock (~20 MHz) and see whether the access time is still close to 200 ns or whether it goes up. If it doesn't go up much, then the real issue has to be within the GPIO controller itself; if it goes up, then we have to suspect the L3/L4.

  • I'll give it a try, but I've tried changing the frequency of the A8 with little change.  I've changed the frequency of the L3 with significant change.  Unfortunately, the datasheet says the L3 is limited to 200 MHz.  And like I said, GPIO was just my L3/L4 scapegoat to figure out whether I have the system configured optimally for accessing L3 and L4 resources.  They all seem slower than I expected, and I want a document or confirmation that it's the best it can be before just accepting the performance as-is.  The only exception: looking at the architecture, it doesn't look like the DDR3 goes through the L3.  The L3 is causing me performance problems with my Ethernet drivers accessing registers and CPPI RAM.

  • Ok, I placed it into bypass, and this is what I see for resource access times.  I'm guessing this means it is mostly the sheer number of ticks to get through the A8 subsystem and MMU.  Is 50-60 ticks expected?  Of my ~150 ticks to access EMAC or GPIO, likely 50 of them are due to the ARM core and 100 ticks due to the L3 (or 28 ticks in that clock domain).  Still seems higher than I would expect for both.  These are the types of numbers I thought TI would have published, but I can't find them anywhere.

    [CortexA8] Frequency 20000000Hz

    [CortexA8] EMAC CPPI Write ( 1 0x4a102000): 59ticks 2958ns Adj:2555ns
    [CortexA8] EMAC CPPI Read ( 2 0x4a102000): 53ticks 2674ns Adj:2271ns
    [CortexA8] DDR3 uncached Write ( 3 0x82000000): 54ticks 2740ns Adj:2336ns
    [CortexA8] DDR3 uncached Read ( 4 0x82000000): 55ticks 2798ns Adj:2395ns
    [CortexA8] OCM RAM write ( 5 0x40300000): 52ticks 2638ns Adj:2235ns
    [CortexA8] OCM RAM read ( 6 0x40300000): 48ticks 2438ns Adj:2035ns
    [CortexA8] A8 SRAM write ( 7 0x402f1000): 44ticks 2248ns Adj:1845ns
    [CortexA8] A8 SRAM read ( 8 0x402f1000): 50ticks 2500ns Adj:2097ns
    [CortexA8] GPIO write ( 9 0x481ae13c): 64ticks 3248ns Adj:2845ns
    [CortexA8] GPIO read (10 0x481ae138): 52ticks 2636ns Adj:2233ns

  • Justin,

    There are a couple of things here. Since your code, *(volatile int *)(0x48032190)=0x00000008;, uses volatile, at least three assembly instructions will be generated for this statement. The exact instructions depend on the compiler and optimization level. The GPIO toggle happens only after the load and move instructions. Can you profile the exact time taken from the store until the GPIO toggle occurs?

    a. LDR (load the address from instruction memory)

    b. MOV (move the value 0x8 to register)

    c. STR (store the value 8 to the address)

    Also, you can check the following.

    1. Instruction cache is enabled (assuming to be enabled)

    2. Data cache is enabled (assuming to be enabled)

    3. The optimization levels of the compiler

  • I'm not sure how you would profile the time from the start of STR to the toggle actually occurring.  I'm working on learning to use Trace to capture the time to execute the STR instruction.

    Instruction cache is enabled.

    Data cache is enabled.

    Optimization is -O3

  • The STR is when the actual write starts; all the other instructions run as a prelude to it. We need not take the other instructions into account, as they can vary with factors such as optimization level, use of the volatile type qualifier, etc. I feel that profiling only the STR instruction will give the exact latency.

  • Ah, you just want the number of clock cycles for the STR instruction.  It sounded like you wanted it correlated to the state change on the pin.

    But unfortunately, today is the first time I've used Trace, and I'm not seeing data that makes sense to me.  I have a block of code that my timestamp says takes 1700+ clock cycles, while the trace cycle-count column says ~300.  So the raw trace number is 19 cycles for the STR GPIO access, but I don't trust that the definition of a cycle is the same as my A8 core frequency.

  • One more experiment: in the auto-idle case, ~300 ns gets added to the STR instruction only. If you then look at the cycle count, it should become clear what one trace cycle translates to.

  • I looked into it; it looks like the extra time was due mostly to instruction caching.  I had the GEL file forcefully waking up the GPIO clock:

    WR_MEM_32(CM_ALWON_GPIO_0_CLKCTRL, 2)

  • Rahul and Pavel,

    I'm still waiting for some TI support here on this topic.  What is the expected clock cycles to get from the A8 to the L3, and how many L3 clock cycles can I expect for EMAC registers and CPPI RAM?

  • If you re-run the same instruction again, how much time does it take, given that it should now be cached?

  • Here is my simple test code for timing EMAC CPPI RAM.  The GPIO access time was just the ah-ha moment for why our Ethernet was not performing as we might expect.  The first call to profile_write_time returns 295 ticks (presumably uncached instructions), and 160 to 168 ticks for subsequent calls.

    /* Reads the Cortex-A8 cycle counter (PMCCNTR via CP15). This relies on
       the calling convention: MRC leaves the result in r0, which is also the
       return register, so no explicit return statement is needed. The PMU
       cycle counter must already be enabled. */
    unsigned int time32(void)
    {
        asm(" mrc p15, #0, r0, c9, c13, #0");
    }

    int profile_write_time(register int *address)
    {
        register int time1;
        register int time2;
        register int time3;
        volatile int retVal;
        time1 = time32();
        time2 = time32();          /* time2 - time1 = overhead of one time32() call */
        *address = (int)address;   /* the write under test */
        time3 = time32();
        /* measured interval minus the timer-read overhead */
        retVal = (time3 - time2) - (time2 - time1);
        return retVal;
    }

    printf("%d CPU ticks\n",profile_write_time((int *)0x4A102000u));

    And yet, this is the trace I am seeing for a ~160-tick iteration.  22 cycles is clearly not 160.  It's as if the cycle index is based on a 100 MHz clock, but I don't know where that would be coming from.

    Instruction Instr Addr Read Addr Write Addr Cycle Index Cycle delta
    MOV             R12, R0 0x80106B54 697 0
    BL              0x80106B48 0x80106B58 697 3
    MRC             P15, #0, R0, C9, C13, #0 0x80106B48 700 0
    BX              R14 0x80106B4C 700 5
    MOV             R2, R0 0x80106B5C 705 0
    BL              0x80106B48 0x80106B60 705 2
    MRC             P15, #0, R0, C9, C13, #0 0x80106B48 707 1
    BX              R14 0x80106B4C 708 4
    MOV             R1, R0 0x80106B64 712 22
    STR             R12, [R12] 0x80106B68 734 1
    0x4A102000 735
    735 8
    BL              0x80106B48 0x80106B6C 743 1
    MRC             P15, #0, R0, C9, C13, #0 0x80106B48 744 3
    SUB             R12, R0, R1, LSL #1 0x80106B70 747 1
    ADD             R12, R2, R12 0x80106B74 748 0
  • Justin,

    What is the real problem? Are you facing a throughput issue in Ethernet driver? If so, what is the current throughput that you are getting and what is the expected throughput?

  • I'm trying to optimize our Ethernet stack to be the best it can be, so yes, in a way it's a throughput issue because the driver consumes more CPU cycles than I would have expected.  I don't want to talk about things at a high level because that just muddies the waters for my use case.  I want to know whether it is expected and normal for an access to a specified resource to take 200 clock cycles, be it a GPIO register, CPPI RAM, EMAC registers, OCM RAM, etc.  That information has a huge bearing on how I write my application and drivers.  I'm concerned that I don't have something configured or set up right, or just not optimized for the ARM core to access those resources, and I'm trying to figure out whether the latency on access time is normal/expected/explainable.

    160 clock cycles to access an on-die RAM resource just doesn't seem right.  If that's the best I can do given the architecture of the processor, I'd like to know that, and that closes my issue.  If I can do better, I'd like to figure out what I have configured incorrectly so we don't abandon this processor over a potential configuration issue.

  • Justin,

    Can you just let me know the current throughput? I believe you might have explored the use of DMA, etc. Also, have you explored the use of burst transfers from the ARM itself?

  • This is why I started the thread with GPIO as the focus: I didn't want things to get muddied and sidetracked away from my core concern/question, so I'd rather not publish any high-level throughput number.  Ethernet performance is not the direct issue in question, because the same issue exists for all the L3 resources.  I don't want help solving a larger-scale Ethernet stack performance question; I'll start a different thread if it comes to that.  For this thread, I want help understanding whether the access time to L3 resources is what I should be seeing.

  • Justin,

    I'm sorry, I won't be able to help further. Somebody who is familiar with the micro-architecture of the 8148 might be able to help you. Hopefully a TI person responds.

  • In your experience, is the trace cycle count supposed to reflect the full-frequency CPU cycle count?

    So ignore what I said about Ethernet; let's go back to GPIO access time.  Do you have any further ideas on how I could have it configured wrong?

  • Justin,

    I haven't tried measuring trace cycle count.

    Have you tried a simple loop where you keep toggling a GPIO line back to back and looked at the waveform on an oscilloscope, without idle mode? I guess that will clearly show what exactly the delay is.

  • That is the exact test I did before I posted anything to the forum.  The fastest I can toggle a GPIO pin is 200 ns.  And with a 720 MHz CPU and a 200 MHz L3 interconnect, I expected much faster than that; somewhere around 50 ns would have been more reasonable and would not have grabbed my attention that I might have something configured wrong.  Not only that, but a co-worker tells me they can toggle the same GPIO from the DSP running at 500 MHz at a 70 ns rate.

  • Justin,

    Can you just write assembly code with multiple STR instructions to keep toggling the GPIO? The problem with C code is that you keep executing additional instructions as well. 
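    A sketch of what that looks like from C (a hedged alternative to hand-written assembly: with the mask and both addresses held in registers, an optimizing compiler reduces each statement to a single STR; on hardware the pointers would again be GPIO_SETDATAOUT/GPIO_CLEARDATAOUT):

```c
#include <stdint.h>

/* Eight back-to-back stores with no loop overhead: each statement
 * compiles to one STR once the mask and addresses are
 * register-resident, so only store latency is measured. */
static void toggle_unrolled(volatile uint32_t *set_reg,
                            volatile uint32_t *clr_reg,
                            uint32_t mask)
{
    *set_reg = mask; *clr_reg = mask;
    *set_reg = mask; *clr_reg = mask;
    *set_reg = mask; *clr_reg = mask;
    *set_reg = mask; *clr_reg = mask;
}
```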

  • Justin,

    Can you try the same experiment on the M3 core and the DSP core as well? That will give an idea of whether the L3/L4 bus is causing the trouble or the ARM-L3 interface is the bottleneck here. In the DM8148, people are doing high-bandwidth video processing using DMA; if such an issue existed, it should have shown up there at least.

  • The writes to GPIO from the DSP are much faster: 40 ns per access (until the pipeline fills up).  It appears as though the DSP on this chip allows a certain amount of out-of-order execution.  I had all operations on the same D1 instruction bank and see the pin toggle at a 40 ns rate for a burst of 8 to 10 toggles.  That is a very different number than the ARM's roughly 200 ns per write, but I know the A8 doesn't allow out-of-order execution, and not only that, this access would be categorized as strongly ordered, and I don't know what that is going to do to the ARM pipeline.  I don't know whether it flushes it or just stalls it; either way, it's slow.

    However, this does imply that the DSP is arbitrating for the L3 interconnect, and if it can keep ownership of the interconnect, it takes about 6 or 7 L3 clock cycles per access.  The ARM, on the other hand, may not be able to issue requests fast enough, such that it somehow goes idle or something, in which case each request takes 40 clock cycles on the L3.  So it certainly seems the L3 can perform better than what I'm seeing for single ARM accesses, but I can't figure out why, or what to do about it.

    As for the M3s on chip, I haven't figured out whether I can use them for general programming.  They don't show up on the JTAG ICEPick.

  • Justin,

    That is very good info!  If you select TI814x in the CCS target configuration, it will show all the cores properly.

  • Right, and the only ones it shows are the A8 and the DSP; the other connection points are things like the STM and ETB.  There is no connection point for the M3.

  • Can you share a snapshot of your CCS configuration? I'm able to see all the cores here if I select "TI814x" instead of DM814x, EVM814x, etc. 

  • That is not a valid configuration in my install.  Just for clarity, which configuration are you talking about?  

  • Justin,

    I'm not able to upload the snapshot here as my Silverlight plugin is failing. If you can send me an email, I can mail it to you.

    I'm talking about this: when you open the .ccxml file of your project, you'll see three tabs, "Basic", "Advanced" and "Source". You can select a target in the "Basic" window, where you'll see a list of target platforms. If you type "TI814x" in the filter, you'll see a list item. If you select it and go to the "Advanced" tab, you'll see all the cores. I see 3 ARM9s, 1 DSP, 1 Cortex-A8, and 3 Cortex-M3s (8 cores total) in my list.

  • Thanks,

    No, they weren't an option, but dm8148.xml had them there, just commented out, so now I'm seeing them.  I'm wondering which ones are available for custom programming...

  • Try M3-ISS or M3-Video. You might have to run a GEL file to connect to it. 

  • The GEL doesn't appear to be set up to take those cores out of reset; I don't think it's worth continuing down the M3 route at the moment.  

    Why would I be able to write at 40 ns from the DSP when it takes 200 ns to execute a similar instruction on the ARM?

    The reads on the two cores have very similar access times, both in the neighborhood of 200 ns.  It's almost like the ARM waits for some sort of ack, while the DSP doesn't and just moves on to the next instruction once it has issued the interconnect request.

  • It makes sense to try the M3 because both the A8 and the M3 have a 64-bit-wide bus interface, whereas the DSP has a 128-bit-wide interface.

  • Jansen,

    Is there any update about the issue?

  • No luck actually getting any custom code to run on the M3.  I've seen other posts on the forum from TI saying this should not be done anyway; it is intended to be a black-box part of the system.

  • I really don't think you should consider the M3 a black box. If you have the EZSDK or AVBIOS package, there are lots of example applications available. You can just compile one of them and try to execute the app. From there you can try your code easily.

  • Justin,

    Did you ever solve this? I'm seeing +200ns for the DSP to read a single value from the CPPI.

    - Andrew

  • The simple answer is: that's just how long it takes... live with it. ;)

    I actually found that for Ethernet CPPI it was faster to use OCM for the descriptor space.  But in general, 150-200 ns is a reasonable number for a single read access across the L3 interconnect on this processor.  You can speed the L3 up to 220 MHz, but beyond that there's not much you can do for reads.  So, read from it ONLY when you HAVE to.  
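    A rough sketch of that workaround (the four-word descriptor layout follows the usual CPPI buffer-descriptor shape, and the addresses in the comment come from the timing tables earlier in the thread; field names here are illustrative, not the driver's actual definitions):

```c
#include <stdint.h>
#include <stddef.h>

/* Four-word CPPI-style buffer descriptor (illustrative layout). */
typedef struct cppi_desc {
    uint32_t next;       /* pointer to next descriptor   */
    uint32_t buffer;     /* data buffer pointer          */
    uint32_t off_len;    /* buffer offset / length       */
    uint32_t flags_len;  /* SOP/EOP flags, packet length */
} cppi_desc_t;

/* Link a pool of descriptors into a chain. On hardware the pool would
 * be placed in OCM RAM (0x40300000 in the tables above) rather than
 * the EMAC's CPPI RAM (0x4a102000), trading the ~200 ns CPPI reads
 * for the somewhat faster OCM accesses measured earlier. */
static void init_desc_chain(cppi_desc_t *pool, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        pool[i].next = (uint32_t)(uintptr_t)&pool[i + 1];
    pool[n - 1].next = 0;   /* terminate the chain */
}
```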

  • Thanks for the response, Justin. Like you, perhaps, I'm doing custom stuff with Ethernet descriptors. I worked out that my read was taking 156 cycles. Well, if my numbers are "expected", I guess I'll take another approach and rewrite a bunch of code :-).