DM6437 EMAC - data order problem (maybe related to endianness)

Hi,

I am working on the EMAC on our own board. We have EMAC driver code originally written for the C6455, which is big-endian. I modified the code for the things that are specific to the DM6437, such as register addresses.

We can currently send a ping packet, but the problem is that the content of the packet comes out reversed.

When I check in Ethereal, the packet content is:

    FF FF FF FF FF FF 00 1E 0B 31 B9 3B 08 06 ...

But when I check my buffer in memory, it has:

    FF FF FF FF FF FF 31 0B 1E 00 06 08 3B B9 ...

I don't know whether this is an endianness problem, because the code doesn't do anything endian-specific. It looks as though the DMA writes the packet to memory with the byte order reversed within each 32-bit word. Is there any register configuration related to this? Do you have any suggestions?
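
For what it's worth, the two dumps above differ by a byte reversal within 4-byte groups. If the swap really is per 32-bit word, a hypothetical fix-up (not from the actual driver) to recover the wire order would look like this:

    /* Hypothetical workaround sketch: undo a per-32-bit-word byte swap.
     * Only meaningful if the reversal is word-wise, as the dumps suggest. */
    #include <stdint.h>

    static uint32_t swap32(uint32_t w)
    {
        return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
               ((w << 8) & 0x00FF0000u) | (w << 24);
    }

    static void fix_packet(uint32_t *buf, unsigned nwords)
    {
        unsigned i;
        for (i = 0; i < nwords; i++)
            buf[i] = swap32(buf[i]);
    }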

Thanks in advance,

Erman

  • Everything in the DM6437 should already be hard-wired for little-endian operation, so I am reluctant to think this is an issue with the DM6437 EMAC itself (there is no endian control in the EMAC anyway). My first suspect would be the PHY, since that should be the main difference between your Ethernet setup and the one used on the EVM; perhaps your PHY has an endian setting that is reversed?

  • Hi,

    I'm currently working on the EVM-DM6437 board, and I'd like to try the EMAC on my side as well.

    Do you know whether there are any official drivers for the DM6437, and if not, could you share your modified C6455 drivers with me?

    Does anyone else have other drivers for the DM6437?

    Best,

    -Dejan

  • Hello,

    You should find an evaluation version of the NDK inside the DVSDK download for your EVM. The eval NDK software includes the full core NDK stack as well as a HAL (Hardware Abstraction Layer) library specifically for the DM6437.

    Note that you may have to register your board, if you have not already done so, to download the latest DVSDK software.

  • Hi,

    I found it and have started with the examples. (There is an evaluation period of 5 minutes; what does that mean? Do I need to purchase it?)

    I have one more question. The reason I'm trying to set up an Ethernet connection is to use the EMAC to transfer video data from a file to the board and back. Do you think using the EMAC is the best solution for this, or is there a better way to do it on the EVM-DM6437?

    Best,

    -Dejan

  • dejan sajic said:
    (There is an evaluation period of 5 minutes; what does that mean? Do I need to purchase it?)

    The NDK version included with the DVSDK is the demo/evaluation version, so it does time out (though I was under the impression the timeout was 24 hours, at least it used to be). For production you do need to 'purchase' it; however, it can be obtained for free, as it is included in the DM643x Codec Production bundle that can be requested through this site.

    dejan sajic said:
    I have one more question. The reason I'm trying to set up an Ethernet connection is to use the EMAC to transfer video data from a file to the board and back. Do you think using the EMAC is the best solution for this, or is there a better way to do it on the EVM-DM6437?

    In general, the Ethernet connection is likely the best way to go, as it is fairly flexible and fast; for example, the Ethernet interface is what the DM6437 demo uses for streaming video files to and from a PC. The alternative would be the PCI interface; however, the example software we have for it requires a Linux host PC, so it may or may not suit your application.

  • Hi Bernie,

    The timeout is 24 hours; I made a mistake. :-)

    If I understand you correctly, I should apply for the DM643x Codec Bundle BASIC – PRODUCTION bundle, and the NDK is part of it?

    I'd like to double-check that it is free, because I don't understand what the "One-Time License Fee (normally $20,000 USD)" is; and if it's free, why is the version included with the DVSDK the demo/evaluation version?

    I'd appreciate it if you could explain in a little more detail how this works!

    Best,

    -Dejan

  • dejan sajic said:
    If I understand you correctly, I should apply for the DM643x Codec Bundle BASIC – PRODUCTION bundle, and the NDK is part of it?

    That is how it is these days; the NDK used to be a separate item at a flat $5k USD, but since it is now included in that production bundle, I would suggest going for the bundle.

    dejan sajic said:
    I'd like to double-check that it is free, because I don't understand what the "One-Time License Fee (normally $20,000 USD)" is; and if it's free, why is the version included with the DVSDK the demo/evaluation version?

    This is a great question. I believe it is because, when we originally released the DVSDK for the DM643x, these codecs and the NDK were not free; since then, the marketing of the application software has changed. Perhaps some day the software will be included with the DVSDK itself, though I am not sure whether that would happen given the licensing. There is likely more to it from a legal standpoint (which I am less familiar with, as I am not a lawyer), given that you end up ordering it for $20k with a $20k coupon code. If you have concerns about this, I would suggest running the license agreement through your legal department before accepting the software.

  • Thanks a lot Bernie,

    I'll take your suggestion and ask the folks in our legal department first.

    If I get a positive answer from them, what do I need to implement then? Some kind of client/server application, where the client runs on the PC side and the server is implemented as a separate task in the board application.

    The client should send video data over the network, and the server should receive it and store it in a working buffer.

    Can you give me a bit more information about this (I don't have much experience working with networks)?

    Does it need to be a separate task, and what about priorities?

    I tried putting the "Hello world" example into my application as a separate task, and now I'm able to send a message from the PC over the network to the DSP and get it back, but only intermittently; mostly I get the message "Receive timeout".

    Since UDP can lose packets without any notice, should I use TCP for my application? I'd appreciate any suggestions, because this is the first time I'm trying to do something like this.

    Is there a useful example I can use as a starting point (with an adequate .tcf configuration, ...)?

    Best,

    -Dejan

  • Hi,

    I downloaded and installed "bundle-dm643x-BASIC-v1.3" with the NDK inside. Then I created a new task that sets up a TCP server on the board and waits for packets from the client side (PC). I'm able to send packets from the PC and receive them on the server side; I'm putting them in a separate buffer that represents the video input buffer for the application running on the board. The problem is how to synchronize sending/receiving the packets with processing them in a separate task.
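
    A minimal sketch of such a receive task is shown below (illustrative only; the NDK's socket layer is BSD-like, but details such as fdOpenSession, MSG_WAITALL, and the port number are assumptions, not the actual project code):

        /* Minimal NDK-style TCP receive task (illustrative sketch) */
        void tcpRecvTask(void)
        {
            SOCKET s, conn;
            struct sockaddr_in addr;
            char line[752];              /* one picture line per packet */
            int n;

            fdOpenSession(TaskSelf());   /* required before NDK socket calls */

            s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
            memset(&addr, 0, sizeof(addr));
            addr.sin_family      = AF_INET;
            addr.sin_addr.s_addr = INADDR_ANY;
            addr.sin_port        = htons(5000);    /* example port */
            bind(s, (struct sockaddr *)&addr, sizeof(addr));
            listen(s, 1);
            conn = accept(s, 0, 0);

            /* read one full 752-byte line at a time */
            while ((n = recv(conn, line, sizeof(line), MSG_WAITALL)) > 0) {
                /* copy 'line' into the video input buffer here */
            }

            fdClose(conn);
            fdCloseSession(TaskSelf());
        }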

    In other words, I'm able to receive video input, but too fast (it depends on network speed), and it's not synchronized with the processing task. I'd like to send it from the PC, receive it on the board side, and do some processing in the application task (for example, convert it to grayscale).

    Am I the first one to face this kind of problem? I believe everyone who has dealt with it has a simple template that can receive and process video using the NDK. If that's true, please share it with me, or suggest how I should implement it.

    Best,

    -Dejan

  • Hi,

    One more thing: I just measured the time between two pictures received from the PC side, and it is about 2621.893555 ms.

    So I'm packing 752 bytes (one picture line) into each packet and sending 480 packets per picture. The measured time to do that is about 2621.893555 ms per picture, and I need to simulate video streaming at 30 ms per picture.

    It's too slow, and I'm not sure how to speed things up. Any ideas?

    Best,

    -Dejan

  • dejan sajic said:
    In other words, I'm able to receive video input, but too fast (it depends on network speed), and it's not synchronized with the processing task.

    You would need to implement some sort of synchronization, such that the server sends out the data at the expected rate and the data carries embedded timestamps so the receiver can synchronize properly. You could also implement handshaking, where the board and the server acknowledge each other periodically to remain in sync. The closest example I know of is the DVSDK demo (C:\dvsdk_1_01_00_15\dm6437_demo_1_30_00), which actually implements file I/O functions over the network rather than true streaming, yet it is able to play compressed video from over the network properly.
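
    For instance, a small per-frame header along these lines (purely illustrative, not from the demo) could carry the timing information with each frame:

        /* Hypothetical per-frame header for timestamp-based synchronization */
        typedef struct FrameHdr {
            unsigned int frameNum;     /* running frame counter           */
            unsigned int timestampMs;  /* capture time in milliseconds    */
            unsigned int payloadLen;   /* bytes of pixel data that follow */
        } FrameHdr;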

    For communication/synchronization between your BIOS tasks, you can use the various APIs included in DSP/BIOS, such as semaphores (SEM_pend, SEM_post) or the mailbox manager if you want to pass data (MBX_pend, MBX_post); these APIs and more are discussed in SPRU403.
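
    As a rough illustration (the names, such as semFrameReady, are made up here, and the semaphore would typically be created statically in your .tcf configuration), the semaphore handshake between the two tasks could look like this:

        #include <std.h>
        #include <sem.h>
        #include <sys.h>

        extern SEM_Handle semFrameReady;   /* created in the .tcf configuration */

        /* NDK receive task: signal when a complete frame has arrived */
        Void rxTask(Void)
        {
            for (;;) {
                /* ... recv() one full frame into the input buffer ... */
                SEM_post(semFrameReady);
            }
        }

        /* processing task: block until the receiver hands over a frame */
        Void procTask(Void)
        {
            for (;;) {
                SEM_pend(semFrameReady, SYS_FOREVER);
                /* ... process/display the frame ... */
            }
        }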

    dejan sajic said:
    It's too slow, and I'm not sure how to speed things up. Any ideas?

    The NDK is dependent on system load and memory access times, so the best way to speed it up is typically to adjust your memory map; there is a FAQ on the Wiki that covers this.

  • Hi,

    I finally implemented it, and now I'm able to stream video data from the PC to the board, process the received data, and present it on the display. :-)

    The problem I'm not able to solve is the speed. The Ethernet connection is fast enough, but for some reason it works very slowly (I receive one 1504-byte packet about every 300 ms). If anyone has any idea how to speed it up, please mail me. I'm stuck on this issue and have no idea what else to try. :-(

    Note: I created the following sections:

    SECTIONS {
        .L1Buffer       > L1DSRAM
        .L2Buffer       > IRAM
        .ExtBuffer      > DDR2
        .NDK_PACKETMEM  > L1DSRAM
        .NDK_MMBUFFER   > L1DSRAM
        .NDK_OBJMEM     > L1DSRAM
    }

    Best,

    -Dejan 

  • Hi,

    I'm still trying to get it working, but without any results. Did anybody succeed in getting this working in real time, or am I just wasting my time trying to figure out why it isn't working on my side?

    Once again: I'm trying to stream video data over an Ethernet connection using the NDK. The video data format is 752 columns by 480 rows, 1 byte per pixel. I got it working, but very slowly, about 250 ms per frame.

    I know it works in the demo application, but there the data is compressed and the bandwidth is not as high. In my case it's raw data, and the bandwidth should be 752 * 480 * 1 byte = 360,960 bytes per frame. I would like to simulate a 30 ms per frame play rate. I don't run anything else in parallel, so I can even use L1 or L2 memory for the implementation. Any ideas?

    Best,

    -Dejan

  • Hi all,

    Guys, I really need your help. Is it possible that no one else has faced this problem before?

    Any idea? Bernie :-) ? 

  • Without more information about your application, I have to assume that the Ethernet transfer is your bottleneck, although it could easily be something else. To get 30 frames a second, you are only moving about 10 MBytes/second, which means you are trying to move ~80 Mbits/second across a 100 Mbps link. This is very ambitious, especially considering how small you are splitting your packets. The easiest way to increase your speed is to cram more data into each packet, since you pay an overhead penalty on every packet you send.
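
    To make the arithmetic explicit (a quick self-contained check of the numbers above; nothing here is from the thread itself):

        /* Rough throughput check for 752x480 raw 8-bit frames at 30 fps */
        #include <stdio.h>

        int main(void)
        {
            const double bytes_per_frame = 752.0 * 480.0;   /* 360,960 bytes */
            const double frames_per_sec  = 30.0;
            const double mbits_per_sec   =
                bytes_per_frame * frames_per_sec * 8.0 / 1.0e6;

            printf("%.1f Mbit/s of raw payload\n", mbits_per_sec);  /* ~86.6 */
            return 0;
        }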

    There are always other bottleneck possibilities...

    Are you sure that you are running at 100 Mbps and not 10 Mbps? (Especially if this is a custom board.)

    If you are streaming data from a PC's hard drive, it might slow things down periodically.

    If you are doing anything with the data on the CPU, cache misses might be slowing things down.

    I confess I don't know how the NDK handles data as it arrives; there may be something in that memory movement that could be optimized. In my experience, any time your data does not fit nicely into internal memory, moving it into and out of the DSP core is a bottleneck.

    I noticed that you used the word "simulate": are you running this test on actual hardware or in a simulator?

  • OK, I made a new project that is able to receive packets from the PC, put them in the output buffer, and present the received picture on the display. I'd like to send it to you, but I'm not sure how. Can you provide your email address, or suggest another way to get it to you?

    Yes, I'm working with the actual board (EVM-DM6437), and I'd like to "imitate" the VPFE video input behavior (8-bit raw video data). So I'm sending raw data from the PC (752 bytes per packet), receiving it on the board side, and filling the input buffer. Another task grabs this data on every frame and presents it on the display. These two tasks (the NDK-related task and the preview-related task) are synchronized using a semaphore.

  • I increased the transfer packet size to 45120 bytes (8 transfers per frame), but it didn't speed things up. :-(

  • I can't make any promises, but if you want to upload your source files to your profile page, I might find some time to look at them this weekend. Just the basic .c files; leave all the NDK stuff out.

  • Hi,

    Sorry for the delayed response, but I was out of the office for the past few days. Today I uploaded the NDKTest.zip file (password: 123456) with all the project files included, except the NDK .lib files. Please take a look at it if you can find the time. It's very important for me to get this working at 30 ms per frame.

    Best,

    - Dejan 

  • Any idea???

    :-(

  • Sorry, I haven't had time to take a look at it yet. Answering a few quick questions on a forum is a different animal than looking at code, and some of us aren't TI employees.

  • After a brief look at your code, I am a little confused. I don't have any way to run this to confirm, but it looks like you are acquiring an image from the video port front end, then waiting until you receive another image over TCP/IP, then performing some processing on the TCP/IP image (process_image_my ignores the frameBufferPtr argument), but then displaying the image acquired from the video port (and never actually displaying the TCP/IP image). If you are just using this to get an idea of TCP/IP speed, there are still a number of issues. The VPFE call might be competing for DDR2 bandwidth with your TCP/IP transfer. The process_image_my function, which is basically just an example, does a big image read from DDR2 through the cache without any optimization, and then writes it right back out. I would strip out these calls and work on displaying the actual data sent over TCP/IP to get a better starting point. I pasted the main loop from your code below. For anyone else reading, the SEM_pend() call waits on a separate task that accepts the TCP/IP transfer.

    while (status == 0) {
        /* grab a fresh video input frame */
        FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);

        /* wait until a full frame has arrived over TCP/IP */
        SEM_pend(semInputStream, SYS_FOREVER);
        process_image_my((void *)(frameBuffPtr->frameBufferPtr),
                         NUM_OF_ROWS, NUM_OF_COLS, frameCnt);

        /* display the video frame */
        FVID_exchange(hGioVpbeVid0, &frameBuffPtr);

        /* advance through the six-slot circular input buffer */
        if (frameCnt < 5) {
            nxtInpBufPtr = nxtInpBufPtr + PIC_SIZE;
            frameCnt++;
        }
        else {
            nxtInpBufPtr = inpBuf;
            frameCnt = 0;
        }

        /* NOTE: this could be done more cleanly with the DSP/BIOS BCACHE API,
           writing back only the frame data */
        *L2WBINVregPtr = 0x1;   /* writeback/invalidate the entire L2 cache */
        while (*L2WBINVregPtr != 0)
            ;
    }

  • Hi,

    I'm really glad you found the time to take a look at it. My idea was to use a simple preview application that grabs input from the VPFE, converts it to grayscale, and displays it on the VPBE. Then I added an NDK task that imitates the VPFE and grabs input over TCP/IP. So inside the main application I grab input from the VPFE, then wait for valid input from TCP/IP, then inside the process_image_my function I overwrite the grabbed VPFE buffer with the data received over TCP/IP, and finally pass it to the VPBE. I agree that I should remove the wait for valid data from the VPFE, but my intention was to have both cases (VPFE and TCP/IP) in one place.

    Anyway, I tried disabling the VPFE entirely, but it doesn't speed things up; obviously the VPFE isn't the bottleneck in my application. I uploaded a new NDKTest_Main.c file to my profile. To be really sure the main task doesn't slow down the transfer, I removed the task call from the configuration file and ran just the NDK task (I also removed the semaphore wait inside the NDK task), and I got the same transfer speed. You can try it on your side as well; you will see that it takes about 120-160 ms per frame.

  • Can you take a closer look at your Ethernet frame data size? Larger frame data sizes should give you better bandwidth utilization.

    The maximum data size per Ethernet frame is 1500 bytes. I'm not sure that setting the packet size to 45120 bytes guarantees your frame data size will be 1500 bytes; you could very well be splitting the 45120 bytes into 452 frames of 100 data bytes or less.
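
    A minimal sketch of what explicit chunking on the sending (PC) side could look like (the 1460-byte figure, a 1500-byte MTU minus 20 bytes each of IP and TCP headers, and all names here are assumptions, not code from this thread; with TCP the stack normally segments large sends itself, this just makes the chunk size explicit):

        #include <sys/socket.h>   /* POSIX sockets on the PC side */

        #define FRAME_BYTES (752 * 480)
        #define CHUNK_BYTES 1460  /* 1500-byte MTU minus IP+TCP headers */

        /* send one raw frame in MTU-friendly chunks so each Ethernet
           frame carries a full data payload */
        int send_frame(int sock, const unsigned char *frame)
        {
            int off = 0;
            while (off < FRAME_BYTES) {
                int n = FRAME_BYTES - off;
                if (n > CHUNK_BYTES)
                    n = CHUNK_BYTES;
                n = (int)send(sock, frame + off, (size_t)n, 0);
                if (n <= 0)
                    return -1;    /* error or connection closed */
                off += n;
            }
            return 0;
        }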