Hi,
I am planning to build and test Open MPI over SRIO on an m800 Moonshot server, which uses the keystone-hpc 3.0.1.11 package.
I am trying to set this up between two nodes on one cartridge, c21n1 and c21n2. I have configured both nodes with srio_topology.bin and srio_host, and then used the script suggested at processors.wiki.ti.com/.../MCSDK_HPC_3.x_MPI_over_SRIO.
I used the following command to run nbody with Open MPI:
/opt/ti-openmpi/bin/mpirun --mca btl self,srio --mca btl_base_verbose 100 -np 2 -host [hostname1],[hostname2] ./nbody 1000
The test runs, prints a lot of debugging information, and finally ends with:
[hptest6:17305] SrioLogCRITICAL:SRIO Driver has been initialized
[hptest6:17305] SrioLogCRITICAL:Allocating Queues..
[hptest6:17305] SrioLogCRITICAL:RM Socket: pkt len 264
[hptest5:01007] SrioLogCRITICAL:RM Socket: pkt len 264
[hptest5:01007] SrioLogCRITICAL:RM Socket: pkt len 264
[hptest6:17305] SrioLogCRITICAL:RM Socket: pkt len 264
[hptest5:01007] SrioLogCRITICAL:RM Socket: pkt len 264
[hptest6:17305] SrioLogCRITICAL:RM Socket: pkt len 264
[hptest5:01007] SrioLogCRITICAL:RM Socket: pkt len 264
Error: Tx Credit Queue failed to open
[hptest6:17305] BTL_SRIO: Type11_init failed
[hptest6:17305] select: init of component srio returned failure
[hptest6:17305] select: module srio unloaded
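To see at a glance which host reports the failure, the output can be grouped by its [host:pid] prefix. Below is a minimal Python sketch, assuming prefixed lines look like the ones above; the helper name failures_by_host and the keyword list are made up for illustration and are not part of Open MPI or the MCSDK.

```python
import re
from collections import defaultdict

# Assumed line shape, based on the output above: "[hptest6:17305] message".
PREFIX = re.compile(r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s*(?P<msg>.*)$")


def failures_by_host(lines, keywords=("fail", "error", "unloaded")):
    """Group lines mentioning a failure keyword by host (hypothetical helper)."""
    found = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        m = PREFIX.match(line)
        if m:
            host, msg = m.group("host"), m.group("msg")
        else:
            # Lines such as "Error: ..." carry no host prefix.
            host, msg = "(no prefix)", line
        if any(k in msg.lower() for k in keywords):
            found[host].append(msg)
    return dict(found)


# A few lines copied from the output above.
sample = """\
[hptest5:01007] SrioLogCRITICAL:RM Socket: pkt len 264
Error: Tx Credit Queue failed to open
[hptest6:17305] BTL_SRIO: Type11_init failed
[hptest6:17305] select: init of component srio returned failure
[hptest6:17305] select: module srio unloaded
""".splitlines()

for host, msgs in failures_by_host(sample).items():
    print(host, msgs)
```

On the lines above, only hptest6 and the unprefixed "Error:" line are reported; hptest5 shows no failure message.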
It seems that the second node hit a Tx Credit Queue problem and unloaded the srio module. Has anyone come across the same problem?
Thanks