Achieving optimal packed data access with the C64x+ compiler

As mentioned in previous posts, I have been experimenting with the tuning tools to improve the performance of some code on the DM6435 target.  My specific question is how to get the compiler to use its ability to issue a double-word store per cycle.  I tried two different ways of initializing an array of shorts to default values.  The first init function takes 513 clocks; the second takes 257.  The "Compiler Consultant" tool offered no advice suggesting that the first loop could be improved.  Is there a way to get the compiler to perform this data-packing/loop-unrolling optimization without my coding it manually?  Or, at the very least, to notify me that coding it manually would improve execution time?  Optimization is set to its highest level.  Here is my code:

#define MAX_LIST 512
#pragma DATA_ALIGN(coord_list, 8)
#pragma DATA_SECTION(coord_list, ".fast")
unsigned short coord_list[MAX_LIST * 2];


#define LARGEST_USHORT 0xffff
#define SMALLEST_USHORT 0x0000
#define INT_MAXMIN 0xffff0000

static void initCoordList( unsigned short *restrict list );
static void initCoordList2( unsigned int *restrict list );

void compilerTest( )
{
 initCoordList( coord_list );
 initCoordList2 ( (unsigned int *)coord_list);
}

// Initialize elements of list to default values
// List is organized xmin, xmax
static void initCoordList( unsigned short *restrict list )
{
 unsigned int i;

 _nassert( (int)list % 8 == 0 );

 for ( i = 0; i < MAX_LIST; i++)
 {
  *list++ = LARGEST_USHORT;
  *list++ = SMALLEST_USHORT;
 }
}

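// Initialize elements of a list to default values
// List is organized as ushort min, ushort max but addressed as combined int
// Note: which ushort receives 0xffff depends on endianness; on a little-endian
// target 0xffff0000 stores 0x0000 first, the reverse of initCoordList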
static void initCoordList2( unsigned int *restrict list )
{
 unsigned int i;

 for ( i = 0; i < MAX_LIST/2; i++ )
 {
  *list++ = INT_MAXMIN;
  *list++ = INT_MAXMIN;
 }
}
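
For comparison, a hand-coded double-word version might look like the sketch below. It is only a sketch under stated assumptions: the names initCoordListDW and STORE_DWORD and the host fallback branch are made up for illustration and are not part of the code above. On the TI compiler it relies on the _amem8 intrinsic for an aligned 64-bit store; elsewhere it falls back to memcpy so the logic can be checked on a PC. The 64-bit pattern is built from an array of shorts, so the halfword order in memory does not depend on endianness.

```c
#include <string.h>

#define MAX_LIST 512
#define LARGEST_USHORT 0xffff
#define SMALLEST_USHORT 0x0000

#ifdef _TMS320C6X
#include <c6x.h>
/* TI compiler: _amem8() performs an aligned 64-bit access */
#define STORE_DWORD(p, v) (_amem8((void *)(p)) = (v))
#pragma DATA_ALIGN(coord_list, 8)
#else
/* Host fallback (assumption, for checking the logic only) */
static void store_dword(void *p, unsigned long long v)
{
    memcpy(p, &v, sizeof v);
}
#define STORE_DWORD(p, v) store_dword((p), (v))
#endif

unsigned short coord_list[MAX_LIST * 2];

/* Hypothetical double-word variant of initCoordList.
   Assumes list is 8-byte aligned and holds MAX_LIST * 2 shorts. */
void initCoordListDW(unsigned short *list)
{
    /* Two xmin/xmax pairs fill one 64-bit double word */
    static const unsigned short pattern[4] = {
        LARGEST_USHORT, SMALLEST_USHORT, LARGEST_USHORT, SMALLEST_USHORT
    };
    unsigned long long dword;
    unsigned int i;

    /* Copy the shorts byte for byte so the layout is endian-independent */
    memcpy(&dword, pattern, sizeof dword);

    /* Each iteration stores 4 shorts (8 bytes) */
    for (i = 0; i < (MAX_LIST * 2) / 4; i++)
    {
        STORE_DWORD(list, dword);
        list += 4;
    }
}
```

Calling initCoordListDW( coord_list ) should leave the list with the same contents as initCoordList. Each iteration writes 8 bytes, so the loop runs 256 times for the 2048-byte list. No cycle count is claimed for this version.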