DWARF debug info: memory architecture information

Other Parts Discussed in Thread: MSP430F248, TMS320F28335

For various reasons I am working on some automated analysis tools that parse the DWARF debug info in a .out file. I will be using these tools for a variety of processors including MSP430 and C2000 DSPs.

I have been frequently referring to SPRAAB5 as well as the DWARF 2.0 standard and have gotten all the information I need with the exception of memory architecture parameters like:

1) how many bits are in a memory word (e.g. 8 for MSP430, 16 for C2000)

2) whether a multi-word variable is stored little-endian or big-endian

3) inherent range of values (e.g. char stored in 16-bit word for C2000)

How can I get this information from the .out file?

Here's what my software tool extracts from the DWARF info for the base types (columns: DW_AT_byte_size attribute, name, DW_AT_encoding attribute):

MSP430: (MSP430F248 in this case)

1 bool [DW_ATE_boolean]
1 signed char [DW_ATE_signed_char]
1 unsigned char [DW_ATE_unsigned_char]
2 wchar_t [DW_ATE_signed_char]
2 short [DW_ATE_signed]
2 unsigned short [DW_ATE_unsigned]
2 int [DW_ATE_signed]
2 unsigned int [DW_ATE_unsigned]
4 long [DW_ATE_signed]
4 unsigned long [DW_ATE_unsigned]
4 long long [DW_ATE_signed]
4 unsigned long long [DW_ATE_unsigned]
4 float [DW_ATE_float]
4 double [DW_ATE_float]
4 long double [DW_ATE_float]

C2000: (TMS320F28335 in this case)

1 bool [DW_ATE_boolean]
1 signed char [DW_ATE_signed_char]
1 unsigned char [DW_ATE_unsigned_char]
1 wchar_t [DW_ATE_signed_char]
1 short [DW_ATE_signed]
1 unsigned short [DW_ATE_unsigned]
1 int [DW_ATE_signed]
1 unsigned int [DW_ATE_unsigned]
2 long [DW_ATE_signed]
2 unsigned long [DW_ATE_unsigned]
4 long long [DW_ATE_signed]
4 unsigned long long [DW_ATE_unsigned]
2 float [DW_ATE_float]
2 double [DW_ATE_float]
4 long double [DW_ATE_float]

The TI compiler seems to use the DW_AT_byte_size attribute to store the size in words, not in bytes, even though the DWARF standard says this (end of section 5.1):

For example, the C type int on a machine that uses 32-bit integers would be represented by a base type entry with a name attribute whose value was ‘‘int,’’ an encoding attribute whose value was DW_ATE_signed and a byte size attribute whose value was 4.

On the C2000, a "long" is a 32-bit integer, but its "byte size" is stored in the DWARF info as 2, not 4 (that is, words rather than bytes). Is this use of DW_AT_byte_size documented somewhere? If not, it should be added to a new revision of SPRAAB5.
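To make the arithmetic concrete, here is a minimal Python sketch of how a parsing tool could turn a DW_AT_byte_size value into a bit width and value range. The `UNIT_BITS` table and both function names are hypothetical; the table holds the unit widths quoted above (8 for MSP430, 16 for C2000) and has to be supplied from outside the DWARF info. The range calculation assumes two's-complement integers.

```python
# Width in bits of the unit that DW_AT_byte_size counts, per target.
# Assumption: these values are known a priori (8 for MSP430, 16 for C2000).
UNIT_BITS = {"MSP430": 8, "C2000": 16}


def type_size_bits(target, dw_at_byte_size):
    """Return the size of a type in bits for the given target."""
    return dw_at_byte_size * UNIT_BITS[target]


def value_range(target, dw_at_byte_size, signed):
    """Return (min, max) for an integer type, assuming two's complement."""
    bits = type_size_bits(target, dw_at_byte_size)
    if signed:
        return (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)
    return (0, (1 << bits) - 1)
```

With the tables above, `type_size_bits("C2000", 2)` gives 32 for "long", and a C2000 "signed char" with DW_AT_byte_size 1 gets the range of a 16-bit integer.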

 

  • Also, the section on bit field offsets (DWARF Std 2.0, Section 5.5.4) says that the bit offset recorded in the DWARF info is measured from the most significant bit. Therefore "bits 6:2" in the conventional bit numbering would have DW_AT_bit_offset = 9 in a 16-bit variable and DW_AT_bit_offset = 25 in a 32-bit variable; conventional bit 31 is the leftmost bit of a 32-bit word and so has a DW_AT_bit_offset of 0. In general, DW_AT_bit_offset = (N-1) - (conventional bit number of the most significant bit of the bit field), where N is the size of the bit field's base type in bits.

    I can't calculate the conventional bit # unless I can calculate the size of the bit field's base type, in bits, and I can't do that unless I can look up the word size somehow, since the DW_AT_byte_size of types is expressed in words and not in bytes.

    So how do I find out the word size?

    Also is there a way to tell from the file that the .out file is produced by:

    - TI compilers (so I can use the DW_AT_TI_xxx attributes)

    and/or

    - a compiler that records DW_AT_byte_size in # of words, not # of bytes? (I plan to use my DWARF-parsing tool on any executable that contains DWARF info, so if I have another micro with 16-bit words but their compiler uses DW_AT_byte_size measured in # of bytes, then I have a problem deciding whether to decode it as # of bytes or # of words.)
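The bit offset conversion described in this post can be sketched in Python as follows. The function name `conventional_bits` and its parameter names are hypothetical; `storage_bits` is N, the size in bits of the bit field's base type, which must already have been worked out from DW_AT_byte_size and the unit width.

```python
def conventional_bits(bit_offset, bit_size, storage_bits):
    """Convert a DWARF 2 DW_AT_bit_offset, counted from the most
    significant bit of the storage unit, into (msb, lsb) in the
    conventional numbering where bit 0 is the least significant bit."""
    if bit_offset < 0 or bit_size < 1:
        raise ValueError("bit_offset must be >= 0 and bit_size >= 1")
    msb = (storage_bits - 1) - bit_offset
    lsb = msb - bit_size + 1
    if lsb < 0:
        raise ValueError("bit field does not fit in the storage unit")
    return msb, lsb
```

For the "bits 6:2" example, `conventional_bits(9, 5, 16)` and `conventional_bits(25, 5, 32)` both return `(6, 2)`.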

  • DW_AT_byte_size attributes are not in terms of target words, they are in terms of target bytes.  The C standard specifies that a byte is the minimum addressable unit, which in the case of C2000 is 16 bits.  Thus, DW_AT_byte_size for a 32-bit long is 2, since it is twice as long as a byte.  The term "byte" is commonly used in industry to mean exactly 8 bits (also known as an "octet" for disambiguation), but the C standard doesn't follow that convention, and in this case DWARF follows the C convention.

    I don't think the number of bits in a target byte is recorded in the DWARF information.  I think this is just something you have to know about the target a priori, or look in limits.h

  • Archaeologist said:
    DW_AT_byte_size attributes are not in terms of target words, they are in terms of target bytes.  The C standard specifies that a byte is the minimum addressable unit, which in the case of C2000 is 16 bits.  Thus, DW_AT_byte_size for a 32-bit long is 2, since it is twice as long as a byte.  The term "byte" is commonly used in industry to mean exactly 8 bits (also known as an "octet" for disambiguation), but the C standard doesn't follow that convention, and in this case DWARF follows the C convention.

    I don't think the number of bits in a target byte is recorded in the DWARF information.  I think this is just something you have to know about the target a priori, or look in limits.h

    Thanks, but that's a pretty disappointing answer. There's some real inconsistency on this subject in TI's literature:

    I am looking at the datasheet (SPRS439G) and product folder for the TMS320F2823x/2833x and the product folder for the 28335 says RAM: 68KB, Flash: 512KB, On-Chip Memory F28335/F28235: 256K x 16 Flash, 34K x 16 SARAM. The data sheet on p. 13 lists the memory size in 16-bit words. The only uses I find in the datasheet for the term "byte" are as follows:

    • in reference to the eCAN module, e.g. p. 88 where the mailbox RAM and eCAN memory both state they have 512 bytes, and on p. 89 each of the 512-byte RAM blocks take up an address length of 256 memory locations (512 bytes from address 6000h - 60ffh, and another 512 bytes from address 6100-61ffh), with each mailbox consisting of 16 bytes spread over 8 memory locations (e.g. 61E8h - 61EFh).
    • in the SCI section (p. 92) and SPI section (p. 96) it talks about the lower byte (bits 7-0) and upper byte (bits 15-8).

    In the compiler documentation (SPRU514C), it says on p.82:

    Note: TMS320C28x Byte is 16 Bits
    By ANSI/ISO C definition, the sizeof operator yields the number of bytes required to store an
    object. ANSI/ISO further stipulates that when sizeof is applied to char, the result is 1. Since
    the TMS320C28x char is 16 bits (to make it separately addressable), a byte is also 16 bits.
    This yields results you may not expect; for example, sizeof(int) == 1 (not 2). TMS320C28x
    bytes and words are equivalent (16 bits). To access data in increments of 8 bits, use the
    __byte() and __mov_byte() intrinsics described in Section 7.4.4.

    Huh? So bytes are 16 bits, but if I want to access data in increments of 8 bits, I use __byte() and __mov_byte(). I would think TI should call those intrinsics __octet() and __mov_octet(), or keep them __byte() but abandon the idea of claiming that a C28xx byte is 16 bits.

    In the 28xx instruction set documentation (SPRU430E), p. 5-31 talks about byte addressing modes and the MS and LS bytes of a 16-bit memory location; there are instructions ANDB and MOVB; section B.1 talks about byte packing and unpacking operations; and the term "byte" is used several times in the glossary to reference 8-bit portions of 16-bit and 32-bit registers.

     

    I would respectfully request TI to at least amend SPRAAB5 to clarify this issue, and in the next possible version of the compiler, to place some identifying information about byte widths / word widths / endianness / compiler version / etc.

     

     

  • To clarify my request above: I would like that identifying information (byte widths, word widths, endianness, compiler version, etc.) to be placed in the DWARF information.

  • So I asked this on the DWARF mailing list and the person who responded agreed that DWARF doesn't specify an 8-bit byte on the target, though he said maybe it should be mentioned in the standard.

    He also said that the machine architecture info is outside the scope of DWARF but should be present in the enclosing file (ELF, or in this case COFF).

    The SPRAAO8 COFF document mentions that file header bytes 20-21 list the device family, and the file header flags include little/big endianness. So I guess I can handle the device family ID, since there aren't too many options. (I would not have wanted to parse strings and manage a database of part strings.)
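A rough Python sketch of that lookup is below. Only "bytes 20-21 hold the device family" and "the flags include endianness" come from SPRAAO8 as described above; everything else is an assumption to check against that document: the flags sitting at bytes 18-19, the header fields being stored little-endian, the `F_LITTLE`/`F_BIG` bit values, and the two target IDs in the table. The function name `parse_coff_header` is hypothetical.

```python
import struct

# Assumed flag bits and target IDs; verify against SPRAAO8 before relying on them.
F_LITTLE = 0x0100
F_BIG = 0x0200
TARGET_IDS = {
    0x009D: ("TMS320C2800", 16),  # (family name, bits per addressable unit)
    0x00A0: ("MSP430", 8),
}


def parse_coff_header(header):
    """Extract device family, unit width, and endianness from the first
    22 bytes of a COFF file header (fields assumed little-endian)."""
    if len(header) < 22:
        raise ValueError("need at least 22 bytes of file header")
    # Assumed layout: flags at bytes 18-19, target ID at bytes 20-21.
    flags, target_id = struct.unpack_from("<HH", header, 18)
    family, unit_bits = TARGET_IDS.get(target_id, (None, None))
    if flags & F_LITTLE:
        endianness = "little"
    elif flags & F_BIG:
        endianness = "big"
    else:
        endianness = None
    return {"family": family, "unit_bits": unit_bits, "endianness": endianness}
```

An unknown target ID yields `None` for the family and unit width, so the caller can fall back to asking the user rather than guessing.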

     

    I do hope, however, that whenever TI transitions from COFF to something else (ELF?), it finds a spot that is as standardized as possible for machine information like byte width and endianness.