ARM Cortex M3/M4 bit-banding (Thumb2 issue)

Alexey Bagaev

Hi all,

Please help me understand why, in practice, bit-banding (a 1-bit memory write access) cannot be executed in one clock cycle on ARM Cortex M3/M4 Thumb2 and uses at least 10 bytes of code memory.

Thank you,

Alexey

over 8 years ago

f. m.

Please help me understand why, in practice, bit-banding (a 1-bit memory write access) cannot be executed in one clock cycle on ARM Cortex M3/M4 Thumb2...

AFAIK, a bitbanding write access is actually an (atomic) read-modify-write. That would explain why it takes more than one or two cycles.

... and uses at least 10 bytes of code memory.

Really?

Generally, the ARM Infocenter is a good source of information about the core. Most vendors do not replicate the whole core documentation, but just refer to ARM.

Robert Cowsill in reply to f. m.

f. m. said:

Please help me understand why, in practice, bit-banding (a 1-bit memory write access) cannot be executed in one clock cycle on ARM Cortex M3/M4 Thumb2...

AFAIK, a bitbanding write access is actually an (atomic) read-modify-write. That would explain why it takes more than one or two cycles.

Yes, it's an RMW cycle implemented inside the AHB bus controller. Not only does it have to do exactly the same operations that the CPU would do for a single-bit access, it must also lock out all bus masters throughout the operation to ensure atomicity. That prevents the pipelining that makes single-cycle operation possible.

It's faster than using a synchronisation primitive like a semaphore, but slow compared to regular non-atomic operations. I think the documentation could be improved to make this point clearer (by ARM, not just TI).

Alexey Bagaev in reply to f. m.

f. m. said:
bitbanding write access is actually an (atomic) read-modify-write

ARM Infocenter says: "It enables individual bits to be toggled without performing a read-modify-write sequence of instructions." :)

And the MSP432 RAM and Flash controllers support single-bit manipulation.

f. m. said:
Really?

Thumb2 does not seem to have instructions that write data directly to memory. All such manipulation seems to go through CPU registers, and such instructions seem to cost 2 clock cycles. Bit-band addresses are full 32-bit addresses, so the address constant alone costs 4 bytes. So, in the end, modifying only one bit should be a fairly expensive operation.
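To make the 10-byte figure concrete, here is a tally of one plausible Thumb2 sequence a compiler might emit to set a single bit through its alias word when the alias address is not already in a register: a 16-bit PC-relative LDR for the address, the 4-byte literal-pool word holding it, a 16-bit MOVS for the value and the 16-bit STR itself. The register choice, the example alias address 0x22000000 and the helper names are assumptions for illustration; a compiler may instead use MOVW/MOVT or reuse a base register already loaded.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of an illustrative Thumb2 sequence for "set one bit via its
 * bit-band alias word" when the alias address is not yet in a register.
 * Sizes are those of the 16-bit Thumb encodings; real compiler output may
 * differ (e.g. MOVW/MOVT instead of a literal-pool load). */
struct insn {
    const char *text;   /* assembler form, register choice is illustrative */
    size_t bytes;       /* size in code memory */
};

static const struct insn bitband_set_seq[] = {
    { "LDR  r0, [pc, #off]  ; alias address from literal pool", 2 },
    { "MOVS r1, #1          ; value (only bit 0 matters)",       2 },
    { "STR  r1, [r0]        ; the single bit-band store",        2 },
    { ".word 0x22000000     ; literal-pool word (example alias)", 4 },
};

/* Sum the code sizes of the first n entries of a sequence. */
static size_t code_bytes(const struct insn *seq, size_t n)
{
    assert(seq != NULL);
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += seq[i].bytes;
    }
    return total;
}

/* Total size of the whole illustrative sequence above. */
size_t bitband_set_bytes(void)
{
    return code_bytes(bitband_set_seq,
                      sizeof bitband_set_seq / sizeof bitband_set_seq[0]);
}
```

Only the STR touches the alias region; the other 8 bytes are the same setup any store to a constant address needs. Once r0 and r1 are loaded, each further bit of the same 32-bit word costs one more 2-byte STR r1, [r0, #4*k], because the 16-bit STR immediate offset reaches up to 124 bytes.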

Alexey Bagaev in reply to Alexey Bagaev

One remark: of course, Flash technology hasn't changed, and to set a bit back to 1 the controller still needs to erase a whole (large) sector.

f. m. in reply to Alexey Bagaev

ARM Infocenter says: "It enables individual bits to be toggled without performing a read-modify-write sequence of instructions." :)

Not sure if I understand this correctly, but only certain address ranges are supported (the first 1 MB of the SRAM and peripheral regions), and certainly not Flash programming.

Thumb2 does not seem to have instructions that write data directly to memory. All such manipulation seems to go through CPU registers, and such instructions seem to cost 2 clock cycles. Bit-band addresses are full 32-bit addresses, so the address constant alone costs 4 bytes. So, in the end, modifying only one bit should be a fairly expensive operation.

If you include all those "preparatory" instructions, yes. But that is no different for any other address, so there is no special "burden" for bitbanding.

Alexey Bagaev in reply to f. m.

f. m. said:
certainly not Flash programming

I agree. I only mentioned it as an example of hardware capable of bitwise read/write operations.

Robert Cowsill in reply to Alexey Bagaev

Alexey Bagaev said:
ARM Infocenter says: "It enables individual bits to be toggled without performing a read-modify-write sequence of instructions." :)

Indeed. Bitbanding doesn't perform a read-modify-write sequence of instructions, the CPU just performs a single write instruction.

That single instruction, however, triggers a read-modify-write sequence of accesses within the bus controller.

Strictly speaking the ARM documentation doesn't conflict with what I've said here, but I do agree that it could be stated more clearly.
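The distinction drawn here (one store instruction on the CPU side, a read-modify-write on the bus side) can be illustrated with a host-side model. This is not hardware code: model_sram, model_bitband_store and the alias indexing are hypothetical, and per the description above the real sequence runs in the bus logic with other bus masters locked out, which a plain C function does not reproduce. It assumes a little-endian layout, as on MSP432.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model only: a tiny "SRAM" of 32-bit words plus a function that
 * mimics what the bus side does when the CPU issues ONE store to a bit-band
 * alias word. On real hardware these steps are performed atomically by the
 * bus logic; here they are just ordinary C statements. */
#define MODEL_SRAM_WORDS 4u

static uint32_t model_sram[MODEL_SRAM_WORDS];

/* alias_index: alias byte offset / 4. Every SRAM bit has its own alias word,
 * so for little-endian 32-bit words index = word * 32 + bit_in_word. */
void model_bitband_store(uint32_t alias_index, uint32_t value)
{
    uint32_t word = alias_index / 32u;
    uint32_t bit  = alias_index % 32u;
    assert(word < MODEL_SRAM_WORDS);

    uint32_t tmp = model_sram[word];   /* 1. read the containing word  */
    if (value & 1u) {                  /* 2. modify: only bit 0 of the  */
        tmp |= (1u << bit);            /*    stored value is used       */
    } else {
        tmp &= ~(1u << bit);
    }
    model_sram[word] = tmp;            /* 3. write the word back        */
}
```

From the CPU's point of view the whole function corresponds to a single STR; the three commented steps are the hidden bus-side work that keeps the access from completing in one cycle.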

Tony Philipsson

Bit-banding is a trick that reaches individual bits through a column+row mapping onto memory addresses that don't physically exist.

You write a whole 32-bit word in the alias region to reach one bit, so the alias region uses 32 times the address space of the region it maps (1 MB is mapped onto 32 MB).
As these are fake addresses and a 32-bit address space is available, it does not waste hardware.
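This column+row description corresponds to the alias formula in the Cortex-M3/M4 documentation: alias word = alias base + (byte offset * 32) + (bit number * 4). A small C helper for the two standard regions (SRAM 0x20000000 mapped to 0x22000000, peripherals 0x40000000 mapped to 0x42000000, 1 MB each) might look like the sketch below; the function and macro names are illustrative, and a given device may populate only part of these regions.

```c
#include <assert.h>
#include <stdint.h>

/* Cortex-M3/M4 bit-band regions (1 MB each) and their 32 MB alias regions. */
#define SRAM_BB_BASE     0x20000000u
#define SRAM_ALIAS_BASE  0x22000000u
#define PERI_BB_BASE     0x40000000u
#define PERI_ALIAS_BASE  0x42000000u
#define BB_REGION_SIZE   0x00100000u   /* 1 MB */

/* Return the alias word address for bit `bit` (0..7) of the byte at `addr`:
 *   alias = alias_base + byte_offset * 32 + bit * 4
 * Returns 0 if `addr` lies outside both bit-band regions. The unsigned
 * subtraction wraps for addresses below a base, so one compare checks both
 * ends of the range. */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    assert(bit < 8u);
    if (addr - SRAM_BB_BASE < BB_REGION_SIZE) {
        return SRAM_ALIAS_BASE + (addr - SRAM_BB_BASE) * 32u + bit * 4u;
    }
    if (addr - PERI_BB_BASE < BB_REGION_SIZE) {
        return PERI_ALIAS_BASE + (addr - PERI_BB_BASE) * 32u + bit * 4u;
    }
    return 0u;
}
```

For example, bit 0 of the last SRAM bit-band byte 0x200FFFFF lands at 0x22000000 + 0xFFFFF * 32 = 0x23FFFFE0, the top of the 32 MB alias window.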

But bit-banding is still mostly (always?) mapped over peripheral registers and RAM, not Flash.

Some manufacturers of ARM parts have slightly more advanced bit-banding that allows atomic toggle operations, etc., on GPIO.

Alexey Bagaev

Thank you for all your replies,

We all generally understand what bit-banding is for and how it is intended to work. In theory this trick should save CPU clock cycles and code memory for single-bit read-only and write-only operations. We expect bit-banding to be a direct, one-clock-cycle operation: one write (the CPU code/data bus is always 32 bits) to the designated bit-band address changes one bit in the designated byte. That is OK.

BUT, can anyone prove it by posting a single ARM Cortex M3/M4 Thumb2 assembler instruction as an example?

Alexey

f. m. in reply to Alexey Bagaev

BUT, can anyone prove it by posting a single ARM Cortex M3/M4 Thumb2 assembler instruction as an example?

If I understood the answers correctly, a single "STR R<n>, [R<m>]" will do, if R<m> points to an address in the proper bit-band alias range.
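In C, that single instruction is simply a store through a volatile pointer to the alias word. The sketch below takes the alias address as a pointer so it can be exercised on a host with ordinary memory; bitband_store is a hypothetical helper, and on a real Cortex M3/M4 part the pointer would be formed from an alias address such as 0x22000000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write one bit through its bit-band alias word.
 * On the target, `alias_word` would point into the 0x22000000 (SRAM) or
 * 0x42000000 (peripheral) alias region; once the pointer and value are in
 * registers the assignment compiles to a single STR. The hardware only
 * looks at bit 0 of the stored value; the mask just makes that explicit. */
static inline void bitband_store(volatile uint32_t *alias_word, uint32_t value)
{
    assert(alias_word != NULL);
    *alias_word = value & 1u;
}
```

On the target, something like bitband_store((volatile uint32_t *)0x22000000u, 1u); would set bit 0 of the byte at 0x20000000. Getting the pointer and value into registers first is the same setup any store needs.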

But even these machine instructions are somewhat abstract and refer to the programming model of the core. More important is the silicon (hardware) that executes the instruction, depending on the target address. I'm still not clear whether the "bitbanding access hardware" is entirely part of the core (i.e. comes from ARM), or requires adaptation by the silicon vendor (TI).

Alexey Bagaev in reply to f. m.

f. m. said:
single "STR R<n>, [R<m>]" will do

R<n> and R<m> are actually CPU registers. To make this instruction work properly, you first have to load the corresponding data into those registers.

Alexey

f. m. in reply to Alexey Bagaev

To make this instruction work properly, you first have to load the corresponding data into those registers.

That's correct. But, as said before, this holds true for any memory access, and is no specific burden for bitbanding.

Alexey Bagaev in reply to f. m.

Exactly; that's why I also gave the Flash controller as an example. Another reason is the good bit-wise implementation in TI's 16-bit CPUs.

Alexey Bagaev in reply to f. m.

ARM claimed that M3/M4 CPUs have specific support for bit-banding. So where is it?

Robert Cowsill in reply to Alexey Bagaev

Alexey Bagaev said:
We all generally understand what bit-banding is for and how it is intended to work. In theory this trick should save CPU clock cycles and code memory for single-bit read-only and write-only operations. We expect bit-banding to be a direct, one-clock-cycle operation: one write (the CPU code/data bus is always 32 bits) to the designated bit-band address changes one bit in the designated byte.

That's what I expected when I first read the documentation about bitbanding from ARM/TI.

Having tested it in practice I don't think it's actually designed to save cycles in the general case of single bit operations, but rather in cases where atomic RMW is necessary. The documentation suggests bitbanding is what you expect, but the implementation does not match that expectation.

I'd say the implementation complies with the letter of the law, but not the spirit.
