Where is the am3359 Crypto Hardware Accelerators documentation?

Other Parts Discussed in Thread: AM3359, CC2538, SHA-256, TM4C129ENCPDT

I want to develop a driver for the hardware random number generator on the am3359. I can't find any documentation about the crypto hardware accelerators in the latest TRM (Rev. I) or on the TI website. Can someone kindly point me to the documentation? Thanks!

  • That page doesn't seem to have any information about the registers used by the crypto accelerators. It just has information about the Linux driver. I want to develop a driver for another operating system (Minix) and I can't use the GPL code from the Linux driver (because Minix is BSD licensed).

    The TRM explains how to use peripherals like i2c, rtc, etc. I'm looking for something similar that explains the crypto accelerators, their registers, etc.

  • Hi,

    I am also interested in getting the technical documentation for the hardware crypto accelerator that is on the am335x boards.

    Who should I speak with, and what are the requirements to get this documentation? (I'm assuming it does exist, right?)

    If it does exist, is it not published publicly because of export restrictions?

    Thanks!

  • Hi,

    You should contact your nearest TI representative.

  • Biser Gatchev-XID said:

    Hi,

    You should contact your nearest TI representative.

    Hi Biser, thanks for your reply.

    Are you able to put me in contact with somebody? :-)

    Thanks,

    Cameron

  • Sorry, I don't know where you are located. You can find customer support from the main TI page: www.ti.com

    The Linux driver is basically the only public "documentation" available.  There's also this thread where I explained and explored the hardware RNG (it was left somewhat unfinished for lack of any further response or feedback; I was getting the feeling I was performing a monologue without an audience).  It should still help you get started with the RNG, and if someone motivates me by showing interest I'll probably continue that investigation (although it may have to wait till next week).

    Some components also appear in other TI hardware for which docs are sometimes available.  You may want to browse the docs for the Keystone security accelerator, and maybe download the CC2538 foundation firmware, which also has useful info.  In both of those, however, there's a wrapper that normally organizes things and e.g. does key management, while on the AM335x you just have the raw components to work with, so only limited bits and pieces of that info are useful.

    The CLKCTRL registers for the six modules are at offset 0x90 in PRCM, consecutively in order: RNG, AES 0, AES 1, DES, Hash, PKA.  For the addresses of the six components on the AM335x see my memory map: the Hash (MD5/SHA1/SHA-256) engine and the two AES engines are directly on the L3 interconnect, the RNG and public-key accelerator (PKA) reside on L4LS (aka L4-Per).  I haven't found the DES accelerator yet but I also haven't looked very hard since, well, DES, who cares :P  For IRQs you may find this spreadsheet I made useful.
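    As a rough sketch of the clock-enable step described above (not taken from any TI driver), the C below computes the CLKCTRL offset of each of the six modules, assuming the registers are consecutive 32-bit words starting at offset 0x90 in the CM_PER part of the PRCM, and sets MODULEMODE to ENABLE. The enum and function names are hypothetical, and the MODULEMODE/IDLEST bit positions are an assumption based on the usual AM335x CLKCTRL layout; check them against the PRCM chapter of the TRM.

```c
#include <stdint.h>

/* Hypothetical names; order as described above: RNG, AES 0, AES 1, DES, Hash, PKA. */
enum crypto_module { MOD_RNG, MOD_AES0, MOD_AES1, MOD_DES, MOD_HASH, MOD_PKA };

#define CRYPTO_CLKCTRL_BASE_OFS 0x90u

/* Assumption: usual AM335x CLKCTRL layout, MODULEMODE in bits [1:0]
 * (2 = ENABLE) and IDLEST in bits [17:16] (0 = fully functional). */
#define CLKCTRL_MODULEMODE_MASK   0x3u
#define CLKCTRL_MODULEMODE_ENABLE 0x2u
#define CLKCTRL_IDLEST_SHIFT      16
#define CLKCTRL_IDLEST_MASK       (0x3u << CLKCTRL_IDLEST_SHIFT)

/* Byte offset of a module's CLKCTRL register, assuming consecutive 32-bit registers. */
static inline uint32_t crypto_clkctrl_offset(enum crypto_module m)
{
    return CRYPTO_CLKCTRL_BASE_OFS + 4u * (uint32_t)m;
}

/* cm_per points at the mapped CM_PER register block (indexed in 32-bit words). */
static inline void crypto_module_enable(volatile uint32_t *cm_per, enum crypto_module m)
{
    volatile uint32_t *clkctrl = cm_per + crypto_clkctrl_offset(m) / 4u;

    *clkctrl = (*clkctrl & ~CLKCTRL_MODULEMODE_MASK) | CLKCTRL_MODULEMODE_ENABLE;

    /* Busy-wait until the module reports that it is fully functional. */
    while ((*clkctrl & CLKCTRL_IDLEST_MASK) != 0)
        ;
}
```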

    Note that on a GP device, normally both the "secure" and "public" contexts (and associated IRQs) are freely usable.  AES 1 seems to be uncooperative though, access results in error and the "L3 firewall error" IRQ gets asserted -- yet none of the L3 firewalls show an error (and they're all configured for open access anyway).  Odd.

    I don't think there are any copyright issues from merely studying the AES and Hash drivers and reimplementing them from scratch.  Due to the vagueness of various aspects, some empirical science would be recommended anyway.  I also have some headers that are basically my notes on those components; I can check with my employer whether it's okay if I release them.  The PKA is still a bit enigmatic but I've managed to poke it into doing what seemed to be a modular multiplication (on a DM814x, but the one on the AM335x reports the same version).  I have further investigation of that thing on my to-do list.

    It's perhaps also worth mentioning most (if not all) of the accelerators appear to be SafeXcel Embedded IP with TI wrappers and/or other modifications: the RNG is SafeXcel EIP-75a, the PKA identifies itself as EIP-29, my notes claim the AES engines are EIP-38, and dunno about the Hash engine.  This information is of limited use since SafeNet is about as open as an epoxy-sealed oyster about their products.  In fact, the lack of info on EIP-29 is so impressive that this forum post will probably end up being the top google hit for it just because I mentioned it (twice). I guess these people are still true believers in security through obscurity.  [[edit: currently the only google hit actually]]

    A thought that crossed my mind is that it's possible TI isn't allowed to publicly release documentation on these components since they're not their IP.  That would somewhat explain the otherwise pretty silly state of pretending the accelerators don't exist in the TRM while releasing Linux drivers for them at the same time.  I don't know, it clearly doesn't accomplish anything in the long run, other than requiring an unnecessary amount of effort to get this useful functionality up and running...

  • Hi, thanks again for your absolutely fantastic response!

    I'm thinking because the EIP-29 doesn't appear to exist maybe actually what they are is:

    PKA -> EIP-28

    AES -> EIP-39

    The reason I specifically think AES is EIP-39 is because of this post listing the AES modes->

    http://e2e.ti.com/support/arm/sitara_arm/f/791/t/156983.aspx

    The EIP-39 datasheet specifically lists

    "- Feedback modes: ECB, CBC, CTR, CFB (128-bit), f8
    - Authentication only modes CBC-MAC, f9"

    EIP-38 doesn't appear to do f8 mode and CBC-MAC.

    The hash engine is probably from the EIP-57 family.

    Can you explain a little bit about why there is a "secure" and "public" context for AES/hash? (Aka what is that usually for on a non-GP device?)

    There are a few things I'm curious about that I'd like to just be reassured by documentation, such as, if AES_REG_KEY can be read back out somehow or is copied somewhere else where it could be read.

    Also whether it has a function to securely store a key, so you don't need to continuously write the AES_REG_KEY from main memory.

    Anyway, I'll fiddle with it a bit... Just to add: your explanation about the TRNG is great! And your spreadsheets have been very helpful :-)

  • Cameron Moree said:

    I'm thinking because the EIP-29 doesn't appear to exist maybe actually what they are is:

    PKA -> EIP-28

    Unfortunately not.  The version register unambiguously advertises EIP-29.  Another, less subtle, clue is that the register interfaces of 28 and 29 do not appear to be even vaguely similar.

    Cameron Moree said:

    The reason I specifically think AES is EIP-39 is because of this post listing the AES modes->

    http://e2e.ti.com/support/arm/sitara_arm/f/791/t/156983.aspx

    Interesting, I hadn't seen that.  My notes were probably just a guess since (unlike the RNG and PKA) there's no EIP identification register in the AES or Hash accelerators.  Also, a product brief on EIP-38 was one of the few documents I found and it seemed a reasonable match, while I still can't find a product brief on EIP-39 (although I have far greater appreciation for companies which publish information with some actual substance, you'd think that a company would at least make it easy to find some fluffy marketing material on the stuff they sell, but no).

    I had some time to play with the AES engine for the first time, which was actually quite easy: after reset, just wrote the desired mode, a key, an IV if applicable (it doesn't seem to be fussy about the order), then just write/read blocks of data when the status register indicates this is possible (or route that to irq or dma if desired). Haven't tried any of the fancy modes yet, just ECB/CBC/CFB, but always nice to get output that matches test-vectors.
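    A minimal polling sketch of that sequence in C. It assumes the AES register layout of the Tiva C129x (TM4C129) AES module (key1 words at 0x20-0x3C, IV at 0x40-0x4C, control/status at 0x50, data FIFO window at 0x60), which later posts in this thread suggest is the same engine. The offsets, control bit assignments and function names are assumptions for illustration only, and the word/byte order of key and IV should be checked against test vectors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed register map (Tiva C129x AES layout), byte offsets from the context base. */
#define AES_KEY1_OFS(i) (0x3Cu - (((unsigned)(i) ^ 1u) * 4u)) /* key1 word i */
#define AES_IV_OFS(i)   (0x40u + 4u * (unsigned)(i))
#define AES_CTRL_OFS    0x50u
#define AES_DATA_OFS    0x60u                                 /* 4-word FIFO window */

/* Assumed control/status bits. */
#define AES_CTRL_OUTPUT_READY (1u << 0)
#define AES_CTRL_INPUT_READY  (1u << 1)
#define AES_CTRL_ENCRYPT      (1u << 2) /* direction: 1 = encrypt */
#define AES_CTRL_KEY128       (1u << 3) /* key size field [4:3] = 1 */
#define AES_CTRL_CBC          (1u << 5)

#define REG(base, ofs) ((base)[(ofs) / 4u])

/* Step 1: write mode, key and IV (the order did not seem to matter). */
static void aes_setup_cbc128(volatile uint32_t *aes, const uint32_t key[4],
                             const uint32_t iv[4], bool encrypt)
{
    REG(aes, AES_CTRL_OFS) = AES_CTRL_CBC | AES_CTRL_KEY128 |
                             (encrypt ? AES_CTRL_ENCRYPT : 0u);
    for (unsigned i = 0; i < 4; i++)
        REG(aes, AES_KEY1_OFS(i)) = key[i];
    for (unsigned i = 0; i < 4; i++)
        REG(aes, AES_IV_OFS(i)) = iv[i];
}

/* Step 2: push one 16-byte block if the engine can accept input. */
static bool aes_try_write_block(volatile uint32_t *aes, const uint32_t in[4])
{
    if (!(REG(aes, AES_CTRL_OFS) & AES_CTRL_INPUT_READY))
        return false;
    for (unsigned i = 0; i < 4; i++)
        REG(aes, AES_DATA_OFS + 4u * i) = in[i];
    return true;
}

/* Step 3: pull one 16-byte block if a result is available. */
static bool aes_try_read_block(volatile uint32_t *aes, uint32_t out[4])
{
    if (!(REG(aes, AES_CTRL_OFS) & AES_CTRL_OUTPUT_READY))
        return false;
    for (unsigned i = 0; i < 4; i++)
        out[i] = REG(aes, AES_DATA_OFS + 4u * i);
    return true;
}
```

    A real driver would loop on the two "try" functions (or route the ready flags to an IRQ or DMA) until all blocks are processed.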

    Cameron Moree said:

    Can you explain a little bit about why there is a "secure" and "public" context for AES/hash? (Aka what is that usually for on a non-GP device?)

    I haven't really had time to explore it yet, but my superficial understanding of it so far is that it just keeps two contexts (although I'd guess it can only operate on one at any given time), each with its own register interface to allow interconnect firewalls to be used for access control, which on HS devices are probably configured to restrict access to "secure" contexts to secure-world only.  On a GP device, the firewalls are open by default so you can put the two contexts to use as you see fit.  There is some slight asymmetry between the two though, some things that affect the entire peripheral such as a soft-reset can only be performed via the "secure" register interface.

    Cameron Moree said:

    There are a few things I'm curious about that I'd like to just be reassured by documentation, such as, if AES_REG_KEY can be read back out somehow or is copied somewhere else where it could be read.

    Also if it  has a function to securely store a key, so you don't need to continuously write the AES_REG_KEY from main memory.

    Well, it's definitely not plainly readable, but I don't know what the "save-context" stuff does exactly (or how it works). They're in the end just crypto accelerators, intended to be operated by an OS which is trusted.  I'm not aware of any secure key store facility on the AM335x, although the AES module does mention key-encryption stuff.

    In that regard things are very different on the DM814x/AM387x which is what I'm mostly working with: it has the same set of crypto accelerators (plus a second hash module), but all inside a "security subsystem" (SecSS) which also houses a Cortex-M3, some timers, a DMA controller, more than 128 KB of local RAM, and its local interconnect has the most elaborate firewall I've so far ever seen.  This obviously makes a comfortable environment for making an embedded "crypto smart card" or even a TPM (if "the keys to the castle" are handed over to it during early boot, i.e. giving it sole right to reconfigure firewalls. It could probably inspect the state of the cortex-A8 via non-invasive processor trace to the ETB.)

  • Whoops, pressed Post too soon...

    Cameron Moree said:
    Anyway, I'll fiddle with it a bit...

    Let me know if you discover any surprises!  I won't have much time this week due to a deadline, but plan to play a bit more with the crypto stuff after that.  I'll also check if I can release some of my notes/headers/tests that may be helpful. Most of my test code is written in Forth though, which I believe is generally regarded as thoroughly unreadable by everyone who doesn't happen to belong to the (not very large) group of Forth enthusiasts ;-)

    Cameron Moree said:
    Just to add: your explanation about the TRNG is great!

    Thanks!  Even though there's probably still stuff I ought to add, since iirc I didn't mention much about how to deal with the quality monitoring / alarm mechanism, or how to quickly find good settings for the "detune" bits using simulated annealing (the name is fancier than the code; it was like 10 lines or so).

    Cameron Moree said:
    And your spreadsheets have been very helpful :-)

    Yeah, sometimes the docs just don't seem to have all the relevant info together in the right way or something... spreadsheets also have their limitations, but nevertheless I find them quite useful.  (Probably the most complicated one I did so far is for the Centaurus pinout... most of that data is in the datasheet, but the pinmux table doesn't e.g. show which voltage supplies those pins are on, while the terminal functions listing makes it hard to get overview w.r.t. what conflicts with what)

  • Matthijs van Duin said:

    Interesting, I hadn't seen that.  My notes were probably just a guess since (unlike the RNG and PKA) there's no EIP identification register in the AES or Hash accelerators.  Also, a product brief on EIP-38 was one of the few documents I found and it seemed a reasonable match, while I still can't find a product brief on EIP-39 (although I have far greater appreciation for companies which publish information with some actual substance, you'd think that a company would at least make it easy to find some fluffy marketing material on the stuff they sell, but no).

    For the EIP-39 I did find this block of text, which I think was cached from a product brief pdf that was at one point available (if you are interested):

    http://pastebin.com/QDQ7hSfx

  • Thanks!  OK, so no major differences really between EIP-38 and EIP-39... some "classic" modes of operation nobody cares about were dumped, some practically relevant modes added instead (none of which add significant complexity afaik), lower gatecount, slower when measured in clock cycles but with higher max clock frequency to compensate.

    BTW, I suggest taking a look at the Tiva C series crypto drivers you can download from this page (scroll almost to the bottom).  The AES driver in there is the best EIP-39 documentation I've seen so far, and it's BSD-licensed.

  • Thanks for the tip about the Tiva C crypto drivers, unfortunately the link on the TI page is 404!

    Are you able to upload the files somewhere?

    Thanks!

  • Cameron Moree said:
    Thanks for the tip about the Tiva C crypto drivers, unfortunately the link on the TI page is 404!

    *sigh*

    Cameron Moree said:
    Are you able to upload the files somewhere?

    Here ya go: 2626.SW-TM4C-CRYPTO-DRV-2.0-alpha1-conf.zip

  • Thanks, this is rather well commented, and lists how to set all the supported modes on the chip! :-D

    I plan to get XTS mode working on linux, and store the key outside of RAM in some registers that are cleared on reset.

    Currently the best location I know of to store the key would be the IV slots of the AES 0 public/secure contexts (I've verified that these are reset). Maybe you have some good ideas about this? I would still like to find other good spots to stash the key, because I need more space if I were to use 256-bit AES (with XTS I need 512-bit keys, and with CBC mode one of the IV slots is actually used).

    One other thing I need to verify is whether this implementation of AES is constant-time.

  • I'm not sure I follow... rather tautologically, all keying material needed for XTS mode obviously needs to be provided to the AES accelerator as keying material for using it in XTS mode, and it seems to be safe there.

    Cameron Moree said:
    One other thing I need to verify is whether this implementation of AES is constant-time.

    Only software implementations sometimes screw up that part; I can't imagine any way a hardware implementation could go wrong there.

  • If anyone is interested, the AES/DES/SHA/MD5 documentation can be found in:

    www.ti.com/lit/ds/symlink/tm4c129encpdt.pdf

    I've looked at the AES documentation in this document, and it appears to be the same as what is in am335x :-D

    Unfortunately it doesn't seem to cover the RNG or PKA, but if you search hard enough, you probably can find something...

    Edit...

    This device technical document explains a very similar AES module (it appears to be an older version that lacks some of the advanced chaining modes), but it also documents a PKA, which is possibly the same as or similar to what is on the am335x:

    www.ti.com/lit/ug/swru319c/swru319c.pdf

  • Cameron Moree said:
    If anyone is interested, the AES/DES/SHA/MD5 documentation can be found in:

    http://www.ti.com/lit/ds/symlink/tm4c129encpdt.pdf

    I've looked at the AES documentation in this document, and it appears to be the same as what is in am335x :-D

    Nice catch! It indeed appears to be the same AES, Hash, and DES modules present in recent TI SoCs (DES absent in the am335x afaik).

    Cameron Moree said:
    Unfortunately it doesn't seem to cover the RNG or PKA, but if you search hard enough, you probably can find something...

    The RNG is somewhat documented in the Keystone security accelerator docs, and of course the linux kernel driver.

    Cameron Moree said:
    This device technical document explains a very similar AES module (it appears to be an older version that lacks some of the advanced chaining modes), but it also documents a PKA, which is possibly the same as or similar to what is on the am335x:

    http://www.ti.com/lit/ug/swru319c/swru319c.pdf

    Unfortunately not, it documents SafeXcel EIP-28 which is not the same as or even similar to the EIP-29 that's on the am335x (and dm814x etc). The PKA for now remains in a state of mystery.

  • Matthijs van Duin said:
    all keying material needed for XTS mode obviously needs to be provided to the AES accelerator as keying material for using it in XTS mode, and it seems to be safe there

    After some experimentation it turns out that the latter part is actually not the case for XTS: the IV register is read/write at all times, and XTS mode uses it to store the current tweak value which is security-sensitive.

     

    My other observations so far:

     

    I've tried several modes with indefinite length (no write to the length register). I've so far seen nothing that justifies the comment in the linux kernel driver that rekeying is necessary. In fact, I've been able to change IV, mode of operation, direction (encrypt/decrypt), even key size, and the key register was preserved through all of this.

     

    ECB stores the last ciphertext in IV just like CBC and CFB-128 do, even though it doesn't use it as input. This could be useful to initialize the IV to the encryption of some value (needed for various modes).

     

    CTR and f8 mode store the counter in IV, using big-endian encoding. For example, if a 32-bit counter size is used, it wraps as follows:

    aa aa bb bb cc cc dd dd ee ee ff ff ff ff ff fe
    aa aa bb bb cc cc dd dd ee ee ff ff ff ff ff ff
    aa aa bb bb cc cc dd dd ee ee ff ff 00 00 00 00
    aa aa bb bb cc cc dd dd ee ee ff ff 00 00 00 01
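    The counter update can be modelled in software as a big-endian increment confined to the low-order counter field of the 16-byte IV. A sketch in C for cross-checking results (the function name is hypothetical; width_bits is the configured counter size, e.g. 32, or 16 for "ICM"):

```c
#include <stddef.h>
#include <stdint.h>

/* Software model of the observed CTR/f8 counter update: only the last
 * width_bits / 8 bytes of the IV are incremented (big-endian), and the
 * increment wraps within that field without carrying into the rest. */
static void ctr_increment(uint8_t iv[16], unsigned width_bits)
{
    size_t nbytes = width_bits / 8;

    for (size_t i = 16; i > 16 - nbytes; i--) {
        if (++iv[i - 1] != 0)
            break; /* no carry out of this byte: done */
    }
}
```

    Applied three times with a 32-bit counter to the first IV value above, it reproduces the wrap sequence shown.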

    ICM just means CTR mode with a 16-bit counter, no idea why they call it that. I don't think it's a valid selection for other modes which require a counter size to be configured (f8, GCM, CCM), but I haven't really tested this yet.

     

    XTS mode 1 ("previous/intermediate tweak loaded") also works fine in indefinite-length mode. It performs ECB encrypt/decrypt with pre- and post-xor of the tweak (stored in IV), and then updates the tweak by multiplying it with $x$ in $\mathrm{GF}(2)[x]/(x^{128}+x^7+x^2+x+1)$, represented in little-endian.  Example sequence of consecutive tweak values:

    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80
    87 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0e 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    If you write some value 16·j to the "auth_len" register prior to en-/decrypting a block, the tweak value is first multiplied by $x^j$ before the block is processed, which is equivalent to processing j blocks (16·j bytes) of dummy data. This multiplication does not happen immediately when writing auth_len; you have to process a block of data.
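    The tweak update is easy to reproduce in software when checking the engine's output. A sketch in C, assuming the little-endian byte order shown above (byte 0 least significant); xts_mul_x reproduces the example tweak sequence, and xts_skip_blocks models the observed effect of writing 16·j to auth_len. Both function names are hypothetical.

```c
#include <stdint.h>

/* Multiply a 128-bit tweak by x in GF(2)[x]/(x^128 + x^7 + x^2 + x + 1),
 * with the tweak stored little-endian (byte 0 = least significant). */
static void xts_mul_x(uint8_t t[16])
{
    uint8_t carry = 0;

    for (int i = 0; i < 16; i++) {
        uint8_t next = (uint8_t)(t[i] >> 7);  /* bit shifted out of this byte */
        t[i] = (uint8_t)((t[i] << 1) | carry);
        carry = next;
    }
    if (carry)
        t[0] ^= 0x87; /* reduce: x^128 = x^7 + x^2 + x + 1 */
}

/* Model of writing 16*j to auth_len: the tweak is multiplied by x^j,
 * i.e. j blocks are skipped. */
static void xts_skip_blocks(uint8_t t[16], unsigned j)
{
    while (j--)
        xts_mul_x(t);
}
```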

     

    The two contexts ("secure" and "public") indeed appear to be independent sets of state registers (key, IV, control reg, etc), and can be used alternately without interference. Presumably if you try to use them simultaneously their access will be multiplexed at block level, most likely with the "secure" context taking precedence, but I haven't tested this theory yet.

     

    Other modes are still on my to-do list to try, when I have the time.

  • Matthijs van Duin said:
    I've tried several modes with indefinite length (no write to the length register). I've so far seen nothing that justifies the comment in the linux kernel driver that rekeying is necessary. In fact, I've been able to change IV, mode of operation, direction (encrypt/decrypt), even key size, and the key register was preserved through all of this.

    A minor gotcha: as the Snowflake (Tiva C129x) documentation explains, when the AES decryption primitive is needed (e.g. for ECB, CBC, and XTS decryption), the engine first needs to convert the encryption key into a decryption key (which takes 32 cycles). Although not visible, I've been able to infer that it stores the decryption key back into the key1 register and sets some internal flag to indicate it is a decryption key. If the direction is subsequently changed to encryption (or the mode changed to e.g. CTR or CFB-128) it will automatically convert the decryption key back into an encryption key. The existence of the latter logic also shows that the ability to change direction without manually rekeying is an explicit design feature and not an accident.

    This internal flag is cleared when you set a new key, but more generally it is cleared upon a write to any word of the key1 or key2 registers. This means that if (for some reason) you want to update only part of the key while the last operation was an ECB/CBC/XTS decryption, you first need to perform e.g. a dummy ECB encryption to ensure that key1 is in encryption form. Also, switching to f8 mode will produce garbage if key1 is in decryption form, I suspect because f8 mode implicitly updates key2 in its initialization procedure before key1 is used (see below). Again, injecting a dummy ECB encryption fixes the problem, although I should note that using the same key for different modes of operation is generally a Very Bad Idea™ anyway.
     

    Matthijs van Duin said:
    ECB stores the last ciphertext in IV just like CBC and CFB-128 do, even though it doesn't use it as input. This could be useful to initialize the IV to the encryption of some value (needed for various modes).

    Correction: ECB decryption (unlike CBC or CFB-128 decryption) leaves the plaintext instead of the ciphertext in the (readable) IV register. YIKES. Still, this is often not an issue because ECB is rarely used and it's common to leave the peripheral in a keyed state anyway, but when wiping the key state of the peripheral be sure to also zeroize the IV (or reset the peripheral, but note that a reset can only be performed via the secure context and also resets the public context, so that's not very polite).
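    A sketch of such a wipe in C, assuming the Tiva C129x register layout (key2 at 0x00-0x1C, key1 at 0x20-0x3C, IV at 0x40-0x4C) also applies to the AM335x AES module; the function and macro names are hypothetical. It clears only the context whose base address is passed, so it avoids the reset that would also hit the other context, and since XTS keeps its tweak in the IV register, that state is cleared as well.

```c
#include <stdint.h>

/* Assumed register map (Tiva C129x AES layout), byte offsets from the context base. */
#define AES_KEY2_FIRST_OFS 0x00u /* first word of key2 */
#define AES_IV_LAST_OFS    0x4Cu /* last word of the IV */

/* Zeroize key2, key1 and IV of one context. The IV must be included:
 * after ECB decryption it holds the last plaintext block, and in XTS
 * mode it holds the current tweak. */
static void aes_wipe_context(volatile uint32_t *aes)
{
    for (uint32_t ofs = AES_KEY2_FIRST_OFS; ofs <= AES_IV_LAST_OFS; ofs += 4u)
        aes[ofs / 4u] = 0;
}
```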
     

    I've finally figured out how f8-mode works. Let aux_key and aux_iv denote the lower and upper half of the key2 register respectively (each two dwords, offsets { 0x18, 0x10 } for aux_key, offsets { 0x08, 0x00 } for aux_iv). The de-/encryption of a block of data is then given by:

    data ^= aux_iv = aes_encrypt( key1, aux_key ^ aux_iv ^ iv++ );

    The increment of iv works the same as in CTR mode, and is affected by the configured counter size. Contrary to my previous suspicion, 16-bit counter mode ("ICM") also works for f8-mode. When the first block of data is submitted, f8-mode first performs an initialization procedure given by:

    aux_key = aes_encrypt( aux_key, aux_iv );
    aux_iv = 0;

    There is afaik no way to prevent this initialization from happening, but you can overwrite some or all of the key registers after the initialization has been done. Initialization is only triggered again after a write to the control/status register. As mentioned in the Snowflake documentation, f8-mode requires 128-bit keys. In spite of this, the docs also give performance data for larger key sizes, and the engine doesn't refuse to be operated with them, but I've been unable to model its behaviour in that configuration. For a practical application of f8-mode, see RFC 3711 on SRTP (but if you plan on writing any code to it, be sure to also read its errata).
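    To make the data flow concrete, here is a software model of the f8 behaviour described above, in C with a pluggable block-cipher callback. It is a sketch for cross-checking hardware output, not TI's implementation: all names are hypothetical, toy_xor_cipher is a deliberately trivial stand-in (not a cipher) that must be replaced by a real AES-128 for meaningful results, and the initialized flag mirrors the observation that initialization only re-runs after a write to the control/status register.

```c
#include <stdint.h>
#include <string.h>

typedef uint8_t block_t[16];

/* Block cipher callback: out = E(key, in). Plug in a real AES-128 here. */
typedef void (*block_encrypt_fn)(const block_t key, const block_t in, block_t out);

/* Toy stand-in for AES, for testing the data flow only: out = key ^ in. */
static void toy_xor_cipher(const block_t key, const block_t in, block_t out)
{
    for (int i = 0; i < 16; i++)
        out[i] = key[i] ^ in[i];
}

struct f8_state {
    block_t key1, aux_key, aux_iv, iv; /* aux_key/aux_iv: halves of key2 */
    int initialized;                   /* cleared by a control register write */
};

static void xor_into(block_t dst, const block_t src)
{
    for (int i = 0; i < 16; i++)
        dst[i] ^= src[i];
}

/* Big-endian increment of the low width_bits of iv, wrapping within that field. */
static void ctr_inc(block_t iv, unsigned width_bits)
{
    for (unsigned i = 16; i > 16 - width_bits / 8; i--)
        if (++iv[i - 1] != 0)
            break;
}

/* En-/decrypt one block in place. */
static void f8_process_block(struct f8_state *s, block_encrypt_fn enc,
                             unsigned width_bits, block_t data)
{
    block_t tmp;

    if (!s->initialized) {
        /* aux_key = aes_encrypt(aux_key, aux_iv); aux_iv = 0; */
        enc(s->aux_key, s->aux_iv, tmp);
        memcpy(s->aux_key, tmp, sizeof tmp);
        memset(s->aux_iv, 0, sizeof s->aux_iv);
        s->initialized = 1;
    }

    /* aux_iv = aes_encrypt(key1, aux_key ^ aux_iv ^ iv++); */
    memcpy(tmp, s->aux_key, sizeof tmp);
    xor_into(tmp, s->aux_iv);
    xor_into(tmp, s->iv);
    ctr_inc(s->iv, width_bits);
    enc(s->key1, tmp, s->aux_iv);

    xor_into(data, s->aux_iv); /* data ^= aux_iv */
}
```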
     

    Note: the data "register" is actually a 4-word wide FIFO port, supporting single-word access and multi-word bursts. The offset used for an access is unimportant as long as it's word-aligned and the burst is entirely contained inside the FIFO window. Since the window itself is at a 32-byte aligned address, EDMA FIFO mode (with 32-bit FIFO size to match the target port width) can be used to stream words into or out of it, but since you can't transfer more than 4 words per DMA request anyway I don't expect any performance benefit compared to using normal quadword transfers. When not using DMA, the best performance is probably achieved by using Neon quadword loads and stores (also for the key and IV registers), though the difference with using LDM / STM instructions is probably small.
     

    In case it's not obvious already, my intention is to make "here, in this thread" the answer to the original poster's question of where the documentation of the crypto accelerators is, at least for the AES module. The snowflake docs and driver/example code are helpful, but it's clear by now they are also far from being complete and accurate.
     

    I also made a slightly depressing observation: according to the linux clock tree the aes module is clocked directly from the main osc, so assuming the cycle timings from the Snowflake docs also apply here this yields a max throughput of 11.92 MiB/s (32 cycles @ 25 MHz = 1.28 μs per 16-byte block). The best benchmark results I've seen for software AES-128-CTR implementations on the am335x go at 308 cycles per 16-byte block excluding key expansion, which already yields 14.86 MiB/s @ 300 MHz and 49.54 MiB/s @ 1 GHz. Still, the crypto accelerator runs in the background while the CPU can do other things, it probably consumes less power, and it's much more key-agile: performance on the Cortex-A8 drops rapidly for smaller messages, since the key schedule (of this implementation) takes as much time as encrypting about 300 bytes of data.

    I have no idea why the AES module would get such a slow clock though, since on the DM814x it can be clocked at 100 or 192 MHz (and the EIP-39 flyer even claims clock rates up to 600 MHz, no doubt in some particular implementation process that the marketing department favored). I haven't done any actual benchmarks yet.
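    The throughput arithmetic above is easy to redo for other clock rates or cycle counts. A small C helper (hypothetical name); note that the quoted figures come out when MB is taken as 2^20 bytes, i.e. MiB:

```c
/* Throughput in MiB/s (2^20 bytes per second) for an engine that needs
 * cycles_per_block clock cycles per 16-byte block at clock_hz. */
static double aes_throughput_mib_s(double cycles_per_block, double clock_hz)
{
    double blocks_per_s = clock_hz / cycles_per_block;

    return blocks_per_s * 16.0 / (1024.0 * 1024.0);
}
```

    For example, 32 cycles at 25 MHz gives about 11.92 MiB/s, 308 cycles at 300 MHz about 14.86 MiB/s, and 32 cycles at 200 MHz about 95.4 MiB/s.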

  • Matthijs van Duin said:
    I haven't done any actual benchmarks yet.

    Finally got around to doing some preliminary benchmarks... but this turned out to be harder than I thought, since I wasn't using DMA and the cortex-a8 was having a lot of trouble getting data in/out of the AES module fast enough, even with the use of Neon load/store intrinsics. I eventually ended up with a loop that compiled to:

    .L8:
            vld1.64 {d16-d17}, [r3:64]  @ read (and discard) ciphertext
            vst1.64 {d18-d19}, [r3:64]  @ write next plaintext (constant)
            subs    r2, r2, #1
            bne     .L8

    Note the complete lack of any check whether the AES has data available or is ready to accept new data. Nevertheless, this still didn't result in read underrun / write overrun, except when CCM mode was selected (which is twice as slow as all other modes).

    Evidently, am33xx-clocks.dtsi is wrong about the modules being clocked at 24 MHz. It actually looks more like the AES module is clocked at 200 MHz, i.e. it is hooked up to the L3F and its functional clock equals its interface clock. That would imply a max throughput around 90-95 MiB/s (for 128-bit keys, and excluding CCM). Sweet.