Hi,
I am running the same network with ReLU, ReLU6, and ReLU8 activations.
The inference time is 12 ms with ReLU, but 33 ms with both ReLU6 and ReLU8.
The supplied documents state that "ReLU8 can be performed with just shift operation in fixed point inference without needing a floating-point computation / look up table",
so ReLU8 should not run at the same speed as ReLU6.
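My understanding of that statement, as a rough sketch only (it assumes a power-of-two activation scale and is not necessarily the exact TIDL kernel): because the clip bound 8 and the resulting output scale 8/256 = 2**-5 are powers of two, the clip and the requantization reduce to comparisons and a shift, with no floating-point multiply or look-up table. The function name relu8_fixed and the Q-format parameter q below are purely illustrative.

def relu8_fixed(x_q: int, q: int) -> int:
    """Illustration only: ReLU8 on a Q-format input x_real ~= x_q * 2**-q,
    producing an unsigned 8-bit output with scale 8/256 = 2**-5
    (assumes a power-of-two activation scale, not necessarily TIDL's scheme)."""
    upper = 8 << q                      # 8.0 expressed in the input Q format
    y = min(max(x_q, 0), upper)         # ReLU + clip at 8.0
    # Requantize [0, 8.0] onto [0, 255]: since both scales are powers of two,
    # the rescale is a pure shift -- no float multiply, no look-up table.
    shift = q - 5
    y = y >> shift if shift >= 0 else y << -shift
    return min(y, 255)                  # saturate the top code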
I am implementing ReLU8 as a Relu node followed by a Clip(0, 8). The SDK version is "rtos-j721e-evm-07_01_00_11".
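For reference, the pattern looks roughly like this when built with the ONNX helper API (a minimal sketch only, assuming an ONNX import path; the tensor shapes and node names below are placeholders, not my actual network):

import onnx
from onnx import helper, TensorProto

# Placeholder input/output shapes, for illustration only
inp = helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 64, 56, 56])
out = helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 64, 56, 56])

# Clip min/max supplied as initializers (opset 11+ style)
clip_min = helper.make_tensor("clip_min", TensorProto.FLOAT, [], [0.0])
clip_max = helper.make_tensor("clip_max", TensorProto.FLOAT, [], [8.0])

relu = helper.make_node("Relu", ["input"], ["relu_out"], name="relu")
clip = helper.make_node("Clip", ["relu_out", "clip_min", "clip_max"],
                        ["output"], name="clip_0_8")

graph = helper.make_graph([relu, clip], "relu8_pattern", [inp], [out],
                          initializer=[clip_min, clip_max])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 11)])
onnx.checker.check_model(model)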