TensorFlow Lite quantization (float32 to int8) does not reduce latency

Hi, I recently applied post-training quantization (float32 to int8) to a model running on a Xiao nRF52840. The model size in memory is reduced, but the inference time is not. I checked the nRF52840 datasheet and it has an FPU. I am wondering how the FPU is configured here, and whether disabling it would reduce the inference time?
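
For context, the conversion I mean is full-integer post-training quantization, roughly like the minimal sketch below (the model path, input shape, and the representative dataset are placeholder assumptions, not my exact setup):

```python
import numpy as np
import tensorflow as tf

# Load the trained Keras model (path is illustrative).
model = tf.keras.models.load_model("model.h5")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Representative dataset so the converter can calibrate activation
# ranges; `sample_inputs` stands in for real calibration data.
sample_inputs = np.random.rand(100, 1, 28, 28).astype(np.float32)

def representative_data_gen():
    for sample in sample_inputs:
        yield [sample]

converter.representative_dataset = representative_data_gen

# Force full int8 quantization so no float ops remain in the graph.
# If any op lacks an int8 kernel, conversion fails instead of silently
# falling back to float32 (a common cause of unchanged latency).
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```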