I set the clock to 2 MHz and checked the waveform again. The transfer time for 1 byte is:
mbed : 24 µs
non-mbed : 8.5 µs
The throughput with mbed is only about 1/3 of the non-mbed version (8.5 / 24 ≈ 0.35), so the frame rate difference may well be a correct result.
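To double-check the arithmetic, here is a small sketch that works out the ratio from the two measured times. The only thing not taken from my measurements is the assumption that one byte needs 8 clock cycles (as it would on SPI), which gives a rough lower bound for the byte time at 2 MHz. The function and constant names are just for illustration.

```python
# Measured per-byte transfer times from the waveform, in microseconds.
MBED_BYTE_US = 24.0
NON_MBED_BYTE_US = 8.5

# Assumption (not measured): 8 clock cycles per byte on the 2 MHz clock.
CLOCK_HZ = 2_000_000
BITS_PER_BYTE = 8


def bytes_per_second(byte_time_us):
    """Effective throughput for a given per-byte transfer time."""
    return 1_000_000 / byte_time_us


def ideal_byte_time_us(clock_hz=CLOCK_HZ, bits=BITS_PER_BYTE):
    """Time to clock out one byte with no gap between bytes."""
    return bits / clock_hz * 1_000_000


# Relative throughput of mbed compared with the non-mbed version.
ratio = NON_MBED_BYTE_US / MBED_BYTE_US

# Time per byte that is not spent clocking bits (software overhead, gaps).
ideal_us = ideal_byte_time_us()
mbed_overhead_us = MBED_BYTE_US - ideal_us
non_mbed_overhead_us = NON_MBED_BYTE_US - ideal_us

print(f"mbed     : {bytes_per_second(MBED_BYTE_US):.0f} bytes/s, "
      f"overhead {mbed_overhead_us:.1f} us/byte")
print(f"non-mbed : {bytes_per_second(NON_MBED_BYTE_US):.0f} bytes/s, "
      f"overhead {non_mbed_overhead_us:.1f} us/byte")
print(f"mbed / non-mbed throughput ratio: {ratio:.2f}")
```

If the 8-cycles-per-byte assumption holds, most of each byte time in both versions is spent in gaps between bytes rather than in the clocked bits themselves, and the gap is much larger with mbed.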
I’m also working on a “2-peripheral” version without mbed, but I don’t even know yet whether extending it to two peripherals is possible in the first place. I still need more time.