Latency of cdc_acm driver in AArch64

This forum is for topics dealing with problems with software specifically in the AArch64 repo.

Latency of cdc_acm driver in AArch64

Postby axeloide » Thu Feb 23, 2017 6:34 pm

Hi everyone!

I'm having trouble with the performance of USB virtual serial ports (e.g. /dev/ttyACM0) on ARMv8 devices when accessing them via the python-pyserial (v3.2.1-2) package for Python (v3.6.0).
I'm aware of how crude my approach is, which I am about to explain, but I haven't much experience in investigating this sort of thing. I'd be very grateful if somebody could point me into the right direction.

Until January I had been using ARMv7 on Raspberry Pi 3 without any trouble: I could (and still can) send and receive data to my custom microcontroller-board.

This is the info given by lsusb on the Rpi3 running ARMv7:
$this->bbcode_second_pass_code('', '
Bus 001 Device 004: ID 2504:0301
Couldn't open device, some information will be missing
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 2 Communications
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x2504
idProduct 0x0301
bcdDevice 1.01
iManufacturer 1
iProduct 2
iSerial 0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 67
bNumInterfaces 2
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xc0
Self Powered
MaxPower 100mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 0
iInterface 0
CDC Header:
bcdCDC 1.10
CDC Call Management:
bmCapabilities 0x01
call management
bDataInterface 1
CDC ACM:
bmCapabilities 0x06
sends break
line coding and serial state
CDC Union:
bMasterInterface 0
bSlaveInterface 1
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0010 1x 16 bytes
bInterval 8
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 10 CDC Data
bInterfaceSubClass 0
bInterfaceProtocol 0
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
')

And the dmesg of the Rpi3 running ARMv7 also looks fine:
$this->bbcode_second_pass_code('', '
[ 5.024936] usb 1-1.5: new full-speed USB device number 4 using dwc_otg
[ 5.130277] usb 1-1.5: language id specifier not provided by device, defaulting to English
[ 5.142658] usb 1-1.5: New USB device found, idVendor=2504, idProduct=0301
[ 5.145550] usb 1-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[...]
[ 6.598722] cdc_acm 1-1.5:1.0: ttyACM0: USB ACM device
[ 6.604938] usbcore: registered new interface driver cdc_acm
[ 6.606381] cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
')

I've been now trying to get the very same microcontroller-board to communicate with exactly the same software and pySerial version running on AArch64 (BTW, the reason for this is because the mongodb package is not available anymore on 32bit platforms).
Initially I tried AArch64 on the Rpi3, but it seems quite immature and unstable and I assumed that the communication problems were due to this.
I therefore tried on an Odroid-C2 and was really impressed by the performance boost, but ran again into serial port communication problems, which this time I could not blame onto any sluggish performance issues of the OS as a whole.

Since my app is quite complex, I decided to narrow down the issue, by writing a minimal test script in Python, which only depends on the pySerial package:
$this->bbcode_second_pass_code('', '
import sys
import logging
import serial
import time

logger = logging.getLogger("test_serialport")
logger.setLevel(logging.INFO)
log_handler = logging.StreamHandler(sys.stdout)
log_format = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
log_handler.setFormatter(log_format)
logger.addHandler(log_handler)

# portname = 'COM6'
portname = '/dev/ttyACM0'
iterations = 10000


def compute_checksum(bytes_received):
checksum = 0
for b in bytes_received[:-2]:
checksum += b
if checksum > 65535: # Simulate overflowing of a UInt16 value.
checksum -= 65535
return checksum

worst_write_delay = 0.0
worst_read_delay = 0.0

def send_and_receive(bytes_to_send):
global worst_write_delay, worst_read_delay

start_write = time.perf_counter()
serial_port.write(bytes_to_send)
end_write = time.perf_counter()
write_delay = end_write - start_write
if write_delay > worst_write_delay:
worst_write_delay = write_delay

expected_reply_length = len(bytes_to_send) + 3

start_read = time.perf_counter()
message_received = serial_port.read(expected_reply_length)
end_read = time.perf_counter()
read_delay = end_read - start_read
if read_delay > worst_read_delay:
worst_read_delay = read_delay

assert len(message_received) == expected_reply_length

expected_checksum = compute_checksum(message_received).to_bytes(2, byteorder='big')
received_checksum = message_received[-2:]
assert received_checksum == expected_checksum

return message_received[:-2]

if __name__ == "__main__":

logger.info("Setting up pySerial object...")
serial_port = serial.Serial(port=portname,
rtscts=True,
dsrdtr=False,
timeout=2.0,
writeTimeout=2.0)

logger.info("Sending 'reset' command...")
cmd_reset = bytes.fromhex("80 00 00 00 00 00")
send_and_receive(cmd_reset)

logger.info("Sending %d echo commands and receiving their replies..." % iterations)
start_time = time.perf_counter()
for idx_attempt in range(iterations):
cmd_echo = bytes.fromhex("02 01 04") + idx_attempt.to_bytes(3, byteorder='big')
reply = send_and_receive(cmd_echo)
assert cmd_echo[:3] == reply[:3]
assert cmd_echo[-3:] == reply[-3:]
end_time = time.perf_counter()

elapsed_period_in_seconds = end_time - start_time

logger.info("Sent and received %d echo messages in %.0fms" % (iterations, elapsed_period_in_seconds*1000.0, ))
logger.info(" Bandwidth of writing: %.0f bytes/s" % (iterations * 6.0 / elapsed_period_in_seconds, ))
logger.info(" Bandwidth of reading: %.0f bytes/s" % (iterations * 9.0 / elapsed_period_in_seconds, ))
logger.info(" Worst write delay: %fms" % (worst_write_delay * 1000.0, ))
logger.info(" Worst read delay: %fms" % (worst_read_delay * 1000.0,))

serial_port.close()
')

These are some typical results on the Rpi3 running on ARMv7:
$this->bbcode_second_pass_code('', '
$ python test_serialport.py
2017-02-23 17:32:13,501 - test_serialport - INFO - Setting up pySerial object...
2017-02-23 17:32:13,504 - test_serialport - INFO - Sending 'reset' command...
2017-02-23 17:32:13,666 - test_serialport - INFO - Sending 10000 echo commands and receiving their replies...
2017-02-23 17:32:27,425 - test_serialport - INFO - Sent and received 10000 echo messages in 13759ms
2017-02-23 17:32:27,426 - test_serialport - INFO - Bandwidth of writing: 4361 bytes/s
2017-02-23 17:32:27,427 - test_serialport - INFO - Bandwidth of reading: 6541 bytes/s
2017-02-23 17:32:27,427 - test_serialport - INFO - Worst write delay: 5.701809ms
2017-02-23 17:32:27,428 - test_serialport - INFO - Worst read delay: 160.945806ms
$ python test_serialport.py
2017-02-23 17:35:06,387 - test_serialport - INFO - Setting up pySerial object...
2017-02-23 17:35:06,388 - test_serialport - INFO - Sending 'reset' command...
2017-02-23 17:35:07,194 - test_serialport - INFO - Sending 10000 echo commands and receiving their replies...
2017-02-23 17:35:20,948 - test_serialport - INFO - Sent and received 10000 echo messages in 13753ms
2017-02-23 17:35:20,949 - test_serialport - INFO - Bandwidth of writing: 4363 bytes/s
2017-02-23 17:35:20,950 - test_serialport - INFO - Bandwidth of reading: 6544 bytes/s
2017-02-23 17:35:20,950 - test_serialport - INFO - Worst write delay: 2.554345ms
2017-02-23 17:35:20,951 - test_serialport - INFO - Worst read delay: 804.715371ms
$ python test_serialport.py
2017-02-23 17:35:27,971 - test_serialport - INFO - Setting up pySerial object...
2017-02-23 17:35:27,972 - test_serialport - INFO - Sending 'reset' command...
2017-02-23 17:35:27,974 - test_serialport - INFO - Sending 10000 echo commands and receiving their replies...
2017-02-23 17:35:41,735 - test_serialport - INFO - Sent and received 10000 echo messages in 13760ms
2017-02-23 17:35:41,736 - test_serialport - INFO - Bandwidth of writing: 4360 bytes/s
2017-02-23 17:35:41,736 - test_serialport - INFO - Bandwidth of reading: 6541 bytes/s
2017-02-23 17:35:41,737 - test_serialport - INFO - Worst write delay: 9.331508ms
2017-02-23 17:35:41,738 - test_serialport - INFO - Worst read delay: 1.537795ms
')

And these are some typical results on the Odroid-C2 running on AArch64:
$this->bbcode_second_pass_code('', '
$ python test_serialport.py
2017-02-23 17:40:03,904 - test_serialport - INFO - Setting up pySerial object...
2017-02-23 17:40:03,905 - test_serialport - INFO - Sending 'reset' command...
2017-02-23 17:40:03,906 - test_serialport - INFO - Sending 10000 echo commands and receiving their replies...
2017-02-23 17:40:16,415 - test_serialport - INFO - Sent and received 10000 echo messages in 12508ms
2017-02-23 17:40:16,416 - test_serialport - INFO - Bandwidth of writing: 4797 bytes/s
2017-02-23 17:40:16,417 - test_serialport - INFO - Bandwidth of reading: 7195 bytes/s
2017-02-23 17:40:16,417 - test_serialport - INFO - Worst write delay: 1.519065ms
2017-02-23 17:40:16,417 - test_serialport - INFO - Worst read delay: 1.622070ms
$ python test_serialport.py
2017-02-23 17:40:18,959 - test_serialport - INFO - Setting up pySerial object...
2017-02-23 17:40:18,961 - test_serialport - INFO - Sending 'reset' command...
2017-02-23 17:40:18,962 - test_serialport - INFO - Sending 10000 echo commands and receiving their replies...
2017-02-23 17:40:31,469 - test_serialport - INFO - Sent and received 10000 echo messages in 12506ms
2017-02-23 17:40:31,469 - test_serialport - INFO - Bandwidth of writing: 4798 bytes/s
2017-02-23 17:40:31,470 - test_serialport - INFO - Bandwidth of reading: 7197 bytes/s
2017-02-23 17:40:31,470 - test_serialport - INFO - Worst write delay: 1.806076ms
2017-02-23 17:40:31,470 - test_serialport - INFO - Worst read delay: 1.546066ms
$ python test_serialport.py
2017-02-23 17:40:32,831 - test_serialport - INFO - Setting up pySerial object...
2017-02-23 17:40:32,833 - test_serialport - INFO - Sending 'reset' command...
2017-02-23 17:40:33,521 - test_serialport - INFO - Sending 10000 echo commands and receiving their replies...
2017-02-23 17:40:46,027 - test_serialport - INFO - Sent and received 10000 echo messages in 12506ms
2017-02-23 17:40:46,028 - test_serialport - INFO - Bandwidth of writing: 4798 bytes/s
2017-02-23 17:40:46,028 - test_serialport - INFO - Bandwidth of reading: 7197 bytes/s
2017-02-23 17:40:46,029 - test_serialport - INFO - Worst write delay: 684.704967ms
2017-02-23 17:40:46,029 - test_serialport - INFO - Worst read delay: 2.746117ms
$ python test_serialport.py
2017-02-23 17:40:49,296 - test_serialport - INFO - Setting up pySerial object...
2017-02-23 17:40:49,298 - test_serialport - INFO - Sending 'reset' command...
2017-02-23 17:40:50,128 - test_serialport - INFO - Sending 10000 echo commands and receiving their replies...
2017-02-23 17:41:02,638 - test_serialport - INFO - Sent and received 10000 echo messages in 12509ms
2017-02-23 17:41:02,639 - test_serialport - INFO - Bandwidth of writing: 4796 bytes/s
2017-02-23 17:41:02,640 - test_serialport - INFO - Bandwidth of reading: 7195 bytes/s
2017-02-23 17:41:02,640 - test_serialport - INFO - Worst write delay: 827.009477ms
2017-02-23 17:41:02,640 - test_serialport - INFO - Worst read delay: 2.740114ms
')

As you can see, the simplified script does actually work on both systems. I therefore rule out any problems with the µC-Board side of things (It actually runs fine on a Win10 PC, at about 7fold higher bandwidths). The main difference I see, is that the 32bit Rpi tends to have sporadic high latency in read operations, while the 64bit Odroid-C2 tends to have sporadic high latency in write operations. This would explain why my (complex) app is having trouble on AArch64, since it uses a read-timeout of zero, but a write-timeout of 500ms. Increasing the write-timeout doesn't fully solve my problems, because in situations where I attempt to write to the serial port while at the same time a high bandwidth of data is being received, the occasional high write-latency seems to cause a read-buffer overrun.

Just for the sake of completeness, these are the lsusb and dmesg infos on the Odroid-C2 running AArch64:
$this->bbcode_second_pass_code('', '
Bus 001 Device 003: ID 2504:0301
Couldn't open device, some information will be missing
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 2 Communications
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x2504
idProduct 0x0301
bcdDevice 1.01
iManufacturer 1
iProduct 2
iSerial 0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 67
bNumInterfaces 2
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xc0
Self Powered
MaxPower 100mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 0
iInterface 0
CDC Header:
bcdCDC 1.10
CDC Call Management:
bmCapabilities 0x01
call management
bDataInterface 1
CDC ACM:
bmCapabilities 0x06
sends break
line coding and serial state
CDC Union:
bMasterInterface 0
bSlaveInterface 1
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0010 1x 16 bytes
bInterval 8
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 10 CDC Data
bInterfaceSubClass 0
bInterfaceProtocol 0
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
')

$this->bbcode_second_pass_code('', '
[ 4.972448] usb 1-1.3: new full-speed USB device number 3 using dwc_otg
[ 5.074417] usb 1-1.3: string descriptor 0 malformed (err = -61), defaulting to 0x0409
[ 5.080118] cdc_acm 1-1.3:1.0: This device cannot do calls on its own. It is not a modem.
[ 5.085984] cdc_acm 1-1.3:1.0: ttyACM0: USB ACM device
')

Any ideas on why the write latency does occasionally go up from about 2ms to about 800ms?
Is there any way to configure the size of the receive buffers of the cdc_acm kernel module?

Thank's in advance,
Axel.
axeloide
 
Posts: 1
Joined: Fri Feb 10, 2017 2:12 pm

Return to ARMv8

Who is online

Users browsing this forum: No registered users and 5 guests