Appendix - PTP Guide

PTP Profiles Guide

Overview

This guide provides instructions on setting the Precision Time Protocol (PTP) profile of the Ouster sensor. The profile of the Ouster sensor and your master clock must match for time synchronization to be possible.

PTP Profiles

Several PTP profiles are in common use. The profiles supported by the Ouster sensor are listed below:

"default" - The IEEE 1588 Default PTP profile addresses many common applications. Most PTP capable devices support the Default profile.

"gptp" - Generalized PTP (gPTP) is the common name for the IEEE standard 802.1AS-2011, which improves the interoperability of PTP by simplifying the supported options. The gPTP profile is useful when using the Ouster sensor with gPTP compatible hardware such as an Audio Video Bridging (AVB) device, e.g. the MOTU AVB.

"automotive-slave" - The Automotive Slave PTP profile is an extension to gPTP for automotive specific use cases. In particular, it disables the Best Master Clock Algorithm (BMCA) and handles time steps to expedite convergence.

"default-l2-relaxed" - This profile is based on the "default" profile, but with the network transport set to L2 and a relaxed 1 second time step threshold.

PTP HTTP API

The PTP profile of the sensor is changed using an HTTP PUT request. This can be done with any HTTP client, such as HTTPie, curl, or Advanced REST Client.

The endpoint is /api/v1/time/ptp/profile. A GET request returns the current profile and a PUT request sets it.

Valid values are listed below; the double quotes are part of the value:

"default"

"gptp"

"automotive-slave"

"default-l2-relaxed"

Note

Changing the PTP profile does not require reinitialization or writing the configuration text file to be persistent. It is persistent as soon as a valid PUT request is executed and a valid response is received.

Enabling the PTP profiles

Below are some examples using popular command-line tools.

Example using cURL

In this example we are setting the PTP profile of the Ouster sensor to "gptp" using the cURL command line tool.

Command

curl -X PUT -H "Content-Type: application/json" -d '"gptp"' http://<sensor_hostname>/api/v1/time/ptp/profile/

Response
```
"gptp"%
```

Example using HTTPie

In this example we are setting the PTP profile of the Ouster sensor to "default" using the HTTPie command line tool.

Command

http PUT http://<sensor_hostname>/api/v1/time/ptp/profile <<< '"default"'

Response
```
"default"%
```
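
The same PUT request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_profile_request` and `set_ptp_profile` are illustrative, and the hostname passed in is a placeholder for your sensor.

```python
import json
import urllib.request

# Profile names listed in the "PTP Profiles" section above.
VALID_PROFILES = ("default", "gptp", "automotive-slave", "default-l2-relaxed")


def build_profile_request(sensor_hostname, profile):
    """Build (but do not send) the PUT request that sets the PTP profile."""
    if profile not in VALID_PROFILES:
        raise ValueError(f"unsupported PTP profile: {profile!r}")
    url = f"http://{sensor_hostname}/api/v1/time/ptp/profile"
    # The body is a JSON string, so the double quotes are part of the payload.
    body = json.dumps(profile).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def set_ptp_profile(sensor_hostname, profile, timeout=5.0):
    """Send the request and return the profile echoed back by the sensor."""
    request = build_profile_request(sensor_hostname, profile)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Validating the profile name before sending avoids a round trip for an obvious typo. Against a reachable sensor, `set_ptp_profile("<sensor_hostname>", "gptp")` would be expected to return the same quoted value shown in the cURL response.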

Sync Verification

Please see the Verifying Operation section for details on how to verify the sensor is synchronized.

PTP Quickstart Guide

Overview

There are many possible configurations for a PTP network. This quick start guide aims to cover the basics, using Ubuntu 18.04 as an example. It provides configuration settings for a commercial PTP grandmaster clock and also provides directions on setting up a Linux computer (Ubuntu 18.04) to function as a PTP grandmaster.

The linuxptp project provides a suite of PTP tools that can be used to serve as a PTP master clock for a local network of sensors.

Assumptions

Command line Linux knowledge (e.g., package management, command line familiarity, etc.).
Ethernet interfaces that support hardware timestamping.
Ubuntu 18.04 is assumed for this tutorial, but any modern distribution should suffice.
Knowledge of systemd service configuration and management.
Familiarity with Linux permissions.

Physical Network Setup

Ensure the Ouster sensor is connected to the PTP master clock with at most one network switch. Ideally the sensor should be connected directly to the PTP grandmaster. Alternatively, a simple layer-2 gigabit Ethernet switch will suffice. Multiple switches are not recommended and will add unnecessary jitter.

Third Party Grandmaster Clock

For the highest absolute accuracy, use a dedicated grandmaster clock, often one with a GPS receiver.

It must be configured with the following parameters, which match the linuxptp client defaults:

Transport: UDP IPv4
Delay Mechanism: E2E
Sync Mode: Two-Step
Announce Interval: 1 - sent every 2 seconds
Sync Interval: 0 - sent every 1 second
Delay Request Interval: 0 - sent every 1 second

For more settings, review the port_data_set field returned from the sensor’s HTTP /time/ptp interface.
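
The interval values above are base-2 logarithms of the message period in seconds, which is why an Announce Interval of 1 corresponds to one message every 2 seconds. A small illustrative helper (the function name is hypothetical) makes the conversion explicit:

```python
def log_interval_to_seconds(log_interval):
    """Convert a PTP log2 message interval to a period in seconds."""
    return 2.0 ** log_interval


# Values taken from the parameter list above.
for name, log_value in (("announce", 1), ("sync", 0), ("delay request", 0)):
    print(f"{name}: every {log_interval_to_seconds(log_value):g} s")
```

Negative values work the same way, so a log interval of -3 would mean eight messages per second.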

Linux PTP Grandmaster Clock

An alternative to an external grandmaster PTP clock is to run a local Linux PTP master clock if accuracy allows. This is often implemented on a vehicle computer that interfaces directly with the lidar sensors.

This section outlines how to configure a master clock.

Example Network Setup

This section assumes the following network setup as it has elements of a local master clock and the option for an upstream PTP time source.

+-------------------------------------+
| Ubuntu 18.04 System                 |
| * 2x Intel i210 Ethernet Interfaces |
| * Linux PTP service                 |
|                                     |
|      eno1                 eno2      |
+-------+---------------------+-------+
        |                     |
+-------+-------+    +--------+------+
| Trimble GM100 |    |               + +
|  GPS -> PTP   |    |  Ouster OS1   | |
|  grandmaster  |    |               | |
|  (optional)   |    |               | |
+---------------+    +---------------- |
                      +--------------- +

The focus is on configuring the Linux PTP service to serve a common clock to all the downstream Ouster sensors using the Linux system time from the Ubuntu host machine.

Optionally, a grandmaster clock can be added to discipline the system time of the Linux host.

Installing Necessary Packages

Several packages are needed for PTP functionality and verification:

linuxptp - Linux PTP package with the following components:
- ptp4l daemon to manage hardware and participate as a PTP node
- phc2sys to synchronize the Ethernet controller’s hardware clock to the Linux system clock or shared memory region
- pmc to query the PTP nodes on the network.
chrony - An NTP and PTP time synchronization daemon. It can be configured to listen to both NTP time sources via the Internet and a PTP master clock such as one provided by a GPS with PTP support. This helps validate that the time configuration makes sense given multiple time sources.
ethtool - A tool to query the hardware and driver capabilities of a given Ethernet interface.

$ sudo apt update
...
Reading package lists... Done
Building dependency tree
Reading state information... Done

$ sudo apt install linuxptp chrony ethtool
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  chrony ethtool linuxptp
0 upgraded, 3 newly installed, 0 to remove and 29 not upgraded.
Need to get 430 kB of archives.
After this operation, 1,319 kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu bionic/main amd64 ethtool amd64 1:4.15-0ubuntu1 [114 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu bionic/universe amd64 linuxptp amd64 1.8-1 [112 kB]
Get:3 http://us.archive.ubuntu.com/ubuntu bionic-updates/main amd64 chrony amd64 3.2-4ubuntu4.2 [203 kB]
Fetched 430 kB in 1s (495 kB/s)
Selecting previously unselected package ethtool.
(Reading database ... 117835 files and directories currently installed.)
Preparing to unpack .../ethtool_1%3a4.15-0ubuntu1_amd64.deb ...
Unpacking ethtool (1:4.15-0ubuntu1) ...
Selecting previously unselected package linuxptp.
Preparing to unpack .../linuxptp_1.8-1_amd64.deb ...
Unpacking linuxptp (1.8-1) ...
Selecting previously unselected package chrony.
Preparing to unpack .../chrony_3.2-4ubuntu4.2_amd64.deb ...
Unpacking chrony (3.2-4ubuntu4.2) ...
Setting up linuxptp (1.8-1) ...
Processing triggers for ureadahead (0.100.0-20) ...
ureadahead will be reprofiled on next reboot
Setting up chrony (3.2-4ubuntu4.2) ...
Processing triggers for systemd (237-3ubuntu10.13) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Setting up ethtool (1:4.15-0ubuntu1) ...

Ethernet Hardware Timestamp Verification

Identify the Ethernet interface to be used on the client (Linux) machine, e.g., eno1. Run the ethtool utility and query this network interface for supported capabilities.

Output of ethtool -T for a functioning Intel i210 Ethernet interface:

$ sudo ethtool -T eno1
Time stamping parameters for eno1:
Capabilities:
        hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
        software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
        hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
        software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
        software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
        hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
        off                   (HWTSTAMP_TX_OFF)
        on                    (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
        none                  (HWTSTAMP_FILTER_NONE)
        all                   (HWTSTAMP_FILTER_ALL)
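
When several hosts or interfaces need checking, the capability test can be scripted. The sketch below scans `ethtool -T` output for the three hardware flags shown above, which are commonly cited as the requirements for linuxptp hardware time stamping. The function name and the sample text are illustrative assumptions.

```python
REQUIRED_FLAGS = (
    "SOF_TIMESTAMPING_TX_HARDWARE",
    "SOF_TIMESTAMPING_RX_HARDWARE",
    "SOF_TIMESTAMPING_RAW_HARDWARE",
)


def missing_hw_timestamp_flags(ethtool_output):
    """Return the required flags that do not appear in `ethtool -T` output."""
    return [flag for flag in REQUIRED_FLAGS if flag not in ethtool_output]


# Abbreviated sample modeled on the i210 output above.
sample = """\
Capabilities:
        hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
        software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
        hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
        hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
"""
print(missing_hw_timestamp_flags(sample) or "hardware time stamping supported")
```

An empty list means all three flags were found. Any flag that is returned points at an interface or driver that can only do software time stamping.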

Configurations

Configuring `ptp4l` for Multiple Ports

On a Linux system with multiple Ethernet ports (e.g., Intel i210), /etc/linuxptp/ptp4l.conf needs to be configured to support all of them.

boundary_clock_jbod 1
[eno1]
[eno2]

Note

Add the above required modification at the end of the existing file. Deleting or editing the default settings section of the ptp4l.conf file will result in an error.

The default systemd service file for Ubuntu 18.04 attempts to use the eth0 interface on the command line. Override the systemd service file so that the interfaces are taken from the configuration file instead of being hard coded in the service file.

Create a systemd drop-in directory to override the system service file:

$ sudo mkdir -p /etc/systemd/system/ptp4l.service.d

Create a file at /etc/systemd/system/ptp4l.service.d/override.conf with the following contents:

[Service]
ExecStart=
ExecStart=/usr/sbin/ptp4l -f /etc/linuxptp/ptp4l.conf

Restart the ptp4l service so the change takes effect:

$ sudo systemctl daemon-reload
$ sudo systemctl restart ptp4l
$ sudo systemctl status ptp4l
* ptp4l.service - Precision Time Protocol (PTP) service
   Loaded: loaded (/lib/systemd/system/ptp4l.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/ptp4l.service.d
           └─override.conf
   Active: active (running) since Wed 2019-03-13 14:38:57 PDT; 3s ago
     Docs: man:ptp4l
 Main PID: 25783 (ptp4l)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/ptp4l.service
           └─25783 /usr/sbin/ptp4l -f /etc/linuxptp/ptp4l.conf

Mar 13 14:38:57 leadlizard ptp4l[25783]: [590188.756] port 1: INITIALIZING to LISTENING on INITIALIZE
Mar 13 14:38:57 leadlizard ptp4l[25783]: [590188.756] driver changed our HWTSTAMP options
Mar 13 14:38:57 leadlizard ptp4l[25783]: [590188.756] tx_type   1 not 1
Mar 13 14:38:57 leadlizard ptp4l[25783]: [590188.756] rx_filter 1 not 12
Mar 13 14:38:57 leadlizard ptp4l[25783]: [590188.756] port 2: INITIALIZING to LISTENING on INITIALIZE
Mar 13 14:38:57 leadlizard ptp4l[25783]: [590188.757] port 0: INITIALIZING to LISTENING on INITIALIZE
Mar 13 14:38:57 leadlizard ptp4l[25783]: [590188.757] port 1: link up
Mar 13 14:38:57 leadlizard ptp4l[25783]: [590188.757] port 2: link down
Mar 13 14:38:57 leadlizard ptp4l[25783]: [590188.757] port 2: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
Mar 13 14:38:58 leadlizard ptp4l[25783]: [590189.360] port 1: new foreign master 001747.fffe.700038-1

The above systemctl status ptp4l console output shows that systemd read the override file created earlier before starting ptp4l. The status was captured a few seconds after the restart command.

The log output shows that a grandmaster clock has been discovered on port 1 (eno1) and port 2 (eno2) is currently disconnected and in the faulty state as expected. In the test network a Trimble Thunderbolt PTP GM100 Grandmaster Clock is attached on eno1.

Logs can be monitored (i.e. followed) like so:

$ journalctl -f -u ptp4l
-- Logs begin at Fri 2018-11-30 06:40:50 PST. --
Mar 13 14:51:37 leadlizard ptp4l[25783]: [590948.224] master offset   -17 s2 freq  -25963 path delay 14183
Mar 13 14:51:38 leadlizard ptp4l[25783]: [590949.224] master offset   -13 s2 freq  -25964 path delay 14183
Mar 13 14:51:39 leadlizard ptp4l[25783]: [590950.225] master offset    35 s2 freq  -25920 path delay 14192
Mar 13 14:51:40 leadlizard ptp4l[25783]: [590951.225] master offset   -59 s2 freq  -26003 path delay 14201
Mar 13 14:51:41 leadlizard ptp4l[25783]: [590952.225] master offset   -24 s2 freq  -25986 path delay 14201
Mar 13 14:51:42 leadlizard ptp4l[25783]: [590953.225] master offset   -39 s2 freq  -26008 path delay 14201
Mar 13 14:51:43 leadlizard ptp4l[25783]: [590954.225] master offset    53 s2 freq  -25928 path delay 14201
Mar 13 14:51:44 leadlizard ptp4l[25783]: [590955.226] master offset   -85 s2 freq  -26050 path delay 14207
Mar 13 14:51:45 leadlizard ptp4l[25783]: [590956.226] master offset   127 s2 freq  -25863 path delay 14207
Mar 13 14:51:46 leadlizard ptp4l[25783]: [590957.226] master offset     9 s2 freq  -25943 path delay 14208
Mar 13 14:51:47 leadlizard ptp4l[25783]: [590958.226] master offset   -23 s2 freq  -25973 path delay 14208
Mar 13 14:51:48 leadlizard ptp4l[25783]: [590959.226] master offset   -61 s2 freq  -26018 path delay 14190
Mar 13 14:51:49 leadlizard ptp4l[25783]: [590960.226] master offset    69 s2 freq  -25906 path delay 14190
Mar 13 14:51:50 leadlizard ptp4l[25783]: [590961.226] master offset   -73 s2 freq  -26027 path delay 14202
Mar 13 14:51:51 leadlizard ptp4l[25783]: [590962.226] master offset    19 s2 freq  -25957 path delay 14202
Mar 13 14:51:52 leadlizard ptp4l[25783]: [590963.226] master offset   147 s2 freq  -25823 path delay 14202
...
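
To judge convergence over a longer period than is practical to read by eye, the `master offset` lines can be summarized with a short script. This is a minimal sketch that assumes the log format shown above, where ptp4l reports the offset in nanoseconds. The function names are illustrative.

```python
import re

# Matches ptp4l lines such as:
#   "... master offset   -17 s2 freq  -25963 path delay 14183"
OFFSET_RE = re.compile(
    r"master offset\s+(-?\d+)\s+s(\d)\s+freq\s+([+-]?\d+)\s+path delay\s+(-?\d+)"
)


def parse_master_offsets(log_text):
    """Return the master offset (nanoseconds) from each matching log line."""
    return [int(match.group(1)) for match in OFFSET_RE.finditer(log_text)]


def summarize(offsets):
    """Return (mean, worst absolute offset) in nanoseconds."""
    if not offsets:
        raise ValueError("no 'master offset' lines found")
    return sum(offsets) / len(offsets), max(abs(value) for value in offsets)
```

The text to parse could come from `journalctl -u ptp4l` redirected to a file. The worst absolute offset is usually the more useful figure, because positive and negative offsets cancel in the mean.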

Configuring `ptp4l` as a Local Master Clock

The IEEE-1588 Best Master Clock Algorithm (BMCA) selects a grandmaster clock from the candidate masters on the network by comparing the attributes they advertise, such as priority and clock class. In most networks there should be only a single master. In the example network the Ubuntu machine is configured with a non-default clockClass so that it qualifies to win the BMCA.

Replace the default value with a lower clock class (higher priority) and restart linuxptp. Edit /etc/linuxptp/ptp4l.conf, comment out the default clockClass value, and insert a line setting it to 128.

#clockClass     248
clockClass      128

Restart ptp4l so the configuration change takes effect.

$ sudo systemctl restart ptp4l

This will configure ptp4l to advertise a master clock on eno2 as a clock that will win the BMCA for an Ouster OS1 sensor.
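
The effect of lowering clockClass can be illustrated with a simplified model of the BMCA comparison. The sketch below only orders clocks by the attributes carried in Announce messages (priority1, clockClass, clockAccuracy, offsetScaledLogVariance, priority2, clockIdentity, lower value wins) and leaves out the topology-based tie-breaking of the full algorithm. The class name, clock names, and identity strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnouncedClock:
    """Subset of the attributes a PTP clock advertises in Announce messages."""
    name: str
    priority1: int
    clock_class: int
    clock_accuracy: int
    offset_scaled_log_variance: int
    priority2: int
    clock_identity: str

    def rank(self):
        # Lower values win; fields are compared in this order.
        return (
            self.priority1,
            self.clock_class,
            self.clock_accuracy,
            self.offset_scaled_log_variance,
            self.priority2,
            self.clock_identity,
        )


def best_master(clocks):
    """Pick the clock that would win this simplified comparison."""
    return min(clocks, key=AnnouncedClock.rank)


# Illustrative values only: two clocks that differ in clockClass (128 vs 248).
host = AnnouncedClock("ubuntu-host", 128, 128, 0xFE, 0xFFFF, 128, "aaaaaa.fffe.000001")
other = AnnouncedClock("other-node", 128, 248, 0xFE, 0xFFFF, 128, "aaaaaa.fffe.000000")
print(best_master([host, other]).name)
```

Because clockClass is compared before the clock identity, the host with clockClass 128 wins here even though the other node has the lower identity. A node with a lower priority1 would still beat both.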

However, the ptp4l service is only advertising the Ethernet controller’s PTP hardware clock, not the Linux system time as is often expected.

Configuring `phc2sys` to Synchronize the System Time to the PTP Clock

To make the PTP hardware clock follow the Linux system time, the phc2sys utility needs to be run. The following configuration tells phc2sys to take the Linux CLOCK_REALTIME and write that time to the PTP hardware clock in the Ethernet controller for eno2. This interface is then connected to PTP slaves such as Ouster OS1 sensors.

Create a systemd drop-in directory to override the system service file:

$ sudo mkdir -p /etc/systemd/system/phc2sys.service.d

Create a file at /etc/systemd/system/phc2sys.service.d/override.conf with the following contents:

[Service]
ExecStart=
ExecStart=/usr/sbin/phc2sys -w -s CLOCK_REALTIME -c eno2

Note

If multiple interfaces need to be synchronized from CLOCK_REALTIME then multiple instances of the phc2sys service need to be run as it only accepts a single slave (i.e. -c) argument.

Restart the phc2sys service so the change takes effect:

$ sudo systemctl daemon-reload
$ sudo systemctl restart phc2sys
$ sudo systemctl status phc2sys
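
When more than one interface has to be fed from CLOCK_REALTIME, one phc2sys instance per interface is needed, as noted above. One possible way to manage this is to generate a separate unit file per interface. The sketch below renders such unit files as text. The unit layout is modeled on the service files in this guide, while the interface name eno3 and the `phc2sys-<interface>.service` file naming are hypothetical.

```python
UNIT_TEMPLATE = """\
[Unit]
Description=Synchronize CLOCK_REALTIME to the PTP hardware clock of {interface}
Documentation=man:phc2sys
Requires=ptp4l.service
After=ptp4l.service

[Service]
Type=simple
ExecStart=/usr/sbin/phc2sys -w -s CLOCK_REALTIME -c {interface}

[Install]
WantedBy=multi-user.target
"""


def phc2sys_unit(interface):
    """Render a systemd unit that runs one phc2sys instance for `interface`."""
    if not interface or any(ch.isspace() for ch in interface):
        raise ValueError(f"invalid interface name: {interface!r}")
    return UNIT_TEMPLATE.format(interface=interface)


# Hypothetical interface names and unit file names, for illustration only.
for iface in ("eno2", "eno3"):
    print(f"# /etc/systemd/system/phc2sys-{iface}.service")
    print(phc2sys_unit(iface))
```

Each rendered unit would still need to be written to disk, followed by `systemctl daemon-reload`, before it can be started.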

Configuring Chrony to Set System Clock Using PTP

An upstream PTP grandmaster clock (e.g., a GPS disciplined PTP clock) can be used to set the system time if precise absolute time is needed for sensor data.

Chrony is a Linux time service that can read from NTP and PTP and set the Linux system time using the most accurate source available. With a properly functioning PTP grandmaster the PTP time source will be selected and the error from the public time servers can be reviewed.

The following phc2shm service publishes the time from the PTP hardware clock of eno1 (where the external grandmaster is attached) to an NTP shared memory (SHM) segment, from which chrony can set the system clock.

Create a file named /etc/systemd/system/phc2shm.service with the following contents:

# /etc/systemd/system/phc2shm.service
[Unit]
Description=Synchronize PTP hardware clock (PHC) to NTP SHM
Documentation=man:phc2sys
After=ntpdate.service
Requires=ptp4l.service
After=ptp4l.service

[Service]
Type=simple
ExecStart=/usr/sbin/phc2sys -s eno1 -E ntpshm -w

[Install]
WantedBy=multi-user.target

Then start the newly created service and check that it started.

$ sudo systemctl start phc2shm
$ sudo systemctl status phc2shm

Add the PTP time source to the chrony configuration which will read the shared memory region managed by the phc2shm service created above.

Append the following to the /etc/chrony/chrony.conf file:

refclock SHM 0 poll 1 refid ptp

Restart chrony so the updated configuration file takes effect:

$ sudo systemctl restart chrony

After waiting a minute for the clock to synchronize, review the chrony client timing accuracy:

$ chronyc tracking
Reference ID    : 70747000 (ptp)
Stratum         : 1
Ref time (UTC)  : Thu Mar 14 02:22:58 2019
System time     : 0.000000298 seconds slow of NTP time
Last offset     : -0.000000579 seconds
RMS offset      : 0.001319735 seconds
Frequency       : 0.502 ppm slow
Residual freq   : -0.028 ppm
Skew            : 0.577 ppm
Root delay      : 0.000000001 seconds
Root dispersion : 0.000003448 seconds
Update interval : 2.0 seconds
Leap status     : Normal

$ chronyc sources -v
210 Number of sources = 9

  .-- Source mode  '~' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* ptp                           0   1   377     1    +27ns[  +34ns] +/-  932ns
~- chilipepper.canonical.com     2   6   377    61   -482us[ -482us] +/-   99ms
~- pugot.canonical.com           2   6   377    62   -498us[ -498us] +/-  112ms
~- golem.canonical.com           2   6   337    59   -467us[ -468us] +/-   95ms
~- alphyn.canonical.com          2   6   377    58   +957us[ +957us] +/-   95ms
~- legacy13.chi1.ntfo.org        3   6   377    62    -10ms[  -10ms] +/-  178ms
~- tesla.selinc.com              2   6   377   128   +429us[ +514us] +/-   42ms
~- io.crash-override.org         2   6   377    59   +441us[ +441us] +/-   58ms
~- hadb2.smatwebdesign.com       3   6   377    58  +1364us[+1364us] +/-   99ms

Note that the Reference ID matches the ptp reference ID from the chrony.conf file and that the sources output shows the ptp reference ID as selected (signified by the * state in the second column). Additionally, the NTP time sources show a small relative error to the high accuracy PTP time source.

In this case the PTP grandmaster is properly functioning.

If this error is large, chrony will select the NTP time sources and mark the PTP time source as invalid. This typically signifies that something is mis-configured with the PTP grandmaster upstream of this device or the linuxptp configuration.
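
For unattended monitoring, the selected source can be extracted from the `chronyc sources` output with a few lines of Python. This is a minimal sketch that assumes the column layout shown above, where the first character of a data row is the source mode and the second is the source state. The function name is illustrative.

```python
def selected_source(sources_output):
    """Return the name of the source marked '*' in `chronyc sources` output."""
    for line in sources_output.splitlines():
        # Data rows start with a mode character followed by a state character.
        if len(line) > 2 and line[0] in "^=#~" and line[1] == "*":
            fields = line[2:].split()
            if fields:
                return fields[0]
    return None
```

With the output above, the function would return `ptp`. A return value of `None`, or the name of an NTP server, suggests that chrony has not selected the PTP time source.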

Verifying Operation

Changing the profile prompts the sensor to restart the synchronization process, which is generally preferable to rebooting the entire sensor. Typically, only the relaxed profiles adjust the clock more than once.

Verifying PTP Operation for Rev7 with FW v3.1 and later

| PTP Parameters | Default | gPTP | Automotive Slave | Default-L2 Relaxed |
| --- | --- | --- | --- | --- |
| `parent_data_set.grandmaster_identity` | `00005e.fffe.005301` | `00005e.fffe.005301` | NA | `00005e.fffe.005301` |
| `port_data_set.port_state` | `SLAVE` | `SLAVE` | `SLAVE` | `SLAVE` |
| `time_status_np.gm_present` | `true` | `true` | `false` | `true` |
| `time_status_np.master_offset` | < 1μs | < 1μs | < 1μs | < 1μs |

PTP Example JSON Response for “Profile”: "default"

To Query Sensor PTP State: Refer to GET /api/v1/time/ptp/profile and PUT /api/v1/time/ptp/profile.

{
  "profile": "default",
  "parent_data_set":
  {
    "grandmaster_identity": "001747.fffe.700038",
    "parent_port_identity": "ac1f6b.fffe.1db84e-2",
    "parent_stats": 0,
    "gm_clock_class": 6,
    "observed_parent_clock_phase_change_rate": 2147483647,
    "gm_clock_accuracy": 33,
    "gm_offset_scaled_log_variance": 65535,
    "grandmaster_priority1": 128,
    "grandmaster_priority2": 128,
    "observed_parent_offset_scaled_log_variance": 65535
  },
  "current_data_set":
  {
    "steps_removed": 1,
    "offset_from_master": 61355,
    "mean_path_delay": 117977.0
  },
  "port_data_set":
  {
    "port_state": "SLAVE",
    "peer_mean_path_delay": 0,
    "log_min_delay_req_interval": 0,
    "port_identity": "bc0fa7.fffe.c48254-1",
    "log_sync_interval": 0,
    "log_announce_interval": 1,
    "delay_mechanism": 1,
    "log_min_pdelay_req_interval": 0,
    "announce_receipt_timeout": 3,
    "version_number": 2
  },
  "time_status_np":
  {
    "gm_time_base_indicator": 0,
    "gm_identity": "001747.fffe.700038",
    "cumulative_scaled_rate_offset": 0,
    "scaled_last_gm_phase_change": 0,
    "ingress_time": 0,
    "master_offset": 61355,
    "last_gm_phase_change": "0x0000'0000000000000000.0000",
    "gm_present": true
  },
  "time_properties_data_set":
  {
    "frequency_traceable": 0,
    "leap61": 0,
    "time_traceable": 0,
    "current_utc_offset": 37,
    "leap59": 0,
    "current_utc_offset_valid": 0,
    "time_source": 160,
    "ptp_timescale": 1
  }
}
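
The criteria from the verification table can be applied to such a response programmatically. The sketch below takes the already decoded JSON as a dictionary and lists anything that does not match. It assumes that `master_offset` is reported in nanoseconds, as it is in linuxptp, and the function name and default threshold are illustrative.

```python
def sync_problems(ptp_state, max_offset_ns=1000, expect_gm_present=True):
    """Return a list of human-readable problems; an empty list means in sync."""
    problems = []
    port_state = ptp_state["port_data_set"]["port_state"]
    if port_state != "SLAVE":
        problems.append(f"port_state is {port_state}, expected SLAVE")
    status = ptp_state["time_status_np"]
    if status["gm_present"] != expect_gm_present:
        problems.append(f"gm_present is {status['gm_present']}")
    # Assumption: master_offset is reported in nanoseconds, as in linuxptp.
    if abs(status["master_offset"]) >= max_offset_ns:
        problems.append(f"master_offset is {status['master_offset']} ns")
    return problems
```

For the "automotive-slave" profile, pass `expect_gm_present=False` to match the table. If the unit assumption holds, the example response above would be flagged for its offset, which is what one would see while the sensor is still converging.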

LinuxPTP PMC Tool

The sensor will respond to PTP management messages. The linuxptp pmc (see man pmc) utility can be used to query all PTP devices on the local network.

On the Linux host, run the following command, passing the interface that pmc should use to communicate with the sensor (eno2 in this example):

$ sudo pmc 'get PARENT_DATA_SET' 'get CURRENT_DATA_SET' 'get PORT_DATA_SET' 'get TIME_STATUS_NP' -i eno2
sending: GET PARENT_DATA_SET
sending: GET CURRENT_DATA_SET
sending: GET PORT_DATA_SET
sending: GET TIME_STATUS_NP
        bc0fa7.fffe.c48254-1 seq 0 RESPONSE MANAGEMENT PARENT_DATA_SET
                parentPortIdentity                    ac1f6b.fffe.1db84e-2
                parentStats                           0
                observedParentOffsetScaledLogVariance 0xffff
                observedParentClockPhaseChangeRate    0x7fffffff
                grandmasterPriority1                  128
                gm.ClockClass                         6
                gm.ClockAccuracy                      0x21
                gm.OffsetScaledLogVariance            0x4e5d
                grandmasterPriority2                  128
                grandmasterIdentity                   001747.fffe.700038
        bc0fa7.fffe.c48254-1 seq 1 RESPONSE MANAGEMENT CURRENT_DATA_SET
                stepsRemoved     2
                offsetFromMaster 61355.0
                meanPathDelay    117977.0
        bc0fa7.fffe.c48254-1 seq 2 RESPONSE MANAGEMENT PORT_DATA_SET
                portIdentity            bc0fa7.fffe.c48254-1
                portState               SLAVE
                logMinDelayReqInterval  0
                peerMeanPathDelay       0
                logAnnounceInterval     1
                announceReceiptTimeout  3
                logSyncInterval         0
                delayMechanism          1
                logMinPdelayReqInterval 0
                versionNumber           2
        bc0fa7.fffe.c48254-1 seq 3 RESPONSE MANAGEMENT TIME_STATUS_NP
                master_offset              61355
                ingress_time               0
                cumulativeScaledRateOffset +0.000000000
                scaledLastGmPhaseChange    0
                gmTimeBaseIndicator        0
                lastGmPhaseChange          0x0000'0000000000000000.0000
                gmPresent                  true
                gmIdentity                 001747.fffe.700038
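
The pmc output is plain text, so it can be turned into a dictionary for scripted checks. This is a minimal sketch that assumes the layout shown above and a single responding device. With several responders, later responses would overwrite earlier ones. The function name is illustrative.

```python
def parse_pmc_output(text):
    """Group `pmc` response fields by data set name."""
    data_sets = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        if "RESPONSE" in parts and "MANAGEMENT" in parts:
            # Header line: "<port identity> seq N RESPONSE MANAGEMENT <NAME>"
            current = data_sets.setdefault(parts[-1], {})
        elif current is not None and len(parts) >= 2:
            # Field line: "<name> <value>"; values are kept as strings.
            current[parts[0]] = " ".join(parts[1:])
    return data_sets
```

After parsing, `result["PORT_DATA_SET"]["portState"]` and `result["TIME_STATUS_NP"]["gmPresent"]` can be compared against the expected values in the verification table.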

Tested Grandmaster Clocks

Trimble Thunderbolt PTP GM100 Grandmaster Clock

Firmware version: 20161111-0.1.4.0, November 11 2016 15:58:25
PTP configuration:

 > get ptp eth0
             Enabled : Yes
            Clock ID : 001747.fffe.700038-1
             Profile : 1588
       Domain number : 0
  Transport protocol : IPV4
             IP Mode : Multicast
     Delay Mechanism : E2E
           Sync Mode : Two-Step
         Clock Class : 6
          Priority 1 : 128
          Priority 2 : 128
       Multicast TTL : 0
       Sync interval : 0
    Del Req interval : 0
       Ann. interval : 1
Ann. receipt timeout : 3

Ubuntu 18.04 + Linux PTP as a master clock
- Intel i210 Ethernet interface
- PCI hardware identifiers: 8086:1533 (rev 03)
- Ubuntu 18.04 kernel package: linux-image-4.18.0-16-generic
- Ubuntu 18.04 linuxptp package: linuxptp-1.8-1

Analyzing Linux Networking Issues

Note

This section is recommended only for cases of intermittent packet drops or packet reordering. Double-check the udp_dest settings before starting, because the information provided here is not useful if no data is arriving at all.

If no data is arriving and the issue cannot be resolved, please contact our Field Application Team.

This section captures tools and procedures to troubleshoot networking issues for a system consisting of a PC/workstation, an L2 switch, and one or more Ouster sensors. Though the examples use the Linux operating system as a model, the material is equally relevant to debugging issues in the Windows environment. Where possible, Windows command-line and UI analogs will be discussed in passing.

Debugging the Workstation Data Path

The workstation maintains a set of statistics associated with each layer in the network stack that can be used to diagnose packet loss. The correct way to approach a network stack problem is to start with the lowest layer in the stack first, examine the statistics for errors, and work your way up to the highest layer. The reason that we start with the lowest layer is that issues in the lowest layer can cause issues in other parts of the data-path.

Link Layer Statistics and Configuration

ethtool

In Linux, ethtool is used to query the NIC for statistics and to view and change the NIC configuration. Linux also offers more generic mechanisms to do this by writing and reading keys in the kernel file-system. ethtool is the most widely used tool for debugging and is generally the most complete for configuration and debug. It is a double-edged sword, however: because ethtool is vendor-centric, the output of its commands and the range of configuration options differ slightly depending on which NIC is used.

Line Interface Statistics

The most useful starting point when debugging the link layer is to examine the line-interface statistics. These are queried with ethtool -S <ethX>, where ethX is the identifier of the NIC as listed by ifconfig. If the device has multiple NICs and you are uncertain which NIC is receiving the traffic, run some traffic and monitor the stats reported by ifconfig.

Note

The output of ethtool -S <ethX> is 100% NIC vendor specific and will be quite different depending on NIC vendor used in your system.

Example: Output of ethtool -S:

NIC statistics:
    rx_packets: 0
    tx_packets: 0
    rx_bytes: 0
    tx_bytes: 0
    rx_broadcast: 0
    tx_broadcast: 0
    rx_multicast: 0
    tx_multicast: 0
    rx_errors: 0
    tx_errors: 0
    tx_dropped: 0
    multicast: 0
    collisions: 0
    rx_length_errors: 0
    rx_over_errors: 0
    rx_crc_errors: 0
    rx_frame_errors: 0
    rx_no_buffer_count: 0
    rx_missed_errors: 0
    tx_aborted_errors: 0
    tx_carrier_errors: 0
    tx_fifo_errors: 0
    tx_heartbeat_errors: 0
    tx_window_errors: 0
    tx_abort_late_coll: 0
    tx_deferred_ok: 0
    tx_single_coll_ok: 0
    tx_multi_coll_ok: 0
    tx_timeout_count: 52
    tx_restart_queue: 0
    rx_long_length_errors: 0
    rx_short_length_errors: 0
    rx_align_errors: 0
    tx_tcp_seg_good: 0
    tx_tcp_seg_failed: 0
    rx_flow_control_xon: 0
    rx_flow_control_xoff: 0
    tx_flow_control_xon: 0
    tx_flow_control_xoff: 0
    rx_csum_offload_good: 0
    rx_csum_offload_errors: 0
    rx_header_split: 0
    alloc_rx_buff_failed: 0
    tx_smbus: 0
    rx_smbus: 0
    dropped_smbus: 0
    rx_dma_failed: 0
    tx_dma_failed: 0
    rx_hwtstamp_cleared: 0
    uncorr_ecc_errors: 0
    corr_ecc_errors: 0
    tx_hwtstamp_timeouts: 0
    tx_hwtstamp_skipped: 0

MAC Errors

Users are mainly interested in the path where the sensor is transmitting to the workstation, so the focus is on the “rx” (receive) statistics. Generally, any counter that matches rx.*error on this NIC is a statistic that might be helpful in diagnosing the problem.

Based on the NIC, these “error” statistics are primarily associated with problems identified by the MAC. Such problems are generally indicative of an L1 problem (though they could also indicate a problem with the link-partner’s MAC), such as a loose connector, faulty transceiver, or an out-of-spec cable.

Internal System Errors

Users might come across stats like rx_dma_failed and rx_no_buffer_count that do not have an “error” postfix but constitute very real errors. These are indicative of failures in the hand-off between the NIC and its driver.
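
Scanning a long counter list by eye is error-prone, so the two groups of counters described above can be filtered with a short script. This is a minimal sketch: it parses the `name: value` layout shown in the example output, treats any counter matching rx.*error as a MAC-level indicator, and adds the two internal counters named in the text. Counter names vary by NIC vendor, so the lists here are assumptions to adapt.

```python
import re

# Counters named in the text that signal problems despite lacking "error" in the name.
INTERNAL_COUNTERS = ("rx_dma_failed", "rx_no_buffer_count")
RX_ERROR_RE = re.compile(r"^rx.*error")


def parse_nic_stats(ethtool_output):
    """Parse `ethtool -S` output into a {counter_name: int} dictionary."""
    stats = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and value.strip().lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


def nonzero_rx_problems(stats):
    """Return the non-zero receive-side counters worth investigating."""
    return {
        name: count
        for name, count in stats.items()
        if count and (RX_ERROR_RE.match(name) or name in INTERNAL_COUNTERS)
    }
```

Running the filter twice, a few seconds apart, and comparing the results shows whether a counter is still increasing or only reflects an old event.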

Solving MAC Errors

If users encounter MAC errors, this most likely points to a cabling issue, so the first step is to replace the cable. If the errors persist, the next step is to test against a different node. One can use the “iPerf” or “iPerf3” utility (discussed below) to validate the workstation against another computer. A final step would be to swap out the sensor.

Solving Internal System Errors

These errors are often the most difficult to understand. It can be quite surprising that the MAC is receiving everything and traffic is still being dropped. The root cause is generally that the processor cannot handle the peak rate. Though the average load may be only a few hundred megabits, the real situation is that all traffic received by the NIC arrives at line rate – for a 10G NIC this means that many frames may be received back-to-back at the line rate of the NIC.

Just how many frames arrive depends on the behavior of the sensors. An Ouster sensor attempts to transmit the data as it is captured. Assuming a 40 KB (on the wire) LiDAR frame and 10 sensors, the worst-case burst is 40 KB x 10 = 400 KB arriving at 10G, since each sensor has a peak transmit rate of 1G and 1G x 10 = 10G. 400 KB is a lot of 10G data to process all at once, and without hardware buffering things will certainly fail.

The NIC maintains a hardware ring-buffer or on advanced hardware, potentially multiple ring-buffers. The entries in the ring-buffer are pointers into kernel packet-buffer structures. This mechanism enables the NIC to efficiently deliver packets to the kernel at line rate. For our specific use-case the default size of this ring-buffer may be too small.

To update this value, use ethtool:

ethtool -g <ethX> will display the current setting and device limits
ethtool -G <ethX> rx <value> is used to update the setting

Example: On this laptop, the ring-buffer holds 256 entries by default:

ethtool -g enp0s31f6
Ring parameters for enp0s31f6:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 256
RX Mini: 0
RX Jumbo: 0
TX: 256

To find out how much buffer is sufficient we can apply the burst-tolerance equation:

fill_rate = NIC_line_speed - max_measured_throughput
fill_time = rx_buffer_size * 1518 * 8 / fill_rate
MBS = fill_time * NIC_line_speed

Note

It is not always easy to obtain max_measured_throughput, and in a busy workstation it can be subject to variable delay.
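The burst-tolerance equation can be evaluated with a few lines of code. This is a minimal sketch: the function name is hypothetical, rates are in bits per second, and the 500 Mbit/s throughput in the usage example is an assumed figure, not a measurement.

```python
ETH_FRAME_BYTES = 1518  # frame size used by the equation above


def burst_tolerance(nic_line_speed_bps: float,
                    max_measured_throughput_bps: float,
                    rx_buffer_size: int):
    """Return (fill_time in seconds, MBS in bits) for a given ring size."""
    fill_rate = nic_line_speed_bps - max_measured_throughput_bps
    if fill_rate <= 0:
        # The host drains at least as fast as the line delivers: the ring never fills.
        raise ValueError("throughput must be below the NIC line speed")
    fill_time = rx_buffer_size * ETH_FRAME_BYTES * 8 / fill_rate
    mbs = fill_time * nic_line_speed_bps
    return fill_time, mbs


# Assumed inputs: 1 Gbit/s NIC, 500 Mbit/s measured throughput, 256 ring entries.
fill_time, mbs = burst_tolerance(1e9, 500e6, 256)
print(f"fill_time = {fill_time * 1e3:.2f} ms, MBS = {mbs / 8 / 1024:.0f} KiB")
```

Because fill_time is in seconds and the line speed is in bits per second, MBS comes out in bits; divide by 8 to compare it with a burst size in bytes.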

As a rule-of-thumb we need to at least accommodate one max-burst (one LiDAR packet) from the sensor. Assuming a 40KB LiDAR packet that’s 40KB/1518=27 frames. So 256 should be more than adequate.

However, even with the default buffer of 256, the user can observe packet loss due to DMA errors. This is because the workstation is not a real-time system and the delay can be quite variable. Linux uses a technique called interrupt coalescence that determines how often it will service the driver when it gets very busy.

Interrupt coalescence is controlled by the kernel filesystem key:

/proc/sys/net/core/netdev_budget_usecs, which defaults to 8000 us.

If the problem is not resolved by increasing the buffer size, it is possible to reduce netdev_budget_usecs in order to favor moving data over other activities that the system could be doing. It is also possible to increase the maximum number of frames the OS is willing to process when the line interface does get serviced, which is controlled by:

/proc/sys/net/core/netdev_budget

Note

On some systems the user needs to make the rx ring-buffer quite large or disable interrupt coalescence altogether.
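Before changing either key it helps to record the current values. The sketch below reads them from the /proc paths named above; the helper name and the root parameter are hypothetical conveniences, and the files only exist on Linux.

```python
from pathlib import Path

NET_CORE = Path("/proc/sys/net/core")


def read_net_core(key: str, root: Path = NET_CORE) -> int:
    """Read one integer value such as netdev_budget_usecs or netdev_budget."""
    return int((Path(root) / key).read_text().strip())


for key in ("netdev_budget_usecs", "netdev_budget"):
    if (NET_CORE / key).exists():  # absent on non-Linux systems and older kernels
        print(f"{key} = {read_net_core(key)}")
    else:
        print(f"{key}: not available on this system")
```

Writing a new value requires root privileges, for example with sysctl -w net.core.netdev_budget_usecs=<value>, and does not persist across reboots unless it is also added to the system's sysctl configuration.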

In addition to the “soft” interrupt coalescence that is found under /proc/sys/net/core, the NIC itself will delay the hardware interrupt. The user can find the settings with ethtool in the usual way. Here is an example that shows the AQC107’s default settings:

ethtool -c enp4s0
Coalesce parameters for enp4s0:
Adaptive RX: off
TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 112
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0
tx-usecs: 510
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0
rx-usecs-low: 0
rx-frames-low: 0
tx-usecs-low: 0
tx-frames-low: 0
rx-usecs-high: 0
rx-frames-high: 0
tx-usecs-high: 0
tx-frames-high: 0

Another useful parameter is /proc/sys/net/core/netdev_max_backlog. The backlog queue is a FIFO on the other side of the NIC ring-buffer. Increasing the backlog buffer is one more way to add capacity earlier in the data path. It is difficult to determine when to increase netdev_max_backlog versus increasing the rx ring-buffer. Certainly the ring-buffer is the only place where we can add capacity that can absorb traffic bursts at line rate.

Troubleshooting Advanced NICs

Advanced hardware interfaces have multiple ring-buffers that are typically mapped to different CPU cores (a technique known as RSS). Each NIC has its own proprietary scheme for mapping input traffic flows to ring-buffers, and sometimes a NIC will incorrectly split a traffic flow into multiple FIFOs. If you see this behavior, it means that the NIC itself will cause frames to be reordered in a way that will horribly disrupt the IP stack above it. The AQC107 is one such NIC. The problem can be identified by looking at ethtool -S <ethX>. The NIC will list stats for each FIFO, and by sending a single large traffic flow we can see that the device errantly splits the flow across all of the different FIFOs. Below you can see that this NIC has stats labeled Queue[0] … Queue[7].

Example:

ethtool -S enp4s0
NIC statistics:
InPackets: 350287807
InUCast: 350048688
InMCast: 231724
InBCast: 7395
InErrors: 0
OutPackets: 363162007
OutUCast: 363160208
OutMCast: 1306
OutBCast: 493
InUCastOctets: 525223100117
OutUCastOctets: 545214487081
InMCastOctets: 16440320
OutMCastOctets: 206101
InBCastOctets: 1316312
OutBCastOctets: 58497
InOctets: 525240856749
OutOctets: 545214751679
InPacketsDma: 23207849
OutPacketsDma: 22064728
InOctetsDma: 34568308793
OutOctetsDma: 33164524696
InDroppedDma: 2002075
Queue[0] InPackets: 23087183
Queue[0] InJumboPackets: 0
Queue[0] InLroPackets: 0
Queue[0] InErrors: 0
Queue[0] AllocFails: 0
Queue[0] SkbAllocFails: 0
Queue[0] Polls: 7373190
Queue[0] OutPackets: 649028
Queue[0] Restarts: 0
Queue[1] InPackets: 80
Queue[1] InJumboPackets: 0
Queue[1] InLroPackets: 0
Queue[1] InErrors: 0
Queue[1] AllocFails: 0
Queue[1] SkbAllocFails: 0
Queue[1] Polls: 14672
Queue[1] OutPackets: 1651541
Queue[1] Restarts: 0
Queue[2] InPackets: 103
Queue[2] InJumboPackets: 0
Queue[2] InLroPackets: 0
Queue[2] InErrors: 0
Queue[2] AllocFails: 0
Queue[2] SkbAllocFails: 0
Queue[2] Polls: 215484
Queue[2] OutPackets: 3815296
Queue[2] Restarts: 0
Queue[3] InPackets: 269
Queue[3] InJumboPackets: 0
Queue[3] InLroPackets: 0
Queue[3] InErrors: 0
Queue[3] AllocFails: 0
Queue[3] SkbAllocFails: 0
Queue[3] Polls: 14469
Queue[3] OutPackets: 1580307
Queue[3] Restarts: 0
Queue[4] InPackets: 119681
Queue[4] InJumboPackets: 0
Queue[4] InLroPackets: 0
Queue[4] InErrors: 0
Queue[4] AllocFails: 0
Queue[4] SkbAllocFails: 0
Queue[4] Polls: 157920
Queue[4] OutPackets: 3670607
Queue[4] Restarts: 0
Queue[5] InPackets: 83
Queue[5] InJumboPackets: 0
Queue[5] InLroPackets: 0
Queue[5] InErrors: 0
Queue[5] AllocFails: 0
Queue[5] SkbAllocFails: 0
Queue[5] Polls: 9006
Queue[5] OutPackets: 931971
Queue[5] Restarts: 0
Queue[6] InPackets: 407
Queue[6] InJumboPackets: 0
Queue[6] InLroPackets: 0
Queue[6] InErrors: 0
Queue[6] AllocFails: 0
Queue[6] SkbAllocFails: 0
Queue[6] Polls: 15387
Queue[6] OutPackets: 1636793
Queue[6] Restarts: 0
Queue[7] InPackets: 43
Queue[7] InJumboPackets: 0
Queue[7] InLroPackets: 0
Queue[7] InErrors: 0
Queue[7] AllocFails: 0
Queue[7] SkbAllocFails: 0
Queue[7] Polls: 11584
Queue[7] OutPackets: 343508
Queue[7] Restarts: 0
PTP Queue[16] InPackets: 0
PTP Queue[16] InJumboPackets: 0
PTP Queue[16] InLroPackets: 0
PTP Queue[16] InErrors: 0
PTP Queue[16] AllocFails: 0
PTP Queue[16] SkbAllocFails: 0
PTP Queue[16] Polls: 0
PTP Queue[16] OutPackets: 0
PTP Queue[16] Restarts: 0
PTP Queue[31] InPackets: 0
PTP Queue[31] InJumboPackets: 0
PTP Queue[31] InLroPackets: 0
PTP Queue[31] InErrors: 0
PTP Queue[31] AllocFails: 0
PTP Queue[31] SkbAllocFails: 0
PTP Queue[31] Polls: 0
MACSec InCtlPackets: 0
MACSec InTaggedMissPackets: 0
MACSec InUntaggedMissPackets: 23252064
MACSec InNotagPackets: 23252064
MACSec InUntaggedPackets: 0
MACSec InBadTagPackets: 0
MACSec InNoSciPackets: 0
MACSec InUnknownSciPackets: 0
MACSec InCtrlPortPassPackets: 0
MACSec InUnctrlPortPassPackets: 23252064
MACSec InCtrlPortFailPackets: 0
MACSec InUnctrlPortFailPackets: 0
MACSec InTooLongPackets: 0
MACSec InIgpocCtlPackets: 0
MACSec InEccErrorPackets: 0
MACSec InUnctrlHitDropRedir: 0
MACSec OutCtlPackets: 1
MACSec OutUnknownSaPackets: 22064727
MACSec OutUntaggedPackets: 0
MACSec OutTooLong: 0
MACSec OutEccErrorPackets: 0
MACSec OutUnctrlHitDropRedir: 0
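One way to check for a split flow is to take two ethtool -S snapshots while a single large flow is running and see how many queues received packets in between. The sketch below assumes the "Queue[n] InPackets" naming shown in the listing above, which is specific to this driver; the function names, the threshold, and the "after" values are hypothetical.

```python
import re

# Matches "Queue[3] InPackets: 269" but not "PTP Queue[16] InPackets: 0".
QUEUE_RX = re.compile(r"^\s*Queue\[(\d+)\] InPackets:\s*(\d+)\s*$")


def rx_packets_per_queue(text: str) -> dict:
    """Extract {queue index: InPackets} from `ethtool -S` output."""
    counts = {}
    for line in text.splitlines():
        match = QUEUE_RX.match(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


def queues_receiving(before: dict, after: dict, min_packets: int = 1000) -> list:
    """Return the queues whose InPackets grew by at least min_packets."""
    return sorted(q for q, n in after.items()
                  if n - before.get(q, 0) >= min_packets)


# "before" uses counters from the listing above; "after" is a made-up second snapshot.
before = rx_packets_per_queue("Queue[0] InPackets: 23087183\n"
                              "Queue[4] InPackets: 119681\n")
after = {0: 23187183, 4: 169681}
active = queues_receiving(before, after)
print(f"queues that received the flow: {active}")
if len(active) > 1:
    print("a single flow landed on several queues; reordering is possible")
```

The threshold filters out background traffic such as ARP or multicast, which would otherwise make every queue look active.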

The vendor provided a workaround in their README.

Note

RSS for UDP

Currently, NIC does not support RSS for fragmented IP packets, which leads to an incorrect handling of RSS for fragmented UDP traffic. To disable RSS for UDP one can use the following RX Flow L3/L4 rule: ethtool -N eth0 flow-type udp4 action 0 loc 32

When Stats Fail

Sometimes a NIC will drop frames without any error stats incrementing. When this happens, the issue can be detected by inserting a managed L2 switch in between the sensor and the workstation. The managed switch will report receive and transmit stats, which can be correlated against the rx stats of the NIC to determine that the NIC has dropped frames without incrementing any stat.

IP Statistics

After the link layer the next layer up is IP. IP errors can be identified with the netstat tool:

netstat -s

This tool will output a lot of information, but in this document we will focus on only the IP section.

In this report you can see that there are a few different error categories, and you have to review carefully through all of the text to find them:

Let’s look at each class of error and consider its implications:

Packets received with invalid address means that they were sent to our MAC, but with an incorrect destination IP.
Packets dropped because of missing route indicates that the packet was sent to the correct IP address but no client program was listening on the destination port.
Fragments dropped after timeout means that we received some data but subsequent data didn’t arrive in time to be processed.
Fragments reassemblies failed means that some data was missing due to an Ethernet frame being aborted by the stack or being lost in transit and the IP layer was not able to reassemble a complete datagram.
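The same counters can be read programmatically. The sketch below parses the Ip section of /proc/net/snmp, which is where netstat -s obtains its IP statistics on Linux. The mapping from counter names to the four error classes above (InAddrErrors, OutNoRoutes, ReasmTimeout, ReasmFails) is an assumption based on the standard SNMP counter names, and the sample values are made up.

```python
def parse_ip_counters(snmp_text: str) -> dict:
    """Return the Ip counters from /proc/net/snmp-style text as a dict."""
    header = None
    for line in snmp_text.splitlines():
        if not line.startswith("Ip:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields  # the first "Ip:" line carries the counter names
        else:
            return dict(zip(header, (int(v) for v in fields)))
    raise ValueError("no complete Ip section found")


# Hypothetical sample in the /proc/net/snmp layout (names line, then values line).
SAMPLE = (
    "Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors "
    "ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests "
    "OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails "
    "FragOKs FragFails FragCreates\n"
    "Ip: 2 64 1000 0 3 0 0 0 990 800 0 5 7 40 30 10 0 0 0\n"
)

ip = parse_ip_counters(SAMPLE)
for name in ("InAddrErrors", "OutNoRoutes", "ReasmTimeout", "ReasmFails"):
    print(f"{name}: {ip[name]}")
```

On a live system the text would come from reading /proc/net/snmp; comparing two readings taken a few seconds apart shows which counters are still incrementing.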

Debugging a Layer 3 Issue

The best way to debug issues in the IP layer is to find them in the link layer, because generally speaking layer-3 issues are caused by layer-2 bugs, but this is not always the case.

For instance, packets received with invalid address are probably indicative of stale ARP table entries or some other external network bug or temporal state that will most likely clear up on its own. This sort of problem is probably not worth debugging unless it's persistent. Packets dropped because of missing route is more indicative of an issue at the application layer (the client or server simply wasn’t listening when the packets arrived).

If a problem is detectable by L3 and not by L2, and the NIC isn’t providing a FIFO or DMA stat that explains it, then it’s most likely a problem in the NIC itself. One possibility is packet reordering by the NIC. This can be detected by modifying:

/proc/sys/net/ipv4/ipfrag_max_dist

This kernel attribute determines the system's tolerance to receiving out-of-order IPv4 fragments. Nominally L2 networks do not reorder packets, so you should be able to configure a value of 1 and not observe a change in behavior. However, if setting a low threshold exacerbates the issue, or setting a high value makes the problem less severe, then the NIC is most likely to blame.

Useful network debugging tools

iPerf

iPerf is a useful tool when debugging the performance of a network. It can be used to quickly validate whether or not a system can handle a given throughput. It can be configured to output a stream of data in a variety of formats to mimic the expected load on the system during use. For more information refer to iPerf documentation.

How to use iPerf to debug sensor network issues

iPerf can be used to rule out sensor failures, and quickly reproduce errors that occur when the network is under a high-traffic load. iPerf must be used from two machines:

Server (receiving data)

Client (sending data)

Both the server and client will measure the number of packets sent/received, and report a percentage of packets lost.

Example usage of iPerf to test whether the sender can send 300 Mbps of UDP traffic in 20 KB packets to the receiver:

Receiver arguments

--server : Required to indicate that this is the machine that will be RECEIVING data.

--port 5300 : Specify the port at which to listen for incoming data. Useful if testing with multiple sources simultaneously.

Sender arguments

--client 192.168.88.248 : The IP address to send data to. Must be the IP address or hostname of the receiver.

--port 5300 : The port to send data to. This must match the --port argument provided by the receiver.

--udp : Indicates that UDP traffic will be sent. If not supplied, TCP data will be sent.

--bitrate 300M : The rate (in bits per second) at which to send data to the receiver. This can be used to simulate different amounts of network load. This supports a suffix such as K, M, or G to indicate Kbps, Mbps, or Gbps instead of bps.

--length 20K : The length of each UDP packet to send, 20 KB in this example.
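The arguments listed above can be assembled into complete command lines. This is a sketch that assumes the iperf3 binary, since the long option names above match iperf3; the helper names are hypothetical, and the commands are only printed, not executed.

```python
import shlex


def receiver_command(port: int) -> list:
    """Build the iperf3 command for the machine that receives data."""
    return ["iperf3", "--server", "--port", str(port)]


def sender_command(receiver: str, port: int, bitrate: str, length: str) -> list:
    """Build the iperf3 command for the machine that sends UDP data."""
    return ["iperf3", "--client", receiver, "--port", str(port),
            "--udp", "--bitrate", bitrate, "--length", length]


# Values taken from the argument list above.
print(shlex.join(receiver_command(5300)))
print(shlex.join(sender_command("192.168.88.248", 5300, "300M", "20K")))
```

Start the receiver first, then run the sender command on the second machine; either list can also be passed directly to subprocess.run if the test should be scripted.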
