Mitigation Rules and the `prefer` Keyword

Introduction

The mechanics of the NTP algorithms that select the best data sample from each available peer and the best subset of the peer population have been finely crafted to resist network jitter and faults in the network or peer operations, and to deliver the best possible accuracy. Most of the time these algorithms do a good job without requiring explicit manual tailoring of the configuration file. However, there are times when the accuracy can be improved by some careful tailoring. The following sections explain how to do that using explicit configuration items and, when available, special signals generated by some radio clocks. To understand the effects of the various schemes involved, it is necessary to understand some arcane details of how the algorithms decide on a synchronization source when more than one source is available. This is done on the basis of a set of explicit mitigation rules.

Mitigation Rules

To provide robust backup sources, stratum-1 peers are usually operated in a diversity configuration, in which the local server operates with a number of remote peers in addition to one or more radio clocks that also operate as local peers. In these configurations, the suite of algorithms that NTP uses to refine the data from each peer separately, and to select and weight the data from a number of peers, can be applied to the entire ensemble of remote peers and local radios. However, because of small but significant systematic time offsets between the peers, it is in general not possible to achieve the lowest jitter and highest stability in these configurations. In addition, there are a number of special configurations involving special radio clock signals, telephone backup services and other special cases, so a set of mitigation rules becomes necessary.

The mitigation rules are based on a set of special characteristics of the various reference clock drivers configured on the server. For instance, it is possible to designate a peer as "preferred," in which case, all other things being equal, this peer will be selected for synchronization over all other eligible candidates in the clock selection procedures. The precise characterization of the prefer peer is described below. In addition, when a pulse-per-second (PPS) signal is connected via the PPS Clock Discipline Driver (type 22), the corresponding peer is called the PPS peer; the manner in which this peer operates is also described below. When the Undisciplined Local Clock Driver (type 1) is configured in the server, its peer becomes the local-clock peer. When a modem driver such as the Automated Computer Time Service Driver (type 18) is configured in the server, its peer becomes the ACTS peer. Both the local-clock and ACTS peers operate in the manner described in the reference clock drivers documentation. Finally, where support is available, the PPS signal may be processed directly by the kernel, as described in the precision kernel modifications documentation; in the following, this is called the kernel discipline.
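The special peer classes above are tied to specific driver type numbers. As a minimal sketch, the mapping can be expressed as a classification function; the `peer_class` enum and `classify_peer` are hypothetical names, and using driver type 0 to mean "ordinary network peer" is an assumption made only for illustration.

```c
/* Hypothetical peer classes consulted by the mitigation rules. */
enum peer_class {
    PEER_ORDINARY, /* remote server, or a reference clock with no special role */
    PEER_PPS,      /* PPS Clock Discipline Driver (type 22) */
    PEER_LOCAL,    /* Undisciplined Local Clock Driver (type 1) */
    PEER_ACTS      /* Automated Computer Time Service Driver (type 18) */
};

/* Map a reference clock driver type to its mitigation class.
 * Assumption: 0 (or any unlisted type) means an ordinary peer. */
enum peer_class classify_peer(int driver_type)
{
    switch (driver_type) {
    case 22: return PEER_PPS;
    case 1:  return PEER_LOCAL;
    case 18: return PEER_ACTS;
    default: return PEER_ORDINARY;
    }
}
```

Whether a peer is also the prefer peer is a separate, configured property and is not captured by this classification.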

The mitigation rules apply in the clock selection procedures following the sanity checks, intersection algorithm and clustering algorithm. The survivors at this point represent the subset of all peers that can provide the most accurate, stable time. In the general case, with no designated prefer peer, PPS peer or local-clock peer, the mitigation rules require that all survivors be averaged, each weighted by the reciprocal of its dispersion, as provided in the NTP specification.
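The general-case averaging can be illustrated with a small weighted mean. This is a minimal sketch following the description above (weight equal to the reciprocal of each survivor's dispersion); `combine_offsets` is a hypothetical name, and the NTP specification remains the authority on the exact weighting.

```c
#include <stddef.h>

/* Weighted average of survivor offsets (seconds), each weighted by
 * 1/dispersion. Dispersions are assumed to be strictly positive.
 * Returns 0.0 when there are no survivors. */
double combine_offsets(const double *offset, const double *dispersion, size_t n)
{
    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = 1.0 / dispersion[i]; /* smaller dispersion, larger weight */
        num += w * offset[i];
        den += w;
    }
    return den > 0.0 ? num / den : 0.0;
}
```

For example, offsets of 10 ms and 20 ms with dispersions of 1 ms and 2 ms receive weights of 1000 and 500, giving a combined offset of about 13.3 ms: the less dispersive survivor pulls the result toward itself.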

The mitigation rules establish the choice of system peer, which determines the stratum, reference identifier and several other system variables that are visible to clients of the local server. In addition, they establish which source or combination of sources controls the local clock. In detail, these rules operate as follows:

1. If there is a prefer peer and it is the local-clock peer or the ACTS peer, or if there is a prefer peer and the kernel discipline is active, choose the prefer peer as the system peer. If the prefer peer is not the ACTS peer, disregard the time as determined by NTP; in this case a source other than NTP is controlling the system clock. If the prefer peer is the ACTS peer, then it does control the system clock, but special rules are in effect due to the relatively long intervals between updates.
2. If the above is not the case and there is a PPS peer, choose it as the system peer and its offset as the system clock offset.
3. If the above is not the case and there is a prefer peer (not the local-clock or ACTS peer in this case), choose it as the system peer and its offset as the system clock offset.
4. If the above is not the case and the peer previously chosen as the system peer is in the surviving population, choose it as the system peer and average its offset along with the other survivors to determine the system clock offset. This behavior is designed to avoid excess jitter due to "clock-hopping" when switching the system peer would not materially improve the time accuracy.
5. If the above is not the case, choose the first candidate in the list of survivors ranked in order of synchronization distance and average its offset along with the other survivors to determine the system clock offset. This is the default case and the only case considered in the current NTP specification.
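The five rules above form a strict priority cascade. The following C sketch expresses only that ordering; `struct survivor`, its flags, `offset_source` and `select_system_peer` are hypothetical names and this is not the ntpd source. It decides which peer becomes the system peer and where the system clock offset comes from, leaving the averaging itself out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one survivor of the clustering algorithm. */
struct survivor {
    int  id;
    bool prefer; /* configured with the prefer keyword */
    bool pps;    /* the PPS peer (driver type 22) */
    bool local;  /* the local-clock peer (driver type 1) */
    bool acts;   /* the ACTS peer (driver type 18) */
};

/* Where the system clock offset comes from. */
enum offset_source { FROM_OTHER, FROM_PEER, FROM_AVERAGE };

/* s[] is ranked by synchronization distance (best first), n >= 1.
 * Returns the index of the system peer and sets *src. */
size_t select_system_peer(const struct survivor *s, size_t n,
                          int prev_sys_id, bool kernel_active,
                          enum offset_source *src)
{
    size_t pref = n, pps = n, prev = n; /* n means "not found" */
    for (size_t i = 0; i < n; i++) {
        if (s[i].prefer && pref == n) pref = i;
        if (s[i].pps && pps == n) pps = i;
        if (s[i].id == prev_sys_id && prev == n) prev = i;
    }
    /* Rule 1: local-clock or ACTS prefer peer, or kernel discipline active. */
    if (pref < n && (s[pref].local || s[pref].acts || kernel_active)) {
        /* ACTS controls the clock itself; otherwise a non-NTP source does. */
        *src = s[pref].acts ? FROM_PEER : FROM_OTHER;
        return pref;
    }
    /* Rule 2: the PPS peer. */
    if (pps < n) { *src = FROM_PEER; return pps; }
    /* Rule 3: any other prefer peer. */
    if (pref < n) { *src = FROM_PEER; return pref; }
    /* Rules 4 and 5 average the survivors' offsets. */
    *src = FROM_AVERAGE;
    if (prev < n) return prev; /* Rule 4: avoid clock-hopping */
    return 0;                  /* Rule 5: best synchronization distance */
}
```

Note how the PPS peer (rule 2) is tested before an ordinary prefer peer (rule 3); this ordering is what lets the PPS offset win when both are present.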

The specific interpretations of the prefer peer and PPS peer require some explanation, which is given in the following sections.

Using the `prefer` Keyword

For the reasons stated previously, a scheme has been implemented in NTP to provide intelligent mitigation between various classes of peers, designed to provide the best quality time without compromising the normal operation of the NTP algorithms. This scheme in its present form is not an integral component of the NTP specification, but is likely to be included in future versions of the specification. The scheme is based on the "preferred peer," which is specified by including the `prefer` keyword with the associated server or peer command in the configuration file. This keyword can be used with any peer or server, but is most commonly used with a radio clock.

The prefer scheme works on the set of peers that have survived the sanity and intersection algorithms of the clock selection procedures. Ordinarily, the members of this set can be considered truechimers, and any one of them could in principle provide correct time; however, due to various error contributions, not all can provide the most stable time. The job of the clustering algorithm, which is invoked at this point, is to select the best subset of the survivors, that is, the subset providing the least variance in the combined ensemble compared to the variance in each member of the subset. The details of the clustering algorithm, which are given in the specification, are not important here, other than to point out that it operates in rounds; in each round one survivor, presumably the worst of the lot, is discarded, until one of several termination conditions is met.

In the prefer scheme, the clustering algorithm is modified so that the prefer peer is never discarded; instead, its potential removal becomes a termination condition. If the original algorithm is about to discard the prefer peer, the algorithm terminates right there. The prefer peer can still be discarded by the sanity and intersection algorithms, of course, but it always survives the clustering algorithm.
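The round-by-round discard and the prefer-peer termination condition can be sketched as follows. This is a simplified stand-in, not the specification's algorithm: the "worst" survivor is taken to be the one with the largest mean-square offset difference from the others, and the assumed ordinary termination conditions are reaching a hypothetical minimum `NMIN` of 3 survivors, or the worst spread already being no larger than the smallest peer jitter. The function name `cluster` is likewise hypothetical.

```c
#include <stddef.h>

#define NMIN 3 /* assumed minimum number of survivors to keep */

/* offset[] and jitter[] (seconds) describe n survivors; prefer is the
 * index of the prefer peer, or -1 if none. Discarded survivors are
 * removed by compacting the arrays in place; returns the new count. */
size_t cluster(double *offset, double *jitter, size_t n, int prefer)
{
    while (n > NMIN) {
        size_t worst = 0;
        double worst_sq = -1.0, min_jit = jitter[0];
        for (size_t i = 0; i < n; i++) {
            /* Mean-square offset difference from the other survivors. */
            double sq = 0.0;
            for (size_t j = 0; j < n; j++) {
                double d = offset[i] - offset[j];
                sq += d * d;
            }
            sq /= (double)(n - 1);
            if (sq > worst_sq) { worst_sq = sq; worst = i; }
            if (jitter[i] < min_jit) min_jit = jitter[i];
        }
        /* Ordinary termination: discarding would not help. */
        if (worst_sq <= min_jit * min_jit) break;
        /* Prefer modification: never discard the prefer peer; stop instead. */
        if ((int)worst == prefer) break;
        /* Discard the worst survivor by shifting later entries down. */
        for (size_t k = worst; k + 1 < n; k++) {
            offset[k] = offset[k + 1];
            jitter[k] = jitter[k + 1];
        }
        if (prefer > (int)worst) prefer--; /* keep the prefer index valid */
        n--;
    }
    return n;
}
```

With five survivors whose offsets are 0, 1, 2, 500 and 3 ms, the 500 ms outlier is discarded first. If that same outlier were the prefer peer, the loop would stop immediately and all five would remain, which is exactly the "termination instead of removal" behavior described above.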

Along with this behavior, the clock selection procedures are modified so that the combining algorithm is not used when a prefer peer is present. Instead, the offset of the prefer peer is used exclusively as the synchronization source. In the usual case involving a radio clock and a flock of remote stratum-1 peers, with the radio clock designated the prefer peer, the result is that the high-quality radio time disciplines the server clock as long as the radio itself remains operational and its time remains valid, as determined by the remote peers and the sanity and intersection algorithms.

A preferred peer retains that designation as long as it survives the intersection algorithm. If for some reason the prefer peer fails to provide updates for some time, presently five minutes, it loses that designation and the clock selection remitigates as described above.
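The retention rule combines two conditions: surviving the intersection algorithm and continuing to deliver updates. A minimal sketch, assuming the five-minute limit is compared against the time of the last update; `keeps_prefer` and `PREFER_TIMEOUT` are hypothetical names.

```c
#include <stdbool.h>
#include <time.h>

#define PREFER_TIMEOUT 300.0 /* seconds; "presently five minutes" */

/* A prefer peer keeps its designation only while it survives the
 * intersection algorithm and has updated within PREFER_TIMEOUT. */
bool keeps_prefer(bool survived_intersection, time_t now, time_t last_update)
{
    return survived_intersection
        && difftime(now, last_update) < PREFER_TIMEOUT;
}
```

When this check fails, the peer is treated as an ordinary candidate and the selection falls through to the next applicable mitigation rule.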

While the model does not forbid it, it does not seem useful to designate more than one peer as preferred, since on-the-air experience does not seem to justify the additional complexity of mitigating among them. Note that the prefer peer interacts with the PPS peer, as discussed below. It also interacts with the Undisciplined Local Clock Driver (type 1), as described in the reference clock drivers documentation.

Using the Pulse-per-Second (PPS) Signal

Most radio clocks are connected using a serial port operating at speeds of 9600 bps or lower. The accuracy using typical timecode formats, where the on-time epoch is indicated by a designated ASCII character such as carriage return, is limited to a millisecond at best and a few milliseconds in typical cases. However, some radios produce a precision pulse-per-second (PPS) signal, which can be used to improve the accuracy in typical workstation servers to the order of a few tens of microseconds. The details of how this can be accomplished are discussed in the README.magic file; the following discusses how this signal is implemented and configured in a typical working server.
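One contributor to the millisecond limit is simply how long a single character takes on the wire. As a worked example, assuming common 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bits per character), the character time at a given line speed is:

```c
/* Seconds needed to transmit one asynchronous serial character.
 * Assumes 8N1 framing: 1 start + 8 data + 1 stop = 10 bits. */
double char_time(double bits_per_second)
{
    const double bits_per_char = 10.0;
    return bits_per_char / bits_per_second;
}
```

At 9600 bps this gives 10 / 9600 s, about 1.04 ms per character, and at lower speeds it grows proportionally, which is consistent with the "millisecond at best" figure for an on-time character.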

First, it should be pointed out that the PPS signal is inherently ambiguous, in that it provides a precise seconds epoch but does not provide a way to number the seconds. In principle and most commonly, another source of synchronization, either the timecode from an associated radio clock or even a set of remote peers, is available to perform that function. In all cases, a specific configured peer or server must be designated as associated with the PPS signal. This is done by including the `prefer` keyword with the associated server or peer command in the configuration file. The PPS signal can be associated in this way with any peer or server, but is most commonly associated with the radio clock generating the PPS signal.
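The division of labor can be made concrete: the associated source supplies the whole seconds and the PPS signal supplies the precise fraction. A minimal sketch, assuming the PPS measurement has already been reduced to a phase in [-0.5, 0.5) s and that the coarse source is correct to well within half a second; `pps_offset` and `nearest_second` are hypothetical helpers.

```c
/* Round to the nearest whole second, halves away from zero. */
static long nearest_second(double x)
{
    return (long)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* coarse_offset: offset from the associated peer (e.g. radio timecode), s.
 * pps_phase: PPS offset, known only modulo 1 s, in [-0.5, 0.5) s.
 * Returns the unambiguous offset: whole seconds from the coarse source,
 * fraction from the PPS signal. */
double pps_offset(double coarse_offset, double pps_phase)
{
    long whole = nearest_second(coarse_offset - pps_phase);
    return (double)whole + pps_phase;
}
```

For instance, a timecode offset of 3.002 s and a PPS phase of 0.4 ms combine to 3.0004 s; a timecode offset of 0.3 s with a PPS phase of -0.4 s resolves to 0.6 s. If the coarse source were off by half a second or more, the wrong second would be chosen, which is why the associated peer must be trustworthy.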

The PPS signal is processed by a special PPS Clock Discipline Driver (type 22), described in the precision kernel modifications documentation. That description specifies the hardware configurations in which this signal can be connected to the server. This driver replaces the former scheme based on conditional compilation and the PPS, CLK and PPSCLK compile-time switches. Regardless of method, the driver, like all other drivers, is mitigated in the manner described for the prefer peer above. However, in the case of the PPS peer, the behavior is slightly more complex.

First, in order for the PPS peer to be considered at all, its associated prefer peer must have survived the sanity and intersection algorithms and have become the prefer peer. This ensures that the radio clock hardware is operating correctly and that, presumably, the PPS signal is operating correctly as well. Second, the absolute time offset from that peer must be less than CLOCK_MAX, the gradual-adjustment range, which is ordinarily set at 128 ms, well within the ±0.5 s unambiguous range of the PPS signal itself. Finally, the time offsets generated by the PPS peer are propagated via the clock filter to the clock selection procedures just like any other peer. Should these pass the sanity and intersection algorithms, they will show up along with the offsets of the prefer peer itself. Note that, unlike the prefer peer, the PPS peer samples are not protected from discard by the clustering algorithm. These complicated procedures ensure that the PPS offsets developed in this way are the most accurate and reliable available for synchronization.
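The first two conditions act as a gate before any PPS sample is used. A minimal sketch of that gate, with `CLOCK_MAX` set to the 128 ms value stated above; `pps_usable` is a hypothetical name, and the third condition (passing through the clock filter and selection like any other peer) is not modeled here.

```c
#include <stdbool.h>

#define CLOCK_MAX 0.128 /* gradual-adjustment range, seconds */

/* The PPS peer is considered only if its associated peer survived the
 * sanity and intersection algorithms as the prefer peer, and that peer's
 * absolute offset is below CLOCK_MAX. */
bool pps_usable(bool prefer_survived, double prefer_offset)
{
    double mag = prefer_offset < 0.0 ? -prefer_offset : prefer_offset;
    return prefer_survived && mag < CLOCK_MAX;
}
```

Because 128 ms is far inside the ±0.5 s ambiguity window, any offset that passes this gate leaves no doubt about which second the PPS pulse marks.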

A PPS peer retains that designation as long as it survives the intersection algorithm; however, like any other clock driver, it runs a reachability algorithm on the PPS signal itself. If for some reason the signal fails or displays gross errors, the PPS peer either becomes unreachable or strays out of the survivor population. In this case, the clock selection remitigates as described above.
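NTP tracks reachability with an 8-bit shift register: on each poll the register shifts left, and the low-order bit is set if a valid sample arrived during that interval; a register of zero means unreachable. The sketch below applies that scheme to PPS pulses, which is an assumption about how the driver uses it; `reach_update` and `reachable` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shift the 8-bit reachability register once per poll interval,
 * setting the low-order bit if a valid sample (here, a PPS pulse
 * within tolerance) arrived during that interval. */
uint8_t reach_update(uint8_t reach, bool valid_sample)
{
    return (uint8_t)((reach << 1) | (valid_sample ? 1u : 0u));
}

/* Once every bit has shifted out, the source is unreachable. */
bool reachable(uint8_t reach)
{
    return reach != 0;
}
```

A single good sample keeps the source reachable for eight poll intervals, so a brief signal dropout does not immediately trigger remitigation, while a sustained failure does.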

Finally, the mitigation procedures described above for the prefer peer are modified so that, if the PPS peer survives the clustering algorithm, its offset is mitigated over the prefer peer offset; in other words, in case of ties, the PPS offset wins. See the mitigation rules above for the general case.

Using the Kernel Discipline

Code to implement the kernel discipline is a special feature that can be incorporated in the kernel of some workstations as described in the README.kernel file. The discipline provides for the control of the local clock oscillator time and/or frequency by means of an external PPS signal interfaced via a modem control lead. As the PPS signal is derived from external equipment, cables, etc., which sometimes fail, a good deal of error checking is done in the kernel to detect signal failure and excessive noise.

In order to operate, the kernel discipline must be enabled and the signal must be present and within nominal jitter and wander error tolerances. In the NTP daemon, the kernel discipline is enabled only when the prefer peer is among the survivors of the clustering algorithm, as described above. Then, the PPS peer is designated the prefer peer as long as the PPS signal is present and operating within tolerances. Under these conditions, the kernel disregards updates produced by the NTP daemon and uses its internal PPS source instead. The kernel maintains a watchdog timer for the PPS signal; if the signal has not been heard or has been out of tolerance for more than some interval, currently two minutes, the kernel discipline is declared inoperable and operation continues as if it were not present.
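The watchdog decides which offset the kernel acts on. A minimal sketch, assuming a single "last heard in tolerance" timestamp and the two-minute interval stated above; `struct kern_pps`, `kern_pps_active` and `kern_offset` are hypothetical names and do not reflect the actual kernel interface.

```c
#include <stdbool.h>
#include <time.h>

#define PPS_WATCHDOG 120.0 /* seconds; "currently two minutes" */

/* Hypothetical kernel-discipline state. */
struct kern_pps {
    bool   enabled;   /* enabled by the daemon (prefer peer survived) */
    time_t last_good; /* last time the PPS signal was heard in tolerance */
};

/* The discipline is operable only while enabled and the watchdog has
 * not expired. */
bool kern_pps_active(const struct kern_pps *k, time_t now)
{
    return k->enabled && difftime(now, k->last_good) <= PPS_WATCHDOG;
}

/* While operable, the kernel uses its own PPS offset and ignores the
 * daemon's update; otherwise it falls back to the daemon's update. */
double kern_offset(const struct kern_pps *k, time_t now,
                   double pps_offset, double ntp_offset)
{
    return kern_pps_active(k, now) ? pps_offset : ntp_offset;
}
```

The fallback branch corresponds to "operation continues as if it were not present": once the watchdog expires, daemon updates take effect again without any other change in configuration.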

David L. Mills (mills@udel.edu)