Tuesday, 3 January 2017

An Extra ‘Leap Second’ In 2016 Caused a Mini-Y2K

An extra second was bolted onto the waning moments of 2016: a leap second, designed to make up for the slowing of Earth’s spin relative to our cesium-based atomic clocks. Leap seconds are a fairly regular addition to international timekeeping, happening on average every 18 months, and have historically caused problems for web companies whose code can’t handle the time anomaly, much like the whole Y2K debacle.

This time round, content delivery network provider CloudFlare saw some of its servers go down at midnight UTC on New Year’s Day thanks to a fault in code that let a number go “negative when it should always have been, at worst, zero”. Most of the affected servers were patched within 90 minutes, but worldwide problems persisted for almost seven hours, and CloudFlare did not reveal how many of its servers were affected.

Blogging about the mishap, CloudFlare admitted that some customers were hit when the company’s DNS proxy software, RRDNS, started panicking. The nitty gritty involves the non-monotonic properties of time in Go, but essentially, the root cause of the bug that affected CloudFlare’s DNS service was “the belief that time cannot go backwards”. A small number of machines across CloudFlare’s 102 data centers were affected, which in turn knocked out some customers who use CloudFlare’s CNAME DNS records to validate DNS queries for their domains. Motherboard has contacted CloudFlare by email to clarify which customers were affected, but we did not receive a response in time for this article’s publication.

“Internally, Cloudflare operates DNS resolvers to lookup DNS records from the Internet and RRDNS talks to these resolvers to get IP addresses when doing CNAME lookups. RRDNS keeps track of how well the internal resolvers are performing and does a weighted selection of possible resolvers (we operate multiple per data center for redundancy) and chooses the most performant. Some of these resolutions ended up recording in a data structure a negative value during the leap second,” explained CloudFlare in a blog post. “The weighted selection code, at a later point, was being fed the negative number which caused it to panic. The negative number got there through a combination of the leap second and smoothing.”


Without the leap second, which is orchestrated by the International Earth Rotation and Reference Systems Service and injected into Coordinated Universal Time (UTC), the time we keep on our clocks would start drifting out of sync with the movements of the Earth. Over hundreds of years, if the leap second weren’t added, our noon would end up being in the middle of the night. While to humans the immediate difference is negligible, satellite navigation systems such as GPS and the Russian-made GLONASS rely on pinpoint timing accuracy. Since 1972, a leap second has been added 27 times.

But not all nations agree on the extra slice of time. In 2015, the International Telecommunication Union (ITU) announced that its decision on whether the leap second should be abolished would be delayed until 2023, giving the body more time to study a potential elimination. Demetrios Matsakis, Chief Scientist for Time Services at the US Naval Observatory, told Motherboard in 2015 that the leap second should be abolished. "We should eliminate leap seconds because of the real-world practical impossibility of reliably implementing them, due to either their inherent nature or to general lack of knowledge of their very existence," he said. Japan, France, and Italy join the US in calling for an end to the leap second, while the UK and Russia remain supporters.


