Here’s part two of my two-part story about the bug that waited for five years to cause the test suite to fail for the first time – and would wait for another eleven to do it again. Last time I already fixed a part of the problem; this time we will see what 2020 has brought to the party and try to fix those date problems once and for all.

A little reminder

To get the full context, please check part 1 of the story. To recap, we’re checking the code excerpt available here from an issue tracking application API responsible, i.a., for providing data to generate a bar graph with information how many issues were created throughout the year.

A sample graph from the application

Bring it on

Last time I wrote that – despite time zone problems – the Stats service’s test will not fail until 2030. Little did I know about 2020’s plans to entertain me – and all of us. For the sake of this blog post, let’s say currently there are five issues in the database: two created on December 30th and 31st, 2020, and three created on Janary 1st, 4th and 20th, 2021.

January month view from 2021 calendar

Given the above, the application’s API response should look like this:

[["2020-53", 3], ["2021-1", 1], ["2021-2", 0], ["2021-3", 1]]

Wait, week 53?

Personally, my knowledge about week numbers ended at the very general statement that a year contains 52 weeks. If you ask Google, most of the search results will use the term approximately 52 weeks and some will be more precise, saying that there are exactly 52.177457 of them. So how come there’s week 53?

PostgreSQL’s and Ruby’s week date definitions are based on the ISO 8601 standard, which says:

The first calendar week of the year is the one that includes the first Thursday of that year.

In the Gregorian calendar, this is equivalent to the week which includes January 4th. This also means that every Gregorian year that contains 53 Thursdays will have 53 weeks – and there are exactly 71 years like that in every 400-year-long Gregorian leap cycle.

Looking at the calendar above, you can see that this is how 2020 got its extra week (as if it wasn’t long enough already). If you feel bad you didn’t know, note that our frontend developer working on the graph also didn’t – that’s why week 3 is missing at the beginning of the example graph.

What year is it anyway?

The existence of week 53 is a valuable piece of information, but there is one more important conclusion there – sounds obvious now, but the week-based year for a given date can differ from the Gregorian year – e.g., January 1st, 2021 is still a part of the 53rd week of 2020. There are days like this almost every year: in the regular 52-week-years it’s the last days of December that are a part of week 1 of the following year.

Going back to the code: why wouldn’t the test fail for other New Year Eves then – and would fail for 2020? Well, that’s because the test shares the same error with the code. Let’s check again how the year information is retrieved for the issues. In the code we had previously analysed:

<<-SQL
  date_part('week', timezone('#{zone}', created_at::timestamptz)) as week,
  date_part('year', timezone('#{zone}', created_at::timestamptz)) as year,
  count(*) as total
SQL

We mostly use the date_part PostgreSQL function here. It’s modelled on the extract function, which retrieves subfields (such as year or hour) from date/time values. Sounds perfect for the job, doesn’t it? Running the full query in a PostgreSQL console would give us the result:

 week | year | total 
------+------+-------
   53 | 2020 |     2
    1 | 2021 |     1
    3 | 2021 |     1
   53 | 2021 |     1

As expected, the first days of January are evaluated as a part of week 53 – but the year is incorrect. Right – because selecting year will give us the Gregorian year. Luckily, there is an easy solution: one of the accepted subfields that the date_part function can select is isoyear – the ISO 8601 week-numbering year that the date falls in. Perfect, let’s use it and check the console:

 week | year | total 
------+------+-------
   53 | 2020 |     3
    1 | 2021 |     1
    3 | 2021 |     1

What about the test? It uses Rails to fetch the year.

[created_at.strftime("%Y"), created_at.strftime("%V").to_i]

As briefly mentioned in the previous blog post, %V will give us the week number and %Y will give us the year. Oh yes, the Gregorian year again. Let’s see if the docs forstrftime have something for us as well:

ISO 8601 week-based year and week number:
The week 1 of YYYY starts with a Monday and includes YYYY-01-04.
The days in the year before the first week are in the last week of
the previous year.
  %G - The week-based year
  %g - The last 2 digits of the week-based year (00..99)
  %V - Week number of the week-based year (01..53)

Exactly what we want and need – %G is the answer. If you look closely at the code, it’s also used in another part, so missing it here was probably an oversight, which caused this bug to be hidden.

One last thing

There should be no more bugs in the code, but one thing still bugged me a lot – it’s the extract_date method, which is used to get a Date object from a previously formatted string. The data argument here is a Hash with the year-week dates as keys and issue counts (in those weeks) as values.

def extract_date(data, type)
  week_parts = data.keys.first.split("-") if type == 'first'
  week_parts = data.keys.last.split("-") if type == 'last'
  year, week = week_parts[0].to_i, week_parts[1].to_i
  n = 0
  begin
    Date.commercial(year, week, 1)
  rescue ArgumentError
    raise if (n+=1) > 3
    year -= 1 if week > 50
    retry
  end
end

First thing worth noticing there is the Date#commercial Ruby method. Very useful – it takes the year, the week number and the day number to create a Date object. In my case – Date.commercial(year, week, 1) – it returns the first day of a given week.

Also perfect, but what’s the purpose of that disturbing rescue there? Do we need it every time? If I told you it was introduced as a fix in 2015, could you guess how many weeks that year had…? Date#commercial throws an error if the arguments are incorrect, e.g., for the 53rd week of 2021 – as it doesn’t exist. Rolling the year back in time will help, but surely it is a code smell. Luckily, with my prior fixes, the rescue can be safely removed, as there should no longer be any incorrect dates.

Conclusion

Working with dates is hard. Hopefully this blog post gave you a little insight into the week-based year and it will be somehow useful in the future.

A little summary

  • If you compare dates from different sources, make sure both use the same time zone
  • PostgreSQL AT A TIMEZONE is non-obvious, use it twice to get the timestamp without time zone in your specified zone
  • A year can have 52 or 53 weeks
  • Week-based year for a given date can differ from the Gregorian year
  • PostgreSQL gives you the date_part function, which can extract both the week and the isoyear from a timestamp
  • Ruby is a great help too – you can use Date#commercial to create a date from a week number and a year, and Date#strftime with %G and %V to get the week number and the ISO year from a date
  • Never do fixes on Sunday evenings