Comparing current data to last time period’s data in Postgres

I have a Postgres table where each row shows the number of traffic tickets written by each city per month.

I needed to show the percentage change between the current and prior month’s ticketwriting. For example, if a city wrote 1000 tickets this month and 500 last month, then I wanted a field in the row for this month to show a 200% increase over last month.

Here’s how I did it.

I first had to create two functions:

-- number of traffic tickets for
-- the given month and city
CREATE FUNCTION traffictickets(
                   cityName CHARACTER VARYING,
                   MONTH TIMESTAMP WITHOUT TIME zone
                ) RETURNS INTEGER
     AS 'select "TrafficTickets"
         FROM raw."OCA tickets per city"
         WHERE "Date" = $2 AND "City" = $1;'
     LANGUAGE SQL
     IMMUTABLE
     RETURNS NULL ON NULL INPUT;

Then I made a short SQL script that referenced this function:

UPDATE raw."OCA tickets per city"
     SET "Change" = "TrafficTickets"::REAL /
        (CASE WHEN traffictickets("City", "Date" - INTERVAL '1 month') = 0
         THEN -1
         ELSE traffictickets("City", "Date" - INTERVAL '1 month')
         END)::REAL;

Viola, now each of my rows shows the percentage change between the current and prior month’s ticket counts! Except there’s a huge problem–the changes aren’t consistently meaningful.

If ticketwriting dropped month over month, the change field will be between 0 and 1: going from 100 tickets to 20 tickets the next month, the change is 0.2. But here’s the problem: if ticketwriting increased, the number will be between 1 and infinity. If ticketwritng increased from 20 to 100, the change is 5. Or if it changed from 2 to 150, the change is 75!

Yikes, I need some way of bringing these positive changes down to Earth.

I think I have a solution: show the change relative to the city’s average ticketwriting over all its years in my database.

I added a new function:

-- average number of traffic tickets written by
-- that city over all months in my table
CREATE FUNCTION averageticketsforcity(
                   cityName CHARACTER VARYING
                ) RETURNS REAL
     AS 'select avg("TrafficTickets"::real)::real
         FROM raw."OCA tickets per city"
         WHERE "City" = $1;'
     LANGUAGE SQL
     IMMUTABLE
     RETURNS NULL ON NULL INPUT;

Here’s the new version of the query:

UPDATE raw."OCA tickets per city"
     SET "Change" = ("TrafficTickets"::REAL -
      traffictickets("City", "Date" - INTERVAL '1 month')::REAL) /
      CASE WHEN averageticketsforcity("City") = 0
        THEN 1
        ELSE averageticketsforcity("City")
      END;

That seems to have fixed it. Now all my values appear to be equally meaningful to each other.

Leave a Reply

Your email address will not be published. Required fields are marked *