Enter AI
I had read about the ability of some large language models to provide users with coding tips and so, having discovered a particular problem as I sought to analyse signal intermittency, decided to give it a go.
The problem I had was remembering how best to write a query that compared a row in my table with those that had gone before. I had a vague memory that the LAG function in PostgresSQL was the way to go about it, but the function’s correct use eluded me. Out of little more than curiosity at first, I turned to Microsoft’s new AI, ‘Copilot’ which is powered by ChatGPT4.
The first question I asked was: ‘In PostgresSQL, explain to me what the LAG function is and how I would use it in a query’ which returned a solid page of detail that I won’t reproduce here. Ultimately, I wasn’t hugely impressed. It seemed to me that I could have used a traditional search engine to find the same results, if a little slower as I would have to scroll through the returns until I found one pitched at the level I was after. That said, MS Copilot had neatly referenced its sources in what were, admittedly, well written instructions on the use of the function. I pushed on.
The table I was investigating has a unique logger reference and two dates on each row. One date column showed the last date on which the logger ‘dialled in’ and the second date column was a list of consecutive dates on which it could have dialled in (I.E all the days in the analysis period). The loggers ‘dial in’ once a day to provide the data they collected in the previous 24 hours and so, by identifying mismatches between the two date columns, I could find the days that loggers had failed to dial in.
As a starting point for investigating signal intermittency, I wanted to know how regularly the two dates did not match each other, and how frequently a logger failed to dial in on consecutive days. So, I wanted a total count of mismatched dates per logger and a count of how many times the dates switched between matching and not matching, so that a run of consecutive mismatches was counted only once.
So, I asked:
‘In PostgresSQL, I have a table with a UID column and two date columns, called X and Y. When the two dates do not match we will call this a ‘mismatch’. I want to find how many mismatches are against each UID. I also want a column where mismatches on consecutive dates, as defined by column Y, are counted as one mismatch. This column should be called ‘outages’. lastly, I want to rank the UIDs by those with the fewest mismatches. What query should I use?’
To my amazement I received the following response:
WITH mismatches AS (
SELECT uid, x, y, x <> y AS mismatch
FROM your_table
),
outages AS (
SELECT uid, x, y, mismatch,
CASE WHEN mismatch AND lag(mismatch) OVER (PARTITION BY uid ORDER BY y) THEN FALSE ELSE mismatch END AS outage
FROM mismatches
)
SELECT uid, count(mismatch) AS mismatches, count(outage) AS outages
FROM outages
WHERE mismatch
GROUP BY uid
ORDER BY mismatches ASC;
While I didn’t use this exact query for a couple of boring technical reasons, this was an incredibly useful guide to the correct use of the Lag function, as well as helping to structure a query with multiple requirements.