I had read about the ability of some large language models to provide users with coding tips and so, having discovered a particular problem as I sought to analyse signal intermittency, decided to give it a go.
The problem I had was remembering how best to write a query that compared a row in my table with those that had gone before. I had a vague memory that the LAG function in PostgresSQL was the way to go about it, but the function’s correct use eluded me. Out of little more than curiosity at first, I turned to Microsoft’s new AI, ‘Copilot’ which is powered by ChatGPT4.
The first question I asked was: ‘In PostgresSQL, explain to me what the LAG function is and how I would use it in a query’ which returned a solid page of detail that I won’t reproduce here. Ultimately, I wasn’t hugely impressed. It seemed to me that I could have used a traditional search engine to find the same results, if a little slower as I would have to scroll through the returns until I found one pitched at the level I was after. That said, MS Copilot had neatly referenced its sources in what were, admittedly, well written instructions on the use of the function. I pushed on.
The table I was investigating has a unique logger reference and two dates on each row. One date column showed the last date on which the logger ‘dialled in’ and the second date column was a list of consecutive dates on which it could have dialled in (I.E all the days in the analysis period). The loggers ‘dial in’ once a day to provide the data they collected in the previous 24 hours and so, by identifying mismatches between the two date columns, I could find the days that loggers had failed to dial in.
As a starting point for investigating signal intermittency, I wanted to know how regularly the two dates did not match each other, and how frequently a logger failed to dial in on consecutive days. So, I wanted a total count of mismatched dates per logger and a count of how many times the dates switched between matching and not matching, so that a run of consecutive mismatches was counted only once.
So, I asked:
‘In PostgresSQL, I have a table with a UID column and two date columns, called X and Y. When the two dates do not match we will call this a ‘mismatch’. I want to find how many mismatches are against each UID. I also want a column where mismatches on consecutive dates, as defined by column Y, are counted as one mismatch. This column should be called ‘outages’. lastly, I want to rank the UIDs by those with the fewest mismatches. What query should I use?’
To my amazement I received the following response:
WITH mismatches AS (
SELECT uid, x, y, x <> y AS mismatch
outages AS (
SELECT uid, x, y, mismatch,
CASE WHEN mismatch AND lag(mismatch) OVER (PARTITION BY uid ORDER BY y) THEN FALSE ELSE mismatch END AS outage
SELECT uid, count(mismatch) AS mismatches, count(outage) AS outages
GROUP BY uid
ORDER BY mismatches ASC;
While I didn’t use this exact query for a couple of boring technical reasons, this was an incredibly useful guide to the correct use of the Lag function, as well as helping to structure a query with multiple requirements.