AI in 2023: ‘Data from Star Trek’ or ‘Holly from Red Dwarf’?

What is AI?

For my latest blog, I’ve jumped on the technology topic of our times, Artificial Intelligence (AI).

You can barely swing a talking toaster for the number of sci-fi books and films that have taken the concept of AI as their key theme, and back in the real world (loosely speaking), Elon Musk has claimed AI is more dangerous than nuclear weapons. There have been calls to put a global halt to the technology’s development until we AS A SPECIES can get to grips with quite how far into the future we are about to leap. Oh, and an AI has already tricked a human into doing its bidding… Seriously.

So, what is it? AI is an umbrella term for a range of technologies that can address problems that historically required human intelligence, including reasoning, problem-solving and recognising speech. That said, we are some way off the dated cultural references to sentient androids and senile ship’s computers, and titling a blog about AI in 2023 by anthropomorphising it is extremely misleading. While large language models can respond in a convincingly human way, AI is still (for the moment) a tool and nothing more.

Now, you might ask, am I, Richard Francis, qualified to discuss AI’s usefulness relative to the threat it potentially poses to the human race?

Absolutely not.

However, I recently made limited use of a large language model in a work context for the first time and was pleasantly surprised to find a new, free tool at my disposal. So, we’ll start there.

The problem

To indulge in a little backstory, I recently spent a fortnight working from home during a period of convalescence. Working on a single laptop screen with a trackpad is no fun once you have got used to three ‘proper’ screens and a mouse over years of working in the office. I am sure I’m not alone in finding it exceptionally frustrating to hunt for the correct program sat hidden under layers of other documents and Excel sheets, and my week’s calendar was looking busy.

Apocalyptic warnings for the future and ‘great leaps forward’ are all well and good but “what can it do for me now?” I wondered, morosely, while sitting on the couch with a bruised and misshapen knee raised on pillows in front of me (my own, obviously).

We had recently begun a new project looking at the health of a rather large fleet of loggers for a client who was seeking to maximise their utility in the face of an approaching end-of-warranty date. With a huge amount of data to make sense of, it was very much time to get stuck into some statistical analysis with the help of an SQL database, and here I was, trapped in the house with a piddly little laptop balanced precariously on my stomach.

I had another problem. My coding was rusty. I admit it. For the last year, at least, I have been focussed on producing materials for the Paradigm Academy and supporting the rollout of Paradigm analysis to various clients in one capacity or another. Neither of these involved the use of SQL, so my knowledge of the syntax was hazy and my muscle memory for useful queries had all but deserted me as I planned my analysis.

Enter AI

I had read about the ability of some large language models to provide users with coding tips and so, having run into a particular problem as I sought to analyse signal intermittency, I decided to give it a go.

The problem I had was remembering how best to write a query that compared a row in my table with those that had gone before. I had a vague memory that the LAG function in PostgreSQL was the way to go about it, but the function’s correct use eluded me. Out of little more than curiosity at first, I turned to Microsoft’s new AI, ‘Copilot’, which is powered by GPT-4.

The first question I asked was: ‘In PostgreSQL, explain to me what the LAG function is and how I would use it in a query’, which returned a solid page of detail that I won’t reproduce here. Ultimately, I wasn’t hugely impressed. It seemed to me that I could have used a traditional search engine to find the same results, if a little more slowly, as I would have had to scroll through the returns until I found one pitched at the level I was after. That said, MS Copilot had neatly referenced its sources in what were, admittedly, well-written instructions on the use of the function. I pushed on.
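
For anyone whose memory of it is as hazy as mine was, the gist is straightforward: LAG pulls a value from an earlier row within the same window, so you can compare each row with the one before it. A minimal sketch, using purely illustrative table and column names rather than anything from the real dataset:

-- For each logger, show the previous dial-in date alongside the current row
SELECT
    logger_ref,
    dial_in_date,
    LAG(dial_in_date) OVER (PARTITION BY logger_ref ORDER BY dial_in_date) AS previous_dial_in
FROM dial_ins;

The PARTITION BY clause keeps each logger’s history separate, and the ORDER BY defines what ‘previous’ means.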

The table I was investigating had a unique logger reference and two dates on each row. One date column showed the last date on which the logger ‘dialled in’, and the second date column was a list of consecutive dates on which it could have dialled in (i.e. all the days in the analysis period). The loggers ‘dial in’ once a day to provide the data they collected in the previous 24 hours, so, by identifying mismatches between the two date columns, I could find the days on which loggers had failed to dial in.
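
Purely for illustration (these names are my own placeholders, not the client’s actual schema), the shape of the table was along these lines:

-- Illustrative shape only; the names are placeholders
CREATE TABLE logger_dial_ins (
    logger_ref    text,  -- unique logger reference
    last_dial_in  date,  -- the last date on which the logger actually dialled in
    expected_date date   -- one row for every day in the analysis period
);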

As a starting point for investigating signal intermittency, I wanted to know how regularly the two dates did not match each other, and how frequently a logger failed to dial in on consecutive days. So, I wanted a total count of mismatched dates per logger and a count of how many times the dates switched between matching and not matching, so that a run of consecutive mismatches was counted only once.

So, I asked:

‘In PostgreSQL, I have a table with a UID column and two date columns, called X and Y. When the two dates do not match we will call this a ‘mismatch’. I want to find how many mismatches are against each UID. I also want a column where mismatches on consecutive dates, as defined by column Y, are counted as one mismatch. This column should be called ‘outages’. Lastly, I want to rank the UIDs by those with the fewest mismatches. What query should I use?’

To my amazement I received the following response:

WITH mismatches AS (
    SELECT uid, x, y, x <> y AS mismatch
    FROM your_table
),
outages AS (
    SELECT uid, x, y, mismatch,
        CASE WHEN mismatch AND lag(mismatch) OVER (PARTITION BY uid ORDER BY y) THEN FALSE ELSE mismatch END AS outage
    FROM mismatches
)
SELECT uid, count(mismatch) AS mismatches, count(outage) AS outages
FROM outages
WHERE mismatch
GROUP BY uid
ORDER BY mismatches ASC;

While I didn’t use this exact query for a couple of boring technical reasons, it was an incredibly useful guide to the correct use of the LAG function, as well as helping to structure a query with multiple requirements.
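
For completeness, here is one way the counting could be expressed so that only the first day of each consecutive run registers as an outage. This is a sketch rather than the query I actually ran, and it reuses the illustrative names from the prompt above (your_table, uid, x and y):

WITH flagged AS (
    SELECT
        uid,
        y,
        (x <> y) AS mismatch,
        -- TRUE only on the first row of a run of consecutive mismatches
        (x <> y) AND NOT COALESCE(LAG(x <> y) OVER (PARTITION BY uid ORDER BY y), FALSE) AS outage_start
    FROM your_table
)
SELECT
    uid,
    COUNT(*) FILTER (WHERE mismatch)     AS mismatches,
    COUNT(*) FILTER (WHERE outage_start) AS outages
FROM flagged
GROUP BY uid
ORDER BY mismatches ASC;

The FILTER clauses produce both counts in a single pass, and the COALESCE handles the first row for each logger, where LAG has nothing to look back at.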

Conclusion

So, what’s the takeaway? Many will have already seen and played with large language models and AI-powered apps, given the wave of publicity the technology has generated in the last few years. For me, MS Copilot inarguably sped up my analysis. It is more than just a souped-up search engine. Using a large language model to suggest how to query a database can be incredibly useful, if only to see alternative ways of going about something you already know how to do. If you can correctly phrase your question in plain text, Copilot is likely to be able to give you a usable response, and it isn’t even one of the AI systems purposely designed to help people write code!

Large language models have incredible potential as a learning resource, but it should be remembered that there is still no guiding intelligence other than your own. If you misphrase a question, or do not pay close enough attention to the recommendations being produced, you may end up with unintended consequences when you come to use your code. One way to mitigate this risk is to ask your chosen large language model to provide pseudocode alongside its recommended query, which will helpfully explain what each step of the query is doing.

This blog is intended to highlight the usefulness of currently available, free AI-powered tools in speeding up the process of data analysis in 2023. These are the early days of AI, and I very much look forward to rereading this blog in ten to fifteen years’ time, in much the same way I wish I had written a blog about smartphones back in 2007 when the first iPhone was released. From here, the pace at which AI is incorporated into our daily lives is likely to be very rapid.

Who knows where we will be by 2033.
