On Linguistic Analysis and Deception Detection by AI


The world of myths about AI is similar to Scheherazade’s stories; if written down, they would likely also fill a book long enough for 1001 nights, offering almost 3 years of captivating, yet fictitious, tales.

The myth that infuriates me the most is the one about linguistic analysis and deception detection done by… a generic LLM. This is something I personally combat whenever I can, mostly in private chats, although I also gave a presentation about it at the Beyond Europe conference in 2024 (it was the ‘Artificial Intelligence for Behavioural Analysis’ panel then). 

It’s been almost 1.5 years since then, so it might be time for a repetition – this time here. 

Natural Language Processing vs. Holistic View

LLMs do not ingest language like humans do. Current sub-word tokenisation algorithms (like BPE) are compression schemes driven by frequency, so they track surface form (at best a rough approximation of morphology) rather than pragmatic meaning. They break what we say into words and, when necessary, further into sub-words (e.g., ‘un-’, ‘-happi-’, ‘-ness’). Each sub-word is mapped to an integer ID and then to a vector of numbers, and the model derives context and meaning from how those vectors relate to one another. For example, when an LLM “sees” the word ‘friend’, it processes a numerical representation of that token’s statistical relationship to other tokens and then builds a vector representation for the phrase ‘My friend John’.
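To make the frequency-driven splitting concrete, here is a minimal toy sketch of BPE-style merging. It is a simplification under stated assumptions: the tiny corpus is invented, there is no end-of-word marker or byte-level handling, and the function names (`learn_bpe`, `merge_pair`, `build_token_ids`, `encode`) are made up for this example rather than taken from any library.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merges: repeatedly fuse the most frequent adjacent pair."""
    # Every word starts out as a sequence of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # chosen by frequency only, not by meaning
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(symbols, best))] += freq
        vocab = new_vocab
    return merges


def build_token_ids(words, merges):
    """Assign an integer ID to every base character and every merged symbol."""
    symbols = sorted({ch for w in words for ch in w}) + [a + b for a, b in merges]
    return {s: i for i, s in enumerate(symbols)}


def encode(word, merges, token_ids):
    """Split a word by replaying the learned merges, then look up the integer IDs."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols, [token_ids[s] for s in symbols]


# Invented toy corpus (assumption for illustration only).
corpus = ["unhappy", "unhappiness", "happiness", "happy", "kindness", "unkind"]
merges = learn_bpe(corpus, num_merges=10)
token_ids = build_token_ids(corpus, merges)
print(encode("unkindness", merges, token_ids))
```

Note that nothing in the merge loop knows what a prefix or a suffix is. Any split that happens to look like ‘un-’ or ‘-ness’ falls out of the pair counts, and what the model receives in the end is the list of integers.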

So if you used to think that when you tell AI some of your ideas it can ‘see’ the concepts you share – sorry to break it to you like this – in reality, what AI ‘sees’ are complex vectors and numbers.   
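As a rough illustration of what “closeness” between vectors means, the sketch below compares made-up three-dimensional vectors using cosine similarity. The numbers in `EMBEDDINGS` are invented for this example; real models learn vectors with hundreds or thousands of dimensions, and those vectors also change with context.

```python
import math

# Hypothetical hand-made vectors, NOT taken from any real model.
EMBEDDINGS = {
    "friend": [0.9, 0.1, 0.3],
    "buddy": [0.8, 0.2, 0.4],
    "invoice": [0.1, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine similarity: close to 1.0 when two vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


for word in ("buddy", "invoice"):
    print(word, round(cosine(EMBEDDINGS["friend"], EMBEDDINGS[word]), 3))
```

With these invented numbers, ‘friend’ ends up closer to ‘buddy’ than to ‘invoice’, purely because of the geometry of the vectors.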

When we humans process and produce language, it’s often in whole clusters of words. We think in utterances – words too, but also phrases and sentences. And when we listen to someone, we do not just process the semantic meaning, i.e., the actual word meaning. We, again, process whole word clusters and whole narratives.

This is crucial, because deception often lives in the pragmatic and holistic view of a statement, i.e., in a phrase that doesn’t quite fit the context. Yes, transformers are quite good at picking up inconsistencies within a passage when prompted correctly, but the training objective does not require consistent truth-conditional reasoning or contradiction detection as a primary goal in input analysis. And prompting correctly in this case relies on the user’s skills, not the model’s skills. Also, when the whole input is turned into vectors without a deliberate coherence check, the mismatch in stories can get blurred.

But what if we shifted toward higher-level semantic representations, i.e., phrase-level entities, or added them as a layer? I know it sounds like a step back from transformers to a chunking architecture, but could it allow a model to better capture the incoherence in a whole statement, which usually covers multiple topics?

Recursive Generation vs. Real Analysis

The token- and integer-based logic does not end at the input; it works similarly for the output. LLMs generate text token-by-token in an autoregressive way. Bit by bit, based on the whole previous string of output, they “bead on” the next most likely thing to say (or the next most likely sub-word). It’s probabilistic, based on the input and on the weights acquired during training. And an analysis that is purely probabilistic, rather than following a defined procedure, is by definition flawed. Also, autoregressive generation optimises for local coherence, not global truth consistency, so the output does not need to be accurate at all to meet the criteria of a good response.
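The “beading” loop can be sketched with a toy model. This is a deliberate simplification: a real LLM conditions on the whole preceding context through a neural network and usually samples from a probability distribution, while the bigram table below only looks at the last token and always picks the most frequent follower. The corpus and the function names are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for every token, which tokens follow it and how often."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_tokens):
    """Greedy autoregressive loop: append the most frequent follower of the last token."""
    out = [start]
    for _ in range(max_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last token was never followed by anything in the corpus
        out.append(followers.most_common(1)[0][0])
    return out


# Invented corpus containing two statements that contradict each other.
corpus = "the car was red . the car was parked . the car was blue .".split()
model = train_bigram(corpus)
print(" ".join(generate(model, "the", max_tokens=4)))
```

The loop only ever asks “what usually comes next?”. Nothing in it checks whether the sentence it is building agrees with the other statements in the corpus.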

Linguistic analysis, on the other hand, requires not only sentence-by-sentence processing of the input, but also a proper report summarising the findings, which are then yet again synthesised to derive the level of veracity (or lack thereof). 

You may be wondering now where the verification happens, given that the outputs look consistent even when they are sometimes off here and there. The answer to this is ‘nowhere’. In the standard LLM architecture, there’s no separate logical auditor module. Some systems try to work around this by having the model first write a draft and then “read it and correct any contradictions” (an approach known as Chain-of-Verification, or CoVe). But in a regular chat, consistency is simply the result of the statistical connections between the words in your question and the LLM’s previous sentences being strong enough to keep the model on track. That, however, does not guarantee accuracy in deep analysis.
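For readers who have not seen it, a draft-then-verify loop in the spirit of CoVe could be sketched roughly as below. Everything here is an assumption for illustration: `llm` is a hypothetical callable that takes a prompt string and returns a text reply (no real API is implied), and the prompt wording is made up rather than taken from any published implementation.

```python
from typing import Callable


def chain_of_verification(llm: Callable[[str], str], question: str) -> str:
    """Draft an answer, plan checks, answer the checks separately, then revise."""
    # 1. Baseline draft.
    draft = llm(f"Answer the question.\nQuestion: {question}")

    # 2. Ask for verification questions, one per line.
    plan = llm(
        "List verification questions, one per line, that would check "
        f"the facts in this draft.\nDraft: {draft}"
    )
    checks = [line.strip() for line in plan.splitlines() if line.strip()]

    # 3. Answer each check WITHOUT showing the draft, so its errors are not copied.
    answers = [(q, llm(f"Answer concisely.\nQuestion: {q}")) for q in checks]

    # 4. Revise the draft in light of the verification answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in answers)
    return llm(
        "Revise the draft so it agrees with the verification answers.\n"
        f"Question: {question}\nDraft: {draft}\nVerification:\n{evidence}"
    )
```

Even in this sketch, every step is a call to the same probabilistic generator. The “auditor” is the model rereading itself, not a separate logical module.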

So maybe, instead of the LLM “helpful assistant” persona, we need a discriminator, as in a GAN, to differentiate between truthful and deceptive narratives within one statement. After all, the deception often resembles the randomly generated input… If we could add it on top of the phrase-level analysis, then the model could work differently. It still wouldn’t solve the report generation problem yet, but it would move us closer to a solution. At least, this is what I think.

Politeness over Accuracy

When I look for deceptive phrases, especially in relationship descriptions, I often look at three layers – the obvious descriptive (e.g., ‘we were friends’), the linguistic disposition (e.g., ‘she was fat and ugly’), and the behavioural (e.g., ‘I used to borrow money from her and she took care of my kids’). The obvious analytical move here is to compare those three together as one entity. Being a human, the image you likely see from those examples is closer to extortion than to a friendship. You likely have disregarded the relationship description and focused on the remaining two sentences. You are capable of holding those three puzzle pieces in your head as separate and comparable, while an LLM may actually prioritise the word ‘friend’.

LLMs are fine-tuned toward cooperative, non-confrontational responses, especially in ambiguous cases. If you give an LLM a text where someone is mentioned as a ‘friend’, the LLM may actually understand the rest of the input text through it (vector-wise) rather than flagging the inconsistency as a sign of a false relationship, unless prompted directly for it (but for that, you already need to know what to look for, i.e., you need to either be an SME or do the analysis first).

There are sentiment detection models, of course. Sentiment and emotion valence are easy to derive from text. The problem starts when we have contradictions between what the person says and how they behaved. Sentiment extraction from a description of behaviour is more difficult to do.

So what if we created a model that would map adjective sentiment against action-verb meaning (‘action taking’ was one of the emotion verbalisation forms I used for my thesis, btw)? Or better yet, a model which would extract relational triples or event representations and then check for behavioural and linguistic consistency across them? If the sentiment of the adjectives is consistently negative, or the described behaviour resembles extortion, while the label is ‘friend’, the system should fire an ‘internal contradiction’ alert. Internal contradiction is the definition of deception which I use in my work.
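A very rough sketch of the flagging rule described above might look like this. It rests on several loud assumptions: the three layers are already extracted by hand (no parser or triple extractor is involved), the word lists are tiny invented lexicons, and “resembles extortion” is approximated by the crude proxy “every described event benefits the narrator”. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Tiny invented lexicons, for illustration only.
POSITIVE_LABELS = {"friend", "friends"}
NEGATIVE_ADJECTIVES = {"fat", "ugly", "stupid", "lazy"}
POSITIVE_ADJECTIVES = {"kind", "generous", "funny"}


@dataclass
class Event:
    actor: str        # who performed the action
    action: str       # free-text description of the action
    beneficiary: str  # who gained from it (annotated by hand here)


@dataclass
class Statement:
    label: str             # descriptive layer, e.g. "friends"
    adjectives: List[str]  # linguistic disposition layer
    events: List[Event]    # behavioural layer


def adjectives_consistently_negative(adjectives):
    """True if at least one adjective is in the lexicons and all known ones are negative."""
    scored = [a for a in adjectives if a in NEGATIVE_ADJECTIVES | POSITIVE_ADJECTIVES]
    return bool(scored) and all(a in NEGATIVE_ADJECTIVES for a in scored)


def behaviour_one_sided(events, narrator="I"):
    """Crude proxy for an exploitative pattern: every event benefits the narrator."""
    return len(events) >= 2 and all(e.beneficiary == narrator for e in events)


def internal_contradictions(statement, narrator="I"):
    """Compare the label against the other two layers and return any alerts."""
    flags = []
    if statement.label in POSITIVE_LABELS:
        if adjectives_consistently_negative(statement.adjectives):
            flags.append("positive label vs. negative disposition")
        if behaviour_one_sided(statement.events, narrator):
            flags.append("positive label vs. one-sided behaviour")
    return flags


example = Statement(
    label="friends",
    adjectives=["fat", "ugly"],
    events=[
        Event("I", "borrowed money from her", beneficiary="I"),
        Event("she", "took care of my kids", beneficiary="I"),
    ],
)
print(internal_contradictions(example))
```

The point of the design is that the three layers are kept as separate fields and compared explicitly, instead of being blended into one representation in which the word ‘friend’ can colour everything else.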

When built, it would likely still need a human to understand why the contradiction happened, but at least we would have a system to flag it. A very high-level and rough idea.
