AI/LLM signatures
- Gary Hinson
- 3 days ago
- 4 min read
Updated: 2 days ago

This morning I've been reading, thinking and writing about the things that reveal the true origins of a substantial proportion of the stuff posted on social media lately. It is clear to me that much of it is entirely or largely AI/LLM-generated, churned out by the robots.
Some pieces are frankly terrible, as if the posters have simply dashed off their prompts and regurgitated the robots' raw, crude output without a care for the readers. I've seen factual errors, manipulative phrasing and biases aplenty. It appears they have not been proof-read at all, at least not competently. These are the worst of the bunch, adding to the rising tide of LLM slop. They have little to no value, perhaps even negative value in the sense that we could have crafted our own prompts, reflecting our specific information needs rather than whatever the posters and robots dumped on us. Posters who put their own names to such pieces, failing to acknowledge or credit the robots, are essentially plagiarists.
Some are not so bad, showing signs that humans had more involvement in the process, perhaps refining the prompts, reviewing and revising the robotic output, knocking off the sharp corners at least. It is not easy to tell whether the robots or the posters took the lead on composing these pieces, although there are clues in the structure and nature of the writing. It is reasonable to expect posters to have fact-checked the content in the course of editing it, so the result presumably reflects their credentials and expertise ... but not necessarily. Maybe they were too busy to pay sufficient attention to the process, or were misled by the robots' oh-so-convincing use of language. Maybe they were working in a foreign language, oblivious to the subtle emphases and phrasing that can materially affect the meaning of a piece. Maybe they were simply inept, distracted or careless. Maybe not. Still, on balance, these are more valuable than the first category.
The few that remain show little if any sign of AI/LLM involvement, hence their authenticity and value depend almost entirely on the credibility, credentials and expertise of the human authors. Even here, though, the posters may have consulted the robots for advice on the language, phrasing, flow and perhaps the claims or facts presented.
That analysis led me to think about indications of robotic involvement in a piece of prose. Clues such as these typically catch my beady eye:
'Not this, but that'-style phrasing, juxtaposing and contrasting alternatives. This is a version of the 'strawman' approach where the first clause is clearly meant to be knocked down, discounted or disregarded in favour of the second. As a rhetorical technique, it has its place ... except when the robots are building specious arguments, glossing over the details and playing on human readers' tendency to accept the validity of the contrast at face value.
Capitalisation Of Every Word In Headings (Often Including Conjunctions): some human writers do this too, so it is not a completely reliable sign of robotic origins, just Another Possible Indicator.
Short paragraphs, some as short as a single sentence. Or just a few words. This too is merely an indication. Guidance on plain English typically advises the use of short sentences since lengthy, multi-clause sentences (like this one) tend to be more complex and hence more difficult for readers of limited reading ability to understand, and fair enough.
For adult readers of normal ability, however, It Soon Gets Annoying.
Bullet points, lots of bullet point lists - like this one, only more so, sometimes multi-level.
Whereas humans typically prefer dots or dashes for bullets, the robots tend towards emojis: colourful little graphics that lightheartedly allude to various things. Further emojis are often scattered about like confetti. They may be appropriate in informal social communications but are generally ill-suited to professional comms.
A strong preference for "like" over similar phrases ... like "such as", "for instance" or "for example", e.g. "endpoint devices like computers, smart devices, and routers". "Like" can also mean "has an affinity towards", so maybe the robots are telling us that 'endpoint devices' are best friends with 'computers, smart devices and routers'. [Aside: ironically enough, I just spotted this example in a line promoting an AI detection tool on Google: "[The product] accurately detects texts generated by the most popular tools, like ChatGPT, Gemini, and Copilot." Hmmm, I wonder ...]
Precise, correct grammar and accurate spelling: those of us what was bought up proper no how to rite well, and so to do robots. We're compliant rule-followers, seldom bending or breaking the rules ... except by mistake or to make a point. Like this.
Some say that em-dashes are more common in robotic than human content, although personally (thanks to an autocorrect rule in Word) I use them quite a bit to indicate a lengthier pause than a comma or semi-colon, whereas I mostly use en-dashes or hyphens to link words, as in the next sentence ...
Inaccurate citations, quotations or claims: again, we humans are error-prone too, but experts writing on their home turf are likely to trip over and correct the most egregious errors when proofreading and finalising their materials, whereas the robots just blast on through, regardless, strident as ever.
References to outdated standards, laws, methods etc. are a pretty strong clue, since present-day robots were mostly trained on material gathered months or years ago, missing out on recent updates - such as the 2022 release of ISO/IEC 27001 and 27002 with a completely reorganised set of information security controls from the 2013 release.
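Hypothetically, several of the clues above could even be counted programmatically. The sketch below is purely illustrative: the function name, regular expressions and scoring are my own invention, crude enough to mislabel plenty of perfectly human prose, so treat it as a toy rather than a detection tool.

```python
import re

def signature_score(text: str) -> int:
    """Toy heuristic: count rough AI/LLM 'signature' clues in a piece of text.

    Invented for illustration only - a high score merely suggests, never proves,
    robotic involvement.
    """
    score = 0

    # Clue: 'not this, but that' contrastive phrasing
    score += len(re.findall(r"\bnot\b[^.]{0,60}?,?\s+but\b", text, re.IGNORECASE))

    # Clue: em-dashes (U+2014)
    score += text.count("\u2014")

    # Clue: emoji bullets at the start of lines (very rough codepoint check)
    score += sum(1 for line in text.splitlines()
                 if line.strip()[:1] and ord(line.strip()[0]) > 0x1F000)

    # Clue: preference for "like" over "such as" / "for example" / "for instance"
    likes = len(re.findall(r"\blike\b", text, re.IGNORECASE))
    alternatives = len(re.findall(r"such as|for example|for instance",
                                  text, re.IGNORECASE))
    if likes > alternatives:
        score += 1

    # Clue: Title Case Headings where every word starts with a capital
    for line in text.splitlines():
        words = line.split()
        if 2 < len(words) < 10 and all(w[:1].isupper() for w in words):
            score += 1

    return score
```

A piece scoring zero or one probably falls into the third category above; a piece lighting up most of these checks at once starts to look like category one, though of course any single clue proves nothing on its own.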
If a piece has just a few of those clues, chances are it falls into category 2 or 3 above. If it has several or many, it stands out as a category 1 terror. As far as I can tell, all the AI/LLM robots tend to over-use those techniques, making their outputs somewhat similar, whereas human authors gradually develop their own unique and often characteristic styles - their preferred turns of phrase, stylistic choices and so on.
The point, finally, of this ramble/rant is that I have drafted new definitions for a few relevant terms to add to the hyperglossary, such as 'disambiguation' (not this, but that) and 'AI/LLM signature' (a set of robotic content indicators - an AImetric as opposed to a biometric).