Em dash

· Source: Simon Willison's Weblog · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning · Depth: Fundamental Awareness, quick

Summary

The author responds to accusations of using Large Language Models (LLMs) to generate blog content, asserting that their writing generally lacks an "LLM smell." The one exception noted is em dashes, which the blog inserts automatically. The Python snippet responsible, `s = s.replace(' - ', u'\u2014')`, has been part of the blog's code since at least 2015, dating to the author's port of the blog from an older Django version to a new GitHub repository. That predates widespread public awareness and use of advanced LLMs.

Key takeaway

For content creators concerned about their writing being mistaken for AI-generated text, review your publishing pipeline for automated stylistic modifications. Knowing where such a feature came from, as with the em dash replacement in place since at least 2015, gives you concrete evidence that a stylistic tic comes from your tooling rather than from an LLM.

Key insights

Automated text processing features can inadvertently mimic LLM-generated content characteristics.

Method

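Em dashes are inserted with a single string replacement, `s = s.replace(' - ', u'\u2014')`, within the blog's templating system, dating back to a 2015 Django port. The pattern matches a hyphen with a space on each side and replaces all three characters with one em dash (U+2014), so the spaces are dropped and hyphenated words are left alone.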
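
The replacement described above can be sketched as a small standalone function. Only the `replace(' - ', u'\u2014')` call comes from the source; the function name `add_em_dashes`, the `EM_DASH` constant, and the sample strings are hypothetical, and how the real blog wires this into its templates is not shown here.

```python
# Minimal sketch of the spaced-hyphen to em dash replacement.
# Assumption: add_em_dashes and EM_DASH are illustrative names,
# not taken from the author's blog code.
EM_DASH = "\u2014"  # U+2014 EM DASH


def add_em_dashes(s: str) -> str:
    """Replace each ' - ' (space, hyphen, space) with a single em dash."""
    return s.replace(" - ", EM_DASH)


if __name__ == "__main__":
    raw = "I wrote this myself - no LLM involved"
    rendered = add_em_dashes(raw)
    # The spaced hyphen and both surrounding spaces become one character.
    print(rendered == "I wrote this myself\u2014no LLM involved")
    # Hyphens without surrounding spaces are not touched.
    print(add_em_dashes("well-known") == "well-known")
```

Because the match requires a space on both sides of the hyphen, compound words and negative numbers such as `-1` pass through unchanged.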

Best for: Software Engineer, General Interest

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.