This one will be short.
“Semantic line breaks” is a conventional use of line breaks when writing text in a markup language. It eases editing and review when using diff tools, but does not impact rendering of the final published work.
I started a new writing project, hoping to use GitBooks alongside GitHub again, but was thwarted by a limitation in how they handle semantic line breaks. But then a random nudge sent me back into the rabbit hole. I was not the only user to see that GitBooks had a problem; I was encouraged to report my pain point. It’s either a bug to be fixed or a missing feature meriting a “new feature request”.
That’s not much of a rabbit hole. No, but I did search to see if more people were complaining about the issue. What did I find?
Semantic Linefeeds
There is an online article about Semantic Linefeeds in the context of Python documentation. It also reminded me that this is not just for markdown. Indeed, I had naturally been doing it for HTML many years ago. The author, Brandon Rhodes, talks about how he’s had this trick in his toolbox for a couple of decades. The big reveal is that he learnt about it from a memorandum by none other than Brian Kernighan, initially written in 1974.
That’s before my time in the industry. I do remember, though, that whenever I edited a text document in the late 80s, I’d have to fix the word wrapping after each change within a paragraph. Not all editors had auto-wordwrap in the early days. Rhodes even addresses this directly:
I encourage students to treat the files as private “source code” that they are free to format semantically. Instead of fussing with the lines of each paragraph so that they all end near the right margin, they can add linefeeds anywhere that there is a break between ideas.
Where to Break?
I’ve been doing this very simply. I’ve been adding my semantic line breaks at the end of a sentence. That’s easy, and helps immensely.
Sentences in technical documentation are short. You want to be concise. You do not want ambiguity or confusion. So, sentences become single clauses. If your sentence can’t fit on one line, it might be too complicated.
Reading the readme for a tool to inject semantic line breaks, I gained more insight.
> When writing text
> with a compatible markup language,
> add a line break
> after each substantial unit of thought.
I’m going to reconsider my own usage of this peculiar writing format. I’ve been using it for C all my life, and that’s why Rust hurts me at times. For Markdown, it’s now been about 5 years. I’ve been doing it timidly, relying on periods, colons, and semicolons as my trigger to move to a new line. I’m going to confidently break at commas too now.
I’m going to try it out in LaTeX too. I’ll see what can be done in Confluence documentation, but, like in WordPress, I’m in a WYSIWYG mode by default and have not found out how to turn on source view mode.
There’s a Tool to Help
If there’s one tool, there will be more out there. I’m not sure how I feel about going back into all my “normal” text and converting it. I can do a search and replace for “. ”, that is a period followed by a space, and add a carriage return. Any reasonable editor could do that. I could write a sed script. Yes, I really did say that.
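As a minimal sketch of that search and replace, here it is in Python rather than sed. The function name `naive_semantic_breaks` is made up for this illustration, and the approach is deliberately naive: it breaks after every period followed by a space, including places where you would not want a break, such as the marker of a numbered list.

```python
def naive_semantic_breaks(text: str) -> str:
    """Replace every period-plus-space with period-plus-newline.

    Hypothetical helper for illustration only. It knows nothing about
    Markdown, abbreviations, or quotes.
    """
    return text.replace(". ", ".\n")


paragraph = "Sentences are short. You want to be concise. Avoid ambiguity."
print(naive_semantic_breaks(paragraph))

# The same rule also splits a numbered list marker from its text,
# which is one reason the simple approach needs care.
print(naive_semantic_breaks("1. First list item."))
```

Running it on a copy of a file first, and reviewing the diff, is probably the safest way to try something like this.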
But there is a tool out there that is a bit smarter. It uses machine learning to add appropriate line feeds. I alluded to this tool before when I referenced the readme. I’ve not used the tool, so I can’t vouch for it. I only mention it to catalyze your investigation into tools if you are thinking of making bulk changes.
Sembr is available on GitHub for Linux, Mac, and Windows. It’s also available as a Python package.
If you look through the readme, you will find a list which includes other projects.
You’ll Thank Me Later
I’ve handled many ugly diffs in markdown documents. Trying to see what changed in a paragraph that is contained in one single text line takes a few minutes. When using semantic breaks, you can see clearly which sentence was actually touched.
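To make the difference concrete, here is a small sketch using Python’s standard `difflib` module. The sample sentences and the helper name `changed_lines` are invented for the illustration; the point is only to compare what a line-based diff reports for the same one-word edit in each layout.

```python
import difflib


def changed_lines(before, after):
    """Return only the added and removed lines of a unified diff."""
    diff = difflib.unified_diff(before, after, lineterm="", n=0)
    return [
        line
        for line in diff
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))  # skip the file headers
    ]


# One paragraph stored as a single long line.
before_one_line = ["Sentences are short. You want to be concise. Avoid ambiguity."]
after_one_line = ["Sentences are short. You want to be brief. Avoid ambiguity."]

# The same paragraph with one sentence per line.
before_semantic = ["Sentences are short.", "You want to be concise.", "Avoid ambiguity."]
after_semantic = ["Sentences are short.", "You want to be brief.", "Avoid ambiguity."]

for line in changed_lines(before_one_line, after_one_line):
    print(line)
print()
for line in changed_lines(before_semantic, after_semantic):
    print(line)
```

In the first case the whole paragraph shows up as removed and re-added, and the reviewer has to hunt for the changed word. In the second, only the sentence that was touched appears.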
You’ll start catching run-on sentences. You’ll see awkward, meandering sentences. Cleaner, concise documentation helps readers understand; surely that is why you wrote it in the first place, to share knowledge.
Better documentation, faster reviews with more willing reviewers, and easier maintenance of documentation. You win in every way.
Conclusion
Yes, this was just an epilogue to my previous post. The main takeaways?
- This is an old technique, but still valid now, and maybe more meaningful with the prevalence of markup languages.
- There are tools to help convert legacy text.
- If you encounter software that doesn’t work, please tell the creator. They might not know the problem exists, and “it’s the squeaky wheel that gets the oil.”
Do you have a favourite hint for markdown documentation? Any comments, questions, or feedback are very welcome.
Re: “I can do a search and replace for ‘. ’, that is a period followed by a space, and add a carriage return.”
1. Perhaps allow for multiple space chars following the full-stop char. (I’m trained the old-fashioned way and always put two spaces behind the full-stop. For example, see the beginning of this two-sentence parenthesized paragraph.)
2. Need to think deeper for the char preceding the full-stop char.
3. Now, this is getting messier – what about quoted (single/double quotes) string of chars that contains full-stop chars?
Yes, looking for a pair is very naive.
Since I am mostly concerned about MD, I fail at the very first ordered list in my file.
1. First list item.
1. Second list item.
Adding breaks there is very undesirable.
When I’m in a comment block of code, and using a tool that does not stretch the space after a period, I too revert to a period followed by two spaces. I learnt to type on an Adler portable typewriter in the 1970s. I remember when I first found a word processor that took care of that spacing for me. It would cheerfully swallow all whitespace after the period and use the specified point length space. Grammarly catches me slipping into my old ways occasionally.
There are a lot of edge cases:
– numbered lists
– sentences ending in a quote
– sentences ending in question marks and exclamation marks (or, for my American colleagues, exclamation points.)
– brackets and parentheses.
I can see why the tool exists and why they struggled making a rules-based tool work well.
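For what it’s worth, here is a rough sketch of a rules-based pass that covers a few of these cases in Python. The names `LIST_MARKER`, `SENTENCE_END`, and `semantic_breaks` are made up for this illustration, and it is not how Sembr works. It assumes one paragraph or list item per input line, keeps a leading numbered list marker intact, accepts any run of spaces after the punctuation, and treats question marks, exclamation marks, and trailing quotes or closing brackets as part of the sentence ending.

```python
import re

# A numbered list marker at the start of a line, such as "1. " or "2) ".
LIST_MARKER = re.compile(r"^(\s*\d+[.)][ \t]+)")

# Sentence-ending punctuation, optional closing quotes or brackets,
# then one or more spaces or tabs, followed by more text.
SENTENCE_END = re.compile(r"([.!?][\"'”’)\]]*)[ \t]+(?=\S)")


def semantic_breaks(text: str) -> str:
    """Insert a line break after each sentence, leaving list markers alone."""
    out = []
    for line in text.splitlines():
        marker = LIST_MARKER.match(line)
        prefix = marker.group(1) if marker else ""
        body = line[len(prefix):]
        out.append(prefix + SENTENCE_END.sub(r"\1\n", body))
    return "\n".join(out)


print(semantic_breaks("1. First list item.\n1. Second list item."))
print(semantic_breaks('Old habit.  Two spaces! He said "stop." Then left.'))
```

It still gets abbreviations such as “e.g. ” wrong, and it knows nothing about code blocks or tables, which is where a pile of rules starts to lose to a smarter tool.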
Hey – WordPress compressed my period-followed-by-two-space into a single space.
I want my money back! 😝
Nope, I am happy for WordPress to fix my occasional double space back to the correct single character, and render it as per the selected style.
My problem was with GitBooks and single line breaks (vertical space) in markdown being interpreted as paragraph breaks.
The specification defines lines, hard line breaks, and paragraphs clearly.
I am not happy about Markdown having dialects that render differently, especially within the same ecosystem. GitBooks used to match GitHub styling, but now it does not. I am assuming this is a temporary problem.