Improved hyphenation in Jekyll
The web is not known for its fine typography and layout. There have been improvements over the years: widespread UTF-8 support makes character set problems less of an issue; post-processing tools make typographically correct punctuation easier to use; advances in CSS provide alternatives to table-based layout and make responsive design possible.
In the print world, tools such as LaTeX and Adobe InDesign give you a great deal of control over features such as kerning, tracking, ligatures, hanging punctuation, orphans and widows, hyphenation, and justification. CSS has limited support for some of these features (e.g., letter-spacing, text-align, justify-content, various OpenType features), but it doesn’t compare to the level of control afforded by these other tools.
Though I’d love to learn how to implement all of these features on the web, it’s not something I choose to spend a lot of time on. One resource I’ve found useful is Matthew Butterick’s Practical Typography.1 While viewing the page source, I noticed he uses soft hyphens in the HTML. Soft hyphens indicate possible hyphenation points in the text. If the browser chooses to hyphenate the word at a soft hyphen point, it breaks the word and inserts a visible hyphen. This can prevent awkward spacing in fully justified text and limit the raggedness of left-justified text.
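To make the mechanics concrete, here is a minimal Ruby sketch of what a soft-hyphenated word looks like as a string. The soften helper and the hand-picked fragments are my own illustration, not anything taken from Butterick's site; the only fixed fact is that the soft hyphen is the Unicode character U+00AD.

```ruby
# U+00AD SOFT HYPHEN: invisible unless the browser breaks the line there.
SOFT_HYPHEN = "\u00AD"

# Join pre-split word fragments with soft hyphens.
# The fragments are supplied by hand here (hypothetical helper); a real
# filter would get the break points from a hyphenation library.
def soften(fragments)
  fragments.join(SOFT_HYPHEN)
end

word = soften(%w[hy phen ation])

# The string now carries two extra, normally invisible characters.
puts word.length               # 13: eleven letters plus two soft hyphens
puts word.delete(SOFT_HYPHEN)  # prints "hyphenation"
```

In HTML the same character can also be written as the entity &shy;.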
I looked for a tool to hyphenate text with Jekyll and came across Aucor Oy’s hyphenate Liquid filter, which uses the Text::Hyphen library. Text::Hyphen is an implementation of the hyphenation algorithm used in TeX and InDesign.2 The filter seemed simple enough to use and would provide better line breaks, especially for narrower column widths, such as when viewing the site on a mobile device.
Safari and Chrome don’t hyphenate by default. Here’s a sample:
After installing Aucor’s hyphenate plugin and using the default 2 characters to the left and right of the hyphen, we now have hyphenated text:
This makes the right margin less ragged, but leaves the ed of hyphenated alone at the beginning of a line, which isn’t very pleasing to the eye and arguably makes reading more difficult. Adjusting the character minimums can create a better reading experience. Here’s a sample increasing the left and right minimums to 3.
The right edge is still better than the unhyphenated example, and hyphenated is now split between hyphen and ated. Opinions may differ on whether this is optimal, but at least now we have control over the experience.
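The effect of the left and right minimums can be sketched in a few lines of Ruby. This is not Text::Hyphen's implementation: the candidate break points for hyphenated are assumed by hand to match the samples above, and allowed_breaks and visualize are hypothetical helpers.

```ruby
# Keep only the break points that leave at least `left` characters before
# the hyphen and at least `right` characters after it.
def allowed_breaks(word, candidates, left:, right:)
  candidates.select { |i| i >= left && word.length - i >= right }
end

# Render a word with a visible hyphen at each allowed break point.
# Inserting from the rightmost index first keeps earlier indices valid.
def visualize(word, breaks)
  chars = word.chars
  breaks.sort.reverse_each { |i| chars.insert(i, "-") }
  chars.join
end

word = "hyphenated"
candidates = [2, 6, 8] # assumed break points: hy|phen|at|ed

puts visualize(word, allowed_breaks(word, candidates, left: 2, right: 2))
# prints "hy-phen-at-ed"
puts visualize(word, allowed_breaks(word, candidates, left: 3, right: 3))
# prints "hyphen-ated"
```

With a minimum of 3 on each side, the breaks that would strand hy or ed on their own are dropped, leaving only the split between hyphen and ated.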
Looking further at the effects of the plugin, I noticed some paragraphs weren’t hyphenated. Upon closer inspection, I determined that any paragraph containing sub-elements, such as a, em, strong, or code, is ignored. This is a known issue. Given that I often use sub-elements, it’s likely I’d have more unhyphenated paragraphs than hyphenated ones. I thought I’d try to fix it. After all, it’s open source!
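One way to think about handling sub-elements is to transform only the text between tags and pass the markup through untouched. The sketch below uses a plain regular expression rather than Nokogiri so that it runs with the standard library alone; it is a simplification for illustration, not the code from the plugin or the gem, and it will not cope with every HTML edge case (comments, attributes containing a literal >, and so on). The upcase call stands in for hyphenation so the effect is visible.

```ruby
# Apply a block to the text between tags and leave the tags themselves alone.
# Splitting on a capturing group keeps the matched tags in the result array.
def transform_text_nodes(html)
  parts = html.split(/(<[^>]+>)/)
  transformed = parts.map do |part|
    part.start_with?("<") ? part : yield(part)
  end
  transformed.join
end

html = "<p>Some <em>emphasized</em> text</p>"

# upcase is a visible stand-in for a real hyphenation step.
puts transform_text_nodes(html) { |text| text.upcase }
# prints "<p>SOME <em>EMPHASIZED</em> TEXT</p>"
```

The text inside the em element is processed along with the rest of the paragraph, which is the behavior the original plugin was missing.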
I cracked open the file and was struck by how little code there is. Nothing over-complicated. Straightforward use of Nokogiri to parse the HTML, Text::Hyphen to hyphenate the content, and a statement to register the filter with the Liquid templates used by Jekyll.
So, how to fix the bug? I didn’t see an easy way to test the existing code other than hack, regenerate the pages, and observe the output. In my experience, without a good test suite, I’d likely break good behavior trying to fix the bugs.
From the code I also saw that the last word in the paragraph is special-cased, in that the last word is not hyphenated. But I could see that it also ignored any other instance of the same word in the same paragraph.
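A small Ruby sketch shows how this kind of behavior can arise when the last word is skipped by value rather than by position. This is a hypothetical reconstruction of the symptom, not the original plugin's code; mark is a stand-in for a real hyphenator so that processed words are easy to spot.

```ruby
# Stand-in for a real hyphenator: brackets show which words were processed.
def mark(word)
  "[#{word}]"
end

# Hypothetical reconstruction of the symptom: every word equal to the last
# word is skipped, not just the final occurrence.
def process_skipping_last_word(words)
  last = words.last
  words.map { |w| w == last ? w : mark(w) }
end

words = %w[hyphenation helps because hyphenation]
puts process_skipping_last_word(words).join(" ")
# prints "hyphenation [helps] [because] hyphenation"
```

The first hyphenation is left untouched even though it is nowhere near the end of the paragraph.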
I now had two bugs in code that I couldn’t easily test. Given how small the code was (20 lines), I decided to extract it into a gem to make it easily testable, installable, and configurable via _config.yml, Jekyll’s global configuration file.
And that’s how the jekyll-hyphenate_filter gem came about. I fixed the issue with content containing sub-elements. As for special-casing the last word of the paragraph, I opted to remove it. I didn’t see an easy way to ignore only the last word. Admittedly, this is a trade-off. How often is the last word in a paragraph repeated elsewhere in the paragraph at a point where we’d want to hyphenate it?
Having Jekyll::HyphenateFilter as a gem means that I don’t have to copy the file into the _plugins directory. It also reads the Jekyll site configuration, so source code doesn’t need to be modified. And the Test::Unit tests ensure that I didn’t screw up the behavior I wanted while fixing the behavior I didn’t.
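Reading settings from the site configuration usually comes down to merging user-supplied values over a set of defaults. Here is a minimal sketch of that pattern; the hyphenate_filter key, the option names, and the hyphenate_options helper are assumptions for illustration and may not match what the gem actually uses. A plain Hash stands in for the configuration Jekyll loads from _config.yml.

```ruby
# Hypothetical defaults; the key names are assumptions for illustration.
DEFAULTS = { "language" => "en_us", "left" => 2, "right" => 2 }.freeze

# `site_config` stands in for the hash Jekyll builds from _config.yml.
# A missing or empty section falls back to the defaults.
def hyphenate_options(site_config)
  DEFAULTS.merge(site_config["hyphenate_filter"] || {})
end

# Equivalent to a _config.yml containing:
#   hyphenate_filter:
#     left: 3
#     right: 3
site_config = { "hyphenate_filter" => { "left" => 3, "right" => 3 } }

p hyphenate_options(site_config)
p hyphenate_options({})
```

Because the defaults live in one place, changing the minimums becomes a one-line edit in the site configuration rather than a change to the filter's source.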
Overall, I’m happy with how it turned out. Aucor Oy’s code is clear and extracting the code was straightforward. The gem works like I want and I’m able to share the results with others.
- I considered using Butterick’s Pollen publishing system for this site, but ultimately chose Jekyll as it was faster to get set up, had enough of the features I valued, and is extensible via Ruby, a language I’m comfortable with. ↩︎
- The more advanced Knuth-Plass algorithm used in both TeX and InDesign looks at the entire paragraph, choosing to break lines to balance the paragraph as a whole. Given the dynamic nature of a web page, Knuth-Plass can’t be applied beforehand. Bram Stein has written a JavaScript implementation which should be able to take this into account. ↩︎