The web is not known for its fine typography and layout. There have been improvements over the years: widespread UTF-8 support makes character set problems less of an issue; post-processing tools make typographically correct punctuation easier to use; advances in CSS provide alternatives to table-based layout and make responsive design possible.

In the print world, tools such as LaTeX and Adobe InDesign give you a great deal of control over features such as kerning, tracking, ligatures, hanging punctuation, orphans and widows, hyphenation, and justification. CSS has limited support for some of these features (e.g., letter-spacing, text-align, justify-content, various OpenType features), but it doesn’t compare to the level of control afforded by these other tools.

Though I’d love to learn how to implement all of these features on the web, it’s not something I choose to spend a lot of time on. One resource I’ve found useful is Matthew Butterick’s Practical Typography.[1] While viewing the page source, I noticed he uses soft hyphens in the HTML. Soft hyphens indicate possible hyphenation points in the text. If the browser chooses to hyphenate the word at a soft hyphen point, it breaks the word and inserts a visible hyphen. This can prevent awkward spacing in fully justified text and limit the raggedness of left-justified text.
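To make the mechanism concrete, here is a minimal Ruby sketch of what a soft hyphen is at the character level. The helper name `insert_soft_hyphens` and the break offsets are illustrative assumptions; a real hyphenation library would compute the offsets itself.

```ruby
# U+00AD SOFT HYPHEN: invisible unless the browser breaks the line there.
SOFT_HYPHEN = "\u00AD"

# Insert a soft hyphen at each character offset in `positions`.
# The offsets are supplied by the caller (assumed here, not computed).
def insert_soft_hyphens(word, positions)
  result = word.dup
  # Work right to left so earlier offsets stay valid after each insertion.
  positions.sort.reverse_each { |pos| result.insert(pos, SOFT_HYPHEN) }
  result
end

marked = insert_soft_hyphens("hyphenation", [2, 6])
puts marked.length               # 13: two invisible characters were added
puts marked.delete(SOFT_HYPHEN)  # hyphenation
```

The marked string looks identical to the original in most contexts, which is why soft hyphens tend to go unnoticed until you view the page source.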

I looked for a tool to hyphenate text with Jekyll and came across Aucor Oy’s hyphenate Liquid filter, which uses the Text::Hyphen library. Text::Hyphen is an implementation of the hyphenation algorithm used in TeX and InDesign.[2] The filter seemed simple enough to use and would provide better line breaks, especially for narrower column widths, such as when viewing the site on a mobile device.

Safari and Chrome don’t hyphenate by default. Here’s a sample:

[Figure: Unhyphenated text]

After installing Aucor’s hyphenate plugin and using the default 2 char­ac­ters to the left and right of the hyphen, we now have hyphen­ated text:

[Figure: Hyphenated text with a minimum of 2 characters to the left and right of the hyphen]

This makes the right margin less ragged, but leaves the ed of hyphenated alone at the begin­ning of a line, which isn’t very pleasing to the eye and arguably makes reading more diffi­cult. Adjusting the char­acter mini­mums can create a better reading expe­ri­ence. Here’s a sample increasing the left and right mini­mums to 3.

[Figure: Optimized hyphenated text, with a minimum of 3 characters to the left and right]

The right edge is still better than the unhyphenated example, and hyphenated is now split between hyphen and ated. Opinions may differ on whether this is optimal, but at least now we have control over the experience.
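The effect of the left and right minimums can be sketched in a few lines of Ruby. The candidate break offsets below (hy-phen-at-ed) are an assumption chosen to match the samples above, and `allowed_breaks` is a hypothetical helper; Text::Hyphen derives its break points from language patterns and applies its own left/right settings internally.

```ruby
# Keep only break points that leave at least `left` characters before
# the hyphen and at least `right` characters after it.
def allowed_breaks(word, candidates, left:, right:)
  candidates.select { |pos| pos >= left && word.length - pos >= right }
end

word = "hyphenated"
candidates = [2, 6, 8] # assumed candidate points: hy-phen-at-ed

p allowed_breaks(word, candidates, left: 2, right: 2) # [2, 6, 8]
p allowed_breaks(word, candidates, left: 3, right: 3) # [6]
```

With minimums of 3, only the break after hyphen survives, so a trailing ed can no longer be stranded at the start of a line.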

Looking further at the effects of the plugin, I noticed some paragraphs weren’t hyphenated. Upon closer inspection, I determined that any paragraph containing sub-elements, such as a, em, strong, or code, is ignored. This is a known issue. Given that I often use sub-elements, it’s likely I’d have more unhyphenated paragraphs than hyphenated ones. I thought I’d try to fix it. After all, it’s open source!
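One way to think about paragraphs with sub-elements is to hyphenate only the text runs and pass the markup through untouched. The sketch below is an assumption-laden illustration of that idea, not the plugin's code: it uses a regexp split rather than a real HTML parser, the method name `hyphenate_text_runs` is hypothetical, and the stand-in hyphenator simply puts a visible hyphen in the middle of long words.

```ruby
# Apply the given block to each long word in text runs, leaving tags untouched.
# A regexp split is a simplification; it is not a real HTML parser and will
# misbehave on comments, scripts, or a ">" inside an attribute value.
def hyphenate_text_runs(html)
  # The capture group makes split keep the tags in the resulting array.
  html.split(/(<[^>]*>)/).map do |part|
    if part.start_with?("<")
      part # markup: pass through unchanged
    else
      part.gsub(/[[:alpha:]]{6,}/) { |word| yield word }
    end
  end.join
end

# Stand-in hyphenator (assumption): a visible hyphen in the middle of the word.
html = "<p>Some <em>emphasized</em> content</p>"
result = hyphenate_text_runs(html) { |w| w.dup.insert(w.length / 2, "-") }
puts result
# <p>Some <em>empha-sized</em> con-tent</p>
```

A production version would also want to skip the contents of elements such as code, where inserted hyphens would corrupt anything a reader copies.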

I cracked open the file and was struck by how little code there is. Nothing overcomplicated. Straightforward use of Nokogiri to parse the HTML, Text::Hyphen to hyphenate the content, and a statement to register the filter with the Liquid templates used by Jekyll.

So, how to fix the bug? I didn’t see an easy way to test the existing code other than hack, regen­erate the pages, and observe the output. In my expe­ri­ence, without a good test suite, I’d likely break good behavior trying to fix the bugs.

From the code I also saw that the last word in the paragraph is special-cased, in that the last word is not hyphenated. But I could see that it also ignored any other instance of the same word in the same paragraph.
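The following Ruby sketch is a hypothetical reconstruction of that kind of pitfall, not the plugin's actual code. It contrasts skipping the last word by value with skipping it by position; all method names are made up for illustration, and `hyphenate_word` is a stand-in for a real hyphenator.

```ruby
# Stand-in for a real hyphenator (assumption): break long words in the middle.
def hyphenate_word(word)
  return word if word.length < 6
  word.dup.insert(word.length / 2, "-")
end

# Compares by value, so every copy of the last word is skipped.
def hyphenate_except_last_by_value(text)
  words = text.split
  last = words.last
  words.map { |w| w == last ? w : hyphenate_word(w) }.join(" ")
end

# Compares by position, so only the final word is skipped.
def hyphenate_except_last_by_index(text)
  words = text.split
  final = words.length - 1
  words.each_with_index.map { |w, i| i == final ? w : hyphenate_word(w) }.join(" ")
end

text = "filter code calls filter"
puts hyphenate_except_last_by_value(text) # filter code calls filter
puts hyphenate_except_last_by_index(text) # fil-ter code calls filter
```

Tracking position works for a flat list of words like this one; it does not by itself address text that is spread across nested HTML elements.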

I now had two bugs in code that I couldn’t easily test. Given how small the code was (20 lines), I decided to extract it into a gem to make it easily testable, installable, and configurable via _config.yml, Jekyll’s global configuration file.

And that’s how the jekyll-hyphenate_filter gem came about. I fixed the issue with content containing sub-elements. As for special-casing the last word of the paragraph, I opted to remove it. I didn’t see an easy way to ignore only the last word. Admittedly, this is a trade-off. How often is the last word in a paragraph repeated elsewhere in the paragraph at a point where we’d want to hyphenate it?

Having Jekyll::HyphenateFilter as a gem means that I don’t have to copy the file into the _plugins directory. It also reads the Jekyll site configuration, so source code doesn’t need to be modified. And the Test::Unit tests ensure that I didn’t screw up the behavior I wanted while fixing the behavior I didn’t.

Over­all, I’m happy with how it turned out. Aucor Oy’s code is clear and extracting the code was straight­for­ward. The gem works like I want and I’m able to share the results with others.

  1. I considered using Butterick’s Pollen publishing system for this site, but ultimately chose Jekyll as it was faster to get set up, had enough of the features I valued, and is extensible via Ruby, a language I’m comfortable with. ↩︎
  2. The more advanced Knuth-Plass algorithm used in both TeX and InDesign looks at the entire paragraph, choosing to break lines to balance the paragraph as a whole. Given the dynamic nature of a web page, Knuth-Plass can’t be applied beforehand.

    Bram Stein has written a JavaScript implementation which should be able to take this into account. ↩︎