Short Attention Span Theatre

Conversion from ikiwiki to Hakyll

Ikiwiki is a great idea in theory. For those unfamiliar with the concept, it is basically a fully functional wiki integrated with a version control system (VCS), all done in such a way that the pages are served statically unless you are modifying the configuration or one of the pages.

It is an excellent solution for those who need that level of interactivity, but it is a bit much for my purposes. It is a fairly complex system designed to be a wiki without some of the drawbacks of other wikis, which serve all pages dynamically. It was this very complexity which led me to eventually realize that I do not actually want a wiki. I want website generation software with the following features:

There are quite a few website generation software packages available which provide most of the above features, so the big two ended up being "new programming language" and "fun." Haskell has had my attention for a while, even if that attention was hidden behind thesis work, research, classes, promotion testing, family, children, and life. This seemed the perfect time to allow that attention to come to the fore, and learn a bit about a language which seems interesting.

Fortunately, I am approaching my defense, so this provided something for me to do when my brain was screaming, "STOP WITH THE RESEARCH STUFFESSES!"

Unfortunately, I am approaching my defense, so I had to take the "get it working" route instead of the "learn the language thoroughly" route.

With that, here is an accounting of the conversion, lessons learned, modifications required, and result (which, with any luck, is what you are reading right now).

First, I discovered that cabal is not a package manager when Hakyll bumped from version 3.X to version 4.X. My experience involved fumbling about in the dark trying to figure out why my cabal install hakyll was not working properly, and nothing would compile, all the way to having to remove all ports beginning with hs- using pkg delete hs-. That resulted in starting over with portmaster devel/hs-haskell-platform, which led me to the second part of the conversion...

How the heck do I convert these 148 posts to the markdown with metadata format used by Hakyll, and determine the proper published and updated dates? This was complicated by the fact that the website has gone through so many transitions, including converting from HTML (straight, no chaser) to Blosxom to WordPress to ikiwiki. The last conversion included a mass conversion from a MySQL database to text files in Markdown format, with the entire thing added to git and the metadata adjusted to reflect the actual date as recorded in the database. Not exactly impossible, but not something I wanted to tackle without a bit of script-fu. Unfortunately, there is no single "one script to rule them all" to share with you, the audience. Instead, it took a series of three- to five-line scripts which allowed me to extract my dates, convert all similarly formatted posts, and then manually handle the 20 or so posts which did not match the common format. The total conversion took about 2 hours, including building each script as needed and looking up metadata within the historical MySQL dump, which is not bad.
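The small scripts themselves are not reproduced here, but the core step (putting a metadata header in front of an existing Markdown body) can be sketched roughly as follows. This is a minimal sketch, not the scripts actually used: the function name `withMetadata`, the `title` field, and the exact timestamp format are assumptions; only the `published` and `updated` field names come from the text above.

```haskell
import Data.Time
    ( UTCTime (..), defaultTimeLocale, formatTime
    , fromGregorian, secondsToDiffTime )

-- | Prepend a metadata header to a Markdown body.
-- Hypothetical helper: the header fields other than "published" and
-- "updated" are assumptions made for illustration.
withMetadata :: String -> UTCTime -> UTCTime -> String -> String
withMetadata title published updated body = unlines
    [ "---"
    , "title: " ++ title
    , "published: " ++ stamp published
    , "updated: " ++ stamp updated
    , "---"
    , ""
    ] ++ body
  where
    -- UTC timestamp such as 2013-01-12T23:48:09Z
    stamp = formatTime defaultTimeLocale "%Y-%m-%dT%H:%M:%SZ"

-- | The notional post used in the layout discussion below.
example :: String
example = withMetadata "Editing my navel" t t "Body text.\n"
  where
    t = UTCTime (fromGregorian 2013 1 12)
                (secondsToDiffTime (23 * 3600 + 48 * 60 + 9))
```

Loading this in GHCi and running `putStr example` prints the header followed by the body, which makes it easy to eyeball a converted post before writing it to disk.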

In the process, however, I discovered the dangers of depending on file system metadata to record modified dates when dealing with a DVCS: when you check out a file, the file metadata changes!!! Yep... all those preciously guarded modified timestamps were lost whenever the file was checked out. That leads to my first lesson from the conversion:

Always store your metadata explicitly.

After the extra time that lesson cost me in digging metadata out of the backups of the old website (which, thank goodness, I had), the next step was determining a structure for the website.

Each example of a possible layout represents a post named "Editing my navel" that was (notionally) published on the 12th of January, 2013 at 23:48:09 UTC.

The default structure Hakyll uses is as follows.

The default, being the default, seemed unappealing on multiple grounds. The primary one was that it violated the "keep the files with the post" requirement; a more minor issue was the lack of specificity. What if I wanted to make two posts in the same day? Or hour? Or minute? Would the two posts sort properly within the day (given that the file path is usually used to sort in Hakyll)? Or would they be sorted arbitrarily? Three alternative layouts occurred to me, illustrated by the following three examples.

  1. posts/2013/01/12/23/48/09/
  2. posts/
  3. posts/2013/01/12/23/48/09/Editing_my_navel/
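For illustration, the first and third layouts can be derived mechanically from the publication timestamp and the title. The following is only a sketch: the names `byTimestamp`, `byTimestampAndTitle`, and `navelPost` are hypothetical, and replacing spaces with underscores is inferred from the example path rather than from any stated rule.

```haskell
import Data.Time
    ( UTCTime (..), defaultTimeLocale, formatTime
    , fromGregorian, secondsToDiffTime )

-- | Layout 1: one directory per second of publication.
byTimestamp :: UTCTime -> FilePath
byTimestamp t = "posts/" ++ formatTime defaultTimeLocale "%Y/%m/%d/%H/%M/%S/" t

-- | Layout 3: the same, followed by a directory named after the title.
-- Assumption: spaces in the title become underscores.
byTimestampAndTitle :: UTCTime -> String -> FilePath
byTimestampAndTitle t title = byTimestamp t ++ map underscore title ++ "/"
  where
    underscore c = if c == ' ' then '_' else c

-- | The notional post: 12 January 2013 at 23:48:09 UTC.
navelPost :: UTCTime
navelPost = UTCTime (fromGregorian 2013 1 12)
                    (secondsToDiffTime (23 * 3600 + 48 * 60 + 9))

-- byTimestamp navelPost
--   == "posts/2013/01/12/23/48/09/"
-- byTimestampAndTitle navelPost "Editing my navel"
--   == "posts/2013/01/12/23/48/09/Editing_my_navel/"
```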

While all three would be ideal with regard to specificity, they are awkward for a human trying to find a given post. This led to further navel gazing, and some searching, where I came across another Hakyll blog by Ian Ross which used a limited version of one of my schemes.

While this solved most of the problems I was anticipating (mostly from having run into them on other website platforms), I wanted something that resulted in clean URLs and allowed files that go with posts to be located with them on disk (not all files of the same day go with all posts of the same day), which led to the final version.

With this layout most of the sorting can be done at the file system level, files that go with posts can be located with the post on disk, and the URLs to refer to the posts do not require the un-site-ly (yuk) index.html at the end. As for clean URLs: I do not know why, but I have disliked having the *.html file at the end of the URL for years. This does not solve the specificity problem, but as each post will have an explicit published metadata field, this will do until I figure out how to sort using the published field.
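One common way to keep index.html out of generated links is to strip it from every URL before the page is written. The following is a sketch of that idea, not necessarily how this site does it: `cleanIndexUrl` is a hypothetical name, and the example path assumes a posts/year/month/day/title/ directory containing an index.html.

```haskell
import Data.List (isSuffixOf)

-- | Drop a trailing "index.html" so that a link points at the directory.
-- Only URLs ending in "/index.html" are touched; everything else is
-- returned unchanged.
cleanIndexUrl :: String -> String
cleanIndexUrl url
    | ("/" ++ index) `isSuffixOf` url = take (length url - length index) url
    | otherwise                       = url
  where
    index = "index.html"

-- cleanIndexUrl "/posts/2013/01/12/Editing_my_navel/index.html"
--   == "/posts/2013/01/12/Editing_my_navel/"
-- cleanIndexUrl "/css/site.css"
--   == "/css/site.css"
```

In a Hakyll site, a function like this would typically be mapped over the links of each rendered page, for example with `withUrls` from Hakyll.Web.Html.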

Alright, the layout for posts has been figured out. Now what about the layout for projects? The projects layout was much easier, as projects are supposed to be longer, less time-oriented pages; thus the date is omitted from the path, and the following title might be a new HOWTO on navel editing.

After that, and with the example of the site.hs of the author, the only significant problem was designing a sort routine to keep the posts in chronological order, which turned out to be a non-issue. The previously mentioned blog also uses a similar layout and had already rewritten the chronological function. With that as an example, generating my own chronological function was relatively simple, resulting in the following.

import System.FilePath (splitDirectories, joinPath)
import Data.List (sortBy)
import Data.Ord (comparing)

-- | Sort pages chronologically. This function assumes that the pages have a
-- @posts/year/month/day/title/@ naming scheme.
chronological :: [Item a] -> [Item a]
chronological = sortBy $ comparing $
    -- Skip the leading "posts" component, then compare on year/month/day.
    joinPath . take 3 . drop 1 . splitDirectories . toFilePath . itemIdentifier

Note that I use this by importing Hakyll as follows.

import Hakyll hiding (chronological)

Of course, shortly after I was able to get the above function functional, Hakyll was upgraded to version 4.2.X, which automatically sorts based on "published" metadata if it exists. Harumph.
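The idea behind sorting on the published field, independent of Hakyll's own implementation, can be sketched with plain pairs of path and timestamp. This is a minimal sketch under assumptions: `parsePublished` and `byPublished` are hypothetical names, and the timestamp format is assumed to look like 2013-01-12T23:48:09Z.

```haskell
import Data.List (sortOn)
import Data.Time (UTCTime, defaultTimeLocale, parseTimeM)

-- | Parse a timestamp such as "2013-01-12T23:48:09Z"; Nothing on failure.
parsePublished :: String -> Maybe UTCTime
parsePublished = parseTimeM True defaultTimeLocale "%Y-%m-%dT%H:%M:%SZ"

-- | Sort (path, published) pairs oldest first. Entries whose timestamp
-- does not parse come first, because Nothing orders before any Just.
byPublished :: [(FilePath, String)] -> [(FilePath, String)]
byPublished = sortOn (parsePublished . snd)
```

Two posts from the same day then sort by their actual time rather than by path: `map fst (byPublished [("posts/2013/01/12/A_later_post/", "2013-01-12T23:59:00Z"), ("posts/2013/01/12/Editing_my_navel/", "2013-01-12T23:48:09Z")])` puts Editing_my_navel first, even though A_later_post sorts first by path.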

The last item on the agenda was allowing access to the source files from within the website itself (copy the source files along with the compiled files). Making the original markdown files available turned out to be simpler than it originally seemed, requiring only the addition of another match and specification of the version when loading snapshots. Just make sure to define the postPattern.

postPattern = "posts/*/*/*/*/"

Copy the raw posts and define the version.

-- Make sure to copy the raw posts
match postPattern $ version "raw" $ do
    route idRoute
    compile getResourceBody

When loading snapshots later on, just make sure to refer to the snapshots not associated with a version... otherwise, the result will probably not be what you intend.

loadAllSnapshots (postPattern .&&. hasNoVersion) "content"
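Put together, the pieces above might fit into a site.hs along the following lines. This is a sketch under assumptions rather than the site's actual configuration: the file name index.markdown inside each post directory, the posts.txt listing, and the use of pandocCompiler for the rendered version are all illustrative choices.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Hakyll

-- Assumed pattern: one index.markdown per post directory (hypothetical).
postPattern :: Pattern
postPattern = "posts/*/*/*/*/index.markdown"

main :: IO ()
main = hakyll $ do
    -- Rendered posts (no version): HTML output plus a "content" snapshot.
    match postPattern $ do
        route $ setExtension "html"
        compile $ pandocCompiler >>= saveSnapshot "content"

    -- Raw posts: the untouched Markdown source, copied to the same path.
    match postPattern $ version "raw" $ do
        route idRoute
        compile getResourceBody

    -- A plain-text listing built only from the unversioned snapshots.
    create ["posts.txt"] $ do
        route idRoute
        compile $ do
            posts <- loadAllSnapshots (postPattern .&&. hasNoVersion) "content"
                         :: Compiler [Item String]
            makeItem $ unlines $ map (toFilePath . itemIdentifier) posts
```

Without `hasNoVersion`, the pattern would also match the "raw" items, which never save a "content" snapshot.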

With that, all the basics were in place, with nothing to do except tweak and work on items that are not absolutely required (breadcrumbs being a good example of something desired but not required). As it is, you should be reading this on the new and improved site. I will put a TODO list up on the facets page to indicate what is left to be done.
