2013-01-23 03:25:21 -0500

Conversion from ikiwiki to Hakyll

Ikiwiki is a great idea in theory. For those unfamiliar with the concept, it is basically a fully functional wiki integrated with a version control system (VCS), all done in such a way that the pages are served statically unless you are modifying the configuration or one of the pages.

It is an excellent solution for those who need that level of interactivity, but it is a bit much for my purposes. It is a fairly complex system designed to be a wiki, but without some of the drawbacks of other wikis which serve all pages dynamically. It was this very complexity which led me to eventually realize that I do not actually want a wiki. I want website generation software with the following features:

Allows me to write my posts and pages in Markdown.
Allows fully static pages (off-site generation of pages, not "code as website").
Allows files associated with the posts to be with the posts without having to jump through too many hoops. The perfect example for this is a post which has pictures: unless the pictures are being referenced from an external gallery, they should be physically located in the same directory as the post.
Does not require any online configuration (think blob of pages with no preferences).
Allows a hierarchical layout of the source files based on date.
Allows any VCS to be used, without requiring extraordinary efforts to be made.
Allows a portion of the site to be formatted as a blog, where the most recent post is on top, with the rest in reverse chronological order all the way down (or, if one desires, with separate pages for each chunk of posts).
Lets me play with a new programming language
- This is my comfort food: some people like brownies, but I like new programming languages... and brownies.
Feels fun.

There are quite a few website generation software packages available which provide most of the above features, so the big two ended up being new programming language and fun. Haskell has had my attention for a while, even if that attention was hidden behind thesis work, research, classes, promotion testing, family, children, and life. This seemed the perfect time to allow that attention to come to the fore, and learn a bit about a language which seems interesting.

Fortunately, I am approaching my defense, so this provided something for me to do when my brain was screaming, "STOP WITH THE RESEARCH STUFFESSES!"

Unfortunately, I am approaching my defense, so I had to take the "get it working" route instead of the "learn the language thoroughly" route.

With that, here is an accounting of the conversion, lessons learned, modifications required, and result (which, with any luck, is what you are reading right now).

First, when Hakyll was bumped from version 3.X to version 4.X, I discovered that cabal is not a package manager. My experience involved fumbling about in the dark trying to figure out why my cabal install hakyll was not working properly, and nothing would compile, all the way to having to remove all ports beginning with hs- using pkg delete hs-. That resulted in starting over with portmaster devel/hs-haskell-platform, which led me to the second part of the conversion...

How the heck do I convert these 148 posts to the markdown-with-metadata format used by Hakyll, and determine the proper published and updated dates? This was complicated by the fact that the website has gone through so many transitions, including converting from HTML (straight, no chaser) to Blosxom to WordPress to ikiwiki. The last conversion included a mass conversion from a MySQL database to text files in Markdown format, with the entire thing added to git and metadata adjusted to reflect the actual date as recorded in the database. Not exactly impossible, but not something I wanted to tackle without a bit of script-fu. Unfortunately, there is no single "one script to rule them all" to share with you, the audience. Instead, it took a series of three- to five-line scripts which allowed me to extract my dates, convert all similarly formatted posts, and then manually handle the 20 or so posts which did not match the common format. The total conversion took about 2 hours, including building each script as needed and looking up metadata within the historical MySQL dump, which is not bad.

In the process, however, I discovered the dangers of depending on file system metadata to record modified dates when dealing with a DVCS: when you check out a file, the file metadata changes!!! Yep... all those preciously guarded modified timestamps were lost whenever the file was checked out. That leads to my first lesson from the conversion:

Always store your metadata explicitly.

Following the extra time the lesson imposed in digging out metadata from the backups of the old website (which, thank goodness, I had), the next step was determining a structure for the website.

Each example layout below represents a post named "Editing my navel" that was (notionally) published on the 12th of January, 2013 at 23:48:09 UTC.

The default structure Hakyll uses is as follows.

posts/2013-01-12-Editing_my_navel.md

The default, being the default, seemed unappealing on multiple grounds. The primary one was that it violated the "keep the files with the post" requirement. A more minor issue was the lack of specificity: What if I wanted to make two posts in the same day? Or hour? Or minute? Would the two posts properly sort within the day (given that the file path is usually used to sort in Hakyll)? Or would they be sorted arbitrarily? Three alternative layouts occurred to me, illustrated by the following three examples.

posts/2013/01/12/23/48/09/Editing_my_navel.md
posts/2013-01-12-23-48-09-Editing_my_navel.md
posts/2013/01/12/23/48/09/Editing_my_navel/index.md

While all three would be ideal with regard to specificity, they are awkward when a human needs to find a given post. This led to further navel gazing, and some searching, where I came across another Hakyll blog by Ian Ross which used a limited version of one of my schemes.

posts/2013/01/12/Editing-my-navel.md

While this solved most of the problems I was anticipating (mostly from having run into them on other website platforms), I wanted something that resulted in clean URLs and allowed files that go with posts to be located with them on disk (not all files of the same day go with all posts of the same day), which led to the final version.

posts/2013/01/12/Editing_my_navel/index.md

With this layout most of the sorting can be done at the file system level, files that go with posts can be located with the post on disk, and the URLs to refer to the posts do not require the un-site-ly (yuk) index.html at the end. As to clean URLs - I do not know why, but I have disliked having the *.html file at the end of the URL for years. This does not solve the specificity problem, but as each post will have an explicit published metadata field this will do until I figure out how to sort using the published field.
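The mapping from source file to clean URL can be sketched in a few lines. This is only an illustration of the idea, not code from my site.hs: the names outputPath and cleanUrl are made up for the example, and it assumes POSIX-style paths relative to the site root.

```haskell
import System.FilePath.Posix (replaceExtension, takeDirectory, takeFileName)

-- | Where the rendered page lands: same location as the source,
-- with the extension swapped to .html. (Hypothetical helper.)
outputPath :: FilePath -> FilePath
outputPath src = replaceExtension src "html"

-- | The URL used to link to a rendered page. A trailing index.html is
-- dropped so that the link points at the directory instead.
-- (Hypothetical helper.)
cleanUrl :: FilePath -> String
cleanUrl out
    | out == "index.html"              = "/"
    | takeFileName out == "index.html" = "/" ++ takeDirectory out ++ "/"
    | otherwise                        = "/" ++ out
```

Feeding posts/2013/01/12/Editing_my_navel/index.md through outputPath and then cleanUrl yields a link ending in the post directory, leaving it to the web server to serve the index.html inside.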

Alright, the layout for posts has been figured out. Now what about the layout for projects? The projects layout was much easier, as projects are supposed to be longer, less time-oriented pages. Thus the date is omitted from the path, and the following might be the path for a new HOWTO on navel editing.

projects/Navel_editing_HOWTO/index.md

After that, and with the site.hs from the previously mentioned blog as an example, the only significant problem was designing a sort routine to keep the posts in chronological order, which turned out to be a non-issue. That blog uses a similar layout and had already rewritten the chronological function. With that as an example, writing my own chronological function was relatively simple, resulting in the following.

import System.FilePath (splitDirectories, joinPath)
import Data.List (sortBy)
import Data.Ord (comparing)

-- | Sort pages chronologically. This function assumes that the pages have a
-- @posts/year/month/day/title/index.md@ naming scheme.
chronological :: [Item a] -> [Item a]
chronological = sortBy $ comparing $
    -- Drop only the leading "posts" component, so the key is year/month/day.
    joinPath . take 3 . drop 1 . splitDirectories . toFilePath . itemIdentifier

Note that I use this by importing Hakyll as follows.

import Hakyll hiding (chronological)

Of course, shortly after I was able to get the above function functional, Hakyll was upgraded to version 4.2.X, which automatically sorts based on "published" metadata if it exists. Harumph.
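For the curious, sorting on an explicit published field boils down to parsing the timestamp and sorting on the result. The following is a minimal sketch of that idea, not Hakyll's implementation: the Post type, the function names, and the timestamp format (2013-01-12T23:48:09Z) are all assumptions made for illustration.

```haskell
import Data.List (sortOn)
import Data.Maybe (mapMaybe)
import Data.Time (UTCTime, defaultTimeLocale, parseTimeM)

-- | A post reduced to what the sort needs: the raw "published" string
-- and the path of the source file. (Hypothetical type.)
type Post = (String, FilePath)

-- | Parse a timestamp such as 2013-01-12T23:48:09Z. The format is an
-- assumption; adjust it to match the metadata actually in use.
parsePublished :: String -> Maybe UTCTime
parsePublished = parseTimeM True defaultTimeLocale "%Y-%m-%dT%H:%M:%SZ"

-- | Sort posts oldest first by their published timestamp. Posts whose
-- timestamp does not parse are dropped rather than sorted arbitrarily.
sortByPublished :: [Post] -> [Post]
sortByPublished posts = map snd . sortOn fst $ mapMaybe keyed posts
  where
    keyed post = (\time -> (time, post)) <$> parsePublished (fst post)
```

Because the key is a full timestamp, two posts made on the same day sort by their actual time of publication, which is exactly the specificity the directory layout alone could not provide.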

The last item on the agenda was allowing access to the source files from within the website itself (copy the source files along with the compiled files). Making the original markdown files available turned out to be simpler than it originally seemed, requiring only addition of another match and specification of the version when loading snapshots. Just make sure to define the postPattern.

postPattern :: Pattern
postPattern = "posts/*/*/*/*/index.md"

Copy the raw posts and define the version.

-- Make sure to copy the raw posts
match postPattern $ version "raw" $ do
    route idRoute
    compile getResourceBody

When loading snapshots later on, just make sure to refer to the snapshots not associated with a version... otherwise, the result will probably not be what you intend.

loadAllSnapshots (postPattern .&&. hasNoVersion) "content"

With that, all the basics were in place, with nothing to do except tweak and work on items that are not absolutely required (breadcrumbs being a good example of something desired but not required). As it is, you should be reading this on the new and improved site. I will put a TODO list up on the facets page to indicate what is left to be done.