I kept a log of how I converted my blog from WordPress.com to Jekyll, in case others find it useful. The main hurdles were getting the imported posts and images to line up and fixing the hard-coded links; converting them all to relative links solved most of it.

My initial import lost over 350 images, because the importer placed all images directly into /assets instead of into /yyyy/mm/ subfolders like WordPress does. I fixed the script and submitted a PR to contribute the fix back to the Jekyll importer: https://github.com/jekyll/jekyll-import/pull/436
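To illustrate the folder layout (this is not the code from the PR, only a sketch of the idea): WordPress.com upload URLs carry a /yyyy/mm/ segment, and keeping that segment in the local path means two files with the same name from different months cannot land on top of each other. The function name asset_path_for and the example.files.wordpress.com URLs are hypothetical.

```shell
# Hypothetical helper: map a WordPress upload URL to a local path that
# keeps the /yyyy/mm/ segment instead of flattening everything into assets/.
asset_path_for() {
  url="$1"
  path="${url#*://*/}"   # drop "scheme://host/"
  path="${path%%\?*}"    # drop any "?w=300" style query string
  printf 'assets/%s\n' "$path"
}

# Same file name, different months, two distinct local paths.
asset_path_for "https://example.files.wordpress.com/2012/03/photo.png?w=300"
asset_path_for "https://example.files.wordpress.com/2014/07/photo.png"
```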

Conversion log:

  • Exported from WordPress https://wordpress.com/support/export/
    • WordPress XML export - dumps out a list of posts with metadata (tags, etc.).
    • WordPress asset export - all images, etc. you have stored.
  • Created new Jekyll blog
    Created a new empty site with theme/template via jekyll new davidblog
  • Jekyll WordpressDotCom import tool
    This helped create a standalone copy, as it downloaded the <img> images I had linked from other sites and correctly updated the image paths in the blog posts.
    • Found 5 images in the screen log that had conflicts (searched for the text “redownload”). Manually fixed those few import errors.
  • Merged exported assets
    Copied the “exported assets” into /assets, but didn’t overwrite existing files. This kept the imported images that the markdown links point to, and brought in the other assets that I use and link to in other ways.
  • Fixing hardcoded links to old blog
    I had a lot of links to other posts, and links to higher-resolution images, that were still pointing to the original blog host.
    • Did a find-and-replace of davidburela.wordpress.com and davidburela.files.wordpress.com with {{site.baseurl}}
  • Checked for broken links
    Ran a link checker tool against the local Jekyll server http://localhost:8080
    • I found some old embedded <img> images that were broken. Used the Internet Archive to grab copies and updated the links.
  • Optimised assets
    The /assets folder started at 700MB. Reduced it to 450MB after cleaning and optimising.
    • Image optimisation:
    • Removed unused assets:
      • Create a list of all files in the /assets folder to search for in posts: find * -type f > search.txt
      • Search through all posts for the assets: grep -f search.txt ~/blogdownload/active/_posts/*.* -o -h | sort | uniq > result.txt
      • Show which files were not found in any post: grep -v -f result.txt search.txt
      • Delete all the files that were not found in any post.
  • Fixed author metadata
    Globally replaced the author metadata on all posts, as the Minima theme didn’t like the format.
  • Generated a favicon
    Used the tool on https://favicon.io/ to create a simple DB favicon
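
For the “Merged exported assets” step, a copy that never overwrites could look roughly like the sketch below. The function name merge_assets and the export folder name in the usage line are hypothetical; the log does not say which copy tool was actually used.

```shell
# Sketch: copy files from the export into the site's assets folder,
# skipping any file that already exists at the destination.
merge_assets() {
  src="$1"
  dest="$2"
  ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
    rel="${rel#./}"
    if [ ! -e "$dest/$rel" ]; then
      mkdir -p "$(dirname "$dest/$rel")"
      cp "$src/$rel" "$dest/$rel"
      echo "copied: $rel"
    else
      echo "kept:   $rel"
    fi
  done
}

# Usage (hypothetical export folder):
# merge_assets ~/blogdownload/export-media ~/blogdownload/active/assets
```

Looping over the files and testing each destination keeps the behaviour the same across systems, at the cost of being slower than a single recursive copy.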
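
For the “Fixing hardcoded links” step, the replacement can be sketched with sed. This assumes the http:// or https:// prefix is removed together with the host name, which the log does not spell out; the function name rewrite_links is hypothetical.

```shell
# Sketch: replace absolute links to the old blog hosts with {{site.baseurl}}.
# Assumption: the scheme (http:// or https://) is removed along with the host.
rewrite_links() {
  posts_dir="$1"
  for f in "$posts_dir"/*; do
    [ -f "$f" ] || continue
    # One expression per host; dots are escaped so they match literally.
    # Output goes to a temp file first so the loop works with any sed.
    sed -e 's#https\{0,1\}://davidburela\.files\.wordpress\.com#{{site.baseurl}}#g' \
        -e 's#https\{0,1\}://davidburela\.wordpress\.com#{{site.baseurl}}#g' \
        "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}

# Usage:
# rewrite_links ~/blogdownload/active/_posts
```

Running the link checker afterwards is a good way to confirm the rewritten paths resolve.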
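
The “Removed unused assets” steps can be combined into one function, sketched below. It differs from the commands in the log in two ways that are my assumptions, not the original method: it uses grep -F so dots in file names are matched literally, and grep -x so only exact path matches count as used. The function name list_unused_assets is hypothetical, and it only prints candidates; it deletes nothing.

```shell
# Sketch: print every file under assets_dir whose relative path is not
# mentioned in any file under posts_dir. Nothing is deleted.
list_unused_assets() {
  assets_dir="$1"
  posts_dir="$2"
  work=$(mktemp -d)
  # 1. Every asset path, relative to the assets folder.
  ( cd "$assets_dir" && find . -type f ) | sed 's#^\./##' > "$work/search.txt"
  # 2. Which of those paths occur in the posts (-F: literal strings,
  #    -o: print only the matched text, -h: no file name prefix).
  grep -r -F -o -h -f "$work/search.txt" "$posts_dir" | sort -u > "$work/result.txt"
  # 3. Paths that were never matched (-x: compare whole lines).
  #    "|| true" because grep exits non-zero when every asset is in use.
  grep -v -x -F -f "$work/result.txt" "$work/search.txt" || true
  rm -rf "$work"
}

# Usage:
# list_unused_assets ~/blogdownload/active/assets ~/blogdownload/active/_posts
```

Review the printed list before deleting anything, since assets referenced from layouts, CSS, or config files rather than posts would also show up as unused.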