Converting my blog from WordPress to Jekyll
I kept a log of how I converted my blog from WordPress.com to Jekyll, in case others find it useful. The main hurdles were getting the imported posts and images to line up and fixing the hard-coded links; converting them all to relative links solved most of it.
My initial import lost over 350 images, because the importer placed everything directly into `/assets` instead of into `/yyyy/mm/` subfolders like WordPress does. I fixed the script and submitted a PR to bring the fix back into the Jekyll importer: https://github.com/jekyll/jekyll-import/pull/436
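
One way a flat layout can lose files is name collisions between uploads from different months. That cause is an assumption; it is not spelled out above. The sketch below only illustrates the difference between the two layouts. The helper names and the example URLs are made up, and this is not the importer's actual code.

```shell
# Hypothetical helper: map an upload URL to a local path that keeps the
# /yyyy/mm/ subfolders.
dated_asset_path() {
  url="$1"
  rel="${url#*://*/}"              # drop scheme and host, keep yyyy/mm/file
  printf 'assets/%s\n' "$rel"
}

# Hypothetical helper: the flattened layout, which keeps only the file name.
flat_asset_path() {
  printf 'assets/%s\n' "${1##*/}"
}

# Two made-up uploads that share a file name but come from different months.
a="https://example.files.wordpress.com/2012/05/screenshot.png"
b="https://example.files.wordpress.com/2014/11/screenshot.png"

flat_asset_path "$a"     # both flat paths are identical,
flat_asset_path "$b"     # so the second download would replace the first
dated_asset_path "$a"    # the dated paths stay distinct
dated_asset_path "$b"
```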
Convert log:
- Exported from WordPress https://wordpress.com/support/export/
  - WordPress XML export: dumps out a list of posts with metadata (tags, etc.).
  - WordPress asset export: all the images and other files you have stored.
- Created new Jekyll blog
  Created a new empty site with theme/template via `jekyll new davidblog`
- Jekyll WordpressDotCom import tool
  https://import.jekyllrb.com/docs/wordpressdotcom/
  This helped create a standalone copy, as it also pulled in images I had linked via <img> from other sites, and correctly updated the image paths in the blog posts.
  - Found 5 images with conflicts in the console output (searched for text that said “redownload”). Manually fixed those few import errors.
- Merged exported assets
  Copied the “exported assets” into `/assets`, but didn’t overwrite existing files. This kept the images that were imported along with the links in the Markdown, and brought in the other assets that I use and link to in other ways.
- Fixing hardcoded links to old blog
  I had a lot of links to other posts, and links to higher-resolution images, that were still pointing to the original blog host.
  - Did a text replace of `davidburela.wordpress.com` and `davidburela.files.wordpress.com` with `{{site.baseurl}}`
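
The replacement above could be scripted roughly as follows. This is a minimal sketch, not necessarily how the replace was actually done. The function name `rewrite_old_links`, the sample post, and the choice to also strip the `http://` or `https://` scheme are all assumptions. Depending on where the assets ended up, image links may need an extra path prefix after `{{site.baseurl}}`.

```shell
# Hypothetical helper: rewrite absolute links to the old blog hosts in every
# file directly inside a posts directory.
rewrite_old_links() {
  posts_dir="$1"
  for post in "$posts_dir"/*; do
    [ -f "$post" ] || continue
    # Matches http:// or https:// followed by either old host name.
    sed -E -i.bak \
      's#https?://davidburela(\.files)?\.wordpress\.com#{{site.baseurl}}#g' \
      "$post"
    rm -f "$post.bak"    # the -i.bak form works with both GNU and BSD sed
  done
}

# Demo on a throwaway directory with made-up post content.
demo_dir="$(mktemp -d)"
cat > "$demo_dir/2013-01-01-hello.md" <<'EOF'
See my [earlier post](https://davidburela.wordpress.com/2012/12/31/earlier/).
![photo](https://davidburela.files.wordpress.com/2013/01/photo.png)
EOF
rewrite_old_links "$demo_dir"
cat "$demo_dir/2013-01-01-hello.md"
```

Running it against a copy of `_posts` first, then diffing, is a cheap way to check the result before touching the real posts.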
- Checked for broken links
  Ran a link checker tool against the local Jekyll server at http://localhost:8080
  - Found some old embedded <img> tags that were broken. Used the Internet Archive to grab copies and updated them.
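
The link checker used above is not named, so the following is a different, much simpler technique rather than a reproduction of that step. It scans the built HTML for root-relative `href` and `src` values and reports the ones with no matching file. The function name is hypothetical. It assumes an empty `baseurl` and no URL-encoded paths, and it ignores external links entirely.

```shell
# Hypothetical helper: print root-relative links found in built HTML whose
# target file does not exist under the site directory.
find_broken_local_links() {
  site_dir="$1"
  grep -rhoE --include='*.html' '(href|src)="/[^/"#?][^"#?]*' "$site_dir" \
    | sed -E 's/^(href|src)="//' \
    | sort -u \
    | while IFS= read -r link; do
        target="$site_dir$link"
        # A link may point at a file or at a folder containing index.html.
        if [ ! -f "$target" ] && [ ! -f "${target%/}/index.html" ]; then
          printf '%s\n' "$link"
        fi
      done
}

# Demo on a throwaway "site" with one working and one broken image link.
site_dir="$(mktemp -d)"
mkdir -p "$site_dir/assets"
: > "$site_dir/assets/ok.png"
cat > "$site_dir/index.html" <<'EOF'
<img src="/assets/ok.png"> <img src="/assets/missing.png">
<a href="https://example.com/">external, ignored</a>
EOF
find_broken_local_links "$site_dir"
```

On a real site this would be pointed at `_site`, Jekyll's default output folder, after running `jekyll build`.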
- Optimised assets
  The `/assets` folder started at 700MB. Reduced it to 450MB after cleaning and optimising.
  - Image optimisation:
    - Used PNGauntlet to optimise PNGs
    - Used Pingo to optimise JPGs
  - Removed unused assets:
    - Created a list of all files in the `/assets` folder to search for in posts: `find * -type f > search.txt`
    - Searched through all posts for the assets: `grep -f search.txt ~/blogdownload/active/_posts/*.* -o -h | sort | uniq > result.txt`
    - Showed which files were not found in any post: `grep -v -f result.txt search.txt`
    - Deleted all the files that were not found in any post
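
The unused-asset steps above can be folded into one function. This is a sketch under a few assumptions. The function name and the demo files are made up. It adds `-F` so file names are matched as literal strings rather than regular expressions, and `-x` so only exact path matches are excluded. Both are refinements, not part of the commands listed above. It only prints candidates and leaves the deleting to you.

```shell
# Hypothetical helper: print asset paths (relative to the assets folder)
# that are not mentioned in any file under the posts folder.
list_unused_assets() {
  assets_dir="$1"
  posts_dir="$2"
  work_dir="$(mktemp -d)"

  # 1. Every asset path, relative to the assets folder.
  (cd "$assets_dir" && find . -type f | sed 's|^\./||' | sort) \
    > "$work_dir/search.txt"

  # 2. Asset paths that appear somewhere in a post.
  grep -rhoF -f "$work_dir/search.txt" "$posts_dir" | sort -u \
    > "$work_dir/result.txt"

  # 3. Asset paths that never appeared.
  if [ -s "$work_dir/result.txt" ]; then
    grep -vxF -f "$work_dir/result.txt" "$work_dir/search.txt"
  else
    cat "$work_dir/search.txt"    # nothing referenced: everything is unused
  fi
  rm -rf "$work_dir"
}

# Demo on a throwaway site with one referenced and one orphaned image.
demo="$(mktemp -d)"
mkdir -p "$demo/assets/2013/01" "$demo/_posts"
: > "$demo/assets/2013/01/used.png"
: > "$demo/assets/2013/01/unused.png"
echo '![pic]({{site.baseurl}}/assets/2013/01/used.png)' \
  > "$demo/_posts/2013-01-01-hello.md"
list_unused_assets "$demo/assets" "$demo/_posts"
```

Assets referenced only from layouts, includes, or CSS would show up as unused here, so the list is worth a manual look before deleting anything.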
- Fixed author metadata
  Globally replaced the author metadata on all posts, as the Minima theme didn’t like the format.
- Generated a favicon
  Used the tool at https://favicon.io/ to create a simple DB favicon
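
For the author metadata fix earlier in the log, the original format is not shown, so the sketch below assumes the imported front matter held `author:` as a nested YAML mapping and that a plain `author: Name` line is what the theme wanted. Both points are assumptions, as are the function name and the sample post. It does not check that the match is inside the front matter, so a bare `author:` line in a post body would be rewritten too.

```shell
# Hypothetical helper: replace a nested "author:" mapping with a single
# "author: NAME" line in one post.
flatten_author() {
  post="$1"
  name="$2"
  awk -v name="$name" '
    /^author:[ \t]*$/      { print "author: " name; skipping = 1; next }
    skipping && /^[ \t]+/  { next }    # drop the indented nested keys
                           { skipping = 0; print }
  ' "$post" > "$post.tmp" && mv "$post.tmp" "$post"
}

# Demo on a throwaway post with made-up front matter.
post="$(mktemp)"
cat > "$post" <<'EOF'
---
title: Hello
author:
  login: someuser
  display_name: Some User
tags: []
---
Body text.
EOF
flatten_author "$post" "Some User"
cat "$post"
```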