WordPress Export Format
I’ve been working on parsing a WordPress.com XML export file and converting it to Markdown, and I can’t say that I’m a big fan of the XML format. Some issues with the format really bug me and make the files tedious to work with. It feels a bit like they needed an export just to put a tick in a box, rather than a feature that has been cared for over time.
Before I start bashing on the format, I must say that it’s excellent that the platform allows you to export all your information. It’s becoming more and more difficult to do in today’s world of isolated silos! 👍
Let’s try to break some things down. I’ve omitted some data and replaced it with ... where I felt it necessary to make it readable.
So far so good, but this is where the behaviour I expected ends.
All of a sudden, there are multiple categories without an enclosing parent. I know it’s not necessary, but it sure makes it easier to parse the data if you know that all upcoming elements are of one specific type. This behaviour is present throughout the file and is found in tags, items, comments, metadata and some other elements. At least it’s consistent, so you know what to expect.
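To illustrate what the missing wrapper means for a parser, here is a minimal Python sketch. The sample XML is a hypothetical, heavily trimmed stand-in for a real export, and the namespace URI and the wp:cat_name field are assumptions that should be checked against your own file. Since there is no enclosing categories element, the code has to pick the categories out from among the other children of channel by tag name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; check the xmlns:wp attribute in your own export file.
NS = {"wp": "http://wordpress.org/export/1.2/"}

# Hypothetical, trimmed sample: categories and items are plain siblings in <channel>.
SAMPLE = """<rss xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <title>Example blog</title>
    <wp:category><wp:cat_name>News</wp:cat_name></wp:category>
    <wp:category><wp:cat_name>Travel</wp:cat_name></wp:category>
    <item><title>Hello</title></item>
  </channel>
</rss>"""


def category_names(xml_text):
    """Collect category names from the direct children of <channel>."""
    channel = ET.fromstring(xml_text).find("channel")
    # There is no <categories> wrapper, so filter the siblings by tag name.
    return [
        category.findtext("wp:cat_name", namespaces=NS)
        for category in channel.findall("wp:category", NS)
    ]


print(category_names(SAMPLE))
```

With a wrapper element, the same function could fetch one parent and iterate over its children without knowing anything about the other element types in the channel.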
The next thing that bothers me is the item element, an entity that can be many different things, namely:
- nav menu
The type of an item is defined by a field called wp:post_type on the item. This makes it difficult to understand what each field represents for the different types.
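In practice, a parser has to look inside every item before it knows what it is dealing with. The sketch below groups items by their wp:post_type value. The sample data, the namespace URI and the type values used (post, page, nav_menu_item) are assumptions for illustration, not taken from a real export.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed namespace URI; check the xmlns:wp attribute in your own export file.
NS = {"wp": "http://wordpress.org/export/1.2/"}

# Hypothetical sample: every entry is an <item>, only wp:post_type tells them apart.
SAMPLE = """<rss xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item><title>First post</title><wp:post_type>post</wp:post_type></item>
    <item><title>About</title><wp:post_type>page</wp:post_type></item>
    <item><title>Menu entry</title><wp:post_type>nav_menu_item</wp:post_type></item>
    <item><title>Second post</title><wp:post_type>post</wp:post_type></item>
  </channel>
</rss>"""


def group_items_by_type(xml_text):
    """Return a dict mapping each wp:post_type value to a list of item titles."""
    groups = defaultdict(list)
    for item in ET.fromstring(xml_text).iter("item"):
        # Items without a wp:post_type end up under the "unknown" key.
        post_type = item.findtext("wp:post_type", default="unknown", namespaces=NS)
        groups[post_type].append(item.findtext("title"))
    return dict(groups)


for post_type, titles in group_items_by_type(SAMPLE).items():
    print(post_type, titles)
```

The grouping itself is simple. The harder part, which the code cannot solve, is knowing which of the remaining fields are meaningful for each type.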
Categories, metadata and comments are all pushed directly into the item element without any enclosing element.
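A rough sketch of what this looks like from the parsing side: each kind of child has to be filtered out of the item by tag name. The element and attribute names in the sample (category with a domain attribute, wp:postmeta, wp:comment and their fields) are assumptions about the export format, and the values are made up.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; check the xmlns:wp attribute in your own export file.
NS = {"wp": "http://wordpress.org/export/1.2/"}

# Hypothetical item: categories, metadata and comments sit side by side.
SAMPLE_ITEM = """<item xmlns:wp="http://wordpress.org/export/1.2/">
  <title>Hello</title>
  <category domain="category" nicename="news">News</category>
  <category domain="post_tag" nicename="xml">xml</category>
  <wp:postmeta>
    <wp:meta_key>_edit_last</wp:meta_key>
    <wp:meta_value>1</wp:meta_value>
  </wp:postmeta>
  <wp:comment>
    <wp:comment_author>Ann</wp:comment_author>
    <wp:comment_content>Nice!</wp:comment_content>
  </wp:comment>
</item>"""


def split_item(item):
    """Sort the flat children of an <item> into separate collections."""
    return {
        "title": item.findtext("title"),
        # Categories and tags share one tag name; the domain attribute separates them.
        "categories": [(c.get("domain"), c.text) for c in item.findall("category")],
        "meta": {
            m.findtext("wp:meta_key", namespaces=NS): m.findtext("wp:meta_value", namespaces=NS)
            for m in item.findall("wp:postmeta", NS)
        },
        "comments": [
            {
                "author": c.findtext("wp:comment_author", namespaces=NS),
                "content": c.findtext("wp:comment_content", namespaces=NS),
            }
            for c in item.findall("wp:comment", NS)
        ],
    }


print(split_item(ET.fromstring(SAMPLE_ITEM)))
```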
Why would you use this kind of format?
- Is it to reduce the size of the XML-file?
- Is it to make it difficult to parse the file, and thereby keep people in the ecosystem?
- Is it for legacy reasons? The format developed from just listing items to listing multiple types, with no more thought than completing the task at hand.
It’s probably not just one of these, but a combination of different choices made over the years that has made it what it is today.
A Suggestion to improve the format
A good practice is to not complain about something if you can’t give some well-structured feedback or ideas for improvement. I’ll try to suggest what I believe is an improvement over the existing XML format!
My changes include:
- Splitting items into multiple different element types (posts, images, attachments, pages and similar)
- Discarding the flat structure in favour of a nested one
The drawback with this approach is that it would add a lot of extra bytes to the file, but I believe I can motivate the change with:
- Making it easier to read
- Making it easier to parse
- The import and export are not time-critical operations, so the extra bytes don’t matter
A completely different approach would be to base the export format on JSON instead. I feel it would be a little more with the times.
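As a rough idea of what a nested, JSON-based export could look like, the sketch below converts a flat sample into one list per type, with categories and comments nested under each entry. This is purely hypothetical: the output keys (posts, pages, categories, comments) are invented for illustration and do not correspond to any existing WordPress format, and the input element names and namespace URI are assumptions.

```python
import json
import xml.etree.ElementTree as ET

# Assumed namespace URI; check the xmlns:wp attribute in your own export file.
NS = {"wp": "http://wordpress.org/export/1.2/"}

# Hypothetical flat sample in the style of the existing XML export.
SAMPLE = """<rss xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>First post</title>
      <wp:post_type>post</wp:post_type>
      <category domain="category" nicename="news">News</category>
      <wp:comment><wp:comment_author>Ann</wp:comment_author></wp:comment>
    </item>
    <item>
      <title>About</title>
      <wp:post_type>page</wp:post_type>
    </item>
  </channel>
</rss>"""


def to_nested(xml_text):
    """Build a dict with one list per type and nested categories and comments."""
    export = {}
    for item in ET.fromstring(xml_text).iter("item"):
        kind = item.findtext("wp:post_type", default="unknown", namespaces=NS)
        entry = {
            "title": item.findtext("title"),
            "categories": [c.text for c in item.findall("category")],
            "comments": [
                {"author": c.findtext("wp:comment_author", namespaces=NS)}
                for c in item.findall("wp:comment", NS)
            ],
        }
        # Naive pluralisation ("post" -> "posts"), good enough for a sketch.
        export.setdefault(kind + "s", []).append(entry)
    return export


print(json.dumps(to_nested(SAMPLE), indent=2))
```

A consumer of such a file would know the type of every entry from the key it sits under, and every collection would have an explicit parent.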
WordPress is a highly successful project that I’ve used myself, but it seems to be having some difficulty keeping up with the times. I suppose the idea behind WordPress is to enable basic tasks and let plugins handle the more advanced or polished use cases.