How to clean up HTML exported from Google Docs for WordPress


When writing a blog post, it is sometimes convenient to write the first draft in Google Docs and then export the contents for use in WordPress. However, Google Docs adds a lot of formatting to its documents, which may override the default look and feel of your WordPress page.

In this blog post we show how to remove this extra formatting from the HTML created by Google Docs, by using a text editor that supports regex find-and-replace functionality.

We have tested the following instructions with Sublime.


In the Google Doc, select and copy the contents that you wish to use in your WordPress document, paste them into the “Visual” editor tab in WordPress, then click on the HTML tab to see the HTML. Now select the HTML text and copy it into Sublime or another editor that supports regular expression find-and-replace functionality.

In Sublime, click on Find->Replace. This brings up a panel at the bottom of the window. In this panel, make sure that the regular expression and wrap buttons are selected. The following find-and-replace removes all of the <span> tags that Google Docs has added and leaves just the contents that were wrapped inside them.

Find: (?s)<span.*?>(.*?)</span>
Replace: $1
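
For readers who would rather script this step, the same substitution can be sketched in Python with the standard re module. This is a minimal sketch and not part of the tested Sublime workflow: the function name strip_spans and the sample HTML are made up for illustration. A single pass of this pattern unwraps only the outer layer when spans are nested, so the sketch repeats the substitution until nothing changes.

```python
import re

# Same pattern as the Find box above; (?s) lets "." match newlines as well.
SPAN_PATTERN = re.compile(r"(?s)<span.*?>(.*?)</span>")


def strip_spans(html: str) -> str:
    """Remove <span> wrappers but keep the contents inside them.

    strip_spans is a hypothetical helper name used only for this example.
    """
    previous = None
    # Repeat until a pass changes nothing, so nested spans are unwrapped too.
    while previous != html:
        previous = html
        # \1 plays the role of $1 in the Replace box.
        html = SPAN_PATTERN.sub(r"\1", html)
    return html


# Illustrative input only; real Google Docs output will differ.
sample = (
    '<p><span style="font-weight: 400;">Hello '
    '<span style="color: #000;">world</span></span></p>'
)
print(strip_spans(sample))
```

Like the editor version, this is plain pattern matching and not an HTML parser, so it is worth checking the result in the Visual editor before publishing.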

After running the above on the entire document with “Replace All”, check whether any <span> tags remain (this can happen when spans were nested inside one another) and, if so, run “Replace All” again. Then select the contents of the Sublime document, copy them into the WordPress HTML tab, and switch back to the Visual editor view. Your document should now look like the rest of your WordPress documents.


Many text editors have built-in regular expression find-and-replace functionality. We have demonstrated how this functionality can be used to help clean up HTML created by Google Docs.