Scrub-a-Doc.

Convert Word to Markdown: Four Methods That Keep Your Structure

Markdown has become the lingua franca of modern content tooling. Documentation sites run on it. Static-site generators consume it. AI tools like ChatGPT, Claude, and Gemini handle it well, and it spends fewer tokens on formatting than a Word export does. Version control treats it as text and diffs it line by line, which is not possible with binary .docx files. The format that started as a 2004 plain-text shorthand is now the connective layer between writers and the systems that consume their writing.

The catch is that most writing still starts in Microsoft Word or Google Docs. The rich-text editor is genuinely faster for drafting prose, and many teams write in Word for reasons that have nothing to do with technology. So the conversion from Word to Markdown happens constantly. Documentation teams migrate years of .docx files to a static-site generator. Marketing teams hand off Word drafts to engineers who paste them into a Markdown-based CMS. Writers paste their AI prompts in from Word and lose all the structure on the way.

This guide covers four ways to convert Word to Markdown, ranks them by use case, and walks the edge cases that trip every tool. The fastest path for one-off conversions is to upload the .docx to Scrub-a-Doc and download the cleaned Markdown. The rest of this post explains when that is the right call and when one of the heavier options earns its complexity.

Why people are converting Word to Markdown right now

The shift is not random. Three forces push it.

AI tooling. Modern language models train on text. They understand Markdown structure (headings, lists, code blocks) without needing the visual cues that Word documents carry. Pasting a Word document into a chat interface often loses formatting along the way; pasting Markdown preserves it. Anthropic’s documentation on prompt engineering recommends structured input for exactly this reason.

Version control. Git treats .docx files as binary blobs, so it cannot show a line-by-line diff or merge two sets of edits. Two writers editing the same Word document and merging through Git is a nightmare. Two writers editing the same Markdown file and merging through Git is a normal Tuesday. Documentation teams that moved from Word to Markdown often cite this single property as the reason.

Static-site generators. Hugo, Jekyll, Astro, MkDocs, and Docusaurus all consume Markdown. Teams running these systems convert their Word source once, then never touch Word again for the same content.

If any of those describe your workflow, the conversion is worth getting right. Sloppy conversion drops your headings, mangles your lists, and forces you to redo the work in Markdown anyway.

Method 1: A browser-based cleaner

For a one-off conversion or a small batch, a browser-based tool is the right choice. You upload the .docx, you download the .md, you move on.

Scrub-a-Doc does this specifically. It accepts a .docx upload or pasted text, runs the conversion client-side in your browser, and outputs clean Markdown you can copy or download. Because the conversion runs in your browser, the document never touches an external server. That property matters when the content includes anything sensitive.

The structure that survives the conversion includes headings, lists (both ordered and unordered), bold and italic, hyperlinks, blockquotes, and tables. The structure that gets stripped includes Word’s hidden styles, the <o:p> tags Word inserts, font and color information, and the comments and tracked changes that should not appear in the output.
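To make the "stripped" list concrete, here is a minimal sketch of the kind of cleanup pass involved, applied to HTML pasted from Word. It is an illustration only, not Scrub-a-Doc's actual implementation; the function name stripWordMarkup is made up for this example, and a production cleaner would walk a parsed DOM rather than rely on regular expressions.

```javascript
// Illustrative regex pass over Word-exported HTML (hypothetical helper).
function stripWordMarkup(html) {
  return html
    // Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    .replace(/<!--\[if[\s\S]*?<!\[endif\]-->/gi, "")
    // Office namespace tags such as <o:p> and </o:p>
    .replace(/<\/?o:[^>]*>/gi, "")
    // Inline style and class attributes, which carry fonts, colors, and Mso* names
    .replace(/\s+(?:style|class)=(?:"[^"]*"|'[^']*')/gi, "")
    // <font> and <span> wrappers, which only carried styling (their text is kept)
    .replace(/<\/?(?:font|span)\b[^>]*>/gi, "");
}

const sample =
  '<p class="MsoNormal" style="color:red"><span style="font-family:Calibri">Hello</span><o:p></o:p></p>';
console.log(stripWordMarkup(sample)); // <p>Hello</p>
```

The semantic tags (paragraphs, headings, lists, links) pass through untouched, which is what lets a later HTML-to-Markdown step keep the structure.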

When this method falls short: documents over a few hundred pages, batch jobs (you have fifty .docx files to process), and workflows that need to integrate into a build pipeline.

Method 2: Pandoc on the command line

Pandoc is the standard command-line tool for document conversion. It handles .docx to Markdown cleanly, supports both GitHub Flavored Markdown and CommonMark as output formats, and produces some of the most accurate conversions available.

The basic command:

pandoc input.docx -o output.md

For better results, specify the Markdown variant explicitly:

pandoc input.docx -f docx -t gfm -o output.md

gfm produces GitHub Flavored Markdown, which most static-site generators expect. Use commonmark if you need stricter spec compliance. The full list of supported Markdown variants lives in the Pandoc User’s Guide.

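Pandoc handles tables, footnotes, hyperlinks, and embedded images well. For images, the default behavior is to reference each image by its path inside the .docx archive (for example, media/image1.png) without writing the files to disk, so the image links in the output do not resolve. To extract the images to a folder and point the Markdown at them, use the --extract-media flag: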

pandoc input.docx --extract-media=./images -o output.md

When this method makes sense: you process documents weekly or daily, you want the conversion scripted, or you need to integrate the conversion into a build pipeline. Pandoc also supports Lua filters if you need to transform the document mid-conversion.
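If the surrounding pipeline is already written in Node, one way to script the commands above is to build the argument list in code and hand it to Pandoc. The sketch below assumes pandoc is installed and on the PATH; buildPandocArgs, convertWithPandoc, and the option names to and mediaDir are invented for this example, while the flags themselves are the ones shown earlier.

```javascript
const { execFileSync } = require("child_process");

// Build the argument list for a docx -> Markdown conversion.
// `to` and `mediaDir` are option names made up for this sketch.
function buildPandocArgs(input, output, { to = "gfm", mediaDir = null } = {}) {
  const args = [input, "-f", "docx", "-t", to];
  if (mediaDir) args.push(`--extract-media=${mediaDir}`);
  args.push("-o", output);
  return args;
}

// Run Pandoc; throws if the binary is missing or exits with an error.
function convertWithPandoc(input, output, options) {
  execFileSync("pandoc", buildPandocArgs(input, output, options), { stdio: "inherit" });
}

console.log(buildPandocArgs("input.docx", "output.md", { mediaDir: "./images" }).join(" "));
// input.docx -f docx -t gfm --extract-media=./images -o output.md
```

Calling convertWithPandoc("input.docx", "output.md", { to: "commonmark" }) would switch to the stricter variant. Passing arguments as an array rather than a shell string avoids quoting problems with file names that contain spaces.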

The cost is the install and the learning curve. For a single document, the browser tool is faster.

Method 3: A developer pipeline (Mammoth plus Turndown)

If you build software that converts .docx to Markdown programmatically, the standard pipeline runs through two libraries.

Mammoth reads the .docx and produces clean HTML. Turndown takes that HTML and produces Markdown. Together they form a pipeline that runs entirely in JavaScript and works in both Node and the browser.

A minimal Node example:

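const mammoth = require("mammoth");
const TurndownService = require("turndown");
const turndown = new TurndownService();

// Step 1: Mammoth reads the .docx and returns HTML in result.value.
mammoth.convertToHtml({path: "input.docx"})
  .then(result => {
    // Step 2: Turndown converts that HTML to Markdown.
    const markdown = turndown.turndown(result.value);
    console.log(markdown);
  })
  // Report unreadable or missing files with a clear message.
  .catch(err => console.error("Conversion failed:", err));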

This pipeline is the engine behind several browser-based Word-to-Markdown converters. The Mammoth side strips Word’s hidden styling and produces semantic HTML; the Turndown side converts that HTML to Markdown.

For Python developers, python-mammoth plus markdownify is the equivalent stack.

When this method makes sense: you build a CMS, an internal tool, or a SaaS that needs to accept Word uploads and store them as Markdown. Embedding the conversion in your code is more reliable than depending on an external service.
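For the upload case, a rough sketch of how the conversion might be wrapped is shown below. It assumes the upload arrives as a Node Buffer, which Mammoth accepts through its buffer input option. The function name docxBufferToMarkdown and the choice to pass both libraries in as arguments are decisions made for this example, not part of either library.

```javascript
// Convert an uploaded .docx (as a Node Buffer) to Markdown.
// `mammoth` and `turndownService` are passed in so the caller controls how
// both libraries are configured and so the function is easy to test.
async function docxBufferToMarkdown(buffer, { mammoth, turndownService }) {
  const result = await mammoth.convertToHtml({ buffer });
  return {
    markdown: turndownService.turndown(result.value),
    // Mammoth reports issues such as unrecognised styles as messages.
    warnings: result.messages.map(m => m.message),
  };
}

// Possible wiring in an upload handler (needs `npm install mammoth turndown`):
//
//   const mammoth = require("mammoth");
//   const TurndownService = require("turndown");
//   const turndownService = new TurndownService({ headingStyle: "atx" });
//   const { markdown, warnings } =
//     await docxBufferToMarkdown(uploadedBuffer, { mammoth, turndownService });
```

Returning the warnings alongside the Markdown lets the product decide whether to show them to the user or just log them; a document that relies on custom Word styles will usually produce a few.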

Method 4: Online conversion services

Several websites offer free Word-to-Markdown conversion. Some run client-side; many run server-side. The convenience is real, but the privacy property is not.

A server-side converter receives your document, processes it on its server, and returns the result. Whether the service logs your content, retains it, or trains a model on it is often unclear. For public material, this is fine. For internal documents, customer data, or anything covered by an NDA, it is not.

Before using an online converter on anything sensitive, check the privacy policy. Look specifically for whether the service stores uploads, whether they are encrypted, and how long they are retained. If the policy does not say, assume the worst.

Scrub-a-Doc processes documents client-side specifically to avoid this problem. The document never crosses a network boundary. If you use a different online converter, verify the privacy property before sending anything you would not post publicly.

The edge cases worth testing

Most converters do well on simple documents. The differences show up at the edges.

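Tables. Pandoc handles complex tables, including merged cells (row and column spans), better than most browser tools. Markdown pipe tables have no syntax for merged cells, so every converter has to flatten them or fall back to another representation; check which one yours does. If your documents lean on tables, Pandoc is worth the install.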

Images. Word embeds images inside the .docx ZIP. Pandoc can extract them to a folder and rewrite the Markdown to reference the extracted files. Most browser tools either embed images as base64 (which works in Markdown but bloats the file) or drop them entirely. Pick the tool whose image handling matches your workflow.
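If your converter produced base64 images and you would rather have files, the output can be post-processed. The sketch below is one possible approach, assuming the images appear as standard Markdown image links with data: URIs; the function name extractInlineImages and the image-1.png naming scheme are arbitrary choices for this example.

```javascript
// Replace base64 data-URI images in Markdown with file references.
// Returns the rewritten Markdown plus the decoded files to write to disk.
function extractInlineImages(markdown, dir = "images") {
  const files = [];
  const pattern =
    /!\[([^\]]*)\]\(data:image\/([a-z0-9.+-]+);base64,([A-Za-z0-9+\/=]+)\)/gi;
  const rewritten = markdown.replace(pattern, (match, alt, subtype, data) => {
    // The MIME subtype doubles as the extension (png, jpeg, gif, ...).
    const name = `${dir}/image-${files.length + 1}.${subtype.toLowerCase()}`;
    files.push({ name, bytes: Buffer.from(data, "base64") });
    return `![${alt}](${name})`;
  });
  return { markdown: rewritten, files };
}

const demo = extractInlineImages("![logo](data:image/png;base64,aGVsbG8=)");
console.log(demo.markdown);              // ![logo](images/image-1.png)
console.log(demo.files[0].bytes.length); // 5
```

Each entry in files can then be written with fs.writeFileSync(file.name, file.bytes) once the target folder exists. Subtypes such as svg+xml would need a mapping to a proper extension, which this sketch leaves out.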

Footnotes. Pandoc preserves them. Mammoth preserves them. Many simpler converters do not.
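A quick way to test footnote handling is to compare references against definitions in the converted file. The check below is a sketch that only applies when the output uses the [^id] footnote syntax, as Pandoc's Markdown output does; it does not skip code blocks, and the name checkFootnotes is made up for this example.

```javascript
// Compare footnote references ([^id]) against definitions ([^id]: ...).
function checkFootnotes(markdown) {
  const defined = new Set();
  const referenced = new Set();
  for (const line of markdown.split("\n")) {
    const def = line.match(/^\[\^([^\]]+)\]:/);
    if (def) defined.add(def[1]);
    // Look for references in the rest of the line, past any definition marker.
    const body = def ? line.slice(def[0].length) : line;
    for (const ref of body.matchAll(/\[\^([^\]]+)\]/g)) referenced.add(ref[1]);
  }
  return {
    missingDefinitions: [...referenced].filter(id => !defined.has(id)),
    unusedDefinitions: [...defined].filter(id => !referenced.has(id)),
  };
}

const report = checkFootnotes("Claim.[^1] Another.[^2]\n\n[^1]: Source.");
console.log(JSON.stringify(report));
// {"missingDefinitions":["2"],"unusedDefinitions":[]}
```

An empty report on a document you know contains footnotes is itself a warning sign: it usually means the converter dropped them or emitted them in a different form.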

Code blocks. If your Word document uses a monospace font to denote code, most converters will not pick that up as a code block automatically. You will likely need to add the triple-backtick fences manually after conversion.
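When there are many such passages, adding fences by hand gets tedious. A small helper like the one below can wrap a known line range in a fence; it is a sketch with a made-up name (fenceLines), and it assumes you have already identified which lines are code.

```javascript
// Wrap lines startLine..endLine (1-based, inclusive) in a fenced code block.
function fenceLines(markdown, startLine, endLine, lang = "") {
  const lines = markdown.split("\n");
  if (startLine < 1 || endLine > lines.length || startLine > endLine) {
    throw new RangeError(`Invalid line range ${startLine}-${endLine}`);
  }
  const fence = "`".repeat(3); // the triple-backtick marker
  return [
    ...lines.slice(0, startLine - 1),
    fence + lang,
    ...lines.slice(startLine - 1, endLine),
    fence,
    ...lines.slice(endLine),
  ].join("\n");
}

const doc = "Install it:\nnpm install mammoth\nDone.";
console.log(fenceLines(doc, 2, 2, "shell"));
```

Converters often backslash-escape characters such as _ and * in ordinary paragraphs, so check the fenced lines for stray backslashes afterwards.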

Headings inside lists or tables. Markdown does not handle this case the way Word does. Some converters output it as best they can; others give up. Test before committing.

Track changes and comments. Drop them before publishing. Most converters do this automatically; verify on your specific tool.

Pick a method, then automate

The right tool depends on volume.

If you convert a document only now and then, use a browser-based cleaner. Scrub-a-Doc is fast, free, and runs client-side.

If you convert documents daily, install Pandoc and write a one-line shell script that processes a folder of .docx files in a single command.
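A shell loop is the shortest way to do this. For teams that prefer to stay in Node, like the Mammoth example earlier, the same idea might look like the sketch below. It assumes pandoc is installed and on the PATH; listDocx and convertFolder are names invented for this example.

```javascript
const fs = require("fs");
const path = require("path");
const { execFileSync } = require("child_process");

// List the .docx files in `dir`, skipping the "~$" lock files Word creates
// while a document is open.
function listDocx(dir) {
  return fs.readdirSync(dir)
    .filter(name => name.toLowerCase().endsWith(".docx") && !name.startsWith("~$"))
    .sort()
    .map(name => path.join(dir, name));
}

// Convert every .docx in `dir` to a sibling .md file using Pandoc.
function convertFolder(dir) {
  for (const input of listDocx(dir)) {
    const output = input.replace(/\.docx$/i, ".md");
    execFileSync("pandoc", [input, "-f", "docx", "-t", "gfm", "-o", output], {
      stdio: "inherit",
    });
    console.log(`${input} -> ${output}`);
  }
}

// Example call, left commented out so loading the file has no side effects:
// convertFolder("./docs");
```

Because execFileSync throws on the first failed conversion, a broken document stops the batch instead of being skipped quietly; wrap the call in try/catch if you would rather collect failures and continue.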

If you build a product that needs to convert Word uploads on the fly, embed Mammoth and Turndown in your code.

What format does your team draft in today, and how often does that drafting choice cost you time downstream?
