Word format is a bad choice

by Allen B. Downey

For distributing a read-only document (that is, if the recipient doesn't need to be able to edit the document), Microsoft's Word format is almost always a bad choice.

Reason #1:

Word format is secret, which means that other programs generally can't read it reliably. Some programs can open Word documents, but using them is often inconvenient for the recipient and the results are unpredictable.

For more information on this point, see this essay by Richard Stallman.

Many governments are becoming concerned about the dangers of storing important documents in proprietary formats. The Economist has an interesting article on this topic.

The state of Massachusetts has committed to use only nonproprietary document formats, according to this article. In particular, they endorse the OpenDocument format.

Reason #2:

Word format is a bad choice for distributing documents electronically because it makes documents much bigger than necessary, which wastes network bandwidth, disk space, and email processing time. For example, the minutes of the July 22 faculty meeting were distributed as a Word document that takes 36,864 bytes. The content of the document is simple text with very little formatting. As plain text, it takes 3,674 bytes, less than one tenth the size of the Word document!
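The arithmetic behind that comparison is easy to check. Here is a minimal sketch in Python that uses only the two byte counts quoted above; the function name size_ratio is made up for illustration.

```python
def size_ratio(plain_bytes: int, word_bytes: int) -> float:
    """Return the plain-text size as a fraction of the Word size."""
    return plain_bytes / word_bytes


word_bytes = 36_864   # size of the Word document, as quoted in the text
plain_bytes = 3_674   # size of the same content as plain text

ratio = size_ratio(plain_bytes, word_bytes)
saved = word_bytes - plain_bytes

print(f"plain text is {ratio:.2%} of the Word file")
print(f"bytes saved per copy: {saved}")
```

The saving applies to every copy, so a message sent to a long mailing list multiplies it by the number of recipients.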

There is a related problem with embedded images in some formats. According to this Microsoft support page:

...a Microsoft Word 2000 document that contains a JPEG graphic that is saved as a Word 2000 document may have a file size of 45,568 bytes (44.5KB). However, when you save this file as Word 6.0/95 (*.doc) or as Rich Text Format (*.rtf), the file size may grow to 1,289,728 bytes (1.22MB).

...This functionality is by design in Microsoft Word.

Reason #3:

Word format often includes additional information about the author and the document that the sender may not intend to share. Many users do not realize that Word often stores previous versions of a document, and that the recipient may be able to read text that appears to be deleted. Unintentional information leaks can be very harmful.

Here is a page that explains how to remove some of this extraneous information from your documents.

Here is an amusing report about a company that included a little too much information in a press release. WARNING: coarse language.

And here is a recent article in the Chronicle of Higher Education reporting that the academic peer review process, which is supposed to be anonymous, is often unblinded by tags embedded in Word documents.

Reason #4:

Word format is notoriously unreliable. Documents produced with one version of Word often look different when viewed with other versions. Old versions of Word may not be able to read documents produced with new versions. For documents in a permanent archive, it is important to choose a format we will be able to read in the future.

Reason #5:

Word documents can spread macro viruses. Alternatives like plain text and PDF don't.

For information about Word viruses, see this article from a security company or this report from CERT (a widely recognized authority on computer security) or this report from Symantec.

Reason #6:

Word documents are hard to search. You can search through plain text with any number of programs, including mail handlers. Only applications that understand Word format can search Word documents.

As the number of documents you have increases, so does the importance of search.
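To illustrate how little it takes to search plain text, here is a short sketch in Python that uses only the standard library. The function name search_text_files and the assumption that the documents are .txt files in a single folder are mine, not part of any particular tool.

```python
from pathlib import Path


def search_text_files(folder, term):
    """Yield (file name, line number, line) for each line containing term."""
    needle = term.lower()
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps the search going past stray bytes
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            # case-insensitive substring match
            if needle in line.lower():
                yield path.name, number, line.strip()
```

Calling search_text_files("minutes", "budget") on a hypothetical folder named minutes would yield one result for every line that mentions the word. Doing the same for Word documents requires a program that can decode the format first.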

Reason #7:

According to the findings of fact of the US District Court for the District of Columbia, Microsoft enjoys a monopoly in the office application market, and uses this monopoly to the detriment of consumers. This monopoly is partly the result of a "network effect" in which the value of using MS software increases because other people use it. An example of this mechanism is when someone is required to use MS software in order to read something distributed in a proprietary MS format. Using open exchange formats mitigates the detrimental effects of the MS monopoly.

For more information on this topic, see the Department of Justice's website.

Alternatives:

For messages with simple formatting, plain text is usually the best choice. HTML and RTF provide basic formatting, but not complete control over the appearance of a document. PostScript and PDF are the best choices for documents with detailed graphics and formatting.

Some people think HTML is not a good choice either. This link explains why.

Most text editing applications, including Word, are able to produce documents in all of these alternative formats.

An excellent web page that presents a similar argument is here.

Another excellent web page which makes many of the same points, and provides additional information and links, is here.

And yet another page that makes some of these arguments, and more, is here.