Thought I'd post this, although it's far from finished. I've been trying to pull text out of various file formats using as few external libraries as I can.
The Word document format (Word 95 up to Word 2003) is fairly complex, especially compared with the Microsoft Works WPS format. I posted code to read WPS files previously, and that post also contains the OleDocument class needed by the code below.
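For anyone picking this up, here is a rough sketch (in Python, not the language of the original code) of the first step once the OleDocument class has handed over the raw bytes of the `WordDocument` stream: check the File Information Block (FIB) header and work out which table stream (`0Table` or `1Table`) holds the rest of the document structures. The offsets and flag bits follow my reading of Microsoft's [MS-DOC] specification. The function name `read_fib_base` is hypothetical, and treating an nFib of 0xC1 or above as Word 97 to 2003 is an assumption. Older (Word 95) files are not handled by this sketch and simply get `None` for the table stream.

```python
import struct
from typing import Optional

WORD_IDENT = 0xA5EC        # wIdent: magic number at offset 0 of the FIB
F_ENCRYPTED = 0x0100       # FibBase flag: document is encrypted
F_WHICH_TBL_STM = 0x0200   # FibBase flag: table data is in "1Table" (else "0Table")
NFIB_WORD97 = 0x00C1       # assumption: nFib >= 0xC1 means Word 97 to 2003


def read_fib_base(word_stream: bytes) -> dict:
    """Read the start of the FibBase from the raw 'WordDocument' stream bytes."""
    if len(word_stream) < 12:
        raise ValueError("stream too short to contain a FIB")
    # wIdent, nFib, unused, lid, pnNext, then the 16-bit flags word at offset 0x0A
    w_ident, n_fib, _unused, lid, _pn_next, flags = struct.unpack_from("<6H", word_stream, 0)
    if w_ident != WORD_IDENT:
        raise ValueError(f"unexpected wIdent 0x{w_ident:04X}; not a Word binary file?")
    table_stream: Optional[str] = None  # None = older format, not handled here
    if n_fib >= NFIB_WORD97:
        table_stream = "1Table" if flags & F_WHICH_TBL_STM else "0Table"
    return {
        "nFib": n_fib,
        "lid": lid,
        "encrypted": bool(flags & F_ENCRYPTED),
        "table_stream": table_stream,
    }


# Usage with a hand-built 12-byte header (a real file would come from OleDocument)
header = struct.pack("<6H", WORD_IDENT, 0x00C1, 0, 0x0409, 0, F_WHICH_TBL_STM)
print(read_fib_base(header))
```

Checking the encrypted flag early is worthwhile, since an encrypted document will only produce garbage further down the line.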
The basic code follows, but I need to stress that it is NOT 100% yet. My handling of text, both 16-bit and 8-bit, isn't great, and I'm sure there are more nasties the format can add to the text, not to mention the way it encodes tables and hyperlinks. As I'm going to have to leave this code for a while before I improve it (other priorities), I thought I'd post it as-is in case I forget to do so later.
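To illustrate the 8/16-bit and table/hyperlink points, here is a minimal Python sketch, assuming the piece table (the CLX/PlcPcd structure) has already been parsed into (fc, cp_start, cp_end) values for each piece. In Word 97 and later, bit 30 of a piece's fc marks 8-bit "compressed" text stored at byte offset fc/2 (decoded here as cp1252, which is an approximation); otherwise the piece is UTF-16LE text at offset fc. Hyperlinks are stored as fields: 0x13 begins the field code (e.g. `HYPERLINK "..."`), 0x14 separates the code from the displayed result, and 0x15 ends the field, while 0x07 marks the end of a table cell or row. The names `decode_piece` and `clean_text` and the control-character mapping are hypothetical simplifications, not a complete treatment.

```python
from typing import List, Tuple

F_COMPRESSED = 0x40000000  # bit 30 of a piece descriptor's fc: 8-bit text
FC_MASK = 0x3FFFFFFF       # low 30 bits hold the actual file offset

# Simplified mapping of Word control characters to plain text
CONTROL_MAP = {
    "\r": "\n",    # paragraph end
    "\x0b": "\n",  # manual line break
    "\x0c": "\n",  # page / section break
    "\x07": "\t",  # table cell or row end mark (crudely rendered as a tab)
    "\x1e": "-",   # non-breaking hyphen
    "\x1f": "",    # optional (soft) hyphen
}
FIELD_BEGIN, FIELD_SEP, FIELD_END = "\x13", "\x14", "\x15"


def decode_piece(word_stream: bytes, fc: int, cp_start: int, cp_end: int) -> str:
    """Decode one piece of document text from the WordDocument stream."""
    n_chars = cp_end - cp_start
    if fc & F_COMPRESSED:
        offset = (fc & FC_MASK) // 2  # compressed pieces: byte offset is fc / 2
        return word_stream[offset:offset + n_chars].decode("cp1252", errors="replace")
    offset = fc & FC_MASK
    return word_stream[offset:offset + 2 * n_chars].decode("utf-16-le", errors="replace")


def _map_char(ch: str) -> str:
    if ch in CONTROL_MAP:
        return CONTROL_MAP[ch]
    if ch != "\t" and ord(ch) < 0x20:
        return ""  # object anchors, footnote marks, etc. are dropped
    return ch


def clean_text(raw: str) -> Tuple[str, List[str]]:
    """Strip field codes (keeping field results) and map control characters.

    Returns the cleaned text plus the field codes found, so hyperlink
    targets such as HYPERLINK "url" are not lost.
    """
    out: List[str] = []
    codes: List[str] = []
    stack: list = []  # one [in_code, code_chars] entry per open (possibly nested) field
    for ch in raw:
        if ch == FIELD_BEGIN:
            stack.append([True, []])
        elif ch == FIELD_SEP and stack:
            if stack[-1][0]:
                codes.append("".join(stack[-1][1]).strip())
            stack[-1][0] = False  # now in the field result
        elif ch == FIELD_END and stack:
            top = stack.pop()
            if top[0]:  # field had no separator: still record its code
                codes.append("".join(top[1]).strip())
        elif stack and stack[-1][0]:
            stack[-1][1].append(ch)  # part of the innermost field's code
        elif any(entry[0] for entry in stack):
            continue  # inside an enclosing field's code: not visible text
        else:
            out.append(_map_char(ch))
    return "".join(out), codes


raw = 'Visit \x13 HYPERLINK "http://example.com" \x14our site\x15.\rA\x07B\x07\x07'
print(clean_text(raw))
```

Keeping the field codes in a separate list means the hyperlink targets survive even though only the display text ends up in the extracted output.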