
MS Word a Shithole....

Posted: 2003-08-19 02:15pm
by MKSheppard
http://news.bbc.co.uk/1/hi/magazine/3154479.stm

Usually with Microsoft Word, what you see is what you get.

If you make a change to a document, then that is what you see when it gets printed out.

But in fact, in many cases it is what you cannot see at first glance that proves more interesting.

The UK government has now largely abandoned Microsoft Word for official documents and has turned to documents created using Adobe Acrobat, which uses the Portable Document Format (PDF).

"I'm not sure many people check Word documents before they go out or are published," says Mr Spenceley.

He says he knows of a case in which someone found previous versions of an employment contract buried in the Word copy he was sent. Reading the hidden extras gave the person applying for the job a big advantage during negotiations.

Sometimes the mistakes are even more public.

During the hunt for the Washington sniper the police allowed the Washington Post to publish a letter sent to the police that included names and telephone numbers.

The newspaper tried to hide these details using black boxes which were easily removed and the sensitive details exposed for all to see.

But it is not just governments, businesses and newspapers that can be embarrassed in this way.

You could be too.

There is a function in many versions of Microsoft Office programs, including Word, Excel and PowerPoint, which means that fragments of data (which Microsoft refers to as metadata) from other files you had deleted or were working on at the same time could be hidden in any document you save.

This could be embarrassing for any home workers whose colleagues find out that they have been applying for jobs while working at home or being less than complimentary about their co-workers.

With the right tools this hidden data can easily be extracted.

Unix and Linux users can turn to tools such as Antiword and Catdoc to convert the document, including its formatting information, into a simple text file.
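The basic reason this works can be sketched in a few lines. This is not how Antiword or Catdoc operate (they parse the Word file structure properly); it is closer to the Unix `strings` utility: scan the raw bytes for runs of printable characters, whether or not Word would ever display them. It assumes the hidden text is stored as plain 8-bit or UTF-16LE characters, as Word 97 files commonly do. `printable_strings` and `MIN_LEN` are names made up for this example.

```python
import re

# Minimum run length worth reporting (hypothetical threshold, like strings -n).
MIN_LEN = 4
# Runs of printable ASCII bytes (space through tilde), stored one byte per char.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# The same characters stored as UTF-16LE: each printable byte followed by 0x00.
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def printable_strings(data: bytes) -> list[str]:
    """Return printable text runs found anywhere in a binary blob,
    including bytes that a word processor never shows on screen."""
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return found
```

Usage would be along the lines of `printable_strings(open("letter.doc", "rb").read())`, where the file name is hypothetical. Expect plenty of noise alongside any leftover text; the real tools are far more selective.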

Mr Byers gathered about 100,000 Word documents from sites on the web and every single one of them had hidden information.

In a research paper about the work, Mr Byers wrote that about half the documents gathered had up to 50 hidden words, a third had up to 500 hidden words and 10% had more than 500 words concealed within them.

The hidden text revealed the names of document authors, their relationship to each other and earlier versions of documents.

Occasionally it revealed very personal information such as social security numbers that are beloved of criminals who specialise in identity theft.

Also available was useful information about the internal network the document travelled through, which could be useful to anyone looking for a route into a network.

Mr Byers wrote that the problem of leaky Word documents is pervasive and that anyone worried about losing personal information might want to consider using a different word processing program.

Alternatively he recommends using utility programs that scrub information from Word documents or following Microsoft's advice about how to make documents safer.

"Microsoft is aware of the functionality of metadata being stored within Word 97 documents and would advise users to follow the instructions laid out in [the Microsoft Knowledge Base - see Related Internet Links]," says a spokesperson. "However, Microsoft do not wish to comment on how customers use the functionality within our software."

Posted: 2003-08-19 02:42pm
by phongn
This is actually ancient news - IIRC the problem has been around since at least Word 95, and some reports say it was first reported in 1993.

The old recommendation was to disable 'Fast Save' (which IIRC just appended the changes to the end of the file) and, if necessary, to use 'Save As,' which wipes out all that junk.
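To make the Fast Save explanation concrete, here is a toy model that assumes only the description above (changes appended to the end of the file) and nothing about the real Word binary layout. Each fast save appends to the file, so earlier text survives in the raw bytes, while 'Save As' writes a fresh file holding only the current text. `ToyDocFile`, `fast_save` and `save_as` are hypothetical names for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocFile:
    """Hypothetical append-only file model. NOT the real Word format:
    for simplicity each save appends the whole new version rather than
    a compact change record, but the effect on old text is the same."""
    chunks: list[bytes] = field(default_factory=list)  # bytes "on disk"
    text: str = ""                                      # what the user sees

    def fast_save(self, new_text: str) -> None:
        # Append only; nothing already in the file is overwritten.
        self.chunks.append(new_text.encode("utf-8"))
        self.text = new_text

    def save_as(self) -> "ToyDocFile":
        # Write a brand-new file containing just the current text.
        fresh = ToyDocFile()
        fresh.fast_save(self.text)
        return fresh

    def raw_bytes(self) -> bytes:
        return b"".join(self.chunks)
```

With the employment-contract story in mind: fast-save "Salary: 50,000", then fast-save "Salary: negotiable", and the screen shows only the second line while `raw_bytes()` still contains "50,000". Only the copy returned by `save_as()` is clean.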

As for the UK government switching to PDFs, I can't imagine them actually using them for editing and such, but rather for distribution, which isn't exactly unusual.

Posted: 2003-08-19 08:09pm
by DarthBlight
Funny, I've used Word for some time and I've never had something like that happen to me. Then again, I'm usually not working on multiple documents at any given time.

Posted: 2003-08-19 08:28pm
by Pu-239
Haha, OO (Linux) has print to PDF built in (probably dumps it into Ghostscript though) :P .

Or I can use kprinter or gs to do so, at the expense of poorer font quality (the fonts become rasterized).

Posted: 2003-08-19 09:12pm
by Einhander Sn0m4n
Once again we have a case of Microsoft deliberately designing a feature of extremely dubious utility that nonetheless poses a massive security risk. Is it me, or does all this shit do nothing but lend weight to the rantings of conspiracy theorists?

Once again, IBM - I Blame Microsoft.

Posted: 2003-08-19 09:25pm
by phongn
GhostScript PDF isn't that great.

Posted: 2003-08-19 10:30pm
by Pu-239
phongn wrote:GhostScript PDF isn't that great.
Yeah. Can't figure out how to make it stop rasterizing TrueType fonts, so you get grainy fonts if you zoom in a lot. Type1 fonts like Helvetica look fine as PDF but horrible onscreen, while the reverse is true for TrueType. You can't use links or stuff, only convert printer output to PDF like Distiller, but *shrugs*. The PDF feature is pretty much useless for me, since I don't distribute files. What irritates me is that PDF files are much larger than their OO or HTML equivalents, and people put them on the web, bloating what would be 100KB into 1MB, which takes 8 fucking minutes to download.

On second thought, the OO PDF export might not use Ghostscript after all, since quality is slightly better (TrueType fonts aren't rasterized, except for Bitstream Vera for some reason; I've made them the default for everything, since they look even nicer than Arial).

Posted: 2003-08-19 10:39pm
by Pu-239
Funny... this thread in pdf is 174kB, while the html, graphics and all, is 254kB.
Then again, tarred and gzipped HTML beats gzipped PDF (91 kB for the HTML vs. 141 kB for the PDF; funny, the bzipped versions were slightly larger), so HTML is still superior to PDF on the web for downloadable documents (consider modem compression, etc.) that you just want people to read, when you don't care about forcing a font style on them.
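Anyone wanting to repeat this comparison could use a small helper like the one below, a sketch built only on Python's standard `gzip` and `bz2` modules. `compressed_sizes` and `compare_files` are made-up names, and for the "HTML, graphics and all" case you would first bundle the files into a tar archive (for example with the `tarfile` module) and pass that archive's path.

```python
import bz2
import gzip


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Raw, gzip and bzip2 sizes (in bytes) of the same payload,
    both compressors at their maximum level."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }


def compare_files(*paths: str) -> None:
    # Print one line of sizes (in kB) per file, e.g. an HTML tarball vs. a PDF.
    for path in paths:
        with open(path, "rb") as f:
            sizes = compressed_sizes(f.read())
        summary = ", ".join(f"{k}={v / 1024:.0f} kB" for k, v in sizes.items())
        print(f"{path}: {summary}")
```

Usage would be something like `compare_files("thread.tar", "thread.pdf")` (hypothetical file names). One likely factor: PDF content streams are often already Flate-compressed, so a second gzip pass has less left to squeeze out of a PDF than out of plain HTML.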

Posted: 2003-08-19 10:46pm
by Sektor31
Huh? Is this stuff in Word 2000?

Yet another thing to worry about...

Posted: 2003-08-19 11:15pm
by phongn
Pu-239 wrote:The PDF feature is pretty much useless for me, since I don't distribute files. What irritates me is that PDF files are much larger than their OO or HTML equivalents, and people put them on the web, bloating what would be 100KB into 1MB, which takes 8 fucking minutes to download.
They want a true WYSIWYG document. OO/HTML doesn't give you that: only a PostScript file (which most people can't open) or a PDF (which is derived from PS) can. And PS certainly isn't editable like PDF is by the vast majority of tools.
On second thought, the OO PDF export might not use Ghostscript after all, since quality is slightly better (TrueType fonts aren't rasterized, except for Bitstream Vera for some reason; I've made them the default for everything, since they look even nicer than Arial).
Vera's nice, but I like having true italics rather than mere oblique fonts. Say what you want about them, but Microsoft's typography is good.

Posted: 2003-08-19 11:19pm
by phongn
Pu-239 wrote:Funny... this thread in pdf is 174kB, while the html, graphics and all, is 254kB.
Then again, tarred and gzipped HTML beats gzipped PDF (91 kB for the HTML vs. 141 kB for the PDF; funny, the bzipped versions were slightly larger), so HTML is still superior to PDF on the web for downloadable documents (consider modem compression, etc.) that you just want people to read, when you don't care about forcing a font style on them.
I somehow doubt that said PDF is very well optimized. Use Acrobat if you want to make decent PDFs, not GS.


Posted: 2003-08-19 11:24pm
by Slartibartfast
MKSheppard wrote:The newspaper tried to hide these details using black boxes which were easily removed and the sensitive details exposed for all to see.
I don't want to comment on the shittiness of Microsoft products right now, but I want to say that trying to cover text with BLACK BOXES in a word processor is FUCKING STUPID!
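Why the black boxes failed can be shown with a toy page model, assuming (as in most word-processor and PDF page descriptions) that text and drawn shapes are stored as separate objects. Covering text only adds a shape on top, so text extraction or copy-paste still returns everything underneath; real redaction deletes the text first. `ToyPage`, `cover` and `redact` are hypothetical names, and the model ignores fonts, glyph widths and everything else a real renderer needs.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    x: float  # anchor point of the text on the page
    y: float
    text: str


@dataclass
class Box:
    x: float
    y: float
    w: float
    h: float


@dataclass
class ToyPage:
    """Hypothetical page: text runs and drawn shapes are independent objects."""
    texts: list[TextRun]
    shapes: list[Box]

    def extract_text(self) -> str:
        # Extraction reads the text objects and never looks at the shapes.
        return " ".join(t.text for t in self.texts)


def cover(page: ToyPage, box: Box) -> None:
    """The broken approach: paint an opaque box over the text."""
    page.shapes.append(box)


def redact(page: ToyPage, box: Box) -> None:
    """The safe approach: delete every text run anchored inside the box,
    then draw the box so the removal is still visibly marked."""
    def inside(t: TextRun) -> bool:
        return box.x <= t.x <= box.x + box.w and box.y <= t.y <= box.y + box.h

    page.texts = [t for t in page.texts if not inside(t)]
    page.shapes.append(box)
```

On a page holding "Call" and a phone number, `cover` leaves the number in `extract_text()`, while `redact` with the same box leaves only "Call".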

Posted: 2003-08-20 12:12am
by Crayz9000
phongn wrote:Vera's nice, but I like having true italics rather than mere oblique fonts. Say what you want about them, but Microsoft's typography is good.
Most of Microsoft's fonts are done by AGFA Monotype Corporation...

Posted: 2003-08-20 12:38am
by phongn
Crayz9000 wrote:Most of Microsoft's fonts are done by AGFA Monotype Corporation...
D'oh, that's right. I forgot about that (but still, the bundled fonts are good).

Doesn't Apple include a bunch of Adobe's fonts?

Posted: 2003-08-20 01:57am
by Pu-239
What's wrong with oblique fonts? I only notice poor font quality when using SciTE (no AA).

Posted: 2003-08-20 02:00am
by phongn
Pu-239 wrote:What's wrong with oblique fonts?
I prefer true italics, that's all. Matter of preference - I tend to be picky about fonts.

Posted: 2003-08-20 02:05am
by Slartibartfast
And I repeat: putting BLACK BOXES on top of computer text is almost as stupid as using white corrector fluid on your computer screen.

Posted: 2003-08-20 05:53pm
by The Yosemite Bear
Until some strange graphics fuckup, I used MS Word for all of my fanfics, but I knew better than to use Fast Save....