Finding all the same words in a text file.

brianeyci · Post by **brianeyci** » 2005-11-29 09:12pm

You know what's really, really annoying? Looking for repeated words when you're writing essays or fanfiction or stories. It's a common problem -- sometimes writers use the same word several times within a page, or within three or four pages, without noticing. Usually this gets caught before going out, but my professor said that, to his horror, one of his published books had this!

I need a Windows program, either a self-contained executable or something that runs on the JRE, that takes in a block of text, searches for repeated words and displays the line numbers. There should be an editable database of common words to skip, like "said", "he", "she" and "the". I am talking large, large files. It should take in plain text documents, cut and pasted into the window, then at the press of a button tell you all the occurrences of each repeated word and which lines they occur on, preferably highlighting them. A customizable dictionary of words to avoid is crucial because I don't want words like "the" or "said" or characters' names highlighted, but I do want all repeated adjectives and verbs highlighted. Something like Notepad or TextPad but with a "find all similar words" function that skips all the database words. Note: I am not talking about finding and replacing a word; I am talking about searching through text for all occurrences of words that appear twice or more without knowing the word beforehand. Without knowing the word beforehand you can't use Find and Replace.
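For what it's worth, the core of what is described above (repeated words, line numbers, an editable skip list) can be sketched in a few lines. This is only a rough sketch in Python, not an existing tool: the function name `repeated_words`, the starter `STOP_WORDS` set and the sample text are all made up for illustration, and "similar" is assumed to mean the exact same word ignoring case, so "run" and "running" count as different words.

```python
import re
from collections import defaultdict

# Hypothetical starter skip list; the request is for this to be user-editable.
STOP_WORDS = {"the", "a", "said", "he", "she"}

# A word is a run of letters, optionally with one internal apostrophe ("don't").
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def repeated_words(text, stop_words=STOP_WORDS, min_count=2):
    """Map each repeated word to the 1-based line numbers where it occurs."""
    lines_by_word = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in WORD_RE.finditer(line):
            word = match.group().lower()
            if word not in stop_words:
                lines_by_word[word].append(line_no)
    # Keep only the words that occur at least min_count times.
    return {w: nums for w, nums in lines_by_word.items() if len(nums) >= min_count}

sample = "He said the night was dark.\nThe dark road was empty.\nShe said nothing."
for word, line_numbers in sorted(repeated_words(sample).items()):
    print(word, line_numbers)
```

On the sample text this reports "dark" and "was" on lines 1 and 2, while "said" and "the" are skipped because they are in the skip list. Character names could be handled the same way, by adding them to the set.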

This program should be fairly easy to write; if it exists, can someone be kind enough to post a link? I use OpenOffice and cannot find a similar function, nor do I know of a similar function in Word (I'm no professional and have never taken a course for Word). I would do it myself but the last programming course I took was in first year and I've forgotten everyfuckingthing lol! Also note, I am not a computer science major and I'm not submitting this as a project; this is not homework. This is because I'm really, really pissed at using one word, then the same word again a few paragraphs later, and not noticing through 10 different drafts!

If nobody has written such a program or has the time to write it, I'll relearn Java and do it myself LOL, but I thought I'd ask first, and also because writers would find this a really useful tool (why the fuck isn't there a "find all same words" in Word or OpenOffice?).

Thanks,
Brian

brianeyci · Post by **brianeyci** » 2005-11-29 10:23pm

Destructionator XIII wrote: I don't know of anything to do that off the top of my head, but writing a program to do that should be trivial. I'll get on it as soon as I have some spare time. Programs I write generally don't look pretty, but they work, and I'll have something in about a week at most.

Yes, it's very trivial (which is why I asked; it hardly seems worth relearning all the syntax just for such a fucking trivial thing), and I don't care how it looks as long as it works. Thanks Destructionator, if you have a PayPal I can send a donation. Or better yet, release it under the GNU GPL.

Brian

phongn · Post by **phongn** » 2005-11-29 11:02pm

Regular expressions for teh win!
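As a rough illustration of the regex route, here is a sketch in Python that marks every whole-word occurrence of one word, ignoring case. The function name `highlight` and the `**` marker are made up for the example; a real editor would colour the text instead.

```python
import re

def highlight(text, word, marker="**"):
    """Wrap whole-word, case-insensitive matches of `word` in `marker`."""
    # \b anchors keep "dark" from matching inside "Darkness".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: marker + m.group() + marker, text)

print(highlight("The dark road. Darkness fell on the Dark hill.", "dark"))
# prints: The **dark** road. Darkness fell on the **Dark** hill.
```

This still needs the word up front, so it only covers the highlighting half of the request; finding which words repeat is a separate counting pass.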

Jew · Post by **Jew** » 2005-11-30 05:06pm

Here is a web page that lists a number of text analysis programs: http://academic.csuohio.edu/kneuendorf/ ... a/qtap.htm

I think the word frequency feature you want is in several of the programs, including the one called HAMLET. However, I didn't see an easy way in HAMLET to filter out common words like "the" and "a".

andrelem · Post by **andrelem** » 2007-02-02 09:11am

Hello,

I am also looking for any programme that can find occurrences of words that happen twice or more in a text without knowing the word beforehand. So, if someone can help me, that would be nice.
André, Belgium