
Anti-Stupid filtering software research

Posted: 2007-11-10 04:43am
by The Grim Squeaker
Source wrote:OMG!!! The end of online stupidity?
Finally, software developers are building a filter that blocks unintelligible comments, writes Fortune's Josh Quittner.

(Fortune Magazine) -- Internet veterans have long complained about the steady erosion of civility -- and worse, intelligence -- in online discourse. Initially the phenomenon seemed to be a seasonal disorder. It occurred every September when freshmen showed up for college and went online. Tasting for the first time the freedom and power of the Internet, the newbies would behave like a bunch of drunken fraternity pledges, filling electronic bulletin boards with puerile remarks until the upperclassmen could whip them into shape.

Things took a dramatic turn for the worse in 1993, when AOL loosed its tens of thousands -- and then millions -- of users onto the Net. The event came to be known as the Endless September, and true to its name, it continues to this day.

It's a serious problem. Fools and bandwidth hogs have a way of driving traffic away from the most successful online destinations, a phenomenon that could ruin the emerging social networks and user-generated aggregators like Digg.

But there's still hope for intelligent life on the Internet. A team of software developers is hard at work on a "stupid filter" that promises to do to idiotic online comments what a spam filter does to junk and unwanted e-mail: put it in a place where it can't hurt anyone anymore.


That's the mission, anyway, of the cadre of techies toiling under the leadership of Gabriel Ortiz, a 27-year-old systems administrator in Albuquerque. Ortiz's team is readying a free, open-source version they hope to release by year's end and make available as a standard plug-in on the popular Firefox browser by early next year.

How does it work? Say a user wants to post a really, really dumb comment on, for example, cnnmoney.com, where some of you might be reading this now.

If cnnmoney had the filter installed on its servers, it would intercept the comment just before it was published and flash a little alert at the author that reads: "This comment is more or less unintelligible. Please try to restate it."

The writer would get another crack at it, and another, until at last he was able to muster a few words of intelligence, or in frustration wandered off to inflict those LOL!!!!!s and OMG!!!!s on some more tolerant site.

From a programming standpoint, not to mention a social one, building a piece of software that can separate intellectual wheat from chaff is tricky; it's far more difficult than building a spam filter, says Ortiz. That's because spam filters tend to do relatively simple pattern matching, searching e-mail for words that pop up frequently in junk mail.

Your spam filter sees V*I*A*G*R*A and without rolling its eyes flicks the offending missive into the junk folder, where it can be deleted along with the rest of its filthy brethren.
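For illustration, here is a minimal Python sketch of the simple keyword matching described above. The keyword list and the names `normalize` and `looks_like_spam` are hypothetical. Real spam filters usually weight many words statistically rather than checking a fixed list. The sketch only shows why an obfuscation like V*I*A*G*R*A is cheap to undo.

```python
import re

# Hypothetical keyword list; real filters use far larger, weighted lists.
SPAM_KEYWORDS = {"viagra", "casino", "lottery"}

def normalize(token: str) -> str:
    """Lowercase and drop every non-letter, so 'V*I*A*G*R*A' becomes 'viagra'."""
    return re.sub(r"[^a-z]", "", token.lower())

def looks_like_spam(message: str) -> bool:
    """True if any whitespace-separated token normalizes to a known spam keyword."""
    return any(normalize(token) in SPAM_KEYWORDS for token in message.split())

print(looks_like_spam("Cheap V*I*A*G*R*A here"))      # True
print(looks_like_spam("Zucchini recipes for dinner"))  # False
```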

But thanks in part to irony and its sneering cousin, sarcasm, stupidity is tougher to spot. "Smart people are often ironic," says Ortiz, noting that irony, to a computer anyway, can sound stupid. Writers who are otherwise intelligent will intentionally misspell words or break the ironclad rules of grammar to make a point.

The stupid-filter team is trying to accommodate this behavior with a variety of rules of thumb. For instance, Ortiz, who studied linguistics as an undergrad, recently noticed a pattern in the way some writers use letter repetition. The clueless tend to repeat consonants: "This video is amazinggggg!!!" By comparison, says Ortiz, "when you repeat a vowel, you're being sarcastic -- 'Yeaaaaaah.' We'll be using several different methods to try to mediate this."
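The vowel-versus-consonant rule of thumb is easy to state as code. The sketch below is a hypothetical illustration of that heuristic, not the stupidfilter project's code. The function name `classify_repeats` and the minimum run length of 3 are assumptions.

```python
import re

VOWELS = set("aeiou")

def classify_repeats(text: str, min_run: int = 3) -> list[tuple[str, str]]:
    """Return (run, label) for each run of min_run or more identical letters.

    Labels follow the rule of thumb quoted in the article: a repeated vowel
    reads as sarcasm, a repeated consonant as cluelessness.
    """
    results = []
    # ([a-zA-Z])\1{n,} matches a letter followed by at least n more copies of itself.
    for match in re.finditer(r"([a-zA-Z])\1{%d,}" % (min_run - 1), text):
        run = match.group(0)
        label = "sarcastic" if run[0].lower() in VOWELS else "clueless"
        results.append((run, label))
    return results

print([label for _, label in classify_repeats("This video is amazinggggg!!!")])  # ['clueless']
print([label for _, label in classify_repeats("Yeaaaaaah.")])                    # ['sarcastic']
```

A heuristic this crude would obviously need the other "methods" Ortiz mentions to back it up, since a single stray run decides the label.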

The first line of defense is context -- using well-established markers of standard English to judge a piece of writing. For instance, if the rest of the sentences in a comment are grammatical, and difficult words are spelled properly -- Ortiz mentioned "zucchini," which I had to look up -- the message ought to get by the filter. If the rest of the comment is unintelligible, it will be screened.
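One way to read the "context" test is as a spelling score: what fraction of a comment's words appear in a dictionary. The sketch below assumes that reading. The word list, the 0.8 threshold, and the function names are all hypothetical. It also ignores grammar entirely, which the article says the real filter checks too.

```python
import re

# Tiny stand-in word list (hypothetical); a real filter would load a full dictionary.
DICTIONARY = {
    "i", "made", "a", "zucchini", "bread", "and", "it", "was", "delicious",
    "this", "is", "the", "best", "recipe", "ever",
}

def dictionary_ratio(comment: str) -> float:
    """Fraction of word tokens in the comment that appear in DICTIONARY."""
    words = re.findall(r"[a-z']+", comment.lower())
    if not words:
        return 0.0
    return sum(w in DICTIONARY for w in words) / len(words)

def passes_context_check(comment: str, threshold: float = 0.8) -> bool:
    """Let a comment through when most of its words are spelled as expected."""
    return dictionary_ratio(comment) >= threshold

print(passes_context_check("I made a zucchini bread and it was delicious."))  # True
print(passes_context_check("ths iz teh bst recipe evr"))                      # False
```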

Perhaps the most interesting -- and ironic -- aspect of the project is the way Ortiz's team is tapping into the wisdom of crowds to debug its filter. They are encouraging readers to visit their site, http://stupidfilter.org/main/, where you can help them rate on a scale of one to five a selection of potentially dumb posts culled from -- where else? -- YouTube.
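Crowd ratings on a one-to-five scale have to be collapsed into a training label at some point. A minimal sketch follows, assuming a simple mean with a cutoff. The direction (5 = most stupid), the 3.5 cutoff, and the name `label_by_crowd` are assumptions, not anything the project has published.

```python
from statistics import mean

def label_by_crowd(ratings: dict[str, list[int]], cutoff: float = 3.5) -> dict[str, bool]:
    """Mark a comment as stupid (True) when its mean 1-5 rating reaches cutoff.

    Assumes 5 means "most stupid"; cutoff is a hypothetical value.
    """
    labels = {}
    for comment_id, scores in ratings.items():
        if not scores:
            continue  # unrated comments get no label yet
        if any(not 1 <= s <= 5 for s in scores):
            raise ValueError(f"rating out of range for {comment_id!r}")
        labels[comment_id] = mean(scores) >= cutoff
    return labels

print(label_by_crowd({"c1": [5, 4, 5], "c2": [1, 2, 1]}))  # {'c1': True, 'c2': False}
```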

Ortiz has clearly hit a nerve. Offers of help have been rolling in from all over the world ever since the project was unveiled. He thinks there might even be a business in it, since staying current with pop culture and maintaining the corpus of stupidity is more or less a full-time job. To which I'd add, Yeaaaaaah.
Lolz, awesome dudez!!!1Shift1! :P

Posted: 2007-11-10 02:29pm
by Hawkwings
Awesome. Applied for a mod position there.

Posted: 2007-11-10 04:44pm
by Phantasee
But thanks in part to irony and its sneering cousin, sarcasm, stupidity is tougher to spot. "Smart people are often ironic," says Ortiz, noting that irony, to a computer anyway, can sound stupid. Writers who are otherwise intelligent will intentionally misspell words or break the ironclad rules of grammar to make a point.
I thought of Stark when I read that.

Posted: 2007-11-10 05:04pm
by aerius
Too bad it can't kick the retarded poster in the nuts.

Posted: 2007-11-10 05:06pm
by Phantasee
But if we can't hear the retarded posters, do they even exist?

Depending on the answer, there might be no nuts to kick.

Posted: 2007-11-10 05:24pm
by Stark
Phantasee wrote:But if we can't hear the retarded posters, do they even exist?

Depending on the answer, there might be no nuts to kick.
It's all elitism man this will deny the internet to anyone who doesn't TOE THE LINE y'know it'll be used to REPRESS unpopular or DANGEROUS opinions man they're trying to destroy THE MESSAGE.

In any case, it will never work. 8)

Posted: 2007-11-10 10:51pm
by Hawkwings
Stupidfilter FAQ wrote:Isn't filtering stupidity elitist?

Yes. Yes, it is. That's sort of the whole point.

Posted: 2007-11-10 10:54pm
by Stark
Do you have a point? If you think this software won't be misused, as current language filters are, you're retarded.

And I repeat, it will never work anyway.

Posted: 2007-11-10 11:03pm
by Hawkwings
From what I've read, this project plans to take on Stupid by going after the format and structure of the sentence itself, not keywords and similar methods.

How do you misuse that? It doesn't care what words you say, unless your words contain the same letter 18 times in a row and multiple exclamation points or something.

Posted: 2007-11-10 11:17pm
by Stark
Right, and it won't be configurable or run off rules at all, right? You totally won't be able to say 'remove anything that says hurtful things about Christ', no sir.

I bet they said the same things about profanity filters, and JMSpock uses those to turn 'bullshit' into 'reasonable'. :lol: :lol:

Not that it'll ever work.
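Stark's point is that a word filter does whatever its configuration table says. A minimal, hypothetical sketch of such a configurable substitution filter follows. The table entry mirrors his example, and the name `word_filter` is invented; this is not any particular forum's implementation.

```python
import re

# Site-configurable replacement table (hypothetical entry, per Stark's example).
REPLACEMENTS = {"bullshit": "reasonable"}

def word_filter(text: str, table: dict[str, str] = REPLACEMENTS) -> str:
    """Replace each configured word, matched case-insensitively on word boundaries."""
    for bad, good in table.items():
        text = re.sub(rf"\b{re.escape(bad)}\b", good, text, flags=re.IGNORECASE)
    return text

print(word_filter("That argument is bullshit."))  # That argument is reasonable.
```

Nothing in the code knows or cares what the entries mean. Whoever edits the table decides what gets rewritten.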

Posted: 2007-11-10 11:39pm
by Hawkwings
So, "Only remove stupid that also includes the words 'Jesus Christ' + derivatives and (list of bad words)"?

I suppose you could make an add-on that would let this filter recognize certain words, but that's just a word filter again. This thing is meant to filter structure, not language.

Posted: 2007-11-11 12:09pm
by Sarevok
PHP, Myspace, Hotmail and so many other sites spent money in vain trying to stop people from typing hacker scripts into their input boxes. They can't filter something like <script or javascript: out before it reaches browsers on a client PC that is about to be hacked. The number of possible ways to obfuscate <script or javascript: as some other word to hide from filters is limited, yet they keep failing at this simple task. Dumbass 15-year-olds continue to find ways around the filters on billion-dollar corporate-owned sites. <script banned? No problem! Four years ago, <s<scriptcript in an incoming email would get through Hotmail's filters. The filters removed <script from <s<scriptcript, leaving <script, which is exactly what you want if you want to hack someone's email account! With people there is far greater flexibility, since, unlike browsers, people only need to get the gist for the message to go through. LOL banned? L0L is the answer. The idiots who brought us SMSspeak and ASCII art will not even notice there is a filter.

In short, throw this in the same dustbin as the Freedom Ship and "Flying Car by 2015" articles.
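The Hotmail trick Sarevok describes is a classic single-pass filtering bug: deleting a forbidden substring once can splice the surrounding text back into that same substring. The sketch below reproduces the general pattern. It is not Hotmail's actual code, and the function names are hypothetical.

```python
import html

def strip_once(text: str) -> str:
    """Naive filter: delete every '<script' in a single left-to-right pass."""
    return text.replace("<script", "")

def strip_until_stable(text: str) -> str:
    """Keep deleting until the text stops changing, so nesting can't rebuild the tag."""
    while True:
        cleaned = text.replace("<script", "")
        if cleaned == text:
            return cleaned
        text = cleaned

payload = "<s<scriptcript>alert(1)</script>"
print(strip_once(payload))          # <script>alert(1)</script>  (tag rebuilt)
print(strip_until_stable(payload))  # >alert(1)</script>
# Still bypassable with "<SCRIPT" or javascript: URLs; escaping output is the usual fix.
print(html.escape(payload))         # &lt;s&lt;scriptcript&gt;alert(1)&lt;/script&gt;
```

Looping until the text is stable closes this particular hole, but blacklists stay fragile, which supports Sarevok's wider point. Escaping everything on output avoids the guessing game entirely.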

Posted: 2007-11-11 12:23pm
by SilverWingedSeraph
Sarevok wrote:PHP, Myspace, Hotmail and so many other sites spent money in vain trying to stop people from typing hacker scripts into their input boxes. They can't filter something like <script or javascript: out before it reaches browsers on a client PC that is about to be hacked. The number of possible ways to obfuscate <script or javascript: as some other word to hide from filters is limited, yet they keep failing at this simple task. Dumbass 15-year-olds continue to find ways around the filters on billion-dollar corporate-owned sites. <script banned? No problem! Four years ago, <s<scriptcript in an incoming email would get through Hotmail's filters. The filters removed <script from <s<scriptcript, leaving <script, which is exactly what you want if you want to hack someone's email account! With people there is far greater flexibility, since, unlike browsers, people only need to get the gist for the message to go through. LOL banned? L0L is the answer. The idiots who brought us SMSspeak and ASCII art will not even notice there is a filter.

In short, throw this in the same dustbin as the Freedom Ship and "Flying Car by 2015" articles.
The thing is, if the finished product is anything like what is described in the article, all of those things you suggested are more likely to be blocked than a sentence full of nothing but accurately spelled curse words. It's an anti-stupid filter. Not a word filter. It analyses the composition of sentences, it doesn't scan for naughty words.

Thus, using leetspeak would, theoretically, result in "You make no fucking sense, go back and type it properly, dickwad", or something to that effect. At least, that's how it would work in my ideal world.

HOWEVER! Given that this is not my ideal world, I would not be overly surprised if this thing turns out to be nothing more than a glorified wordfilter piece of shit. Honestly, I wouldn't even be moderately surprised.

Posted: 2007-11-11 12:25pm
by Hawkwings
Once again, this thing doesn't ban a wordlist. If your fancy script thing doesn't look like common written English, it's not going to be let through. And how the hell do you plan on getting ASCII art through? The filter will see, "Oh, this post contains a vast majority of non-standard characters. Block." Problems with AIMspeak? The filter sees, "Oh, the vast majority of words in this post are misspelled, there's a string of 20 of the same character, and there's random capitalization. Block."

Edit: ahh, ninja'd.
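The structural checks Hawkwings lists can be sketched as a handful of independent flags. Everything below is an illustrative guess rather than the project's design. That includes the 70% standard-character threshold, the 20-character run, the 50% unknown-word and 30% mixed-case cutoffs, and the function name `structural_flags`.

```python
import re

def structural_flags(post: str, known_words: set[str]) -> list[str]:
    """Return the reasons (if any) a post looks non-standard.

    All thresholds are hypothetical, chosen only to illustrate the idea.
    """
    if not post.strip():
        return ["empty"]
    flags = []
    # 1. Mostly non-standard characters (ASCII art, symbol spam).
    standard = sum(c.isalnum() or c.isspace() or c in ".,;:'\"?!-()" for c in post)
    if standard / len(post) < 0.7:
        flags.append("non-standard characters")
    # 2. A run of 20 or more copies of one character.
    if re.search(r"(.)\1{19,}", post):
        flags.append("20+ repeated characters")
    words = re.findall(r"[A-Za-z']+", post)
    if words:
        # 3. Most words missing from the dictionary.
        unknown = sum(w.lower() not in known_words for w in words)
        if unknown / len(words) > 0.5:
            flags.append("mostly misspelled")
        # 4. Capitals inside words like "wHaT" (all-caps acronyms are ignored).
        mixed = sum(any(c.isupper() for c in w[1:]) and not w.isupper() for w in words)
        if mixed / len(words) > 0.3:
            flags.append("random capitalization")
    return flags

KNOWN = {"your", "ideas", "suck", "get", "out", "of", "this", "discussion"}
print(structural_flags("lolz ur ideas suckkkkkkkkkk gtfo noob!!!!1!!11!", KNOWN))  # ['mostly misspelled']
print(structural_flags("Your ideas suck. Get out of this discussion.", KNOWN))     # []
```

None of these checks look at which words are used, only at how the post is put together, which is the distinction Hawkwings and SilverWingedSeraph are drawing.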

Posted: 2007-11-11 12:38pm
by DogsOfWar
Unless they get better moderators, their project won't even get past the "compile and rate" stage. Just looking at their Random Stupidity page shows far too little consistency in their ratings - you can tell some of their mods are rating comments based on what the person is saying rather than how they say it (the latter being what the project is about).
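Inconsistent ratings of the kind DogsOfWar describes can at least be detected automatically by measuring how much raters disagree on each comment. The sketch below uses the population standard deviation. The name `inconsistent_items` and the cutoff of 1.0 are hypothetical.

```python
from statistics import pstdev

def inconsistent_items(ratings: dict[str, list[int]], max_spread: float = 1.0) -> list[str]:
    """Return IDs of comments whose 1-5 ratings spread more than max_spread.

    Uses the population standard deviation; max_spread is a hypothetical cutoff.
    """
    return [cid for cid, scores in ratings.items()
            if len(scores) > 1 and pstdev(scores) > max_spread]

print(inconsistent_items({"a": [4, 4, 5], "b": [1, 5, 1, 5]}))  # ['b']
```

Flagged comments could then be re-rated or reviewed, though this can't tell whether a rater judged the content or the writing, which is DogsOfWar's actual complaint.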

Posted: 2007-11-11 02:13pm
by Sarevok
Hawkwings wrote:Once again, this thing doesn't ban a wordlist. If your fancy script thing doesn't look like common written English, it's not going to be let through. And how the hell do you plan on getting ASCII art through? The filter will see, "Oh, this post contains a vast majority of non-standard characters. Block." Problems with AIMspeak? The filter sees, "Oh, the vast majority of words in this post are misspelled, there's a string of 20 of the same character, and there's random capitalization. Block."

Edit: ahh, ninja'd.
How do you define what's 1337 speak and what's not, though? Banning anything that is not in the dictionary will also ban my real name, a lot of user names, sci-fi terms, many real science terms, names of places, etc. If you use rules instead of a database, like XSS filters do, you are open to the same exploits: some script kiddie will figure out the rules and post how to get around them. Until we have something like an AI reading every input, this won't work.

Posted: 2007-11-11 03:16pm
by Hawkwings
That's one of the main problems they're dealing with. Right now it seems like they want to put some sort of limit on the leetspeak before blocking. As for proper nouns, I imagine something capitalized, such as a name, would get a different flag than random blab. Scientific terms are trickier; I have no idea how they might build those in.

As for the rules... people already know how to get past them: use proper spelling, grammar, punctuation, and words. Instead of saying "lolz ur ideas suckkkkkkkkkk gtfo noob!!!!1!!11!" you could say, "Your ideas suck. Get out of this discussion, because you obviously don't know what you're talking about."

And if they can't write something like that, well, what a shame to not have them in the discussion.
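Hawkwings' guess about proper nouns can be sketched as a dictionary check that skips capitalized words in mid-sentence. This is a hypothetical rule, not something the project has announced, and `unknown_words` is an invented name. It also shows Sarevok's objection in miniature. A name at the start of a sentence still gets penalized, and anyone who learns the rule can dodge it by capitalizing their garbage.

```python
import re

def unknown_words(text: str, dictionary: set[str]) -> list[str]:
    """List words missing from the dictionary, skipping likely proper nouns.

    Assumption: a capitalized word that is not the first word of its
    sentence is treated as a name or place and not penalized.
    """
    unknown = []
    # Split into sentences after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z']+", sentence)
        for i, word in enumerate(words):
            if word.lower() in dictionary:
                continue
            if i > 0 and word[0].isupper():
                continue  # mid-sentence capital: probably a name
            unknown.append(word)
    return unknown

D = {"i", "met", "in", "yesterday", "your", "idea", "is", "dumb"}
print(unknown_words("I met Sarevok in Albuquerque yesterday. ur idea iz dum.", D))  # ['ur', 'iz', 'dum']
```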