Slashdot wrote:"Carnegie Mellon University has taught a computer how to read and learn from the internet. According to Dennis Baron at the Oxford University press blog, the computer is called NELL and it is reading the internet and learning from it in much the same way that humans learn language and acquire knowledge. Basically by soaking it all up and figuring it out.
NELL is short for Never Ending Language Learner and apparently it is getting brainier every day."
I'm very curious how it treats different languages and slang. Is it treating all human languages as one single big language, or as many separate languages? What about languages that are separate but share a lot of similar elements or loan words?
I'll have to make a note to look up NELL and see how it's doing in the future.
I have to wonder what the criteria for an allowed website would be. This is not something you just let loose on the Internet. The reason being that if it hit a foreign-language website without a translation, that could seriously screw up the program.
Also, you don't want your program leaving its 'digital footprints' on websites you'd rather not have to explain to any backers the project might have.
Finally, you have the possibility of websites with malicious code. While I'm quite sure a program like NELL wouldn't be particularly vulnerable to them (it's a text reader, after all), you still want to limit potential damage to the program.
I would not be surprised to learn that NELL actually is limited to some very specific websites right now, with a team deciding what is and is not allowable for security/practicality reasons.
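Just to make the whitelist idea concrete, here's a rough Python sketch of the sort of filter I'm imagining. The domain list and function names are pure guesswork on my part, not anything CMU has published about NELL:

# Hypothetical sketch of a crawl whitelist; domains and names are invented.
from urllib.parse import urlparse

APPROVED_DOMAINS = {"en.wikipedia.org", "bbc.co.uk", "nytimes.com"}

def allowed(url: str) -> bool:
    """True only if the URL's host is an approved domain or a subdomain of one."""
    host = urlparse(url).netloc.lower()
    return any(host == d or host.endswith("." + d) for d in APPROVED_DOMAINS)

def crawl(frontier):
    for url in frontier:
        if not allowed(url):
            continue  # skip anything the review team hasn't signed off on
        yield url     # ...then fetch the page and hand its text to the reader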
I've been asked why I still follow a few of the people I know on Facebook with 'interesting political habits and viewpoints'.
It's so when they comment on or approve of something, I know what pages to block/what not to vote for.
Burr Settles: Currently, NELL reads indiscriminately. Of course, it tends to learn about proteins and cell lines mostly from biomedical documents, celebrities from news sites and gossip forums, and so on. In future versions of NELL, we hope it can decide its own learning agenda, e.g., "I've not read much about musical acts from the 1940s... maybe I'll focus on those kinds of documents today!" Or, alternatively, we could say we need it to focus on a particular document. Previous successes in "machine reading" research have in fact relied on a narrow scope of knowledge (e.g., only articles about sports, or terrorism, or biomedical research) in order to learn anything. The fact that NELL learns to read reasonably well across all of these domains is actually a big step forward.
It has been interesting to hear the public's response to NELL. There are many jokes about what will happen when it comes across 4chan or LOLcats, for example. But the reality is, those texts are already available to NELL, and it is largely ignoring them because they are so ill-formed and inconsistent.
Burr Settles wrote:There are many jokes about what will happen when it comes across 4chan or LOLcats, for example. But the reality is, those texts are already available to NELL, and it is largely ignoring them because they are so ill-formed and inconsistent.
Which only proves that the computer is already demonstrating that it is far more intelligent than the average internet user.
Sorry, you don't get to call it 'intelligent' for ignoring 4chan and LOLcats. It most likely cannot understand such websites because it's incapable of processing the visual and cultural stimuli that accompany them.
That would be like a man who hates opera claiming that a blind and deaf man must be much smarter than everyone else for ignoring opera houses as irrelevant.
CaptainChewbacca wrote:Dude...
Way to overwork a metaphor Shadow. I feel really creeped out now.
I am an artist; metaphorical mind-fucks are my medium.
I'm sure it can process them. The problem is that there is nothing but sub-culture in-jokes and memes on those sites, so it simply doesn't understand the information, and when it delves further it probably finds that the sites offer little guidance on how to interpret any of it, so it just moves on.
To use your example, it would be like a man who knows nothing of opera and has no prior opinions of it whatsoever walking into an opera house, and when he doesn't know how to react to the piece being performed, those he asks for advice simply say more things he doesn't understand. He'll likely shrug and leave.
No, I am pretty sure the complex nature of sites like 4chan is what prevents them from being deciphered by computer programs. The day artificial intelligence can do such a thing will be an indication that computers can approximate human thinking. Let's use an analogy here: an image-recognition program would have great success detecting a target pattern set against a green studio background, but picking out a camouflaged tank in a forest? That remains a subject of research. I am loath to abuse the term signal-to-noise ratio in a subject (computer science) I know little about, but I suspect something similar applies here. In a paragraph written in 1337-speak or SMS-speak, the real meaning is hidden under a fog of random noise.
Danger, danger, AI hype alert. This is essentially a web crawler connected to a classic symbolic-AI propositional logic engine via rule-based NLP and a basic statistical belief support generator. In effect they are saying 'twenty years of domain experts putting carefully crafted entries into Cyc's knowledge base failed to produce a useful AI, so let's try filtering the web instead'. They are only about the three hundred and fifteenth group to think of this approach; some of them replicated the expert supervision and some went more for the 'crowdsourcing' idea (see: MindPixel).
The research may be useful for the usual proposed semantic web applications and for marginal improvements to rule-based NLP in general. However, the classic words-as-atomic-tokens-in-a-semantic-net approach to symbolic AI has been thoroughly and rightfully discredited as a means of producing general artificial intelligence.
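To spell out what I mean by 'web crawler plus rule-based NLP plus a statistical belief support generator', here's a toy Python sketch of that style of system. The patterns, names, and threshold are invented for illustration and have nothing to do with NELL's actual code:

# Toy pipeline: hand-written extraction patterns over crawled sentences,
# with candidate facts promoted to 'beliefs' once they have enough support.
import re
from collections import Counter

# Rule-based NLP stage: a couple of "X is a <category>" patterns.
PATTERNS = {
    "city":    [re.compile(r"([A-Z][a-z]+) is a city")],
    "protein": [re.compile(r"([A-Z][A-Za-z0-9]+) is a protein")],
}

def extract_candidates(sentences):
    """Yield (entity, category) pairs matched by any pattern."""
    for s in sentences:
        for category, pats in PATTERNS.items():
            for pat in pats:
                for m in pat.finditer(s):
                    yield (m.group(1), category)

def promote_beliefs(sentences, min_support=3):
    """Crude statistical belief support: keep facts seen often enough."""
    support = Counter(extract_candidates(sentences))
    return {fact for fact, count in support.items() if count >= min_support}

corpus = [
    "Pittsburgh is a city in Pennsylvania.",
    "Pittsburgh is a city of bridges.",
    "Everyone agrees Pittsburgh is a city worth visiting.",
    "Hemoglobin is a protein found in red blood cells.",
]
print(promote_beliefs(corpus))  # only ('Pittsburgh', 'city') clears min_support

The point being: everything interesting lives in those patterns and that threshold, and the 'knowledge' is still just strings in a table, which is exactly the words-as-atomic-tokens problem.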