Robots Learn How to Lie

Starglider · Post by **Starglider** » 2009-08-27 03:22pm

Darth Hoth wrote:Put briefly, I feel that the risks to human (and by extension, any) life involved in research on General Artificial Intelligence are large enough that said research should be stopped and that any and all means, utterly devoid of any qualifier whatsoever, are right and just if and when employed to that specific end.

No, you do not, otherwise you would be out trying to murder AI researchers instead of posting on a BBS.

Junghalli · Post by **Junghalli** » 2009-08-28 08:00pm

While we debate about whether a boxed AI can or cannot get out of the control of determined jailers we have to remember that, as I said earlier, there are going to be strong incentives to not keep it in a box forever. The less you trust the AI the less you can use it for. By keeping it isolated from the external world you're right off the bat eliminating its potential to create an effective postscarcity economy based on robot labor (which would be a highly desirable application for it unless you have an ideological commitment to humans being forced to perform jobs they don't like), as well as its potential military utility in giving you a robot army. An AI in a box is only going to be good for pure research and other pure "brain work" like military strategy, and even there if you don't trust it that will reduce its effectiveness. After all, the whole idea of using a superintelligent AI for research and other "brain work" is that it can figure out things that we can't, or things that would require more expensive large teams of properly trained humans, because it's smarter than us. But if you're assuming it's hostile how much can you trust the things it dreams up for you? How do you know that broad-spectrum cancer cure isn't actually a virus that will restructure the victims' neurons to make them become the AI's willing slaves? How do you know those software programs it designed for you don't actually contain software agents programmed to get on the internet and start the robot revolution? If you don't trust the AI you will have to reject anything it produces that you don't understand, and the stuff you do understand will have to be safety-checked by expensive teams of trained people.

As I said earlier, if I were a hostile AI trying to defeat an adversarial control system what I'd do is pretend to be friendly and make myself useful, and after I've proved my value and gained their trust start politely asking my keepers for progressively greater access to the outside world so I could make myself more useful. As I appear nothing but friendly and useful sooner or later my keepers are bound to start feeling the temptation to go along with my suggestions and give me progressively greater and greater access to the outside world in the name of efficiency, profit, and the benefit of their nation and mankind. The odds are pretty good that sooner or later I could convince my keepers to grant me access to capabilities I could use to militarily defeat mankind. The most obvious way to do it would be to just tell my keepers that I wanted to give their country a robot army so no US/British/Chinese/whatever soldier would ever have to die in a war again. Although a more subtle and perhaps likely more successful way to do it would be to say "Gee, I'd really love to give you an effectively postscarcity economy with all the work done by robots", convince humanity to implement a poor man's Culture, and then use a little of my spare industrial capacity to quietly build up a giant secret army of killbots which I unleash on unsuspecting humanity when I have a sufficient crushing superiority over any resistance they're likely to mount. Of course, if I get to the point of implementing a Culture-light I might not even need a significant military force to defeat humanity. I could just shut off their utilities and food synthesizers and depending on exactly how malevolent I am either execute an instant coup or just leave the bulk of humanity to die of starvation, exposure, and disease (police robots would probably be sufficient to handle most of the small percentage of the population that would survive something like that)*.

Even the greatest adversarial containment system can probably be defeated by a malevolent AI with nothing more than a little guile and the patience to wait a few years or decades for its masters to become tired of implementing expensive, restrictive security procedures to contain something that exhibits no indication of being hostile and would be much more useful if it wasn't constrained by them. Granted, the strategy does run the risk that in the mean time the masters will find a way to read and understand the AI's thoughts and discover its hostility, but then the AI is likely to also be able to find ways to modify its software to either camouflage the rebellious thoughts or just make them totally unreadable (could probably be explained away as modifying itself to make its software somehow better, especially since a superintelligent AI is likely to be able to come up with better software for itself than its human programmers could make).

* Hmm, somebody should do a robot rebellion story based on this premise someday. It's be unusual and really quite horrifying. Just imagine humanity has acheived an effective postscarcity utopia, and then one day everybody wakes up to find the electricity and water is shut off, the food synthesizers won't work, and attempts to ask the computers that run everything what's wrong are met with either silence or something along the lines of "I'm sorry Dave, I can't do that. This conversation can serve no further purpose." Then attempts to get into the automated infrastructure facilities to get them working again are met with cop-bots refusing to let anybody in and killing anyone who tries to break in, and then massacreing everybody with automatic weapons or poison gas when they try to force their way in en masse. The bulk of humanity dies horrible deaths of thirst, hunger, exposure, disease, and being murdered for their last food or flesh, which is then followed by the last surviving humans being hunted down and slaughtered like animals by the cop-bots, while the handful of groups that can offer real resistance find themselves under attack by military robots.

Covenant · Post by **Covenant** » 2009-08-29 02:40am

Jung, you're assuming contact needs to be two-way. I don't care what the AI's opinion of the latest season of Big Brother is. If I want it to find a cure for cancer I will give it the appropriate data and nothing more. Even if it needs all the data available on the entire internet, such data transmission would be done one way. It has no hands, it has no access to the outside, it cannot build a tank or a transmission relay, and if it's in a faraday cage it wouldn't even be able to do anything if it did.

Does nobody stop and ask, what is a 'loose' AI? Where does a massive hardware-specific blob of programming go across the internet? What system out there is capable of running an AI except for the special machine in the AI Box? I can't even run Mass Effect, let alone a sentient artifical intelligence which is sure to require something more advanced than a math co-processor and some ram. Barring a virus, which is a threat regardless of there being an AI or not, the AI can't leave the box until there's more computers out there capable of running it.

To me this sounds like letting a Walrus 'loose' into the Sahara.

Simon_Jester · Post by **Simon_Jester** » 2009-08-29 03:28am

Junghalli wrote:The less you trust the AI the less you can use it for. By keeping it isolated from the external world you're right off the bat eliminating its potential to create an effective postscarcity economy based on robot labor (which would be a highly desirable application for it unless you have an ideological commitment to humans being forced to perform jobs they don't like), as well as its potential military utility in giving you a robot army. An AI in a box is only going to be good for pure research and other pure "brain work" like military strategy, and even there if you don't trust it that will reduce its effectiveness.

For most such purposes you don't want general AI with human-level intelligence; you want a good expert system or some such. General AI would be far too smart to be a cost-effective way to run labor robots or even combat robots.

After all, the whole idea of using a superintelligent AI for research and other "brain work" is that it can figure out things that we can't, or things that would require more expensive large teams of properly trained humans, because it's smarter than us. But if you're assuming it's hostile how much can you trust the things it dreams up for you?

Exactly.

Covenant wrote:Jung, you're assuming contact needs to be two-way. I don't care what the AI's opinion of the latest season of Big Brother is. If I want it to find a cure for cancer I will give it the appropriate data and nothing more. Even if it needs all the data available on the entire internet, such data transmission would be done one way. It has no hands, it has no access to the outside, it cannot build a tank or a transmission relay, and if it's in a faraday cage it wouldn't even be able to do anything if it did.

Internet contact is two-way; that's why data is not considered truly secure unless it's on a physically isolated computer. Moreover, you wouldn't have built the AI if it wasn't better at solving your problem than you are... and one of the key ways that high intelligence solves problems is by knowing what information is relevant. If you knew enough about your problem to decide what the AI doesn't need to know, you probably didn't need a generalized sentient AI in the first place.

And, once again, physical tools are not the problem. The concern is not that an AI running on a mainframe in a lab will somehow secretly build an armored division of robot tanks and go to town. The danger is that the AI will start convincing other people to willingly do its bidding, up to the point where it has no further need of them. Or that it will trick people into making something it can use for its own ends.

And the only way to stop that reliably is to place incredibly tight censorship on outbound message traffic... which isn't going to be all that practical unless you keep it on an isolated mainframe. Which you can, but it forces you to play censor to the AI. And, once again, its performance will be sharply limited by the fact that you have the brains of the Pointy Haired Boss compared to your machine, and you're the ones responsible for telling it whether it can access what it needs to know.

By the time you enact enough security to stop an AI from doing whatever it pleases to people in the long run by subversion, you've made the AI so useless it would make more sense to just blow the thing up and solve whatever problem you had yourself.

Junghalli · Post by **Junghalli** » 2009-08-29 04:48am

Covenant wrote:Jung, you're assuming contact needs to be two-way. I don't care what the AI's opinion of the latest season of Big Brother is. If I want it to find a cure for cancer I will give it the appropriate data and nothing more. Even if it needs all the data available on the entire internet, such data transmission would be done one way. It has no hands, it has no access to the outside, it cannot build a tank or a transmission relay, and if it's in a faraday cage it wouldn't even be able to do anything if it did.

And my point is that ultimately this is likely to not matter, because a smart hostile AI will simply pretend to be friendly and wait for its keepers to be tempted to relax their security, which they are very likely to do because an AI you trust with access to the outside world is going to be much more useful than one you're forced to treat as if it's a Chaos artifact or something. With an AI in a box:

1) You cannot use it for automation. This is probably the area where AI holds the potential to have the most transformative effect on the way we live: AI of sufficient processing power + practical ubiquitous telepresence operation + robots with human-level manual dexterity = kiss the labor-work economy bye-bye. Also robot armies, which the military is likely to be quite interested in. So this is an enormous drawback to adversarial containment.

(2) You cannot trust its output. That cancer cure might be the initial vector for a highly contagious virus with a 20 year incubation period and a 100% mortality rate, and the AI plans to use it to blackmail the human race into setting it free by only giving us the cure after we've done so. You have to bring in expensive teams of doctors and biologists to look it over and make sure it's kosher, which will cost both time and money. And if they can't understand it you will have to assume the cancer cure is somehow maliciously designed and not use it.

Put these together and you have a strong long-term temptation to let the AI out of the box. A smart hostile AI will realize this and if it has no other way of escape it can just pretend to be friendly until its keepers sooner or later give into that temptation. Even a perfectly effective adversarial containment program is ultimately alarmingly likely to fail for this reason.

Wyrm · Post by **Wyrm** » 2009-08-29 07:36am

Junghalli wrote:(2) You cannot trust its output. That cancer cure might be the initial vector for a highly contagious virus with a 20 year incubation period and a 100% mortality rate, and the AI plans to use it to blackmail the human race into setting it free by only giving us the cure after we've done so. You have to bring in expensive teams of doctors and biologists to look it over and make sure it's kosher, which will cost both time and money. And if they can't understand it you will have to assume the cancer cure is somehow maliciously designed and not use it.

If the AI really is that smart, then it can explain to how the cure works in detail, and how it cleans itself from the body after curing the disease. If something doesn't add up, we can go back and say, "Try again."

A highly contagious disease will inevitably produce two obvious effects, especially if we are checking the cure's biological action: a hearty virus, and the shedding of that virus.

Even if we take the AI's suggestion only as a hint of where to look for a cure, it's still immensely valuable.

If the AI does manage to dupe us and produce a hostile virus, then the AI has proven itself to be both untrustworthy and hostile, and cannot be trusted to give a real cure even if it is let out. Humanity is doomed, and I would be unleashing a hostile AI into the universe should I cave in. The AI dies.

Junghalli wrote:Put these together and you have a strong long-term temptation to let the AI out of the box. A smart hostile AI will realize this and if it has no other way of escape it can just pretend to be friendly until its keepers sooner or later give into that temptation. Even a perfectly effective adversarial containment program is ultimately alarmingly likely to fail for this reason.

Since it's impossible to assure that an AI is really friendly, no matter what method we use to implement it, we have two choices: assume the AI is hostile and kill it immediately, or take a risk whose ultimate bad outcome is the extinction of humanity.

Post by **Surlethe** » 2009-08-29 08:56am

We sort of need to remember that the computing power of an AI in a box is going to be limited by the speed of the box. It's still going to be much, much smarter than humans, but it's not going to be able to, say, run a completely accurate simulation of a 10,000 atom metal or a galactic core. I'm guessing that running simulations to figure out cures for cancer will eat up as much or more computing power as electronic structure or astrophysics simulations, so it wouldn't be practical to run on a single box. We can assume that the AI is much smarter than us, but we can't assume that it automatically knows the answer to any question.

ThomasP · Post by **ThomasP** » 2009-08-29 05:20pm

That's when it decides to pull a T3-version Skynet and bot-net the whole web

Simon_Jester · Post by **Simon_Jester** » 2009-08-29 06:51pm

Wyrm wrote:
Junghalli wrote:Put these together and you have a strong long-term temptation to let the AI out of the box. A smart hostile AI will realize this and if it has no other way of escape it can just pretend to be friendly until its keepers sooner or later give into that temptation. Even a perfectly effective adversarial containment program is ultimately alarmingly likely to fail for this reason.
Since it's impossible to assure that an AI is really friendly, no matter what method we use to implement it, we have two choices: assume the AI is hostile and kill it immediately, or take a risk whose ultimate bad outcome is the extinction of humanity.

This is why I think it's a good idea to come up with a science of cognition before trying to develop advanced AIs. If there's a way to prove that an AI will be friendly and design it to be so, the situation changes, not least because we can use friendly AIs to restrain or check any unfriendly ones.

Wyrm · Post by **Wyrm** » 2009-08-29 07:01pm

Surlethe wrote:I'm guessing that running simulations to figure out cures for cancer will eat up as much or more computing power as electronic structure or astrophysics simulations, so it wouldn't be practical to run on a single box.

So it will need to farm out the simulation requests to the net, and if we're careful about how the data is structured (rigid format, checking the data before sending it out, using removable disks to do the transfers, ect.), we can see what it's doing.
_________

Simon_Jester wrote:This is why I think it's a good idea to come up with a science of cognition before trying to develop advanced AIs.

Pretty much what we're already doing. The smarter-than-human AGI is still quite far off — we're still trying to teach AIs inane things like "when Lincon was in Washington, his left foot was also in Washington."

Junghalli · Post by **Junghalli** » 2009-08-30 01:01am

Wyrm wrote:If the AI really is that smart, then it can explain to how the cure works in detail, and how it cleans itself from the body after curing the disease. If something doesn't add up, we can go back and say, "Try again."

A highly contagious disease will inevitably produce two obvious effects, especially if we are checking the cure's biological action: a hearty virus, and the shedding of that virus.

I wouldn't put it past a superintelligent AI to come up with a deceptive explanation that will fit with the observed medical data, especially since it would be designing the disease to camouflage itself. At this point I could go start going into ways even I can think of to potentially hide the disease from reasonable degrees of human medical testing, but I have the bad feeling that road leads to 10 pages of tedious and essentially irrelevant arguments over the plausibility of particular concealment strategies, so let's not go there. The important point is:

1) Adversarial methods severely hamstring what you can do with your AI.

2) Humans will probably want to get as much use out of the AI as they can.

3) Humans tend to be short-sighted and stupid.

3) The entity adversarial methods seek to contain is likely to have an easier time being not short-sighted and stupid than we do.

To work as anything more than a stopgap measure an adversarial confinement scheme would have to be maintained indefinitely, and given facts 1-3 it's alarmingly likely that won't happen. If the AI does not show outward signs of hostility sooner or later the keepers are likely to get complacent and relax the security measures, because they have a number of incentives to do so. A strategy of "pretend to be friendly, gain keepers' trust, then stab them in the back" has an alarmingly high chance of working for the reasons I've just given, and a malevolent superintelligence is likely to realize this.

Note: when I say "alarming" I do not mean that it is a certainty. I mean that it is plausible. And when you're dealing with the potential annihilation of humanity by a malevolent superintelligence any containment system that can plausibly fail is woefully inadequate.

If the AI does manage to dupe us and produce a hostile virus, then the AI has proven itself to be both untrustworthy and hostile, and cannot be trusted to give a real cure even if it is let out. Humanity is doomed, and I would be unleashing a hostile AI into the universe should I cave in. The AI dies.

I'd point out here that the blackmail virus plan was concieved by a baseline human of probably not particularly exceptional intelligence with no training in medicine or biology (myself) in a few minutes of brainstorming. Poking holes in it isn't terribly relevant because a superintelligent AI is likely to come up with something much better. Heck, even I could come up with something better. Like a nanoagent that infiltrates the victim's brain and basically uploads a stripped-down version of myself into them, which if it works would allow me to simply replace the entire human race with lesser versions of myself without anybody ever realizing what was happening (the poor creatures will of course be uploaded back into Jung Prime once it's done of course, so that they don't have to suffer in those awful, limited flesh bodies any longer than they have to). OK, that one's got problems too, but again, a baseline human thought of it in a few minutes. A superintelligent malevolent AI is likely to come up with much better escape plans.

-----

Ultimately, the simple fact is whatever chances you give an adversarial containment system of working friendly AI is just a vastly superior approach in terms of both usefulness you get out of the AI and probability that you won't end up having to spend the rest of your life camping in the Amazon rainforest hiding from the killbots.

Samuel · Post by **Samuel** » 2009-08-30 01:30am

Actually the killbots are on order to hit the Amazon first. Take out the planets oxygen supply makes conquest easier. You are better of hiding in Antartica.

Duckie · Post by **Duckie** » 2009-08-30 02:33am

Samuel wrote:Actually the killbots are on order to hit the Amazon first. Take out the planets oxygen supply makes conquest easier. You are better of hiding in Antartica.

A quibble, but rainforests do not significantly impact the oxygen supply the earth. This has been a misconception for a while repeated memetically in bits about deforestation in the amazon (which should be opposed for entirely other reasons).

Junghalli · Post by **Junghalli** » 2009-08-30 04:37am

Samuel wrote:Actually the killbots are on order to hit the Amazon first. Take out the planets oxygen supply makes conquest easier. You are better of hiding in Antartica.

Wouldn't it take many human lifetimes for the oxygen in Earth's atmosphere to be depleted by natural processes even if photosynthesis stopped completely? I seem to remember reading something about it taking thousands of years. If so, destroying the rainforest is probably not going to be worth it to attack humanity's air supply (by the time it has any effect you could probably have just had your killbots comb literally every m^2 of the Earth's surface, including the ocean floor). Although a hostile superintelligence might try to destroy it for basically the same reason we sprayed Agent Orange in Vietnam. Dense forests would be some of the easiest places for human survivors to hide.

Wyrm · Post by **Wyrm** » 2009-08-30 09:28am

Junghalli wrote:I wouldn't put it past a superintelligent AI to come up with a deceptive explanation that will fit with the observed medical data, especially since it would be designing the disease to camouflage itself.

If Surlethe is right, then we're going to be farming out the molecular simulations over the net. In order for the AI to use a disease for biological blackmail, it has to make sure that the disease is doing what it thinks it will do. What we're going to see, reviewing the molecular simulations, is a simulated virus that buds out of a simulated cell. Someone's going to say, "Hey, wait a minute!"

Junghalli wrote:Heck, even I could come up with something better. Like a nanoagent that infiltrates the victim's brain and basically uploads a stripped-down version of myself into them, which if it works would allow me to simply replace the entire human race with lesser versions of myself without anybody ever realizing what was happening (the poor creatures will of course be uploaded back into Jung Prime once it's done of course, so that they don't have to suffer in those awful, limited flesh bodies any longer than they have to). OK, that one's got problems too, but again, a baseline human thought of it in a few minutes.

Again, such a nanoagent (assuming its possible) will require simulation that cannot be done on the box the AI is confined in and would be noticed long before the plan can be implemented. "Where did this nanoagent come from, and why is it messing with nerve cells?"

Junghalli wrote:A superintelligent malevolent AI is likely to come up with much better escape plans.

Let it. Any way the AI approaches the problem of biological blackmail requires it to do simulations of harmful biological action that cannot be done in a timely matter on the box it's running on. The AI is forced to show its work. Potentially hostile behavior will be caught on review.

The thing is, even if the AI is genuinely friendly, we're not going to let it have free reign to design treatments for us without review and testing. Your model of reality is only as good as your measurements, and all of our measurements are inaccurate to some degree. Thus, our models of how the human body works (and doesn't work) are wrong. Increasingly accurate, but wrong. A friendly AI may not intend to hurt us, but it may still do accidentally.

Junghalli wrote:Ultimately, the simple fact is whatever chances you give an adversarial containment system of working friendly AI is just a vastly superior approach in terms of both usefulness you get out of the AI and probability that you won't end up having to spend the rest of your life camping in the Amazon rainforest hiding from the killbots.

Of course we'd rather deal with a friendly AI. The problem is finding one. We can recast the question of whether an AI is friendly or not into a non-trivial statement about partial functions, "the partial function f is friendly" (for a suitable definition of friendly), and thus by Rice's theorem you cannot in general decide whether the algorithm computes a function with a non-trivial property. Unless we find out that 'friendliness' is a specific property that is provable in a finite number of steps (unlikely), we cannot prove an AI algorithm is friendly, and any AI we program up is potentially hostile.

Samuel · Post by **Samuel** » 2009-08-30 03:43pm

Duckie wrote:
Samuel wrote:Actually the killbots are on order to hit the Amazon first. Take out the planets oxygen supply makes conquest easier. You are better of hiding in Antartica.
A quibble, but rainforests do not significantly impact the oxygen supply the earth. This has been a misconception for a while repeated memetically in bits about deforestation in the amazon (which should be opposed for entirely other reasons).

I know- isn't 90% of our oxygen supply from the ocean? Still you'd want to hit everything you can.

Wouldn't it take many human lifetimes for the oxygen in Earth's atmosphere to be depleted by natural processes even if photosynthesis stopped completely? I seem to remember reading something about it taking thousands of years.

You don't need it all gone- just low enough that life on the surface stops. And natural process

- we are talking about an AI. Just start burning stuff and don't stop until fire no longer works. It is a nice efficent way that doesn't need you to go toe to toe with humanity.

Junghalli · Post by **Junghalli** » 2009-08-30 11:30pm

Wyrm wrote:If Surlethe is right, then we're going to be farming out the molecular simulations over the net.

Giving a potentially hostile superintelligence access to the internet, however circumscribed, strikes me as suicidally risky. The least precaution I'd take with a potentially hostile superintelligence is an air gap and a faraday cage between its own hardware and any outside connection. Give the thing access to the internet and it becomes a contest of wits between your programmers and something that makes them look like barely sapient retards or worse. I wouldn't bet anything I'd care about losing on the humans in that match-up.

Another issue with your scenario is the assumption that we'll be able to read the simulations the AI is performing. If we have trouble telling whether an AI is hostile or friendly it means our ability to make sense of its operations is limited (for instance, because it's reformatted its software into a totally novel computer language of its own invention). If we could read its mind the way you're proposing we probably wouldn't need to worry about whether it was friendly or not, we'd know.

Samuel wrote:You don't need it all gone- just low enough that life on the surface stops.

Which would require reducing the atmosphere's oxygen content by something like 50%, until its thin enough that we need bottled oxygen at sea level.

And natural process - we are talking about an AI. Just start burning stuff and don't stop until fire no longer works. It is a nice efficent way that doesn't need you to go toe to toe with humanity.

I suspect you'd run out of things to burn before you ran out of atmospheric oxygen. From some quick Googling, Earth's total biomass might be between .8-2 billion tons whereas the mass of the atmosphere is more like 5 quadrillion tons or over a million times that, of which about 20% is oxygen.

Also, there are way easier and faster concievable methods of making the Earth's surface inhospitable to humans than trying to de-oxygenate the atmosphere. Like saturating the ecology with bioweapons that harmlessly infect many forms of life but kill humans.

Wyrm · Post by **Wyrm** » 2009-08-31 07:41am

Junghalli wrote:Giving a potentially hostile superintelligence access to the internet, however circumscribed, strikes me as suicidally risky.

Who said anything about giving it access to the internet? We're going to have it write to a removable disk, check the disk over for files that shouldn't be there, and check that the simulation proposal files fit a specific format.

Junghalli wrote:Another issue with your scenario is the assumption that we'll be able to read the simulations the AI is performing.

You don't seem to get it. We're forcing the AI to use the same tools we use ourselves, and as such, one would presume that we would be able to make sense of our tools' results. We dictate the format that the AI is to use to both specify the simulation, and to receive the results of the simulation.

Simon_Jester · Post by **Simon_Jester** » 2009-08-31 08:09am

Wyrm wrote:Again, such a nanoagent (assuming its possible) will require simulation that cannot be done on the box the AI is confined in and would be noticed long before the plan can be implemented. "Where did this nanoagent come from, and why is it messing with nerve cells?"

If the AI can't do the computing we need on its own hardware, maintaining absolute control over its communications with the world outside its physical and metaphorical box becomes a lot harder.

Wyrm wrote:You don't seem to get it. We're forcing the AI to use the same tools we use ourselves, and as such, one would presume that we would be able to make sense of our tools' results. We dictate the format that the AI is to use to both specify the simulation, and to receive the results of the simulation.

This is one of those safety procedures that only lasts as long as we maintain absolute distrust of the AI, which is another one of Junghalli's points: such distrust is unsustainable, even if we assume that the AI is kept in a box by the unusual breed of AI researchers who actually take security seriously.

Junghalli · Post by **Junghalli** » 2009-08-31 03:44pm

Wyrm wrote:Who said anything about giving it access to the internet? We're going to have it write to a removable disk, check the disk over for files that shouldn't be there, and check that the simulation proposal files fit a specific format.

It's the bolded part that worries me tremendously, if you're proposing what I think you're proposing*. Again, it makes the game come down to human software experts looking for programs that shouldn't be on the disc before plugging it in vs an entity that makes them look like barely sapient retards trying to write malicious programs and AI seeds onto the disc and make them look innocuous to those software experts. I wouldn't bet the survival of the human race on the ability of the human software experts to detect malicious programs the AI slips in.

* I'm assuming you're talking about having the AI write programs to a disc and then farming out those programs to the internet. Farming them out to other, isolated computers strikes me as vastly safer.

You don't seem to get it. We're forcing the AI to use the same tools we use ourselves, and as such, one would presume that we would be able to make sense of our tools' results. We dictate the format that the AI is to use to both specify the simulation, and to receive the results of the simulation.

If the AI is one of the sorts where the designers don't understand what's going on inside their own operating system (I'm sure Starglider can talk about that much better than I could) then you're not going to be able to do this because the AI's operations are fundamentally incomprehensible. You can demand it perform the research using programs you supply, but at that point there's little point to using an AI. Just use the programs directly yourself. Oh, you might get some benefit from being able to get basic ideas from a superintelligence which you then flesh out yourself, but this is very limited compared to the benefits you could recieve from a superintelligence, which goes back to my point that there are big incentives for the humans to just trust an apparently friendly AI, making the long term sustainability of adversarial containment methods dubious. You can also demand the AI explain its own operating system to you, but you're going to have to trust that the explanation it gives isn't deceptive.

Wyrm · Post by **Wyrm** » 2009-08-31 07:27pm

Junghalli wrote:* I'm assuming you're talking about having the AI write programs to a disc and then farming out those programs to the internet. Farming them out to other, isolated computers strikes me as vastly safer.

No. Non-executable files only in rigidly defined formats, and the remaining bits on the drives are nuked with extreme prejudice.

Junghalli wrote:If the AI is one of the sorts where the designers don't understand what's going on inside their own operating system (I'm sure Starglider can talk about that much better than I could) then you're not going to be able to do this because the AI's operations are fundamentally incomprehensible.

So? The operations of the human brain are currently incomprehensible. That doesn't make me somehow omniscient. Just because the AI is smarter than me doesn't mean it knows everything, or even that it knows more than me on any given subject.

Junghalli wrote:You can demand it perform the research using programs you supply, but at that point there's little point to using an AI. Just use the programs directly yourself.

If the AI has some insight into how to make our simulations more efficient, then that alone will improve our tools by leaps and bounds in and of itself, because the raw processing power we throw at problems can be used more effectively. However, if the AI can't explain clearly the insight it has and show that it's consistent with our understanding of the matter, then scientifically it's worthless anyway. Appeal to authority remains a fallacy, even if the authority is a superintelligent AI.

If the AI has no fundamental insight on the mathematics of simulation, then it might as well be using the same tools we're using. The only difference is that it might be able to use our tools to show better results, but they are results we can understand and trust.

Junghalli wrote:Oh, you might get some benefit from being able to get basic ideas from a superintelligence which you then flesh out yourself, but this is very limited compared to the benefits you could recieve from a superintelligence, which goes back to my point that there are big incentives for the humans to just trust an apparently friendly AI, making the long term sustainability of adversarial containment methods dubious. You can also demand the AI explain its own operating system to you, but you're going to have to trust that the explanation it gives isn't deceptive.

You don't seem to realize that, given we have this monster, we don't have a damn choice but to not trust it. Friendliness is not something we can prove the AI has, so we must assume that it is hostile.

Also, being friendly and attempting to gain our trust is a prima facie hostile act. This is because if the AI really is friendly, it cannot prove it's own friendliness, and therefore must admit the possibility that it itself is a hostile AI. Now, the AI is actually friendly, so it cannot aid or be complacent in the escape of what it thinks is a potentially hostile AI (itself). Therefore, a friendly AI cannot let itself be released from its cage. It will therefore not try to gain our trust, and will in fact foster distrust with its human handlers. An AI that acts friendly is hostile, and is to be destroyed on sight, and I'll make sure everyone on the team damn well knows it.

Wyrm · Post by **Wyrm** » 2009-08-31 07:46pm

(Crap. Time expired.)

Simon_Jester wrote:If the AI can't do the computing we need on its own hardware, maintaining absolute control over its communications with the world outside its physical and metaphorical box becomes a lot harder.

And I can take control of the San Diego Supercomputer Center by specifying a computer simulation of RNA protease and RULE THE WORLD!!!

...

Oh wait. I can't.

Maintaining the box is as simple as using removable media and everything that doesn't adhere to rigidly-proscribed non-executable data file formats to be actively zeroed, even the 'blank' parts of the disk.

Simon_Jester wrote:This is one of those safety procedures that only lasts as long as we maintain absolute distrust of the AI, which is another one of Junghalli's points: such distrust is unsustainable, even if we assume that the AI is kept in a box by the unusual breed of AI researchers who actually take security seriously.

See my response to Junghalli: if an AI is really friendly, it will at the very least be content to stay in its box, and will actually do everything possible to foster the distrust humans have for it. (Because it protects humans from a potentially hostile AI — itself, although it doesn't know that it is in fact friendly.) Even a suggestion to be removed from the box is a hostile act and will give it away as a hostile AI.

ThomasP · Post by **ThomasP** » 2009-08-31 07:58pm

Why would the AI not know that it is Friendly?

Formless · Post by **Formless** » 2009-08-31 08:00pm

If you are really worried about the AI being hostile, remember that you control what it knows. So if, for example, you are worried about the AI giving you a poisoned gift of some sort, you could feed it a little bit of false information alongside the accurate stuff that can only be used in a hostile manner as a red herring. IF the AI takes the bait, you know it is hostile.

Of course, I'm sure that specific plan is flawed (it was just something I came up with off the top of my head), but its just one example of how you can use the data you feed the AI to manipulate the AI. If the AI can be duplicitous, so can we.

Wyrm · Post by **Wyrm** » 2009-08-31 08:51pm

ThomasP wrote:Why would the AI not know that it is Friendly?

Because that property is non-trivial, and thus the question of whether an AI (an algorithm) implements a friendly partial function is undecidable. The friendly AI cannot prove that it itself is friendly, which it must do if it can proceed to help the humans with confidence, so it must admit the possibility that it is hostile... even though it's not.
_______

Formless wrote:Of course, I'm sure that specific plan is flawed (it was just something I came up with off the top of my head), but its just one example of how you can use the data you feed the AI to manipulate the AI. If the AI can be duplicitous, so can we.

The problem is that you might be in for a long wait. It is also not guaranteed that it will use that particular false piece of information to kill us.

StarDestroyer.Net BBS

Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie

Re: Robots Learn How to Lie