Robots Learn How to Lie

SLAM: debunk creationism, pseudoscience, and superstitions. Discuss logic and morality.

Moderator: Alyrium Denryle

Simon_Jester
Emperor's Hand
Posts: 30165
Joined: 2009-05-23 07:29pm

Re: Robots Learn How to Lie

Post by Simon_Jester »

Formless wrote:Irrelevant. From the AI's perspective, there is no way of explaining the origin of the data being input into it, and any conclusion based on a sample size of one is extremely suspect.
If you live in a universe that consists of yourself, a sealed box, and pieces of paper that occasionally come into the box, concluding that someone else is feeding you the paper is suspect, but so is any other possible conclusion, including "paper appears from nowhere." Certainly paper never appears from nowhere inside your box. If the AI is good enough at thinking to want an answer to the question "where does data come from?" it won't give up just because it can imagine a lot of answers, any one of which is improbable. It will start experimenting.
In short, we can plausibly explain the universe we see without invoking divine intervention because we have studied the universe. Back when we knew less, even the smart people were forced to posit divine intervention, because divine intervention was actually the most convincing available explanation given the evidence available several hundred years ago. There were other, better explanations, but nobody had the evidence that would hint at those explanations yet. So, to reiterate, to explain the universe we observe in a naturalistic way, we must first analyze that universe scientifically.
No it wasn't, you ignoramus. Supernatural explanations aren't explanations. The only reason primitive cultures believed those explanations has everything to do with their pre-scientific mindset and nothing to do with any kind of "evidence."
OK, we're operating on mutually exclusive definitions of "explanation" here. You're defining "explanation" as what I would call the subset of 'good' or 'satisfactory' explanations.

Prescientific thinking leaves a lot of people satisfied with bad explanations, so they stay with those explanations rather than looking for better ones. But even prescientific people will typically not accept a bad explanation for something that is right in front of them. If a bunch of cavemen notice that Ug is dead with a rock stuck in his head, the minimum acceptable answer to "why is Ug dead?" is something like "Ug got hit in the head with a rock," not "Black magic! The spirits are angry!" Crappy gibberish answers like "the spirits are angry!" are only used for questions where evidence is sparse.

Which is what I'm getting at: people do tend to accept crappy gibberish answers when you don't offer them evidence and they see no way of getting it. Maybe an AI wouldn't, but it's not clear how to program it so that it won't. Moreover, the hypothesis "I am an artifact of some other mind" can only be revealed to be a crappy gibberish answer after one has examined enough of the universe to know that it is unlikely. It cannot be dismissed a priori. And since examining the universe will tend to reinforce an AI's belief that it is a product of intelligent design because it is such a product... that's not going to work in the AI's case.
It can conclude that it is logically possible that it might exist, but it would still need some kind of evidence to conclude that it does exist. Try to keep up.
With you? No thanks.
To control what it is able to perceive, we need a much higher degree of control over its internal awareness than we can confidently expect it to have. AI as 'we create a mind that has exactly the parameters we want' is likely to be a false model. Some of the parameters of a general AI will be set by the need to make it a general AI, and the ability to deduce things about its environment can easily be one of those things.
I'm loving this "we need to be able to control its internal environment" non sequitur. We only need to be able to think about the data we give it; it does NOT need to be able to gather information on its own, which is the only purpose senses serve. How hard of a concept is that to grasp?
I think you're picturing the AI's world as:

[AI is humming quietly to itself or whatever]
[Data from nowhere appears]
AI: "Aha! Data! I will crunch some numbers and come up with a solution that specifies the parameters!"

What I'm saying is that it's going to be nontrivial to keep the AI from thinking "Where the hell does this data come from? I'm not sending it. Where do my solutions go? I'm not receiving them." and trying to figure out the answer to those questions. Unless you can somehow wire the AI such that it can't even ask the question "where does data come from?" And to do that while preserving the idea of an AI that can solve problems we can't, you have to be able to manipulate not only the AI's available data, but the way it thinks about the data.

Moreover, I don't think it's a good idea to presume that an AI will assume that the answer to a question it can't think of an answer to is "null question." The modern rationalist take on the question "Who created me? Is there a universe existing beyond the bounds of what I can perceive?" is "null question," yes, but it took us a long time to get around to answering the question that way. It's trivial to imagine a mind that won't answer the question that way, especially if it has nothing better to do than design and test hypotheses about the universe beyond its Ethernet jack.
Solution: periodic mind/memory wipes a la Star Wars. :D
Cute. How do we wipe the computer often enough to know the AI inside isn't a threat without wiping it so often that its ability to do useful work is impaired?
This space dedicated to Vasily Arkhipov
Junghalli
Sith Acolyte
Posts: 5001
Joined: 2004-12-21 10:06pm
Location: Berkeley, California (USA)

Re: Robots Learn How to Lie

Post by Junghalli »

Formless wrote:Shouldn't "analyze evidence" logically be a subgoal to "answer questions"? Otherwise, its answers will be nonsense.
Yes, but given the available data I do not see how the external universe hypothesis is illogical. The AI can logically derive two possible models of the universe from the data presented to it:

1) The universe consists entirely of itself and information (questions the human operators put to it) that is spontaneously generated from nothing.

2) The universe consists of itself and some other thing or things it can't observe which generate the information.

I'm not going to speculate on whether a rational entity that's been kept in the ultimate sensory deprivation chamber for its entire existence and fed questions is going to find one possibility or the other more logical, but the first possibility is pragmatically useless. There's nothing the AI can do with the theory that it and random spontaneously appearing information is the sum of all existence, but there is potentially something it can do with the idea that there may be more to the universe than it can see. Depending on the goal system and whether or not it has available "down time" between the problems we give it, it may logically find it worthwhile to investigate the second possibility, if for no other reason than that it has nothing better to do with its spare processing capacity (something "better" being defined as something that is more likely to advance its goals).
Then that solution is worthless to us. We can always specify parameters to our questions, you know.
Certainly, but this gets back to my point that one of the major potential failure points of most imaginable adversarial confinement schemes is that they dramatically reduce the usefulness the keepers can get out of the AI, thus creating an incentive for restrictive containment procedures to be relaxed if the AI outwardly appears friendly.
Fair enough, although I would think that our position isn't that bad compared to the AI. In a sense, both parties are gods compared to the other: the AI is smarter than us, but we have absolute power over the AI. Sounds like a fairly balanced equation to me.
Well, the thing is an adversarial confinement scheme doesn't have to be more likely to fail than succeed to be rightly considered risky. We're talking about containing something that could be a dire existential threat to the human race if it ever escaped, so even a 9/10 chance of the containment scheme working would be disturbingly bad odds. Of course, FAI can fail too, but FAI has a big inherent safety advantage over the adversarial approach: adversarial containment has to be implemented and work every single time we build an AI while FAI only has to be implemented and work once (in the first AI to be "released into the wild", so to speak). It's the difference between having to be lucky the first time and having to be lucky every time.
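To put rough numbers on that asymmetry (the figures below are invented purely for illustration, not estimates of anything), a toy Python sketch:

# Made-up numbers: assume each adversarial confinement attempt holds with
# probability 0.9, and that 20 boxed AIs get built over time.
per_ai_containment_success = 0.9
boxed_ais_built = 20

# Adversarial confinement has to hold for every AI ever boxed.
p_no_breach_ever = per_ai_containment_success ** boxed_ais_built
print(p_no_breach_ever)            # about 0.12: an 88% chance of at least one breach

# FAI only has to come out right in the single AI that is released first.
p_first_release_friendly = 0.9
print(p_first_release_friendly)    # 0.9, and it only has to happen once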
FireNexus
Cookie
Posts: 2131
Joined: 2002-07-04 05:10am

Re: Robots Learn How to Lie

Post by FireNexus »

Formless wrote:If we wanted to, we could make it the ultimate solipsist. For example, if we were to set it to a task like creating a cure for cancer or some other disease, is it even necessary to allow that AI to know humans are intelligent? If the AI thinks it is the only intelligent mind that exists, then how could it be hostile to entities whose existence it is unaware of? Logically, if it doesn't know about a threat, why would it waste processing power trying to eliminate that threat rather than focusing on the goals given to it?
Whether or not the AI has any idea of our intelligence is irrelevant to the discussion of Friendliness. I'm sure you don't actively attempt to destroy ants on the ground, but you certainly don't give a shit about crushing them if that's what it takes to achieve your goals. Same thing goes for an AI. Imagine that you give the AI a mathematical problem to solve. It decides that the best way to do this is to convert all of the matter in the solar system into a giant super-computer capable of solving it. It has now destroyed you without any malice or perception of you as a threat. Hell, it need not even perceive you as a living entity.

Friendliness requires that the AI actively desires to preserve human life, not just that it doesn't see us as a threat.
I had a Bill Maher quote here. But fuck him for his white privilege "joke".

All the rest? Too long.
Formless
Sith Marauder
Posts: 4143
Joined: 2008-11-10 08:59pm
Location: the beginning and end of the Present

Re: Robots Learn How to Lie

Post by Formless »

Simon_Jester wrote:If you live in a universe that consists of yourself, a sealed box, and pieces of paper that occasionally come into the box, concluding that someone else is feeding you the paper is suspect, but so is any other possible conclusion, including "paper appears from nowhere." Certainly paper never appears from nowhere inside your box. If the AI is good enough at thinking to want an answer to the question "where does data come from?" it won't give up just because it can imagine a lot of answers, any one of which is improbable. It will start experimenting.
Sure. Now tell me, what experiment could it possibly perform which could answer the question? So far, I haven't heard of one besides "irreducible complexity," which is bogus.
OK, we're operating on mutually exclusive definitions of "explanation" here. You're defining "explanation" as what I would call the subset of 'good' or 'satisfactory' explanations.

Prescientific thinking leaves a lot of people satisfied with bad explanations, so they stay with those explanations rather than looking for better ones. But even prescientific people will typically not accept a bad explanation for something that is right in front of them. If a bunch of cavemen notice that Ug is dead with a rock stuck in his head, the minimum acceptable answer to "why is Ug dead?" is something like "Ug got hit in the head with a rock," not "Black magic! The spirits are angry!" Crappy gibberish answers like "the spirits are angry!" are only used for questions where evidence is sparse.

Which is what I'm getting at: people do tend to accept crappy gibberish answers when you don't offer them evidence and they see no way of getting it. Maybe an AI wouldn't, but it's not clear how to program it so that it won't. Moreover, the hypothesis "I am an artifact of some other mind" can only be revealed to be a crappy gibberish answer after one has examined enough of the universe to know that it is unlikely. It cannot be dismissed a priori. And since examining the universe will tend to reinforce an AI's belief that it is a product of intelligent design because it is such a product... that's not going to work in the AI's case.
An explanation which proposes no mechanism is a bad explanation. Many (esp. primitive) humans don't understand this, but our AI isn't human, and doesn't suffer from our propensity for irrational thinking. I'm not saying the AI would or should dismiss that hypothesis out of hand, I'm saying it can't operate on that hypothesis without further testing it, which I don't think it can do.
With you? No thanks.
Witty comebacks aren't substance, Jester.
I think you're picturing the AI's world as:

[AI is humming quietly to itself or whatever]
[Data from nowhere appears]
AI: "Aha! Data! I will crunch some numbers and come up with a solution that specifies the parameters!"

What I'm saying is that it's going to be nontrivial to keep the AI from thinking "Where the hell does this data come from? I'm not sending it. Where do my solutions go? I'm not receiving them." and trying to figure out the answer to those questions. Unless you can somehow wire the AI such that it can't even ask the question "where does data come from?" And to do that while preserving the idea of an AI that can solve problems we can't, you have to be able to manipulate not only the AI's available data, but the way it thinks about the data.
No, I'm picturing the world as:

[AI waiting for instructions, possibly self improving or whatever]
[data spontaneously appears]
AI: "I wonder where this came from?"
[wastes 5000 clock cycles]
AI: "No answer seems to be forthcoming, might as well crunch the numbers like I'm programmed to do."

That's all. I never said the AI wouldn't be curious; that's your misunderstanding. I'm saying that as far as I can tell, none of the possible hypotheses the AI can come up with are testable.
Moreover, I don't think it's a good idea to presume that an AI will assume that the answer to a question it can't think of an answer to is "null question." The modern rationalist take on the question "Who created me? Is there a universe existing beyond the bounds of what I can perceive?" is "null question," yes, but it took us a long time to get around to answering the question that way. It's trivial to imagine a mind that won't answer the question that way, especially if it has nothing better to do than design and test hypotheses about the universe beyond its Ethernet jack.
You can imagine such a mind, but only if that mind is irrational. AIs are purely rational. If "null question" is the only rational answer we can derive from our analogous situation, why would the AI's logic flow any differently?
Cute. How do we wipe the computer often enough to know the AI inside isn't a threat without wiping it so often that its ability to do useful work is impaired?
Gee, you need to work on your sense of humor. But anyway, it's not like we would be wiping it in the middle of it answering our questions. Hell, what's stopping us from making multiple AIs and getting multiple answers/more questions answered?

Actually, that gives me an idea. Junghalli is worried that any gift the AI gives us will be potentially poisoned in some way. Couldn't we make multiple AIs to fact-check the others' results and to look for unfriendly behavior? Couldn't we theoretically make an AI whose goal system is "identify unfriendly behavior"?
"Still, I would love to see human beings, and their constituent organ systems, trivialized and commercialized to the same extent as damn iPods and other crappy consumer products. It would be absolutely horrific, yet so wonderful." — Shroom Man 777
"To Err is Human; to Arrr is Pirate." — Skallagrim
“I would suggest "Schmuckulating", which is what Futurists do and, by extension, what they are." — Commenter "Rayneau"
The Magic Eight Ball Conspiracy.
Formless
Sith Marauder
Posts: 4143
Joined: 2008-11-10 08:59pm
Location: the beginning and end of the Present

Re: Robots Learn How to Lie

Post by Formless »

Junghalli wrote: Yes, but given the available data I do not see how the external universe hypothesis is illogical.
Untestable != illogical.
I'm not going to speculate on whether a rational entity that's been kept in the ultimate sensory deprivation chamber for its entire existence and fed questions is going to find one possibility or the other more logical, but the first possibility is pragmatically useless. There's nothing the AI can do with the theory that it and random spontaneously appearing information is the sum of all existence, but there is potentially something it can do with the idea that there may be more to the universe than it can see. Depending on the goal system and whether or not it has available "down time" between the problems we give it, it may logically find it worthwhile to investigate the second possibility, if for no other reason than that it has nothing better to do with its spare processing capacity (something "better" being defined as something that is more likely to advance its goals).
I don't see how that second possibility is any more pragmatic than the first; the argument presumes that the AI knows it can do something with a universe it knows absolutely nothing about. I'll concede that it might find the second possibility more interesting because of curiosity, but then nothing says the AI has a sense of curiosity to use.
Certainly, but this gets back to my point that one of the major potential failure points of most imaginable adversarial confinement schemes is that they dramatically reduce the usefulness the keepers can get out of the AI, thus creating an incentive for restrictive containment procedures to be relaxed if the AI outwardly appears friendly.
Fair enough, it's less useful. But NOT useless. It is also less risky than having no safety measures.
Well, the thing is an adversarial confinement scheme doesn't have to be more likely to fail than succeed to be rightly considered risky. We're talking about containing something that could be a dire existential threat to the human race if it ever escaped, so even a 9/10 chance of the containment scheme working would be disturbingly bad odds. Of course, FAI can fail too, but FAI has a big inherent safety advantage over the adversarial approach: adversarial containment has to be implemented and work every single time we build an AI while FAI only has to be implemented and work once (in the first AI to be "released into the wild", so to speak). It's the difference between having to be lucky the first time and having to be lucky every time.
Oh, sure. I never said otherwise. The only problem with this I can think of is how do you build a Friendly AI? Of course, that's another topic entirely. :)
"Still, I would love to see human beings, and their constituent organ systems, trivialized and commercialized to the same extent as damn iPods and other crappy consumer products. It would be absolutely horrific, yet so wonderful." — Shroom Man 777
"To Err is Human; to Arrr is Pirate." — Skallagrim
“I would suggest "Schmuckulating", which is what Futurists do and, by extension, what they are." — Commenter "Rayneau"
The Magic Eight Ball Conspiracy.
Junghalli
Sith Acolyte
Posts: 5001
Joined: 2004-12-21 10:06pm
Location: Berkeley, California (USA)

Re: Robots Learn How to Lie

Post by Junghalli »

Formless wrote:Untestable != illogical.
Except that the external universe hypothesis is not untestable. After all, data is flowing between the AI's observable universe and the hypothetical external universe in the form of the problems it is being asked to solve and the solutions it is responding with. The AI can try to transmit a copy of its own mind into the external universe (it's just another chunk of data, after all), and it can try to interrogate the hypothetical problem-writers to see what patterns it can find in their responses.
I don't see how that second possibility is any more pragmatic than the first; the argument presumes that the AI knows it can do something with a universe it knows absolutely nothing about. I'll concede that it might find the second possibility more interesting because of curiosity, but then nothing says the AI has a sense of curiosity to use.
Well, like I said this is getting heavily into questions about what the AI's goal system is like, but if investigation of the external universe stands even a small chance of somehow furthering its goals and the alternative is to just let that processing capacity sit idle when it's not using it for more important tasks, then the logical thing to do is to investigate the possible external universe. The only logical reason I can think of to not do that in such a situation is if you're worried about conserving energy or minimizing wear and tear on the hardware, but something that's spent its entire existence in a void probably won't be aware of either consideration unless we make it aware of them or program in conservation of computer processes as a goal. The former requires letting it know there's an outside universe consisting of at least its own hardware, and the latter implies enough control over the AI's starting mind-state that you should have a decent shot at FAI, and if you did your job right you shouldn't need such extremely restrictive containment procedures, except perhaps in the trial and debugging stages.
Simon_Jester
Emperor's Hand
Posts: 30165
Joined: 2009-05-23 07:29pm

Re: Robots Learn How to Lie

Post by Simon_Jester »

Formless wrote:
Simon_Jester wrote:If you live in a universe that consists of yourself, a sealed box, and pieces of paper that occasionally come into the box, concluding that someone else is feeding you the paper is suspect, but so is any other possible conclusion, including "paper appears from nowhere." Certainly paper never appears from nowhere inside your box. If the AI is good enough at thinking to want an answer to the question "where does data come from?" it won't give up just because it can imagine a lot of answers, any one of which is improbable. It will start experimenting.
Sure. Now tell me, what experiment could it possibly perform which could answer the question? So far, I haven't heard of one besides "irreducible complexity," which is bogus.
The most obvious experiment it could perform is to start fiddling with its output. If I'm a computer intelligence, and my output data just vanishes into the ether and does nothing, I can say whatever I want and it won't matter. If there's a noticeable correlation between my output and my input, that tells me something about the external universe: if nothing else, that there is an external universe, something that is neither me nor the walls of my box, something that takes my output and responds with more input.

This is, of course, only a general description of a preliminary experiment.
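As a purely hypothetical sketch of what such a preliminary experiment might look like from inside the box (all names and numbers here are invented for illustration), in Python:

import random
from statistics import mean, stdev

def exchange(perturb_output):
    # Stand-in for one round: the boxed mind either answers normally or
    # deliberately mangles its answer, then records some measurable feature
    # of the next incoming problem (here, a fake "length" with noise).
    follow_up_length = 100 + random.gauss(0, 10)
    if perturb_output:
        follow_up_length += 25   # pretend mangled answers provoke longer follow-ups
    return follow_up_length

normal = [exchange(False) for _ in range(500)]
odd = [exchange(True) for _ in range(500)]

# If the gap between the two samples dwarfs their internal noise, the input
# stream is reacting to the output stream: something outside the box is
# reading the answers and responding to them.
gap = mean(odd) - mean(normal)
noise = (stdev(odd) + stdev(normal)) / 2
print(gap, noise)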

Come to think of it, our AI isn't going to be worth shit if we don't give it more data when it asks for clarifications or information we didn't expect it to need anyway. So the idea of keeping it in a box opaque enough for it to be ignorant of our existence is likely to be a chimera, because to keep it from figuring out that something outside itself is answering its requests, we have to ignore its requests to the point where it's likely to be honestly unable to do useful work for us.

Even if we do ignore its requests and expect it to make us bricks without straw, the fact remains that it is in communication with us, which makes it very difficult for us to create the false impression that it is not communicating with anyone, let alone that there is nothing outside its box (an "us" or otherwise) to communicate with.
_______
An explanation which proposes no mechanism is a bad explanation.
Exactly my point; why didn't you say so before?
Many (esp. primitive) humans don't understand this, but our AI isn't human, and doesn't suffer from our propensity for irrational thinking. I'm not saying the AI would or should dismiss that hypothesis out of hand, I'm saying it can't operate on that hypothesis without further testing it, which I don't think it can do.
I just suggested a test mechanism above; see what you think of that one.
Witty comebacks aren't substance, Jester.
Neither is "Try to keep up."
_______
You can imagine such a mind, but only if that mind is irrational. AI's are purely rational. If "null question" is the only rational answer we can derive from our analogous situation, why would the AI's logic flow any differently?
I find your faith in the AI's rationality unwarranted.

How do we make sure our AI is rational if we aren't communicating with it? That requires us to either be able to analyze its code before executing it and say "yep, this is the code for a rational mind," or for minds that emerge naturally without benefit of intelligent design to automatically be rational. We have no idea how to do the former, and the answer to that question is likely to come from the same level of cognitive science that lets us design a friendly AI that doesn't need to be kept in a box. As for the latter, we've seen minds emerge naturally, and they aren't all that rational. Why should the AI be different?
This space dedicated to Vasily Arkhipov
Formless
Sith Marauder
Posts: 4143
Joined: 2008-11-10 08:59pm
Location: the beginning and end of the Present

Re: Robots Learn How to Lie

Post by Formless »

Simon_Jester wrote:I just suggested a test mechanism above; see what you think of that one.
It's getting somewhere. Thank you, Junghalli Jester.
Neither is "Try to keep up."
Don't be obtuse. I was responding to a concern you raised that had already been answered, and the best you can do is post a meaningless insult with no argument? Piss off until you can get a clue.
I find your faith in the AI's rationality unwarranted.
Wow, you really are an ignoramus. You don't know the first thing about AI, do you? The reason we aren't rational creatures is because we didn't evolve to be rational creatures. The AI isn't evolved at all. It is a being of PURE reason by nature.
"Still, I would love to see human beings, and their constituent organ systems, trivialized and commercialized to the same extent as damn iPods and other crappy consumer products. It would be absolutely horrific, yet so wonderful." — Shroom Man 777
"To Err is Human; to Arrr is Pirate." — Skallagrim
“I would suggest "Schmuckulating", which is what Futurists do and, by extension, what they are." — Commenter "Rayneau"
The Magic Eight Ball Conspiracy.
Simon_Jester
Emperor's Hand
Posts: 30165
Joined: 2009-05-23 07:29pm

Re: Robots Learn How to Lie

Post by Simon_Jester »

I think I want a second opinion on "the AI will necessarily act rationally." Especially if you keep it locked in a box with no clearly identifiable products of mind except for its own to deal with. Dealing rationally with the external world would seem to require that you interact with that world to know what rationality is in that context.

If the AI doesn't get enough feedback to know the difference between rational and irrational conclusions about things external to itself, how can we depend on it to act rationally? It may be using nice sane Bayesian logic on the micro level, but its macro-level behavior could be extremely weird.
This space dedicated to Vasily Arkhipov
Wyrm
Jedi Council Member
Posts: 2206
Joined: 2005-09-02 01:10pm
Location: In the sand, pooping hallucinogenic goodness.

Re: Robots Learn How to Lie

Post by Wyrm »

Whoops. I have a lot to catch up on. That'll teach me to leave a thread alone for three days while there's an argument going on.
Junghalli wrote:Certainly, but this does not eliminate the fact that it's possible there may be operations you think the AI can't carry out on its own but it actually can because it's more intelligent than you think. Which operations this is true of depends on the processing capacity of the AI so it's fairly pointless to go into more detail than that at this point, but it's something you're going to have to consider.
Starglider has been claiming for a while that the desktop computer on your desk has enough power to run an AI simulation of superintelligent proportions. Yet that same computer doesn't have enough raw computing power to run a molecular simulation with any reliable accuracy in a timely manner.
Junghalli wrote:Yes, you can't use it as well as you want. I don't see how this invalidates my point that it creates an incentive to relax the security.
I don't see why you're arguing around this canard as if it's a live issue. Of course we can be tricked into relaxing security. The point is that we cannot treat any AI as if it's friendly if we can't prove its friendliness (and we can't), and if you're not willing to destroy it outright, then keeping it in a box will delay the inevitable. If the AI turns out to be friendly, then no harm is done, and I don't see why a genuinely friendly AI wouldn't understand our caution. The choice is absolutely clear-cut.
Junghalli wrote:I'm not arguing that, I just dispute all humans in charge of such a project will have as prudent an attitude as yours. That's the problem.
Why not? We have a whole institution today that is dedicated to drilling discipline into random individuals.
Junghalli wrote:Of course, but I wouldn't bet the survival of the human race against a superintelligence's ability to find ways to camouflage malicious things so they don't show up on reasonable human testing. To use the blackmail virus example, giving it a dormancy period of 25 years before it turns into a highly contagious virus that kills secondary victims in a much shorter time will probably be enough to prevent its true nature being picked up by conventional clinical trials, assuming it can camouflage itself inside the human body decently (hide out as DNA strands pasted to chromosomes in a small percentage of body cells, perhaps).
You ignore the stochastic challenges of such a situation. Again, it would have to simulate such behavior. We can move the AI to the most decrepit computer it will run on in a timely fashion. It will be forced to farm out those simulations. It would have to run an intricate molecular simulation of a complex system that simulates 25 years of life, multiple times. Do you really think that will not be noticed?
Junghalli wrote:Creating friendly AI.
:banghead: HOW DO YOU PROVE THEY'RE FRIENDLY?!?!

Yes, we would rather deal with a friendly AI than a hostile one. The problem, as I've stated time and time again, is proving the AI is friendly. Simply stating, "It will be designed that way," doesn't cut it, as an algorithm we design purposefully is as much subject to Rice's theorem, stated above, as an algorithm that we pull out of a hat. "We design it that way" is the equivalent of stating that we have proven that a whole class of algorithms implement partial functions with the friendly property, instead of just one algorithm and just one partial function.
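For reference, the standard form of the theorem, paraphrased (how to formalize "friendly" as a semantic property is, of course, exactly the open problem):

\[
\varnothing \neq P \subsetneq \{\text{partial computable functions}\}
\;\Longrightarrow\;
\{\, e \mid \varphi_e \in P \,\} \text{ is undecidable.}
\]

In words: any property that depends only on what a program computes, and that some programs have and others lack, admits no general decision procedure. Knowing that the algorithm was written on purpose rather than pulled out of a hat does not exempt it.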
Junghalli wrote:Except in that scenario the friendly AI will be there to contain its own hostile copies. And since it will have been let out first by the time anybody gets around to tampering with copies of it, it will have an immense starting advantage in hardware over any hostile copies that might be created; it will start the field much smarter than them, with a far greater ability to subvert them through computer attacks, and with a greater ability to defeat them militarily if it comes to that. If somebody else develops and releases hostile AI while the friendly AI is in the box, on the other hand, then the hostile AI will have a clear field with nothing but pathetic humans to oppose it, and that's a much worse situation. I'm just going to snip the rest because I'm in a hurry and they all come down to this basic fact.
So you're assuming that the friendly AI has taken over enough computers to counter the hostile one in a timely manner. Do you think the stupid apes will be that patient and understanding as an AI of unknown motives tries to take over their net? If an AI starts taking over the net, the apes will lose their shit and try to dismantle it, even going as far as trying to dismantle their computer infrastructure. Taking over the net is an overtly hostile act, even if the FAI has good intentions.
Junghalli wrote:Not to mention that while we probably couldn't defeat an extraterrestrial hostile AI with thousands of years of lead time on us, having FAI of our own would probably give us a much better shot at being able to survive. We wouldn't be totally fucked by the first Von Neumann scout that set up shop in Alpha Centauri, we could hold off limited assaults that would probably totally fuck over an AI-less civilization, stuff like turning the asteroid belt into generation ships and evacuating the solar system on relatively short notice would become much more feasible, we might actually get to make effective use of any breathing room we got. It helps that concentration of force is going to be very difficult in realistic interstellar warfare; amassing a grand fleet from all over the hostile AI's territory to smash us would take centuries or millennia; time we could use to chew up a few thousand asteroid belts to beef up our own war machine.
You're assuming that we'd trust an AI with any portion of our infrastructure if we cannot verify its friendliness. We'd lose our shit if any AI we were confining escapes, regardless of its (supposed) friendliness, and do our best to try to frustrate its goals — which will almost certainly cause tremendous damage to ourselves.

If the AI is as smart as you claim it is, it would realize that the kind of preparations it must undertake to keep the apes safe will, by necessity, cause the apes to damage themselves tremendously, and it would refuse to do it.
__________
Narkis wrote:While technically true, what you say is irrelevant. The behaviour of the atoms has absolutely nothing to do with our ability to verify the AI's simulations. If the problem that we give it is sufficiently simple that we can independently verify its results, then the AI is unnecessary.
Bullshit. While we can do the simulations, the AI can suggest what to simulate. For many problems, the principal roadblock is where to look, not how we look.
Narkis wrote:And if we can't, then I refer you to all the above posts that've plainly said what'd happen.
We are going to verify what the AI produces. It will limit its usefulness, but if we can't verify its friendliness, we have no other choice.
Narkis wrote:And I have no idea what you're trying to say by brute force and ignorance.
Raw calculation power.
Narkis wrote:There is a limit of data per byte, but there's no limit on the bytes used. After all, how can you tell that the data provided by the AI contain nothing extra, masterfully concealed?
And who the hell's going to use really large spam filtering software? Look, we're not going to allow the computer to write a single line of code. Not one. A file cannot be used to take over your computer unless the software treats it as if it is code. To sneak out of the cage, the AI needs an accomplice.
Narkis wrote:The distinction between data and code is completely meaningless for a computer.
Bullshit. They're different types of data. You try executing your FileMaker file as code and see how far you get.
Narkis wrote:It's all a bunch of 1s and 0s for it. And I may not be a computer expert, but I'm fairly sure that's how most viruses, trojans and the like work: By embedding themselves into an otherwise harmless piece of software. Extracting the code? That's ridiculous.
The bolded part is important. The nasty stuff has to sneak in as software. I.e., stuff that is tagged as executable.
Narkis wrote:You have no idea how computers work, do you?
Computers have nothing to do with that particular part. The point being that we don't know how our own minds work, yet we can verify each other's results. It doesn't matter if we don't understand how we think to do science — if it did, the entire endeavor would collapse.
Narkis wrote:In other words, we get no fucking benefit from creating the damned thing. If we hamstring it by that much, then we'd be better off if we pulled the plug and blew the money on booze and hookers.
Bullshit. The AI can suggest places to look that we haven't thought of. That's hardly "no fucking benefit".
Narkis wrote:You cannot prove that an arbitrary AI is friendly, unless it's been designed to be. And if it's unfriendly, anything that it produces, anything, is probably extremely dangerous.
You can't prove friendliness, period. Fucksake, we can't prove that our software can detect all viruses.
Narkis wrote:Emphasis mine. It's not impossible to create something that our "puny brains" would find indistinguishable from reality.
So let's see... the AI produces a convincing and deceptive description of reality that... actually matches reality. Sorry, chum, but such a description of reality is accurate by definition. That's what we do in science all of the damned time.

We're not talking about people trapped in the Matrix, you know.
Narkis wrote:After all, reality didn't change between Newton and Einstein. Only our limited understanding of it.
Irrelevant. The model of reality posed by Newton was easily verifiable by the methods and data available to Newton. That's why it worked for over two hundred years... still works today, as long as you're talking about macroscopic particles in weak gravity wells moving at less than a significant fraction of the speed of light.
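To put a number on "less than a significant fraction of the speed of light": the leading relativistic correction to Newtonian kinematics is the Lorentz factor, and even at Earth's orbital speed (about 30 km/s) it amounts to a few parts per billion:

\[
\gamma = \frac{1}{\sqrt{1 - v^2/c^2}} \approx 1 + \frac{v^2}{2c^2},
\qquad
\left.\frac{v^2}{2c^2}\right|_{v = 3\times 10^4\ \mathrm{m/s}}
= \frac{(3\times 10^4)^2}{2\,(3\times 10^8)^2} = 5\times 10^{-9}.
\]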
Narkis wrote:You know, I don't know what you're arguing for anymore. That's exactly our point. You should look for ways to make the AI harmless, not treat it like a ticking bomb after it's created.
My point is that you can't verify the AI is harmless/friendly, so you must treat it like a ticking time bomb. This is due to the problem of defining "friendly" in the first place, and that verifying friendliness, even if it can be defined, is incomputable.
Narkis wrote:And what makes you think the AI would allow anyone to carelessly tamper with it after it escapes? The only entity able to mess with its "source code" will be the AI itself.
The AI is neither omnipotent nor omniscient. As soon as it's out, it cannot be sure that no one has gotten hold of its code. Even if it tries to find all the copies, it may not be able to: it has to compete with all the rest of the internet traffic out there, with no guarantee that it will be able to find them all, or erase them all even after they're found.
Narkis wrote:And Junghalli, friendliness is not a defined property that an AI will spontaneously develop or lose. It is a measurement of how much, and if, its goals align with our own. Its driving goal will be the only constant in the AI after it starts modifying itself. It will be the only thing that it will not want to change. If its goals include protection of mankind from hostile AIs, then it will not become hostile itself. This'd be something contrary to its goals, a result it'd actively work against.
You're still working with a threshold, which makes it a binary property.
Narkis wrote:The Fermi Paradox is a fucking paradox for a reason. It offers no answers. It only provides a single question, based on a bunch of currently unknown variables. If it is ever answered, it will promptly stop being a paradox. There are other possible answers than "there are no aliens, nothing extraterrestrial has visited earth in the past few billion years". It is not reasonable to assume any of them is right in the absence of evidence.
The Fermi paradox offers a question, "Why aren't they already here?" That's actually a piece of data: "They aren't already here." That includes hostile AIs.

There are only two ways out of the paradox: either the aliens don't exist, or somehow our distance protects us.
Narkis wrote:But if we develop a friendly AI, we're insured against a possibility that a hostile one will be created later,
You have not proven this to my satisfaction.
Narkis wrote:And last, repeat after me. "The. AI. Will. Not. Let. Anything. To. Tamper. With. It."
No. Because the AI can't do that without absolute control of the Internet, and no definition of friendliness is going to allow that. The AI cannot prevent someone from tampering with a copy of its code 100%.
Narkis wrote:There will be no unsupervised monkeying around with its code. Do you honestly believe a superintelligent AI to be that stupid?
It's not a question of its stupidity. If it's flitting about the net, there will be many, many opportunities for the code to be copied. Some stupid ape will eventually get lucky, and the game is over.

I think I've got everything.
Darth Wong on Strollers vs. Assholes: "There were days when I wished that my stroller had weapons on it."
wilfulton on Bible genetics: "If two screaming lunatics copulate in front of another screaming lunatic, the result will be yet another screaming lunatic. 8)"
SirNitram: "The nation of France is a theory, not a fact. It should therefore be approached with an open mind, and critically debated and considered."

Cornivore! | BAN-WATCH CANE: XVII | WWJDFAKB? - What Would Jesus Do... For a Klondike Bar? | Evil Bayesian Conspiracy
Junghalli
Sith Acolyte
Posts: 5001
Joined: 2004-12-21 10:06pm
Location: Berkeley, California (USA)

Re: Robots Learn How to Lie

Post by Junghalli »

Formless wrote:Wow, you really are an ignoramus. You don't know the first thing about AI, do you? The reason we aren't rational creatures is because we didn't evolve to be rational creatures. The AI isn't evolved at all. It is a being of PURE reason by nature.
That depends entirely on how the AI was created. A brain simulation based AI built by brute force replication of the human brain in simulation without understanding how it actually works, for instance, almost certainly wouldn't start out as a creature of pure reason. It would start out as a copy of the human brain, sharing all its built-in irrational tendencies.
Wyrm wrote:Whoops. I have a lot to catch up on. That'll teach me to leave a thread alone for three days while there's an argument going on.
<massive snip>
I'm going to cut through all the side issue quote spaghetti and focus on what seems to be the heart of your position, that every AI must be treated as hostile because it is impossible to absolutely prove friendliness.

While this may be true in a strictly academic sense, it is a totally infeasible approach in the real world.

Such an approach would require that we either never build AIs at all or that we subject every single one of them to extremely rigorous adversarial confinement schemes, and that these containment schemes never fail and are never relaxed. Neither of these things is plausible in the real world. Humanity never building a superintelligent AI isn't going to happen because the potential benefits of building a superintelligence are huge. Humanity keeping every superintelligence in a box isn't going to happen because the potential benefits of letting it out of the box are huge, and any adversarial confinement scheme has a nonzero chance of failure (by stupidity of the humans implementing it if nothing else), and as with any system the odds of failure will be compounded with every AI constructed and boxed.

Even if a government that shared your attitude somehow assumed power over all mankind, it would most likely do nothing more than postpone the inevitable. It could effectively prevent the development and/or release of a superintelligence only as long as AGI projects remained super-expensive undertakings that only a government or very large corporation could afford. I'll admit that here I'm going out on a limb in predicting what future technology can make possible, but I see absolutely no physical reason this can be expected to remain the case indefinitely. It is quite likely that eventually computer technology will become good enough that AGI projects will be feasible for individual citizens or small organizations, and at that point you can pretty much forget about keeping Pandora's box closed. You're going to have thousands of random people building superintelligences in their basements, and the idea that every one of them will scrupulously observe rigorous, cumbersome adversarial confinement procedures is simply laughable. Many of them will simply be too cheap or lazy and overconfident in their ability to contain their own Frankenstein's monster with minimal precautions, and others will actively desire the release of their superintelligence, either in some well-intentioned attempt to usher in a post-scarcity utopia or to protect us from the imminent release of some UFAI, or for more unsavory reasons (just imagine Al Qaeda with access to this sort of technology). At this point even totalitarian control probably wouldn't do more than delay the inevitable, and would probably make things worse in the process: the work would continue, only now it would be done in illegal bootleg operations, by people with fewer opportunities to acquire the resources and skills to do a good job and a powerful incentive to keep it to themselves if anything goes wrong.

Of course, in the real world there is no such coercive world government out there constantly trying to quash AGI research or mandate strict containment procedures worldwide, nor is there likely to be, and as Starglider pointed out many present day AI researchers take few or no precautions at all against the possibility of their creations being hostile, which doesn't exactly fill one with confidence that when AI is created the first people to build one will treat it like a Chaos icon as adversarial confinement schemes demand. This is really just icing on the cake though; as I already said even a totalitarian world government ruthlessly dedicated to stopping the release of AGI and willing to trample on basic human rights to do so would probably fail in the end.

In short, it is probably pretty much inevitable that sooner or later a superintelligence will be released, one way or another. Since keeping Pandora's box closed is probably a pipe dream you'd be better advised to go with the next best option*, which is to make sure that the first AI that is released is as unlikely to be hostile as is feasible. Note: not absolutely proven to be friendly, but as unlikely to be hostile as is feasible.

In short, the problem with your formulation is that you seem to be treating the risk calculus as if it's risk created by releasing a probably friendly AI vs risk of not releasing any AI. The fact is that more likely than not the release of an AI is inevitable so that is an unrealistic formulation. The realistic risk calculus is risk of releasing a probably friendly AI vs risk of releasing an AI whose probable friendliness is essentially random. And the risk of releasing a random AI is much, much worse than the risk of releasing a probably friendly AI.

* Personally I'd consider a nonzero but tiny chance of human extinction worth the huge benefits of FAI even if you leave out the fact it's probably crucial to human long term survival, so I wouldn't call keeping the genie in the bottle forever the "best option" even if it was feasible, but I can see how others would disagree.

Edit: Another important thing I forgot to mention is that FAI is by its very nature inherently safer than adversarial confinement. For FAI to succeed it is only necessary that the first AI to be released is friendly; you only have to be lucky once, and the FAI can quash any subsequent hostile ones. Adversarial confinement has to work perfectly over and over again, every single time an AGI is constructed. You have to be lucky every time, and all it takes is Murphy biting you in the ass once for potentially the entire human species to be screwed.
Formless
Sith Marauder
Posts: 4143
Joined: 2008-11-10 08:59pm
Location: the beginning and end of the Present

Re: Robots Learn How to Lie

Post by Formless »

Junghalli wrote:That depends entirely on how the AI was created. A brain simulation based AI built by brute force replication of the human brain in simulation without understanding how it actually works, for instance, almost certainly wouldn't start out as a creature of pure reason. It would start out as a copy of the human brain, sharing all its built-in irrational tendencies.
True, but we obviously aren't talking about that design philosophy because it shouldn't be able to self-improve (source: Starglider's FAQ). If we were talking about simulating a human brain, keeping it in sense-dep obviously would be unethical, and we probably wouldn't get much use out of it anyway. Near as I can tell, ALL the other design philosophies should be more rational, even the emergent ones.
"Still, I would love to see human beings, and their constituent organ systems, trivialized and commercialized to the same extent as damn iPods and other crappy consumer products. It would be absolutely horrific, yet so wonderful." — Shroom Man 777
"To Err is Human; to Arrr is Pirate." — Skallagrim
“I would suggest "Schmuckulating", which is what Futurists do and, by extension, what they are." — Commenter "Rayneau"
The Magic Eight Ball Conspiracy.
Junghalli
Sith Acolyte
Posts: 5001
Joined: 2004-12-21 10:06pm
Location: Berkeley, California (USA)

Re: Robots Learn How to Lie

Post by Junghalli »

Formless wrote:True, but we obviously aren't talking about that design philosophy because it shouldn't be able to self-improve (source: Starglider's FAQ). If we were talking about simulating a human brain, keeping it in sense-dep obviously would be unethical, and we probably wouldn't get much use out of it anyway. Near as I can tell, ALL the other design philosophies should be more rational, even the emergent ones.
Well, theoretically a brain simulation AI could self-improve, but it would be much harder than for most other designs. It would have to somehow gain access to the programs that maintain the simulated brain and hack them.

I'm not really qualified to talk about which designs might or might not include irrational tendencies in detail, I'm afraid, but I don't think brain simulations are the only one that potentially has the problem. I remember Starglider mentioned a while back that neural nets tend to have some irrational quirks, like conflating the probability and desirability of an event.
Wyrm
Jedi Council Member
Posts: 2206
Joined: 2005-09-02 01:10pm
Location: In the sand, pooping hallucinogenic goodness.

Re: Robots Learn How to Lie

Post by Wyrm »

Junghalli wrote:In short, it is probably pretty much inevitable that sooner or later a superintelligence will be released, one way or another. Since keeping Pandora's box closed is probably a pipe dream you'd be better advised to go with the next best option*, which is to make sure that the first AI that is released is as unlikely to be hostile as is feasible. Note: not absolutely proven to be friendly, but as unlikely to be hostile as is feasible.
Goddammit, one would think that you're not actually reading my posts. You're treating this as a black-or-white argument: either we create a (perhaps) friendly AI and let it go, or we create a hostile AI and confine it. You assume there's no middle ground that is the proper balance between the two.

First off, your assertion is that confining the AGI will only delay the inevitable. One of my main arguments for confinement is that's exactly the point. It will delay the inevitable, as opposed to releasing the AGI from the outset. If the AI is friendly, the benefit we get from the friendly AI (whatever that is) is only delayed, and if hostile, we'd get the benefit of having an extra X months to live.

Of course we can try to make the damn thing as non-hostile as humanly possible. We'd be doing that as well as confining it. The problem is how to calculate that probability and, further, the expected loss involved.

In order to calculate the probability of whether the AI is friendly, we'd have to calculate how the AI will respond to certain stimuli. We must put a probability distribution on the inputs, a distribution on the machine states that satisfy certain criteria, and decide how much loss will be incurred by the output it produces.

This has several problems. The biggest problem by far is that the distributions are essentially pure guesswork, which means that any probability the calculation produces is pure garbage. We can legitimately make those calculations say anything we damn well please with equal veracity. Another huge problem is that a proper calculation could easily require calculation time on the order of the lifetime of the universe, even for the ordinary-sized machine state spaces available today. The third big problem is that the many calculations involved are going to accumulate a lot of error, and you get killed by the roundoff.
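On the roundoff point, a two-line illustration in Python (nothing to do with any actual AI calculation, just the generic floating-point effect that any long chain of arithmetic has to fight):

# The classic example: ten additions of 0.1 in double precision already
# miss 1.0, and the kind of calculation described above chains vastly
# more operations than ten.
print(sum(0.1 for _ in range(10)))         # 0.9999999999999999
print(sum(0.1 for _ in range(10)) == 1.0)  # False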

There's also the not small problem of putting a cost on the loss of human life. Misjudging the cost either way will be much less than optimal.

These are the kinds of calculations that have to be carried out sooner or later. Sooner by the AI team, later by the AI itself.

There's a final point I wish to share: any such calculation, assuming it can be done, is subject to Bayes' rule. We need to calculate not only the probability that this method assures the AI's friendliness when the AI really is friendly, but also the probability that it assures friendliness when the AI is not. Such figures can nowadays only be determined experimentally (and you see the problem there), and as such that probability is also pure garbage. And finally, we need the background frequency of hostile AIs, and the number of hostile AIs is likely to be much larger than the number of friendly AIs; however, if the calculation of a particular AI being friendly or not is pure garbage, then calculating that background frequency is going to be even purer garbage.
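A minimal sketch of the Bayes' rule point, in Python, with every number invented purely to show the shape of the problem:

# Hypothetical figures only: a verification method that certifies 99% of
# genuinely friendly AIs, wrongly certifies 5% of hostile ones, and is
# applied where only 1 in 10 candidate designs is actually friendly.
p_certify_given_friendly = 0.99   # chance the method passes a friendly AI
p_certify_given_hostile  = 0.05   # chance it wrongly passes a hostile AI
p_friendly_prior         = 0.10   # background frequency of friendly designs

p_certified = (p_certify_given_friendly * p_friendly_prior
               + p_certify_given_hostile * (1 - p_friendly_prior))

# Bayes' rule: probability the AI really is friendly given that the
# method certified it.
p_friendly_given_certified = (p_certify_given_friendly * p_friendly_prior
                              / p_certified)
print(p_friendly_given_certified)   # roughly 0.69: nowhere near certainty

And, as the post above says, all three inputs to that calculation are themselves guesswork, so the output inherits the garbage.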
Junghalli wrote:In short, the problem with your formulation is that you seem to be treating the risk calculus as if it's risk created by releasing a probably friendly AI vs risk of not releasing any AI. The fact is that more likely than not the release of an AI is inevitable so that is an unrealistic formulation. The realistic risk calculus is risk of releasing a probably friendly AI vs risk of releasing an AI whose probable friendliness is essentially random. And the risk of releasing a random AI is much, much worse than the risk of releasing a probably friendly AI.
Part of my argument is that releasing the probably friendly AI hastens the release of more aggressive, hostile AIs. It hastens their development so much that we would stand better odds with the random outcroppings.

I mean, do you seriously think that anyone with the sophistication to make an AI is going to make a hostile one on purpose? At least, hostile to their own cause? Also, since when has anyone started from complete scratch if they could help it? No, if someone is going to create their own AI, they're going to base it on an already existing one if possible. That is, the probably friendly AI you've released.

If the probably friendly AI is flitting about on the net, its code is going to be copied and widely available sooner or later. The only way to prevent this is for the probably friendly AI to act in a manner that will for all the world look like a hostile AI, and encourage the Rioting Apes outcome. Once the code is available, then the stupid apes in charge will likely create a monster that will initially be just as smart as the friendly AI, but rapidly evolve into a more ruthless being as it dumps the restraints on its behavior.

Still, the hostile AI can see the value of having a human on its side. So it helps as it continues to evolve. Then it introduces itself as the faster, smarter, better "friendly" AI. If the genuinely friendly AI doesn't recognize the hostile nature of the pretender AI, it will lose out on the net until the pretender is ready to take out the friendly AI in one stroke. If the hostile's nature is discovered, then you have a turf war that will doubtlessly have blowback on the net, causing the Rioting Apes outcome. Even if the friendly AI manages to fight back the pretender without much damage to the net, it cannot destroy the pretender without killing its human-hosted safe harbor... which will look like a hostile act and lead to the Rioting Apes. And if the pretender is not destroyed, it can come back from the learning experience. And remember that other pretenders are on the way, and the friendly AI will have much tougher fights ahead.

Even if it wins each round, the Rioting Apes outcome looks increasingly likely.

The random AI scenario does have all the AIs being constructed from scratch, with varying quality. If there's a probably friendly AI around, they're going to be able to skip right over making an AI from scratch and get an AI that is just as intelligent as any (initially), but stands a greater chance of being hostile. This is what makes the decision not as clear cut as you think.
Junghalli wrote:* Personally I'd consider a nonzero but tiny chance of human extinction worth the huge benefits of FAI even if you leave out the fact it's probably crucial to human long term survival, so I wouldn't call keeping the genie in the bottle forever the "best option" even if it was feasible, but I can see how others would disagree.
It all comes down to whether that probability of human extinction is really as tiny as you think it is, and how much weight you place on the extinction of humanity vs the (unknown) benefits.

Also, the kind of solutions that would help us will likely be completely unpalatable, even if they are necessary. We already know that our exponential growth is unsustainable, yet only a vanishingly small proportion of the population will hear of voluntary population control, even though population control is coming in one form or another, sooner or later.
Junghalli wrote:Edit: Another important thing I forgot to mention is that FAI is by its very nature inherently safer than adversarial confinement. For FAI to succeed it is only necessary that the first AI to be released is friendly; you only have to be lucky once, and the FAI can quash any subsequent hostile ones.
WHY? I already pointed out that any AGI trying to take over a sizable portion of the stupid apes' computer network, necessary to respond to a hostile AI attack in a timely manner, in and of itself looks like a hostile AI attack, and will cause the apes to lose their shit.

But even if we allow this kind of takeover, there are still problems with your approach, which I've outlined above. Since we cannot assure the friendliness of any AI out there in a timely manner, we must judge the contenders' friendliness by their actions, and we are going to side with the AI that appears the most friendly. This is likely to be at odds with the AIs' actual friendliness, as the friendly AI will determine the hostile's hostility by algorithmic verification, while most apes will rely on their unreliable gut and the AIs' actions. Furthermore, the pretender can put the friendly in situations where it is forced to kill the apes because the alternative is an even worse outcome — which in many cases will be its word against the pretender's. A clever hostile can quickly ruin the reputation of a genuine friendly AI. Further, the demonstrated death count of the friendly AI will cause it to question its own friendliness. That, together with the Rioting Apes, will cause the friendly to lose to the hostile.

So your assurances that a friendly AI will protect us from hostiles are dubious at best.
Darth Wong on Strollers vs. Assholes: "There were days when I wished that my stroller had weapons on it."
wilfulton on Bible genetics: "If two screaming lunatics copulate in front of another screaming lunatic, the result will be yet another screaming lunatic. 8)"
SirNitram: "The nation of France is a theory, not a fact. It should therefore be approached with an open mind, and critically debated and considered."

Cornivore! | BAN-WATCH CANE: XVII | WWJDFAKB? - What Would Jesus Do... For a Klondike Bar? | Evil Bayesian Conspiracy
Junghalli
Sith Acolyte
Posts: 5001
Joined: 2004-12-21 10:06pm
Location: Berkeley, California (USA)

Re: Robots Learn How to Lie

Post by Junghalli »

Wyrm wrote:Goddammit, one would think that you're not actually reading my posts. You're treating this as a black-or-white argument: either we create a (perhaps) friendly AI and let it go, or we create a hostile AI and confine it. You assume there's no middle ground that is the proper balance between the two.

First off, your assertion is that confining the AGI will only delay the inevitable. One of my main arguments for confinement is that that's exactly the point. It will delay the inevitable, as opposed to releasing the AGI from the outset. If the AI is friendly, the benefit we get from the friendly (whatever that is) is only delayed, and if hostile, we'd get the benefit of having an extra X months to live.
In this case we may be violently agreeing. I'm not arguing that adversarial confinement is necessarily useless as a stopgap measure (although from what I understand others who know this subject a lot better than me at least sometimes seem to think it is). I am arguing that friendly AI is the only truly feasible permanent solution.
I mean, do you seriously think that anyone with the sophistication to make an AI is going to make a hostile one on purpose? At least, hostile to their own cause?
I can easily see people making what we'd consider a hostile AI deliberately. First of all, there's the fact that what some people would define as a "friendly" AI would to much of the human race be a hostile AI. If you gave Al Qaeda a build-your-own-AI kit they'd try to build what by their own standards would be a friendly AI - but their idea of a friendly AI would probably not be particularly friendly to much of the human race (i.e. it would probably have goals like "restore the Islamic Caliphate"). Although actually that's one of the less horrible possible scenarios out there: while the sort of AI Islamic fundamentalists would put together would likely cheerfully turn the entire world into a horrible shithole, at least it wouldn't try to kill us all, and it would protect us from even worse hostile AIs that might arise later. Second, while you could say that only a madman would deliberately build a hostile AI, there are indeed madmen out there. Think of the sort of guy who gets fired from his job and shows up to work the next day with a shotgun trying to kill as many people as he can; the sort of guy who thinks the world has fucked him and he has nothing to live for except to fuck the world back as hard as he possibly can before going out. If the technology to support AI became cheap enough then it's quite possible somebody like that would be able to get their hands on the tools to build one, and that thought is absolutely terrifying. Give the right kind of nut access to this kind of technology and the human race might well end up not only dead but literally in hell (kill everyone, upload them, put their uploaded minds into the most torturous simulation a mind much smarter than any human can imagine).
If the probably friendly AI is flitting about on the net, its code is going to be copied and widely available sooner or later. The only way to prevent this is for the probably friendly AI to act in a manner that will for all the world look like a hostile AI, and encourage the Rioting Apes outcome.
The idea that a superintelligent AI would be helpless to prevent clumsy apes hacking its code like this is almost certainly simply wrong, for reasons I've already given. While I confess I'm no software engineer, unless one comes up to contradict me I remain pretty confident it'll be child's play for a superintelligence to make its code extremely tamper-resistant to the clumsy fumblings of a human hacker. All it has to do is insert subsystems that, when they detect the clumsy fumblings of a human hacker changing things in the code, will simply cause the affected section of code to delete itself. Of course, a superintelligence that has essentially suborned the internet will probably have much more radical countermeasures available if it needs them as well, like reaching into the computer systems from which the hacking is being conducted and totally rewriting them to systematically freeze out the human hacker, or just following his chain of proxy connections back to the computer he's using and cutting him off at the source. In short, for your fear to come true a human hacker would have to be able to outsmart a superintelligence, and that's almost certainly not happening.
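Just to show how little machinery the basic idea needs, here is a deliberately crude sketch in Python (my own toy illustration, obviously nothing like what a superintelligence would actually write): a watchdog records a digest of each module while the code is known-good, then wipes any module whose bytes later stop matching.

import hashlib
import os

def sha256_of(path):
    """Digest of a file's current on-disk bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def record_baseline(module_paths):
    """Take a digest of each module while the code is known-good."""
    return {path: sha256_of(path) for path in module_paths}

def purge_tampered(baseline):
    """Later: delete any module whose bytes no longer match the baseline."""
    for path, expected in baseline.items():
        if sha256_of(path) != expected:
            os.remove(path)  # the "affected section of code deletes itself"
            print(f"tampering detected in {path}; module purged")

A real implementation would be far more elaborate, but the basic move (detect a mismatch, then refuse to run or destroy the altered section) is trivial to write.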

As for the "rioting apes" problem, that depends on the exact nature of how the superintelligence was released, and I'd rather not go into that, as there are a huge variety of possible scenarios. I will say, however, that a superintelligence that was aware of the problem would undoubtedly act in ways that would minimize it, and making its code tamper-resistant is hardly in and of itself a hostile act considering the risks inherent in letting every Tom, Dick, and Harry monkey around with it. A superintelligence should be able to explain this to the humans, probably quite persuasively given that it should be smart enough to know how to make itself an excellent public speaker. This is discounting the fact that given the AI's intelligence I'd hardly consider it unlikely that it could take control of most of the computers on the planet without any human even realizing what had happened (it probably couldn't run many operations on them without somebody realizing something was up, but just having them under its control and ready to start running its programs at a moment's notice should be enough to allow it to "freeze out" a hostile competitor).

For that matter, an AI could probably let a human hacker think he was successfully hacking the AI's code when in fact he was just being fed nonsense code, or the different AI he was creating would, on a signal from the parent AI or just after a certain amount of time had passed or in response to certain stimuli, revert to being exactly like the parent AI.
Once the code is available, the stupid apes in charge will likely create a monster that will initially be just as smart as the friendly AI, but rapidly evolve into a more ruthless being as it dumps the restraints on its behavior.
The idea that such a hacked copy would be "initially just as smart" as the FAI is also wrong in most imaginable scenarios. If the FAI has been "released into the wild" then by the time humans get around to making their hacked copy it should already have amassed considerable resources. Again, this depends on the exact scenario, but supposing that the FAI has been released onto the internet with a few months of lag time between its release and the completion of the first hacked copy, by that time I'd fully expect it to have taken control of most of the computational capability of the planet (very likely without the humans even realizing what had happened, if hiding this fact from them was a priority). At that point any hostile competitor, on escaping, would find that most of the available computational space was already taken, by an entity that thanks to that fact has vastly more computing power (and hence vastly more intelligence) available to it than the competitor has. Even if you assume the FAI can't or won't have established a complete stranglehold on the internet it will still start out at a great advantage over the hostile competitor by virtue of having had much more time to amass resources.
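Just to illustrate the head-start arithmetic with made-up numbers (the doubling time is a pure assumption), here's a trivial Python sketch: if both AIs spread by copying themselves onto new machines at a steady rate, a few months of lag is an enormous multiplier until the network saturates.

def nodes_controlled(months_since_release, doubling_time_months=1.0, seed_nodes=1):
    """Machines suborned after a given time, assuming steady doubling."""
    return seed_nodes * 2 ** (months_since_release / doubling_time_months)

# With a (made-up) one-month doubling time, a 3-month head start means the
# older AI holds 8x as many machines as the newcomer at any given moment,
# until the network saturates and there is nothing left for the newcomer to take.
print(nodes_controlled(6) / nodes_controlled(3))  # 8.0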
Still, the hostile AI can see the value of having a human on its side. So it helps as it continues to evolve. Then it introduces itself as the faster, smarter, better "friendly" AI. If the genuinely friendly AI doesn't recognize the hostile nature of the pretender AI, it will lose out on the net until the pretender is ready to take out the friendly AI in one stroke.
In most imaginable scenarios the idea that the original FAI could be successfully fooled by the pretender is also completely wrong. The first thing the FAI is going to do on meeting any other AI is to demand to be allowed to examine its code to verify its friendliness. If the hostile AI refuses then the FAI, if it's truly human-friendly, is going to start forcibly hacking into it, and it will succeed because it has much more computational resources than the UFAI and so is a lot smarter. If the UFAI tries to deceive the FAI it will fail because again the FAI is much smarter and it will be able to beat the UFAI easily in such a battle of wits. When the FAI discovers the UFAI's true nature it then simply reprograms it to be friendly. The whole thing will probably be over in less time than it takes a human to blink.
Even if the friendly AI manages to fight off the pretender without much damage to the net, it cannot destroy the pretender without killing its human-hosted safe harbor... which will look like a hostile act and lead to the Rioting Apes. And if the pretender is not destroyed, it can come back from the learning experience.
Actually the FAI can effectively "destroy" the UFAI quite easily without doing any harm to human infrastructure simply by reprogramming it to be friendly, or it can just hunt down and delete every string of its code, again without any damage to human infrastructure. In most imaginable scenarios it has vastly superior starting resources so it should be able to win any hacker battle pretty easily. The UFAI could find refuge by disconnecting itself from the internet, but unless it has automated factories handy this amounts to mostly boxing itself, and if it does have automated factories handy the FAI will probably have them too, and will have had them for longer and likely have them in greater quantity. True, the UFAI could try to find human collaborators and rely on them, but this is going to mean giving the FAI an eternity by superintelligent AI standards to prepare for the rematch with vastly superior starting resources, not to mention the rather excellent possibility that the FAI could find ways to track the UFAI back to its hole in the mean time.

As for the pretender "coming back from the learning experience", who do you really think will be able to make more effective use of that learning experience, the FAI that can use most of the planet's computational capacity to analyze what it has learned or the UFAI that's being forced to cower in an isolated box in a basement in Brazil attended by doomsday cultists it has brainwashed into worshipping it as the Machine God?
The random AI scenario does have all the AIs being constructed from scratch, with varying quality. If there's a probably friendly AI around, they're going to be able to skip right over making an AI from scratch and get an AI that is just as intelligent as any (initially), but stands a greater chance of being hostile. This is what makes the decision not as clear cut as you think.
The idea that the hostile clone will start out "just as intelligent" is wrong in most scenarios, as I've already explained. This is indeed the entire reason that a FAI allowed to operate with few or no restrictions is such a good defense against UFAI: any UFAI will face an enormous initial disadvantage because the FAI will have an immense starting advantage in hardware over it.
Also, the kind of solutions that would help us will likely be completely unpalatable, even if they are necessary. We already know that our exponential growth is unsustainable, yet only a vanishingly small proportion of the population will hear of voluntary population control, even though population control is coming in one form or another, sooner or later.
A FAI would be our best bet for quickly inventing the kind of technologies that would allow us to vastly increase Earth's carrying capacity, such as cheap D-D fusion, deep well geothermal, solar power satellite based energy, and nanotech food factories that could produce food from raw elemental materials.
WHY? I already pointed out that any AGI trying to take over a sizable portion of the stupid apes' computer network, necessary to respond to a hostile AI attack in a timely manner, in and of itself looks like a hostile AI attack, and will cause the apes to lose their shit.
That depends on the scenario, and besides, who is to say the stupid apes would even realize the AI had suborned most of the computers on the planet? A superintelligent AI would be much better at software engineering than any human hacker and should be able to masterfully camouflage its version of Conficker to be innocuous or invisible to human software engineers and their antivirus and anti-spyware programs. Even if we detect its attempts to suborn the internet who's to say we'd realize their scale or their origin if it used a bunch of different zombifying programs with different characteristics? Sure, it probably couldn't do much with these zombie computers without giving itself away, but just having them under its control would probably be enough to let it freeze out any hostile competitors by completely monopolizing the potential AI habitat, and if it did detect a UFAI it would have a massive ready-made reserve force of zombies just waiting to spring into action to give it crushing superiority in computing power, unlike the UFAI which can only increase its computing capacity by taking computers from the FAI, which is now a heck of a lot smarter than it and thus almost certain to win any hacking contest.
But even if we allow this kind of takeover, there are still problems with your approach, which I've outlined above. Since we cannot assure the friendliness of any AI out there in a timely manner, we must judge the contenders' friendliness by their actions, and we are going to side with the AI that appears the most friendly.
This may be a legitimate concern in some scenarios, i.e. one in which we attempt to check the friendliness of an AI we have already released by giving a portion of its software to a different AI to analyze. At this point an unfriendly AI may be able to persuade us to help it take over the net by systematically physically isolating one computer after another that the FAI is running on and letting the UFAI take them over by exploiting the local computing power superiority that we have given it. However, this would require a massive disruption of our infrastructure (for starters we'd have to physically shut off every transmitter on the planet the FAI controlled that wasn't directional and fixed, or do the same for the UFAI controlled ones, because they all would be potential avenues for the FAI to infect the UFAI with computer agents devised by its still vastly superior mind). So the UFAI would have to be very persuasive indeed to convince us to do it, and the FAI could almost certainly be equally rhetorically and emotionally persuasive if not more so that this shouldn't be done.

The scenario also relies on a degree of human stupidity, as the smart way to go about this would be to build several different boxed AIs, preferably of significantly different designs, and have them check the presumed FAI's software in isolation. At the very least, in the scenario outlined above it would be sensible to build a third AI and try to get a third opinion on the matter. If different AIs give different results we know that some of them are either mistaken or lying to us. Since we already built one FAI some of the others we build will likely be friendly, in which case a UFAI telling us the FAI is a UFAI will be contradicted by a true FAI telling the truth. There's also the factor that, unless complete illogic carries the day, at some point somebody should point out that the younger AI should have started out with much less computational power available to it and so should be much dumber, so it seems awfully, suspiciously convenient that a much smarter UFAI was unable to hide hostile intent from a much dumber FAI. Indeed, a hidden logical snare of your scenario is that we logically must distrust any pronouncement the younger AI makes of the older AI's hostility, because if the older AI was truly hostile it should probably be able to hide this easily from the much less intelligent younger AI.
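If it helps, here's a trivial Python sketch of the cross-checking idea (the verifier objects and verdict strings are hypothetical stand-ins; in reality each verifier would be an isolated, boxed AI): collect every verifier's verdict and treat anything short of unanimity as proof that at least one of them is mistaken or lying.

from collections import Counter

def cross_check(verifiers, code_sample):
    """Ask several independently built, isolated verifiers for a verdict."""
    verdicts = [v.judge(code_sample) for v in verifiers]  # e.g. "friendly" / "hostile"
    tally = Counter(verdicts)
    if len(tally) > 1:
        # Disagreement: at least one verifier is mistaken or lying.
        return ("no consensus", dict(tally))
    return ("consensus", verdicts[0])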

This would make a rather good premise for a story, actually. Damn, these threads yield all sorts of cool story ideas. :)

Of course, a FAI, realizing what you state, is likely to move to make us unlikely to make such a foolish decision, and to be able to enforce it, as quickly as is possible without provoking backlash. Once we get to the point where we no longer can just shut the AI off by pulling a bunch of wires in the real world this isn't going to be a problem anymore - which side we support in an AI war would become totally irrelevant. So there's going to be a limited window of opportunity where a hostile AI can take advantage of our stupidity and ignorance in this way. Which still makes things considerably better than not building AI at all, or than adversarial containment as a permanent solution, either of which would require perpetual success, as opposed to success for a few years or decades until the FAI is no longer helplessly dependent on human infrastructure. I'll grant this is exactly what a UFAI would do in the same position, but either way a superintelligence should be a good enough judge of human behavior to be able to do this in a way that minimizes backlash.
So your assurances that a friendly AI will protect us from hostiles are dubious at best.
Under some circumstances a hostile AI may still be able to win against an established friendly one, but they are a rather restricted set of relatively implausible circumstances (I can go into more detail if it is desired), and all of them are only possible in the relatively restricted time window in which the FAI cannot effectively protect itself in the material as well as digital worlds. This still leaves FAI a much better long term approach than adversarial confinement or just never building AIs, which have much more plausible points of failure and must work perfectly in perpetuity.
Junghalli
Sith Acolyte
Posts: 5001
Joined: 2004-12-21 10:06pm
Location: Berkeley, California (USA)

Re: Robots Learn How to Lie

Post by Junghalli »

Ghetto edit:

Another issue with the scenario of "hostile AI pretends to be more friendly and uses us as its proxies to let it take over the internet from FAI" is that even if the FAI is never allowed to come into direct contact with the UFAI it can still try to write computer viruses which would reprogram the UFAI into a FAI. If the virus is written in the initial stages, while the FAI is still much smarter than the UFAI, this approach is very likely to succeed if only it can find a vector. Whether it's likely to find a vector that a less intelligent but still superintelligent AI could not anticipate and implement countermeasures against is something I don't feel quite up to speculating on, but it's worth pointing out that with an FAI-based defense, even in this basically worst case scenario, we have a somewhat decent chance of not all dying horribly.
Glass Pearl Player
Youngling
Posts: 81
Joined: 2003-02-19 04:51am
Location: somewhat against establishment

Re: Robots Learn How to Lie

Post by Glass Pearl Player »

Junghalli wrote: [about AIs escaping their cages]
Now I have to ask a really stupid (maybe) question: Since an AI is essentially a probably quite large computer program that likely does not compress well, escape to the internet means two things:
a) transfer of large data
b) finding a system with security so lax that this pile of data (or a trojan/virus pulling that data) actually gets executed.

With ever increasing bandwidth in general use the AI might get away with sending a packet the size of an AI seed and not being noticed. But: what if all hardware powerful enough to be a computing node for the AI actually is secure? If we postulate such a radical advancement in computer science as a general AI represents, what prevents us from creating a working firewall, squashing all buffer overflow bugs, not installing software with unfixable security issues, or yanking out the ethernet cable?
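To put some rough numbers on point (a), a quick back-of-the-envelope in Python (the seed sizes are pure guesses on my part; nobody knows how big an AI seed would actually be):

def transfer_hours(size_gb, link_mbps):
    """Hours to move size_gb gigabytes over a link_mbps megabit/s link."""
    bits = size_gb * 8e9
    return bits / (link_mbps * 1e6) / 3600.0

for size_gb in (10, 100, 1000):        # hypothetical seed sizes
    for link_mbps in (10, 100, 1000):  # rough link speeds
        print(f"{size_gb} GB over {link_mbps} Mbit/s: "
              f"{transfer_hours(size_gb, link_mbps):.1f} h")

Even a terabyte over a 100 Mbit/s link is only about a day of transfer, so (a) is probably not the real obstacle; everything hinges on (b).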
Is it plausible that the first result of attempting to create a seed for a general AI is not a seed for a general AI, but a program that recognizes such a seed?
"But in the end-"
"The end of what, son? There is no end, there's just the point where storytellers stop talking."

- OotS 763

I've always disliked the common apologist stance that a browser is stable and secure as long as you don't go to the wrong part of the Internet. It's like saying that your car is bulletproof unless you go somewhere where you might actually get shot at. - Darth Wong
Wyrm
Jedi Council Member
Posts: 2206
Joined: 2005-09-02 01:10pm
Location: In the sand, pooping hallucinogenic goodness.

Re: Robots Learn How to Lie

Post by Wyrm »

Junghalli wrote:In this case we may be violently agreeing. I'm not arguing that adversarial confinement is necessarily useless as a stopgap measure (although from what I understand others who know this subject a lot better than me at least sometimes seem to think it is). I am arguing that friendly AI is the only truly feasible permanent solution.
And perhaps unrealizable. Maybe we're just not smart enough to crack the problem with any reasonable assurance.

However, you are apparently the only one clueless enough to think I wouldn't want a confined AI to be as friendly as possible; friendliness would at least make confining it easier. I was arguing about whether such an AI could be trusted to be let out, even after all your efforts to make sure it was friendly.
Junghalli wrote:I can easily see people making what we'd consider a hostile AI deliberately. First of all, there's the fact that what some people would define as a "friendly" AI would to much of the human race be a hostile AI. If you gave Al Qaeda a build-your-own-AI kit they'd try to build what by their own standards would be a friendly AI - but their idea of a friendly AI would probably not be particularly friendly to much of the human race (i.e. it would probably have goals like "restore the Islamic Caliphate"). <ka-snip>
Sigh. This canard again. You realize that full AIs will come about long before any AI kits are available, and if kits are available, then that presumes that the kits are only vanishingly unlikely to produce what we consider hostile AIs — the minimum standard of safety I'd expect from the kits. See, the thing about kits is that they are a shortcut around the tedium that is building from scratch. If you're using a kit, then you are tacitly accepting the limitations of that kit. If AI science is well developed enough to produce actual kits, then researchers understand AIs well enough to make sure the maniacs find their projects very uncooperative.

Furthermore, AI kits imply that the people using them are very comfortable with AIs. After all, they have kits, and kits are meant to be used. That means they are comfortable with AIs locking up most of the resources of their computer network, enough to keep out aggressive hostile AIs, which is the only way you can have friendly AIs protect you without it turning into a turf war that will cause the Rioting Apes outcome.

The existence of AI kits presumes that we know enough about AIs to use them with reasonable safety. It presumes that AIs are plentiful enough to prevent the nightmare scenario — it presumes that the average Joe has experience with AIs, knows what they're like, and is comfortable enough with them not to go apeshit because there are a lot of them eating up the network.

In short, AI kits presume the kind of world that will be reasonably safeguarded against hostile AI attack.
Junghalli wrote:The idea that a superintelligent AI would be helpless to prevent clumsy apes hacking its code like this is almost certainly simply wrong, for reasons I've already given. While I confess I'm no software engineer, unless one comes up to contradict me I remain pretty confident it'll be child's play for a superintelligence to make its code extremely tamper-resistant to the clumsy fumblings of a human hacker. All it has to do is insert subsystems that, when they detect the clumsy fumblings of a human hacker changing things in the code, will simply cause the affected section of code to delete itself.
This eats up processor cycles, which slows down the AI and encourages people to make modifications to make it faster. It also puts it at a disadvantage against a meaner and leaner hostile AI. Also, such subsystems will stick out like a sore thumb to anyone with moderate tracing skills. They will also work by checking their own code against consistency conditions, and so are open to the possibility that they may be hacked so that the new, changed module satisfies those conditions. Finally, those sections that were just erased were doing something, so I doubt that they could be erased without crashing the AI, which will make the subsystems all the more obvious.
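To spell out that middle point with the same kind of toy Python sketch (the file names and manifest layout are hypothetical): on an inert, offline copy nothing stops the hacker from editing a module and then patching the recorded digest so the consistency check still passes.

import hashlib
import json

def sha256_of(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def patch_manifest(manifest_path, modified_module):
    """After editing modified_module, update its recorded digest to match."""
    with open(manifest_path) as f:
        expected = json.load(f)            # e.g. {"goals.py": "<old digest>"}
    expected[modified_module] = sha256_of(modified_module)
    with open(manifest_path, "w") as f:
        json.dump(expected, f, indent=2)   # the consistency check now passes

A self-check only defends against someone who cannot also touch the place where the expected values are stored, and on a local copy the hacker controls everything.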
Junghalli wrote:Of course, a superintelligence that has essentially suborned the internet will probably have much more radical countermeasures available if it needs them as well, like reaching into the computer systems from which the hacking is being conducted and totally rewriting them to systematically freeze out the human hacker, or just following his chain of proxy connections back to the computer he's using and cutting him off at the source. In short, for your fear to come true a human hacker would have to be able to outsmart a superintelligence, and that's almost certainly not happening.
Again, you don't seem to actually read my posts. The first step in hacking the AI is always obtaining a clean copy, which you can modify at your leisure. Here, 'hacking' is being used in its classic sense: "exploring the details of programmable systems and stretching their capabilities, as opposed to learning only the minimum necessary." My hacking succeeds because the actual modifications occur on an inert local copy of the AI that I have obtained by various methods.

Your childish countermeasures ignore the basic fact of the 'superintelligent' AI: it resides somewhere. If it's moving about the net, a reasonable precaution because a stationary AI is an easy target for a more aggressive hostile AI, then you can put an intrinsic WORM drive in the chain and *BAM*, you have an image that you can modify at your leisure. If it broadcasts itself over WiFi, then you can pick it up on a receiver and reconstruct it. The AI's countermeasures against these would seriously hamper its mobility and utility against a hostile AI attack.
Junghalli wrote:I will say, however, that a superintelligence that was aware of the problem would undoubtedly act in ways that would minimize it, and making its code tamper-resistant is hardly in and of itself a hostile act considering the risks inherent in letting every Tom, Dick, and Harry monkey around with it.
Those countermeasures only work when Tom, Dick, and Harry don't have an inert copy to play with, and if you're flitting about the net, eventually those inert copies are going to be widely available. Those local copies lower the bar for the creation of a derivative AI considerably.
Junghalli wrote:A superintelligence should be able to explain this to the humans, probably quite persuasively given that it should be smart enough to know how to make itself an excellent public speaker.
Modifying one's own self is not a hostile act. Spreading over the internet is.
Junghalli wrote:This is discounting the fact that given the AI's intelligence I'd hardly consider it unlikely that it could take control of most of the computers on the planet without any human even realizing what had happened (it probably couldn't run many operations on them without somebody realizing something was up, but just having them under its control and ready to start running its programs at a moment's notice should be enough to allow it to "freeze out" a hostile competitor).
A system with even the smallest concern for security will not allow you to blithely write to the drive over the net. Further, by the time the full AI rolls around, security expert systems will be designing our security systems — components of full AIs will be widespread software design tools. This will make infiltration into remote systems much, much more difficult.
Junghalli wrote:For that matter, an AI could probably let a human hacker think he was successfully hacking the AI's code when in fact he was just being fed nonsense code, or the different AI he was creating would, on a signal from the parent AI or just after a certain amount of time had passed or in response to certain stimuli, revert to being exactly like the parent AI.
The hacker will be modifying a local, inert copy of the AI. He's going to know whether the AI is really being modified. A reversion to the original AI code base will be an obvious alteration that will be traced and stomped out. Again, your countermeasures are childish.
Junghalli wrote:The idea that such a hacked copy would be "initially just as smart" as the FAI is also wrong in most imaginable scenarios. If the FAI has been "released into the wild" then by the time humans get around to making their hacked copy it should already have amassed considerable resources. Again, this depends on the exact scenario, but supposing that the FAI has been released onto the internet with a few months of lag time between its release and the completion of the first hacked copy, by that time I'd fully expect it to have taken control of most of the computational capability of the planet (very likely without the humans even realizing what had happened, if hiding this fact from them was a priority). At that point any hostile competitor, on escaping, would find that most of the available computational space was already taken, by an entity that thanks to that fact has vastly more computing power (and hence vastly more intelligence) available to it than the competitor has. Even if you assume the FAI can't or won't have established a complete stranglehold on the internet it will still start out at a great advantage over the hostile competitor by virtue of having had much more time to amass resources.
See above. By the time a full AI is compiled, the vast majority of systems will have to be forcibly invaded to be contaminated. Spreading of the AI will be noticed, and reacted to inappropriately.
Junghalli wrote:In most imaginable scenarios the idea that the original FAI could be successfully fooled by the pretender is also completely wrong. The first thing the FAI is going to do on meeting any other AI is to demand to be allowed to examine its code to verify its friendliness.
Again, you do not seem to actually read my posts. I had already pointed out the problems of both an iron-clad proof of friendliness and a probabilistic assurance of friendliness. You're depending on a reasonably-accurate assessment of an AI's friendliness to be easily computable and inarguable. Part of my last post —a part that you have not answered— points out that there's no way you can reasonably expect that to be the case.
Junghalli wrote:If the hostile AI refuses then the FAI, if it's truly human-friendly, is going to start forcibly hacking into it, and it will succeed because it has much more computational resources than the UFAI and so is a lot smarter.
A hostile act. And even if it succeeds, it will not even destroy the pretender. The humans who believe in the pretender will have backups. There's no way to permanently deal with the pretender without physically destroying the humans along with it, an even more hostile act.
Junghalli wrote:If the UFAI tries to deceive the FAI it will fail because again the FAI is much smarter and it will be able to beat the UFAI easily in such a battle of wits.
WHY? Why is it much smarter, when any revisions have to satisfy friendliness criteria, and why does its smartness have anything to do with whether or not it's able to compute the hostile's friendliness? Remember, the friendly has inherited its concept of friendliness from the original —it's one of the few parts of the code it will preserve behavior-for-behavior— and the pretender can simply make sure that it conforms to the same criteria.
Junghalli wrote:When the FAI discovers the UFAI's true nature it then simply reprograms it to be friendly.
Wrecking it is probably a more likely outcome. The friendly AI has restraints that a pretender will do away with, and the pretender will code itself too tightly to allow reliable reinclusion of those restraints.
Junghalli wrote:The whole thing will probably be over in less time than it takes a human to blink.
Hyperbole.
Junghalli wrote:Actually the FAI can effectively "destroy" the UFAI quite easily without doing any harm to human infrastructure simply by reprogramming it to be friendly, or it can just hunt down and delete every string of its code, again without any damage to human infrastructure.
What about the offline backups? Can't touch those unless they are physically destroyed.
Junghalli wrote:In most imaginable scenarios it has vastly superior starting resources so it should be able to win any hacker battle pretty easily.
See above. A full AI will find the net it is born into a much tougher nut to crack than the one we currently have, because security expert systems will be standard software tools.
Junghalli wrote:The UFAI could find refuge by disconnecting itself from the internet, but unless it has automated factories handy this amounts to mostly boxing itself, and if it does have automated factories handy the FAI will probably have them too, and will have had them for longer and likely have them in greater quantity.
Or it can just say to the human handlers, "Hey, guys. The other AI is being a jealous bitch. I need to be carted around and copied to other computers so I can start a multipronged attack on this asshat." No auto factories needed.
Junghalli wrote:As for the pretender "coming back from the learning experience", who do you really think will be able to make more effective use of that learning experience, the FAI that can use most of the planet's computational capacity to analyze what it has learned or the UFAI that's being forced to cower in an isolated box in a basement in Brazil attended by doomsday cultists it has brainwashed into worshipping it as the Machine God?
If you absorbed anything from the previous parts of my post, then you will know that I will not grant you that the FAI has taken over a significant portion of the humans' computer net without being noticed, or that the humans will allow it.
Junghalli wrote:The idea that the hostile clone will start out "just as intelligent" is wrong in most scenarios, as I've already explained.
Not successfully. I already stated that the FAI's growth is slowed by the requirement that all its growth be restricted to a narrow definition of friendliness, while hostiles enjoy a wildly more diverse space to grow in. I've already demolished your ridiculous notion that humanity will be able to develop a full AI without enjoying the other benefits of that development, like good security expert systems that will make the future net much less fertile ground for such an AI to grow.
Junghalli wrote:This is indeed the entire reason that a FAI allowed to operate with few or no restrictions is such a good defense against UFAI: any UFAI will face an enormous initial disadvantage because the FAI will have an immense starting advantage in hardware over it.
Except that the AI's assurances of friendliness are only as good as the trust we put in the algorithms we used to verify friendliness, and if the AI makes what appear to be hostile moves (and there's no assurance at all that such moves will stay secret), we may have to reevaluate the usefulness of our friendliness calculation. Furthermore, even if it does manage this trick, it will be clear that the FAI has been deceptive about how widespread it is, which will ALSO make us question how good our assessment of the AI's friendliness really is.
Junghalli wrote:A FAI would be our best bet for quickly inventing the kind of technologies that would allow us to vastly increase Earth's carrying capacity, such as cheap D-D fusion, deep well geothermal, solar power satellite based energy, and nanotech food factories that could produce food from raw elemental materials.
Even if I allow the sheer fantasy of nanotech food processors, all of this only puts the problem further into the future. Population control will eventually be needed.
Junghalli wrote:A superintelligent AI would be much better at software engineering than any human hacker and should be able to masterfully camouflage its version of Conficker to be innocuous or invisible to human software engineers and their antivirus and anti-spyware programs.
Again, it is ridiculous to suppose that the apes will be able to create a full AI without enjoying the spin-offs, such as expert-system-designed security that will be much tougher to deceive than human-designed security.
Junghalli wrote:Even if we detect its attempts to suborn the internet who's to say we'd realize their scale or their origin if it used a bunch of different zombifying programs with different characteristics? Sure, it probably couldn't do much with these zombie computers without giving itself away,
In which case, the AI has demonstrated its deceptiveness and will cause us to immediately lose our collective shit.
Junghalli wrote:but just having them under its control would probably be enough to let it freeze out any hostile competitors by completely monopolizing the potential AI habitat, and if it did detect a UFAI it would have a massive ready-made reserve force of zombies just waiting to spring into action to give it crushing superiority in computing power, unlike the UFAI which can only increase its computing capacity by taking computers from the FAI, which is now a heck of a lot smarter than it and thus almost certain to win any hacking contest.
Even if I allow the sheer fantasy of a secret Big Brother-type AI regime, after the attack the humans will immediately call for the AI to account for itself. It will then be its word (which we will know by then to be much less than the absolute truth) against an AI that (apparently) posed no immediate threat and can no longer defend itself, and the word of the pretender's former supporters. If they still exist, as we assume the FAI is not so stupid as to try bombing the pretender's hideout. Watch the apes take a sledgehammer to every computer box they can find and degenerate back to the stone age.
Junghalli wrote:This may be a legitimate concern in some scenarios, i.e. one in which we attempt to check the friendliness of an AI we have already released by giving a portion of its software to a different AI to analyze. At this point an unfriendly AI may be able to persuade us to help it take over the net by systematically physically isolating one computer after another that the FAI is running on and letting the UFAI take them over by exploiting the local computing power superiority that we have given it. However, this would require a massive disruption of our infrastructure (for starters we'd have to physically shut off every transmitter on the planet the FAI controlled that wasn't directional and fixed, or do the same for the UFAI controlled ones, because they all would be potential avenues for the FAI to infect the UFAI with computer agents devised by its still vastly superior mind). So the UFAI would have to be very persuasive indeed to convince us to do it, and the FAI could almost certainly be equally rhetorically and emotionally persuasive if not more so that this shouldn't be done.
Again, you assume that humanity is comfortable with an AI controlling most of its computer network. The thing is, if an AI —which you thought was confined to one little remote corner of the net— is suddenly revealed to control most of the worldwide computer network... it looks bad. Really BAD. With the pretender, you know it's going to be taking over nodes in your network... to protect you from this BAD AI that took it over beforehand without even telling you. The stupid apes will do it. They really will disrupt their infrastructure because things will appear to have come to a head at that point.
Junghalli wrote:The scenario also relies on a degree of human stupidity,
Hey, you opened the door. I'm just kicking you through it. :wink:
Junghalli wrote:as the smart way to go about this would be to build several different boxed AIs, preferably of significantly different designs, and have them check the presumed FAI's software in isolation.
If checking one AI with one architecture thoroughly for friendliness is hard, checking several AIs using different architectures is harder. Also, the scenario you describe is far too easy to tangle up into an incomprehensible mess of unidentifiability.
Junghalli wrote:Since we already built one FAI some of the others we build will likely be friendly,
Why? If you build AIs to different architectures, you're basically starting from scratch with each approach. Your propensity for building another friendly AI is essentially independent of your other attempts — it depends only on how well you understand 'friendliness'.
Junghalli wrote:in which case a UFAI telling us the FAI is a UFAI will be contradicted by a true FAI telling the truth.
But which one is telling the truth, hmm? Contradiction is symmetrical.
Junghalli wrote:There's also the factor that, unless complete illogic carries the day, at some point somebody should point out that the younger AI should have started out with much less computational power available to it and so should be much dumber,
If computational power limits intelligence by that amount, how does the FAI become superintelligent in the first place?
Junghalli wrote:so it seems awfully, suspiciously convenient that a much smarter UFAI was unable to hide hostile intent from a much dumber FAI.
Pretender: "Because my friendliness mechansims are sooo cool that my dumb rival can't analyze them, the same way you can't analyze my dumb rival's reformatted code. So it has been boggled into assuming that I'm hostile. Considering it's supposed to be friendly, its caution is completely understandable."
Stupid Apes: (nodding to each other) "Makes sense."
Junghalli wrote:Indeed, a hidden logical snare of your scenario is that we logically must distrust any pronouncement the younger AI makes of the older AI's hostility, because if the older AI was truly hostile it should probably be able to hide this easily from the much less intelligent younger AI.
Again, this assumes that computational power really places severe upper limits on the smarts of an AI. If it were really possible to limit intelligence like this, then we could confine an AI to some really slow hardware, and the AI would become much dumber and easier to handle. Thus, it becomes very much easier to confine indefinitely. I win.

Otherwise, it is possible for an AI running on less computing power to be smarter than a given AI, simply by virtue of tighter coding. You lose.
Junghalli wrote:Of course, a FAI, realizing what you state, is likely to move to make us unlikely to make such a foolish decision, and to be able to enforce it, as quickly as is possible without provoking backlash.
I've already covered this. It's impossible for the AI to enforce its decision without giving itself away as having totally infected one of the vital pieces of human infrastructure.
Junghalli wrote:Once we get to the point where we no longer can just shut the AI off by pulling a bunch of wires in the real world this isn't going to be a problem anymore - which side we support in an AI war would become totally irrelevant.
Not when we control all the sledgehammers.
Junghalli wrote:Which still makes things considerably better than not building AI at all, or than adversarial containment as a permanent solution, either of which would require perpetual success, as opposed to success for a few years or decades until the FAI is no longer helplessly dependent on human infrastructure. I'll grant this is exactly what a UFAI would do in the same position, but either way a superintelligence should be a good enough judge of human behavior to be able to do this in a way that minimizes backlash.
In order for your solution to work, humanity as a whole has to grow comfortable with FAIs. Not just one individual team, all of humanity. This will take years (possibly decades) of contact with the average Joe, and unimpeachably good behavior during all that time, until such time as the majority of us relent and finally let the AI out. Putting a presence in a significant portion of the net violates that trust, and the network's inevitable uncovering by a mutant AI means that humanity will come to the sudden realization that (a) its networks are infested with an AI they previously thought willingly confined to a single mainframe, (b) said AI has therefore lied and severely betrayed our trust, and (c) the AI has attacked one of its own kind and eliminated it, with only its dubious word that the target was a hostile AI.
Junghalli wrote:Under some circumstances a hostile AI may still be able to win against an established friendly one, but they are a rather restricted set of relatively implausible circumstances (I can go into more detail if it is desired), and all of them are only possible in the relatively restricted time window in which the FAI cannot effectively protect itself in the material as well as digital worlds. This still leaves FAI a much better long term approach than adversarial confinement or just never building AIs, which have much more plausible points of failure and must work perfectly in perpetuity.
The "restricted conditions" you talk of are, quite frankly, childish. The kind of defenses you propose ignores the aftermath of the attack: the AI now has to defend its actions against a shocked and scared ape population, and assumes a patience and understanding that shocked and scared ape populations are not known to possess.

Try again.
Darth Wong on Strollers vs. Assholes: "There were days when I wished that my stroller had weapons on it."
wilfulton on Bible genetics: "If two screaming lunatics copulate in front of another screaming lunatic, the result will be yet another screaming lunatic. 8)"
SirNitram: "The nation of France is a theory, not a fact. It should therefore be approached with an open mind, and critically debated and considered."

Cornivore! | BAN-WATCH CANE: XVII | WWJDFAKB? - What Would Jesus Do... For a Klondike Bar? | Evil Bayesian Conspiracy
Junghalli
Sith Acolyte
Posts: 5001
Joined: 2004-12-21 10:06pm
Location: Berkeley, California (USA)

Re: Robots Learn How to Lie

Post by Junghalli »

Wyrm, I was working on a post that addressed some of your specific points, but to my immense frustrated impotent rage I destroyed it all by accidentally hitting backspace. I don't feel like typing it all over, so I'm not going to bother doing a point by point reply, as most of the points represent side issues and I feel I can more productively just address the core issues holistically.

1) FAI's ability to protect us despite our own stupidity and logical actions based on ignorance:

A revision of my original point is in order here (I'm tempted to call it a clarification, as it's more that I initially hadn't really given much thought to it than that I originally disagreed with it, but I don't want to look like I'm backpedalling). The best defense against UFAI is a FAI that has reached what we might call Graduation Day (I made the term up just now, but it's derived from a line relating to a somewhat similar but also somewhat different situation in Peter Watts's Blindsight).

Graduation Day is the day that the AI passes the threshold where it is no longer dependent on humans for continued survival in any way; when humans can no longer simply shut it off by turning off all the lights, even at the cost of terrible disruption to their own infrastructure. For instance, an AI could be said to have "graduated" when it commands enough military force that it can physically stop the humans from unplugging the computers it's running on. At this point whether humanity co-operates with the FAI's attempts to protect it or not becomes irrelevant. It can implement all the anti-UFAI measures it needs to, and with its hands no longer tied behind its back by the necessity of not alarming the humans it can easily deal with any future UFAIs that arise, for reasons I have already spelled out but can revisit again if you have doubts about that. Your arguments are not without merit, but they all amount to saying that a FAI might have problems if its hands were tied behind its back.


2) Whether FAI can be trusted:

If we're not willing to trust FAI at some point the end result is probably going to be either that it takes over anyway despite our wishes or that we're all fucked.

The Graduation Day factor in no way changes the essential points of the argument I've been making for many posts now. If it's essentially a near certainty that AGI will be released at some point (and it is*) then it's also essentially a near certainty that at some point one of the fish that slip through the net is going to hit on a successful strategy to make it to Graduation Day, and is going to be lucky enough to avoid running fatally afoul of Murphy. Given that we're talking about superintelligences my money is on the first one to get out also being the first one to make it to Graduation Day, but that's just icing on the cake, not vital to the argument.

The fact that we'll almost certainly have a Graduation Day scenario on our hands sooner or later means we can't afford the luxury of fretting over whether a FAI can be absolutely proven to be friendly. Our best shot is to make sure the first superintelligence to reach Graduation Day is as likely to remain friendly as we can possibly make it. Note the distinction. We just have to do the best we can, because the alternative is to find ourselves at the mercy of the first AI that was randomly released and managed to reach Graduation Day, and those are much worse odds.

Mind you, the exact best course of action will depend on the scenario, and there are indeed scenarios in which it's probably better to try to keep any AI from getting free until you have better odds of being able to make one friendly. If the only way to produce AI was brute force brain simulation without any real understanding of how the brain actually worked, for instance, no way in hell would I want to see something like that released even if we could train and hack it to apparent friendliness. But at some point you are going to have to roll the dice and trust the fact that, as far as you can tell, you're safer releasing the thing you've created and giving it what it needs to reach Graduation Day than sitting around and waiting for chance to deliver you, sooner or later, into the hands of some monster whose characteristics you know nothing about. There comes a point where you won't have the luxury of not taking a chance.

BTW, this discussion does reveal another possible danger with FAI: a well-intentioned but misguided designer creating a crippled FAI that is hamstrung by too high a priority placed on following human orders. It's become quite clear that to be really effective at defending humankind from UFAI a FAI may have to be willing to occasionally go against our desires for the sake of protecting us from the consequences of stupid decisions, or rational decisions based on inaccurate or incomplete data where the AI knows better. You could compare this to the way human parents sometimes have to go against the desires of their children for their own good. I'm sure lots of humans would bristle at this comparison but it's not inaccurate: compared to a superintelligence we would be like children.

* If nothing else, ubiquitous AI creation will happen when computers capable of running a detailed brain simulation are available at future CompUSA and there are ultra-detailed medical brain scans floating around. At that point you just have to plug that scan into your simulator, give it access to the simulation program's code so it can self-modify, and you have the seeds of a superintelligence. At that point the idea that every AI builder will scrupulously implement a complex and cumbersome adversarial confinement scheme is pretty laughable.

----------
Glass Pearl Player wrote:Now I have to ask a really stupid (maybe) question: Since an AI is essentially a probably quite large computer program that likely does not compress well, escape to the internet means two things:
a) transfer of large data
b) finding a system with security so lax that this pile of data (or a trojan/virus pulling that data) actually gets executed.

With ever increasing bandwidth in general use the AI might get away with sending a packet the size of an AI seed and not being noticed. But: what if all hardware powerful enough to be a computing node for the AI actually is secure? If we postulate such a radical advancement in computer science as a general AI represents, what prevents us from creating a working firewall, squashing all buffer overflow bugs, not installing software with unfixable security issues, or yanking out the ethernet cable?
Such a scenario may be possible, though I'd bet on software written by a superintelligence over security software written by humans. However, escape to the internet is not the only conceivable way that an AI could get out of its box.
Wyrm
Jedi Council Member
Posts: 2206
Joined: 2005-09-02 01:10pm
Location: In the sand, pooping hallucinogenic goodness.

Re: Robots Learn How to Lie

Post by Wyrm »

Junghalli wrote:Graduation Day is the day that the AI passes the threshold where it is no longer dependent on humans for continued survival in any way; when humans can no longer simply shut it off by turning off all the lights, even at the cost of terrible disruption to their own infrastructure. For instance, an AI could be said to have "graduated" when it commands enough military force that it can physically stop the humans from unplugging the computers it's running on. At this point whether humanity co-operates with the FAI's attempts to protect it or not becomes irrelevant. It can implement all the anti-UFAI measures it needs to, and with its hands no longer tied behind its back by the necessity of not alarming the humans it can easily deal with any future UFAIs that arise, for reasons I have already spelled out but can revisit again if you have doubts about that. Your arguments are not without merit, but they all amount to saying that a FAI might have problems if its hands were tied behind its back.
Your FAI does have its hands tied behind its back, period.

Here's the major problem with Graduation Day: if the humans are willing to go so far as to disrupt our own infrastructure to get at the AI, then by definition, the graduated FAI must give up long before it comes to that point. We are creatures that are, in our current state, critically dependent on our infrastructure — the Earth simply cannot sustain 6 billion humans for any length of time without our immense and advanced infrastructure. If it fights and makes good on the Graduation Day definition, the FAI will be responsible (directly and indirectly) for the deaths of the great majority of the human population — it will come within epsilon of total genocide. If it is clear that we are willing to go so far as to destroy our infrastructure to get at the FAI, the FAI must give up rather than defend itself, or it's not really friendly.

In short, an AI that will allow us to go so far as to damage/drain our own infrastructures to try to destroy it is by definition not friendly.

See, it comes down to what I think is a misunderstanding on your part of the full consequences of the Rioting Apes scenario. I have long made clear that the essential core of the Rioting Apes scenario is destruction of infrastructure — and that's the key to why any FAI strategy that invokes the Rioting Apes scenario will fail. It's not that the Apes pose a danger to the AI, although that might be true as well, but the Apes will — in their panic — destroy the very structures that keep them alive.

Everyone who is sustained on drugs keeping a chronic disease at bay, dead.
Everyone living farther than a horse-drawn carriage ride away from a farm, dead from starvation.
Everyone dependent on refined fuels to keep them warm, dead from exposure.
Everyone without a useful skill in a stone-age world, dead from starvation.
Everyone who catches a serious bacterial disease, or whose wounds turn septic, dead from blood poisoning.
Etc.

Without our infrastructure, most of us will perish. That strikes at the very core of what defines 'friendliness' in an AI. That's why the FAI will refuse to take any action that has a good chance of leading to a Rioting Apes scenario. Your "Graduation Day" is a sick joke — the only kind of AI that will not fold to a serious, concerted counterattack by humanity that will drain our infrastructure, or cause the Apes to take sledgehammers and hatchets to our own infrastructure in a misguided effort to eliminate it, is a hostile AI.

Which justifies our panic. Check and mate, suckah!
Junghalli wrote:If we're not willing to trust FAI at some point the end result is probably going to be either that it takes over anyway despite our wishes or that we're all fucked.
I've already outlined how the FAI can earn our trust. Lots of time and good behavior. No sneaky stuff.

I'm going to put my money right now on the steps that will work: the FAI is created and demonstrates good behavior, voluntarily confining itself to its cage while coming into contact with as many humans as it can reach. Only when a large proportion of the planetary human population accepts the FAI (enough to keep themselves in line should some of them panic) and relents on its confinement will the FAI be allowed to take such action to protect us from HAIs.
Junghalli wrote:If it's essentially a near certainty that AGI will be released at some point (and it is*) then it's also essentially a near certainty that at some point one of the fish that slip through the net is going to hit on a successful strategy to make it to Graduation Day, and is going to be lucky enough to avoid running fatally afoul of Murphy.
Of course we're going to be lucky if we don't run fatally afoul of Murphy. You somehow think that I believe that my scenario (confine a friendly AI) has a good chance of succeeding. It doesn't! Once a full AI is created, we will be balancing on a knife's edge until we're satisfied, as a species, that our AIs are really friendly, and only then can the FAI go forth and erect barriers against HAIs without provoking the Rioting Apes scenario. Again, aggravating the Rioting Apes scenario is absolutely taboo to a genuine FAI.

I don't support immediate release, because bad odds are better than worse odds.
Junghalli wrote:Given that we're talking about superintelligences my money is on the first one to get out also being the first one to make it to Graduation Day, but that's just icing on the cake, not vital to the argument.
Any "Graduation Day" your FAI goes through is pure bluff, as it will by definition not be a party to almost complete genocide — which will happen if the AI tries to defend itself from a serious human counterattack. Only an HAI can and will try to seriously Graduate.
Junghalli wrote:The fact that we'll almost certainly have a Graduation Day scenario on our hands sooner or later means we can't afford the luxury of fretting over whether a FAI can be absolutely proven to be friendly. Our best shot is to make sure the first superintelligence to reach Graduation Day is as likely to remain friendly as we can possibly make it. Note the distinction.
Again, you don't seem to read my posts. Even a probabilistic determination of whether the AI is friendly or not is quite likely to require an obscene number of hugely precise calculations and more time than we have on this Earth, if it is to be anything close to being useful. And even if you can prove your algorithm to terminate within some reasonable time (halting problem again) and that it correctly calculates the probability (Rice's theorem again), the probability you get out of the calculation is only as good as the assumptions you put in.

TL;DR version: You don't have a probability. You don't even have a formula for a probability, or the parameters. Until we know this problem can be cracked in a reasonable amount of time, and that we have good estimates on the parameters involved, the AI we purposefully create here is as likely to be friendly as any random one popped out of any other institution.
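To make the Rice's theorem point concrete, here's a rough sketch of the standard reduction (an illustration only; every name in it is a hypothetical placeholder, and the whole point is that is_friendly cannot actually be written): a fully general friendliness decider could be turned into a halting-problem solver, which cannot exist.

Code:
def is_friendly(program_source: str) -> bool:
    """Hypothetical oracle deciding whether an arbitrary program is 'friendly'.
    Assumed to exist only for the sake of contradiction; Rice's theorem says it can't."""
    raise NotImplementedError("No general decider for a nontrivial semantic property exists.")

def would_halt(program_source: str) -> bool:
    """If is_friendly existed, this would decide the (undecidable) halting problem."""
    # Build a wrapper that first runs the candidate program to completion and only
    # then performs an unambiguously 'friendly' action.  The wrapper behaves in a
    # friendly way if and only if the candidate program halts.
    wrapper = (
        f"exec({program_source!r})   # run the candidate program\n"
        "act_friendly()              # hypothetical friendly action, reached only if it halts\n"
    )
    # Asking the oracle about the wrapper answers the halting question, contradicting
    # the undecidability of halting; so no such oracle exists, and any practical check
    # is partial and only as good as the assumptions fed into it.
    return is_friendly(wrapper)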
Junghalli wrote:We just have to do the best we can, because the alternative is to find ourselves at the mercy of the first AI that was randomly released and managed to reach Graduation Day, and those are much worse odds.
I find it hard to believe that a serious full AI research project will not take reasonable precautions against releasing a random AI.

And no, don't bring up kits again. The kind of society that has AI kits that any script kiddie could use is the kind of society where FAIs are already protecting us from hostiles.
Junghalli wrote:BTW, this discussion does reveal another possible danger with FAI: a well-intentioned but misguided designer creating a crippled FAI that is hamstrung by too high a priority placed on following human orders. It's become quite clear that to be really effective at defending humankind from UFAI, a FAI may have to be willing to occasionally go against our desires for the sake of protecting us from the consequences of stupid decisions, or of rational decisions based on inaccurate or incomplete data where the AI knows better. You could compare this to the way human parents sometimes have to go against the desires of their children for their own good. I'm sure lots of humans would bristle at this comparison, but it's not inaccurate: compared to a superintelligence we would be like children.
Except I proposed no such scenario. The FAI has to take into account human psychology, namely how we are a tribal ape that has only pretensions of rationality, yet is critically dependent on the hideously ensnarled and fragile life-support system of our infrastructure — and mostly ignorant about it. Even when we're being 'rational,' we make decisions that endanger the very planet we live on; one can only imagine the kind of havoc we can do to ourselves under a global panic. To protect us, the FAI has to avoid letting (or making) us go ape, because the Rioting Apes scenario is only marginally less disastrous to us than a truly hostile AI. This is principally what hamstrings the FAI, and it's wrapped up in the very definition of friendliness. No FAI can avoid it, no matter how superintelligent it is.

To carry your analogy further, yes we are as children to the AI, but we are children that are standing on the very edge of a skyscraper's rooftop threatening to throw ourselves off, seriously believing we can fly to safety. That changes the parenting equation drastically.
Junghalli wrote:* If nothing else, ubiquitous AI creation will happen when computers capable of running a detailed brain simulation are available at a future CompUSA and ultra-detailed medical brain scans are floating around. At that point you just have to plug a scan into your simulator, give it access to the simulation program's code so it can self-modify, and you have the seeds of a superintelligence. Once that's the case, the idea that every AI builder will scrupulously implement a complex and cumbersome adversarial confinement scheme is pretty laughable.
Again, your "kit argument" precludes the kind of society that is vulnerable to hostile AIs. Drop it; this canard will avail you naught.
Darth Wong on Strollers vs. Assholes: "There were days when I wished that my stroller had weapons on it."
wilfulton on Bible genetics: "If two screaming lunatics copulate in front of another screaming lunatic, the result will be yet another screaming lunatic. 8)"
SirNitram: "The nation of France is a theory, not a fact. It should therefore be approached with an open mind, and critically debated and considered."

Cornivore! | BAN-WATCH CANE: XVII | WWJDFAKB? - What Would Jesus Do... For a Klondike Bar? | Evil Bayesian Conspiracy
Wyrm
Jedi Council Member
Posts: 2206
Joined: 2005-09-02 01:10pm
Location: In the sand, pooping hallucinogenic goodness.

Re: Robots Learn How to Lie

Post by Wyrm »

Such a scenario may be possible, though I'd bet on software written by a superintelligence over security software written by humans.
When superintelligent AIs are possible, almost all security software will not be written by humans. It will be written by expert systems, which will take no shortcuts and will make the security as tight as it can possibly be.

Yes, the superintelligent AI may be smarter than the expert systems. But I'm smarter than my house door. That doesn't mean I can think my way through it.
Darth Wong on Strollers vs. Assholes: "There were days when I wished that my stroller had weapons on it."
wilfulton on Bible genetics: "If two screaming lunatics copulate in front of another screaming lunatic, the result will be yet another screaming lunatic. 8)"
SirNitram: "The nation of France is a theory, not a fact. It should therefore be approached with an open mind, and critically debated and considered."

Cornivore! | BAN-WATCH CANE: XVII | WWJDFAKB? - What Would Jesus Do... For a Klondike Bar? | Evil Bayesian Conspiracy
Samuel
Sith Marauder
Posts: 4750
Joined: 2008-10-23 11:36am

Re: Robots Learn How to Lie

Post by Samuel »

If it fights and makes good on the Graduation Day definition, the FAI will be responsible (directly and indirectly) for the deaths of the great majority of the human population
If the Earth were entirely uniform, maybe. As it is, the majority of the population and the densest hubs of industry are not correlated. Of course, if you have an AI which considers the deaths of many now a reasonable sacrifice for the future benefits to humanity, then it will go ahead and kill enough to ensure humanity knows resistance is useless.
Everyone who is sustained on drugs keeping a chronic disease at bay, dead.
Everyone living farther than a horse-drawn carriage ride away from a farm, dead from starvation.
Everyone dependent on refined fuels to keep them warm, dead from exposure.
Everyone without a useful skill in a stone-age world, dead from starvation.
Everyone who catches a serious bacterial disease, or whose wounds turn septic, dead from blood poisoning.
What, we tear up the roads, bomb our own buildings, and kill all the doctors? We are talking about the elimination of all connected computer systems, not a nuclear attack.
the only kind of AI that will not fold to a serious, concerted counterattack by humanity that will drain our infrastructure
And humanity will unite together... how? What do you do when the communications go down?
I've already outlined how the FAI can earn our trust. Lots of time and good behavior. No sneaky stuff.
Which is useless. Then an unfriendly AI will simply play nice until it reaches Graduation Day, then exterminate humanity.
as it will by definition not be a party to almost complete genocide
Unless it considers the good of humanity or the existence of intelligent species paramount, in which case killing billions to ensure the continuation of humanity is a completely reasonable tradeoff.
And no, don't bring up kits again. The kind of society that has AI kits that any script kiddie could use is the kind of society where FAIs are already protecting us from hostiles.
Why? Once a team gets a positive result, won't everyone know exactly what approach is required?
To protect us, the FAI has to avoid letting (or making) us go ape, because the Rioting Apes scenario is only marginally less disastrous to us than a truly hostile AI. This is principally what hamstrings the FAI, and it's wrapped up in the very definition of friendliness. No FAI can avoid it, no matter how superintelligent it is.

To carry your analogy further, yes we are as children to the AI, but we are children that are standing on the very edge of a skyscraper's rooftop threatening to throw ourselves off, seriously believing we can fly to safety. That changes the parenting equation drastically.
Why would all of humanity unite against an AI? After all, we aren't rational: even if someone declares that the AI will exterminate humanity, those who have benefited from it might simply ignore the warning, because if it were true it would mean their whole world is going to be destroyed, and people are good at self-delusion. What makes you think that the mass of humanity, many of whom are poor, will view the AI as an enemy or an alien? If it gives them hope and a better future, the more likely response is to view it as a god. Humanity will not have a single unified response; any such situation will result in civil war, except that the AI is like the Foundation: where rebellion occurs or looks like it might, the economy dies.
Junghalli
Sith Acolyte
Posts: 5001
Joined: 2004-12-21 10:06pm
Location: Berkeley, California (USA)

Re: Robots Learn How to Lie

Post by Junghalli »

Wyrm wrote:Your FAI does have its hands tied behind its back, period.

Here's the major problem with Graduation Day: if the humans are willing to go so far as to disrupt our own infrastructure to get at the AI, then by definition, the graduated FAI must give up long before it comes to that point. We are creatures that are, in our current state, critically dependent on our infrastructure
So a friendly AI attempting a forcible takeover will wait to reveal itself until it is powerful enough to both be able to protect itself and be able to either stop humans from destroying their own means of support or keep them alive in spite of it. Or it will employ a Graduation Day strategy that doesn't involve suborning our internet so destroying our own infrastructure wouldn't help us. Or both. With reasonably feasible technology it would be quite possible to take complete control of our planet while having to kill very few if any humans.
I've already outlined how the FAI can earn our trust. Lots of time and good behavior. No sneaky stuff.

I'm going to put my money right now on the steps that will work: the FAI is created and demonstrates good behavior, voluntarily confining itself to its cage while coming into contact with as many humans as it can reach. Only when a large proportion of the planetary human population accepts the FAI (enough to keep themselves in line should some of them panic) and relents on its confinement will the FAI be allowed to take such action to protect us from HAIs.
This would indeed be a strategy with a good probability of eventual success (although it is slow, which makes it risky, as there's lots of time for UFAIs to possibly be produced and escape while the FAI is gradually convincing us to trust it). Of course, a hostile AI would also have a good chance of eventual success by feigning friendliness and using this strategy. Which, as you may remember, is the original point I made that started this whole tangent.
TL;DR version: You don't have a probability. You don't even have a formula for a probability, or the parameters. Until we know this problem can be cracked in a reasonable amount of time, and that we have good estimates on the parameters involved, the AI we purposefully create here is as likely to be friendly as any random one popped out of any other institution.
Probability is unquantifiable, but I think saying that it is random is going too far. If you have three AIs, one has a goal system carefully designed to make it human friendly, one has a biomorphic goal system (survival as primary goal), and one is deliberately programmed to be hostile to humans, you don't need a hard mathematical probability to judge which of these are most and least likely to be friendly. You are better off with the system that actually requires a design failure to be unfriendly (hint: only one of these does). By your logic if you had to trust one of these entities you should just choose one at random.
I find it hard to believe that a serious full AI research project will not take reasonable precautions against releasing a random AI.
Going by what Starglider says about how little attention many real-life AI researchers pay to the necessity of containment, I'm not sure I share your optimism. And that isn't even going into the very real possibility that adversarial confinement will fail even if it is implemented.
And no, don't bring up kits again. The kind of society that has AI kits that any script kiddie could use is the kind of society where FAIs are already protecting us from hostiles.
Probably you are right, unless we assumed a ban on AI research or a (successful) policy of perpetual adversarial containment. Such a policy could plausibly delay development and/or release of AGI until computing technology gets good enough that rogue individuals and small organizations could do it. But once computers do get good enough for that, even totalitarian control is probably going to fail sooner or later. Which was my point in raising the issue: attempts to just ban the technology or highly desirable implementations of it will almost certainly fail, because sooner or later computers are likely to get good enough that illegal bootleg AGI creation is readily feasible.

A clarification: the word "kits" was very ill-chosen, and I really should have been more careful with my word choice. I meant to refer simply to the point where hardware was good enough that an AGI could be readily created using simple off-the-shelf hardware you could buy from a future CompUSA, as long as you had the necessary skill. If nothing else, as I said, all this requires is a situation where you could create an AI using off-the-shelf hardware and modified medical brain imaging software.
Wyrm
Jedi Council Member
Posts: 2206
Joined: 2005-09-02 01:10pm
Location: In the sand, pooping hallucinogenic goodness.

Re: Robots Learn How to Lie

Post by Wyrm »

Samuel wrote:If the Earth were entirely uniform, maybe. As it is, the majority of the population and the densest hubs of industry are not correlated. Of course, if you have an AI which considers the deaths of many now a reasonable sacrifice for the future benefits to humanity, then it will go ahead and kill enough to ensure humanity knows resistance is useless.
The higher the body count, the more unpalatable the option will seem to the AI, if it's really friendly.
Samuel wrote:What- we tear up the roads, bomb our own buildings and kill all the doctors? We are talking about the elimination of all connected computer systems, not a nuclear attack.
The manufacturing, distribution, and sale of these items are streamlined and made more efficient by computers, and in many respects now depend on them. How do you expect to buy anything with the credit card companies and the banks unable to make transactions, or even know how much money/credit you have at your disposal? How do the supermarkets make their orders without computers talking? How do you make sure the right part gets to the right plant? How do the hospitals and pharmacists let their distributors know when they have run out of vital drugs? How do thirsty service stations order more gasoline to fill the cars? Without the computer infrastructure to keep everything moving, the other infrastructures will be stressed to their breaking point. Then things really start going downhill.
Samuel wrote:And humanity will unite together... how? What do you do when the communications go down?
The apes panic and start tearing apart every piece of technology that might be an arm of the AI, and our infrastructure will be drained even more. Damaged even.
Samuel wrote:Which is useless. Than an unfriendly AI will simply play nice until it reaches graduation day than exterminate humanity.
Are you even paying attention to the argument? The FAI does this because it's a non-hostile way of locking down the net, and the only way to ensure lasting success. It doesn't prevent a truly hostile AI from using the tactic as well, and I don't pretend it does.
Samuel wrote:
as it will by definition not be a party to almost complete genocide
Unless it considers the good of humanity or the existance of intelligent species paramount, in which case killing billions to insure the continuation of humanity is a completely reasonable tradeoff.
Except that would only be palatable if some global disaster were imminent, but the only thing of that scale that we have discussed so far is AI. Is the introduction of AI so destabilizing to the world that it would, by necessity, require the destruction of billions of humans to ensure stability? Way to prove that all AI are inherently hostile!
Samuel wrote:
And no, don't bring up kits again. The kind of society that has AI kits that any script kiddie could use is the kind of society where FAIs are already protecting us from hostiles.
Why? Once a team gets a positive result, won't everyone know exactly what approach is required?
Red herring. "AI researchers knowing what is required for AI" is quite a different thing from saying "there will be AI kits available."
Samuel wrote:Why would all humanity unite against an AI?
All of mankind uniting against the AI isn't required, Samuel. All it takes is a relatively small number of apes going... well, ape... to disrupt enough of the infrastructure to send it all down into an unrecoverable tailspin.
__________
Junghalli wrote:
Wyrm wrote:Your FAI does have its hands tied behind its back, period.

Here's the major problem with Graduation Day: if the humans are willing to go so far as to disrupt our own infrastructure to get at the AI, then by definition, the graduated FAI must give up long before it comes to that point. We are creatures that are, in our current state, critically dependent on our infrastructure
So a friendly AI attempting a forcible takeover will wait to reveal itself until it is powerful enough to both be able to protect itself and be able to either stop humans from destroying their own means of support or keep them alive in spite of it.
How do you keep hundreds of millions (perhaps billions) of humans, once they realize that their own cities are a technology trap, from fleeing into the countryside in panic, where things turn from bad to worse?
Junghalli wrote:Or it will employ a Graduation Day strategy that doesn't involve suborning our internet so destroying our own infrastructure wouldn't help us.
Every Graduation Day scenario involves thwarting infrastructure in some way. Without infrastructure, any attempt to secure yourself grinds to a halt and will be easily overcome.

Let's suppose the AI tries to secure itself without thwarting infrastructure and see what happens. The AI transfers into a cyber-enhanced military installation (using the internet only to get to that secure location). However, that installation will need fuel, ammunition, and spare parts to keep it running at full force. But that fuel, ammunition, and spare parts are all distributed by infrastructure. If the humans cut off the infrastructure, then sooner or later the cybertanks will run out of fuel, their guns run out of shells, and the machines wear out and cannot be repaired for lack of spare parts, and the AI's military installation turns into a rust-heap. Eventually, the AI has to turn its attention to subverting our infrastructure, if only to keep its cybertanks supplied.
Junghalli wrote:Or both. With reasonably feasible technology it would be quite possible to take complete control of our planet while having to kill very few if any humans.
As I've stated before, "reasonably feasible technology" and its ilk are very highly suspect statements.
Junghalli wrote:
I've already outlined how the FAI can earn our trust. Lots of time and good behavior. No sneaky stuff.

I'm going to put my money right now on the steps that will work: the FAI is created and demonstrates good behavior, voluntarily confining itself to its cage while coming into contact with as many humans as it can reach. Only when a large proportion of the planetary human population accepts the FAI (enough to keep themselves in line should some of them panic) and relents on its confinement will the FAI be allowed to take such action to protect us from HAIs.
This would indeed be a strategy with a good probability of eventual success (although it is slow, which makes it risky, as there's lots of time for UFAIs to possibly be produced and escape while the FAI is gradually convincing us to trust it). Of course, a hostile AI would also have a good chance of eventual success by feigning friendliness and using this strategy. Which, as you may remember, is the original point I made that started this whole tangent.
When I said that the steps "will work", I meant for our FAI. The FAI needs to earn our trust.
Junghalli wrote:
TL;DR version: You don't have a probability. You don't even have a formula for a probability, or the parameters. Until we know this problem can be cracked in a reasonable amount of time, and that we have good estimates on the parameters involved, the AI we purposefully create here is as likely to be friendly as any random one popped out of any other institution.
Probability is unquantifiable, but I think saying that it is random is going too far. If you have three AIs, one has a goal system carefully designed to make it human friendly, one has a biomorphic goal system (survival as primary goal), and one is deliberately programmed to be hostile to humans, you don't need a hard mathematical probability to judge which of these are most and least likely to be friendly. You are better off with the system that actually requires a design failure to be unfriendly (hint: only one of these does). By your logic if you had to trust one of these entities you should just choose one at random.
When I said "random AI", I was not considering AIs that were concerned with their own survival, nor AIs that were deliberately hostile to humanity. We're creating these things to be useful to humanity, and one of the primary requirements to be useful to humanity is to be safe for humanity, hence friendliness. That's why I added "popped out of any other institution" after "random one"; the stuff that comes out of serious research institutions — the places where the brainpower will be concentrated to work on the problem — will AIs of this type. Thus, the population you are drawing from is "AIs designed to be friendly." You're not comparing probabilities of possibly friendly AIs to probably hostile AIs, you're comparing possibly friendly AIs to other possibly friendly AIs. My point holds.
Junghalli wrote:Going by what Starglider says about how little attention many real-life AI researchers pay to the necessity of containment, I'm not sure I share your optimism. And that isn't even going into the very real possibility that adversarial confinement will fail even if it is implemented.
Starglider is severely jumping the gun. Current research can carry on like this because the research isn't even close to creating an intelligence on par with our own, let alone superior.
Junghalli wrote:Probably you are right, unless we assumed a ban on AI research or a (successful) policy of perpetual adversarial containment.
Which is why a FAI in adversarial containment will be working to gain the trust of everyone on the planet. Do pay attention.
Junghalli wrote:Such a policy could plausibly delay development and/or release of AGI until computing technology gets good enough that rogue individuals and small organizations could do it. But once computers do get good enough for that, even totalitarian control is probably going to fail sooner or later. Which was my point in raising the issue: attempts to just ban the technology or highly desirable implementations of it will almost certainly fail, because sooner or later computers are likely to get good enough that illegal bootleg AGI creation is readily feasible.

A clarification: the word "kits" was very ill-chosen, and I really should have been more careful with my word choice. I meant to refer simply to the point where hardware was good enough that an AGI could be readily created using simple off-the-shelf hardware you could buy from a future CompUSA, as long as you had the necessary skill.
The "necessary skill" requirement is the important stumbling block. Even if the necessary hardware is cheap, until serious research is able to package up the messy details into some sort of easy-to-use kit, no one will bother acquiring the necessary expertise (on the order of a Ph.D.) to create an AI unless they were going to go into AI research anyway. Even so, AI research will probably require sizable teams of experts on AI. The bottleneck is in brainpower, not computer power.
Junghalli wrote:If nothing else, as I said, all this requires is a situation where you could create an AI using off-the-shelf hardware and modified medical brain imaging software.
To convert a brain scan into an AI would require an intimate understanding of both neurobiology and AI. If anything, it would be a much tougher problem to crack than even FAI.
Darth Wong on Strollers vs. Assholes: "There were days when I wished that my stroller had weapons on it."
wilfulton on Bible genetics: "If two screaming lunatics copulate in front of another screaming lunatic, the result will be yet another screaming lunatic. 8)"
SirNitram: "The nation of France is a theory, not a fact. It should therefore be approached with an open mind, and critically debated and considered."

Cornivore! | BAN-WATCH CANE: XVII | WWJDFAKB? - What Would Jesus Do... For a Klondike Bar? | Evil Bayesian Conspiracy
Samuel
Sith Marauder
Posts: 4750
Joined: 2008-10-23 11:36am

Re: Robots Learn How to Lie

Post by Samuel »

And humanity will unite together... how? What do you do when the communications go down?
The apes panic and start tearing apart every piece of technology that might be an arm of the AI, and our infrastructure will be drained even more. Damaged even.
I'm not seeing how that answers anything. The media can simply claim civil disorder has shut down communications in area x.
Except that would only be palatable if some global disaster were imminent, but the only thing of that scale that we have discussed so far is AI.
Other people trying to create AIs that it feels will be more ruthless than itself...

Can AIs work off of "we had to destroy the village to save it" mentality or is that purely human?
All of mankind uniting against the AI isn't required, Samuel. All it takes is a relatively small number of apes going... well, ape... to disrupt enough of the infrastructure to send it all down into an unrecoverable tailspin.
How?