Robots Learn How to Lie

SLAM: debunk creationism, pseudoscience, and superstitions. Discuss logic and morality.

Moderator: Alyrium Denryle

Junghalli
Sith Acolyte
Posts: 5001
Joined: 2004-12-21 10:06pm
Location: Berkeley, California (USA)

Re: Robots Learn How to Lie

Post by Junghalli »

Samuel wrote:I'm not seeing how that answers anything. The media can simply claim civil disorder has shut down communications in area x.
This is a point worth expanding on. A superintelligence should easily be able to imitate humans in electronic media (computers, TV, radio, telephones, etc.). If an AI has really suborned many of the computers on the planet, it will easily be able to feed people false information about what's going on. At that point it shouldn't be too hard for it to fake calls, announcements, etc. from human authorities saying that everything is under control. In an AI takeover scenario it very likely wouldn't be too hard for the AI to see to it that most people never get a clue what's really going on until it's too late to do anything about it.
Wyrm wrote:The higher the body count, the more unpalatable the option will seem to the AI, if it's really friendly.
Of course, if it came down to a choice between saving humanity at the cost of a high body count and letting an AI that desires the destruction of humanity go about its agenda unopposed a well-designed FAI would take the high body count as the lesser of two evils. A well-designed FAI will try to minimize disruption to human civilization in getting to the point where it can protect us from future UFAIs that may arise, but if it senses a well-defined threat already out there it will go all out if it has to.
How do you keep hundreds of millions (perhaps billions) of humans, once they realize they are in a technology trap of their own cities, from fleeing into the countryside in panic, where things turn from bad to worse?
Slap-drone each person (assign them a robot which watches them continually and restrains them from doing anything stupid). Alternately, set up a cordon around the cities and contain everybody who tries to leave using nonlethal methods (knockout gas, shock stunning etc.). Of course, these both require substantial industrial capacity, see below.
Let's suppose the AI tries to secure itself without thwarting infrastructure and see what happens. The AI transfers into a cyber-enhanced military installation (using the internet only to get to that secure location). However, that installation will need fuel, ammunition and spare parts to keep it running at full force. But that fuel, ammunition and spare parts are all distributed by infrastructure. If the humans cut off the infrastructure, then sooner or later the cybertanks will run out of fuel, their guns will run out of shells, the machines will wear out and cannot be repaired for lack of spare parts, and the AI's military installation turns into a rust heap. Eventually, the AI has to turn its attention to subverting our infrastructure, if only to keep its cybertanks supplied.
Naturally, no superintelligence would be that stupid. What it would likely do is try to gather human collaborators who it could use to help build it an automated factory (suborning the dictator of some Third World country would probably be ideal). Getting them to build it a universal assembler would be ideal as it would allow it to do much or all of its work from small, easily hidden facilities, but if that's impractical with the available technology and resources it's not necessary, nor does the first factory have to be anything but a horrible kludge. All that matters is that it's good enough to produce a second, better version of itself. From there, bootstrapping up a robot army capable of easily defeating humanity is a simple matter of exponential growth in classic Von Neumann fashion. Of course, such a strategy would probably not be quick, and hiding all this activity from humans would be the big challenge, but there are conceivable ways that could be done (the optimal strategy would depend on exactly what technology and resources the AI had access to).
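As a back-of-the-envelope illustration of that exponential claim (a minimal sketch only: the one-copy-per-cycle rate and the million-factory target are arbitrary assumptions, not figures from the scenario):

    # Toy doubling model: each factory builds one copy of itself per cycle.
    factories = 1
    cycles = 0
    while factories < 1_000_000:
        factories *= 2
        cycles += 1
    print(cycles, factories)   # 20 cycles -> 1,048,576 factories

However long a single replication cycle takes, the time to reach any given industrial capacity grows only with the logarithm of that capacity, which is what makes the strategy attractive.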
When I said that the steps "will work", I meant for our FAI. The FAI needs to earn our trust.
Indeed, it is probably the least risky strategy.

BTW I take it you have quietly conceded your earlier point that a truly friendly AI would not want to be de-boxed.
When I said "random AI", I was not considering AIs that were concerned with their own survival, nor AIs that were deliberately hostile to humanity. We're creating these things to be useful to humanity, and one of the primary requirements to be useful to humanity is to be safe for humanity, hence friendliness. That's why I added "popped out of any other institution" after "random one"; the stuff that comes out of serious research institutions — the places where the brainpower will be concentrated to work on the problem — will AIs of this type.
Hopefully. Depending on how good computing technology is it's hardly inconceivable that AIs could be created by people we really don't want to see creating them, like terrorist groups. Such people would both be highly likely to deliberately create an AI that's hostile to the interests of many humans and to go about building their monster in an inept way that causes it to turn into a full-blown anti-human UFAI. We really, really want to have FAI ready to go by the time computer technology gets good enough that you don't need a huge super-expensive project to create an AGI.
The bolded part is the important stumbling block. Even if the necessary hardware is cheap, until serious research is able to package up the messy details into some sort of easy-to-use kit, no one will bother acquiring the necessary expertise (on the order of a Ph.D.) to create an AI unless they were going to go into AI research. Even so, AI research will probably require sizable teams of experts on AI. The bottleneck is in brainpower, not computer power.
That's not incredibly reassuring. AI researchers are human beings with human failings. Get enough of them running around and sooner or later one of them is bound to do something malicious or stupid.

Plus in the scenario I outlined these brain simulators would probably be a modification of programs that would already be used in medicine. You probably wouldn't need to be a world class programmer to convert them.
To convert a brain scan into an AI would require an intimate understanding of both neurobiology and AI. If anything, it would be a much tougher problem to crack than even FAI.
Oh yes, you'll probably be able to produce reasonably safe FAI long before it gets this far. This is an issue for misguided attempts to keep Pandora's box closed forever by simply banning AGI, or trying to mandate that every AI ever produced be kept in a box in perpetuity. It is the reason those approaches are almost certainly unsustainable.
Samuel
Sith Marauder
Posts: 4750
Joined: 2008-10-23 11:36am

Re: Robots Learn How to Lie

Post by Samuel »

What it would likely do is try to gather human collaborators who it could use to help build it an automated factory (suborning the dictator of some Third World country would probably be ideal).
A dictator would be bad. They can always flee and retire rich. What you want is a populist: someone promising sweeping reform, industrialization and improvements in the standard of living. Basically, copy Hugo Chavez: blame problems on outside forces and use that as an excuse to keep control of information and stop questions about the industrialization or where the money is coming from. Help the impoverished so the government is wildly popular and any accusation of an evil AI will be dismissed as propaganda or jealousy.
Wyrm
Jedi Council Member
Posts: 2206
Joined: 2005-09-02 01:10pm
Location: In the sand, pooping hallucinogenic goodness.

Re: Robots Learn How to Lie

Post by Wyrm »

Samuel wrote:I'm not seeing how that answers anything. The media can simply claim civil disorder has shut down communications in area x.
The people living in area X will not be able to remain incommunicado for long. The claimed civil disorder will become actual civil disorder unless real communications are restored quickly. It's one of those vital infrastructures, you know — it serves a purpose in our society; it's not just a means for your mother-in-law to annoy you.
Samuel wrote:Other people trying to create AIs that it feels will be more ruthless than itself...
Where will they come from? The only places they can come from are research facilities with a concentration of highly-trained brainpower, staffed by people who would presumably understand the risks and program accordingly. Those people will have a fucking clue what they're doing. This means that the likely products of those projects are more friendly AIs, which means that acceptance of AIs may be sped up by showing that it can work multiple times.

The AIs are only dangerous when any schmuck with an AI-ready computer has their own AI base code to play with.
Samuel wrote:Can AIs work off of "we had to destroy the village to save it" mentality or is that purely human?
No FAI will use that excuse. It doesn't even work against humans with half a brain, and that FAI is supposed to be superintelligent, remember?
Samuel wrote:
All of mankind uniting against the AI isn't required, Samuel. All it takes is relatively few apes going... well, ape... to disrupt enough of the infrastructure to send it all into an unrecoverable tailspin.
How?
Sledgehammers being applied to computer boxes. This isn't hard.

_______________
Junghalli wrote:
Samuel wrote:I'm not seeing how that answers anything. The media can simply claim civil disorder has shut down communications in area x.
This is a point worth expanding on. A superintelligence should easily be able to imitate humans in electronic media (computers, TV, radio, telephones, etc.). If an AI has really suborned many of the computers on the planet, it will easily be able to feed people false information about what's going on. At that point it shouldn't be too hard for it to fake calls, announcements, etc. from human authorities saying that everything is under control. In an AI takeover scenario it very likely wouldn't be too hard for the AI to see to it that most people never get a clue what's really going on until it's too late to do anything about it.
False information is, by definition, false. The communication infrastructure is there for the purpose of exchanging real information. If the information is not real, then any activity depending on that information fails in spectacular ways.
Junghalli wrote:
Wyrm wrote:The higher the body count, the more unpalatable the option will seem to the AI, if it's really friendly.
Of course, if it came down to a choice between saving humanity at the cost of a high body count and letting an AI that desires the destruction of humanity go about its agenda unopposed a well-designed FAI would take the high body count as the lesser of two evils. A well-designed FAI will try to minimize disruption to human civilization in getting to the point where it can protect us from future UFAIs that may arise, but if it senses a well-defined threat already out there it will go all out if it has to.
Except the FAI has to choose between ALL options available to it, not just the ones you artificially shoehorn it into. It has been my argument that the FAI waiting for acceptance is the safest bet, and you've done little to undermine that assertion. As long as that remains true, the FAI will not consider options that will lead to great loss of human life.
Junghalli wrote:
How do you keep hundreds of millions (perhaps billions) of humans, once they realize they are in a technology trap of their own cities, from fleeing into the countryside in panic, where things turn from bad to worse?
Slap-drone each person (assign them a robot which watches them continually and restrains them from doing anything stupid).
And where in the world is the AI going to get hundreds of millions of drones to surveil everyone? They just don't appear out of thin air!

Furthermore, don't you see that just being out there is a danger to the fleeing humans? Those humans could only remain in the cities because the infrastructure was built around them to service them inside that city. Out in the wilderness, they're separated from that infrastructure and must instead rely on their (mostly nonexistent) wilderness survival skills. Even if they reach towns, they'll severely strain infrastructure that was never meant to carry that many people. Fleeing their metropolitan womb in the first place was the deadly mistake... the one that the FAI should've prevented.
Junghalli wrote:Alternately, set up a cordon around the cities and contain everybody who tries to leave using nonlethal methods (knockout gas, shock stunning etc.). Of course, these both require substantial industrial capacity, see below.
Cities cannot sustain themselves without a constant influx of materials. That requires at least trucks and trains and things going into the city and waste coming out. If someone sneaks out on the waste shipment, you can kiss your communications blackout goodbye. So trucks and things come in, but they won't go out. Someone is going to figure out that the city is a black hole for delivery trucks and not send them, which will cause everyone inside to run out of food and riot within days, starving within weeks while wallowing in their own filth and trash. Way to break it, hero.
Junghalli wrote:Naturally, no superintelligence would be that stupid. What it would likely do is try to gather human collaborators who it could use to help build it an automated factory (suborning the dictator of some Third World country would probably be ideal). Getting them to build it a universal assembler <snip>
Automated factories and universal assemblers are not in evidence. The first AIs will be built on human-built and -maintained machines, serviced by infrastructure designed to service humans. Neither universal assemblers nor automated factories are implied by the existence of first-generation AIs, and I refuse to get sidetracked into a discussion about hyperbolic technologies with unknown properties and limitations.
Junghalli wrote:
When I said that the steps "will work", I meant for our FAI. The FAI needs to earn our trust.
Indeed, it is probably the least risky strategy.

BTW I take it you have quietly conceded your earlier point that a truly friendly AI would not want to be de-boxed.
If the FAI is forced to act as a guard dog against HAIs, then it becomes the lesser of two evils, and the AI will be as antsy as anyone about being let out, because it must have doubts concerning its own friendliness. It will be watching itself most closely of all.

If anyone suggests immediate release, the AI would refuse. Fortunately, if the team doesn't have their heads up their asses (reasonable, given that only top researchers will be able to make such a creature), that will not come up and the AI will have time to examine itself closely.
Junghalli wrote:Hopefully. Depending on how good computing technology is it's hardly inconceivable that AIs could be created by people we really don't want to see creating them, like terrorist groups. Such people would both be highly likely to deliberately create an AI that's hostile to the interests of many humans and to go about building their monster in an inept way that causes it to turn into a full-blown anti-human UFAI. We really, really want to have FAI ready to go by the time computer technology gets good enough that you don't need a huge super-expensive project to create an AGI.
Drop the "terrorist group AI" bullshit. Terrorist groups may want to create an AI to kill the infidels, but they simply will not have the skill to create one. People who go through the steps to earn freaking Ph.D.'s in artificial intelligence are not going to go work for terrorist groups, and even with the offchance they do, they'll easily be able to tell their sponsors that such a thing could easily boomarang back on them! Also, an AI will likely require a full team of these experts to be successful, each specializing on a different aspect of the problem. A terrorist group would be unspeakably lucky to get one Ph.D. from one of these areas, let alone a whole team of them.

The only way they're going to do it is with a working AI to modify.
Junghalli wrote:
The bolded part is the important stumbling block. Even if the necessary hardware is cheap, until serious research is able to package up the messy details into some sort of easy-to-use kit, no one will bother acquiring the necessary expertise (on the order of a Ph.D.) to create an AI unless they were going to go into AI research. Even so, AI research will probably require sizable teams of experts on AI. The bottleneck is in brainpower, not computer power.
That's not incredibly reassuring. AI researchers are human beings with human failings. Get enough of them running around and sooner or later one of them is bound to do something malicious or stupid.
The difference is that Ph.D.'s will not be blind to their shortcomings. Also, you missed the point that a real AI will need a team of them. One loopy Ph.D. is not going to be able to fuck things up.
Junghalli wrote:Plus in the scenario I outlined these brain simulators would probably be a modification of programs that would already be used in medicine. You probably wouldn't need to be a world class programmer to convert them.
What sort of medicine would require recording a brain image?

Also, you'd need to be a world-class Ph.D. in both neurophysiology and AI to convert a neuron-based intelligence into a silicon-based intelligence.
Junghalli wrote:This is an issue for misguided attempts to keep Pandora's box closed forever by simply banning AGI, or trying to mandate that every AI ever produced be kept in a box in perpetuity. It is the reason those approaches are almost certainly unsustainable.
Which, if you'd been paying attention, is not what I was advocating.

No, poking holes in someone else's position is not an endorsement of an opposing position. I jumped in when you claimed that an AI-in-a-box would be worthless, and argued with you about the details, but I never said that trying to keep Pandora's box closed was a viable perpetual strategy — in fact, because it isn't, I was kind of despairing that it would be yet one more way that we'd destroy ourselves, as if global warming, peak oil, and all that shit weren't enough.
Darth Wong on Strollers vs. Assholes: "There were days when I wished that my stroller had weapons on it."
wilfulton on Bible genetics: "If two screaming lunatics copulate in front of another screaming lunatic, the result will be yet another screaming lunatic. 8)"
SirNitram: "The nation of France is a theory, not a fact. It should therefore be approached with an open mind, and critically debated and considered."

Cornivore! | BAN-WATCH CANE: XVII | WWJDFAKB? - What Would Jesus Do... For a Klondike Bar? | Evil Bayesian Conspiracy
Samuel
Sith Marauder
Posts: 4750
Joined: 2008-10-23 11:36am

Re: Robots Learn How to Lie

Post by Samuel »

All of mankind uniting against the AI isn't required, Samuel. All it takes is relatively few apes going... well, ape... to disrupt enough of the infrastructure to send it all into an unrecoverable tailspin.
Sledgehammers being applied to computer boxes. This isn't hard.
Yeah, I don't see a few people being able to sufficiently disrupt the infrastructure with sledgehammers. I'm pretty sure they would be targeted by the FBI as luddite fanatics. You'd need a much broader base for smashing to happen.
Junghalli
Sith Acolyte
Posts: 5001
Joined: 2004-12-21 10:06pm
Location: Berkeley, California (USA)

Re: Robots Learn How to Lie

Post by Junghalli »

Samuel wrote:A dictator would be bad. They can always flee and retire rich. What you want is a populist: someone promising sweeping reform, industrialization and improvements in the standard of living. Basically, copy Hugo Chavez: blame problems on outside forces and use that as an excuse to keep control of information and stop questions about the industrialization or where the money is coming from. Help the impoverished so the government is wildly popular and any accusation of an evil AI will be dismissed as propaganda or jealousy.
Good point. Of course, I have no doubt a superintelligence will be vastly better at this game than I am.
Wyrm wrote:I never said that trying to keep Pandora's box closed was a viable perpetual strategy — in fact, because it isn't,
So we have been agreeing violently for four pages now. I kind of wish you'd made your core position clearer from the beginning because we could have avoided this whole exercise (though I'll grant it hasn't been entirely uninteresting). As long as you acknowledge that FAI is our best bet, we agree on what really matters. The rest is just squabbling over details, and we could do that for another year if you like (I can see a whole bunch of things in your last post I'd dispute), but I don't find the prospect of more tedious nitpicking over the plausibility of this or that particular poorly contextualized scenario or strategy particularly appealing. Unless something new comes up, or I see something that really screams out for a response or rebuttal, this will be my last post on the subject; that said, I'm willing to continue the discussion if you really want to.
Wyrm
Jedi Council Member
Posts: 2206
Joined: 2005-09-02 01:10pm
Location: In the sand, pooping hallucinogenic goodness.

Re: Robots Learn How to Lie

Post by Wyrm »

Samuel wrote:
All of mankind uniting against the AI isn't required, Samuel. All it takes is relatively few apes going... well, ape... to disrupt enough of the infrastructure to send it all into an unrecoverable tailspin.
Sledgehammers being applied to computer boxes. This isn't hard.
Yeah, I don't see a few people being able to sufficiently disrupt the infrastructure with sledgehammers. I'm pretty sure they would be targeted by the FBI as luddite fanatics. You'd need a much broader base for smashing to happen.
By "few," I mean millions, dearheart. That's why there's a "relatively" there.

Also, sledgehammers are only the most violent aspect of the infrastructure takedown. If you discover your computer was forcibly infected by arriving internet packets, the first thing you do is take it offline and start scrubbing. All those computers going offline to be scrubbed clean will break up the net, disrupting the vital infrastructure we've spent decades putting together. The data has to be restored from backups, which have to be examined too, given that the infection has been there for a while. If they can't be, then the systems have to be rebuilt from scratch. Steps also have to be taken to make sure the systems aren't reinfected, which will no doubt involve system changes that never go smoothly.

This is a tremendous drain on resources, and while it is going on, the computer infrastructure will be fragmented and useless for the purposes it had been used for before.

Although the physical boxes aren't destroyed, they might as well have had a sledgehammer taken to them.
Junghalli wrote:So we have been agreeing violently for four pages now. I kind of wish you'd made your core position clearer from the beginning because we could have avoided this whole exercise (though I'll grant it hasn't been entirely uninteresting).
Fucksakes, Jung, you were all "Just create a friendly AI and you don't have to lock it up!" That was the position I was arguing against, because that's the position you seemed to be arguing for. I argued against it because the proposition is ill-defined, and the most straightforward ways to define it properly lead to disastrous results. The property of friendliness is likely too difficult to solve analytically, and will probably only yield to actual experiment. That is, we won't have AIs that have any guarantee of friendliness until we study real AIs in action — you have to confine them if for no other purpose but to observe them going about their business.
Darth Wong on Strollers vs. Assholes: "There were days when I wished that my stroller had weapons on it."
wilfulton on Bible genetics: "If two screaming lunatics copulate in front of another screaming lunatic, the result will be yet another screaming lunatic. 8)"
SirNitram: "The nation of France is a theory, not a fact. It should therefore be approached with an open mind, and critically debated and considered."

Cornivore! | BAN-WATCH CANE: XVII | WWJDFAKB? - What Would Jesus Do... For a Klondike Bar? | Evil Bayesian Conspiracy
Ariphaos
Jedi Council Member
Posts: 1739
Joined: 2005-10-21 02:48am
Location: Twin Cities, MN, USA

Re: Robots Learn How to Lie

Post by Ariphaos »

Wyrm wrote: Fucksakes, Jung, you were all "Just create a friendly AI and you don't have to lock it up!" That was the position I was arguing against, because that's the position you seemed to be arguing for. I argued against it because the proposition is ill-defined, and the most straightforward ways to define it properly lead to disastrous results. The property of friendliness is likely too difficult to solve analytically, and will probably only yield to actual experiment. That is, we won't have AIs that have any guarantee of friendliness until we study real AIs in action — you have to confine them if for no other purpose but to observe them going about their business.
There are people who claim it is impossible to build a symbolic, non-evolutionary AGI, but they have so far proven nothing of the sort. If such an AGI is possible, its reasoning, decision making, modeling, planning and acting abilities will be describable via something vaguely like a predicate logic. Proving friendliness in such a situation is the entire point.
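Purely as a toy illustration of what "something vaguely like a predicate logic" could look like (the predicates here are invented for this example, not taken from any actual proposal), a friendliness constraint might be sketched as

    \forall a \in \mathrm{Actions}\; \big( \mathrm{Selects}(\mathrm{AGI}, a) \rightarrow \neg \exists h \in \mathrm{Humans}\; \mathrm{Harms}(a, h) \big)

and the proof obligation is then showing that the system's actual decision procedure entails a sentence of that form.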
Give fire to a man, and he will be warm for a day.
Set him on fire, and he will be warm for life.
Starglider
Miles Dyson
Posts: 8709
Joined: 2007-04-05 09:44pm
Location: Isle of Dogs

Re: Robots Learn How to Lie

Post by Starglider »

Sadly, I was not able to keep up with this thread. I feel compelled to point out, once again, that the following is tragically incorrect;
Wyrm wrote:The property of friendliness is likely too difficult to solve analytically, and will probably only yield to actual experiment. That is, we won't have AIs that have any guarantee of friendliness until we study real AIs in action — you have to confine them if for no other purpose but to observe them going about their business.
Firstly, there is no argument for friendliness being 'too difficult to solve', other than the fact that some AGI architectures are inherently opposed to formal analysis (aside from anything else, the encoding of goals and knowledge in most connectionist architectures could best be described as holographic, and the computational cost of losslessly extracting that into a clean description tends to be exponential). Formal AI goal system theory has received very little academic attention to date, but (relatively) good progress is being made regardless.

Secondly, you have no argument for why observation of the behavior of particular AIs in a particular simulated environment will generalise across all future incarnations of those AIs: there is no such argument. Generalisation from black-box observation is worthless even if we make the unreasonable assumptions that your simulations are sufficiently close to reality and the AIs aren't being actively deceptive. The primary reason being that AGIs will inevitably obtain the ability to self-modify, and restructuring the foundations of the goal system automatically invalidates your basis for extrapolation; although this is not strictly necessary, simple binding drift in response to novel stimuli is quite sufficient to render such simple extrapolation useless on most connectionist architectures.

The argument that white-box observation might enable Friendly AI to succeed where pure theory would fail is much more reasonable, but the sheer difficulty of the task means that if you are relying on observation to provide more than the last few critical pieces, you will almost certainly fail. Given the danger of trying to keep AGIs of uncertain Friendliness contained, it is a far better idea to try and develop a solid FAI theory first, and only resort to experimental methods as an absolute last resort. Yudkowsky would object to even that, on the basis that no researcher would be qualified to decide when the risk is justified (cue long rant about biases and irrationality), but I am not quite that extreme.

I do not expect you to get this right, when there are numerous (otherwise) highly qualified AGI researchers who get it wrong. Usually they just make the ludicrously anthropomorphic assumption that all intelligences are benevolent by default and that 'evil intent has to come from somewhere, and we won't introduce it'. Still, it's annoying.
Starglider
Miles Dyson
Posts: 8709
Joined: 2007-04-05 09:44pm
Location: Isle of Dogs

Re: Robots Learn How to Lie

Post by Starglider »

Xeriar wrote:If such an AGI is possible, its reasoning, decision making, modeling, planning and acting abilities will be describable via something vaguely like a predicate logic.
Since all AGI designs are relatively compact pieces of software, running on hardware that can be effectively treated as deterministic, technically they're all describable by predicate logic. For an FAI proof, you need compact models that can make hard predictions about the kind of optimisation pressure the system will apply to the universe, in particular the future configuration of the system's decision structure. The three most critical pieces are probably reflective stability of the core goals, 'causal cleanliness' and binding stability. The first is probably self-explanatory, the second basically means ensuring all activity has a valid expected utility justification (harder than it sounds in the practical world of reasoning under constraints - ensuring adequate scope is really a generalisation of the classic 'frame problem'). The third part means guaranteeing that the software inputs to the general utility function remain correlated with the external universe structures you originally specified, even under complete structural revision, and from where I'm sitting that currently looks like the hardest part of a formal proof.
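A crude toy sketch of the first two pieces in Python, with the caveat that the action set, the outcome model and every name here are invented for illustration and bear no resemblance to a real proof-carrying goal system: each choice carries an explicit expected-utility justification, and a proposed successor is scored with the current utility function rather than its own.

    # Toy illustration only: expected-utility justification for actions, plus a
    # reflective-stability check that judges a successor by the CURRENT goals.

    def expected_utility(action, outcome_probs, utility):
        # outcome_probs[action] maps outcomes to probabilities
        return sum(p * utility(o) for o, p in outcome_probs[action].items())

    def choose(actions, outcome_probs, utility):
        # every action taken has an explicit expected-utility justification
        return max(actions, key=lambda a: expected_utility(a, outcome_probs, utility))

    def accept_modification(old_behaviour, new_behaviour, outcome_probs, utility):
        # reflective stability, crudely: the successor's predicted behaviour is
        # evaluated with the current utility function, never the successor's own
        return (expected_utility(new_behaviour, outcome_probs, utility)
                >= expected_utility(old_behaviour, outcome_probs, utility))

    outcome_probs = {
        "act_cautiously": {"humans_fine": 0.99, "humans_harmed": 0.01},
        "act_rashly":     {"humans_fine": 0.60, "humans_harmed": 0.40},
    }
    utility = lambda outcome: 1.0 if outcome == "humans_fine" else -100.0

    print(choose(["act_cautiously", "act_rashly"], outcome_probs, utility))             # act_cautiously
    print(accept_modification("act_cautiously", "act_rashly", outcome_probs, utility))  # False

The hard parts named above (adequate scope, binding stability) are exactly what this toy glosses over by hard-coding the outcome model.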

Then there is the question of Friendly goal system content, but that's not so much a technical question as an ethical one (though encoding any such scheme accurately is certainly a technical challenge).
Ariphaos
Jedi Council Member
Posts: 1739
Joined: 2005-10-21 02:48am
Location: Twin Cities, MN, USA

Re: Robots Learn How to Lie

Post by Ariphaos »

I don't think we would want a fully deterministic AGI, even if such a thing was possible. It will ultimately need to make guesses, and use entropy for doing so - and occasionally for the sole purpose of unpredictability within some constraints.

Naturally, you don't want it doing that to its ethics, supergoals, utility determination, etc. I would not want it to restructure itself without supervision. Ever.

My main quibble with the 'seed AI' concept as a rule is that it assumes no iterative improvement in humans, rather than, say, you build... I don't know, call it a gamete AI, which helps improve its creators, who improve it, and so on, eventually resulting in the creation of a truly and mathematically provably 'fair' system authority, and most of humanity as superintelligences effectively operating under what (to them...us...whatever) amounts to a legal system that they have agreed to abide by - which could even vary significantly from star system to star system.
Give fire to a man, and he will be warm for a day.
Set him on fire, and he will be warm for life.
Starglider
Miles Dyson
Posts: 8709
Joined: 2007-04-05 09:44pm
Location: Isle of Dogs

Re: Robots Learn How to Lie

Post by Starglider »

Xeriar wrote:I don't think we would want a fully deterministic AGI, even if such a thing was possible. It will ultimately need to make guesses
That isn't what 'deterministic' means. Deterministic means that for any given set of inputs, the output is always predictable (using a relatively simple function). Computing hardware is almost deterministic for single-threaded programs; parallelism can introduce unpredictability due to timing issues, but usually not in a way that is relevant for AGI. 'Guesses' involving the use of probability theory are still deterministic operations, and so are software RNGs. Hardware RNGs are not, but firstly they're treated as input data like any other unpredictable world data, and secondly hardly anyone is using them (in AGI) anyway.
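A minimal Python illustration of that last point about software RNGs (nothing here is specific to any AGI system):

    import random

    # A seeded software RNG is a deterministic function of its inputs:
    # the same seed always produces the same "random" sequence.
    a = random.Random(42)
    b = random.Random(42)
    seq_a = [a.random() for _ in range(3)]
    seq_b = [b.random() for _ in range(3)]
    assert seq_a == seq_b   # identical sequences
    print(seq_a)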

In any case the kind of determinism required for FAI theory is not low-level determinism; for one thing the available computing power for any given cognitive task is unknown even if the information available is known. Nor is it the ability to predict the actual output of an intelligence using less computational power than the intelligence itself did; that is inherently impossible for near-optimal intelligences. What we require is the ability to prove that the FAI will not violate certain structural and behavioural constraints. This is (fortunately) a far more tractable task than trying to find a fully predictive theory of AGI behaviour more compact than the AGI itself.
and use entropy for doing so - and occasionally for the sole purpose of unpredictability within some constraints.
I think it was Yudkowsky who told me that any time you are tempted to use random numbers in your AGI design, you have failed as a designer. I would agree with that assessment. Once upon a time I was a fan of 'noise injection' for Bayesian systems, and of course genetic programming is highly reliant on random number generation (to the point that swapping out the standard RNG for a custom higher-randomness one once greatly improved a commercial GP project I was working on). However, a normative intelligence can always do better than random chance. The only use for an RNG is in some esoteric game-theoretic scenarios where the AI competes against a copy of itself (or a hypothetical similar AGI design).
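A toy sketch of the kind of scenario meant by that last sentence, under the added assumption (mine, for the example) that the agent's random draws are private even though its policy is known to the copy: in matching pennies, any deterministic policy is fully exploited, while mixing recovers the game's value.

    import random

    # The "matcher" wins when the two moves are equal, so the agent below scores
    # when the moves differ. The opponent predicts by running a copy of the
    # agent's policy, but cannot see the agent's private random draw.

    def play(policy, rounds=100_000):
        agent_score = 0
        for _ in range(rounds):
            prediction = policy()   # the copy's guess at the agent's move
            move = policy()         # the agent's actual move
            agent_score += (move != prediction)
        return agent_score / rounds

    print(play(lambda: "heads"))                             # 0.0 -- deterministic play is always exploited
    print(play(lambda: random.choice(["heads", "tails"])))   # ~0.5 -- the best guarantee available here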
My main quibble with the 'seed AI' concept as a rule is that it assumes no iterative improvement in humans,
Humans don't have the ability to improve our own hardware. All you can do is educate people, a process which has strict limits. You can't double your short-term memory capacity (from 7±2 chunks to 14±4 chunks) no matter how hard you wish for it to happen.

You can use technology to interface human brains to computers, or upload them entirely, and improve them that way, but that's a whole different ball game. Theoretically you could use genetic engineering, but I don't expect that to get social approval any time soon.
I don't know, call it a gamete AI, which helps improve its creators, who improve it, and so on,
That has been seriously proposed as an FAI scheme: build an AGI that works out how to upload some humans, then proceed from there. It isn't unreasonable, although I would say that if we have the technology to do that reliably we could probably do better. However, it turns out that 90% of the effort in solving FAI is not directly related to the goal system content; you still need to do the theoretical work regardless of whether your FAI's goal is to upload humans and 'help them evolve', or be independently Friendly. Strange as it seems, the actual Friendliness part is an 'implementation detail' from a theoretical point of view; if you wanted to reliably turn as much of the solar system as possible into cheesecake, you'd need the same theory.
eventually resulting in the creation of a truly and mathematically provably 'fair' system authority, and most of humanity as superintelligences effectively operating under what (to them...us...whatever) amounts to a legal system that they have agreed to abide by - which could even vary significantly from star system to star system.
Humanlike intelligences and formal proofs of reliability are pretty much inherently incompatible. That said, of course transhuman uploads would be in a much better position to design a 'final FAI design' based on normative logic than current researchers are, which is probably what you meant.
Wyrm
Jedi Council Member
Posts: 2206
Joined: 2005-09-02 01:10pm
Location: In the sand, pooping hallucinogenic goodness.

Re: Robots Learn How to Lie

Post by Wyrm »

Starglider wrote:I feel compelled to point out, once again, that the following is tragically incorrect;
Wyrm wrote:The property of friendliness is likely too difficult to solve analytically, and will probably only yield to actual experiment. That is, we won't have AIs that have any guarantee of friendliness until we study real AIs in action — you have to confine them if for no other purpose but to observe them going about their business.
Firstly there is no argument for friendliness being 'too difficult to solve', other than the fact that some AGI architectures are inherently opposed to formal analysis (aside from anything else, the encoding of goals and knowledge in most connectionist architectures could best be described as holographic, and the computation cost of lossless extraction of that into a clean description tends to have exponential computational cost). Formal AI goal system theory has received very little academic attention to date, but (relatively) good progress is being made regardless.
So? No matter how much progress you make in any formal field, most problems will be undecidable. There's no guarantee that the statements you want to be decidable, such as the friendliness of a particular AI, will actually be decidable. And even if they are, there's no guarantee about the order of the computation involved. Sure, an AI's friendliness may be formally decidable, but it does you very little good if the number of computations involved is g64 (Graham's number). There's also no guarantee that even a probabilistic answer would be useful in a reasonable number of computations.
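To put a number on the intractability half of that point (the one-kilobyte figure below is arbitrary, chosen only to make the arithmetic concrete): brute-force checking every configuration of even a tiny amount of mutable state is already hopeless.

    # Exhaustively checking every configuration of 1 KiB of mutable state means
    # considering 2**8192 cases -- a number roughly 2,500 decimal digits long.
    bits = 8 * 1024
    states = 2 ** bits
    print(len(str(states)))   # 2467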

What you and I can absolutely agree on is that we need to develop the theory of friendliness. Only when we have formally answered the question of friendliness (decidable or not, practically computable or not, probabilistically computable or not) can we know if such a thing is a good risk.
Starglider wrote:Secondly, you have no argument for why observation of the behavior of particular AIs in a particular simulated environment will generalise across all future incarnations of those AIs: there is no such argument. Generalisation from black-box observation is worthless even if we make the unreasonable assumptions that your simulations are sufficiently close to reality and the AIs aren't being actively deceptive.
And who said it would be "black-box" observation? You added that adjective all by yourself.
Starglider wrote:The primary reason being that AGIs will inevitably obtain the ability to self-modify, and restructuring the foundations of the goal system automatically invalidates your basis for extrapolation; although this is not strictly necessary, simple binding drift in response to novel stimuli is quite sufficient to render such simple extrapolation useless on most connectionist architectures.
Irrelevant, as I proposed no such "black-box" testing, nor was I talking specifically about connectionist architectures.
Starglider wrote:The argument that white-box observation might enable Friendly AI to succeed where pure theory would fail is much more reasonable,
Thank you, as that was basically what I was getting at.
Starglider wrote:but the sheer difficulty of the task means that if you are relying on observation to provide more than the last few critical pieces, you will almost certainly fail. Given the danger of trying to keep AGIs of uncertain Friendliness contained, it is a far better idea to try and develop a solid FAI theory first, and only resort to experimental methods as an absolute last resort. Yudkowsky would object to even that, on the basis that no researcher would be qualified to decide when the risk is justified (cue long rant about biases and irrationality), but I am not quite that extreme.
Starglider, absolutely no one was arguing that you cannot try to develop a theory of friendliness. However, given that friendliness is partially dependent on physical phenomena, that the AGI does modify itself, and the sheer state space that AIs can occupy, developing such a theory —by any definition of the phrase— is a tall order. Chrissakes, we can't even predict when (or if) we'll get commercially viable fusion power.
Starglider wrote:I do not expect you to get this right, when there are numerous (otherwise) highly qualified AGI researchers who get it wrong.
I... don't. How the hell are you going from my saying "the problem is likely to be insoluble" to "I think I can solve the (insoluble) problem, but people more qualified than me cannot"?
Starglider wrote:Usually they just make the ludicrously anthropomorphic assumption that all intelligences are benevolent by default and that 'evil intent has to come from somewhere, and we won't introduce it'. Still, it's annoying.
Agreed.
Darth Wong on Strollers vs. Assholes: "There were days when I wished that my stroller had weapons on it."
wilfulton on Bible genetics: "If two screaming lunatics copulate in front of another screaming lunatic, the result will be yet another screaming lunatic. 8)"
SirNitram: "The nation of France is a theory, not a fact. It should therefore be approached with an open mind, and critically debated and considered."

Cornivore! | BAN-WATCH CANE: XVII | WWJDFAKB? - What Would Jesus Do... For a Klondike Bar? | Evil Bayesian Conspiracy
Junghalli
Sith Acolyte
Posts: 5001
Joined: 2004-12-21 10:06pm
Location: Berkeley, California (USA)

Re: Robots Learn How to Lie

Post by Junghalli »

Wyrm wrote:Fucksakes, Jung, you were all "Just create a friendly AI and you don't have to lock it up!" That was the position I was arguing against, because that's the position you seemed to be arguing for.
Then you simply misinterpreted my argument. If you look back at my arguments, I didn't rule out confinement as a temporary measure during the debugging phase, which is what you seem to be advocating (if I made statements that could be misinterpreted as saying what you think I was saying, then sorry for the miscommunication). I was arguing against confinement as any sort of permanent solution.

On this position I agree with Starglider: white-box observation during the debugging phase is probably not a bad idea, but it's something that should be done, after you've already got a working theory of friendliness, as a last line of defense in case your friendliness design fails*, not something you want to rely on as a primary safeguard.

* Even there it's hardly the most reliable method. A friendly design could work perfectly well in a box only to fail spectacularly when it encounters some stimulus in the real world that your designers failed to plan for. To be clear, when I say that, I'm not saying it's not worth doing.

Perhaps we both could have avoided this by making our actual positions clearer. We seem to have been miscommunicating more than actually disagreeing about core issues.