Pallavi1990 wrote: ↑2020-07-14 04:57am
Thank you for your help. I'm looking for something like that
Perhaps it is worth updating this:
Over the last decade we've seen a massive concentration of research effort into deep ANN models, to the extent that every other approach is getting only a trickle of attention. I don't think the field has been this focused since circa 1980, when symbolic AI was ascendant, the first wave of NNs had been discredited, and before second-wave NN models, statistical machine learning and Bayesian approaches had made an impact. No one expected this in the mid 2000s; sure, NNs were going to keep making progress, but statistical ML was (and is) much more compute-efficient, Bayesian approaches were hotter and more promising, genetic programming was still relevant, the 'semantic web' guys were still going full bore with classic symbolic AI, etc.
The reasons for this are obvious: substantial, ongoing success in tackling hard and relevant problems (starting with machine vision), and the relative ease of applying ANN ML to arbitrary tasks, enabling widespread commercial adoption. It is thrilling to see widespread deployment of real machine learning; a lot of modern software would meet early (1960s) definitions of 'AI', but before the deep NN surge very little of it was black-box trained from data rather than carefully engineered and calibrated against data. However, it's also important to understand that this is absolutely another hype cycle of inflated expectations followed by disillusionment. I don't think we're going to see another 'AI Winter', because what we already have still has massive untapped commercial potential and isn't at risk of being declared 'not really AI' to the extent earlier advances were, but we are going to see (and indeed are already seeing in the research community) a plateau and a broadening of research back into non-NN (or at least not classical-ANN, including hybrid) approaches.
The basic problem is that ANN progress was almost entirely due to the availability of 'big data': the raw compute and storage, the frameworks to process it at scale, and enormous data sets captured from various online, simulated and internet-of-things sources. This made it possible to do some very impressive pattern processing for the first time. Many researchers tried to claim improvements in the algorithms themselves, but nearly all of those evaporated under scrutiny; researchers tend to optimise the hell out of their new idea while comparing it to unoptimised or outdated competition, and often massage the test data to favour their research direction. As a result, most practical deep learning is done with a small set of very simple models, at very large scale.
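To make that concrete, here's a toy sketch of my own (plain Python/NumPy, sizes are arbitrary placeholders) of just how simple the workhorse 'deep model' actually is: the same affine-plus-nonlinearity block repeated a few times, with the recent gains coming from scaling the stack, the data and the compute, not from the block getting cleverer.
[code]
import numpy as np

# Toy sketch (my illustration, arbitrary tiny sizes): the core "deep model" is
# just the same simple block - affine transform plus elementwise nonlinearity -
# stacked repeatedly.

rng = np.random.default_rng(42)
layer_sizes = [784, 256, 256, 10]                  # placeholder widths
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, weights):
    for W in weights[:-1]:
        x = np.maximum(0.0, x @ W)                 # ReLU block, repeated
    return x @ weights[-1]                         # plain linear readout

x = rng.normal(size=(1, 784))                      # one fake input vector
print(forward(x, weights).shape)                   # -> (1, 10)
[/code]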
Throwing massive compute at the problem isn't a huge issue in itself, although the contemporary slowdown in hardware scaling, as planar silicon litho scaling peters out, indicates this approach is going to stall in the near term. The real problem is the data sets: the most powerful and impressive examples use enormous data sets, on the scale of 'many millions of miles of dashcam footage' or 'a good fraction of all the text on the Internet'. This is always presented as a positive, something to be impressed by, but it's the exact opposite; it indicates very low information-processing efficiency in the ML algorithm. That is still better than having no way to do this at all without building it by hand, but it is far, far behind humans, or just animals in general, who can learn from a handful of examples things that current ANNs take millions of examples to learn.
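For a sense of scale, here's a back-of-envelope version of that efficiency gap, using only the rough orders of magnitude mentioned above (these are not measurements, just the stated scale):
[code]
# Back-of-envelope only: "millions of examples" per task for a deep model vs.
# "a handful" for a human or animal, as described in the paragraph above.
ann_examples_per_task = 1_000_000
animal_examples_per_task = 10
gap = ann_examples_per_task // animal_examples_per_task
print(f"sample-efficiency gap: roughly {gap:,}x")   # ~100,000x, i.e. ~5 orders of magnitude
[/code]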
A good example is GPT-2 and related predictive text models, which have received a lot of press recently. Essentially, a few quintillion compute operations are used to generate a kind of holographic compressed model of the many millions of documents in the training corpus. If you imagine shining a light through an n-dimensional hologram in a novel direction, it will produce a new image by recombining patterns in the source material. It's impressive and even useful, but unfortunately it's also nothing like general intelligence, and perhaps the most severe demonstration yet of the ELIZA effect (people assuming relatively simple algorithms are doing a lot more modelling and reasoning than they actually are). In fact it is rather like the underlying mechanism of ELIZA, scaled up to a billion or so layered production rules auto-generated by gradient descent on the training corpus, rather than a hundred or so handwritten ones. If it weren't for the fact that NN model size has vastly outrun the limits of our NN analysis / rule-extraction tools, we'd probably see a gaggle of philosophers presenting the revenge of Searle's Chinese Room argument (actually they do this regardless, but if we could do a proper heuristic extraction on something the size of GPT-2 they'd at least have evidence).
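To make the 'auto-generated production rules recombining the corpus' point concrete, here is a deliberately tiny toy analogue of my own (not GPT-2): the 'rules' are just bigram next-word counts built by counting over a made-up corpus, then sampled to stitch fragments together.
[code]
import random
from collections import defaultdict

# Deliberately tiny analogue of "production rules recombining the corpus" - NOT
# GPT-2. The "rules" here are bigram next-word counts; a large LM learns an
# enormously bigger, soft version of this table via gradient descent.
corpus = ("the cat sat on the mat and the dog sat on the rug "
          "the dog chased the cat and the cat chased the mouse").split()

rules = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    rules[prev].append(nxt)                 # one crude "production rule" per pair

def sample(start="the", length=12, seed=1):
    random.seed(seed)
    out = [start]
    for _ in range(length - 1):
        options = rules.get(out[-1])
        if not options:                     # no rule fires for this context
            break
        out.append(random.choice(options))
    return " ".join(out)

print(sample())   # recombines fragments of the corpus into a "new" sentence
[/code]
The output is novel-looking but is still just stitched-together corpus statistics; scaling the rule table up enormously and making it soft changes the quality of the output, not the kind of thing it is.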
The low-level mechanism of the brain does still look like a massively parallel array of heuristics, even if it has time-domain processing and working recurrence and persistence that current NN models haven't really managed to implement (despite endless hype, spiking NNs still aren't terribly useful compared to conventional ones). So it isn't clear what's missing that allows humans to do such impressive deduction and modelling, or even allows relatively simple animals to learn so much more efficiently than ANN algorithms, though there are many, many theories (I have my own, of course).
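For anyone unfamiliar with the distinction, here is a minimal single-unit sketch (parameter values are illustrative only) of what that 'time-domain processing and persistence' means: a conventional ANN unit is stateless, while a leaky integrate-and-fire spiking neuron carries a membrane potential from one timestep to the next.
[code]
import numpy as np

# Illustrative contrast: stateless conventional unit vs. a leaky
# integrate-and-fire (LIF) spiking neuron with persistent state.

def relu_unit(x, w):
    return max(0.0, float(np.dot(w, x)))    # no memory between calls

def lif_neuron(inputs, tau=20.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    v, spikes = 0.0, []
    for drive in inputs:                    # state persists across timesteps
        v += dt * (-v / tau + drive)        # leaky integration of the input
        if v >= v_thresh:                   # threshold crossed: emit a spike
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes

print(relu_unit([0.3, 0.5], [1.0, 1.0]))    # 0.8, regardless of any history
print(lif_neuron([0.3, 0.3, 0.3, 0.0, 0.5, 0.5, 0.5, 0.5]))   # e.g. [0,0,0,0,1,0,0,1]
[/code]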
The upshot of this is that we are in a phase where there is exciting progress in rolling out the fruits of mid-2010s AI progress to practical applications, which will continue for some time, coupled with a kind of plateau at the research level. To be clear, the more-data, more-compute trend is still going, especially in domains where the data can be produced by simulation, e.g. getting ANNs to play video games; for that, pretty much as long as you can get a more powerful GPU you are going to show some progress. But for real-world problems, pressure to find alternate approaches is mounting. Self-driving cars are at the crux of this: many are still insistent that deep enough ANNs and enough millions of miles of dashcam/LIDAR data will suffice, while others are reverting to a larger fraction of conventional software engineering, or trying novel AI approaches to reach full Level 5 capability. Of course, with ANNs showing the most recent success and still dominating mindshare, most researchers are still starting from there, e.g. 'neural nets, but what if each neuron had a Turing machine attached, and we do gradient descent on its possible states' (anything you can integrate can be attached to an ANN in principle).
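As a minimal illustration of that 'attach a module to an ANN and run gradient descent through the lot' pattern (my own sketch, not any specific published model): the attached module here is a soft, softmax-weighted read from a memory matrix, and a finite-difference check shows the gradient flows end to end because every step is differentiable.
[code]
import numpy as np

# My own sketch of a hybrid: ordinary ANN layer -> differentiable addressing ->
# soft read from an external memory -> scalar readout. Everything is
# differentiable, so gradient descent can be run through the whole pipeline.

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 4))            # external memory: 8 slots of width 4
W_key = rng.normal(size=(3, 8))        # ANN layer: input -> attention logits
W_out = rng.normal(size=(4, 1))        # readout: memory read -> scalar

def forward(x, W_key, W_out, M):
    logits = x @ W_key                             # ordinary ANN layer
    attn = np.exp(logits) / np.exp(logits).sum()   # differentiable addressing
    read = attn @ M                                # soft read from the module
    return (read @ W_out).item()                   # scalar prediction

x = rng.normal(size=3)
eps = 1e-6
W_bumped = W_key.copy()
W_bumped[0, 0] += eps
grad = (forward(x, W_bumped, W_out, M) - forward(x, W_key, W_out, M)) / eps
print("d(output)/d(W_key[0,0]) ~", grad)   # gradient passes through the attached module
[/code]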
Counter-intuitively, I think the ANN focus has reduced the near-term risk of superintelligence / seed-AI takeoff, by focusing research effort on approaches that scale 'out' but not 'up' (i.e. they can't undergo recursive self-improvement). However, it has probably increased the risk of a bad outcome in the longer term by getting everyone even more focused on, apologetic for, or actively cheerleading 'black box' approaches. 'AI ethics' is now focused on 'let's make sure our datasets are politically correct' and 'let's train things to be nice (except for the military researchers, who are completely ignoring us)'; the idea of 'let's make this provably safe' is nearly inconceivable for current models. I remain slightly optimistic that the pursuit of higher information efficiency (e.g. models that can self-train effectively from small data sets), and later of models that are more amenable to self-modification, will mean a preference for less opaque algorithms.