Talk:Recurrent neural network


Bidirectional associative memory[edit]

The orphaned article "Bidirectional associative memory" (BAM) references this article, claiming that BAM is a kind of RNN. If this is correct, then this article should mention BAM and reference the BAM article. Likewise if it is appropriate to mention "Content-addressable memory" in this context, this should be done (however the article about that is biased towards computer hardware technology and not machine learning). 195.60.183.2 (talk) 16:06, 17 July 2008 (UTC)[reply]

I added a few words and a ref. Dicklyon (talk) 06:04, 26 June 2010 (UTC)[reply]

Better now. Remove template?[edit]

I think the article is in much better shape now than it was a couple of months ago, although it still needs to be polished. But I guess one could take out this template now:

Epsiloner (talk) 15:41, 7 December 2010 (UTC)[reply]

Needs major review[edit]

This whole article needs a major rewrite. The word "recurrent" means that part or all of the output is used as input. The very first sentence "...where connections between nodes form a directed or undirected graph along a temporal sequence" is inaccurate (see quote below). A much better statement is "where the connection graph has cycles". By the way, how can a neural network have "undirected" edges? Every node is a computation in one direction. And it just gets worse from there - there are *many* false statements in this article derived from this basic misunderstanding.

In the neural network literature, neural networks with one or more feedback loops are referred to as recurrent networks.

— Simon Haykin, "Neural networks: a comprehensive foundation", (p. 686)

Carlosayam (talk) 00:45, 18 August 2022 (UTC)[reply]
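For readers who find the terminology abstract, the feedback loop Haykin describes can be sketched in a few lines of Python. This is an illustrative sketch only; the weight names W_x and W_h and the tanh nonlinearity are assumptions for the example, not anything taken from the article.

    import numpy as np

    def rnn_step(x_t, h_prev, W_x, W_h, b):
        # The previous hidden state h_prev is fed back in alongside the
        # new input x_t: this feedback loop is what "recurrent" means.
        # Unrolled over time, h_t depends on h_{t-1}, which depends on
        # h_{t-2}, and so on, so the connection graph contains a cycle.
        return np.tanh(W_x @ x_t + W_h @ h_prev + b)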

Disputed Statement[edit]

"RNN can use their internal memory to process arbitrary sequences of inputs."

Some types can, but the typical RNN has nodes with binary threshold outputs, which makes it a finite state machine. This article needs clarification of what types are Turing-complete. — Preceding unsigned comment added by Mister Mormon (talkcontribs) 13:22, 18 December 2010 (UTC)[reply]

I don't understand what's disputed in that statement. An FSM can process arbitrary input sequences. And no RNN can be Turing complete, because they don't have unbounded memory (as far as any type I've heard of). Dicklyon (talk) 17:04, 18 December 2010 (UTC)[reply]
RNNs with rational state variables are Turing complete. This is because, unlike machine numbers that have fixed finite precision, mathematical integers and rationals can encode arbitrarily long bit strings. In Siegelmann's construction of a universal Turing machine as an 886-node rational-state RNN, she used rational numbers to encode stacks of unbounded depth. This works because a single rational variable (or a single "mathematical integer" variable) immediately provides infinite memory. Practical RNNs of course usually have only fixed-width machine numbers as states, in which case they are not Turing complete. 94.145.5.105 (talk) 16:30, 17 April 2019 (UTC)[reply]
Isn't 'arbitrarily long' part of 'arbitrary'? Without unbounded memory, some types of processing are impossible. I know of at least one Turing-complete RNN: http://lipas.uwasa.fi/stes/step96/step96/hyotyniemi1/ Mister Mormon (talk) 20:09, 18 December 2010 (UTC)[reply]
There's no claim that all types of processing are possible, is there? Dicklyon (talk) 00:33, 19 December 2010 (UTC)[reply]
I'm having a hard time understanding or believing that paper about a finite RNN being Turing complete. Dicklyon (talk) 00:36, 19 December 2010 (UTC)[reply]
True, but that sentence can be interpreted more strongly; I still suggest a change. As for the paper, no learning algorithm is presented, so it isn't useful regardless of its power. Anyway, can't RNNs have unbounded memory if weights and node outputs are rational numbers? There are several papers where they can hypercompute if numbers are real. — Preceding unsigned comment added by Mister Mormon (talkcontribs) 12:03, 23 December 2010 (UTC)[reply]
Encoding things via unbounded precision would be a very different model, hardly relevant here. Go ahead and make improvements if you see a way. Dicklyon (talk) 19:21, 23 December 2010 (UTC)[reply]
Hardly relevant to 'recurrent neural network'? —Preceding unsigned comment added by 71.163.181.66 (talk) 22:26, 23 December 2010 (UTC)[reply]
Well, Hava Siegelmann got a Science paper out of showing that a recurrent neural network with sigmoidal units and exact reals initialised with uncomputable values in the weights or units can compute uncomputable functions. And it turns out that by following this line of research she was able to close some long-open conjectures in circuit theory. Barak (talk) 17:34, 27 December 2010 (UTC)[reply]
Barak, thanks for that update. I'm not sure what it means, but not Turing complete, anyway. Dicklyon (talk) 20:57, 27 December 2010 (UTC)[reply]
Siegelmann's construction proved Turing completeness for RNNs with rational-number state variables, by implementing a two-stack universal Turing machine as an 886-node RNN. A single rational variable -- or even an integer -- already provides the infinite memory, as any finite-length bit string can be encoded as a sum of powers of two. If uncomputable reals are allowed, one gets super-Turing capabilities. But that's not so strange: if I'm allowed to pass in Chaitin's constant, it's no wonder that I can solve the halting problem. 94.145.5.105 (talk) 16:49, 17 April 2019 (UTC)[reply]
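To make the "one rational variable is an unbounded memory" point concrete, here is a rough sketch of a bit stack held in a single exact rational, using Python's fractions module. It only illustrates the encoding idea; it is not Siegelmann's actual 886-node construction, and the base-4 scheme and function names are my own.

    from fractions import Fraction

    def push(q, bit):
        # Base-4 encoding: each push shifts the existing stack two
        # binary digits to the right and stores the new bit in the
        # leading digit. Exact rationals never lose precision, so the
        # stack can grow without bound inside one variable.
        return q / 4 + Fraction(2 * bit + 1, 4)

    def top(q):
        return 1 if q >= Fraction(3, 4) else 0

    def pop(q):
        return 4 * q - (2 * top(q) + 1)

    q = Fraction(0)             # empty stack
    for b in [1, 0, 1, 1]:
        q = push(q, b)
    print(top(q), top(pop(q)))  # prints "1 1" (last-in, first-out)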

Yeah, thanks. It's super-Turing complete. Seriously, are all published RNNs either finite or uncomputable? Where's the middle ground with rational/integer weights and no thresholds in the literature? I would be surprised if there were none to be found, since sub-symbolic AI has been in use for 30 years. Mister Mormon (talk) 17:58, 28 December 2010 (UTC)[reply]

Hey, this paper on a Turing-complete net could be helpful: http://www.math.rutgers.edu/~sontag/FTP_DIR/aml-turing.pdf Mister Mormon (talk) 02:29, 10 September 2011 (UTC)[reply]

Elman network[edit]

About the picture: it seems to me that there are supposed to be full connections from the context layer forward to the hidden layer, not just one-to-one, although the saved-state connections from the hidden layer to the context layer are indeed one-to-one. [1] [2]

References

  1. ^ Elman, Jeffrey L. (1990). "Finding Structure in Time". Cognitive Science. 14 (2): 179–211.
  2. ^ Elman, Jeffrey L. (1990). "Finding Structure in Time". Cognitive Science. 14 (2), p. 5.

Indeed, the network in the image is not an Elman network, for the reason stated above. The image should be fixed, removed, or at least relabelled.

2604:3D08:2486:5D00:88FA:2817:2657:E641 (talk) 21:14, 6 May 2022 (UTC)[reply]
This is right! As far as I understand it, the original Elman network is a one-layer "fully recurrent neural network" (as in the Goodfellow graphic), and the general case (several recurrent hidden layers) is also called a hierarchical Elman network in older literature. So IMHO both sections should be merged to avoid confusion. 2A02:3038:40E:9963:C54C:D526:5581:9DB9 (talk) 15:13, 20 June 2022 (UTC)[reply]
The issue of the 'Elman network' is much larger than the point above. Over time there have been two uses of the term 'Simple Recurrent Network'; these are confused in this article, and the confusion has a knock-on effect in the PyTorch documentation. I know, because I was very active in the research on early RNNs (see https://tonyrobinson.com/publications).
What happened was that the original back propagation work made it clear that recurrent networks were possible. However, people found it hard to code back propagation through time or thought it biologically implausible, so Jeff Elman introduced the Simple Recurrent Network, which sort of worked for many tasks. It was a hack. Getting the code right wasn't that hard, so I did a lot of shouting about it at the time.
Later, LSTMs and GRUs came along. We then needed a name for the simpler RNNs, and they are now known as Simple Recurrent Networks. So now we have a name clash between the 1990 Simple Recurrent Network and the RNNs of my PhD thesis. It seems that nobody has noticed, or cares, and that's fair enough given that I can't find an online PDF of Elman, Jeffrey (1990), "Finding Structure in Time" right now - and not many of us are still going from the early days.
Given that PyTorch copies the Wikipedia mistake, it would be good to get it fixed here first. If I found the original paper would that be enough to get the ball rolling? To be clear, I'm not objecting to the reuse of the term 'Simple Recurrent Network', just that the cited Elman Network is not what was published so we should get that naming right. DrTonyR (talk) 17:15, 10 January 2024 (UTC)[reply]
I've found the PDF version of Elman now - it's at https://onlinelibrary.wiley.com/doi/epdf/10.1207/s15516709cog1402_1 (parent https://onlinelibrary.wiley.com/doi/abs/10.1207/s15516709cog1402_1). Figure 2 makes it very clear that the state is copied from one time frame to another and is not a trainable link, that is, there is no back propagation through time. The paper also describes Mike Jordan's network, where the state is the output. To be honest, both of these architectures are not relevant to this article or current practice; they are both approximations to the general case of back propagation through time, which is what we use right now. I remain concerned that this article has resulted in very misleading PyTorch naming. I'm reluctant to edit the article myself as I was central to the development of recurrent networks and so could reasonably be accused of promoting my own work. I'm happy to assist an independent person. DrTonyR (talk) 10:35, 11 January 2024 (UTC)[reply]
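For whoever ends up fixing the figure or the section, the update described around Elman's Figure 2 can be sketched roughly as follows (Python, with made-up weight names; a hedged illustration rather than a faithful reproduction of the paper). The point is that the context layer is a fixed one-to-one copy of the hidden layer, while the context-to-hidden connections are full, trainable weights.

    import numpy as np

    def elman_step(x_t, context, W_xh, W_ch, W_hy, b_h, b_y):
        # Full (trainable) connections from both input and context
        # layers into the hidden layer.
        h_t = np.tanh(W_xh @ x_t + W_ch @ context + b_h)
        y_t = W_hy @ h_t + b_y
        # The context layer is simply a copy of the hidden layer:
        # a fixed one-to-one link, not a trained weight, and gradients
        # are not propagated back through time across this copy.
        new_context = h_t.copy()
        return y_t, new_context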

Not true[edit]

QUOTE: In particular, RNNs cannot be easily trained for large numbers of neuron units nor for large numbers of input units. Successful training has been mostly in time series problems with few inputs.

The current (2013) state of the art in speech recognition does use RNNs. And speech requires a lot of input. Check this: "Speech Recognition with Deep Recurrent Neural Networks", Alex Graves, Abdel-rahman Mohamed and Geoffrey Hinton. 50.100.193.20 (talk) 11:42, 5 August 2013 (UTC)[reply]

This wasn't even true in 1990 when I used RNNs for speech recognition (https://www.researchgate.net/publication/247173709_Phoneme_Recognition_from_the_TIMIT_database_using_Recurrent_Error_Propagation_Networks) DrTonyR (talk) 05:08, 8 December 2018 (UTC)[reply]

deepdrumpf[edit]

Can somebody add this if it is of note?

Thanks! --Potguru (talk) 19:01, 4 March 2016 (UTC)[reply]

Architectures[edit]

Hi, the Architectures sections currently contains 18 subsections. I think the readability of the article would be improved if there is some kind of structure or order in them. I do not (yet) have the knowledge to rearrange them. Maybe someone else is willing to do this. VeniVidiVicipedia (talk) 11:06, 2 January 2017 (UTC)[reply]


Terminology[edit]

This article is difficult for non-experts. It relies heavily on a lot of terminology and jargon without defining or explaining what much of it means. And so this article is only useful if you already know a lot about these concepts. 108.29.37.131 (talk) 19:03, 25 July 2018 (UTC)[reply]

Finite impulse response[edit]

The second and third paragraphs of this article talk about "finite impulse response", and are introduced with

The term "recurrent neural network" is used indiscriminately to refer to two broad classes of networks with a similar general structure, where one is finite impulse and the other is infinite impulse.

As one of the very first researchers in recurrent neural networks, I find these two paragraphs very misleading. The whole point of a recurrent neural network is that it is recurrent, that is, some of the output is fed back to the input. It is possible to do this in a restricted way that results in a finite impulse response (e.g. time delay neural networks), but it is very misleading to give prominence to this uninteresting restricted subset.

Supporting evidence for the above can be seen in the whole of the rest of the article: in every case there is an infinite impulse response (even in the case of Hopfield networks, and it's arguable whether they should be included here).

Also, the History section is very sparse (one brief mention of Hopfield then lots on TDNN). I believe that I was the first to write a PhD thesis on recurrent networks ("Dynamic Error Propagation Networks". A. J. Robinson. PhD thesis, Cambridge University Engineering Department, February 1989.) so if there is interest I can help flesh this bit out (in order not to violate the conflict of interest policy someone else should do the edits).

DrTonyR (talk) 04:59, 8 December 2018 (UTC)[reply]
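To make the finite- versus infinite-impulse distinction above concrete, here is a minimal sketch contrasting a time-delay-style computation over a fixed input window with a genuinely recurrent update. The weight names and window structure are invented for illustration, not taken from any cited source.

    import numpy as np

    def finite_impulse_step(x_window, W, b):
        # TDNN-style: the output depends only on the last k inputs in
        # x_window, so any input's influence dies out after k steps.
        return np.tanh(W @ np.concatenate(x_window) + b)

    def infinite_impulse_step(x_t, h_prev, W_x, W_h, b):
        # Truly recurrent: the state h feeds back into itself, so an
        # input can, in principle, influence the state indefinitely.
        return np.tanh(W_x @ x_t + W_h @ h_prev + b)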

Agreed. In fact, I wonder whether the paragraph is not just misleading, but false. I do not think anyone uses the term "finite impulse recurrent neural network", contrary to what the paragraph claims. The cited source in this paragraph does not use the term "recurrent neural network" for finite impulse response networks either. Are there any published papers where people use "RNN" for finite impulse networks? If not, delete this paragraph. 192.76.8.68 (talk) 15:10, 28 July 2019 (UTC) Anonymous[reply]

Just fixed it. Atcold (talk) — Preceding undated comment added 20:20, 15 November 2021 (UTC)[reply]

False statement[edit]

"A finite impulse recurrent network is a directed acyclic graph" In general, a finite impulse recurrent network may contain cycles, it's just that these cycles are not directed. Only feedforward networks are truly acyclic. — Preceding unsigned comment added by 142.184.185.164 (talk) 03:15, 24 December 2019 (UTC)[reply]

References on RTRL[edit]

There are two references to RTRL, and both seem to have issues. The first is to one of my publications, but it leads to an empty Google Books page; https://www.academia.edu/30351853/The_utility_driven_dynamic_error_propagation_network would be better. The second has a 2013 date, but I think it should be: R. J. Williams and D. Zipser, "A learning algorithm for continually running fully recurrent neural networks", Neural Computation, 1(2):270–280, June 1989. ISSN 0899-7667. doi: 10.1162/neco.1989.1.2.270.

Things seem to have changed since I last understood references here. I can't fix this with my current knowledge, but I'll learn if needed. DrTonyR (talk) 08:06, 21 June 2020 (UTC)[reply]