Unrolled recurrent layers (RNN, LSTM) #1873

Closed

wants to merge 103 commits
Conversation

jeffdonahue
Contributor

Based on #1872 (adds EmbedLayer -- not technically used here but often used with RNNs in practice, and will be needed for my examples), which in turn is based on #1486 and #1663.

This adds an abstract class RecurrentLayer intended to support recurrent architectures (RNNs, LSTMs, etc.) using an internal network unrolled in time. RecurrentLayer implementations (here, just RNNLayer and LSTMLayer) specify the recurrent architecture by filling in a NetParameter with appropriate layers.
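For concreteness, a minimal sketch of how such a layer might be declared in a prototxt (the layer type and the `recurrent_param` field follow this PR's conventions; the blob names, hidden-state size, and filler values are illustrative assumptions, not part of the PR):

```
layer {
  name: "lstm1"
  type: "LSTM"        # or "RNN" for RNNLayer
  bottom: "data"      # T x N x ... input data
  bottom: "cont"      # T x N sequence continuation indicators
  top: "lstm1"
  recurrent_param {
    num_output: 256   # illustrative hidden state dimension
    weight_filler { type: "uniform" min: -0.08 max: 0.08 }
    bias_filler { type: "constant" value: 0 }
  }
}
```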

RecurrentLayer requires 2 input (bottom) Blobs. The first -- the input data itself -- has shape T x N x ..., and the second -- the "sequence continuation indicators" delta -- has shape T x N; each holds T timesteps of N independent "streams". delta_{t,n} should be a binary indicator (i.e., a value in {0, 1}), where a value of 0 means that timestep t of stream n is the beginning of a new sequence, and a value of 1 means that timestep t of stream n continues the sequence from timestep t-1 of stream n. Under the hood, the previous timestep's hidden state is multiplied by these delta values. Because the indicators are specified per timestep and per stream, streams of arbitrarily different lengths are supported without any padding or truncation.

At the beginning of each forward pass, the final hidden state from the previous forward pass (h_T) is copied into the initial hidden state for the new forward pass (h_0), allowing exact inference across arbitrarily long sequences, even if T == 1. However, if any sequences cross batch boundaries, backpropagation through time is approximate -- it is truncated at the batch boundaries.
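As a toy illustration of the gating described above (pure Python, not Caffe code; the scalar "hidden state" per stream and the trivial additive update are stand-ins for the real recurrent computation):

```python
T, N = 3, 2  # timesteps, independent streams
# stream 0 starts a new sequence at t=0 and runs all 3 steps;
# stream 1 starts new sequences at both t=0 and t=2
delta = [[0, 0],
         [1, 1],
         [1, 0]]
x = [[1.0, 1.0]] * T  # dummy inputs, one value per timestep and stream
h = [0.5, 0.5]        # final hidden state carried over from the previous batch

for t in range(T):
    # multiplying by delta zeroes the carried-over state wherever a new
    # sequence begins, then a trivial "RNN" step folds in the input
    h = [delta[t][n] * h[n] + x[t][n] for n in range(N)]

print(h)  # stream 0 accumulated 3 steps; stream 1 was reset at t=2
```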

Note that the T x N arrangement in memory, used for computational efficiency, is somewhat counterintuitive, as it requires one to "interleave" the data streams.
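To make the interleaving concrete, a small sketch (stream contents and names are illustrative) showing how N independent streams map into the timestep-major T x N layout:

```python
# two independent streams of length T = 3
streams = [['a1', 'a2', 'a3'],   # stream 0
           ['b1', 'b2', 'b3']]   # stream 1
T, N = 3, 2

# the memory layout is timestep-major: batch[t][n] = streams[n][t]
batch = [[streams[n][t] for n in range(N)] for t in range(T)]

# flattening in memory order interleaves the streams: a1, b1, a2, b2, ...
flat = [item for row in batch for item in row]
```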

Examples of using these layers to train a language model and image captioning model will follow soon.

@jeffdonahue jeffdonahue force-pushed the recurrent branch 4 times, most recently from 19501cc to c38f9ac Compare February 16, 2015 08:15
@jeffdonahue jeffdonahue force-pushed the recurrent branch 3 times, most recently from 0f110c1 to 668ab41 Compare February 17, 2015 00:40
@jeffdonahue
Contributor Author

I've added scripts to download COCO2014 (and splits), and prototxts for training a language model and LRCN captioning model on the data. From the Caffe root directory, you should be able to download and parse the data by doing:

cd data/coco
./get_coco_aux.sh # download train/val/test splits
./download_tools.sh # download official COCO tool
cd tools
python setup.py install # follow instructions to install tools and download COCO data if needed
cd ../../.. # back to caffe root
./examples/coco_caption/coco_to_hdf5_data.py

Then you can train a language model using ./examples/coco_caption/train_language_model.sh, or train LRCN using ./examples/coco_caption/train_lrcn.sh (assuming you have downloaded models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel).

Still on the TODO list: upload a pretrained model to the zoo; add a tool to preview generated image captions and compute retrieval & generation scores.

@jeffdonahue jeffdonahue force-pushed the recurrent branch 3 times, most recently from 872e47c to 716262a Compare February 17, 2015 00:57
@jeffdonahue jeffdonahue mentioned this pull request Feb 17, 2015
@niuchuang

Could someone give me some guidance on how to construct an RNN with jeffdonahue's PR? I have downloaded lrcn.prototxt, but unfortunately I cannot understand most of its contents, such as include { stage: "freeze-convnet" }, include { stage: "unfactored" }, and so on. I have some time-sequence image data, each item of which has a label. I have trained the reference model in Caffe with these data, and now I am trying to use an RNN to classify them. What documents should I read to understand lrcn.prototxt and files like it, and then train an RNN model on my data? Many thanks!

@Kumaresh-Krishnan

I have been able to train the LRCN model successfully.
Could someone guide me on how to test this model on a small set of images and also view the generated captions?
Thanks

@mostafa-saad

Is it possible to get a prototxt network example for the activity recognition case?
Is it possible to get some documentation about the current files (e.g. lrcn.prototxt)?

@liuchang8am

Same question as @Kumaresh-Krishnan, would appreciate any replies about "how to test", thanks.

@sxjzwq

sxjzwq commented Jul 8, 2015

Check this: #2033

@twinanda

Is this still the most up-to-date LSTM implementation in Caffe? Just wondering if there are any major updates not in this branch.

Anyway, has anybody tried bidirectional LSTM using this implementation? Some pointers on this one, please. Thanks!
