How are weights of the "LinearizedConvolution" layer initialized? #10

Closed
weilheim opened this issue Sep 28, 2017 · 1 comment
Comments

@weilheim

I would like to use your ConvS2S model in another task, so I need to understand the details of the model.
I have read the code defining the "LinearizedConvolution" class and its parent layer "ConvTBC"; however, I haven't found any code that performs weight initialization.

Could you please let me know how the weights of the "LinearizedConvolution" layer are initialized?
Thanks!
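
For context, a minimal PyTorch sketch (an illustration of the framework default, not a claim about fairseq's actual scheme): when a module defines no explicit initialization code, its parameters fall back to the parent class's `reset_parameters()`.

```python
import math
import torch.nn as nn

# Illustration only: a plain Conv1d stands in for a convolution layer with no
# explicit init code. Its weights come from nn.Conv1d.reset_parameters(),
# i.e. uniform(-bound, bound) with bound = 1/sqrt(fan_in).
conv = nn.Conv1d(in_channels=256, out_channels=512, kernel_size=3)
fan_in = conv.in_channels * conv.kernel_size[0]  # 256 * 3
bound = 1.0 / math.sqrt(fan_in)
print(conv.weight.abs().max().item() <= bound)  # True
```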

@myleott
Contributor

myleott commented Sep 28, 2017

myleott closed this as completed Sep 28, 2017
myleott added a commit that referenced this issue Jun 26, 2018
Summary:
Adds support for batched generation, improving speed by 8x for IWSLT from ~50wps
to ~400wps with a batch size of 128. This version is still ~1.5x slower than the
LuaTorch version, since we are not yet caching the convolutions across steps.

I've also added a few optional features:
- `--max-len-a/b`: maxlen is now a function of source len: `a*srclen + b`
- `--no-early-stop`: the LuaTorch version stopped immediately after finalizing
    k=beam hypotheses, but since we compare in the length-normalized score
    space, it's possible that a longer hypothesis would score even better.
    Setting this option increases generation time by ~50%, with no consistent
    increase in accuracy.
- `--unnormalized`: choose hypotheses based on the unnormalized scores.

Note: currently generation requires re-specifying the model configuration on the
command line, although ideally this would be stored in the model file.
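
A minimal sketch (hypothetical helper names, not fairseq's code) of the two generation details described above: the target-length budget `a*srclen + b` and comparison in length-normalized score space, which is why a hypothesis finalized after the first k=beam ones can still win.

```python
def max_target_len(src_len: int, a: float, b: int) -> int:
    """--max-len-a / --max-len-b: maxlen as a function of source length."""
    return int(a * src_len + b)

def normalized_score(log_prob_sum: float, hyp_len: int) -> float:
    """Hypotheses are compared after dividing their log-prob sum by their length."""
    return log_prob_sum / hyp_len

print(max_target_len(src_len=20, a=1.2, b=10))  # 34

# A longer hypothesis with a worse raw log-prob sum can still score better
# once normalized, which is what disabling early stopping keeps searching for.
print(normalized_score(-6.0, 10) > normalized_score(-5.5, 8))  # True (-0.60 vs -0.69)
```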

Test Plan:
```
python train.py ~/local/tmp/iwslt14_de-en \
  --encoder-embed-dim 256 --encoder-layers '[(256, 3)] * 6' \
  --decoder-embed-dim 256 --decoder-layers '[(256, 3)] * 6' \
  --dropout 0.2 --clip-norm 0.1 --lr 0.25 \
  --save-dir tmp

python generate.py ~/local/tmp/iwslt14_de-en \
  --path tmp/checkpoint_best.pt \
  --encoder-embed-dim 256 --encoder-layers '[(256, 3)] * 6' \
  --decoder-embed-dim 256 --decoder-layers '[(256, 3)] * 6' \
  --dropout 0.2 \
  --batch-size 128
```

Using the new IWSLT dataset that @colesbury preprocessed (test set has 6750 sentences):
- LuaTorch, epoch 15, trainloss 2.51, test BLEU 29.26
- PyTorch, epoch 17, trainloss 2.36, test BLEU 30.34
taylanbil added a commit to taylanbil/fairseq that referenced this issue Oct 18, 2019
Add option to assert on training and/or validation loss (facebookresearch#10)

* Add option to assert on training and/or validation loss

* applied suggestion
taylanbil added a commit to taylanbil/fairseq that referenced this issue Nov 13, 2019
optimizer fix
progress bar comment out temporarily
some changes to train_tpu
int mask instead of float

pfpfpfpf

fix

printing device index per loop

bkpt to investigate resize_ call

attempting to init buffer size to 2*dim

bkpt

better print

do not drop records when computing loss

Changes that reduce graph compiles.

* Loss function replaced with an equivalent logic that doesn't resize
tensors.
* cli args changed to guarantee consistency
* collate_tokens function in fairseq/data/data_utils.py overwritten to
guarantee consistency
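
A hypothetical sketch (not the actual `collate_tokens` implementation) of the idea behind the consistency changes above: padding every batch to one fixed length keeps tensor shapes constant across steps, so XLA compiles the graph once instead of recompiling for each new shape.

```python
import torch

def collate_fixed(sequences, pad_idx: int, fixed_len: int) -> torch.Tensor:
    """Pad (or truncate) every sequence to the same length so batch shapes never vary."""
    batch = torch.full((len(sequences), fixed_len), pad_idx, dtype=torch.long)
    for i, seq in enumerate(sequences):
        length = min(len(seq), fixed_len)
        batch[i, :length] = torch.as_tensor(seq[:length], dtype=torch.long)
    return batch

print(collate_fixed([[4, 5, 6], [7, 8]], pad_idx=1, fixed_len=5).shape)  # torch.Size([2, 5])
```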

undoing some changes made while debugging

progress_bar implements len

some irrelevant changes to train_tpu.py

new xla changes

bug fix in enable_torch_version

removing the last batch that is of different size from the iterator

delete optimizer step in fairseq's trainer

Added `self.xla` flag that controls whether Trainer includes the optimizer step

+ Tried to include more explanation of why the optimizer step is skipped this time

deleted obsolete file

add norm clipping count back in (#4)

remove grad norm clip count (#5)

Change masked_fill_ input in loss in order to accommodate necessary pytorch changes (#6)
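
Assuming the "necessary pytorch changes" refer to `masked_fill_` expecting a bool mask rather than a byte (uint8) mask in newer PyTorch releases, a minimal illustration:

```python
import torch

scores = torch.randn(2, 4)
# Newer PyTorch wants a torch.bool mask here; uint8 masks are deprecated.
pad_mask = torch.tensor([[0, 0, 1, 1], [0, 1, 1, 1]], dtype=torch.bool)
scores.masked_fill_(pad_mask, float("-inf"))  # in-place fill of padded positions
print(scores)
```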

Adding tpu capabilities to train.py (facebookresearch#8)

* Adding tpu capabilities to train.py

* flush when printing for better user experience

* separated cli_main into parse_args, maingpu and maintpu
deleted unused line in datautils.py

Enumerate the loader in training and validation (facebookresearch#9)

* Adding tpu capabilities to train.py

* flush when printing for better user experience

* separated cli_main into parse_args, maingpu and maintpu
deleted unused line in datautils.py

* Enumerate the loader

* enumerate the loader

Add option to assert on training and/or validation loss (facebookresearch#10)

* Add option to assert on training and/or validation loss

* applied suggestion

None loss should be filled to inf (facebookresearch#11)

Enabling multiprocessing for fairseq training. (facebookresearch#12)

* initial commit for multiprocess api

* indentation fixes and import fix

* no need to softlink, fix save/load

* Remove the hacks to only save from master ordinal as xm.save takes care of that

* fix indentation; 3 -> 4 spaces

* Moved xu.eprints after spawn and improved dropping of the last batches

trainers->trainer (facebookresearch#13)

fix bug in assert_on_losses

Replace usage of unsqueeze with transpose + broadcasting (facebookresearch#15)
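
The exact tensors are not shown in the log, but here is a generic illustration of rewriting an `unsqueeze`-based expression with a reshaped view, a transpose, and broadcasting, the kind of shape-only change the commit describes:

```python
import torch

a = torch.randn(8, 16)
b = torch.randn(8, 16)

# With unsqueeze: insert explicit singleton dimensions.
outer_unsqueeze = a.unsqueeze(2) * b.unsqueeze(1)                     # (8, 16, 16)

# Equivalent with a view, a transpose, and broadcasting.
outer_broadcast = a.view(8, 1, 16).transpose(1, 2) * b.view(8, 1, 16)
print(torch.allclose(outer_unsqueeze, outer_broadcast))               # True
```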

remove attn mask + loss rewrite + save per host +

format
suppress loss report
allow usage of batch_by_size in translation.
attn_weights masked fill in place

Clean up the log output suppressing a bit

Revert multihead attn's in_proj code changes

The non-rebased TPU branch is about 10% faster on TPUs than the rebased
branch. The regression is inside multihead attn's in_proj mechanism;
reverting the relevant changes preserves performance.

Pass correct args to the new get_valid_stats function

Send meters to device in order not to fail training when resuming from checkpoint
noisychannel pushed a commit to noisychannel/fairseq that referenced this issue Jan 3, 2020
yfyeung pushed a commit to yfyeung/fairseq that referenced this issue Dec 6, 2023
* Support computing nbest oracle WER.

* Add scale to all nbest based decoding/rescoring methods.

* Add script to run pretrained models.

* Use torchaudio to extract features.

* Support decoding multiple files at the same time.

Also, use kaldifeat for feature extraction.

* Support decoding with LM rescoring and attention-decoder rescoring.

* Minor fixes.

* Replace scale with lattice-score-scale.

* Add usage example with a provided pretrained model.
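
A minimal sketch of the "nbest oracle WER" idea from the first bullet above, using hypothetical helper names: for each utterance, pick the n-best hypothesis with the fewest word errors against the reference, then report the resulting error rate as if a perfect rescorer had chosen it.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance with a single rolling row."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i  # prev carries the diagonal value
        for j, h in enumerate(hyp, 1):
            # dp[j] still holds the previous row's value at this point.
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def oracle_wer(references, nbest_lists):
    """WER if an oracle always picked the best hypothesis in each n-best list."""
    errors = sum(min(edit_distance(ref, hyp) for hyp in nbest)
                 for ref, nbest in zip(references, nbest_lists))
    return errors / sum(len(ref) for ref in references)

refs = [["the", "cat", "sat"]]
nbest = [[["the", "cat", "sad"], ["the", "cat", "sat"]]]
print(oracle_wer(refs, nbest))  # 0.0 -- the oracle picks the exact match
```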