How are weights of layer "LinearizedConvolution" initialized? #10

I would like to use your Conv S2S model in another task, so I hope to learn the details of the model. I have read the code defining the class "LinearizedConvolution" and its parent layer "ConvTBC"; however, I haven't seen any code doing weight initialization. Could you please let me know how the weights of layer "LinearizedConvolution" are initialized? Thanks!

Comments

Weight initialization happens here: https://github.com/facebookresearch/fairseq-py/blob/master/fairseq/models/fconv.py#L405-L407
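For reference, the linked lines implement the initializer from the ConvS2S paper: weights are drawn from a zero-mean normal whose standard deviation is scaled by dropout and fan-in, and the layer is then weight-normalized. Below is a minimal sketch paraphrasing that helper; the exact line numbers and import path may have shifted since this issue was filed, so treat the linked source as authoritative.

```python
import math
import torch.nn as nn
from fairseq.modules import LinearizedConvolution  # import path as of fairseq-py; may differ

def LinearizedConv1d(in_channels, out_channels, kernel_size, dropout=0.0, **kwargs):
    """Weight-normalized Conv1d layer for decoding, initialized per ConvS2S.

    std = sqrt(4 * (1 - dropout) / (kernel_width * in_channels)) keeps the
    activation variance roughly constant under dropout (Gehring et al., 2017).
    """
    m = LinearizedConvolution(in_channels, out_channels, kernel_size, **kwargs)
    std = math.sqrt((4 * (1.0 - dropout)) / (m.kernel_size[0] * in_channels))
    nn.init.normal_(m.weight, mean=0, std=std)
    nn.init.constant_(m.bias, 0)
    return nn.utils.weight_norm(m, dim=2)
```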
myleott added a commit that referenced this issue on Jun 26, 2018:
Summary: Adds support for batched generation, improving speed by 8x for IWSLT from ~50wps to ~400wps with a batch size of 128. This version is still ~1.5x slower than the LuaTorch version, since we are not yet caching the convolutions across steps.

I've also added a few optional features:
- `--max-len-a/b`: maxlen is now a function of source length: `a*srclen + b`
- `--no-early-stop`: the LuaTorch version stopped immediately after finalizing k=beam hypotheses, but since we compare in the length-normalized score space, it's possible that a longer hypothesis would score even better. Setting this option increases generation time by ~50%, with no consistent increase in accuracy.
- `--unnormalized`: choose hypotheses based on the unnormalized scores.

Note: currently generation requires re-specifying the model configuration on the command line, although ideally this would be stored in the model file.

Test Plan:
```
python train.py ~/local/tmp/iwslt14_de-en \
  --encoder-embed-dim 256 --encoder-layers '[(256, 3)] * 6' \
  --decoder-embed-dim 256 --decoder-layers '[(256, 3)] * 6' \
  --dropout 0.2 --clip-norm 0.1 --lr 0.25 \
  --save-dir tmp
python generate.py ~/local/tmp/iwslt14_de-en \
  --path tmp/checkpoint_best.pt \
  --encoder-embed-dim 256 --encoder-layers '[(256, 3)] * 6' \
  --decoder-embed-dim 256 --decoder-layers '[(256, 3)] * 6' \
  --dropout 0.2 \
  --batch-size 128
```

Using the new IWSLT dataset that @colesbury preprocessed (test set has 6750 sentences):
- LuaTorch, epoch 15, trainloss 2.51, test BLEU 29.26
- PyTorch, epoch 17, trainloss 2.36, test BLEU 30.34
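The two generation options in the summary above interact. Here is a toy sketch, not fairseq's actual implementation, of the target-length bound and of normalized versus unnormalized hypothesis scoring; the parameter values are illustrative, not fairseq defaults.

```python
def max_target_len(src_len, a=1.0, b=50):
    """--max-len-a/b: bound generated length as a*srclen + b.

    a=1.0 and b=50 are made-up example values, not fairseq defaults.
    """
    return int(a * src_len + b)


def hypothesis_score(logprob_sum, length, normalized=True):
    """Score used to rank finished beam hypotheses.

    With --unnormalized, hypotheses are compared on the raw sum of token
    log-probabilities; otherwise the sum is divided by the length.
    Length normalization is why --no-early-stop can matter: a hypothesis
    finalized later (hence longer) may still outrank a shorter one.
    """
    return logprob_sum / length if normalized else logprob_sum
```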
taylanbil added a commit to taylanbil/fairseq that referenced this issue on Oct 18, 2019:
Add option to assert on training and/or validation loss (facebookresearch#10)
* Add option to assert on training and/or validation loss
* applied suggestion
taylanbil added a commit to taylanbil/fairseq that referenced this issue on Nov 13, 2019:
- optimizer fix
- progress bar comment out temporarily
- some changes to train_tpu
- int mask instead of float
- pfpfpfpf
- fix printing device index per loop
- bkpt to investigate resize_ call
- attempting to init buffer size to 2*dim
- bkpt
- better print
- do not drop records when computing loss
- Changes that reduce graph compiles:
  * Loss function replaced with an equivalent logic that doesn't resize tensors.
  * cli args changed to guarantee consistency
  * collate_tokens function in fairseq/data/data_utils.py overwritten to guarantee consistency
- undoing some changes made while debugging
- progress_bar implements len
- some irrelevant changes to train_tpu.py
- new xla changes
- bug fix in enable_torch_version
- removing the last batch that is of different size from the iterator
- delete optimizer step in fairseq's trainer
- Added `self.xla` flag that controls whether Trainer includes the optimizer step; tried to explain better why the optimizer step is skipped here
- deleted obsolete file
- add norm clipping count back in (#4)
- remove grad norm clip count (#5)
- Change masked_fill_ input in loss in order to accommodate necessary pytorch changes (#6)
- Adding tpu capabilities to train.py (facebookresearch#8)
  * Adding tpu capabilities to train.py
  * flush when printing for better user experience
  * separated cli_main into parse_args, maingpu and maintpu
  * deleted unused line in datautils.py
- Enumerate the loader in training and validation (facebookresearch#9)
  * Adding tpu capabilities to train.py
  * flush when printing for better user experience
  * separated cli_main into parse_args, maingpu and maintpu
  * deleted unused line in datautils.py
  * Enumerate the loader
  * enumerate the loader
- Add option to assert on training and/or validation loss (facebookresearch#10)
  * Add option to assert on training and/or validation loss
  * applied suggestion
- None loss should be filled to inf (facebookresearch#11)
- Enabling multiprocessing for fairseq training. (facebookresearch#12)
  * initial commit for multiprocess api
  * indentation fixes and import fix
  * no need to softlink, fix save/load
  * Remove the hacks to only save from master ordinal as xm.save takes care of that
  * fix indentation; 3 -> 4 spaces
  * Moved xu.eprints after spawn and dropping last batches
- better trainers->trainer (facebookresearch#13)
- fix bug in assert_on_losses
- Replace usage of unsqueeze with transpose + broadcasting (facebookresearch#15)
- remove attn mask + loss rewrite + save per host + format
- suppress loss report
- allow usage of batch_by_size in translation.
- attn_weights masked fill in place
- Clean up the log output suppressing a bit
- Revert multihead attn's in_proj code changes: the non-rebased tpu branch is about 10% faster on TPUs compared to the rebased branch. The regression is inside multihead attn's in_proj mechanism. Reverting the relevant changes to preserve performance.
- Pass correct args to the new get_valid_stats function
- Send meters to device in order not to fail training when resuming from checkpoint
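Several items in this squashed log (the loss rewrite that avoids resizing tensors, dropping the differently-sized last batch, overriding `collate_tokens`) serve one goal: keeping tensor shapes constant so that XLA does not recompile the graph on every step. A minimal sketch of that idea follows; `collate_fixed_shape` is a hypothetical helper for illustration, not the actual fairseq change.

```python
import torch

def collate_fixed_shape(token_lists, fixed_batch=64, fixed_len=128, pad_idx=1):
    """Pad every batch to one fixed (batch, length) shape.

    Constant shapes mean XLA traces and compiles the computation graph
    once, instead of once per newly seen shape. The real change in this
    branch overrides collate_tokens in fairseq/data/data_utils.py.
    """
    batch = torch.full((fixed_batch, fixed_len), pad_idx, dtype=torch.long)
    for i, tokens in enumerate(token_lists[:fixed_batch]):
        length = min(len(tokens), fixed_len)
        batch[i, :length] = torch.tensor(tokens[:length], dtype=torch.long)
    return batch
```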
noisychannel pushed a commit to noisychannel/fairseq that referenced this issue on Jan 3, 2020:
Fix a minor typo
yfyeung pushed a commit to yfyeung/fairseq that referenced this issue on Dec 6, 2023:
* Support computing nbest oracle WER.
* Add scale to all nbest based decoding/rescoring methods.
* Add script to run pretrained models.
* Use torchaudio to extract features.
* Support decoding multiple files at the same time. Also, use kaldifeat for feature extraction.
* Support decoding with LM rescoring and attention-decoder rescoring.
* Minor fixes.
* Replace scale with lattice-score-scale.
* Add usage example with a provided pretrained model.
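For context on the first bullet: n-best oracle WER is the best WER attainable if an oracle always picked the closest hypothesis from each utterance's n-best list, which lower-bounds the error any rescoring method (LM or attention-decoder) could reach. A self-contained sketch of the computation, illustrative rather than the actual implementation behind this commit:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two word sequences."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution or match
    return d[-1]

def nbest_oracle_wer(refs, nbest_hyps):
    """Oracle WER over n-best lists (hypothetical helper name).

    refs: list of reference word lists; nbest_hyps: list of n-best lists,
    each a list of hypothesis word lists, aligned with refs.
    """
    errors = sum(min(edit_distance(ref, hyp) for hyp in hyps)
                 for ref, hyps in zip(refs, nbest_hyps))
    return errors / sum(len(ref) for ref in refs)
```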