How are weights of the "LinearizedConvolution" layer initialized? #10

Closed
weilheim opened this issue Sep 28, 2017 · 1 comment
Comments

@weilheim

I would like to use your ConvS2S model in another task, so I need to understand the details of the model.
I have read the code defining the "LinearizedConvolution" class and its parent layer "ConvTBC"; however, I haven't found any code that performs weight initialization.

Could you please let me know how the weights of the "LinearizedConvolution" layer are initialized?
Thanks!
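
For context, a minimal PyTorch sketch (an illustration of the framework default, not a claim about fairseq's actual scheme): when a module defines no explicit initialization code, its parameters fall back to the parent class's `reset_parameters()`.

```python
import math
import torch.nn as nn

# Illustration only: a plain Conv1d stands in for a convolution layer with no
# explicit init code. Its weights come from nn.Conv1d.reset_parameters(),
# i.e. uniform(-bound, bound) with bound = 1/sqrt(fan_in).
conv = nn.Conv1d(in_channels=256, out_channels=512, kernel_size=3)
fan_in = conv.in_channels * conv.kernel_size[0]  # 256 * 3
bound = 1.0 / math.sqrt(fan_in)
print(conv.weight.abs().max().item() <= bound)  # True
```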

@myleott
Contributor

myleott commented Sep 28, 2017

myleott closed this as completed Sep 28, 2017
myleott added a commit that referenced this issue Jun 26, 2018
Summary:
Adds support for batched generation, improving speed by 8x for IWSLT from ~50wps
to ~400wps with a batch size of 128. This version is still ~1.5x slower than the
LuaTorch version, since we are not yet caching the convolutions across steps.

I've also added a few optional features:
- `--max-len-a/b`: maxlen is now a function of source len: `a*srclen + b`
- `--no-early-stop`: the LuaTorch version stopped immediately after finalizing
    k=beam hypotheses, but since we compare in the length-normalized score
    space, it's possible that a longer hypothesis would score even better.
    Setting this option increases generation time by ~50%, with no consistent
    increase in accuracy.
- `--unnormalized`: choose hypotheses based on the unnormalized scores.

Note: currently generation requires re-specifying the model configuration on the
command line, although ideally this would be stored in the model file.
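
A minimal sketch (hypothetical helper names, not fairseq's code) of the two generation details described above: the target-length budget `a*srclen + b` and comparison in length-normalized score space, which is why a hypothesis finalized after the first k=beam ones can still win.

```python
def max_target_len(src_len: int, a: float, b: int) -> int:
    """--max-len-a / --max-len-b: maxlen as a function of source length."""
    return int(a * src_len + b)

def normalized_score(log_prob_sum: float, hyp_len: int) -> float:
    """Hypotheses are compared after dividing their log-prob sum by their length."""
    return log_prob_sum / hyp_len

print(max_target_len(src_len=20, a=1.2, b=10))  # 34

# A longer hypothesis with a worse raw log-prob sum can still score better
# once normalized, which is what disabling early stopping keeps searching for.
print(normalized_score(-6.0, 10) > normalized_score(-5.5, 8))  # True (-0.60 vs -0.69)
```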

Test Plan:
```
python train.py ~/local/tmp/iwslt14_de-en \
  --encoder-embed-dim 256 --encoder-layers '[(256, 3)] * 6' \
  --decoder-embed-dim 256 --decoder-layers '[(256, 3)] * 6' \
  --dropout 0.2 --clip-norm 0.1 --lr 0.25 \
  --save-dir tmp

python generate.py ~/local/tmp/iwslt14_de-en \
  --path tmp/checkpoint_best.pt \
  --encoder-embed-dim 256 --encoder-layers '[(256, 3)] * 6' \
  --decoder-embed-dim 256 --decoder-layers '[(256, 3)] * 6' \
  --dropout 0.2 \
  --batch-size 128
```

Using the new IWSLT dataset that @colesbury preprocessed (test set has 6750 sentences):
- LuaTorch, epoch 15, trainloss 2.51, test BLEU 29.26
- PyTorch, epoch 17, trainloss 2.36, test BLEU 30.34
taylanbil added a commit to taylanbil/fairseq that referenced this issue Oct 18, 2019
Add option to assert on training and/or validation loss (facebookresearch#10)

* Add option to assert on training and/or validation loss

* applied suggestion
taylanbil added a commit to taylanbil/fairseq that referenced this issue Nov 13, 2019
optimizer fix
progress bar comment out temporarily
some changes to train_tpu
int mask instead of float

pfpfpfpf

fix

printing device index per loop

bkpt to investigate resize_ call

attempting to init buffer size to 2*dim

bkpt

better print

do not drop records when computing loss

Changes that reduce graph compiles.

* Loss function replaced with an equivalent logic that doesn't resize
tensors.
* cli args changed to guarantee consistency
* collate_tokens function in fairseq/data/data_utils.py overwritten to
guarantee consistency
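
A hypothetical sketch (not the actual `collate_tokens` implementation) of the idea behind the consistency changes above: padding every batch to one fixed length keeps tensor shapes constant across steps, so XLA compiles the graph once instead of recompiling for each new shape.

```python
import torch

def collate_fixed(sequences, pad_idx: int, fixed_len: int) -> torch.Tensor:
    """Pad (or truncate) every sequence to the same length so batch shapes never vary."""
    batch = torch.full((len(sequences), fixed_len), pad_idx, dtype=torch.long)
    for i, seq in enumerate(sequences):
        length = min(len(seq), fixed_len)
        batch[i, :length] = torch.as_tensor(seq[:length], dtype=torch.long)
    return batch

print(collate_fixed([[4, 5, 6], [7, 8]], pad_idx=1, fixed_len=5).shape)  # torch.Size([2, 5])
```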

undoing some changes made while debugging

progress_bar implements len

some irrelevant changes to train_tpu.py

new xla changes

bug fix in enable_torch_version

removing the last batch that is of different size from the iterator

delete optimizer step in fairseq's trainer

Added `self.xla` flag that controls whether Trainer includes the optimizer step

+ Tried to include more explanation of why the optimizer step is skipped this time

deleted obsolete file

add norm clipping count back in (#4)

remove grad norm clip count (#5)

Change masked_fill_ input in loss in order to accommodate necessary pytorch changes (#6)
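
Assuming the "necessary pytorch changes" refer to `masked_fill_` expecting a bool mask rather than a byte (uint8) mask in newer PyTorch releases, a minimal illustration:

```python
import torch

scores = torch.randn(2, 4)
# Newer PyTorch wants a torch.bool mask here; uint8 masks are deprecated.
pad_mask = torch.tensor([[0, 0, 1, 1], [0, 1, 1, 1]], dtype=torch.bool)
scores.masked_fill_(pad_mask, float("-inf"))  # in-place fill of padded positions
print(scores)
```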

Adding tpu capabilities to train.py (facebookresearch#8)

* Adding tpu capabilities to train.py

* flush when printing for better user experience

* separated cli_main into parse_args, maingpu and maintpu
deleted unused line in datautils.py

Enumerate the loader in training and validation (facebookresearch#9)

* Adding tpu capabilities to train.py

* flush when printing for better user experience

* separated cli_main into parse_args, maingpu and maintpu
deleted unused line in datautils.py

* Enumerate the loader

* enumerate the loader

Add option to assert on training and/or validation loss (facebookresearch#10)

* Add option to assert on training and/or validation loss

* applied suggestion

None loss should be filled to inf (facebookresearch#11)

Enabling multiprocessing for fairseq training. (facebookresearch#12)

* initial commit for multiprocess api

* indentation fixes and import fix

* no need to softlink, fix save/load

* Remove the hacks to only save from master ordinal as xm.save takes care of that

* fix indentation; 3 -> 4 spaces

* Moved xu.eprints after spawn and improved dropping of the last batches

trainers->trainer (facebookresearch#13)

fix bug in assert_on_losses

Replace usage of unsqueeze with transpose + broadcasting (facebookresearch#15)
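
The exact tensors are not shown in the log, but here is a generic illustration of rewriting an `unsqueeze`-based expression with a reshaped view, a transpose, and broadcasting, the kind of shape-only change the commit describes:

```python
import torch

a = torch.randn(8, 16)
b = torch.randn(8, 16)

# With unsqueeze: insert explicit singleton dimensions.
outer_unsqueeze = a.unsqueeze(2) * b.unsqueeze(1)                     # (8, 16, 16)

# Equivalent with a view, a transpose, and broadcasting.
outer_broadcast = a.view(8, 1, 16).transpose(1, 2) * b.view(8, 1, 16)
print(torch.allclose(outer_unsqueeze, outer_broadcast))               # True
```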

remove attn mask + loss rewrite + save per host +

format
suppress loss report
allow usage of batch_by_size in translation.
attn_weights masked fill in place

Clean up the log output suppressing a bit

Revert multihead attn's in_proj code changes

The non-rebased TPU branch is about 10% faster on TPUs than the rebased
branch. The regression is inside multihead attn's in_proj mechanism;
reverting the relevant changes preserves performance.

Pass correct args to the new get_valid_stats function

Send meters to device in order not to fail training when resuming from checkpoint
noisychannel pushed a commit to noisychannel/fairseq that referenced this issue Jan 3, 2020
yfyeung pushed a commit to yfyeung/fairseq that referenced this issue Dec 6, 2023
* Support computing nbest oracle WER.

* Add scale to all nbest based decoding/rescoring methods.

* Add script to run pretrained models.

* Use torchaudio to extract features.

* Support decoding multiple files at the same time.

Also, use kaldifeat for feature extraction.

* Support decoding with LM rescoring and attention-decoder rescoring.

* Minor fixes.

* Replace scale with lattice-score-scale.

* Add usage example with a provided pretrained model.
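
A minimal sketch of the "nbest oracle WER" idea from the first bullet above, using hypothetical helper names: for each utterance, pick the n-best hypothesis with the fewest word errors against the reference, then report the resulting error rate as if a perfect rescorer had chosen it.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance with a single rolling row."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i  # prev carries the diagonal value
        for j, h in enumerate(hyp, 1):
            # dp[j] still holds the previous row's value at this point.
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def oracle_wer(references, nbest_lists):
    """WER if an oracle always picked the best hypothesis in each n-best list."""
    errors = sum(min(edit_distance(ref, hyp) for hyp in nbest)
                 for ref, nbest in zip(references, nbest_lists))
    return errors / sum(len(ref) for ref in references)

refs = [["the", "cat", "sat"]]
nbest = [[["the", "cat", "sad"], ["the", "cat", "sat"]]]
print(oracle_wer(refs, nbest))  # 0.0 -- the oracle picks the exact match
```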