- Image classification task results
- ResNetCifar training from scratch on CIFAR100
- ResNet training from scratch on ImageNet1K(ILSVRC2012)
- DarkNet training from scratch on ImageNet1K(ILSVRC2012)
- RepVGG training from scratch on ImageNet1K(ILSVRC2012)
- RegNet training from scratch on ImageNet1K(ILSVRC2012)
- U2NetBackbone training from scratch on ImageNet1K(ILSVRC2012)
- ResNet finetune from ImageNet21k pretrain weight on ImageNet1K(ILSVRC2012)
- ResNet finetune from DINO pretrain weight on ImageNet1K(ILSVRC2012)
- ViT finetune from self-trained MAE pretrain weight (400 epochs) on ImageNet1K(ILSVRC2012)
- ViT finetune from official MAE pretrain weight (800 epochs) on ImageNet1K(ILSVRC2012)
- ResNet train from ImageNet1K pretrain weight on ImageNet21K(Winter 2021 release)
- ViT finetune from self-trained MAE pretrain weight (100 epochs) on ACCV2022
- VAN finetune from official pretrain weight on ImageNet1K(ILSVRC2012)
- Object detection task results
- All detection models training from scratch on COCO2017
- All detection models finetune from objects365 pretrain weight on COCO2017
- All detection models train from COCO2017 pretrain weight on Objects365(v2,2020)
- All detection models training from scratch on VOC2007 and VOC2012
- All detection models finetune from objects365 pretrain weight on VOC2007 and VOC2012
- Semantic Segmentation task results
- Instance Segmentation task results
- Knowledge distillation task results
- Contrastive learning task results
- Masked image modeling task results
- ViT MAE pretrain on ImageNet1K(ILSVRC2012)
- ViT MAE pretrain on ACCV2022 from ImageNet1K pretrain
- ViT finetune from self-trained MAE pretrain weight (400 epochs) on ImageNet1K(ILSVRC2012)
- ViT finetune from official MAE pretrain weight (800 epochs) on ImageNet1K(ILSVRC2012)
- ViT finetune from self-trained MAE pretrain weight (100 epochs) on ACCV2022
- OCR text detection task results
- OCR text recognition task results
- Human matting task results
- Salient object detection task results
- Face detection task results
- Interactive segmentation task results
- Image inpainting model task results
- Diffusion model task results
ResNet
Paper:https://arxiv.org/abs/1512.03385
DarkNet
Paper:https://arxiv.org/abs/1804.02767
RepVGG
Paper:https://arxiv.org/abs/2101.03697
RegNet
Paper:https://arxiv.org/abs/2003.13678
ViT
Paper:https://arxiv.org/abs/2010.11929
VAN
Paper:https://arxiv.org/abs/2202.09741
ResNetCifar is different from ResNet in the first few layers.
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
ResNet18Cifar | 557.935M | 11.220M | 32x32 | 1 RTX A5000 | 128 | 200 | 77.110 |
ResNet34Cifar | 1.164G | 21.328M | 32x32 | 1 RTX A5000 | 128 | 200 | 78.140 |
ResNet50Cifar | 1.312G | 23.705M | 32x32 | 1 RTX A5000 | 128 | 200 | 75.610 |
ResNet101Cifar | 2.531G | 42.697M | 32x32 | 1 RTX A5000 | 128 | 200 | 76.970 |
ResNet152Cifar | 3.751G | 58.341M | 32x32 | 1 RTX A5000 | 128 | 200 | 77.710 |
You can find more model training details in classification_training/cifar100/.
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
ResNet18 | 1.819G | 11.690M | 224x224 | 2 RTX A5000 | 256 | 100 | 70.512 |
ResNet34 | 3.671G | 21.798M | 224x224 | 2 RTX A5000 | 256 | 100 | 73.680 |
ResNet50 | 4.112G | 25.557M | 224x224 | 2 RTX A5000 | 256 | 100 | 76.300 |
ResNet101 | 7.834G | 44.549M | 224x224 | 2 RTX A5000 | 256 | 100 | 77.380 |
ResNet152 | 11.559G | 60.193M | 224x224 | 2 RTX A5000 | 256 | 100 | 77.542 |
You can find more model training details in classification_training/imagenet/.
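The macs and params columns in these tables can be reproduced analytically from the layer shapes; a minimal sketch of the per-convolution formula (an assumption about how the totals are obtained — profiling tools report the same per-layer quantities):

```python
def conv2d_cost(c_in, c_out, k, h_out, w_out, bias=True):
    """Parameter count and multiply-accumulate (MAC) count for one Conv2d layer."""
    params = (k * k * c_in + (1 if bias else 0)) * c_out
    # one MAC per kernel tap, per input channel, per output pixel, per output channel
    macs = k * k * c_in * c_out * h_out * w_out
    return params, macs

# Example: the 7x7 stride-2 stem conv of ResNet on a 224x224 input (output 112x112).
params, macs = conv2d_cost(3, 64, 7, 112, 112, bias=False)
print(params, macs)  # 9408 params, 118013952 MACs (~118M)
```

Summing this over every layer (plus the linear head) yields totals on the order of the table's values, e.g. ~1.8G MACs for ResNet18 at 224x224.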
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
DarkNetTiny | 412.537M | 2.087M | 256x256 | 2 RTX A5000 | 256 | 100 | 57.786 |
DarkNet19 | 3.663G | 20.842M | 256x256 | 2 RTX A5000 | 256 | 100 | 74.248 |
DarkNet53 | 9.322G | 41.610M | 256x256 | 2 RTX A5000 | 256 | 100 | 76.352 |
You can find more model training details in classification_training/imagenet/.
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
RepVGG_A0_deploy | 1.362G | 8.309M | 224x224 | 2 RTX A5000 | 256 | 120 | 72.010 |
RepVGG_A1_deploy | 2.364G | 12.790M | 224x224 | 2 RTX A5000 | 256 | 120 | 74.032 |
RepVGG_A2_deploy | 5.117G | 25.500M | 224x224 | 2 RTX A5000 | 256 | 120 | 76.078 |
RepVGG_B0_deploy | 3.058G | 14.339M | 224x224 | 2 RTX A5000 | 256 | 120 | 74.880 |
RepVGG_B1_deploy | 11.816G | 51.829M | 224x224 | 2 RTX A5000 | 256 | 120 | 77.790 |
RepVGG_B2_deploy | 18.377G | 80.315M | 224x224 | 2 RTX A5000 | 256 | 120 | 78.120 |
You can find more model training details in classification_training/imagenet/.
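The `_deploy` suffix denotes the inference-time form of RepVGG: the parallel 3x3, 1x1, and identity branches of each training-time block are fused into a single 3x3 convolution. A minimal numpy sketch of the kernel fusion (BatchNorm folding omitted for brevity; the real conversion folds BN into each branch first):

```python
import numpy as np

def fuse_repvgg_kernel(w3, w1, has_identity):
    """Fuse parallel 3x3 + 1x1 (+ identity) branches into one 3x3 kernel.
    w3: (C_out, C_in, 3, 3); w1: (C_out, C_in, 1, 1); identity needs C_out == C_in."""
    fused = w3.copy()
    fused[:, :, 1, 1] += w1[:, :, 0, 0]  # a 1x1 kernel lands on the center tap
    if has_identity:
        c = w3.shape[0]
        fused[np.arange(c), np.arange(c), 1, 1] += 1.0  # identity = center-one kernel
    return fused

w3 = np.zeros((2, 2, 3, 3))
w1 = np.ones((2, 2, 1, 1))
fused = fuse_repvgg_kernel(w3, w1, has_identity=True)
print(fused[0, 0, 1, 1], fused[0, 1, 1, 1])  # 2.0 (1x1 + identity), 1.0 (1x1 only)
```

Because convolution is linear, the fused kernel produces exactly the same output as the sum of the three branches, which is why only the deploy-form macs/params are reported.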
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
RegNetX_400MF | 410.266M | 5.158M | 224x224 | 2 RTX A5000 | 4096 | 300 | 69.466 |
RegNetX_600MF | 616.813M | 6.196M | 224x224 | 2 RTX A5000 | 4096 | 300 | 71.754 |
RegNetX_800MF | 820.324M | 7.260M | 224x224 | 2 RTX A5000 | 4096 | 300 | 73.148 |
RegNetX_1_6GF | 1.635G | 9.190M | 224x224 | 2 RTX A5000 | 4096 | 300 | 76.142 |
RegNetX_3_2GF | 3.222G | 15.297M | 224x224 | 2 RTX A5000 | 4096 | 300 | 78.244 |
RegNetX_4_0GF | 4.013G | 22.118M | 224x224 | 2 RTX A5000 | 4096 | 300 | 78.916 |
You can find more model training details in classification_training/imagenet/.
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
U2NetBackbone | 13.097G | 26.181M | 224x224 | 2 RTX A5000 | 256 | 100 | 76.038 |
You can find more model training details in classification_training/imagenet/.
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
ResNet18 | 1.819G | 11.690M | 224x224 | 2 RTX A5000 | 4096 | 300 | 71.580 |
ResNet34 | 3.671G | 21.798M | 224x224 | 2 RTX A5000 | 4096 | 300 | 76.316 |
ResNet50 | 4.112G | 25.557M | 224x224 | 2 RTX A5000 | 4096 | 300 | 79.484 |
ResNet101 | 7.834G | 44.549M | 224x224 | 2 RTX A5000 | 4096 | 300 | 80.940 |
ResNet152 | 11.559G | 60.193M | 224x224 | 2 RTX A5000 | 4096 | 300 | 81.236 |
You can find more model training details in classification_training/imagenet/.
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
ResNet18 | 1.819G | 11.690M | 224x224 | 1 RTX A5000 | 256 | 100 | 70.754 |
ResNet18 | 1.819G | 11.690M | 224x224 | 1 RTX A5000 | 4096 | 300 | 71.362 |
ResNet34 | 3.671G | 21.798M | 224x224 | 2 RTX A5000 | 256 | 100 | 74.218 |
ResNet34 | 3.671G | 21.798M | 224x224 | 2 RTX A5000 | 4096 | 300 | 75.916 |
ResNet50 | 4.112G | 25.557M | 224x224 | 2 RTX A5000 | 256 | 100 | 77.114 |
ResNet50 | 4.112G | 25.557M | 224x224 | 2 RTX A5000 | 4096 | 300 | 79.418 |
You can find more model training details in classification_training/imagenet/.
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
ViT-Tiny-Patch16 | 1.075G | 5.670M | 224x224 | 1 RTX A5000 | 4096 | 100 | 68.614 |
ViT-Small-Patch16 | 4.241G | 21.955M | 224x224 | 2 RTX A5000 | 4096 | 100 | 79.006 |
ViT-Base-Patch16 | 16.849G | 86.377M | 224x224 | 2 RTX A5000 | 4096 | 100 | 83.204 |
ViT-Large-Patch16 | 59.647G | 304.024M | 224x224 | 2 RTX A5000 | 4096 | 100 | 85.020 |
You can find more model training details in classification_training/imagenet/.
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
ViT-Base-Patch16 | 16.849G | 86.377M | 224x224 | 2 RTX A5000 | 4096 | 100 | 83.290 |
ViT-Large-Patch16 | 59.647G | 304.024M | 224x224 | 2 RTX A5000 | 4096 | 100 | 85.876 |
You can find more model training details in classification_training/imagenet/.
Network | macs | params | input size | gpu num | batch | epochs | Semantic Softmax Acc |
---|---|---|---|---|---|---|---|
ResNet18 | 1.819G | 11.690M | 224x224 | 2 RTX A5000 | 4096 | 80 | 68.639 |
ResNet34 | 3.671G | 21.798M | 224x224 | 2 RTX A5000 | 4096 | 80 | 71.873 |
ResNet50 | 4.112G | 25.557M | 224x224 | 2 RTX A5000 | 4096 | 80 | 74.664 |
ResNet101 | 7.834G | 44.549M | 224x224 | 2 RTX A5000 | 4096 | 80 | 76.136 |
ResNet152 | 11.559G | 60.193M | 224x224 | 2 RTX A5000 | 4096 | 80 | 75.731 |
You can find more model training details in classification_training/imagenet21k/.
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
ViT-Large-Patch16 | 59.651G | 308.124M | 224x224 | 2 RTX 4090 | 4096 | 100 | 90.693 |
You can find more model training details in classification_training/accv2022/.
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
VAN-B0 | 880.224M | 4.103M | 224x224 | 2 RTX A5000 | 1024 | 300 | 75.618 |
VAN-B1 | 2.518G | 13.856M | 224x224 | 2 RTX 4090 | 1024 | 300 | 80.956 |
VAN-B2 | 5.033G | 26.567M | 224x224 | 4 RTX 4090 | 1024 | 300 | 82.322 |
You can find more model training details in classification_training/imagenet/.
RetinaNet
Paper:https://arxiv.org/abs/1708.02002
FCOS
Paper:https://arxiv.org/abs/1904.01355
CenterNet
Paper:https://arxiv.org/abs/1904.07850
TTFNet
Paper:https://arxiv.org/abs/1909.00700
DETR
Paper:https://arxiv.org/abs/2005.12872
DINO-DETR
Paper:https://arxiv.org/abs/2203.03605
Trained on COCO2017 train dataset, tested on COCO2017 val dataset.
mAP is computed at IoU=0.5:0.95, area=all, maxDets=100 (COCOeval stats[0]).
Network | resize-style | input size | macs | params | gpu num | batch | epochs | mAP |
---|---|---|---|---|---|---|---|---|
ResNet50-RetinaNet | YoloStyle-640 | 640x640 | 95.558G | 37.969M | 2 RTX A5000 | 32 | 13 | 34.459 |
ResNet50-RetinaNet | YoloStyle-800 | 800x800 | 149.522G | 37.969M | 2 RTX A5000 | 32 | 13 | 36.023 |
ResNet50-RetinaNet | RetinaStyle-800 | 800x1333 | 250.069G | 37.969M | 2 RTX A5000 | 8 | 13 | 35.434 |
ResNet50-FCOS | YoloStyle-640 | 640x640 | 81.943G | 32.291M | 2 RTX A5000 | 32 | 13 | 37.176 |
ResNet50-FCOS | YoloStyle-800 | 800x800 | 128.160G | 32.291M | 2 RTX A5000 | 32 | 13 | 38.745 |
ResNet50-FCOS | RetinaStyle-800 | 800x1333 | 214.406G | 32.291M | 2 RTX A5000 | 8 | 13 | 39.649 |
ResNet18DCN-CenterNet | YoloStyle-512 | 512x512 | 14.854G | 12.889M | 2 RTX A5000 | 64 | 140 | 26.209 |
ResNet18DCN-TTFNet-3x | YoloStyle-512 | 512x512 | 16.063G | 13.737M | 2 RTX A5000 | 64 | 39 | 27.054 |
ResNet50-DETR | YoloStyle-1024 | 1024x1024 | 89.577G | 30.440M | 8 RTX A5000 | 64 | 500 | 36.941 |
ResNet50-DINO-DETR | YoloStyle-1024 | 1024x1024 | 844.204G | 47.082M | 8 RTX A5000 | 16 | 13 | 42.870 |
ResNet50-DINO-DETR | YoloStyle-1024 | 1024x1024 | 844.204G | 47.082M | 8 RTX A5000 | 16 | 39 | 45.445 |
You can find more model training details in detection_training/coco/.
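The mAP above averages precision over the ten IoU thresholds 0.5:0.05:0.95, as in COCOeval stats[0]. A toy sketch of the IoU matching underlying that average (not the full COCOeval pipeline, which also handles ranking, crowd regions, and per-class averaging):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

thresholds = [0.5 + 0.05 * i for i in range(10)]  # IoU=0.5:0.95
iou = box_iou((0, 0, 10, 10), (2, 0, 12, 10))     # IoU = 80/120 ~ 0.667
hits = [iou >= t for t in thresholds]
print(sum(hits), "of", len(thresholds), "thresholds count this box as a true positive")
```

A detection at IoU 0.667 counts as a true positive at thresholds 0.5 through 0.65 but not above, which is why the 0.5:0.95 metric is stricter than VOC-style IoU=0.50 mAP.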
Trained on COCO2017 train dataset, tested on COCO2017 val dataset.
mAP is computed at IoU=0.5:0.95, area=all, maxDets=100 (COCOeval stats[0]).
Network | resize-style | input size | macs | params | gpu num | batch | epochs | mAP |
---|---|---|---|---|---|---|---|---|
ResNet50-RetinaNet | YoloStyle-640 | 640x640 | 95.558G | 37.969M | 2 RTX A5000 | 32 | 13 | 38.930 |
ResNet50-RetinaNet | YoloStyle-800 | 800x800 | 149.522G | 37.969M | 2 RTX A5000 | 32 | 13 | 40.483 |
ResNet50-RetinaNet | RetinaStyle-800 | 800x1333 | 250.069G | 37.969M | 2 RTX A5000 | 8 | 13 | 40.424 |
ResNet50-FCOS | YoloStyle-640 | 640x640 | 81.943G | 32.291M | 2 RTX A5000 | 32 | 13 | 42.871 |
ResNet50-FCOS | YoloStyle-800 | 800x800 | 128.160G | 32.291M | 2 RTX A5000 | 32 | 13 | 44.526 |
ResNet50-FCOS | RetinaStyle-800 | 800x1333 | 214.406G | 32.291M | 2 RTX A5000 | 8 | 13 | 42.848 |
You can find more model training details in detection_training/coco/.
Trained on objects365 train dataset, tested on objects365 val dataset.
mAP is computed at IoU=0.5:0.95, area=all, maxDets=100 (COCOeval stats[0]).
Network | resize-style | input size | macs | params | gpu num | batch | epochs | mAP |
---|---|---|---|---|---|---|---|---|
ResNet50-RetinaNet | YoloStyle-800 | 800x800 | 149.522G | 37.969M | 8 RTX A5000 | 32 | 13 | 16.360 |
ResNet50-FCOS | RetinaStyle-800 | 800x1333 | 214.406G | 32.291M | 8 RTX A5000 | 32 | 13 | 17.068 |
Trained on VOC2007 trainval dataset + VOC2012 trainval dataset, tested on VOC2007 test dataset.
mAP is computed at IoU=0.50, area=all, maxDets=100.
Network | resize-style | input size | macs | params | gpu num | batch | epochs | mAP |
---|---|---|---|---|---|---|---|---|
ResNet50-RetinaNet | YoloStyle-640 | 640x640 | 84.947G | 36.724M | 2 RTX A5000 | 32 | 13 | 81.948 |
ResNet50-FCOS | YoloStyle-640 | 640x640 | 80.764G | 32.153M | 2 RTX A5000 | 32 | 13 | 81.624 |
You can find more model training details in detection_training/voc/.
Trained on VOC2007 trainval dataset + VOC2012 trainval dataset, tested on VOC2007 test dataset.
mAP is computed at IoU=0.50, area=all, maxDets=100.
Network | resize-style | input size | macs | params | gpu num | batch | epochs | mAP |
---|---|---|---|---|---|---|---|---|
ResNet50-RetinaNet | YoloStyle-640 | 640x640 | 84.947G | 36.724M | 2 RTX A5000 | 32 | 13 | 90.220 |
ResNet50-FCOS | YoloStyle-640 | 640x640 | 80.764G | 32.153M | 2 RTX A5000 | 32 | 13 | 90.371 |
You can find more model training details in detection_training/voc/.
DeepLabv3+
Paper:https://arxiv.org/abs/1802.02611
U2Net
Paper:https://arxiv.org/abs/2005.09007
Network | input size | macs | params | gpu num | batch | epochs | miou |
---|---|---|---|---|---|---|---|
ResNet50-DeepLabv3+ | 512x512 | 25.548G | 26.738M | 2 RTX A5000 | 8 | 128 | 34.659 |
U2Net | 512x512 | 219.012G | 46.191M | 2 RTX A5000 | 8 | 128 | 39.046 |
You can find more model training details in semantic_segmentation_training/ade20k/.
Network | input size | macs | params | gpu num | batch | epochs | miou |
---|---|---|---|---|---|---|---|
ResNet50-DeepLabv3+ | 512x512 | 25.548G | 26.738M | 2 RTX A5000 | 32 | 64 | 64.176 |
U2Net | 512x512 | 219.012G | 46.191M | 4 RTX A5000 | 32 | 64 | 66.529 |
You can find more model training details in semantic_segmentation_training/coco/.
YOLACT
Paper:https://arxiv.org/abs/1904.02689
SOLOv2
Paper:https://arxiv.org/abs/2003.10152
Trained on COCO2017 train dataset, tested on COCO2017 val dataset.
mAP is computed at IoU=0.5:0.95, area=all, maxDets=100 (COCOeval stats[0]).
Network | resize-style | input size | macs | params | gpu num | batch | epochs | mAP |
---|---|---|---|---|---|---|---|---|
ResNet50-YOLACT | YoloStyle-800 | 800x800 | 123.095G | 31.165M | 4 RTX A5000 | 64 | 39 | 28.061 |
ResNet50-SOLOv2 | YoloStyle-1024 | 1024x1024 | 248.546G | 46.582M | 4 RTX A5000 | 32 | 39 | 36.559 |
You can find more model training details in instance_segmentation_training/coco/.
KD loss
Paper:https://arxiv.org/abs/1503.02531
DML loss
Paper:https://arxiv.org/abs/1706.00384
Teacher Network | Student Network | method | Freeze Teacher | input size | gpu num | batch | epochs | Teacher Top-1 | Student Top-1 |
---|---|---|---|---|---|---|---|---|---|
ResNet152 | ResNet50 | CE+KD | True | 224x224 | 2 RTX A5000 | 256 | 100 | / | 77.352 |
ResNet152 | ResNet50 | CE+DML | False | 224x224 | 2 RTX A5000 | 256 | 100 | 79.274 | 78.122 |
ResNet152 | ResNet50 | CE+KD+ViT Aug | True | 224x224 | 2 RTX A5000 | 4096 | 300 | / | 80.168 |
ResNet152 | ResNet50 | CE+DML+ViT Aug | False | 224x224 | 2 RTX A5000 | 4096 | 300 | 81.508 | 79.810 |
You can find more model training details in distillation_training/imagenet/.
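The CE+KD rows combine cross-entropy on the labels with Hinton's KD loss, a temperature-softened KL divergence between teacher and student logits. A sketch assuming the standard formulation with the usual T^2 scaling (the repo's exact temperature and loss weights may differ):

```python
import math

def softmax(logits, t=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(x / t) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, t=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2 so gradient
    magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return t * t * kl

print(kd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0: identical logits, zero loss
```

DML differs in that the teacher is not frozen (Freeze Teacher = False): both networks train together, each using the other's softened outputs as the distillation target.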
DINO: Emerging Properties in Self-Supervised Vision Transformers
Paper:https://arxiv.org/abs/2104.14294
Network | input size | gpu num | batch | epochs | Loss |
---|---|---|---|---|---|
ResNet18-DINO | 224x224 | 4 RTX A5000 | 256 | 400 | 3.081 |
ResNet34-DINO | 224x224 | 4 RTX A5000 | 256 | 400 | 2.425 |
ResNet50-DINO | 224x224 | 4 RTX A5000 | 256 | 400 | 1.997 |
You can find more model training details in contrastive_learning_training/imagenet/.
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
ResNet50 | 4.112G | 25.557M | 224x224 | 2 RTX A5000 | 256 | 100 | 77.114 |
ResNet50 | 4.112G | 25.557M | 224x224 | 2 RTX A5000 | 4096 | 300 | 79.418 |
You can find more model training details in classification_training/imagenet/.
MAE: Masked Autoencoders Are Scalable Vision Learners
Paper:https://arxiv.org/abs/2111.06377
Network | input size | gpu num | batch | epochs | Loss |
---|---|---|---|---|---|
ViT-Tiny-Patch16 | 224x224 | 1 RTX A5000 | 256 | 400 | 0.427 |
ViT-Small-Patch16 | 224x224 | 2 RTX A5000 | 256 | 400 | 0.414 |
ViT-Base-Patch16 | 224x224 | 2 RTX A5000 | 256 | 400 | 0.388 |
ViT-Large-Patch16 | 224x224 | 2 RTX A5000 | 256 | 400 | 0.378 |
You can find more model training details in masked_image_modeling_training/imagenet/.
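The Loss column is the pixel reconstruction error on masked patches only; MAE hides a large random subset of patches and reconstructs them from the visible rest. A sketch of the random masking step, assuming the paper's default 75% mask ratio on the 14x14=196 patches of a Patch16 model at 224x224:

```python
import random

def random_masking(num_patches=196, mask_ratio=0.75, seed=0):
    """Shuffle patch indices; keep the first (1 - mask_ratio) fraction visible."""
    rng = random.Random(seed)
    ids = list(range(num_patches))
    rng.shuffle(ids)
    num_keep = int(num_patches * (1 - mask_ratio))
    return sorted(ids[:num_keep]), sorted(ids[num_keep:])  # (visible, masked)

visible, masked = random_masking()
print(len(visible), len(masked))  # 49 visible patches, 147 masked
```

Only the 49 visible patches go through the encoder, which is why MAE pretraining is cheap relative to supervised ViT training at the same resolution.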
Network | input size | gpu num | batch | epochs | Loss |
---|---|---|---|---|---|
ViT-Large-Patch16 | 224x224 | 2 RTX 4090 | 256 | 100 | 0.423 |
You can find more model training details in masked_image_modeling_training/accv2022/.
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
ViT-Tiny-Patch16 | 1.075G | 5.670M | 224x224 | 1 RTX A5000 | 4096 | 100 | 68.614 |
ViT-Small-Patch16 | 4.241G | 21.955M | 224x224 | 2 RTX A5000 | 4096 | 100 | 79.006 |
ViT-Base-Patch16 | 16.849G | 86.377M | 224x224 | 2 RTX A5000 | 4096 | 100 | 83.204 |
ViT-Large-Patch16 | 59.647G | 304.024M | 224x224 | 2 RTX A5000 | 4096 | 100 | 85.020 |
You can find more model training details in classification_training/imagenet/.
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
ViT-Base-Patch16 | 16.849G | 86.377M | 224x224 | 2 RTX A5000 | 4096 | 100 | 83.290 |
ViT-Large-Patch16 | 59.647G | 304.024M | 224x224 | 2 RTX A5000 | 4096 | 100 | 85.876 |
You can find more model training details in classification_training/imagenet/.
Network | macs | params | input size | gpu num | batch | epochs | Top-1 |
---|---|---|---|---|---|---|---|
ViT-Large-Patch16 | 59.651G | 308.124M | 224x224 | 2 RTX 4090 | 4096 | 100 | 90.693 |
You can find more model training details in classification_training/accv2022/.
DBNet
Paper:https://arxiv.org/abs/1911.08947
Trained and tested on a combined dataset of ICDAR2017RCTW/ICDAR2019ART/ICDAR2019LSVT/ICDAR2019MLT.
Network | macs | params | input size | gpu num | batch | epochs | precision | recall | f1 |
---|---|---|---|---|---|---|---|---|---|
repvgg_dbnet | 11.806G | 726.338K | 960x960 | 2 RTX A5000 | 128 | 200 | 88.756 | 74.205 | 80.831 |
resnet50_dbnet | 139.141G | 24.784M | 960x960 | 2 RTX A5000 | 64 | 100 | 92.973 | 86.316 | 89.521 |
vanb1_dbnet | 108.596G | 14.439M | 960x960 | 2 RTX A5000 | 64 | 100 | 93.049 | 86.881 | 89.859 |
You can find more model training details in ocr_text_detection_training/.
CRNN+LSTM+CTC
Paper:https://arxiv.org/abs/1507.05717
Trained and tested on a combined dataset of aistudio_baidu_street/chinese_dataset/synthetic_chinese_string_dataset/meta_self_learning_dataset.
Network | macs | params | input size | gpu num | batch | epochs | lcs_precision | lcs_recall |
---|---|---|---|---|---|---|---|---|
repvgg_ctc_model | 951.804M | 6.865M | 32x512 | 2 RTX A5000 | 512 | 50 | 98.079 | 97.564 |
resnet50_ctc_model | 12.474G | 179.864M | 32x512 | 2 RTX A5000 | 1024 | 50 | 99.368 | 99.135 |
van_b1_ctc_model | 2.410G | 27.954M | 32x512 | 2 RTX A5000 | 1024 | 50 | 98.868 | 97.597 |
You can find more model training details in ocr_text_recognition_training/.
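The lcs_precision and lcs_recall columns are, by our reading (an assumption about the metric's definition), longest-common-subsequence overlaps between predicted and ground-truth text: precision divides the LCS length by the prediction length, recall by the label length. A minimal sketch:

```python
def lcs_len(a, b):
    """Classic O(len(a) * len(b)) longest-common-subsequence DP."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def lcs_precision_recall(pred, gt):
    l = lcs_len(pred, gt)
    return l / len(pred), l / len(gt)

print(lcs_precision_recall("hello", "helo"))  # (0.8, 1.0)
```

Unlike exact-match accuracy, this scores partial recognitions, which suits long Chinese text lines where a single wrong character would otherwise zero out the sample.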
PFAN+Matting
Paper1:https://arxiv.org/abs/1903.00179
Paper2:https://arxiv.org/abs/2104.14222
Paper3:https://arxiv.org/abs/2202.09741
Trained and tested on a combined dataset of Deep_Automatic_Portrait_Matting/RealWorldPortrait636/P3M10K.
Network | macs | params | input size | gpu num | batch | epochs | iou | precision | recall | sad | mae | mse | grad | conn |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
resnet50_pfan_matting | 85.638G | 29.654M | 832x832 | 2 RTX 3090 | 32 | 50 | 0.9818 | 0.9879 | 0.9937 | 5.9215 | 0.0085 | 0.0048 | 7.5277 | 5.6842 |
van_b2_pfan_matting | 85.926G | 27.854M | 832x832 | 2 RTX 3090 | 32 | 50 | 0.9850 | 0.9900 | 0.9948 | 5.0200 | 0.0072 | 0.0038 | 5.5563 | 4.7644 |
You can find more model training details in human_matting_training/.
PFAN+Segmentation
Paper1:https://arxiv.org/abs/1903.00179
Paper2:https://arxiv.org/abs/2202.09741
Trained and tested on a combined dataset of DIS5K/HRS10K/HRSOD/UHRSD.
Network | macs | params | input size | gpu num | batch | epochs | iou | precision | recall | f_squared_beta |
---|---|---|---|---|---|---|---|---|---|---|
resnet50_pfan_segmentation | 70.921G | 26.580M | 832x832 | 8 RTX 4090D | 64 | 100 | 0.8501 | 0.8977 | 0.9389 | 0.9068 |
van_b2_pfan_segmentation | 77.433G | 26.953M | 832x832 | 8 RTX 4090D | 64 | 100 | 0.8904 | 0.9292 | 0.9527 | 0.9345 |
You can find more model training details in salient_object_detection_training/.
RetinaFace
Paper:https://arxiv.org/pdf/1905.00641
Trained on the WiderFace train and UFDD val datasets, tested on the WiderFace val dataset.
Network | macs | params | input size | gpu num | batch | epochs | decode setting | Easy AP | Medium AP | Hard AP |
---|---|---|---|---|---|---|---|---|---|---|
resnet50_retinaface | 100.372G | 27.277M | 960x960 | 2 RTX A5000 | 16 | 100 | max_object_num:200, min_score_threshold:0.3, topn:1000 | 0.9311 | 0.9043 | 0.7357 |
resnet50_retinaface | 100.372G | 27.277M | 960x960 | 2 RTX A5000 | 16 | 100 | max_object_num:1000, min_score_threshold:0.1, topn:2000 | 0.9357 | 0.9158 | 0.8105 |
You can find more model training details in face_detection_training/.
SAM(segment-anything)
Paper:https://arxiv.org/pdf/2304.02643
Models are tested with a prompt box perturbed by random noise.
Network | dataset | input size | gpu num | batch | epochs | iou | precision | recall |
---|---|---|---|---|---|---|---|---|
sam_b | salient_object_detection | 1024x1024 | 8 RTX 4090D | 16 | 500 | 0.9486 | 0.9676 | 0.9783 |
sam_b | sobav2 | 1024x1024 | 2 RTX 3090 | 4 | 500 | 0.9871 | 0.9935 | 0.9930 |
sam_b | desobav2 | 1024x1024 | 8 RTX 4090D | 16 | 200 | 0.9862 | 0.9919 | 0.9941 |
You can find more model training details in interactive_segmentation_training/.
Aggregated Contextual Transformations for High-Resolution Image Inpainting
Paper:https://arxiv.org/abs/2104.01431
Trained the image inpainting model on the CelebA-HQ dataset. Test image num=2000.
Network | input size | epochs | Mask | mae | psnr | ssim | fid |
---|---|---|---|---|---|---|---|
AOT-GAN | 512x512 | 100 | 0.01-0.1 | 0.0023 | 40.368 | 0.9853 | 0.8003 |
AOT-GAN | 512x512 | 100 | 0.1-0.2 | 0.0064 | 33.724 | 0.9592 | 2.1704 |
AOT-GAN | 512x512 | 100 | 0.2-0.3 | 0.0122 | 29.996 | 0.9245 | 3.8093 |
AOT-GAN | 512x512 | 100 | 0.3-0.4 | 0.0192 | 27.343 | 0.8860 | 5.4981 |
AOT-GAN | 512x512 | 100 | 0.4-0.5 | 0.0279 | 25.154 | 0.8426 | 8.3303 |
AOT-GAN | 512x512 | 100 | 0.5-0.6 | 0.0486 | 21.576 | 0.7704 | 14.553 |
You can find more model training details in image_inpainting_training/celebahq/.
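The psnr column follows the usual peak signal-to-noise ratio definition derived from per-pixel MSE; a minimal sketch, assuming images normalized to [0, 1] so the peak value MAX=1:

```python
import math

def psnr(mse, max_val=1.0):
    """Peak signal-to-noise ratio in dB from mean squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)

print(psnr(0.001))  # 30.0 dB for an MSE of 1e-3 on [0, 1] images
```

This is why psnr falls monotonically as the Mask ratio grows in the tables above: larger holes mean larger reconstruction MSE, and psnr is just its log transform.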
Trained the image inpainting model on the Places365-standard dataset. Test image num=36500.
Network | input size | epochs | Mask | mae | psnr | ssim | fid |
---|---|---|---|---|---|---|---|
AOT-GAN | 512x512 | 5 | 0.01-0.1 | 0.0041 | 35.505 | 0.9772 | 0.1412 |
AOT-GAN | 512x512 | 5 | 0.1-0.2 | 0.0114 | 29.250 | 0.9374 | 0.4833 |
AOT-GAN | 512x512 | 5 | 0.2-0.3 | 0.0214 | 25.802 | 0.8855 | 1.1973 |
AOT-GAN | 512x512 | 5 | 0.3-0.4 | 0.0331 | 23.391 | 0.8291 | 2.5272 |
AOT-GAN | 512x512 | 5 | 0.4-0.5 | 0.0469 | 21.504 | 0.7677 | 5.0670 |
AOT-GAN | 512x512 | 5 | 0.5-0.6 | 0.0737 | 18.904 | 0.6795 | 14.951 |
Network | input size | epochs | Mask | mae | psnr | ssim | fid |
---|---|---|---|---|---|---|---|
AOT-GAN-light | 512x512 | 5 | 0.01-0.1 | 0.0043 | 35.023 | 0.9757 | 0.1680 |
AOT-GAN-light | 512x512 | 5 | 0.1-0.2 | 0.0121 | 28.824 | 0.9338 | 0.6524 |
AOT-GAN-light | 512x512 | 5 | 0.2-0.3 | 0.0227 | 25.423 | 0.8798 | 1.7831 |
AOT-GAN-light | 512x512 | 5 | 0.3-0.4 | 0.0350 | 23.052 | 0.8218 | 4.0379 |
AOT-GAN-light | 512x512 | 5 | 0.4-0.5 | 0.0494 | 21.199 | 0.7590 | 8.2494 |
AOT-GAN-light | 512x512 | 5 | 0.5-0.6 | 0.0768 | 18.690 | 0.6719 | 22.745 |
TRANSX-LKA-AOT-GAN-light | 512x512 | 1 | 0.01-0.1 | 0.0042 | 35.287 | 0.9763 | 0.1611 |
TRANSX-LKA-AOT-GAN-light | 512x512 | 1 | 0.1-0.2 | 0.0117 | 29.148 | 0.9356 | 0.5648 |
TRANSX-LKA-AOT-GAN-light | 512x512 | 1 | 0.2-0.3 | 0.0217 | 25.790 | 0.8835 | 1.3744 |
TRANSX-LKA-AOT-GAN-light | 512x512 | 1 | 0.3-0.4 | 0.0331 | 23.451 | 0.8281 | 2.7966 |
TRANSX-LKA-AOT-GAN-light | 512x512 | 1 | 0.4-0.5 | 0.0464 | 21.599 | 0.7681 | 5.1937 |
TRANSX-LKA-AOT-GAN-light | 512x512 | 1 | 0.5-0.6 | 0.0718 | 19.034 | 0.6838 | 12.825 |
You can find more model training details in image_inpainting_training/places365_standard.
Trained the image inpainting model on the Places365-challenge dataset. Test image num=36500.
Network | input size | epochs | Mask | mae | psnr | ssim | fid |
---|---|---|---|---|---|---|---|
AOT-GAN | 512x512 | 1 | 0.01-0.1 | 0.0039 | 35.807 | 0.9781 | 0.1318 |
AOT-GAN | 512x512 | 1 | 0.1-0.2 | 0.0110 | 29.499 | 0.9395 | 0.4493 |
AOT-GAN | 512x512 | 1 | 0.2-0.3 | 0.0207 | 26.021 | 0.8890 | 1.0881 |
AOT-GAN | 512x512 | 1 | 0.3-0.4 | 0.0320 | 23.586 | 0.8338 | 2.2785 |
AOT-GAN | 512x512 | 1 | 0.4-0.5 | 0.0454 | 21.674 | 0.7734 | 4.4948 |
AOT-GAN | 512x512 | 1 | 0.5-0.6 | 0.0715 | 19.039 | 0.6848 | 13.475 |
Network | input size | epochs | Mask | mae | psnr | ssim | fid |
---|---|---|---|---|---|---|---|
AOT-GAN-light | 512x512 | 1 | 0.01-0.1 | 0.0042 | 35.263 | 0.9762 | 0.1609 |
AOT-GAN-light | 512x512 | 1 | 0.1-0.2 | 0.0118 | 29.043 | 0.9349 | 0.6028 |
AOT-GAN-light | 512x512 | 1 | 0.2-0.3 | 0.0221 | 25.609 | 0.8814 | 1.6013 |
AOT-GAN-light | 512x512 | 1 | 0.3-0.4 | 0.0340 | 23.209 | 0.8235 | 3.5484 |
AOT-GAN-light | 512x512 | 1 | 0.4-0.5 | 0.0480 | 21.332 | 0.7606 | 7.2095 |
AOT-GAN-light | 512x512 | 1 | 0.5-0.6 | 0.0745 | 18.778 | 0.6714 | 20.031 |
TRANSX-LKA-AOT-GAN-light | 512x512 | 1 | 0.01-0.1 | 0.0041 | 35.466 | 0.9769 | 0.1392 |
TRANSX-LKA-AOT-GAN-light | 512x512 | 1 | 0.1-0.2 | 0.0114 | 29.275 | 0.9369 | 0.4566 |
TRANSX-LKA-AOT-GAN-light | 512x512 | 1 | 0.2-0.3 | 0.0212 | 25.888 | 0.8854 | 1.0463 |
TRANSX-LKA-AOT-GAN-light | 512x512 | 1 | 0.3-0.4 | 0.0325 | 23.530 | 0.8300 | 2.0431 |
TRANSX-LKA-AOT-GAN-light | 512x512 | 1 | 0.4-0.5 | 0.0457 | 21.667 | 0.7699 | 3.8222 |
TRANSX-LKA-AOT-GAN-light | 512x512 | 1 | 0.5-0.6 | 0.0711 | 19.104 | 0.6836 | 9.9868 |
You can find more model training details in image_inpainting_training/places365_challenge.
Denoising Diffusion Probabilistic Models
Paper:https://arxiv.org/abs/2006.11239
Denoising Diffusion Implicit Models
Paper:https://arxiv.org/abs/2010.02502
High-Resolution Image Synthesis with Latent Diffusion Models
Paper:https://arxiv.org/abs/2112.10752
Trained the diffusion UNet on the CIFAR10 dataset (DDPM method). Test image num=50000.
sampling method | input size | steps | condition label(train/test) | FID | IS score(mean/std) |
---|---|---|---|---|---|
DDPM | 32x32 | 1000 | False/False | 5.394 | 8.684/0.169 |
DDIM | 32x32 | 50 | False/False | 7.644 | 8.642/0.129 |
DDPM | 32x32 | 1000 | True/True | 3.949 | 8.985/0.139 |
You can find more model training details in diffusion_model_training/cifar10/.
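The tables compare DDPM sampling with 1000 steps against DDIM with only 50: DDIM samples on an evenly spaced subsequence of the training timesteps with a deterministic (eta=0) update, trading a little FID for a 20x speedup. A sketch of the step schedule and one DDIM update (the `alphas_cumprod` values and the noise prediction `eps` here are placeholders, not the trained model's outputs):

```python
import math

def ddim_timesteps(num_train_steps=1000, num_sample_steps=50):
    """Evenly spaced subsequence of the training timesteps, descending."""
    stride = num_train_steps // num_sample_steps
    return list(range(num_train_steps - 1, -1, -stride))

def ddim_step(x_t, eps, a_t, a_prev):
    """Deterministic DDIM update (eta=0): predict x0 from eps, re-noise to t_prev.
    a_t, a_prev are the cumulative alpha products at the current and previous step."""
    x0_pred = (x_t - math.sqrt(1 - a_t) * eps) / math.sqrt(a_t)
    return math.sqrt(a_prev) * x0_pred + math.sqrt(1 - a_prev) * eps

steps = ddim_timesteps()
print(len(steps), steps[0], steps[-1])  # 50 steps, from t=999 down to t=19
```

DDPM instead adds fresh Gaussian noise at every one of the 1000 steps, which explains both its higher cost and its slightly better FID in the rows above.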
Trained the diffusion UNet on the CIFAR100 dataset (DDPM method). Test image num=50000.
sampling method | input size | steps | condition label(train/test) | FID | IS score(mean/std) |
---|---|---|---|---|---|
DDPM | 32x32 | 1000 | False/False | 9.620 | 9.399/0.138 |
DDIM | 32x32 | 50 | False/False | 13.250 | 8.946/0.150 |
DDPM | 32x32 | 1000 | True/True | 5.209 | 10.880/0.180 |
You can find more model training details in diffusion_model_training/cifar100/.
Trained the diffusion UNet on the CelebA-HQ dataset (DDPM method). Test image num=28000.
sampling method | input size | steps | condition label(train/test) | FID | IS score(mean/std) |
---|---|---|---|---|---|
DDPM | 64x64 | 1000 | False/False | 6.491 | 2.577/0.035 |
DDIM | 64x64 | 50 | False/False | 15.195 | 2.625/0.028 |
You can find more model training details in diffusion_model_training/celebahq/.
Trained the diffusion UNet on the FFHQ dataset (DDPM method). Test image num=60000.
sampling method | input size | steps | condition label(train/test) | FID | IS score(mean/std) |
---|---|---|---|---|---|
DDPM | 64x64 | 1000 | False/False | 6.671 | 3.399/0.055 |
DDIM | 64x64 | 50 | False/False | 10.479 | 3.431/0.044 |
You can find more model training details in diffusion_model_training/ffhq/.