Model | Modality | Download Link | Kinetics Val Top-1 | Kinetics Val Top-5 |
---|---|---|---|---|
BNInception | RGB | BN RGB Pretrained Weights | 69.1% | 88.7% |
BNInception | Optical Flow | BN Flow Pretrained Weights | 62.1% | 83.9% |
BNInception | RGB+Flow (1:1) | - | 73.9% | 91.1% |
Inception V3 | RGB | V3 RGB Pretrained Weights | 72.5% | 90.2% |
Inception V3 | Optical Flow | V3 Flow Pretrained Weights | 62.8% | 84.2% |
Inception V3 | RGB+Flow (1:1) | - | 76.6% | 92.4% |
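The RGB+Flow (1:1) rows come from late fusion: the per-class prediction scores of the two streams are combined with equal weights. Below is a minimal sketch of that fusion step, assuming each stream has already produced an array of per-video class scores; the function name and shapes are illustrative, not part of the released code.

```python
import numpy as np

def fuse_two_stream(rgb_scores: np.ndarray,
                    flow_scores: np.ndarray,
                    weights=(1.0, 1.0)) -> np.ndarray:
    """Late-fuse per-class scores from the RGB and optical-flow streams.

    rgb_scores, flow_scores: arrays of shape (num_videos, num_classes),
    e.g. softmax outputs averaged over each video's sampled segments.
    weights: relative stream weights; (1.0, 1.0) corresponds to the
    1:1 setting reported in the table above.
    """
    w_rgb, w_flow = weights
    return (w_rgb * rgb_scores + w_flow * flow_scores) / (w_rgb + w_flow)

# Example: top-1 predictions for 4 hypothetical videos over 400 Kinetics classes.
rgb = np.random.rand(4, 400)
flow = np.random.rand(4, 400)
top1 = fuse_two_stream(rgb, flow).argmax(axis=1)
```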
Model | Pretraining | RGB | Flow | RGB+Flow |
---|---|---|---|---|
BNInception | ImageNet only | 85.4% | 89.4% | 94.9% |
BNInception | ImageNet + Kinetics | 91.1% | 95.2% | 97.0% |
Inception V3 | ImageNet only | - | - | - |
Inception V3 | ImageNet + Kinetics | 93.2% | 95.3% | 97.3% |
Pretraining | ActivityNet v1.3 Val Top-1 | ActivityNet v1.3 Test Top-1 |
---|---|---|
ImageNet only | 85.4% | 88.7% |
ImageNet + Kinetics | 88.3% | 90.2% |
Pretraining | ActivityNet v1.2 Val Average mAP (IoU 0.5:0.05:0.95) | THUMOS14 Test mAP@0.5
---|---|---|
ImageNet only | 26.81% | 26.73% |
ImageNet + Kinetics | 28.57% | 31.90% |
@inproceedings{TSN2016ECCV,
  author    = {Limin Wang and Yuanjun Xiong and Zhe Wang and Yu Qiao and Dahua Lin and Xiaoou Tang and Luc {Van Gool}},
  title     = {Temporal Segment Networks: Towards Good Practices for Deep Action Recognition},
  booktitle = {ECCV},
  year      = {2016},
}
The framework used to train the provided models.
[Github Link]
[ECCV Paper]
The state-of-the-art temporal action detection framework, presented at ICCV 2017.
[Github Link]
[ICCV Paper]
We secured first place in the untrimmed video classification task of the
ActivityNet Large Scale Action Recognition Challenge 2016, held in conjunction with CVPR'16.
The method and models behind our submissions are released for research use.
[Github Link]
[Notebook Paper]
Our modified version of the famous Caffe toolbox, featuring MPI-based
parallel training and video IO support. This work also introduced cross-modality training
of optical flow networks; a sketch of that initialization step follows the links below.
[Github Link]
[Tech Report]
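Cross-modality training here refers to initializing the temporal (optical-flow) network from an ImageNet-pretrained RGB network. A minimal sketch of the first-layer adaptation is given below, assuming plain weight arrays and a 10-channel stacked-flow input (both assumptions, not taken from the released code): average the RGB kernels over the color axis and replicate them across the flow channels.

```python
import numpy as np

def rgb_conv1_to_flow(conv1_rgb: np.ndarray, flow_channels: int = 10) -> np.ndarray:
    """Adapt first-layer conv weights from an RGB network to a flow network.

    conv1_rgb: weights of shape (out_channels, 3, kH, kW) from an
    ImageNet-pretrained model.
    flow_channels: number of stacked flow channels in the temporal stream
    (10 = 5 frames x 2 directions is an assumed setting).
    """
    # Average over the three RGB channels, then replicate for every flow channel.
    mean_kernel = conv1_rgb.mean(axis=1, keepdims=True)        # (out, 1, kH, kW)
    return np.repeat(mean_kernel, flow_channels, axis=1)       # (out, C_flow, kH, kW)

# Example with a dummy 7x7 first conv layer of 64 filters.
conv1_flow = rgb_conv1_to_flow(np.random.randn(64, 3, 7, 7), flow_channels=10)
```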
Enhanced MV-CNN is a real-time action recognition algorithm.
It uses motion vectors to achieve real-time processing speed and knowledge transfer techniques
to improve recognition performance; a sketch of the transfer idea follows the links below.
[CVPR16 Paper]
[Project Page]
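Here "knowledge transfer" means using an optical-flow CNN as a teacher for the motion-vector CNN. The sketch below shows only a generic soft-target transfer loss of that flavor; the exact loss form, temperature, and weighting are assumptions for illustration and do not reproduce the paper's training recipe.

```python
import torch
import torch.nn.functional as F

def transfer_loss(student_logits, teacher_logits, labels,
                  temperature: float = 2.0, alpha: float = 0.5):
    """Cross-entropy on ground-truth labels plus a soft-target term from the teacher.

    student_logits: outputs of the motion-vector CNN being trained.
    teacher_logits: outputs of a frozen optical-flow CNN on the same clips.
    temperature, alpha: assumed hyper-parameters, chosen only for illustration.
    """
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * hard + (1.0 - alpha) * soft
```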
The state-of-the-art approach for action recognition before TSN.
[CVPR15 Paper]
[Github Link]