Modality | RGB | Flow | Warp Flow | RGB + Flow (1:1.5) | RGB + Flow + Warp Flow (1:1:0.5) |
---|---|---|---|---|---|
HMDB51 | 51.0% | 64.2% | 63.0% | 68.5% | 69.0% |
UCF101 | 85.1% | 89.7% | 89.8% | 94.0% | 94.2% |
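The fusion ratios above (e.g. 1:1.5) weight each stream's class scores before combining them. A minimal sketch of such weighted late fusion — the function name and toy scores below are our own illustration, not the released code:

```python
import numpy as np

def late_fusion(stream_scores, weights):
    """Weighted late fusion: scale each stream's class scores by its
    fusion weight (e.g. RGB:Flow = 1:1.5), sum, and predict."""
    fused = sum(w * s for w, s in zip(weights, stream_scores))
    return int(np.argmax(fused))

# Toy class scores over 3 classes for two streams.
rgb  = np.array([0.5, 0.3, 0.2])
flow = np.array([0.2, 0.6, 0.2])
pred = late_fusion([rgb, flow], [1.0, 1.5])  # flow's vote dominates
```

With the 1:1.5 weighting the flow stream's confident vote wins (class 1); an RGB-heavy weighting would flip the prediction to class 0.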
Modality | Split 1 | Split 2 | Split 3 | Average of 3 Splits |
---|---|---|---|---|
RGB | 54.4% | 50.0% | 49.2% | 51.0% |
Flow | 62.4% | 63.3% | 63.9% | 63.2% |
RGB + Flow (1:1.5) | 69.5% | 67.4% | 68.5% | 68.5% |
* Note: small differences in single-stream accuracy are expected; the combined performance remains very stable.
Modality | Split 1 | Split 2 | Split 3 | Average of 3 Splits |
---|---|---|---|---|
RGB | 85.5% | 84.9% | 84.5% | 85.1% |
Flow | 87.6% | 90.2% | 91.3% | 89.7% |
RGB + Flow (1:1.5) | 93.5% | 94.3% | 94.5% | 94.0% |
* Note: small differences in single-stream accuracy are expected; the combined performance remains very stable.
Segment Number | RGB | Flow | RGB + Flow (1:1) | RGB + Flow (1:1.5) |
---|---|---|---|---|
3 | 85.1% | 89.7% | 94.0% | 94.0% |
5 | 85.1% | 89.7% | 94.5% | 94.6% |
7 | 85.4% | 89.6% | 94.6% | 94.9% |
9 | 85.3% | 89.6% | 94.8% | 94.9% |
Segment Number | RGB | Flow | RGB + Flow (1:1) |
---|---|---|---|
3 | 83.6% | 70.6% | 86.9% |
5 | 84.6% | 72.9% | 87.6% |
7 | 84.0% | 72.8% | 87.8% |
9 | 83.7% | 72.6% | 87.9% |
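The segment-number tables reflect TSN's sampling scheme: the video is divided into K equal segments, one snippet is scored from each, and the snippet scores are aggregated by a consensus function (here, averaging). A rough sketch of that inference step — our own simplification, since the released models apply the consensus inside the network:

```python
import numpy as np

def segment_consensus(frame_scores, num_segments):
    """TSN-style inference sketch: split the video into K equal
    segments, take the class scores of each segment's center frame,
    and average them into one video-level score vector."""
    n = len(frame_scores)
    bounds = np.linspace(0, n, num_segments + 1)
    centers = ((bounds[:-1] + bounds[1:]) / 2).astype(int)
    return frame_scores[centers].mean(axis=0)

# 12 frames, 3 classes, random per-frame scores; K = 3 segments.
scores = np.random.rand(12, 3)
video_score = segment_consensus(scores, num_segments=3)
```

Raising K samples the video more densely, which matches the small but consistent gains in the tables above.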
Method | Speed (on GPU) | UCF101 Split 1 | UCF101 3 Splits Average |
---|---|---|---|
Enhanced MV [1] | 390 FPS | 86.6% | 86.4% |
Two-stream 3Dnet [2] | 246 FPS | - | 90.2% |
RGB Diff w/o TSN | 660 FPS | 83.0% | N/A |
RGB Diff + TSN | 660 FPS | 86.5% | 87.7% |
RGB Diff + RGB (both TSN) | 340 FPS | 90.7% | 91.0% |
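The "RGB Diff" modality above feeds the network stacked differences of consecutive RGB frames instead of optical flow, which is why it runs so fast. A minimal sketch of the input computation (array shapes and dtypes are our assumption):

```python
import numpy as np

def rgb_diff(frames):
    """Stacked RGB differences: subtract each frame from its
    successor, giving a cheap motion representation.
    frames: (T, H, W, 3) uint8 array -> (T-1, H, W, 3) int16."""
    f = frames.astype(np.int16)  # widen so differences can go negative
    return f[1:] - f[:-1]

# 5 dummy frames of size 4x4, with a brightness jump after frame 0.
clip = np.zeros((5, 4, 4, 3), dtype=np.uint8)
clip[1:] = 10
diffs = rgb_diff(clip)
```

Unlike optical flow, this needs no iterative estimation per frame pair, only a subtraction, hence the 660 FPS figure.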
@inproceedings{TSN2016ECCV,
  author    = {Limin Wang and Yuanjun Xiong and Zhe Wang and Yu Qiao and Dahua Lin and Xiaoou Tang and Luc {Van Gool}},
  title     = {Temporal Segment Networks: Towards Good Practices for Deep Action Recognition},
  booktitle = {ECCV},
  year      = {2016},
}
We secured first place in the untrimmed video classification task of the
ActivityNet Large Scale Action Recognition Challenge 2016, held in conjunction with CVPR'16.
The method and models of our submissions are released for research use.
[Github Link]
[Notebook Paper]
[Challenge Results]
Our modified version of the well-known Caffe toolbox, featuring MPI-based
parallel training and video IO support. This work also introduced cross-modality training of optical flow networks.
[Github Link]
[Tech Report]
A tool to extract RGB and optical flow frames from videos.
[Github Link]
Enhanced MV-CNN is a real-time action recognition algorithm.
It uses motion vectors to reach real-time processing speed and knowledge-transfer techniques to improve recognition accuracy.
[CVPR16 Paper]
[Project Page]
The state-of-the-art approach for action recognition before TSN.
[CVPR15 Paper]
[Github Link]