Towards Good Practices for Very Deep Two-Stream ConvNets

Limin Wang, Yuanjun Xiong, Zhe Wang, and Yu Qiao



Please see the following link for the models.

Models and config files.

The modified pre-trained VGG-16 models are also provided:

Temporal Initialization Model, Spatial Initialization Model.


Validation accuracy (%) on UCF101 Dataset

Validation Split   Spatial   Temporal   Combined
1                  79.8      85.7       90.9
2                  77.3      88.2       91.6
3                  77.8      87.4       91.6
Average            78.4      87.0       91.4
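
The "Combined" column reports accuracy after merging the predictions of the spatial and temporal streams. This page does not say how the two streams are merged, so the following is only a minimal sketch of one common approach, a weighted average of per-class softmax scores. The function names, the toy scores, and the 1:2 spatial-to-temporal weighting are illustrative assumptions, not values taken from the released models.

```python
import math


def softmax(scores):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(spatial_scores, temporal_scores,
                 spatial_weight=1.0, temporal_weight=2.0):
    """Weighted average of the per-class probabilities of the two streams.

    The default weights are assumptions chosen for illustration only.
    """
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    total = spatial_weight + temporal_weight
    return [
        (spatial_weight * ps + temporal_weight * pt) / total
        for ps, pt in zip(p_spatial, p_temporal)
    ]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


# Toy example with three classes (made-up scores for one video).
spatial = [2.0, 1.0, 0.1]    # spatial stream alone prefers class 0
temporal = [0.5, 3.0, 0.2]   # temporal stream alone prefers class 1
fused = fuse_streams(spatial, temporal)
label = argmax(fused)
```

In this toy case the two streams disagree, and the fused prediction follows the more confident, more heavily weighted temporal stream. In practice the weights would be tuned on held-out data.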


GitHub Repository

Please consult the README files in the repository for features and usage.

Optical Flow

Some users have reported a performance drop when using other video decoders or optical flow algorithms.
For reference, we provide the optical flow images that we extracted from the UCF101 dataset.

UCF101 Optical Flow

You are advised to use the same tool to extract optical flow if you plan to directly use the released models.

Dense Optical Flow Extraction Tool
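
One plausible reason why a different extraction pipeline can hurt accuracy is that flow fields are usually stored as 8-bit images, and the mapping from displacement to pixel value depends on the tool's settings. The sketch below is a hypothetical illustration of such a mapping (clip to a symmetric bound, then scale linearly to 0-255). The function names and the bound values are assumptions and do not describe the released extraction tool.

```python
def quantize_flow(value, bound=20.0):
    """Map a flow displacement (in pixels) to an 8-bit value.

    The displacement is clipped to [-bound, bound] and scaled linearly
    to [0, 255]. The default bound is an assumed value for illustration.
    """
    clipped = max(-bound, min(bound, value))
    return int(round(255.0 * (clipped + bound) / (2.0 * bound)))


def dequantize_flow(pixel, bound=20.0):
    """Approximate inverse of quantize_flow for the same bound."""
    return pixel * (2.0 * bound) / 255.0 - bound


# The same 5-pixel displacement is stored differently under two bounds,
# so a network trained on one encoding sees shifted inputs under the other.
pixel_a = quantize_flow(5.0, bound=20.0)
pixel_b = quantize_flow(5.0, bound=15.0)

# Displacements beyond the bound saturate at the ends of the 8-bit range.
saturated_high = quantize_flow(100.0)
saturated_low = quantize_flow(-100.0)
```

A model trained on flow images produced with one encoding receives systematically different inputs if another encoding is used at test time, which is consistent with the advice above to reuse the same tool.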


@article{wang2015towards,
  author    = {Limin Wang and Yuanjun Xiong and Zhe Wang and Yu Qiao},
  title     = {Towards Good Practices for Very Deep Two-Stream ConvNets},
  journal   = {CoRR},
  volume    = {abs/1507.02159},
  year      = {2015},
  url       = {},
}

Trajectory-Pooled Deep-Convolutional Descriptors