{"id":420,"date":"2017-09-08T14:40:32","date_gmt":"2017-09-08T06:40:32","guid":{"rendered":"http:\/\/blog.yjxiong.me\/?p=420"},"modified":"2017-09-08T14:40:32","modified_gmt":"2017-09-08T06:40:32","slug":"kinetics-pretrained-tsn-models-released","status":"publish","type":"post","link":"https:\/\/yjxiong.me\/blogs\/index.php\/2017\/09\/08\/kinetics-pretrained-tsn-models-released\/","title":{"rendered":"Kinetics Pretrained TSN Models Released"},"content":{"rendered":"<p>The <a href=\"https:\/\/deepmind.com\/research\/open-source\/open-source-datasets\/kinetics\/\" target=\"_blank\" rel=\"noopener noreferrer\">Kinetics Human Action Video Dataset<\/a>\u00a0is a large-scale video action recognition dataset released by Google DeepMind. It contains around\u00a0<b>300,000 trimmed human action videos<\/b>\u00a0from\u00a0<b>400 action classes<\/b>. This year (2017), it served as the trimmed video classification track in the\u00a0<b>ActivityNet challenge<\/b>. During our participation in the challenge, we confirmed that our\u00a0<a href=\"https:\/\/github.com\/yjxiong\/temporal-segment-networks\" target=\"_blank\" rel=\"noopener noreferrer\">TSN framework<\/a>, published at ECCV 2016, works smoothly on Kinetics. Using the Inception V3 architecture, our single two-stream model reaches a top-1 accuracy of 76.6% on Kinetics. 
This result is achieved by extracting only 25 snippets from each video, whereas DeepMind&#8217;s I3D models use all video frames for testing and reach 74.1%.<\/p>\n<p>We also verified that TSN models learned on Kinetics provide excellent\u00a0<b>pretraining<\/b>\u00a0for related tasks such as\u00a0<b>untrimmed video classification<\/b>\u00a0and\u00a0<b>temporal action detection<\/b>\u00a0(<a href=\"https:\/\/github.com\/yjxiong\/action-detection\" target=\"_blank\" rel=\"noopener noreferrer\">SSN in ICCV 2017<\/a>).<\/p>\n<p>&nbsp;<\/p>\n<p>Due to the huge volume of Kinetics, training action recognition models on it is computationally intensive for academia. But we believe the benefits of Kinetics should not be limited to well-resourced labs and companies with many GPUs. To this end, we release our action recognition models trained with TSN on the Kinetics dataset. For reference, we also list a performance comparison of Kinetics- and ImageNet-pretrained models on two action understanding tasks, i.e., 
untrimmed video classification and temporal action detection with SSN.<\/p>\n<p>&nbsp;<\/p>\n<p>Model weights and experimental results can be found on the [<a href=\"http:\/\/yjxiong.me\/others\/kinetics_action\/\">project website<\/a>].<\/p>\n<p>&nbsp;<\/p>\n<p>Some performance figures:<\/p>\n<p>&nbsp;<\/p>\n<p>Kinetics Action Recognition<\/p>\n<p><a href=\"\/wp-content\/uploads\/2017\/09\/kinetics_perf.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-421\" src=\"\/wp-content\/uploads\/2017\/09\/kinetics_perf-1024x390.png\" alt=\"Kinetics action recognition performance comparison\" width=\"720\" height=\"274\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Temporal Action Detection with <a href=\"https:\/\/github.com\/yjxiong\/action-detection\">SSN<\/a><\/p>\n<p><a href=\"\/wp-content\/uploads\/2017\/09\/action_detection.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-422\" src=\"\/wp-content\/uploads\/2017\/09\/action_detection-1024x158.png\" alt=\"Temporal action detection performance with SSN\" width=\"720\" height=\"111\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Happy experimenting!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Kinetics Human Action Video Dataset\u00a0is a large-scale video action recognition dataset released by Google DeepMind. It contains around\u00a0300,000 trimmed human action videos\u00a0from\u00a0400 action classes. This year (2017), it served as the trimmed video classification track in the\u00a0ActivityNet challenge. 
During our participation in the challenge, we confirmed that our\u00a0TSN framework, published at ECCV 2016, works smoothly on [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-420","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/yjxiong.me\/blogs\/index.php\/wp-json\/wp\/v2\/posts\/420","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/yjxiong.me\/blogs\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/yjxiong.me\/blogs\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/yjxiong.me\/blogs\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/yjxiong.me\/blogs\/index.php\/wp-json\/wp\/v2\/comments?post=420"}],"version-history":[{"count":0,"href":"https:\/\/yjxiong.me\/blogs\/index.php\/wp-json\/wp\/v2\/posts\/420\/revisions"}],"wp:attachment":[{"href":"https:\/\/yjxiong.me\/blogs\/index.php\/wp-json\/wp\/v2\/media?parent=420"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/yjxiong.me\/blogs\/index.php\/wp-json\/wp\/v2\/categories?post=420"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/yjxiong.me\/blogs\/index.php\/wp-json\/wp\/v2\/tags?post=420"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}