Paper Note: Two-Stream Convolutional Networks for Action Recognition in Videos

Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in videos." Advances in Neural Information Processing Systems. 2014.

This paper tackles human action recognition in video with a two-stream CNN. Previously, the best results came from hand-crafted features. Earlier CNN attempts, which treated every frame as an image, were about 20% less accurate than the hand-crafted, trajectory-based state of the art on the UCF-101 dataset.

To get better performance from CNNs, this work takes temporal information into account, not only the spatial part. The authors introduce a two-stream CNN: one stream handles spatial information, the other handles temporal information, and the two are combined by late fusion for classification.

[Figure: Screenshot 2016-04-21 10.35.48 AM.png]

Spatial Stream ConvNet

Many actions are strongly associated with particular objects, so static appearance is a useful clue. This CNN is trained as a standard image classifier: the input is a single video frame and the output is the action class. A nice consequence is that pre-trained image CNN models can be reused.

Temporal: Optical Flow ConvNet

To represent the motion in a video as CNN input, they use optical flow.

[Figure: Screenshot 2016-04-21 10.58.44 AM.png]

Each optical flow field is split into two channels: one for the x-direction and one for the y-direction. An example flow frame is shown in panels (d) and (e) of the figure above.

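The optical flow representation samples the displacement vector at the same location in every frame. The paper also introduces a trajectory-based representation, which instead samples along the motion trajectory of each point. In the paper's experiments, however, trajectory stacking did not outperform plain optical flow stacking.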
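The trajectory idea can be sketched roughly as below. This is only an illustration under my own assumptions, not the authors' code: the function name `stack_trajectories` is hypothetical, the flow fields are assumed to arrive as a NumPy array of shape (L, h, w, 2) with x-displacement in channel 0 and y-displacement in channel 1, and positions are rounded to the nearest pixel and clipped at the image border.

```python
import numpy as np

def stack_trajectories(flows):
    """Sample L flow fields along each point's motion trajectory.

    flows: array of shape (L, h, w, 2); [..., 0] is x-flow, [..., 1] is y-flow.
    Returns an array of shape (h, w, 2L).
    """
    flows = np.asarray(flows, dtype=np.float32)
    if flows.ndim != 4 or flows.shape[-1] != 2:
        raise ValueError("expected flows of shape (L, h, w, 2)")
    L, h, w, _ = flows.shape

    # Every trajectory starts at its own pixel.
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    out = np.empty((h, w, 2 * L), dtype=np.float32)

    for k in range(L):
        # Assumption: nearest-neighbour lookup, clipped to the image border.
        yi = np.clip(np.rint(ys), 0, h - 1).astype(int)
        xi = np.clip(np.rint(xs), 0, w - 1).astype(int)
        d = flows[k][yi, xi]          # (h, w, 2) displacement at the current point
        out[..., 2 * k] = d[..., 0]
        out[..., 2 * k + 1] = d[..., 1]
        # Move each point along its displacement before reading the next frame.
        xs = xs + d[..., 0]
        ys = ys + d[..., 1]
    return out
```

With zero motion in the first frame this reduces to sampling every frame at the same location, which is the plain optical flow representation.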

The input stacks the flow fields of L consecutive frames along the channel axis, so it has dimensions (w, h, 2L). The factor 2 is for the x and y components.
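A minimal sketch of this stacking in NumPy, to make the shape concrete. The function name `stack_optical_flow` and the (L, h, w, 2) input layout are my assumptions, and the output follows NumPy's rows-first convention, so it comes out as (h, w, 2L) rather than (w, h, 2L).

```python
import numpy as np

def stack_optical_flow(flows):
    """Stack L flow fields into one input volume of shape (h, w, 2L).

    flows: array of shape (L, h, w, 2); [..., 0] is x-flow, [..., 1] is y-flow.
    """
    flows = np.asarray(flows, dtype=np.float32)
    if flows.ndim != 4 or flows.shape[-1] != 2:
        raise ValueError("expected flows of shape (L, h, w, 2)")
    L, h, w, _ = flows.shape
    # Bring the frame axis next to the x/y axis, then merge the two:
    # channel 2k holds the x-flow of frame k, channel 2k+1 its y-flow.
    return flows.transpose(1, 2, 0, 3).reshape(h, w, 2 * L)
```

For example, L = 10 flow fields of size 224 x 224 give a 20-channel input.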

[Figure: Screenshot 2016-04-21 11.03.11 AM.png]

Fusion

The softmax scores of the two streams are fused in one of two ways:
Averaging the scores
Training a multi-class linear SVM on the softmax scores as features
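The averaging option is simple enough to sketch in a few lines. This is an illustrative example only: the function name `average_fusion` and the toy scores are hypothetical, and the SVM variant is not shown.

```python
def average_fusion(spatial_scores, temporal_scores):
    """Average two per-class softmax score vectors.

    Returns (fused_scores, predicted_class_index).
    """
    if len(spatial_scores) != len(temporal_scores):
        raise ValueError("both streams must score the same set of classes")
    fused = [(s + t) / 2.0 for s, t in zip(spatial_scores, temporal_scores)]
    # The predicted class is the one with the highest fused score.
    best = max(range(len(fused)), key=fused.__getitem__)
    return fused, best


# Toy example with three classes: the spatial stream prefers class 0,
# the temporal stream prefers class 1, and the fused scores pick class 1.
fused, best = average_fusion([0.6, 0.3, 0.1], [0.1, 0.7, 0.2])
```

Because both inputs are probability vectors, the fused scores still sum to 1.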

Evaluation

[Figure: Screenshot 2016-04-21 11.14.08 AM.png]

Two datasets are used: UCF-101 and HMDB-51.

On UCF-101, the two-stream model reaches 88% accuracy, on par with the state-of-the-art 87.9%.

Good.
