ADHA: A Benchmark for Recognizing Adverbs describing Human Actions in Videos

Bo Pang, Kaiwen Zha, Cewu Lu*

(* corresponding author: lu-cw@cs.sjtu.edu.cn)

Introduction

ADHA (“Adverbs Describing Human Actions”) is the first benchmark for a new problem: recognizing human action adverbs (HAA). We see this as a first step for computer vision in moving from pattern recognition toward real AI. Key features of ADHA are a semantically complete set of adverbs describing human actions, a set of common, describable human actions, and exhaustive labeling of the actions that occur simultaneously in each video. We carry out an in-depth analysis of how current effective models from action recognition and image captioning perform on adverb recognition, and the results show that such methods are not adequate for the task. Moreover, we incorporate expression knowledge into those models and show that it can significantly improve HAA recognition performance.

Three-Stream Model

Demo

Results

PBLSTM results. “T1-F1” means task 1 with feature 1, and “-e” means using expression knowledge. Task 2 does not recognize actions, so “Act” has no values for task 2.

Two-stream model results. “-S” means the spatial stream, “-M” the motion stream, and “-F” the fusion of the two streams. Task 2 does not recognize actions, so “Act” has no values for task 2.

Hybrid model results. “-H” means the hybrid model. Task 2 does not recognize actions, so “Act” has no values for task 2.

Code

The code for the model is available on GitHub.

Statistics

The ADHA dataset consists of 11736 short videos with associated labels drawn from an action set of 32 actions and an adverb set of 51 adverbs. The dataset also provides tracking results for each target person, produced with a semi-automatic annotation framework. In total, 16716 persons are labeled.
Target persons: 16716
Seconds of video: 89096
Action-adverb pairs: 350
Adverbs per person: 4.5312
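
The figures above can be combined into a few derived quantities. The short Python sketch below does that arithmetic; it assumes that "Action-Adverb Pairs" counts the distinct action-adverb combinations that occur in the data and that "Adverb/Person" is the mean number of adverb labels per labeled person. Neither reading is confirmed on this page, and the function and key names are hypothetical.

```python
# Figures copied from the Statistics section of this page.
NUM_VIDEOS = 11736
NUM_PERSONS = 16716
TOTAL_SECONDS = 89096
NUM_ACTIONS = 32
NUM_ADVERBS = 51
OBSERVED_PAIRS = 350         # assumed: distinct action-adverb combinations
ADVERBS_PER_PERSON = 4.5312  # assumed: mean adverb labels per labeled person


def derived_stats():
    """Return simple ratios computed from the published dataset figures."""
    possible_pairs = NUM_ACTIONS * NUM_ADVERBS
    return {
        "persons_per_video": NUM_PERSONS / NUM_VIDEOS,
        "seconds_per_video": TOTAL_SECONDS / NUM_VIDEOS,
        "approx_adverb_labels": ADVERBS_PER_PERSON * NUM_PERSONS,
        "possible_pairs": possible_pairs,
        "pair_coverage": OBSERVED_PAIRS / possible_pairs,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:.4f}")
```

Under those assumptions, the videos average about 7.6 seconds and about 1.4 labeled persons each, and the 350 pairs cover roughly a fifth of the 32 x 51 = 1632 possible action-adverb combinations.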

Action Set

brush_hair chew clap climb_stairs dive draw_sword drink eat
fall_floor hit hug kick kiss pick pour pullup
punch push run shake_hands shoot_bow shoot_gun walk wave
sit smoke stand swing_baseball sword sword_exercise talk throw

Adverb Set

promptly fast kindly carefully seriously barely easily slowly
quietly precisely gently surprisedly lightly heavily happily freely
sadly proudly comfortably calmly vigorously nervously reluctantly professionally
politely painfully angrily patiently bitterly incidentally frantically intently
gracefully flatly confidently weakly solemnly expertly inexorably triumphantly
hesitantly dramatically officially anxiously hard amazingly wearily clumsily
sweetly excitedly ironically
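
As an illustration of how these two label sets might be used, the Python sketch below encodes one labeled person as an action index plus a multi-hot adverb vector, since a person can carry several adverbs at once. The index order simply follows the order on this page, and the helper name encode_labels is hypothetical; the actual annotation format of the released files may differ.

```python
# Label vocabularies in the order listed on this page (the order is an assumption).
ACTIONS = """
brush_hair chew clap climb_stairs dive draw_sword drink eat
fall_floor hit hug kick kiss pick pour pullup
punch push run shake_hands shoot_bow shoot_gun walk wave
sit smoke stand swing_baseball sword sword_exercise talk throw
""".split()

ADVERBS = """
promptly fast kindly carefully seriously barely easily slowly
quietly precisely gently surprisedly lightly heavily happily freely
sadly proudly comfortably calmly vigorously nervously reluctantly professionally
politely painfully angrily patiently bitterly incidentally frantically intently
gracefully flatly confidently weakly solemnly expertly inexorably triumphantly
hesitantly dramatically officially anxiously hard amazingly wearily clumsily
sweetly excitedly ironically
""".split()

ACTION_INDEX = {name: i for i, name in enumerate(ACTIONS)}
ADVERB_INDEX = {name: i for i, name in enumerate(ADVERBS)}


def encode_labels(action, adverbs):
    """Return (action index, multi-hot adverb vector) for one labeled person."""
    vector = [0] * len(ADVERBS)
    for adverb in adverbs:
        vector[ADVERB_INDEX[adverb]] = 1  # raises KeyError on an unknown adverb
    return ACTION_INDEX[action], vector


# Example: a person walking slowly and carefully.
action_id, adverb_vec = encode_labels("walk", ["slowly", "carefully"])
print(action_id, sum(adverb_vec))
```

A multi-hot vector is used here because each person has several adverbs on average; a single class index would only fit the action label.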

Distributions

Action Distribution

Adverb Distribution

Publication

For more details about the dataset, you can download the related paper here.

Bibtex

@inproceedings{pang2017adha,
  title={Human Action Adverb Recognition: ADHA Dataset and A Hybrid Model},
  author={Pang, Bo and Zha, Kaiwen and Lu, Cewu},
  booktitle={arXiv preprint},
  year={2017}
}

Download

Click here to download the dataset and related materials.