CS348 Computer Vision

Introduction

ADHA (“Adverbs Describing Human Actions”) is the first benchmark for a new problem: recognizing human action adverbs (HAA). This is the first step for computer vision in moving from pattern recognition toward real AI. Key features of ADHA are a semantically complete set of adverbs describing human actions, a set of common, describable human actions, and exhaustive labeling of the actions that occur simultaneously in each video. We conduct an in-depth analysis of how current effective models for action recognition and image captioning perform on adverb recognition, and the results show that such methods are not feasible for this task. Moreover, we incorporate expression knowledge into those models and show that it can significantly improve HAA recognition performance.

Three-Stream Model

Demo

Results

PBLSTM results. “T1-F1” means task 1 with feature 1. “-e” means using expression knowledge. Task 2 does not involve action recognition, so “Act” has no values for task 2.

Two-stream model results. “-S” means the spatial stream. “-M” means the motion stream. “-F” means the fusion of both streams. Task 2 does not involve action recognition, so “Act” has no values for task 2.

Hybrid model results. “-H” means the hybrid model. Task 2 does not involve action recognition, so “Act” has no values for task 2.

Code

The code for the model is available on GitHub.

Statistics

The ADHA dataset consists of 11736 short videos with associated labels drawn from an action set of 32 actions and an adverb set of 51 adverbs. The dataset also provides tracking results for the target persons, produced with a semi-automatic annotation framework. In total, 16716 persons are labeled.

Target persons	16716
Seconds of video	89096
Action-adverb pairs	350
Adverbs per person	4.5312
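A few derived quantities follow directly from the counts above. The sketch below is only arithmetic on the published figures; it assumes that "Adverb/Person" is the mean number of adverb labels per labeled person and that the 350 action-adverb pairs are distinct combinations drawn from the 32 x 51 possible ones. Neither assumption is stated explicitly on this page.

```python
# Figures quoted in the Statistics section.
NUM_VIDEOS = 11736
NUM_PERSONS = 16716
TOTAL_SECONDS = 89096
NUM_PAIRS = 350
ADVERBS_PER_PERSON = 4.5312  # assumed to be a mean over labeled persons
NUM_ACTIONS = 32
NUM_ADVERBS = 51

# Average clip length in seconds.
seconds_per_video = TOTAL_SECONDS / NUM_VIDEOS

# Average number of labeled persons per video.
persons_per_video = NUM_PERSONS / NUM_VIDEOS

# Approximate total number of adverb labels (assumption: mean x persons).
approx_adverb_labels = NUM_PERSONS * ADVERBS_PER_PERSON

# Share of all possible action-adverb combinations that appear
# (assumption: the 350 pairs are distinct combinations).
possible_pairs = NUM_ACTIONS * NUM_ADVERBS
pair_coverage = NUM_PAIRS / possible_pairs

print(f"seconds per video:    {seconds_per_video:.2f}")
print(f"persons per video:    {persons_per_video:.2f}")
print(f"approx adverb labels: {approx_adverb_labels:.0f}")
print(f"pair coverage:        {pair_coverage:.1%} of {possible_pairs}")
```

Under these assumptions the clips average roughly seven and a half seconds, and about a fifth of the possible action-adverb combinations occur in the data.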

Action Set

brush_hair	chew	clap	climb_stairs	dive	draw_sword	drink	eat
fall_floor	hit	hug	kick	kiss	pick	pour	pullup
punch	push	run	shake_hands	shoot_bow	shoot_gun	walk	wave
sit	smoke	stand	swing_baseball	sword	sword_exercise	talk	throw

Adverb Set

promptly	fast	kindly	carefully	seriously	barely	easily	slowly
quietly	precisely	gently	surprisedly	lightly	heavily	happily	freely
sadly	proudly	comfortably	calmly	vigorously	nervously	reluctantly	professionally
politely	painfully	angrily	patiently	bitterly	incidentally	frantically	intently
gracefully	flatly	confidently	weakly	solemnly	expertly	inexorably	triumphantly
hesitantly	dramatically	officially	anxiously	hard	amazingly	wearily	clumsily
sweetly	excitedly	ironically
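Because a person can carry several adverbs at once, one common way to represent a label is an action index plus a multi-hot adverb vector. The sketch below is illustrative only: the `encode` helper and the index ordering (the order in which the sets are listed on this page) are assumptions, not the format of the released annotation files.

```python
# Label vocabularies, copied from the Action Set and Adverb Set tables.
ACTIONS = """brush_hair chew clap climb_stairs dive draw_sword drink eat
fall_floor hit hug kick kiss pick pour pullup
punch push run shake_hands shoot_bow shoot_gun walk wave
sit smoke stand swing_baseball sword sword_exercise talk throw""".split()

ADVERBS = """promptly fast kindly carefully seriously barely easily slowly
quietly precisely gently surprisedly lightly heavily happily freely
sadly proudly comfortably calmly vigorously nervously reluctantly professionally
politely painfully angrily patiently bitterly incidentally frantically intently
gracefully flatly confidently weakly solemnly expertly inexorably triumphantly
hesitantly dramatically officially anxiously hard amazingly wearily clumsily
sweetly excitedly ironically""".split()

# Hypothetical lookup tables; the ordering is an assumption.
ACTION_TO_INDEX = {name: i for i, name in enumerate(ACTIONS)}
ADVERB_TO_INDEX = {name: i for i, name in enumerate(ADVERBS)}


def encode(action, adverbs):
    """Return (action index, multi-hot adverb vector) for one person."""
    vector = [0] * len(ADVERBS)
    for adverb in adverbs:
        vector[ADVERB_TO_INDEX[adverb]] = 1
    return ACTION_TO_INDEX[action], vector


# Example: a person walking slowly and happily.
action_id, adverb_vector = encode("walk", ["slowly", "happily"])
print(action_id, sum(adverb_vector), len(adverb_vector))
```

An unknown action or adverb name raises a `KeyError`, which is usually preferable to silently dropping a label.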

Publication

If you want to know more about the dataset, you can download the related paper here.

Bibtex

@inproceedings{pang2017adha,
  title={Human Action Adverb Recognition: ADHA Dataset and A Hybrid Model},
  author={Pang, Bo and Zha, Kaiwen and Lu, Cewu},
  booktitle={arXiv preprint},
  year={2017}
}

ADHA: A Benchmark for Recognizing Adverbs describing Human Actions in Videos

Distributions

Action Distribution

Adverb Distribution

Download