Video Action Recognition by Combining Spatial-Temporal Cues with Graph Convolutional Networks

doi:10.1142/S021800142350009X


	Li, Tao 1; Xiong, Wenjun 2; Zhang, Zheng 2; Pei, Lishen 3
	2023-08-30
Journal	INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE
ISSN	0218-0014
Abstract	Video action recognition relies heavily on how spatio-temporal cues are combined to enhance recognition accuracy. One way to address this is to explicitly model interactions among objects within or between videos, for example with graph neural networks, which have been shown to accurately model and represent complicated spatial-temporal object relations for video action classification. However, the visual objects in a video are diverse, whereas the nodes in the graphs are fixed. This may result in information overload when the visual objects are redundant, or information loss when they are insufficient for graph construction. Segment-level graph convolutional networks (SLGCNs) are proposed as a method for recognizing actions in videos. The SLGCN consists of a segment-level spatial graph and a segment-level temporal graph, both of which are capable of simultaneously processing spatial and temporal information. Specifically, the segment-level spatial graph and the segment-level temporal graph are constructed using 2D and 3D CNNs to extract appearance and motion features from video segments. Graph convolutions are then applied to obtain informative segment-level spatial-temporal features. The method is evaluated on several challenging video datasets, including EPIC-Kitchens, FCVID, HMDB51 and UCF101. Experiments demonstrate that the SLGCN achieves performance comparable to state-of-the-art models in terms of obtaining spatial-temporal features.
Keywords	Video action recognition; graph convolutional networks; spatial-temporal graphs; feature combination
DOI	10.1142/S021800142350009X
Indexed by	SCIE
Language	English
WOS research area	Computer Science
WOS subject category	Computer Science, Artificial Intelligence
WOS accession number	WOS:001170344400001
Publisher	WORLD SCIENTIFIC PUBL CO PTE LTD
Original document type	Article; Early Access
EISSN	1793-6381
Citation statistics	Times cited [WOS]: 0
Document type	Journal article
Item identifier	http://ir.library.ouchn.edu.cn/handle/39V7QQFX/169780
Collection	The Open University of China
Corresponding author	Li, Tao
Author affiliations	1. Open Univ Henan, Dept Informat Engn, Zhengzhou 450046, Peoples R China; 2. Open Univ Henan, Resource Construct & Management Ctr, Zhengzhou 450046, Peoples R China; 3. Henan Univ Econ & Law, Dept Informat Engn, Zhengzhou 450046, Peoples R China
Recommended citation (GB/T 7714)	Li, Tao, Xiong, Wenjun, Zhang, Zheng, et al. Video Action Recognition by Combining Spatial-Temporal Cues with Graph Convolutional Networks[J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2023.
APA	Li, Tao, Xiong, Wenjun, Zhang, Zheng, & Pei, Lishen. (2023). Video Action Recognition by Combining Spatial-Temporal Cues with Graph Convolutional Networks. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE.
MLA	Li, Tao, et al. "Video Action Recognition by Combining Spatial-Temporal Cues with Graph Convolutional Networks". INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE (2023).
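
Illustrative sketch (not part of the record). The abstract describes applying graph convolutions to segment-level appearance and motion features. The following is a rough illustration of that idea only, not the authors' implementation. It assumes a similarity-based adjacency (row-wise softmax of feature dot products) and a single layer of the form ReLU(A X W). The function names, dimensions, mean pooling, and the random arrays standing in for 2D and 3D CNN features are all hypothetical choices that the record does not specify.

```python
import numpy as np


def row_softmax(scores):
    """Normalize each row of a score matrix into weights that sum to 1."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def segment_graph_conv(features, weight):
    """One graph convolution over segment nodes: ReLU(A @ X @ W).

    features: (num_segments, in_dim) array, one row per video segment.
    weight:   (in_dim, out_dim) projection matrix.
    The adjacency A is an assumption: pairwise dot-product similarity
    between segments, normalized row-wise with a softmax.
    """
    adjacency = row_softmax(features @ features.T)
    return np.maximum(adjacency @ features @ weight, 0.0)


rng = np.random.default_rng(0)
num_segments, in_dim, out_dim = 8, 16, 4

# Random stand-ins for per-segment features; in the abstract these would
# come from a 2D CNN (appearance) and a 3D CNN (motion).
appearance = rng.standard_normal((num_segments, in_dim))
motion = rng.standard_normal((num_segments, in_dim))

# Hypothetical layer weights (these would be learned in practice).
w_spatial = 0.1 * rng.standard_normal((in_dim, out_dim))
w_temporal = 0.1 * rng.standard_normal((in_dim, out_dim))

spatial_out = segment_graph_conv(appearance, w_spatial)
temporal_out = segment_graph_conv(motion, w_temporal)

# Pool over segments and concatenate both streams into one video-level vector.
video_feature = np.concatenate([spatial_out.mean(axis=0), temporal_out.mean(axis=0)])
print(video_feature.shape)
```

Because the graph here is built over a fixed number of segments rather than over detected objects, its node count does not depend on how many objects appear in the video, which is the motivation the abstract gives for working at the segment level.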