Video Action Recognition by Combining Spatial-Temporal Cues with Graph Convolutional Networks | |
Li, Tao1; Xiong, Wenjun2; Zhang, Zheng2; Pei, Lishen3 | |
2023-08-30 | |
Journal | INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE
ISSN | 0218-0014 |
Abstract | Video action recognition depends heavily on how spatio-temporal cues are combined to improve recognition accuracy. One way to address this is to explicitly model interactions among objects within or between videos, for example with graph neural networks, which have been shown to accurately model and represent complicated spatial-temporal object relations for video action classification. However, the visual objects in a video are diverse, whereas the nodes in the graphs are fixed. This may result in information overload or loss if the visual objects are too redundant or insufficient for graph construction. Segment-level graph convolutional networks (SLGCNs) are proposed as a method for recognizing actions in videos. The SLGCN consists of a segment-level spatial graph and a segment-level temporal graph, which together process spatial and temporal information simultaneously. Specifically, the segment-level spatial graph and the segment-level temporal graph are constructed using 2D and 3D CNNs to extract appearance and motion features from video segments. Graph convolutions are then applied to obtain informative segment-level spatial-temporal features. The method is evaluated on a variety of challenging video datasets, including EPIC-Kitchens, FCVID, HMDB51 and UCF101. Experiments demonstrate that the SLGCN achieves performance comparable to state-of-the-art models in terms of obtaining spatial-temporal features. |
Keywords | Video action recognition; graph convolutional networks; spatial-temporal graphs; feature combination |
DOI | 10.1142/S021800142350009X |
Indexed In | SCIE |
Language | English |
WOS Research Area | Computer Science |
WOS Category | Computer Science, Artificial Intelligence |
WOS Accession Number | WOS:001170344400001 |
Publisher | WORLD SCIENTIFIC PUBL CO PTE LTD |
Original Document Type | Article ; Early Access |
EISSN | 1793-6381 |
Document Type | Journal article |
Item Identifier | http://ir.library.ouchn.edu.cn/handle/39V7QQFX/169780 |
Collection | The Open University of China |
Corresponding Author | Li, Tao |
Author Affiliations | 1.Open Univ Henan, Dept Informat Engn, Zhengzhou 450046, Peoples R China; 2.Open Univ Henan, Resource Construct & Management Ctr, Zhengzhou 450046, Peoples R China; 3.Henan Univ Econ & Law, Dept Informat Engn, Zhengzhou 450046, Peoples R China |
Recommended Citation GB/T 7714 | Li, Tao,Xiong, Wenjun,Zhang, Zheng,et al. Video Action Recognition by Combining Spatial-Temporal Cues with Graph Convolutional Networks[J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE,2023. |
APA | Li, Tao,Xiong, Wenjun,Zhang, Zheng,&Pei, Lishen.(2023).Video Action Recognition by Combining Spatial-Temporal Cues with Graph Convolutional Networks.INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE. |
MLA | Li, Tao,et al."Video Action Recognition by Combining Spatial-Temporal Cues with Graph Convolutional Networks".INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE (2023). |
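
The abstract describes applying graph convolutions over a segment-level spatial graph and a segment-level temporal graph. The record gives no implementation details, so the following is only a minimal sketch of that general idea, not the authors' method. Everything specific in it is an assumption made for illustration: the toy feature vectors stand in for 2D-CNN appearance features and 3D-CNN motion features, the spatial graph is built from cosine similarity, the temporal graph links each segment to its neighbours in time, and the two outputs are fused by concatenation and mean pooling.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def row_normalize(adj):
    """Scale each row of an adjacency matrix so that it sums to 1."""
    normalized = []
    for row in adj:
        total = sum(row)
        normalized.append([v / total for v in row])
    return normalized


def similarity_adjacency(feats):
    """Assumed spatial graph: edge weight = exp(cosine similarity) between segments."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v + 1e-8)

    return row_normalize([[math.exp(cosine(u, v)) for v in feats] for u in feats])


def temporal_adjacency(num_segments):
    """Assumed temporal graph: each segment links to itself and its adjacent segments."""
    return row_normalize(
        [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(num_segments)]
         for i in range(num_segments)]
    )


def graph_conv(adj, feats, weight):
    """One graph convolution layer: ReLU(adj @ feats @ weight)."""
    out = matmul(matmul(adj, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Toy stand-ins for per-segment features (4 segments, 3 dimensions each).
# In a real pipeline these would come from 2D and 3D CNN backbones.
appearance = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.0, 1.0, 0.2], [0.1, 0.8, 0.3]]
motion = [[0.2, 0.7, 0.1], [0.3, 0.6, 0.2], [0.8, 0.1, 0.5], [0.7, 0.2, 0.4]]

# Hypothetical layer weights (3 input dims -> 2 output dims); normally learned.
weight = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]

spatial_adj = similarity_adjacency(appearance)
temporal_adj = temporal_adjacency(len(motion))

spatial_out = graph_conv(spatial_adj, appearance, weight)
temporal_out = graph_conv(temporal_adj, motion, weight)

# Assumed fusion: concatenate per-segment outputs, then average over segments.
fused = [s + t for s, t in zip(spatial_out, temporal_out)]
video_descriptor = [sum(col) / len(fused) for col in zip(*fused)]
```

In this sketch `video_descriptor` is a single vector per video that a classifier could consume. Row-normalizing each adjacency matrix keeps the aggregated features on the same scale as the inputs regardless of how many neighbours a segment has.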