Mature 3D pose reconstruction solutions already exist in industry in the form of contact (marker-based) motion capture systems, such as the well-known optical motion capture system Vicon (Figure 1). First, special optical markers are attached to key parts of the human body, such as the joints, and multiple dedicated motion capture cameras detect the markers in real time from different angles. The spatial coordinates of the markers are then computed accurately by triangulation, and the joint angles of the human skeleton are computed with an inverse kinematics (IK) algorithm. Contact motion capture is difficult for ordinary consumers to use because of its restrictions on scenes and equipment and its high price. Researchers have therefore turned their attention to low-cost, non-contact, markerless motion reconstruction. This paper mainly introduces recent work on pose reconstruction using a monocular RGB-D camera or an RGB camera.
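The triangulation step can be illustrated with a minimal sketch. It assumes two calibrated cameras whose centers and viewing rays toward one marker are already known, and it estimates the marker as the midpoint of the shortest segment between the two rays. The function name triangulate_midpoint and the sample coordinates are illustrative assumptions and are not taken from Vicon or any other specific system.

```python
def sub(a, b):
    """Component-wise difference of two 3D vectors."""
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a 3D point from two rays c1 + t1*d1 and c2 + t2*d2.

    Returns the midpoint of the shortest segment joining the rays
    (hypothetical helper, midpoint triangulation).
    """
    w = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    t1 = (b * e - c * d) / denom   # parameter of the closest point on ray 1
    t2 = (a * e - b * d) / denom   # parameter of the closest point on ray 2
    p1 = [ci + t1 * di for ci, di in zip(c1, d1)]
    p2 = [ci + t2 * di for ci, di in zip(c2, d2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]

# Assumed setup: two cameras 1 m apart, both observing the same marker.
marker = triangulate_midpoint(
    [0.0, 0.0, 0.0], [0.5, 1.0, 2.0],    # camera 1 center and ray direction
    [1.0, 0.0, 0.0], [-0.5, 1.0, 2.0],   # camera 2 center and ray direction
)
```

With noise-free rays the two closest points coincide at the marker; with real detections they differ slightly and the midpoint acts as a simple compromise. Production systems typically combine more than two views in a least-squares solve.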
Pose reconstruction based on a monocular RGB-D camera
Three-dimensional pose reconstruction methods based on RGB-D can be divided into two categories, such as joint angle. All of the above methods are trained with strong supervision. Because the training data are collected in controlled environments, the trained models usually generalize poorly to natural scenes.
To improve the generalization ability of the model, some works apply weak supervision to images of natural scenes, for example by using a domain discriminator or by using model fitting [76] to lift them to three-dimensional space.
Martinez et al. [62] designed a simple but effective fully connected network that takes two-dimensional joint positions as input and outputs three-dimensional joint positions, as shown in Figure 2.
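The idea of lifting 2D joints to 3D with a fully connected network can be sketched as a plain forward pass. This is a minimal illustration, not the architecture of [62]: the joint count, hidden width, number of layers, ReLU activation, and random weights are all assumptions, and a real model would use trained weights.

```python
import random

def linear(x, weights, bias):
    """Fully connected layer y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in x]

def init_layer(n_in, n_out, rng):
    """Small random weights; a trained model would load learned values."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def lift_2d_to_3d(joints_2d, layers):
    """Map J joints given as (x, y) to J joints given as (x, y, z)."""
    x = [c for joint in joints_2d for c in joint]      # flatten to 2J values
    for weights, bias in layers[:-1]:
        x = relu(linear(x, weights, bias))             # hidden layers
    weights, bias = layers[-1]
    out = linear(x, weights, bias)                     # 3J output values
    return [out[i:i + 3] for i in range(0, len(out), 3)]

NUM_JOINTS = 17   # assumed number of joints
HIDDEN = 64       # assumed hidden width
rng = random.Random(0)
layers = [init_layer(2 * NUM_JOINTS, HIDDEN, rng),
          init_layer(HIDDEN, HIDDEN, rng),
          init_layer(HIDDEN, 3 * NUM_JOINTS, rng)]

joints_2d = [[rng.random(), rng.random()] for _ in range(NUM_JOINTS)]
joints_3d = lift_2d_to_3d(joints_2d, layers)   # 17 joints, 3 coordinates each
```

Because the input is only a short vector of joint coordinates rather than an image, such a network is small and fast to evaluate, and the 2D detector that feeds it can be swapped independently.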
Subsequently, Zhao et al. [75] proposed capturing the topological correlations between human joints (such as body symmetry) with a semantic graph convolution module, which further improved the accuracy of 3D pose reconstruction. However, mapping a two-dimensional pose to a three-dimensional pose is an ambiguous problem, because multiple three-dimensional poses can project to the same two-dimensional pose [77]. Some recent work attempts to introduce additional prior knowledge to reduce this ambiguity [78-80].
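The ambiguity of 2D-to-3D lifting can be made concrete with a small worked example. The sketch assumes an ideal pinhole camera at the origin looking along the z axis; the focal length, the three sample joints, and the per-joint scale factors are illustrative assumptions. Moving each joint along its own viewing ray changes the 3D pose but leaves the 2D projection unchanged.

```python
def project(point, focal=1.0):
    """Pinhole projection of a camera-space point (x, y, z) with z > 0."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

# A toy three-joint "pose" in camera coordinates (assumed values).
pose_a = [(0.0, 0.0, 2.0), (0.5, 0.0, 2.0), (1.0, 0.25, 2.0)]

# Slide each joint along the ray through the camera center by a different factor.
scales = [1.0, 1.5, 2.0]
pose_b = [(s * x, s * y, s * z) for s, (x, y, z) in zip(scales, pose_a)]

proj_a = [project(p) for p in pose_a]
proj_b = [project(p) for p in pose_b]

# The two poses differ in 3D but their projections agree joint by joint.
same_2d = all(abs(ua - ub) < 1e-12 and abs(va - vb) < 1e-12
              for (ua, va), (ub, vb) in zip(proj_a, proj_b))
```

Note that pose_b also has different bone lengths from pose_a, which is why priors such as fixed bone lengths or temporal smoothness help select a plausible solution among the candidates.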
All of the above methods are discriminative models, and the predicted 3D joint positions may violate human anatomical constraints (for example, asymmetric limbs or implausible bone length ratios) or kinematic constraints (joint angles that exceed their limits). Mehta et al. [63] fitted a human skeleton template to the predicted two-dimensional and three-dimensional joint positions and proposed VNect, the first real-time three-dimensional pose reconstruction system based on an RGB camera, which obtained more accurate pose reconstruction results, as shown in Figure 3.
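The anatomical and kinematic constraints mentioned above can be checked with a short sketch. This is not the skeleton fitting procedure of VNect; it only shows how a predicted pose might be tested for left/right bone length symmetry and how a joint angle might be clamped to a limit. The joint names, the 5% tolerance, and the angle range are hypothetical values chosen for illustration.

```python
import math

def bone_length(joints, a, b):
    """Euclidean distance between two named joints."""
    return math.dist(joints[a], joints[b])

def symmetry_violations(joints, bone_pairs, tolerance=0.05):
    """Return the (left, right) bone pairs whose lengths differ by more
    than `tolerance`, measured relative to the longer bone."""
    bad = []
    for left, right in bone_pairs:
        l_len = bone_length(joints, *left)
        r_len = bone_length(joints, *right)
        if abs(l_len - r_len) > tolerance * max(l_len, r_len):
            bad.append((left, right))
    return bad

def clamp_angle(angle_deg, lower, upper):
    """Force a joint angle into its allowed range [lower, upper]."""
    return max(lower, min(upper, angle_deg))

# Assumed predicted pose in meters: the right upper arm is too long.
predicted = {
    "l_shoulder": (0.2, 1.4, 0.0), "l_elbow": (0.2, 1.1, 0.0),
    "r_shoulder": (-0.2, 1.4, 0.0), "r_elbow": (-0.2, 1.0, 0.0),
}
pairs = [(("l_shoulder", "l_elbow"), ("r_shoulder", "r_elbow"))]

violations = symmetry_violations(predicted, pairs)   # flags the upper arms
elbow_angle = clamp_angle(170.0, 0.0, 150.0)         # assumed limit of 150 degrees
```

Fitting a skeleton template with fixed bone lengths, as described in the text, enforces such constraints by construction instead of checking them after the fact.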
References (continued)
[47] Plagemann C, Ganapathi V, Koller D, et al. Real-time identification and localization of body parts from depth images [C]// 2010 IEEE International Conference on Robotics and Automation. IEEE, 2010: 3108-3113.
[48] Shotton J, Fitzgibbon A, Cook M, et al. Real-time human pose recognition in parts from single depth images [C]// CVPR 2011. 2011: 1297-1304.
[49] Taylor J, Shotton J, Sharp T, et al. The Vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation [C]// 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012: 103-110.
[50] Ganapathi V, Plagemann C, Koller D, et al. Real time motion capture using a single time-of-flight camera [C]// 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010: 755-762.
[51] Ganapathi V, Plagemann C, Koller D, et al. Real-time human pose tracking from range data [C]// European Conference on Computer Vision. Springer, 2012: 738-751.
[52] Ionescu C, Papava D, Olaru V, et al. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 36(7): 1325-1339.
[53] Sigal L, Balan A O, Black M J. HumanEva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion [J]. International Journal of Computer Vision, 2010, 87(1-2): 4.
[54] Li S, Chan A B. 3D human pose estimation from monocular images with deep convolutional neural network [C]// Asian Conference on Computer Vision. Springer, 2014: 332-347.
[55] Popa A I, Zanfir M, Sminchisescu C. Deep multitask architecture for integrated 2D and 3D human sensing [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 6289-6298.
[56] Pavlakos G, Zhou X, Derpanis K G, et al. Coarse-to-fine volumetric prediction for single-image 3D human pose [C]// IEEE Conference on Computer Vision and Pattern Recognition. 2017: 7025-7034.
[57] Fang H S, Xu Y, Wang W, et al. Learning pose grammar to encode human body configuration for 3D pose estimation [C]// Proceedings of the AAAI Conference on Artificial Intelligence: Vol. 32. 2018.
[58] Sun X, Xiao B, Wei F, et al. Integral human pose regression [C]// Proceedings of the European Conference on Computer Vision (ECCV). 2018: 529-545.
[59] Lee K, Lee I, Lee S. Propagating LSTM: 3D pose estimation based on joint interdependency [C]// Proceedings of the European Conference on Computer Vision (ECCV). 2018: 119-135.
[60] Habibie I, Xu W, Mehta D, et al. In the wild human pose estimation using explicit 2D features and intermediate 3D representations [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 10905-10914.
[61] Fabbri M, Lanzi F, Calderara S, et al. Compressed volumetric heatmaps for multi-person 3D pose estimation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 7204-7213.
[62] Martinez J, Hossain R, Romero J, et al. A simple yet effective baseline for 3D human pose estimation [C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 2640-2649.
[63] Mehta D, Sridhar S, Sotnychenko O, et al. VNect: Real-time 3D human pose estimation with a single RGB camera [J]. ACM Transactions on Graphics (TOG), 2017, 36(4): 44.
[64] Luo C, Chu X, Yuille A. OriNet: A fully convolutional network for 3D human pose estimation [J]. arXiv preprint arXiv:1811.04989, 2018.
[65] Joo H, Simon T, Sheikh Y. Total Capture: A 3D deformation model for tracking faces, hands, and bodies [C]// IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8320-8329.
[66] Habermann M, Xu W, Zollhoefer M, et al. DeepCap: Monocular human performance capture using weak supervision [J]. arXiv: Computer Vision and Pattern Recognition, 2020.
[67] Sun X, Shang J, Liang S, et al. Compositional human pose regression [C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 2602-2611.
[68] Sun X, Li C, Lin S. Explicit spatiotemporal joint relation learning for tracking human pose [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. 2019.
[69] Yang W, Ouyang W, Wang X, et al. 3D human pose estimation in the wild by adversarial learning [C]// IEEE Conference on Computer Vision and Pattern Recognition. 2018: 5255-5264.
[70] Zhou X, Huang Q, Sun X, et al. Towards 3D human pose estimation in the wild: A weakly-supervised approach [C]// IEEE International Conference on Computer Vision. 2017: 398-407.
[71] Wei S E, Ramakrishna V, Kanade T, et al. Convolutional pose machines [C]// IEEE Conference on Computer Vision and Pattern Recognition. 2016: 4724-4732.
[72] Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation [C]// European Conference on Computer Vision. 2016: 483-499.
[73] Chen Y, Wang Z, Peng Y, et al. Cascaded pyramid network for multi-person pose estimation [C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
[74] Xiao B, Wu H, Wei Y. Simple baselines for human pose estimation and tracking [C]// European Conference on Computer Vision (ECCV). 2018.
[75] Zhao L, Peng X, Tian Y, et al. Semantic graph convolutional networks for 3D human pose regression [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 3425-3435.
[76] Chen C H, Ramanan D. 3D human pose estimation = 2D pose estimation + matching [C]// IEEE Conference on Computer Vision and Pattern Recognition. 2017: 7035-7043.
[77] Hossain M R I, Little J J. Exploiting temporal information for 3D human pose estimation [C]// Proceedings of the European Conference on Computer Vision (ECCV). 2018: 68-84.
[78] Tekin B, Márquez-Neila P, Salzmann M, et al. Learning to fuse 2D and 3D image cues for monocular body pose estimation [C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 3941-3950.
[79] Wang J, Huang S, Wang X, et al. Not all parts are created equal: 3D pose estimation by modeling bi-directional dependencies of body parts [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 7771-7780.
[80] Pavlakos G, Zhou X, Daniilidis K. Ordinal depth supervision for 3D human pose estimation [C]// IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7307-7316.