Details of photo modeling
I. Technical background

According to our long-term tracking of related technical fields at home and abroad, some international institutions, such as Microsoft, Autodesk, Stanford University, and the Massachusetts Institute of Technology, have achieved good research results in rapid image-based reconstruction of three-dimensional shapes, but these remain laboratory results and cannot yet be commercialized. Microsoft once provided an image-based 3D reconstruction service on the Internet, but because of the heavy user load and immature technology it could not sustain the service and quickly shut down the corresponding servers. Some image-based 3D reconstruction systems are sold internationally, for example by the Canadian company FOTO3D; however, they require a great deal of human interaction and place high demands on the shooting environment and shooting accuracy, so market recognition has been low.

In China, apart from institutions such as Peking University, Tsinghua University, the Institute of Automation of the Chinese Academy of Sciences, Beihang University, Hong Kong Polytechnic University, and Beijing University of Science and Technology Everbright Company, this technology remains almost entirely at the academic stage, and substantive in-depth studies are few. As far as marketization and productization are concerned, only Autodesk 123D Catch and the Beike Everbright 3DCloud platform exist. These research institutions also have different emphases: Beihang University focuses mainly on virtual scenes for military applications, while Hong Kong Polytechnic University focuses mainly on three-dimensional synthesis of human faces.

II. Technical characteristics

Digitizing an object in three dimensions yields its three-dimensional model. At present, there are three general methods for building a 3D model:

1) 3D software modeling:

Many excellent modeling packages are on the market, of which 3DMAX and Maya are the best known. What they have in common is that complex geometric scenes are built from basic geometric primitives, such as cubes and spheres, through a series of geometric operations: translation, rotation, stretching, and Boolean operations. This method requires the operator to have rich professional knowledge and fluency with the software; the operation is complicated, the production cycle is long, and the resulting 3D model is not especially realistic. It is generally used in games, animation, and architectural design, and belongs to the category of design.

2) Modeling with instruments and equipment:

A 3D scanner is also called a 3D digitizer. Current 3D scanners mainly use laser, structured light, and similar technologies, recovering 3D coordinate information from the feedback of emitted light, while texture and color are generally captured by a camera on the device. This method requires expensive hardware such as the scanner itself, and because of limitations of the technology, some materials and colors are disturbed by reflection or refraction of light, leaving many holes in the 3D model and preventing a complete scan; human hair, dark clothing, and transparent objects are typical examples. In addition, current 3D scanners capture only positional information, and most of the surface texture must be added with considerable manual effort, so the whole process is costly and time-consuming. Because of its high accuracy, this method is generally used in industrial production, cultural-relic restoration, and similar fields, and belongs to the technical field of three-dimensional reconstruction.

3) Photo modeling (modeling based on image/video):

Image-Based Modeling and Rendering (IBMR) is a very active research field in computer graphics. Compared with traditional geometry-based modeling and rendering, IBMR has many unique advantages: it offers the most natural way to obtain realistic pictures, makes modeling faster and more convenient, and achieves both high rendering speed and high realism. Recent progress in IBMR has produced many fruitful results, which may fundamentally change our understanding of computer graphics. Because images themselves contain rich scene information, it is comparatively easy to obtain a photo-realistic scene model from them. The main purpose of image-based modeling is to recover the three-dimensional geometric structure of a scene from two-dimensional images, a problem that spans computer graphics and computer vision; because of its broad application prospects, researchers in both fields are keenly interested in it. Compared with building a 3D model with modeling software or a 3D scanner, image-based modeling offers low cost, a strong sense of reality, a high degree of automation, simple operation, realistic texture and color, and freedom from constraints of time and space. For example, the domestic 3DCloud platform runs in the cloud: a 3D model is generated automatically from uploaded photos. The technique is used in many fields, such as 3D display, 3D printing, film and television media, advertising production, and virtual reality, and thanks to its low cost it has good prospects for future development.

III. Technical principle

Computing three-dimensional features from multiple two-dimensional images and reconstructing the scene is an important research topic in computer vision and graphics, and much related work exists. Accurate camera calibration is crucial for image-based 3D reconstruction. For applications that demand high reconstruction accuracy and in which the shooting environment can be arranged as required, off-line calibration generally meets users' needs better. Conversely, if the scene must be analyzed and reconstructed from images or video sequences for which the environment cannot be controlled and calibration information is missing, only on-line calibration can be used.

In view of the importance of camera calibration in 3D reconstruction, we divide the related technologies into two categories, 3D reconstruction based on off-line camera calibration and 3D reconstruction based on on-line camera calibration, and review their history, current state, and development trends in turn.

① 3D reconstruction technology based on off-line camera calibration.

Off-line calibration requires accurate camera intrinsic and extrinsic parameters as the input and premise of the reconstruction algorithm. The most popular off-line calibration algorithm is the one proposed by Tsai in 1987 [Tsai 1987]. Tsai's method uses a three-dimensional calibration object carrying special non-coplanar calibration marks to provide correspondences between image points and their three-dimensional space points, from which the calibration parameters are computed. Zhang proposed another practical method in 1999, which requires at least two different views of a planar calibration pattern. The camera calibration toolbox from the California Institute of Technology [Bouguet2007] implements both methods effectively and has been integrated into Intel's computer vision library OpenCV [OpenCV2004]. The calibration algorithm yields the camera projection matrix and provides three-dimensional measurement information about the scene; even without the absolute translation, rotation, and scale of the real scene, measurement and reconstruction up to a similarity transformation can be achieved.
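Once calibration has been performed, the intrinsic matrix K and the extrinsic parameters [R | t] combine into the 3×4 projection matrix that maps scene points to pixels. A minimal numpy sketch of this mapping (the focal length, principal point, and pose below are invented illustrative values, not the output of any real calibration):

```python
import numpy as np

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Hypothetical extrinsics: identity rotation, camera 5 units from the origin.
R = np.eye(3)
t = np.array([[0.0], [0.0], [5.0]])

# 3x4 projection matrix P = K [R | t].
P = K @ np.hstack([R, t])

def project(X):
    """Project a 3D world point to pixel coordinates."""
    x = P @ np.append(X, 1.0)    # homogeneous image point
    return x[:2] / x[2]          # perspective division

u, v = project(np.array([0.0, 0.0, 0.0]))   # the world origin
```

Projecting the world origin lands exactly on the principal point (320, 240), as expected for a camera looking straight down the Z axis.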

② Image-based reconstruction

In image-based reconstruction, both sparse and dense feature matching can be considered, and the choice generally depends on the application and scene characteristics. Feature detection is a key step in an accurate reconstruction framework. In the traditional sense, features are defined as image regions or positions where brightness or chromaticity changes sharply in at least one direction [Moravec 1977]. Harris et al. estimate the local autocorrelation with first derivatives [Harris 1988]; this gives robust detections but, in some cases, limited localization accuracy. Beaudet used the product of gradient and curvature to detect corner points [Beaudet 1978], and the SUSAN detector of Smith et al. uses the size, centroid, and moment information of a feature region to detect corners. The scale-invariant feature detection operator SIFT proposed by Lowe is currently a popular algorithm [SIFT2004]; its advantage is that it extracts features invariant, to a certain extent, to rotation and scaling, greatly reducing the dependence of feature detection on the environment and on image quality. Koser et al. further extended SIFT's idea to view-invariant features [Koser2007].
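To make the corner-detection idea concrete, here is a bare-bones Harris-style response in numpy: first derivatives build the local structure tensor over a 3×3 window, and det − k·trace² scores each pixel. This is only a sketch of the idea in [Harris 1988]; it omits the Gaussian weighting and non-maximum suppression a real detector would use:

```python
import numpy as np

def harris_response(img, k=0.04):
    """Corner response from first derivatives (no smoothing, no suppression)."""
    Iy, Ix = np.gradient(img.astype(float))
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    h, w = img.shape
    R = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Structure tensor summed over a 3x3 neighbourhood.
            Sxx = Ixx[y - 1:y + 2, x - 1:x + 2].sum()
            Syy = Iyy[y - 1:y + 2, x - 1:x + 2].sum()
            Sxy = Ixy[y - 1:y + 2, x - 1:x + 2].sum()
            det = Sxx * Syy - Sxy * Sxy
            trace = Sxx + Syy
            R[y, x] = det - k * trace * trace
    return R

# A bright square on a dark background: its corners score highest.
img = np.zeros((10, 10))
img[3:7, 3:7] = 1.0
R = harris_response(img)
```

Flat regions score zero and straight edges score low, because one eigenvalue of the structure tensor vanishes there; only true corners make both eigenvalues, and hence the determinant, large.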

Feature matching between multiple views is generally needed after detection. The performance of matching algorithms is affected by lens distortion, changing lighting, scene occlusion, and other unknown image noise. At present there are two main approaches. The first detects features in key frames and tracks the feature set through subsequent frames; representative algorithms include optical-flow-based trackers such as the Lucas-Kanade algorithm [Tomasi 1991]. The second detects features independently in multiple views and establishes matched pairs by data association, either with a simple regional correlation algorithm [Zhang 1995] or by defining an objective function describing similarity and optimizing it by various means [Li 1994].
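The second, detect-then-associate idea can be sketched as a nearest-neighbour search over descriptors with Lowe's ratio test, which rejects ambiguous matches. The descriptors below are tiny made-up 2-vectors standing in for real 128-dimensional SIFT descriptors:

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.8):
    """Accept a match only when the best distance is clearly
    smaller than the second best (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Toy descriptors: a[0] has one clear partner; a[1] is ambiguous.
desc_a = np.array([[1.0, 0.0], [0.5, 0.5]])
desc_b = np.array([[0.0, 1.0], [1.0, 0.1], [0.6, 0.4], [0.4, 0.6]])
matches = match_features(desc_a, desc_b)
```

Here only the unambiguous pair survives: a[1] lies exactly between two candidates in desc_b, so the ratio test drops it, which is how spurious correspondences from repeated texture are filtered out.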

Dense multi-view matching is needed when dense scene reconstruction is required, and its performance directly affects the final reconstruction quality. When the image sampling points are dense enough, optical flow can model the pixel or feature displacement between adjacent images, and the point-to-point correspondences given by optical flow can in turn drive triangulation of the three-dimensional structure. Under the assumption of dense spatial sampling, optical flow can be effectively approximated by sparse feature displacements [Zucchelli2002].
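The core Lucas-Kanade step can be shown in one dimension: under brightness constancy the temporal difference satisfies I_t ≈ −u·I_x, so the displacement u follows from a least-squares fit over a window. A minimal sketch on a synthetic signal shifted by one pixel (all values invented for illustration):

```python
import numpy as np

def lucas_kanade_1d(frame0, frame1, x, half=2):
    """One-window, 1-D Lucas-Kanade: least-squares estimate of the
    displacement u from I_t ~= -u * I_x inside a small window."""
    Ix = np.gradient(frame0)
    It = frame1 - frame0
    w = slice(x - half, x + half + 1)
    return -np.sum(Ix[w] * It[w]) / np.sum(Ix[w] * Ix[w])

# A smooth blob that moves one pixel to the right between frames.
xs = np.arange(20, dtype=float)
frame0 = np.exp(-0.5 * ((xs - 9.0) / 2.0) ** 2)
frame1 = np.exp(-0.5 * ((xs - 10.0) / 2.0) ** 2)
u = lucas_kanade_1d(frame0, frame1, x=9)
```

Because the blob is smooth and the shift small, the linearized estimate comes out very close to the true displacement of 1.0; larger motions require the coarse-to-fine pyramids used by practical trackers.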

Through image rectification, the corresponding epipolar lines in two views can be adjusted to be horizontal and to lie on the same scan line, so that depth information can be recovered with traditional binocular algorithms based on horizontal disparity. Under this framework, the problem can be modeled with Markov random fields and solved with graph-based optimization algorithms [Scharstein2002].
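On a rectified pair the correspondence search becomes one-dimensional: slide a small patch along the matching scanline, keep the disparity with the lowest cost, and convert disparity to depth with Z = f·B/d. A toy sum-of-squared-differences version (the focal length f and baseline B are arbitrary illustrative numbers):

```python
import numpy as np

def disparity_ssd(left_row, right_row, x, max_disp, half=1):
    """Best horizontal disparity for pixel x on a rectified scanline,
    found by exhaustive SSD patch comparison."""
    patch = left_row[x - half:x + half + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d - half < 0:
            break                      # patch would leave the image
        cand = right_row[x - d - half:x - d + half + 1]
        cost = float(np.sum((patch - cand) ** 2))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Rectified toy pair: the right scanline is the left one shifted by 3 px.
left = np.array([0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0, 0], dtype=float)
right = np.roll(left, -3)
d = disparity_ssd(left, right, x=4, max_disp=5)   # -> 3
Z = 800.0 * 0.1 / d    # depth from disparity, with f = 800 px, B = 0.1 m
```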

Voxel-based reconstruction

In recent years, with the rapid growth of computing speed and storage capacity, volumetric scene representations have become practical, and there are many ways to recover scene volume data from image sequences. A common approach is to recover the visual hull of the foreground object from multiple views as an approximation of the object; in general, the visual hull shrinks monotonically as the number of images participating in the computation grows. Typically the foreground region is separated from the background in each image, and the foreground regions are back-projected into three-dimensional space and intersected to obtain the visual hull [Szeliski93]. Snow proposed a voxel occupancy algorithm that performs 3D segmentation with a graph-cut over voxel labels [Snow2000]. For images with clearly distinguishable colors, color compatibility can also be exploited: only voxels whose projections have compatible colors are retained [Seitz 1999]. To simplify visibility-based space carving, Seitz et al. introduced an ordered-visibility constraint on camera positions [Seitz 1999]. As further improvements of this framework, Prock proposed a multi-resolution voxel coloring scheme [Prock 1998], and Culbertson et al. proposed a generalized color-compatibility model that computes visibility exactly [Culbertson 1999].
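The back-project-and-intersect construction can be demonstrated with two binary silhouettes of a cube. To keep the sketch short the cameras are orthographic, looking down the Z and X axes, instead of the calibrated perspective views a real system would use:

```python
import numpy as np

# Foreground silhouettes of the same cube from two orthographic views.
sil_z = np.zeros((8, 8), dtype=bool)   # camera looking along Z
sil_z[2:6, 2:6] = True
sil_x = np.zeros((8, 8), dtype=bool)   # camera looking along X
sil_x[2:6, 2:6] = True

# A voxel survives only if it projects inside every silhouette,
# i.e. the visual hull is the intersection of the visual cones.
hull = np.zeros((8, 8, 8), dtype=bool)
for x in range(8):
    for y in range(8):
        for z in range(8):
            hull[x, y, z] = sil_z[y, x] and sil_x[y, z]

occupied = int(hull.sum())   # 4 x 4 x 4 = 64 voxels
```

With the first silhouette alone the hull would contain 4 × 4 × 8 = 128 voxels; the second view cuts it to 64, illustrating the monotone shrinkage with each added image noted above.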

Compared with image-based reconstruction, voxel-based reconstruction handles occlusion more effectively and requires no explicit feature matching, but its potential disadvantage is that the huge memory consumption limits reconstruction accuracy to some extent, and in some cases the ordered-visibility constraint is too strong.

Object-based reconstruction

Unlike voxel-based algorithms, which discretize the scene into voxels, object-based reconstruction focuses on directly recovering the surface model of the objects in the scene. Faugeras et al. proposed level-set reconstruction, the first object-oriented multi-view 3D recovery technique [Faugeras 1998], extending the variational formulation of depth recovery to a curve-evolution problem solvable with level sets [Roberts 1996]. The original framework had to assume a diffusely reflecting surface; subsequent work by Lin et al. relaxed this requirement, making specular and transparent environments tractable [Lin 2002].

③ 3D reconstruction technology based on on-line camera calibration.

In many cases, such as when calibration equipment is lacking or the camera parameters keep changing, there is not enough data to support off-line calibration, and on-line calibration must be used to reconstruct such scenes from multiple views. The main difference between the on-line and off-line frameworks lies in how the camera is calibrated or its parameters are estimated. In most of the literature, on-line calibration is called self-calibration. Self-calibration methods fall roughly into two categories: those based on scene constraints and those based on geometric constraints.

Self-calibration based on scene constraints

Appropriate scene constraints can often greatly simplify self-calibration. For example, the parallel lines ubiquitous in buildings and man-made scenes provide vanishing-point and vanishing-line information in three mutually orthogonal directions, from which algebraic or numerical solutions for the camera intrinsics can be derived [Caprile 1990]. Vanishing points can be found by voting and searching for maxima: Barnard constructed the solution space on the Gaussian sphere [Barnard 1983], and Quan, Lutton, Rother, and others gave further optimization strategies [Quan 1989, Lutton 1994, Rother 2000]. The work [Quan 1989] gives a direct algorithm for searching the solution space, and Heuvel's improved algorithm adds an enforced orthogonality condition [Heuvel 1998]. Caprile gave a geometric parameter-estimation method based on three orthogonal vanishing points, and Hartley computed the focal length using calibration curves [Hartley2003]. Liebowitz et al. further built constraints on the absolute conic from the vanishing-point positions and solved for the calibration matrix by Cholesky decomposition [Liebowitz 1999].
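The vanishing-point route to the intrinsics can be sketched directly: image lines in homogeneous coordinates intersect at their cross product, and for the vanishing points of two orthogonal scene directions, assuming zero skew, unit aspect ratio, and the principal point at the image origin, v1·v2 = −f². The coordinates below are synthetic, generated so that the true focal length is 500 px:

```python
import numpy as np

def vanishing_point(l1, l2):
    """Intersection of two homogeneous image lines (a, b, c)."""
    v = np.cross(l1, l2)
    return v[:2] / v[2]

# Projections of one pencil of parallel scene lines meet at (500, 0);
# each image line is the cross product of two points lying on it.
l1 = np.cross([500.0, 0.0, 1.0], [0.0, 100.0, 1.0])
l2 = np.cross([500.0, 0.0, 1.0], [0.0, -100.0, 1.0])
v1 = vanishing_point(l1, l2)

# A second pencil, for the orthogonal direction, meets at (-500, 0).
l3 = np.cross([-500.0, 0.0, 1.0], [0.0, 50.0, 1.0])
l4 = np.cross([-500.0, 0.0, 1.0], [0.0, -50.0, 1.0])
v2 = vanishing_point(l3, l4)

# Orthogonal directions with a centred principal point give v1 . v2 = -f^2.
f = np.sqrt(-np.dot(v1, v2))   # recovers f = 500
```

With a non-centred principal point p the same relation reads (v1 − p)·(v2 − p) = −f², which is how a third orthogonal vanishing point lets both p and f be solved.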

Self-calibration based on geometric constraints

Self-calibration based on geometric constraints requires no external scene constraints; it relies only on the internal geometric constraints among multiple views. The theory and algorithms for self-calibration using the absolute quadric were first put forward by Triggs [Triggs 1997]. Solving for camera parameters from the Kruppa equations began with the work of Faugeras and Maybank [Faugeras 1992, Maybank 1992]; Hartley derived another form of the Kruppa equations from the fundamental matrix [Hartley 1997], and the work [Sturm2000] discussed their uncertainty theoretically. Stratified self-calibration techniques upgrade a projective reconstruction to a metric one [Faugeras 1992]. One main difficulty of self-calibration is that it does not apply unconditionally to arbitrary image or video sequences: certain motion sequences or spatial feature distributions make the self-calibration framework degenerate or yield singular solutions. The work [Sturm 1997] gives a detailed discussion and classification of such degeneracies; for the existence of certain special solvable cases and their solutions, see [Wilesde 1996] and related work.

IV. Industry application

The three-dimensional models produced by photo modeling already meet the requirements of 3D printing, and their texture and color are realistic, so the technique is widely applied.

1. 3D printing, especially portrait 3D-printing studios: the advantages are obvious and the colors bright, and combined with a camera array it enables instant capture. Compared with existing 3D scanning equipment, it offers low cost, convenient operation, and a strong sense of reality.

2. 3D display: this generally requires small model files and realistic texture and color, so photo modeling is widely used in e-commerce, advertising media, 3D production, virtual reality, 3D fitting, and other fields.

3. Other applications: photo modeling can quickly build models of large scenes, and photos obtained by aerial photography can rapidly yield three-dimensional terrain, usable for 3D maps, military installations, mines, and stockpiles.