1. The industry needs high-precision, low-cost 3D space measurement and positioning
With the rapid growth of smart homes, Industry 4.0, computer-aided medicine and VR/AR, more and more scenarios require high-precision, low-cost 3D space measurement and positioning technology.
This technology has two main kinds of application. The first is high-precision measurement of the size, orientation, and attitude of an object, which matters most in industrial, medical, and commercial-grade applications that require relatively high accuracy. The second is cost-effective human-computer interaction that is convenient, fast and accurate, which is very important in industrial robot control and VR/AR.
Specifically, common application scenarios include:
1) In the industrial field, it is necessary to measure the three-dimensional dimensions of the components on the production line to determine whether their geometric dimensions and positional deviations are qualified;
Figure 1: Position Measurements in Aircraft Wind Tunnel Testing
2) In computer-assisted surgery, it is necessary to accurately measure and locate the three-dimensional spatial position of the scalpel to cooperate with computer-assisted imaging to help doctors complete various operations;
Figure 2: Measuring and positioning the scalpel in computer-assisted surgery
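3) In intelligent security, moving objects need to be detected stably and reliably, with as few false alarms as possible;
Figure 3: Motion detection by a security camera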
4) In the standard teaching system of industrial robots, a convenient and fast human-computer interaction method makes it possible to teach the robot directly, forming a more efficient supplement to the existing programming-based teaching methods;
Figure 4: Traditional industrial robot teaching system
5) In the VR field, both inside-out and outside-in tracking solutions require real-time positioning and tracking of the controller. No matter how fast the controller moves, and whether or not it is occluded, it must be positioned accurately and tracked stably, and tracking must never be lost.
Figure 5: HTC Spatial Positioning System
In all of the above fields, there is a very strong and essential demand for measuring the three-dimensional size, orientation, and attitude of an object.
At present, the representative three-dimensional space measurement and positioning scheme in the industrial field is the Optotrak system from the Canadian company NDI. The system requires luminous markers to be attached to the object to be measured and uses a vision scheme for spatial positioning. Its measurement and positioning accuracy can reach 0.1mm. However, because markers must be attached to the object, it is suitable for offline measurement but not for online measurement. In online measurement during industrial production, both accuracy and real-time performance are very important, and no markers can be attached to the measured object. A scheme that meets these requirements would have a much wider range of applications.
In the field of intelligent security, chip manufacturers currently provide only the most basic motion detection scheme based on computer vision. This scheme detects pixel-level brightness changes in the image and cannot recognize high-level image semantics, so it produces many false positives. For example, the sun being covered by clouds may trigger a false report. Detection schemes based on pyroelectric sensors also raise false alarms when external heat sources (such as cars) pass by. Therefore, an urgent need in intelligent security is a more stable and reliable solution for moving object detection.
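To make the false-positive problem concrete, here is a minimal sketch of brightness-based frame differencing. It is not the scheme of any particular chip vendor. The function names, the thresholds, and the tiny 8x8 grayscale frames are all assumptions made for illustration.

```python
# Hypothetical names and thresholds, chosen only for illustration.
# Frames are small grayscale grids: lists of rows with values 0-255.

def changed_fraction(prev, curr, pixel_threshold=25):
    """Fraction of pixels whose brightness changed by more than pixel_threshold."""
    total = 0
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > pixel_threshold:
                changed += 1
    return changed / total


def motion_detected(prev, curr, pixel_threshold=25, area_threshold=0.02):
    """Naive detector: report motion when enough pixels changed brightness."""
    return changed_fraction(prev, curr, pixel_threshold) > area_threshold


# A static scene at uniform brightness 180.
sunny = [[180] * 8 for _ in range(8)]

# The same static scene after a cloud covers the sun: every pixel is darker.
cloudy = [[140] * 8 for _ in range(8)]

# The same scene with a small dark object entering one corner (4 of 64 pixels).
intruder = [row[:] for row in sunny]
for r in range(2):
    for c in range(2):
        intruder[r][c] = 60

print("object enters: ", motion_detected(sunny, intruder))
print("cloud passes:  ", motion_detected(sunny, cloudy))
print("nothing changes:", motion_detected(sunny, sunny))
```

The detector fires both when the object enters and when the cloud passes, because it only sees brightness differences and has no notion of what caused them.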
In the field of VR, HTC, Oculus and Sony currently provide outside-in controllers and tracking solutions based on laser, monocular vision and binocular vision, respectively. Microsoft's Holographic project also provides an inside-out controller and tracking solution. Although current solutions achieve good positioning accuracy, their cost remains high. How to provide a lower-cost positioning solution without reducing the existing positioning accuracy, or even while improving it, is what the giants of the VR industry are working on.
The application scenarios above show that a solution offering suitable accuracy, cost and convenience together with high robustness for its scenario will be favored by the market. Note that "suitable" is the key word here, because it is meaningless to talk about performance and convenience without considering the cost constraints of the application scenario. Therefore, a spatial measurement and positioning solution must be flexible and must be adjusted to the needs of customers and application scenarios, so that it is as competitive as possible in its vertical field.
2. The advantages and disadvantages of common 3D space measurement and positioning schemes
Common three-dimensional space measurement and positioning solutions fall roughly into two categories, laser and vision, which can be further subdivided into ToF, structured light, binocular, and monocular measurement. The first three have been covered in many analysis articles, but monocular measurement has received little attention. This section focuses on the principle and performance of monocular measurement.
1. ToF measurement
ToF measurement is essentially a laser measurement. It obtains three-dimensional information by measuring the time of flight of the laser: the time from emission, through reflection off the object, to arrival at the receiver. Typical representatives are the Kinect 2 and the rear depth sensor expected in the iPhone 8, which is due in the second half of this year. Because light travels so fast, the flight time to be measured is extremely short. The sensor pixels therefore have to be very large, which results in a low sensor resolution. As a result, the measurement accuracy is not high and only reaches the centimeter level.
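The following minimal sketch shows how short these flight times are. It assumes the simplest direct (pulsed) time-of-flight model, in which distance is half the round-trip time multiplied by the speed of light. Many real sensors measure the phase shift of a modulated signal instead, and the function names here are hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s


def tof_distance(round_trip_seconds):
    """Distance to the target from the round-trip time of a light pulse."""
    return C * round_trip_seconds / 2.0


def timing_resolution_for(depth_resolution_m):
    """Round-trip timing step that corresponds to a given depth step."""
    return 2.0 * depth_resolution_m / C


# Round-trip time for a target 1 m away.
round_trip_1m = 2.0 * 1.0 / C

print(f"round trip for 1 m:   {round_trip_1m * 1e9:.2f} ns")
print(f"timing step for 1 cm: {timing_resolution_for(0.01) * 1e12:.1f} ps")
print(f"timing step for 1 mm: {timing_resolution_for(0.001) * 1e12:.2f} ps")
```

Under this model, a target at 1m returns the pulse after roughly 6.7 nanoseconds, and resolving 1cm of depth means resolving roughly 67 picoseconds of round-trip time.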
2. Structured light and binocular measurement
The principles of structured light and binocular measurement are similar. Both use triangulation, which essentially compares two patterns and calculates depth from the disparity between them. The difference is that structured light compares the captured projection with a preset pattern, while binocular measurement compares the images captured by the left and right cameras. The accuracy of both is generally at the centimeter level. Their major problem is the relatively large amount of computation, which is a challenge for consumer-grade and mobile devices. Another problem with binocular measurement is that its accuracy depends strongly on the baseline distance between the two cameras: the shorter the baseline, the worse the accuracy, and the farther the target object, the worse the accuracy. Even among the better binocular measurement devices in the world, with a baseline of 25mm the depth error is 0.45cm at a distance of 1m, 4.05cm at 3m, and 7.2cm at 4m. The representative structured-light product is the Kinect 1. Many companies at home and abroad work on binocular vision, and a representative one is Israel's Inuitive. Relatively speaking, binocular vision has the lowest entry threshold and the simplest algorithm, but its accuracy and cost are not satisfactory.
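The error figures quoted above grow with the square of the distance, which is what the standard pinhole stereo model predicts: depth is Z = f·B/d, so a disparity error δd produces a depth error of about Z²/(f·B)·δd. The sketch below illustrates this. The disparity error of 0.25 pixels is an assumption, and the focal length is simply back-calculated so that the 1m error matches the quoted 0.45cm. Neither value describes a real device.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole stereo depth: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth error: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


BASELINE_M = 0.025          # 25 mm baseline, as quoted in the text
DISPARITY_ERROR_PX = 0.25   # assumed matching error, for illustration only

# Assumed focal length, chosen so that the error at 1 m equals 0.45 cm.
FOCAL_PX = 1.0 ** 2 * DISPARITY_ERROR_PX / (BASELINE_M * 0.0045)

for z in (1.0, 3.0, 4.0):
    err_cm = depth_error(z, FOCAL_PX, BASELINE_M, DISPARITY_ERROR_PX) * 100
    print(f"{z:.0f} m -> {err_cm:.2f} cm")
```

With the 1m value fixed, the quadratic model gives 9 times that error at 3m and 16 times at 4m, which matches the 4.05cm and 7.2cm quoted in the text. The same formula shows why a short baseline hurts: halving B doubles the error at every distance.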
3. Monocular measurement
Compared with the above methods, monocular measurement is the most difficult. It has no high-precision measuring component such as a laser, and it collects less information than binocular or structured-light systems, so three-dimensional positioning has to be achieved by other means. Generally speaking, there are two ways to achieve monocular positioning. The first is multi-frame positioning combined with an IMU sensor: the camera continuously captures frames while it moves, and the IMU data is combined with the multi-frame images to calculate the motion parameters of the camera itself and estimate the position of the object. Common monocular SLAM algorithms work this way. The second is single-frame measurement, which is based on the PnP (Perspective-n-Point) principle and does not require an IMU as an auxiliary sensor. It requires the geometric model of the object to be known in advance, and the more accurate the geometric model, the higher the positioning accuracy. During measurement, at least four non-coplanar feature points are first extracted from the measured object. Then, using the geometric model constraints between these points, the spatial position, attitude and geometric size of the object can be solved uniquely.
Figure 6: Single-frame-based monocular space measurement and localization
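The role of the known geometric model can be illustrated with a deliberately simplified case. This is not a PnP solver. It assumes an ideal pinhole camera and a single segment of known length lying parallel to the image plane, and it recovers only depth and position, not attitude. The intrinsics, the marker length, and all function names are hypothetical.

```python
import math


def project(point, focal_px, cx, cy):
    """Ideal pinhole projection of a 3D point (camera frame) to pixel coordinates."""
    x, y, z = point
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def depth_from_known_length(pixel_a, pixel_b, real_length_m, focal_px):
    """Depth of a fronto-parallel segment of known length from its image length."""
    image_length_px = math.dist(pixel_a, pixel_b)
    return focal_px * real_length_m / image_length_px


def back_project(pixel, depth_m, focal_px, cx, cy):
    """3D point in the camera frame for a pixel at a known depth."""
    u, v = pixel
    return ((u - cx) * depth_m / focal_px, (v - cy) * depth_m / focal_px, depth_m)


# Assumed camera intrinsics and model dimension, for illustration only.
FOCAL_PX, CX, CY = 800.0, 320.0, 240.0
MARKER_LENGTH_M = 0.10

# Ground truth used to synthesize one frame: two marker endpoints at 1.5 m.
end_a = (0.20, 0.05, 1.5)
end_b = (0.30, 0.05, 1.5)
pix_a = project(end_a, FOCAL_PX, CX, CY)
pix_b = project(end_b, FOCAL_PX, CX, CY)

# Single-frame estimate that uses only the pixels and the known length.
depth = depth_from_known_length(pix_a, pix_b, MARKER_LENGTH_M, FOCAL_PX)
recovered_a = back_project(pix_a, depth, FOCAL_PX, CX, CY)

print("estimated depth:", depth)
print("recovered endpoint:", recovered_a)
```

Without the known length, the same two pixels would be consistent with a small marker nearby or a large one far away. A full PnP solution extends this idea to several model points and also recovers the rotation of the object.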
Because single-frame monocular measurement knows the geometric model of the object in advance, constraints can be introduced into the calculation for verification, so the spatial position and attitude of the object can be measured very accurately. For example, the positioning accuracy of Oculus' positioning system can reach 2-3mm, which is an order of magnitude better than that of Sony PSVR's binocular camera. The following simulation experiment, which we carried out ourselves, compares the positioning accuracy of monocular and binocular measurement:
Figure 7: Comparative simulation of monocular and binocular positioning accuracy
The same requirement that gives this advantage also limits its use: the geometric model of the object to be measured must be estimated or known in advance. In many scenarios this is not a problem. In the industrial field, for example, the object to be measured is known in advance. Even in a home environment, if only a rough model of the object is known, it can still be used to estimate position and attitude, although with lower accuracy. Monocular measurement also has other benefits, including a relatively small amount of computation and a large field of view (FoV) without the blind spots that binocular vision has.
Figure 8: Monocular and binocular field of view comparison
This article is an introduction to high-precision three-dimensional space measurement. Next, we will focus on the problems to be solved in monocular space measurement and positioning, as well as the application of AI technology in position tracking.