<WRAP center round todo 60%>
Very WIP page (as of 24th February 2025); you can help by asking questions in the Telegram group or in person on Mondays.
</WRAP>

3D scan of TAMI

{{tamiwiki:...}}

The scan is based on a phone video recording of me walking around TAMI.

Now, to get from a video to a mesh/point cloud, we first need to recover the camera pose of every frame (aka Global Registration).

The first two steps consist of feature detection and matching.

The rolling shutter and overexposure in the video were too much for colmap's SIFT-based feature detection.
And given that we have 3014 frames from the video, colmap's exhaustive matching (3014**2 ≈ 9 million pairs) is way too slow.

We can make assumptions about our data, such as the fact that time moves forward and that the camera moves continuously between frames, to reduce the number of matches we have to do.

At first I tried using RoMa, which is the current SOTA model combining feature detection and matching (https://...), but it didn't work out for this dataset.

After that I switched to SuperPoint features.

{{tamiwiki:...}}

First I matched every frame with its 30 nearest frames (by time).
Then I manually found cases where I revisit the same place in the video and matched between such frames to serve as loop closure points.
After converting the keypoint and match data into a colmap database, I used colmap to geometrically verify the image pairs.

Then I used GLOMAP on the image pairs to get the global poses and a sparse point cloud.

This scene can already be used for 3d gaussian splatting, but it had a lot of issues recovering fine details anywhere outside the initial sparse point cloud. Random initialization helped, but then it struggled with big floaters and incredibly long training times.

I tried other 3dgs pruning schedules/strategies, but without much success.

I decided to try making a dense point cloud instead, but colmap's MVS was way too slow to compute.

I tried projecting points from a monocular depth estimate (Apple Depth Pro), but it was too globally unstable between frames.

Fortunately OpenMVS's patchmatch-based depth estimation was much faster.

I was able to run a quick low-resolution depth estimate, which I then used as ground truth for realigning a neural monocular depth estimate.

OpenMVS .dmap files include a depth map and a confidence map. I was able to use this with RANSAC to fit a polynomial that offsets the neural depth map onto the MVS depth.
The confidence map was used to weight the contribution of each sample, which allowed me to fit even very hard cases.

Now I was able to reproject all the realigned depths into a very dense point cloud for the scene, much denser than would even be possible with patchmatch alone.

This point cloud did have visible layering though, because of small discrepancies between the depths from different views.

Fortunately 3dgs is specifically good at optimizing these kinds of cases into a multiview consistent scene.

After running 3dgs the results were quite good, and I decided to move on to creating a mesh that could be used in software like Blender.

Trying a bunch of papers, the best results I got were from the ones using TSDF integration in their meshing workflow, but none of them had all the features that I wanted, so I decided to do it myself.

After rendering depth maps from 3dgs and integrating them into a TSDF volume, I saw that there were a lot of artifacts in the reconstructed surface.

I was able to smooth the depth over by rasterizing the median depth (following RaDe-GS).

{{tamiwiki:...}}

But I still had some outliers in the depth data.

I tried again fitting neural depth estimates to my 3dgs depth data, but at this point their level of detail was lower than what I was getting from 3dgs.

I then tried PromptDA, which is meant to upscale depths from smartphones and lidars using low-res depth + RGB image pairs -> hi-res depth.

But the problem there was that the outliers were clearly still visible in the depth data; they were just pulled into the distribution and blended in.

After plotting the rasterized median depth values, I was able to fit a kernel density estimate and use it to flag outlier depths.

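A minimal sketch of this kind of KDE-based outlier flagging (the subsample size and the 1% density quantile are illustrative choices, not the values used here):

<code python>
# Flag outlier depth pixels: fit a KDE to the median depth values and drop
# pixels whose depth falls in a low-density region of the distribution.
import numpy as np
from scipy.stats import gaussian_kde

def depth_outlier_mask(depth, subsample=50_000, quantile=0.01, seed=0):
    """depth: (H, W) rasterized median depth map. Returns a boolean outlier mask."""
    valid = depth > 0
    values = depth[valid]
    rng = np.random.default_rng(seed)
    sample = rng.choice(values, size=min(subsample, values.size), replace=False)
    kde = gaussian_kde(sample)                  # fit on a subsample to keep it fast
    density = kde(values)                       # density of every valid depth value
    threshold = np.quantile(density, quantile)  # lowest-density values are outliers
    outliers = np.zeros_like(depth, dtype=bool)
    outliers[valid] = density < threshold
    return outliers
</code>
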
{{tamiwiki:...}}

After removing the depth outliers I was able to get much cleaner results.

To get a mesh from the depth images I used TSDF integration.
But the GPU VRAM wasn't enough for me to extract mesh detail down to 1mm, and running the integration purely on the CPU was too slow.
So I ended up computing the TSDF volume in batches on the GPU and then merging them onto a uniform voxel grid on the CPU, using trilinear interpolation where the grids overlapped.

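A rough sketch of the CPU-side merge, assuming each GPU batch produces a TSDF grid with its own origin but the same voxel size (the resampling and averaging details are illustrative, not the exact implementation):

<code python>
# Merge per-batch TSDF grids into one uniform global voxel grid by trilinearly
# resampling each batch at the global voxel centres and averaging overlaps.
import numpy as np
from scipy.ndimage import map_coordinates

def merge_tsdf_batches(batches, global_origin, global_shape, voxel_size):
    """batches: list of (tsdf [D, H, W] float32, origin [3] in metres)."""
    acc = np.zeros(global_shape, dtype=np.float32)   # accumulated TSDF values
    wsum = np.zeros(global_shape, dtype=np.float32)  # number of contributing batches

    # world coordinates of every global voxel centre, shape (3, D, H, W)
    zi, yi, xi = np.meshgrid(*[np.arange(s) for s in global_shape], indexing="ij")
    world = np.stack([zi, yi, xi]) * voxel_size + np.reshape(global_origin, (3, 1, 1, 1))

    for tsdf, origin in batches:
        # continuous voxel coordinates of the global centres inside this batch grid
        coords = (world - np.reshape(origin, (3, 1, 1, 1))) / voxel_size
        inside = np.all((coords >= 0) &
                        (coords <= np.reshape(tsdf.shape, (3, 1, 1, 1)) - 1), axis=0)
        values = map_coordinates(tsdf, coords.reshape(3, -1),
                                 order=1, mode="nearest").reshape(global_shape)
        acc += np.where(inside, values, 0.0)
        wsum += inside

    # average where batches overlap; untouched voxels stay at the truncation value 1.0
    return np.where(wsum > 0, acc / np.maximum(wsum, 1), 1.0)
</code>
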
#TODO mesh compression
voxelization