Comments Page - Show HN: Fast and Exact Algorithm for Image Merging

github.com, submitted by C-Naoki 10 months ago

scottdupoy 10 months ago
Interesting to see something like this!
My computer science masters thesis was based on the same goal. I used a 2D convolution which meant you can merge images with inexact overlaps. I had to run a high-pass filter first to limit the image details to their edges only or else the convolution incorrectly matched bright areas.
In reality, merging pictures is further complicated because the source images may be slightly rotated relative to each other, and because lens distortion makes the images slightly curved.
My supervisor wanted me to do a PhD on the topic!
- gsliepen 10 months ago
  I used this for several applications. Note that 2D convolution can be done efficiently using FFTs, and filtering can be combined with this very efficiently: if you see your high-pass filter as a convolution of its own, you can pre-calculate its FFT, and just multiply it almost for free in the frequency domain with the two images you want to convolve.
  scottdupoy 9 months ago
  That's exactly how it worked: hand-rolled FFT and filtering, following the method in "Numerical Recipes in C".
  gsliepen 9 months ago
  Oh, Numerical Recipes is nice but their algorithm implementations are not really state-of-the-art. I highly recommend using FFTW (https://fftw.org/) as it will likely give you a substantial performance improvement.
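The approach discussed in this sub-thread (cross-correlation via FFT, with the high-pass filter folded in as a multiplication in the frequency domain) can be sketched roughly as follows. This is only an illustration using NumPy rather than FFTW or the Numerical Recipes routines; the function names, the ideal cutoff filter, and the `cutoff` value are assumptions, and the correlation here is circular, so real photos would need zero-padding first.

```python
import numpy as np

def highpass_transfer(shape, cutoff=0.05):
    """Frequency response of an ideal high-pass filter (hypothetical design choice)."""
    fy = np.fft.fftfreq(shape[0])[:, None]   # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(shape[1])[None, :]   # horizontal frequencies, cycles/pixel
    return (np.hypot(fy, fx) > cutoff).astype(float)

def find_shift(a, b, cutoff=0.05):
    """Estimate (dy, dx) such that b is approximately np.roll(a, (dy, dx), axis=(0, 1))."""
    H = highpass_transfer(a.shape, cutoff)
    # Filtering is just an element-wise multiply once we are in the frequency domain.
    A = np.fft.fft2(a) * H
    B = np.fft.fft2(b) * H
    # Cross-correlation theorem: corr = IFFT(conj(A) * B); its peak marks the shift.
    corr = np.fft.ifft2(np.conj(A) * B).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return int(dy), int(dx)

rng = np.random.default_rng(0)
a = rng.random((64, 64))
b = np.roll(a, (5, 12), axis=(0, 1))
shift = find_shift(a, b)
```

The filter's frequency response depends only on the image shape, so it can be computed once and reused for every pair of images.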
- C-Naoki 10 months ago
  Thank you for your comments! A CNN is certainly expressive enough to learn the characteristics of images. However, for this project I chose not to use deep learning, because I believe it is important to provide fast, consistent results without the need for training data. If you are interested in this app, I would be glad if you could create a pull request to extend the algorithm.
  jdhwosnhw 10 months ago
  The parent comment said nothing about using deep learning. Convolution is not the same as using a CNN. I interpreted their comment as meaning they used a 2D convolution (presumably a 2D cross correlation, actually) to find regions of overlap
  scottdupoy 9 months ago
  Yes, you're right, it was a 2D cross-correlation, which is closely analogous to a convolution.
  r_hanz 9 months ago
  If memory serves… the only difference is that one of the kernels being convolved is reversed for convolution.
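A quick 1D check of that relationship for real-valued signals, using NumPy (the sample arrays are made up for illustration). In 2D the kernel would be flipped along both axes.

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0])
kernel = np.array([0.0, 1.0, 0.5])   # deliberately asymmetric

# Cross-correlation slides the kernel as-is over the signal.
corr = np.correlate(signal, kernel, mode="full")

# Convolution flips the kernel first, so convolving with the
# reversed kernel undoes the flip and reproduces the correlation.
conv_flipped = np.convolve(signal, kernel[::-1], mode="full")

# Plain convolution with the unflipped kernel gives a different result.
conv_plain = np.convolve(signal, kernel, mode="full")

same = np.allclose(corr, conv_flipped)
different = not np.allclose(corr, conv_plain)
```

For a symmetric kernel the two operations coincide, which is why the distinction is often glossed over.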
- sitkack 9 months ago
  The images might not be coplanar and the overlapping composition should be 2d planes in 3d space or go full gaussian splat.
mightyham 10 months ago
What are the practical applications for this tool? Typically stitching images for something like panoramas requires significantly more advanced image processing algorithms because the pixels do not perfectly overlap.
- jdiff 10 months ago
  Even in web browsers that support screenshotting an entire page, websites often unload elements that are off-screen. A solution like this can take a bunch of screen-length images and stitch them into a full view of the document.
  hackernewds 9 months ago
  There are Chrome extensions that do this well already
- C-Naoki 10 months ago
  Thank you for your comments! Certainly, this application may not be able to handle every kind of image. However, I wanted to stitch images without using deep learning, so the strength of this app is that it always produces the same result when given the same images. In the future, I will try to develop a more effective image merging method for more general scenarios.
  jasonjmcghee 9 months ago
  Is deep learning state of the art for something like this?
  Would have expected it to just be kernel based.
  Regardless, you can have fully deterministic deep learning approaches. You can use integers, run on a CPU, and seed everything.
tobr 10 months ago
Interesting! The example shows two images that appear to have a pixel-perfect matching region. Is that a requirement or does it work with images that are only somewhat similar?
- asadm 10 months ago
  seems to be doing some mean-square error to find best matching region.
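For readers unfamiliar with the idea, a mean-squared-error overlap search could look roughly like the sketch below. This is not taken from the submitted repository; the function names and the restriction to a purely vertical overlap are assumptions made to keep the example short.

```python
import numpy as np

def best_vertical_overlap(top, bottom, min_overlap=1):
    """Return (rows, mse): the overlap height that minimizes the mean squared
    error between the last rows of `top` and the first rows of `bottom`."""
    max_overlap = min(top.shape[0], bottom.shape[0])
    best_rows, best_mse = min_overlap, np.inf
    for rows in range(min_overlap, max_overlap + 1):
        # Cast to float so uint8 images do not wrap around when subtracted.
        diff = top[-rows:].astype(float) - bottom[:rows].astype(float)
        mse = np.mean(diff ** 2)
        if mse < best_mse:
            best_rows, best_mse = rows, mse
    return best_rows, best_mse

def merge_vertical(top, bottom, rows):
    """Stack the two images, dropping the duplicated rows from `bottom`."""
    return np.vstack([top, bottom[rows:]])

rng = np.random.default_rng(1)
full = rng.integers(0, 256, size=(50, 30), dtype=np.uint8)
top, bottom = full[:30], full[20:]          # the two pieces share 10 rows
rows, mse = best_vertical_overlap(top, bottom)
merged = merge_vertical(top, bottom, rows)
```

With a pixel-perfect overlap the minimum error is exactly zero; for images that are only roughly similar, the smallest error still picks a candidate, but a threshold on it would be needed to reject bad matches.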
therobot24 10 months ago
look at those for loops! should look into fft-based correlation, can even do so with Mellin transform for scale and circular harmonic transform for rotation
fullspectrumdev 9 months ago
I’ve been looking for something like this for creating surveys using drone footage - extract every “n” frames from the video, then stitch ‘em up somehow to make a “layer”.
There’s existing software for this kind of work, but I’ve been in the mood to reinvent the wheel a bit for some strange reason.
martinmaly21 9 months ago
Nice work!
What's the latest state of the art in image stitching these days? From what I can tell, there was a bunch of research done on it in the past, but with all the recent advancements in AI, not much has changed on this front. I'd love to be wrong though!
mathisd 9 months ago
Nice project of yours! I am a data science student but I never looked into Computer Vision. Until a few days ago, when I started watching a series of short courses on a YouTube channel called First Principles of Computer Vision [0]. I found it fascinating and the math behind is truly beautiful, concise and efficient.
[0] https://www.youtube.com/@firstprinciplesofcomputerv3258 I strongly recommend checking out any of the playlists. Best courses I have had in a long time.
sorenjan 10 months ago
Related to this, is there a name for the effect when you stitch together video frames into a static background while keeping the moving objects moving? The best example I can think of is this Bigfoot video[0, 1], where the shaky footage has been combined into a bigger canvas with "Bigfoot" moving through it. It's a combination of video stabilization and image panorama, but with some smarts to only keep one version of the moving object in each finished frame.
[0] https://www.youtube.com/watch?v=Q60mSMmhTZU [1] https://x.com/rowancheung/status/1641519493447819268
- iamjackg 9 months ago
  A long time ago I did some work to do exactly this in an automated fashion using ffmpeg. It wasn't perfect, but it was better than nothing. I tried going back through my bash history, and the last related entry was this command line:
  ffmpeg -i C0119.MP4 -vf vidstabtransform=interpol=no:crop=black:optzoom=0:zoom=0:smoothing=0:debug=1:input="weirdzoom.trf",unsharp=5:5:0.8:3:3:0.4 kittens-stabilized.mp4
  I think the trick was to set all the stabilization parameters to 0 and crop=black to force ffmpeg to move the image around as much as necessary and zoom everything out.
  EDIT: nevermind, it was more complicated than that. I actually wrote a Python script that modified the motion tracking information generated by ffmpeg to reduce the zoom amount and fit everything within a 1920x1080 frame. Man, I wish I'd added comments to this.
  The https://www.reddit.com/r/ImageStabilization/ subreddit has a lot of posts in that style, but from the research I did it seems like it's mostly done manually by lining up each frame as a separate layer and then rendering an animation that adds one layer per frame.
- chompychop 9 months ago
  I believe the term in research literature for this is "Dynamic Video Synopsis". Check this CVPR 2006 paper for instance: https://www.cs.huji.ac.il/~peleg/papers/cvpr06-synopsis.pdf
tsumnia 10 months ago
Nicely done and keep up the practice. I recall during my Masters needing to translate facial landmark points from a Cartesian coordinate system into points that would appear on their respective images. It wasn't for anything major, I just wanted a visual representation of my work. It's these little "neat" projects that help build larger breakthroughs.
wmanley 9 months ago
See also: Hugin - Panorama photo stitcher. I used to use it a lot back in ~2006 for making panoramas. It automatically finds "control points" in your photos, figures out which ones are shared between the photos, and uses that information to determine the relative positions of the photos and your lens parameters.
Once it does that it can stitch the photos together. It does this by projecting the photos onto a sphere, and then taking a picture of that sphere using whatever lens parameters you want.
https://hugin.sourceforge.io/
- kouru225 9 months ago
  This app never works for me and I don’t know why. Photoshop’s auto merge always works but this one doesn’t
lugao 9 months ago
Why did a naive pixel-matching library get so many likes here?
- debo_ 9 months ago
  Because it's nice to see people trying things for themselves, even if they aren't novel?
kouru225 9 months ago
Oh shit thank you so much. I’ve always had to use photoshop for this
- C-Naoki 9 months ago
  I'm glad it was helpful!
a257 10 months ago
In the biomedical sciences, we typically use a tool called BigStitcher [0], which is bundled with ImageJ [1]
[0] https://www.nature.com/articles/s41592-019-0501-0 [1] https://imagej.net/plugins/bigstitcher/