Fast Blur #2302

torfmaster · 2024-08-02T08:55:43Z

PR

This is an attempt to solve #986 as the current solution (using
fastblur) is fragile and requires the consumers to access raw buffers
of the image which is undesirable.
The implementation is inspired by
this article
which is in turn based on Kovesi, P.: Fast Almost-Gaussian Filtering The Australian Pattern
Recognition Society Conference: DICTA 2010. December 2010. Sydney.

The short story is that the real gaussian blur takes about 30 second for
real-world photos which is undesirable. The algorithm here is quick and
has pretty good results.

Questions

Is this PR ready?

No, but I want to have some feedback on the implementation so I can move on (in the right direction).

Are you aware of #2238?

Yes, but I don't know how to deal with it. blur is not yet deprecated and
users of the library will search for a fast alternative in this crate,
not in imageproc. I am open to move my code there now or later.
The advantage of having it here would be that I do not suffer from
the non_exhaustive of DynamicImage and can implement blur
for this type (if desired).

Why didn't you simply c&p the code from the blurslice crate (or the article)?

The blurslice crate is distributed under the MIT license and
re-licencing the code under MIT/Apache2 would require the approval
of all contributors of the crate.
On the other hand it doesn't integrate well with the principal
structures DynamicImage and ImageBuffer. That's why I
decided to re-implement the algorithm. I have to admit here are some
details in both implementations that I don't understand and where
I couldn't find an explanation. Hence I just did my own implementation
that I think I understand.

What about the performance?

I didn't check the performance (and didn't optimize it).
But opposed to the blur implementation the blurring finishes
in a reasonable amount of time.
As soon as the interface is fixed (i.e. whether to accept GenericImageView or
DynamicImage) I can start to performance tweaks.

What about tests?

I can't really imagine how to represent functional tests. However, I plan to

write benchmarks as soon I dig into performance tuning
write trivial tests (sigma=1 is a no op)
write kani tests that make sure that the potentially panicking slice access doesn't panic

Is the interface fixed?

Currently, I accept an ImageBuffer simply because it lets me operate on slices rather
than using another interface to access pixels. However, it might make sense
to also accept an DynamicImage (can be easily accomplished) or a
GenericImageView (requires a complete rewrite of the algorithm).

Will the "old" blur stay?

The fast blur algorithm is inaccurate but for enough for many applications. I would recommend
that the "real" blur stays. It is not much code anyways as it is implemented general convolution
with gaussian kernel.

License

I license past and future contributions under the dual MIT/Apache-2.0 license,
allowing licensees to choose either at their option.

Todos

add reference to fast_blur in docs for blur
implement fast_blur for DynamicImage
work on clippy failures
fix public_private_dependencies job
write tests for index handling (kani or quickcheck)
write benchmarks and optimize performance
rebase interactively to create atomic commits

ripytide · 2024-08-02T16:58:08Z

Wow great work!

Are you aware of #2238?
Yes, but I don't know how to deal with it. blur is not yet deprecated and
users of the library will search for a fast alternative in this crate,
not in imageproc. I am open to move my code there now or later.
The advantage of having it here would be that I do not suffer from
the non_exhaustive of DynamicImage and can implement blur
for this type (if desired).

Long-term I think this function should belong in imageproc since it is a image processing operation, for short-term, since a lot of imageops are yet to be deprecated it could be added temporarily in image and then copied across later but that seems like extra work to me.

Maybe we could add a note in the docs of the "slow" blur method in this crate that points to this faster method in the imageproc crate?

Regarding DynamicImage you can use the dynamic_map macro, (not sure if it's currently exported from the crate but it's easy enough to redefine yourself):

use image::{DynamicImage, ImageBuffer, Pixel};

macro_rules! dynamic_map(
        ($dynimage: expr, $image: pat => $action: expr) => ({
            use DynamicImage::*;
            match $dynimage {
                ImageLuma8($image) => ImageLuma8($action),
                ImageLumaA8($image) => ImageLumaA8($action),
                ImageRgb8($image) => ImageRgb8($action),
                ImageRgba8($image) => ImageRgba8($action),
                ImageLuma16($image) => ImageLuma16($action),
                ImageLumaA16($image) => ImageLumaA16($action),
                ImageRgb16($image) => ImageRgb16($action),
                ImageRgba16($image) => ImageRgba16($action),
                ImageRgb32F($image) => ImageRgb32F($action),
                ImageRgba32F($image) => ImageRgba32F($action),
                _ => unreachable!(),
            }
        });

        ($dynimage: expr, $image:pat_param, $action: expr) => (
            match $dynimage {
                DynamicImage::ImageLuma8($image) => $action,
                DynamicImage::ImageLumaA8($image) => $action,
                DynamicImage::ImageRgb8($image) => $action,
                DynamicImage::ImageRgba8($image) => $action,
                DynamicImage::ImageLuma16($image) => $action,
                DynamicImage::ImageLumaA16($image) => $action,
                DynamicImage::ImageRgb16($image) => $action,
                DynamicImage::ImageRgba16($image) => $action,
                DynamicImage::ImageRgb32F($image) => $action,
                DynamicImage::ImageRgba32F($image) => $action,
                _ => unreachable!(),
            }
        );
);

fn main() {
    let image1 = DynamicImage::new_luma8(3, 3);
    let image2 = DynamicImage::new_rgb8(3, 3);

    let image1 = dynamic_map!(image1, i => none(i));
    let image2 = dynamic_map!(image2, i => none(i));
}

fn none<P: Pixel, Container>(
    _image: ImageBuffer<P, Vec<Container>>,
) -> ImageBuffer<P, Vec<Container>> {
    todo!()
}

torfmaster · 2024-08-02T20:16:03Z

Thank you for your comments. I added tasks for your (and my points) so we can track progress more easily.

torfmaster · 2024-08-03T20:50:32Z

I have pushed a rather coarse fix for the private dependency checks. However, I don't know how to deal with these checks. I found no easy way of systematically fix this. The main problem is, that I can't use the public dependency property in the regular Cargo.toml as the feature is unstable and can't be used in stable Rust. Hence, to use this feature we would have to

add the cargo feature in the github action (as it is currently done) using a shell script
specify the rayon crate as public using a shell script (very ugly)

To make this PR mergeable I hacked an "allow" for this llint to buffer_par knowing that clippy won't complain about allowing an unknown lint. However, this solution is likely to fail in the future.

torfmaster · 2024-08-04T11:49:48Z

I did some initial benchmarking: on my intel i9 machine the bench (1024x768 rgb) takes 680ms with the true gaussian blur whereas fast_blur takes 57 ms. I checked in both benchmarks.
Next I will try to find some bottlenecks in the implementation.

etemesi254 · 2024-08-04T17:08:46Z

Hi, some ways to improve noted in my comments on zune-imageprocs (author of the library)

https://github.com/etemesi254/zune-image/blob/734bdc7bfcd58860b0a92dde0a98055a1f94b422/crates/zune-imageprocs/src/box_blur.rs#L197-L220

Never did the loop unrolling to save ilp, but others were implemented, really nice properties like no bounds check in inner loop, may be worth investigating (see https://godbolt.org/z/zexPEd976)

See https://fgiesen.wordpress.com/2012/07/30/fast-blurs-1/ and https://fgiesen.wordpress.com/2012/08/01/fast-blurs-2/ for the theories behind it

torfmaster · 2024-08-04T17:53:50Z

I invested some time into performance optimization:

simplify leftmost value by r * f(0)
get rid of bound checks by asserting the length of the samples
get rid of conditionals checking for lower and upper bounds

None of these measures produced any results. I am leaving the code as it is for now (if no one comes up with another idea, of course).

torfmaster · 2024-08-04T17:57:15Z

Furthermore, I see no need for kani/quickcheck tests. First the bounds are obviously correct. On the other hand there is no point in using unsafe so securing the code using kani or quickcheck is unnecessary.
Regarding the other tests I added tests for edge cases of sigma. If you have more ideas regarding reasonable tests I am happy to implement them.
I feel the PR is close to "ready for merge" (apart from squashing and reordering the commits) and your comments, of course.

torfmaster · 2024-08-05T09:42:20Z

@etemesi254:
I now implemented the performance improvements I am able to understand (that is I used transposing to improve usage of CPU caches). The algorithmic tweaks were already in place, though. I total I could speed up the blurring by 15 per cent. I feel I am not able to understand the more complicated improvements as I am lacking deeper assembly language knowledge.

etemesi254 · 2024-08-05T10:07:21Z

Nice.

FYI these were the benchmark results for gaussian blur for big images before this commit . (image-rs, libvips,zune-image)

https://etemesi254.github.io/assets/criterion/imageprocs_%20gaussian%20blur/report/index.html

torfmaster · 2024-08-05T20:35:28Z

My work here is done for now and I would be happy about a code review

torfmaster · 2024-08-12T14:34:26Z

@ripytide sorry to bother you (but you already commented on the issue): Do you know what would be the next steps to merge this PR (and get it released)?
Thank you!

ripytide · 2024-08-12T17:26:59Z

You'll need one of the maintainers signoff, of which I am not one.

Also, I don't think this function belongs in this library and should instead be in the imageproc library.

torfmaster · 2024-08-13T17:29:42Z

Thank you for your feedback. I created a PR in imageproc as well (image-rs/imageproc#683).

HeroicKatora

Code review time :)

src/imageops/fast_blur.rs

examples/fast_blur/lenna.png

src/imageops/fast_blur.rs

src/imageops/mod.rs

torfmaster · 2024-09-04T12:39:14Z

I just wanted to give a short status report (I also cross post this comment to image-rs/imageproc#683)

I added a functional test showing (for an example) that the effect of fast_blur is close to the Gaussian blur implementation in this library
I added a benchmark that shows that fast blur is much faster than the current blur implementation
I think I addressed all code review comments

What would be the next steps?

torfmaster · 2024-09-11T04:17:34Z

@HeroicKatora sorry for pinging you directly but you already started a review. Is there anything I can do to support the review of this PR? Thank you a lot!

src/buffer_par.rs

Shnatsel · 2024-09-12T12:06:53Z

I've written the following example code for blur - feel free to include it in the PR under examples/:

use std::env;
use std::error::Error;
use std::path::Path;
use std::process;

fn main() -> Result<(), Box<dyn Error>> {
    // Collect command-line arguments
    let args: Vec<String> = env::args().collect();

    // Ensure that we have 2 CLI arguments
    if args.len() != 3 {
        eprintln!("Usage: {} <path> <radius>", args[0]);
        process::exit(1);
    }

    let radius: f32 = args[2].parse()?;

    // Load the input image
    let path_str = &args[1];
    let path = Path::new(path_str);
    let input = image::open(path)?;

    // Correct but slow Gaussian blur
    let mut out_path_gauss = path.to_owned();
    out_path_gauss.set_extension("gaussian.png");

    let correct_but_slow = input.blur(radius);
    correct_but_slow.save(out_path_gauss)?;

    // Approximate but fast blur
    let mut out_path_fast = path.to_owned();
    out_path_fast.set_extension("fast.png");

    let approximate_but_fast = input.fast_blur(radius);
    approximate_but_fast.save(out_path_fast)?;

    Ok(())
}

With this test image both blur implementation show artifacts:

blur-test-input

It's just a white rectangle in the middle, surrounded by fully transparent pixels.

Both Gaussian blur in image and the fast blur implementation bleed color from fully transparent pixels into the white rectangle, resulting in the following image:

The fully transparent pixels have R set to 255 while all the other channels are set to 0, and that red color bleeds into the white rectangle.

The expected blur as produced by GIMP is like this - blurred white rectangle without any red color:

blurred-gimp-20px

To avoid such artifacts, the pixel's contribution to the color change should be weighted by its alpha channel value. Fully transparent pixels should get multiplied by 0 and not contribute to the changes in non-alpha channels at all.

But since both blur implementations are incorrect, I'll leave deciding how to handle this up to the maintainers.

torfmaster mentioned this pull request Aug 2, 2024

blur function is too slow #986

Open

torfmaster force-pushed the feature/fast-blur branch 7 times, most recently from 7ac2d69 to 2339f81 Compare August 3, 2024 13:06

torfmaster force-pushed the feature/fast-blur branch from 8dcc049 to 7d1683e Compare August 4, 2024 11:01

torfmaster force-pushed the feature/fast-blur branch from 7d1683e to da68a1b Compare August 4, 2024 11:51

torfmaster force-pushed the feature/fast-blur branch 2 times, most recently from fc50c64 to cdd617d Compare August 5, 2024 20:29

torfmaster mentioned this pull request Aug 13, 2024

Add fast blur algorithm image-rs/imageproc#683

Open

HeroicKatora reviewed Aug 15, 2024

View reviewed changes

torfmaster force-pushed the feature/fast-blur branch from fa64780 to e7eefdb Compare August 20, 2024 21:03

torfmaster force-pushed the feature/fast-blur branch 2 times, most recently from 1592ff5 to d672675 Compare September 4, 2024 12:24

torfmaster force-pushed the feature/fast-blur branch 3 times, most recently from 25902b9 to 4737c29 Compare September 4, 2024 12:35

initial implementation of fast blur

e5672ff

torfmaster force-pushed the feature/fast-blur branch from 4737c29 to b4f75eb Compare September 4, 2024 12:40

Shnatsel reviewed Sep 12, 2024

View reviewed changes

src/buffer_par.rs Outdated Show resolved Hide resolved

torfmaster force-pushed the feature/fast-blur branch from b4f75eb to e5672ff Compare September 12, 2024 11:06

Shnatsel mentioned this pull request Sep 13, 2024

Blur for images with alpha is incorrect, bleeds color from transparent pixels #2324

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fast Blur #2302

Fast Blur #2302

torfmaster commented Aug 2, 2024 •

edited

Loading

ripytide commented Aug 2, 2024

torfmaster commented Aug 2, 2024

torfmaster commented Aug 3, 2024

torfmaster commented Aug 4, 2024

etemesi254 commented Aug 4, 2024 •

edited

Loading

torfmaster commented Aug 4, 2024

torfmaster commented Aug 4, 2024 •

edited

Loading

torfmaster commented Aug 5, 2024

etemesi254 commented Aug 5, 2024 •

edited

Loading

torfmaster commented Aug 5, 2024

torfmaster commented Aug 12, 2024

ripytide commented Aug 12, 2024

torfmaster commented Aug 13, 2024

HeroicKatora left a comment

torfmaster commented Sep 4, 2024

torfmaster commented Sep 11, 2024

Shnatsel commented Sep 12, 2024 •

edited

Loading

Fast Blur #2302

Are you sure you want to change the base?

Fast Blur #2302

Conversation

torfmaster commented Aug 2, 2024 • edited Loading

PR

Questions

Is this PR ready?

Are you aware of #2238?

Why didn't you simply c&p the code from the blurslice crate (or the article)?

What about the performance?

What about tests?

Is the interface fixed?

Will the "old" blur stay?

License

Todos

ripytide commented Aug 2, 2024

torfmaster commented Aug 2, 2024

torfmaster commented Aug 3, 2024

torfmaster commented Aug 4, 2024

etemesi254 commented Aug 4, 2024 • edited Loading

torfmaster commented Aug 4, 2024

torfmaster commented Aug 4, 2024 • edited Loading

torfmaster commented Aug 5, 2024

etemesi254 commented Aug 5, 2024 • edited Loading

torfmaster commented Aug 5, 2024

torfmaster commented Aug 12, 2024

ripytide commented Aug 12, 2024

torfmaster commented Aug 13, 2024

HeroicKatora left a comment

Choose a reason for hiding this comment

torfmaster commented Sep 4, 2024

torfmaster commented Sep 11, 2024

Shnatsel commented Sep 12, 2024 • edited Loading

torfmaster commented Aug 2, 2024 •

edited

Loading

etemesi254 commented Aug 4, 2024 •

edited

Loading

torfmaster commented Aug 4, 2024 •

edited

Loading

etemesi254 commented Aug 5, 2024 •

edited

Loading

Shnatsel commented Sep 12, 2024 •

edited

Loading