
Lazy evaluation/cache for most acme objects #21

Open
ppizarror opened this issue Apr 5, 2023 · 5 comments
Comments

@ppizarror
Contributor

Hi; while profiling our app, I noticed that a significant share of the samples taken during segment collinearity checks belongs to toUnitVector. This method is called millions of times, so I tried caching the unit vector, vector, and length just to see how the performance changes.

As you can see in #20, I stored these values (updated by calling a new method update()) and got a massive performance improvement. My test:

#include <gtest/gtest.h>
#include <acme/acme.h>
#include <chrono>
#include <iostream>

using namespace acme;
using std::chrono::duration;
using std::chrono::high_resolution_clock;

TEST(AcmeTest, CachedSegmentUnitVector) {
  auto t1 = high_resolution_clock::now();
  segment s(acme::point(0, 0, 0), acme::point(1, 1, 1));
  // Repeated collinearity check against itself, so toUnitVector dominates the cost.
  for (int i = 0; i < 1000000; i += 1) {
    ASSERT_TRUE(IsCollinear(s, s));
  }
  duration<double, std::milli> ms_double = high_resolution_clock::now() - t1;
  std::cout << "Segment collinear time " << ms_double.count() / 1000 << " s" << std::endl;
}

Without using any cache (the current acme):

Segment collinear time 2.27808 s

Using cache:

Segment collinear time 0.646207 s

That is a massive 3.5x performance improvement. However, my proposal in #20 is not the best approach, mainly because update() has to be called each time a vertex changes. If, instead, the segment vertex reference is removed:

//! Get segment i-th vertex reference
point &
vertex(
  integer i //!< Input segment i-th vertex index
);

And we introduce a method named segment.updateVertex(point &p1, point &p2), which can call update() automatically. We can also improve the performance further by computing each cached value at a finer granularity; that is, each quantity is computed only once it is actually required (lazy evaluation).
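
To make the idea concrete, here is a minimal standalone sketch of the pattern (toy types only, nothing from the actual acme headers or from the PRs; the names are illustrative):

#include <cmath>

struct vec3 { double x, y, z; };

class cached_segment {
public:
  cached_segment(vec3 const &p1, vec3 const &p2) : m_p1(p1), m_p2(p2) {}

  // Vertices can only change through this method, so it is the single
  // place where the cache has to be invalidated.
  void updateVertex(vec3 const &p1, vec3 const &p2) {
    m_p1 = p1;
    m_p2 = p2;
    m_dirty = true;
  }

  // Lazy evaluation: recompute only if a vertex changed since the last call.
  vec3 const &toUnitVector() {
    if (m_dirty) {
      vec3 d{m_p2.x - m_p1.x, m_p2.y - m_p1.y, m_p2.z - m_p1.z};
      double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
      m_unit  = {d.x / len, d.y / len, d.z / len};
      m_dirty = false;
    }
    return m_unit;
  }

private:
  vec3 m_p1, m_p2;
  vec3 m_unit{};
  bool m_dirty{true};
};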

This cache could also be propagated to the rest of the acme objects. Let me know what you think.

@ppizarror
Contributor Author

ppizarror commented Apr 5, 2023

@StoccoDavide, I've implemented the version with updateVertex in #22. As a reference, for our software, the difference with and without the segment cache:

  • Without segment cache:
    Graph initialization finished after 13.58 s

  • With cache:
    Graph initialization finished after 10.63 s

For context, the graph contains more than 300,000 segments and up to 2 million triangles, and it must verify about a hundred million wall crossings. On average, this small #21 PR produces a 1.25x performance improvement. I don't know how much more could be gained if acme::plane, acme::point, acme::ray, acme::triangle, and the others incorporated lazy evaluation and caching.

I can implement the cache on every object if you agree with these changes 😄 (for the new v5). Next I'll update our acme objects and see what the performance improvement is when adding a cache to triangle.

P.S.: obviously, all tests pass.

@StoccoDavide
Owner

Hi!
Thank you for sharing your updates! You nailed it, respect! 🫡
I agree that this new feature should be propagated to the other acme objects. It should also improve the performance in our applications.
If you are willing to implement the cache on the other objects, you are welcome to; I'll pull your updates ASAP!

Cheers,
Davide

@StoccoDavide
Owner

Let me know what the performance improvement is in your software! 😁

@ppizarror
Contributor Author

ppizarror commented Apr 5, 2023

I think the #21 approach is better, as manually calling update() can lead to errors. Also, for the end user it should not require many changes.

Now, to keep non-const reference access consistent, maybe point &vertex(integer i); could throw an acme error that says "use updateVertex" instead... I tested each object, but apart from acme::triangle, the rest of the acme objects do not benefit from the cache. For example, caching ray.toUnitVector() does not improve performance.
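
For the record, what I mean by throwing is roughly the following (again a standalone toy sketch; std::runtime_error just stands in for whatever error mechanism acme actually uses):

#include <stdexcept>

struct point3 { double x, y, z; };

class segment_like {
public:
  // Read-only access stays available and cheap.
  point3 const &vertex(int i) const { return i == 0 ? m_p1 : m_p2; }

  // Non-const access would let callers mutate a vertex behind the cache's
  // back, so refuse it and point the caller to updateVertex() instead.
  point3 &vertex(int /*i*/) {
    throw std::runtime_error(
      "segment::vertex(i): non-const access is disabled, "
      "use updateVertex(p1, p2) so the cached values stay consistent");
  }

  void updateVertex(point3 const &p1, point3 const &p2) {
    m_p1 = p1;
    m_p2 = p2;
    m_dirty = true; // cached quantities are recomputed lazily on next use
  }

private:
  point3 m_p1{}, m_p2{};
  bool   m_dirty{true};
};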

Now, for the triangle cache, I stored the centroid and the normal. With the following test:

TEST(AcmeTest, CachedTriangle) {
  auto t1 = high_resolution_clock::now();
  triangle t(acme::point(0, 0, 0), acme::point(1, 1, 0), acme::point(0.5, 0.5, 0));
  // Repeated parallelism check against itself, so centroid/normal dominate the cost.
  for (int i = 0; i < 1000000; i += 1) {
    ASSERT_TRUE(IsParallel(t, t));
  }
  duration<double, std::milli> ms_double = high_resolution_clock::now() - t1;
  std::cout << "Triangle parallel " << ms_double.count() / 1000 << " s" << std::endl;
}

I got the following results:

Triangle parallel 2.06987 s
Cached triangle parallel 0.242824 s

That is an 8.5x improvement (note that this factor should be near 1 when comparing two triangles only once, but in practice we tend to compare the same triangle against many others, many times). To achieve it, I needed to remove point &vertex(integer i) and point &operator[](integer i) from triangle. It is up to you which approach we should follow: delete them (as in the PR), or throw an exception.
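
For reference, the shape of the cached triangle is roughly the following (a standalone toy sketch, not the actual PR code):

#include <cmath>

struct v3 { double x, y, z; };

inline v3 sub(v3 const &a, v3 const &b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

class cached_triangle {
public:
  cached_triangle(v3 const &a, v3 const &b, v3 const &c) { updateVertices(a, b, c); }

  // The only way to change the vertices, so the cache is invalidated here.
  void updateVertices(v3 const &a, v3 const &b, v3 const &c) {
    m_a = a; m_b = b; m_c = c;
    m_dirty = true;
  }

  v3 const &centroid() { refresh(); return m_centroid; }
  v3 const &normal()   { refresh(); return m_normal; }

private:
  // Centroid and normal are computed at most once per vertex update.
  void refresh() {
    if (!m_dirty) return;
    m_centroid = {(m_a.x + m_b.x + m_c.x) / 3.0,
                  (m_a.y + m_b.y + m_c.y) / 3.0,
                  (m_a.z + m_b.z + m_c.z) / 3.0};
    v3 e1 = sub(m_b, m_a), e2 = sub(m_c, m_a);
    v3 n{e1.y * e2.z - e1.z * e2.y,
         e1.z * e2.x - e1.x * e2.z,
         e1.x * e2.y - e1.y * e2.x};
    double len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    m_normal = {n.x / len, n.y / len, n.z / len};
    m_dirty  = false;
  }

  v3   m_a{}, m_b{}, m_c{};
  v3   m_centroid{}, m_normal{};
  bool m_dirty{true};
};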

Now, for my software, I compared the cached and non-cached versions (measured in seconds), running it many times:

Run      Non-cached (s)   Cached (s)
1        11.89            11.57
2        11.31            10.85
3        12.50            11.52
4        12.01            11.52
5        11.46            10.73
6        12.64            10.61
7        12.04            11.21
8        13.52            10.53
9        11.23            10.65
10       11.33            11.01
11       11.76            10.66
12       11.41            10.69
13       11.25            10.70
14       12.02            10.64
15       12.57            10.80
16       11.54            10.55
17       13.01            10.60
18       11.92            10.96
19       11.46            10.65
20       11.52            11.05
21       11.51            10.61
22       11.34            10.80
23       11.69            10.76
24       11.51            11.15
25       11.30            10.77
MEAN     11.83            10.86
SPEEDUP  -                1.089

This should be the performance improvement for a real-world scenario. I still have to run more checks between the same segments (for multiple pipelines), so the gain should increase further: the segments are stored as pointers, so their cached properties will be reused many times.

@StoccoDavide
Owner

I agree with you that point &vertex(integer i); can throw an acme error... Caching ray.toUnitVector() is not useful because the ray direction is already stored as a unit vector and this method just returns the reference, which is cheap in terms of computation time. On the other hand, as you noticed, the centroid and the normal of the triangle are relatively expensive to compute. For the same reason, in the enve project I inherited from the triangle and stored some cached data in the child class triangleground.

I don't know if, at this point, it would be better for you to implement extended (inherited) classes of the acme objects with the cached data that you need (like I did). That way we leave acme untouched, keeping it more general and less specialised for particular use cases like enve and your software. What do you think?
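
Something along these lines (a toy sketch with stand-in types; the real acme::triangle and enve's triangleground are of course richer):

struct vec3 { double x, y, z; };

// Stand-in for the untouched library class: it recomputes on every call.
class triangle_base {
public:
  triangle_base(vec3 const &a, vec3 const &b, vec3 const &c) : m_a(a), m_b(b), m_c(c) {}
  vec3 centroid() const {
    return {(m_a.x + m_b.x + m_c.x) / 3.0,
            (m_a.y + m_b.y + m_c.y) / 3.0,
            (m_a.z + m_b.z + m_c.z) / 3.0};
  }
protected:
  vec3 m_a, m_b, m_c;
};

// Application-side child class: the library stays generic, and the cache
// lives only where it is needed (as triangleground does in enve).
class triangle_cached : public triangle_base {
public:
  triangle_cached(vec3 const &a, vec3 const &b, vec3 const &c)
    : triangle_base(a, b, c), m_centroid(triangle_base::centroid()) {}

  // Shadows the expensive base computation with the stored value.
  vec3 const &centroid() const { return m_centroid; }

private:
  vec3 m_centroid;
};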

Cheers, Davide
