
Lazy evaluation/cache for most acme objects #21

Open
ppizarror opened this issue Apr 5, 2023 · 5 comments
Comments

@ppizarror
Contributor

Hi; while profiling our app, I noticed that a significant share of the samples taken during segment collinearity checks belongs to toUnitVector. This method is called millions of times, so I tried caching the unit vector, vector, and length just to see how the performance changes.

As you can see in #20, I stored these values (updated by calling a new method update()) and got a massive performance improvement. My test:

#include <gtest/gtest.h>
#include <acme/acme.h>
#include <chrono>
#include <iostream>

using namespace acme;
using std::chrono::duration;
using std::chrono::high_resolution_clock;

TEST(AcmeTest, CachedSegmentUnitVector) {
  auto t1 = high_resolution_clock::now();
  segment s(acme::point(0, 0, 0), acme::point(1, 1, 1));
  // Repeated collinearity check against itself, so toUnitVector dominates the cost.
  for (int i = 0; i < 1000000; i += 1) {
    ASSERT_TRUE(IsCollinear(s, s));
  }
  duration<double, std::milli> ms_double = high_resolution_clock::now() - t1;
  std::cout << "Segment collinear time " << ms_double.count() / 1000 << " s" << std::endl;
}

Without using any cache (the current acme):

Segment collinear time 2.27808 s

Using cache:

Segment collinear time 0.646207 s

That is a massive 3.5x performance improvement. However, my proposal in #20 is not the best approach, mainly because update() has to be called each time a vertex changes. If, instead, the segment vertex reference is removed:

//! Get segment i-th vertex reference
point &
vertex(
  integer i //!< Input segment i-th vertex index
);

And we introduce a method named segment.updateVertex(point &p1, point &p2), which can call update() automatically. We can also improve the performance further by computing each cached value at a finer granularity; that is, each quantity is computed only once it is actually required (lazy evaluation).
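
To make the idea concrete, here is a minimal standalone sketch of the pattern (toy types only, nothing from the actual acme headers or from the PRs; the names are illustrative):

#include <cmath>

struct vec3 { double x, y, z; };

class cached_segment {
public:
  cached_segment(vec3 const &p1, vec3 const &p2) : m_p1(p1), m_p2(p2) {}

  // Vertices can only change through this method, so it is the single
  // place where the cache has to be invalidated.
  void updateVertex(vec3 const &p1, vec3 const &p2) {
    m_p1 = p1;
    m_p2 = p2;
    m_dirty = true;
  }

  // Lazy evaluation: recompute only if a vertex changed since the last call.
  vec3 const &toUnitVector() {
    if (m_dirty) {
      vec3 d{m_p2.x - m_p1.x, m_p2.y - m_p1.y, m_p2.z - m_p1.z};
      double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
      m_unit  = {d.x / len, d.y / len, d.z / len};
      m_dirty = false;
    }
    return m_unit;
  }

private:
  vec3 m_p1, m_p2;
  vec3 m_unit{};
  bool m_dirty{true};
};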

This cache could also be propagated to the rest of the acme objects. Let me know what you think.

@ppizarror
Contributor Author

ppizarror commented Apr 5, 2023

@StoccoDavide, I've implemented the version with updateVertex in #22. As a reference, for our software, the difference with and without the segment cache:

  • Without segment cache:
    Graph initialization finished after 13.58 s

  • With cache:
    Graph initialization finished after 10.63 s

For context, the graph contains more than 300,000 segments and up to 2 million triangles, and it must verify about a hundred million wall crossings. On average, this small #21 PR produces a 1.25x performance improvement. I don't know how much more could be gained if acme::plane, acme::point, acme::ray, acme::triangle, and the others incorporated lazy evaluation and caching.

I can implement the cache on every object if you agree with these changes 😄 (for the new v5). Next I'll update our acme objects and see what the performance improvement is when adding a cache to triangle.

P.S.: obviously, all tests pass.

@StoccoDavide
Owner

Hi!
Thank you for sharing your updates! You nailed it, respect! 🫡
I agree that this new feature should be propagated to the other acme objects. It should also improve the performance in our applications.
If you are willing to implement the cache on the other objects, you are welcome to; I'll pull your updates ASAP!

Cheers,
Davide

@StoccoDavide
Owner

Let me know what the performance improvement is in your software! 😁

@ppizarror
Contributor Author

ppizarror commented Apr 5, 2023

I think the #21 approach is better, as manually calling update() can lead to errors. Also, for the end user it should not require many changes.

Now, to keep non-const reference access consistent, maybe point &vertex(integer i); could throw an acme error that says "use updateVertex" instead... I tested each object, but apart from acme::triangle, the rest of the acme objects do not benefit from the cache. For example, caching ray.toUnitVector() does not improve performance.
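
For the record, what I mean by throwing is roughly the following (again a standalone toy sketch; std::runtime_error just stands in for whatever error mechanism acme actually uses):

#include <stdexcept>

struct point3 { double x, y, z; };

class segment_like {
public:
  // Read-only access stays available and cheap.
  point3 const &vertex(int i) const { return i == 0 ? m_p1 : m_p2; }

  // Non-const access would let callers mutate a vertex behind the cache's
  // back, so refuse it and point the caller to updateVertex() instead.
  point3 &vertex(int /*i*/) {
    throw std::runtime_error(
      "segment::vertex(i): non-const access is disabled, "
      "use updateVertex(p1, p2) so the cached values stay consistent");
  }

  void updateVertex(point3 const &p1, point3 const &p2) {
    m_p1 = p1;
    m_p2 = p2;
    m_dirty = true; // cached quantities are recomputed lazily on next use
  }

private:
  point3 m_p1{}, m_p2{};
  bool   m_dirty{true};
};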

Now, for the triangle cache, I stored the centroid and the normal. With the following test:

TEST(AcmeTest, CachedTriangle) {
  auto t1 = high_resolution_clock::now();
  triangle t(acme::point(0, 0, 0), acme::point(1, 1, 0), acme::point(0.5, 0.5, 0));
  // Repeated parallelism check against itself, so centroid/normal dominate the cost.
  for (int i = 0; i < 1000000; i += 1) {
    ASSERT_TRUE(IsParallel(t, t));
  }
  duration<double, std::milli> ms_double = high_resolution_clock::now() - t1;
  std::cout << "Triangle parallel " << ms_double.count() / 1000 << " s" << std::endl;
}

I got the following results:

Triangle parallel 2.06987 s
Cached triangle parallel 0.242824 s

That is an 8.5x improvement (note that this factor should be near 1 when comparing two triangles only once, but in practice we tend to compare the same triangle against many others, many times). To achieve it, I needed to remove point &vertex(integer i) and point &operator[](integer i) from triangle. It is up to you which approach we should follow: delete them (as in the PR), or throw an exception.
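
For reference, the shape of the cached triangle is roughly the following (a standalone toy sketch, not the actual PR code):

#include <cmath>

struct v3 { double x, y, z; };

inline v3 sub(v3 const &a, v3 const &b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

class cached_triangle {
public:
  cached_triangle(v3 const &a, v3 const &b, v3 const &c) { updateVertices(a, b, c); }

  // The only way to change the vertices, so the cache is invalidated here.
  void updateVertices(v3 const &a, v3 const &b, v3 const &c) {
    m_a = a; m_b = b; m_c = c;
    m_dirty = true;
  }

  v3 const &centroid() { refresh(); return m_centroid; }
  v3 const &normal()   { refresh(); return m_normal; }

private:
  // Centroid and normal are computed at most once per vertex update.
  void refresh() {
    if (!m_dirty) return;
    m_centroid = {(m_a.x + m_b.x + m_c.x) / 3.0,
                  (m_a.y + m_b.y + m_c.y) / 3.0,
                  (m_a.z + m_b.z + m_c.z) / 3.0};
    v3 e1 = sub(m_b, m_a), e2 = sub(m_c, m_a);
    v3 n{e1.y * e2.z - e1.z * e2.y,
         e1.z * e2.x - e1.x * e2.z,
         e1.x * e2.y - e1.y * e2.x};
    double len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    m_normal = {n.x / len, n.y / len, n.z / len};
    m_dirty  = false;
  }

  v3   m_a{}, m_b{}, m_c{};
  v3   m_centroid{}, m_normal{};
  bool m_dirty{true};
};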

Now, for my software, I compared the cached and non-cached versions (measured in seconds), running it many times:

Run      Non-cached (s)   Cached (s)
1        11.89            11.57
2        11.31            10.85
3        12.50            11.52
4        12.01            11.52
5        11.46            10.73
6        12.64            10.61
7        12.04            11.21
8        13.52            10.53
9        11.23            10.65
10       11.33            11.01
11       11.76            10.66
12       11.41            10.69
13       11.25            10.70
14       12.02            10.64
15       12.57            10.80
16       11.54            10.55
17       13.01            10.60
18       11.92            10.96
19       11.46            10.65
20       11.52            11.05
21       11.51            10.61
22       11.34            10.80
23       11.69            10.76
24       11.51            11.15
25       11.30            10.77
MEAN     11.83            10.86
SPEEDUP  -                1.089

This should be the performance improvement for a real-world scenario. I still have to run more checks between the same segments (for multiple pipelines), so the gain should increase further: the segments are stored as pointers, so their cached properties will be reused many times.

@StoccoDavide
Owner

I agree with you that point &vertex(integer i); can throw an acme error... Caching ray.toUnitVector() is not useful because the ray direction is already stored as a unit vector and this method just returns the reference, which is cheap in terms of computation time. On the other hand, as you noticed, the centroid and the normal of the triangle are relatively expensive to compute. For the same reason, in the enve project I inherited from the triangle and stored some cached data in the child class triangleground.

I don't know if, at this point, it would be better for you to implement extended (inherited) classes of the acme objects with the cached data that you need (like I did). That way we leave acme untouched, keeping it more general and less specialised for particular use cases like enve and your software. What do you think?
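
Something along these lines (a toy sketch with stand-in types; the real acme::triangle and enve's triangleground are of course richer):

struct vec3 { double x, y, z; };

// Stand-in for the untouched library class: it recomputes on every call.
class triangle_base {
public:
  triangle_base(vec3 const &a, vec3 const &b, vec3 const &c) : m_a(a), m_b(b), m_c(c) {}
  vec3 centroid() const {
    return {(m_a.x + m_b.x + m_c.x) / 3.0,
            (m_a.y + m_b.y + m_c.y) / 3.0,
            (m_a.z + m_b.z + m_c.z) / 3.0};
  }
protected:
  vec3 m_a, m_b, m_c;
};

// Application-side child class: the library stays generic, and the cache
// lives only where it is needed (as triangleground does in enve).
class triangle_cached : public triangle_base {
public:
  triangle_cached(vec3 const &a, vec3 const &b, vec3 const &c)
    : triangle_base(a, b, c), m_centroid(triangle_base::centroid()) {}

  // Shadows the expensive base computation with the stored value.
  vec3 const &centroid() const { return m_centroid; }

private:
  vec3 m_centroid;
};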

Cheers, Davide
