Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Globe picking is very slow #8481

Open
IanLilleyT opened this issue Dec 19, 2019 · 19 comments
Open

Globe picking is very slow #8481

IanLilleyT opened this issue Dec 19, 2019 · 19 comments

Comments

@IanLilleyT
Copy link
Contributor

IanLilleyT commented Dec 19, 2019

The current approach for globe picking is intersect the ray against all rendered tiles (using their bounding spheres) and sort the intersected tiles by closeness to ray. Then, for the closest tile do ray-triangle intersection tests against all of its triangles and return the closest. If there was none, test the next tile. And repeat.

This could be optimized some more. ScreenSpaceCameraController does one or two globe picks per frame and that alone can slow down Cesium severely with detailed terrain providers like ArcGIS.

Some ideas on how to improve this:

  • Custom intersection test for heightmap providers since we know their triangles are on a grid
  • Per-tile octree for non-heightmap providers (too slow to generate?)
  • Remove TerrainEncoding quantization for faster position decode from vertex buffer (only affects nearby tiles)
  • Cache recently used triangles in a form that makes future picks faster, maybe by pre-calculating some of the math from IntersectionTests.rayTriangleParametric.
  • Unroll picking math

Sandcastle (console prints how long each pick takes with left click). Select ArcGIS from the dropdown for some seriously slow speeds.

WIP branch

@OmarShehata
Copy link
Contributor

This actually came up twice this week on the forum:

@stuartmarsden
Copy link

I have the same issue trying to use ARCGIS terrain. Is there a way to limit the resolution to get acceptable speed as a trade off? I tried disabling the camera collision check and it did not make much difference.

@IanLilleyT
Copy link
Contributor Author

@stuartmarsden Unfortunately I don't think there's a way to limit the tile resolution, though adding a maxLevel parameter to TerrainProvider could be a useful feature. And there are a few different systems that do globe picking, so I'm not surprised you're still seeing slowness problems. A hacky way to turn off all collision checks is to return undefined inside GlobeSurfaceTile.prototype.pick, but I don't know how many systems that will break.

But there has been some progress on this issue. I've implemented an octree acceleration structure that does very fast ray-tile intersection tests and hooks in with all terrain providers seamlessly, but it has to generate the octree at tile load time which can cause serious delays (ArcGIS has about 130k triangles per tile and takes 100-300ms to process, whereas Cesium World Terrain has about 10k triangles per tile and 3-20ms to process). Maybe the system could be optimized some more, but I'm feeling like generating the octree at tile load time won't work for ArcGIS. It might not even work for CWT.

But there might be a solution that doesn't involve an octree. Since we know the ArcGIS data is a heightmap and thus well structured, we can find a line of triangles within the tile that have potential to be intersected. We can start with one triangle and search along its neighbors that have vertices on either side of the ray, stopping once the edge of the tile is reached. It feels like we could get away with testing even fewer triangles than that, but I haven't thought it through yet. I don't think this will be faster than the octree intersection, but it will be a lot better than testing every triangle.
Screen Shot 2020-06-02 at 11 32 12 AM

So ultimately there will be a different solution for CWT/quantized mesh and ArcGIS/heightmaps. CWT will probably include the octree structure in the tile payload and ArcGIS/heightmaps will use this heightmap-only algorithm. Not sure when all of this will be ready, but I'll keep working on it in the background.

@stuartmarsden
Copy link

It would seem that this is supposed to do what I want but it does not seem to make a difference.

https://cesium.com/docs/cesiumjs-ref-doc/TerrainProvider.html#.heightmapTerrainQuality

@IanLilleyT
Copy link
Contributor Author

Ah I forgot about that option. I was messing around with it and couldn't get it do much either. I think it's supposed to control which LODs get rendered, but since all ArcGIS tiles have the same number of triangles regardless of LOD, it will make no performance difference for picking.

@smills2929
Copy link

Is this something that is getting any attention? It definitely causes significant performance issues for us.

@IanLilleyT
Copy link
Contributor Author

IanLilleyT commented Jul 23, 2020

Sorry for the lack of updates. The strategy is still the same but I haven't had time to work on this (and not sure when I will). The main piece that's missing is the ray-heightmap intersection which would be similar to the algorithm described here: https://www.gamedev.net/forums/topic/672036-intersect-ray-with-heightmap/5254121/. We're dealing with a 2D grid on a curved surface (earth), so everything needs to be done with Cartographic math. If anyone would like to implement such an algorithm, even if it's not fully hooked into the cesium terrain system, that would be very helpful.

@stuartmarsden
Copy link

It would be good in the interim to have a simple way to reduce the resolution for heightmaps. I don't know where to look in the code. I assume it just makes a simple grid based on the resolution of the heightmap image. So should just be possible to only sample every set number of pixels or better if you could average. You will get worse terrain but at least it would be usable. Obviously expose the divisor number to let it be tweaked.

@IanLilleyT
Copy link
Contributor Author

The API gets in the way here. All terrain providers, heightmap or not, go through the public TerrainProvider interface. Any property we add to that has to be supported by non-heightmap terrain as well and that becomes a lot more difficult.

We could potentially get around this by adding some decimation property to HeightmapTerrainData's structure input, but then the user would have to manually change this property inside the ArcGISTiledElevationTerrainProvider.js source for it to work. If someone is using their own heightmap terrain provider, this wouldn't be a problem.

One other hacky solution that nearly worked was to change the stride value inside ArcGISTiledElevationTerrainProvider.js to something greater than 1. This effectively skips over pixels in the heightmap. However, the width/height of the heightmap 257, a prime number, so there is no possible stride to pick that doesn't completely butcher the data. I tried and got a black screen. If using a heightmap that's non-prime, this might work.

So I don't really see a great solution here. Adding a public decimation property to ArcGISTiledElevationTerrainProvider is the best solution out of them all, but it doesn't feel great to me. If anyone wants to dig into this some more, for a fork of cesium at the very least, also check out HeightmapTessellator.js for where the heightmap grid is created.

@DanielLeone
Copy link
Contributor

Hi @IanLilleyT 😄

I had a further crack at the octree approach, continuing from where you left off on the fastPicking branch.

The immediate bottleneck I could see, was just transferring the octree data structure across the worker threads. This caused both delays on the worker thread, and the main thread as it read the response. (main thread picture here)

image

I've spent the last few days (learning what heck an octree is) and packing the data structure into a few array buffers before we transfer if across threads. After getting that working, we get really good worker thread transfer speeds 👍 (both on the send and receive sides)

image

And picking time is still just incredible 👍 (hats off for getting the initial octree implementation working!)
image

As you've already mentioned, the bottleneck is now just creating the octree, as I've captured here: 👎
image

However, let's just keep in mind, we're talking about less than 10 frames worth of time here compared to the current Globe.pick() performance 😛 . So even though ~500ms of worker thread time looks bad on paper, we've made that time back ridiculously quickly from the main thread.

For some reference points, these are the pick() times I was getting with ArcGIS terrain and no octree

01:45:11.538 GlobeSurfaceTile.js:212 normal pick: 65.52734375ms
01:45:11.603 GlobeSurfaceTile.js:212 normal pick: 61.9560546875ms
01:45:11.672 GlobeSurfaceTile.js:212 normal pick: 66.257080078125ms
01:45:11.740 GlobeSurfaceTile.js:212 normal pick: 64.552001953125ms
01:45:11.807 GlobeSurfaceTile.js:212 normal pick: 63.507080078125ms
01:45:11.874 GlobeSurfaceTile.js:212 normal pick: 63.15087890625ms

And with the octree on ArcGIS


01:54:08.202 GlobeSurfaceTile.js:178 triangle pick: 0.092041015625ms
01:54:08.239 GlobeSurfaceTile.js:178 triangle pick: 0.090087890625ms
01:54:08.251 GlobeSurfaceTile.js:178 triangle pick: 0.09326171875ms
01:54:08.280 GlobeSurfaceTile.js:178 triangle pick: 0.09619140625ms
01:54:08.285 GlobeSurfaceTile.js:178 triangle pick: 0.10498046875ms
01:54:08.326 GlobeSurfaceTile.js:178 triangle pick: 0.1142578125ms

Compared to Cesium World Terrain without octree

01:32:14.332 GlobeSurfaceTile.js:212 normal pick: 2.52392578125ms
01:32:14.349 GlobeSurfaceTile.js:212 normal pick: 2.48193359375ms
01:32:14.366 GlobeSurfaceTile.js:212 normal pick: 2.51220703125ms
01:32:14.382 GlobeSurfaceTile.js:212 normal pick: 2.505859375ms
01:32:14.399 GlobeSurfaceTile.js:212 normal pick: 2.529296875ms
01:32:14.415 GlobeSurfaceTile.js:212 normal pick: 2.5009765625ms
01:32:14.532 

Cesium World Terrain with octree:

01:53:18.248 GlobeSurfaceTile.js:178 triangle pick: 0.06689453125ms
01:53:18.264 GlobeSurfaceTile.js:178 triangle pick: 0.0888671875ms
01:53:18.281 GlobeSurfaceTile.js:178 triangle pick: 0.066162109375ms
01:53:18.298 GlobeSurfaceTile.js:178 triangle pick: 0.0693359375ms
01:53:18.315 GlobeSurfaceTile.js:178 triangle pick: 0.067138671875ms
01:53:18.331 GlobeSurfaceTile.js:178 triangle pick: 0.06591796875ms
01:53:18.414 

This is quite the improvement for ArcGIS (and CWT) terrain picking performance, compared to the currently unusable setup for us; even with the slow octree generation.

What are your thoughts on this?

From my perspective, I'd be happy to spend a week buttoning up the code and getting this into master in some form.
It seems like a great improvement for both ArcGIS and Cesium World Terrain in terms of main thread performance given the amount of Globe picking Cesium does per frame.

Do you think we'd have a chance of merging something like this?

I think with some more work we can improve the octree generation time 👍

Here's the branch diff if you're interested; it really is just a first pass at proving we could flatten the octree into a typed array. Apart from the that I've hacked it all together :)

Thanks :)

@IanLilleyT
Copy link
Contributor Author

IanLilleyT commented Jul 29, 2020

Wow, thanks for taking this further! Flattening the octree works wonders! FYI for better runtime performance testing, go to the High Dynamic Range sandcastle, which I've hijacked for picking.

10 frames of latency for generating an octee for ArcGIS... still too much I think. It just feels wrong that it takes up almost the entire worker time. Also we don't want to slow down Cesium World Terrain by too much for a feature that won't affect the average user (unless they are doing a lot of picking calls). Even a few frames there can make Cesium feel less responsive. But I feel like there are several areas for optimization. Here's some ideas that come to mind, not sure which will actually work:

  • Add triangles to octree in world space instead of local space. This will avoid having to transform the three triangle vertices. There will need to be a corresponding change in the ray->node intersect code.
  • Reduce the number of allocations and pushes. Use a ManagedArray or something similar that doubles once it gets full.
  • Descend fewer levels in the tree. This would slow down runtime picking though.
  • Faster way to figure out which of the 8 children a triangle belongs to. Perhaps without needing to calculate the triangle's AABB.
  • Make a different version of addTriangles that is faster for heightmaps (and doesn't need triangleVerticesCallback.
  • Avoid calling getOverlap again on an overflow triangle.
  • Loop unrolling inside nodeAddTriangleToChildren?
  • WebAssembly?

There might even be ways of doing this that take a completely different approach than the one here. Open to ideas. I feel like it's possible to make the octree generation at least 4X faster.

@IanLilleyT
Copy link
Contributor Author

Also, feel free to clean up and make a PR with what you have or any new optimizations. It doesn't have to be final. That would at least help this get to the finish line.

@DanielLeone
Copy link
Contributor

Thanks for all the suggestions @IanLilleyT

Unfortunately I can't work on this during the day at the moment, so it's a very spare time thing right now. If yourself or someone else is keen to get this done; then feel free to continue from my additions or not :)

My current plan/work in progress is getting some integration/unit tests working for a CWT tile(s) and ArcGIS tile(s); before I continue optimising the code; and make sure we have no regressions.

I'll add my own two ideas to the optimisation list as well:

  • Make the recursive function iterative
  • Inline the internal functions (getOverlap, nodeAddTriangleToChildren)

I like your ideas above:

  • Add triangles to octree in world space instead of local space. This will avoid having to transform the three triangle vertices. There will need to be a corresponding change in the ray->node intersect code.
    • This is an interesting one, since to reduce the storage space of each node in the octree I've leaned on the fact it's in local space. Good idea though, I'll think about it some more.
  • Reduce the number of allocations and pushes. Use a ManagedArray or something similar that doubles once it gets full.
    • Yep, worth testing.
  • Descend fewer levels in the tree. This would slow down runtime picking though.
    • Yep, definitely need to play with those heuristics.
  • Faster way to figure out which of the 8 children a triangle belongs to. Perhaps without needing to calculate the triangle's AABB.
    • Yep; I'm not sure how to tackle that one though.
  • Make a different version of addTriangles that is faster for heightmaps (and doesn't need triangleVerticesCallback.
    • Do you mind clarifying how this might be possible?
  • Avoid calling getOverlap again on an overflow triangle.
    • Yep, I noticed this as well. Nice one.
  • Loop unrolling inside nodeAddTriangleToChildren?
    • Yep
  • WebAssembly?
    • I'll have to dig more into how the draco decoder is setup? I think that's the only current use of web assembly in Cesium right? I'm thinking once the addTriangles function is a little more optimized it might actually end up simpler that its current state, and would be a good candidate for rewriting in a more suitable language. Would at least be worth testing!

I'll let you know when I get more time to work on this again 👍

@IanLilleyT
Copy link
Contributor Author

IanLilleyT commented Aug 4, 2020

Those two optimization ideas are good. I agree with the plan to optimize the JS version first, and then convert to WebAssembly if necessary. And yeah, draco is the only one so far. So then it's a matter of deciding what programming language to use, etc, etc and may involve input from other team members. So that's more for the distant future, even though I think it could speed things up a decent amount.

Make a different version of addTriangles that is faster for heightmaps (and doesn't need triangleVerticesCallback.

I'd like to avoid calling triangleVerticesCallback for every triangle by instead passing a vertex grid directly to a new function called addTrianglesFromHeightmap. Currently triangleVerticesCallback inside HeightmapTessellator figures out the vertex index from the triangle index, but it's not very optimized.

I'm in the same situation as you, only able to work on this in my spare time. I want to get back to this in around 2-3 weeks, but even then I'm not positive. I'll give an update then.

@jony89
Copy link
Contributor

jony89 commented Feb 12, 2023

@DanielLeone Hey - is there any chance we are getting back to this? your work seems to have dramatic performance change which is necessary for many applications.

@JoshuaKahn
Copy link

Indeed, this is necessary to even begin to use ArcGIS terrain. Please say that this will be implemented in some fashion...

@DanielLeone
Copy link
Contributor

DanielLeone commented Mar 30, 2023

@jony89 @JoshuaKahn

The most production ready version of the code was here: #9961

but as per: #9961 (comment) it was a requirement that it worked with the Dynamic Terrain Exaggeration feature, which it does not.

I suggested some alternatives #9961 (comment)
In summary:

  • Ian suggested some pseudo-code to make it work with on-the-fly exaggeration; but it went over my head, and I selfishly don't have a use for terrain exaggeration so the time investment wasn't there. This can possibly still be done and everyone wins.
  • I suggested we make some Cesium API changes so the improved performance can be implemented in user-land, or we just make it an opt-in for those who don't need dynamic terrain exaggeration.
  • That was it, sorry. I'm guessing ArcGIS terrain performance is really low on the priority list 🤷
@JoshuaKahn
Copy link

@DanielLeone If I didn't care about using Dynamic Terrain Exaggeration - would it still be possible to use that code branch for regular work, or is it too unstable/in dev to be used like that?

@DanielLeone
Copy link
Contributor

@DanielLeone If I didn't care about using Dynamic Terrain Exaggeration - would it still be possible to use that code branch for regular work, or is it too unstable/in dev to be used like that?

You could try, but I wouldn't use it for anything important. It's fairly old now, potentially out of date. I didn't test it a great deal so there's definitely bugs somewhere. 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment