Optimization: Cache kernels and move creation to correct device #227
Labels
enhancement
Making some part of the codebase better without introduction of new features
feature
New feature or request
Is your feature request related to a problem? Please describe.
Most metrics use some kind of kernel to extract image features.
Currently, these kernels are re-created every time a metric is called. That's slow and redundant when the function is called hundreds of times.
Another performance problem is that temporary tensors are first created on the CPU and later moved to the GPU if needed (`.to(x.device)`).
Describe the solution you'd like
- Add a `kernel` param to the functional API. Create and store the kernel in the class API.
- Replace `.to(x.device)` and similar calls with explicit creation of tensors on the target device, like `torch.ones(N, N, device=x.device)`.
Additional context
For metrics that use more than one kernel, implementation details can be discussed.
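A rough sketch of how the two proposals could fit together, for the single-kernel case only. The names `gaussian_kernel`, `local_mean` and `LocalMean` are hypothetical and are not taken from the codebase; a Gaussian smoothing step stands in for whatever a real metric does with its kernel.

```python
from typing import Optional

import torch
import torch.nn.functional as F


def gaussian_kernel(size: int, sigma: float, device=None, dtype=None) -> torch.Tensor:
    """Hypothetical helper: build a normalized 2D Gaussian kernel directly on `device`."""
    # Created on the target device from the start, so no later .to(device) copy is needed.
    coords = torch.arange(size, device=device, dtype=dtype or torch.float32) - (size - 1) / 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    kernel = torch.outer(g, g)
    return kernel / kernel.sum()


def local_mean(x: torch.Tensor, kernel: Optional[torch.Tensor] = None,
               kernel_size: int = 11, sigma: float = 1.5) -> torch.Tensor:
    """Hypothetical functional API: per-channel smoothing of an (N, C, H, W) tensor.

    If `kernel` is given it is reused; otherwise it is built on x.device for this call.
    """
    if kernel is None:
        kernel = gaussian_kernel(kernel_size, sigma, device=x.device, dtype=x.dtype)
    channels = x.size(1)
    # .to(x) is a no-op when the kernel already matches x's device and dtype.
    weight = kernel.to(x).expand(channels, 1, -1, -1)
    return F.conv2d(x, weight, groups=channels)


class LocalMean(torch.nn.Module):
    """Hypothetical class API: the kernel is created once and stored as a buffer."""

    def __init__(self, kernel_size: int = 11, sigma: float = 1.5) -> None:
        super().__init__()
        # A buffer follows the module on .to(device) / .cuda() calls.
        self.register_buffer("kernel", gaussian_kernel(kernel_size, sigma))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return local_mean(x, kernel=self.kernel)
```

With this layout, calling `LocalMean()(x)` in a training loop reuses the stored kernel, while `local_mean(x)` without a `kernel` argument keeps the current behaviour but allocates on `x.device` directly.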