We are having a lot of discussion around LoRA/quantization and other PEFT strategies that require replacing layers (often dense layers) with parameter-efficient replacements.
In torch, an `nn.Module` will track submodules by name. So you can run `module.sub_module = new_sub_module` without issue. The old child will be booted out for the new. Same for `tf.Module`.
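For illustration, here is the torch behavior in a minimal sketch (toy layer sizes, but the reassignment semantics are the real `nn.Module` behavior):

```python
from torch import nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(16, 16)

block = Block()
# nn.Module registers submodules by attribute name, so plain reassignment
# replaces the old child; it drops out of parameters() and named_modules().
block.dense = nn.Linear(16, 8)
assert block.dense.out_features == 8
assert len(list(block.parameters())) == 2  # only the new Linear's weight + bias
```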
This is not the case with Keras layers. Tracking is currently "append only," not by attr name, and locked after `build()`. To work around this for a LoRA implementation, we currently do the following. We should consider whether we want to allow this for Keras layers with public APIs, potentially by extending the tracker to track by attr name.
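For context, the mismatch looks roughly like this (a sketch of the behavior described above; the exact error message and tracking internals vary by Keras version):

```python
import numpy as np
from keras import layers

class Block(layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.dense = layers.Dense(16)

    def call(self, x):
        return self.dense(x)

block = Block()
block(np.zeros((1, 16), dtype="float32"))  # building the layer locks the tracker

# Unlike torch / tf.Module, this is not a clean swap: tracking is append only
# (not keyed by attr name) and locked after build(), so the assignment is
# rejected or the old layer's state lingers, depending on version.
try:
    block.dense = layers.Dense(16)
except Exception as e:
    print(e)
```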
My thoughts are that since both torch and tf allow variable and submodule reassignment, we should probably do the same. Consistency here will cause fewer headaches. This could also come up for code like this...
```python
def build(self, input_shape):
    self.bias = self.add_weight(...)
    ...
    if some_complex_case:
        self.bias = None  # Never mind, no bias!
```
I am less sure about locking the tracker after `build()`. We could...
- Allow reassignment only.
- Not lock tracking at all.
- Force an explicit unlock, e.g. `layer.unlock()` (see the sketch after this list).
- Have no public API support for this. Only possible via `_tracker` shenanigans.
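For the explicit-unlock option, user code might look like the sketch below. This is purely hypothetical: `layer.unlock()` does not exist today, hence the `hasattr` guard that keeps the snippet runnable.

```python
import keras

layer = keras.layers.Dense(4)
layer.build((None, 4))  # builds kernel + bias

# Hypothetical option-3 API: explicitly unlock a built layer before mutating
# its tracked state, then reassign or remove variables/sublayers.
if hasattr(layer, "unlock"):
    layer.unlock()
    layer.bias = None  # e.g. decide after the fact that there is no bias
```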
This is tricky, because this kind of mutation after `fit`/`predict`/`evaluate` could land you in hot water. Optimizer state will be invalid. Compiled functions too.
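To make that concrete: after `fit()`, the optimizer has built per-variable state (e.g. Adam moments) against the model's current variable list, and the train step has been compiled against it. Any post-swap recovery would need at least a recompile (a sketch with toy data):

```python
import numpy as np
import keras

x = np.random.rand(32, 16).astype("float32")
y = np.random.rand(32, 1).astype("float32")

model = keras.Sequential([keras.layers.Dense(8), keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=1, verbose=0)  # optimizer builds slots for current variables

# If a sublayer were swapped here, the Adam moments and the traced train
# function would still point at the old variables. Recompiling rebuilds
# optimizer state and forces retracing:
model.compile(optimizer="adam", loss="mse")
```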