Model weights are not de-duplicated on save #18419
Comments
This seems like a bug, and is at the very least confusing. Unsure what the fix should be, particularly with compat concerns in mind.
Sharing is supported at the layer instance level: you may reuse a layer instance multiple times. However, you are not supposed to share a variable instance across multiple independent layer instances; each variable must be owned by exactly one layer. What happens if you try to share a layer instance instead?
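Concretely, the supported pattern looks like this (a minimal sketch; the layer and variable counts are illustrative):

```python
import keras

# One layer instance, reused in two places: both call sites share
# the same kernel and bias variables.
shared_dense = keras.layers.Dense(8, name="shared_dense")

inputs = keras.Input(shape=(8,))
x = shared_dense(inputs)   # first call builds the weights
outputs = shared_dense(x)  # second call reuses them
model = keras.Model(inputs, outputs)

print(len(model.weights))  # 2: one kernel, one bias, no duplicates
```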
Doing this at the layer level does look like it works in keras-core, though not fully in tf.keras. Sounds like overall this is "works as intended." I do worry that there are no guardrails for this, though. It's confusing that shared weights "just work" for training, with proper deduplication, and only start misbehaving on save. Can we just warn if we see duplicate weights on a save call? Is there ever a valid reason for it?
How would we detect duplicate weights?
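One possible approach, sketched below (hypothetical code, not an existing Keras API): walk the model's layers, collect each layer's variables, and flag any variable object that shows up under more than one layer.

```python
from collections import defaultdict

def find_shared_variables(model):
    """Map each variable shared by several layers to its owners."""
    owners = defaultdict(list)
    for layer in model.layers:
        for variable in layer.weights:
            owners[id(variable)].append(layer.name)
    return {
        variable_id: layer_names
        for variable_id, layer_names in owners.items()
        if len(layer_names) > 1
    }
```

A save-time warning could then fire whenever this returns a non-empty dict.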
I have some comments on this issue. There are cases where layers share only one weight while each layer has its own other weights. An example is the TransformerXL model, where each decoder layer has its own weights but the relative position bias can be shared between them.
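An illustrative sketch of that pattern, assuming Keras 3's `keras.Variable` and `keras.ops` (all names here are hypothetical): each decoder block owns its own projection weights but reads from one shared bias variable.

```python
import keras

class DecoderBlock(keras.layers.Layer):
    def __init__(self, units, shared_bias, **kwargs):
        super().__init__(**kwargs)
        self.proj = keras.layers.Dense(units)  # per-block weights
        self.shared_bias = shared_bias         # shared across blocks

    def call(self, x):
        return self.proj(x) + self.shared_bias

shared_bias = keras.Variable(
    keras.ops.zeros((8,)), name="relative_position_bias"
)
blocks = [DecoderBlock(8, shared_bias) for _ in range(2)]
```

This is exactly the variable-level sharing that `model.weights` dedupes but saving does not.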
Hi @mattdangerw. Considering those scenarios, have you encountered any impact on model performance by doing so?
This might be a related case.
This is a priori not possible by design in Keras 3 after the layer is built. See for instance this issue: keras-team/keras#18419 (comment), where it is advised to embed the layer whose weights we want to share. In our use case (reproducing a given model by splitting the activations into separate layers while keeping the weights, so as to stay synchronized with the original model), this is not a solution. We implemented this workaround, even though it uses a private method. A feature request has been opened on the Keras 3 repo: keras-team/keras#18821
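For reference, a minimal sketch of the advised "embed the layer" pattern (names hypothetical): the shared weights live in one layer instance that the wrapping layers merely call.

```python
import keras

class Head(keras.layers.Layer):
    def __init__(self, shared_embedding, **kwargs):
        super().__init__(**kwargs)
        # Tracked as a sublayer; the variables stay owned by the
        # single shared Embedding instance.
        self.shared_embedding = shared_embedding

    def call(self, x):
        return self.shared_embedding(x)

embedding = keras.layers.Embedding(100, 16)
head_a = Head(embedding)
head_b = Head(embedding)  # both heads reuse the same weights
```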
When a model shares a single variable across many layers, `model.weights` is deduped: only one weight will be returned no matter how many times it is referenced. However, when a model is saved, every layer that holds a reference to the variable will save its own copy. Weights are not deduped on save.
This can lead to a funny flow where things appear to be working, but the saved weights are actually duplicated on disk, and a single variable is assigned multiple times during load. This colab shows the issue ->
https://colab.research.google.com/gist/mattdangerw/2fabb09d53bd19cb6407d219e34155ab/weight-sharing-weirdness.ipynb
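A minimal sketch of the reported behavior, assuming Keras 3 APIs (`keras.Variable`, `keras.ops`) and illustrative names:

```python
import keras

class Reader(keras.layers.Layer):
    def __init__(self, shared, **kwargs):
        super().__init__(**kwargs)
        self.shared = shared  # the same variable is tracked by every Reader

    def call(self, x):
        return x + self.shared

shared = keras.Variable(keras.ops.zeros((4,)), name="shared")

inputs = keras.Input(shape=(4,))
outputs = Reader(shared)(Reader(shared)(inputs))
model = keras.Model(inputs, outputs)

print(len(model.weights))  # 1: model.weights is deduped
# Per the report, each Reader still writes its own copy to disk:
model.save_weights("shared.weights.h5")
```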