Multi-gpu Parameter Server Strategy new Optimizer #18624
Comments
Parameter server training is generally not supported in Keras 3. The feature had very low usage. If you need it, I recommend sticking with tf.keras and using the legacy optimizers.
For context on usage: I've seen it at a couple of companies for large recommender-system models, where embedding tables may be several GB. The largest I've worked with had hundreds of gigabytes in a single embedding table and relied on a variable partitioner to split the variable across PS servers. In that scenario, I think most other strategies that mirror variables are difficult to use. I mostly haven't seen PS used outside of very large embedding tables. If PS is deprecated and unsupported, it's reasonable to close the ticket, although it would be helpful to document that somewhere (unless I missed it).
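To illustrate that setup, here is a minimal sketch of how a variable partitioner splits a large variable across parameter servers. The shape and shard count are made up for illustration; in a real deployment the partitioner is passed to the strategy, which places each shard on a different PS task.

```python
import tensorflow as tf

# Hypothetical sketch: FixedShardsPartitioner splits a variable along its
# first axis into a fixed number of shards. Under ParameterServerStrategy,
# each shard is placed on a different parameter server.
partitioner = tf.distribute.experimental.partitioners.FixedShardsPartitioner(
    num_shards=4)

# The partitioner returns the number of partitions per axis for a given
# variable shape: here, 4 shards along axis 0 and no split along axis 1.
splits = partitioner(tf.TensorShape([1_000_000, 128]), tf.float32)
print(splits)

# In a real cluster this would be wired into the strategy, e.g.:
# strategy = tf.distribute.experimental.ParameterServerStrategy(
#     cluster_resolver, variable_partitioner=partitioner)
```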
@hmc-cs-mdrissi, Since
My main question: in TF 2.11, tf.keras.optimizers refers to the new/experimental optimizers. Even when I use tf.keras explicitly, those optimizers hit this bug when used with parameter server strategy. Will improvements to make tf.keras.optimizers support PS be accepted (conditional on the TF backend where needed)? This is one PS bug, but I've also found a couple more PS bugs when using the new optimizers.

As for tf.keras.optimizers.legacy: my understanding is that the legacy optimizers don't get additional features like weight decay. The other issue is that variable saving behavior with checkpoints was improved in a key way that fixes a bug specific to my usage: the new optimizers are autotrackable while the legacy optimizers are not, leading to different checkpoint behavior. Since the legacy and experimental optimizers each have different bugs: if I encounter optimizer bugs (legacy or new) that are often PS-related and TF-specific, which ones should be reported, and where?

edit: If tf.keras.optimizers.legacy still accepts bug fixes, that can work. I'm mainly unsure because the legacy bug is also one significant difference versus the tf.keras.optimizers.experimental optimizers (slot-variable checkpointing).

edit 2: Another aspect is that this bug is a "nicer" bug in that it crashes with a relevant error message. There is one tf.keras.optimizers.experimental.Optimizer bug with PS that silently produces incorrect gradient update steps, which can worsen model quality and was very hard to notice. At the moment that one can run, appear to work, and surprise the user. And if the user doesn't explicitly pick a class and just does model.compile(optimizer="adam"), it will automatically pick keras.optimizers.Adam / tf.keras.optimizers.experimental.Adam regardless of the strategy being used.
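The string-resolution behavior described above can be seen directly. A small sketch (the toy model is arbitrary; the point is only which class the `"adam"` string resolves to):

```python
import tensorflow as tf

# Arbitrary toy model; what matters is the optimizer resolution below.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# In TF 2.11+, the "adam" string resolves to the new Adam optimizer class
# (not tf.keras.optimizers.legacy.Adam), regardless of which distribution
# strategy is in use.
model.compile(optimizer="adam", loss="mse")
print(type(model.optimizer))
```

Passing an explicit `tf.keras.optimizers.legacy.Adam()` instance instead of the string is the only way to opt out of that resolution.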
Summary
The new optimizer API (the TF 2.11 default, tf.optimizers.experimental.Optimizer) does not support multi-GPU-per-worker parameter server training. The old optimizer API, tf.optimizers.legacy.Optimizer, does support this.
The issue is the lack of
aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA
on the iterations variable and the learning rate variable, which leads to a crash on this line. The former had its aggregation method explicitly specified in tf.optimizers.legacy.Optimizer, while the latter was not represented as a variable at all in the legacy Optimizer. I'm unsure how to fix this given the Variable changes for supporting multiple backends. Before those, the fix was small: add the aggregation argument to the lines that create the tf.Variable. Now the options I see are either,
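A minimal sketch of the kind of change described, assuming the fix still reduces to passing `aggregation` where the optimizer creates these variables. The variable names here are illustrative stand-ins, not the actual Keras internals:

```python
import tensorflow as tf

# Illustrative stand-ins for the optimizer's internal variables; the real
# fix would live in the optimizer's variable-creation code. With
# ONLY_FIRST_REPLICA, only the first replica's value is used where replicas
# would otherwise need to aggregate, matching how the legacy optimizer
# declared its iterations counter.
iterations = tf.Variable(
    0, dtype=tf.int64, trainable=False, name="iteration",
    aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA)
learning_rate = tf.Variable(
    1e-3, trainable=False, name="learning_rate",
    aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA)
```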