Cannot resume in offline mode due to lack of `sys/id` field #588

wjaskowski · 2021-05-28T11:51:33Z

import neptune.new as neptune
run = neptune.init(mode='offline')
run.sync()
run.wait()
rid = run['sys/id'].fetch()
run = neptune.init(mode='offline', run=rid)
rid = run['sys/id'].fetch()

ends up with:

offline/1b7c5e70-695d-4d1c-8587-a5ca2e3d222c
Traceback (most recent call last):
  File "err4.py", line 5, in <module>
    run.sync()
  File "/home/wojciech/miniconda3/envs/nori/lib/python3.8/site-packages/neptune/new/run.py", line 453, in sync
    attributes = self._backend.get_attributes(self._uuid)
  File "/home/wojciech/miniconda3/envs/nori/lib/python3.8/site-packages/neptune/new/internal/backends/offline_neptune_backend.py", line 42, in get_attributes
    raise NeptuneOfflineModeFetchException
neptune.new.exceptions.NeptuneOfflineModeFetchException: 

----NeptuneOfflineModeFetchException---------------------------------------------------

It seems you are trying to fetch data from the server, while working in an offline mode.
You need to work in non-offline connection mode to fetch data from the server.

The thing is that I am not trying to fetch data from the server, but from the run itself, wherever it stores its data.


Herudaio · 2021-06-07T12:52:40Z

(I've removed my previous comment)

@wjaskowski initially we didn't plan to enable resuming runs in offline mode. If I may ask, why do you need to resume an offline run? Are you working with a multiprocessing / multi-script setup, or is there a time gap between the execution of the script and its resumption?

wjaskowski · 2021-06-07T12:57:18Z

The truth is that I just wanted to use resuming in debug mode, which initially did not work for me, so I tried offline mode, which also failed.


Diagrama3 · 2022-10-04T06:59:45Z

Switching from spreadsheets to Neptune.ai and How it Pushed...

Diagrama3 · 2022-10-04T07:02:34Z

Switching from spreadsheets to Neptune.ai and How it Pushed...

Blaizzy · 2022-10-04T09:04:45Z

Hi @Diagrama3

How can I help you?

ljstrnadiii · 2022-12-12T16:31:41Z

@Blaizzy I would also like to be able to resume an init_project in debug mode for testing purposes. Can this be achieved?

Blaizzy · 2022-12-12T20:43:45Z

Hi @ljstrnadiii,

Thanks for reaching out.

Yes, it is.

Example:

import neptune.new as neptune
project = neptune.init_project(mode="debug")

Docs: https://docs.neptune.ai/api/neptune/#init_project

ljstrnadiii · 2022-12-12T21:26:45Z

@Blaizzy , I tried to stop and init_project again in a separate process, but the key was not present.

Blaizzy · 2022-12-13T11:26:01Z

@ljstrnadiii by key you mean api_token, right?

If so, you can read more about setting your api_token here:
https://docs.neptune.ai/setup/setting_api_token/

Blaizzy · 2022-12-17T09:01:34Z

Hey there!
Just checking in to see if you still need help with this or if you need help with anything else. Feel free to drop me a message. 😊

ljstrnadiii · 2022-12-18T15:35:58Z

@Blaizzy thanks for checking in. What I want to do is use debug mode in two separate processes:

# in one process
import neptune.new as neptune
project = neptune.init_project(mode="debug")
project['key1'] = 1
project.stop()

# then in another process (a test script)
import neptune.new as neptune
project = neptune.init_project(mode="debug")
# fetch() reads the stored value; indexing alone returns a handler object
assert project['key1'].fetch() == 1
project.stop()

but this is not possible from what I understand (even though it seems some files get written to tmp somewhere).

Blaizzy · 2022-12-19T12:54:58Z

In debug mode, no data is stored or sent anywhere.
Docs: https://docs.neptune.ai/api/connection_modes/

For the use case you want to test, currently, you have to log metadata to Neptune servers in async or sync mode.

But I can definitely see your point and I'll submit your comment as a feature request to the product team.
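
Until something like that exists, the cross-process check described above can be approximated in tests with a small file-backed stand-in that both processes point at. The sketch below is not part of the Neptune API: `FileBackedProject`, its JSON file layout, and its `fetch`/`stop` methods are hypothetical names chosen for illustration, and it only covers flat key/value assignment.

```python
import json
import os
import tempfile


class FileBackedProject:
    """Hypothetical test double that persists key/value pairs to a JSON file.

    Not a Neptune class; it only mimics item assignment and reading.
    """

    def __init__(self, path):
        self._path = path
        # Load values left behind by an earlier process, if any.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self._data = json.load(fh)
        else:
            self._data = {}

    def __setitem__(self, key, value):
        self._data[key] = value

    def fetch(self, key):
        return self._data[key]

    def stop(self):
        # Write to a temporary file first, then swap it in atomically.
        directory = os.path.dirname(os.path.abspath(self._path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self._data, fh)
        os.replace(tmp, self._path)
```

The first process would create `FileBackedProject(path)`, assign `project["key1"] = 1`, and call `stop()`; the test script would open the same path and assert on `project.fetch("key1")`. Nothing is written until `stop()` is called, so a process that crashes before that point leaves no data behind.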

Blaizzy · 2022-12-21T14:32:35Z

Hey @ljstrnadiii!

Just checking in to see if you still need help with this or if you need help with anything else. Feel free to drop me a message. 😊

ljstrnadiii · 2022-12-22T13:53:18Z

@Blaizzy that is what I thought. We test in debug mode and use a neptune run in debug mode as a fixture where we can and that works well, but for some e2e tests, we can only pass a reference to a neptune run or project location. We have created a tests project in neptune for our e2e tests to keep things isolated a bit.

Thanks for the clarification!

Blaizzy · 2022-12-27T13:41:47Z

It's my pleasure :)

You are most welcome @ljstrnadiii!

Your solution is quite interesting, and I would love to learn more about it if you don't mind. I think it could provide us with valuable insight that we can incorporate into the product.

Let me know what you think

bg4xsd · 2023-01-22T07:19:37Z

Resuming offline runs would be very useful. Many people train their models on commercial GPU servers, which often impose a maximum running time per session; for example, Kaggle's limit is 12 hours, so we have to split the training into several parts. Training is faster in offline mode, so offline mode is preferred. When the work is done, the offline training data is uploaded to the Neptune server.

My code:
run = neptune.init_run(mode="offline", custom_run_id='test-offline', ....)

Neptune generates several offline outputs in the .neptune directory. I then use the command:
neptune sync --path .neptune --project aaa/bbb --offline-only

The command executes fine, but only the last run is displayed on the website. It seems the last run overwrites the prior ones.

Blaizzy · 2023-01-23T09:43:41Z

Hi @bg4xsd

Thanks for reaching out and sharing your use case!

I have also passed it as feedback to the product team.

Regarding your code, I notice that you are using the custom_run_id argument in offline mode. Currently, offline runs have no sys/id; consequently, custom_run_id doesn't work.

Each time you run that script and then use the neptune sync CLI command, it will create a separate run.

But I can see your point; thanks to your feedback and others, we can now start thinking of a potential solution to this use case.
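
In the meantime, one possible workaround for time-limited sessions is to buffer metrics in a local file that survives between sessions and replay them into a single run once a connection is available. The sketch below is an illustration only, not a Neptune feature: `append_metric`, `replay_metrics`, and the JSON Lines layout are hypothetical, and the `log` callable stands in for whatever logging call you would use on an online run.

```python
import json
import os


def append_metric(buffer_path, name, step, value):
    """Append one metric record to a local JSON Lines file."""
    record = {"name": name, "step": step, "value": value}
    with open(buffer_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def replay_metrics(buffer_path, log):
    """Feed every buffered record to log(name, step, value); return the count."""
    if not os.path.exists(buffer_path):
        return 0
    count = 0
    with open(buffer_path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines left by partial writes
            record = json.loads(line)
            log(record["name"], record["step"], record["value"])
            count += 1
    return count
```

Each training session would call `append_metric` with the same `buffer_path`, so records from all sessions accumulate in order. The buffer file has to live on storage that persists across sessions for this to help.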

bg4xsd · 2023-01-23T10:16:21Z

Hi @Blaizzy ,
Thanks for your quick response.
For university students, GPU servers in the lab are always in short supply. Training a neural network is time-consuming, and the training process is often terminated by other students, so I think the ability to resume an offline run would be useful and popular :-)
Furthermore, TensorBoard's graphs and tables are ugly and low resolution, so they cannot be used directly in a thesis. Neptune's beautiful diagrams are welcome, and its export function is very easy to use.
Years ago, I had to draw, compare, and adjust graphs manually; this year I am going to move from TensorBoard to Neptune.
Keep it up, and have a nice day.

Blaizzy · 2023-01-23T12:53:58Z

Most welcome and thank you for your kind words!

I'm happy you enjoy using Neptune as much as we love making it for you :)

Blaizzy · 2023-01-23T12:54:30Z

I will let you know here once the feature is released.

Other than that, is there anything else I could help you with?

bg4xsd · 2023-01-23T12:57:33Z

Hi @Blaizzy

Hope to hear from you soon. For now, no more questions.

Anyway, thank you again.

Blaizzy · 2023-01-23T13:13:09Z

Perfect, have a great week! :)

wouterzwerink · 2023-06-02T08:13:14Z

Hi @Blaizzy ! Is this feature still on the radar? We train on cloud instances that somewhat frequently get interrupted. This prevents us from using offline mode, as we can not resume the same run in offline mode.

Blaizzy · 2023-06-02T13:37:58Z

Hi @wouterzwerink

This feature is on the radar. However, at the moment, we don't have an ETA for it.

Could you share the tracebacks for the times your training gets interrupted?

Blaizzy · 2023-06-05T13:00:39Z

Hi @wouterzwerink ,

Do you still need help with this?

bg4xsd · 2023-06-06T01:56:40Z

Offline resume is useful for offline logging. Using online mode slows down long training runs. On cloud GPU services such as Kaggle and Google Colab, training is interrupted every 10~12 hours, so an offline resume function would be meaningful.

Blaizzy · 2023-06-06T15:08:17Z

@bg4xsd

I understand.

Could you share the tracebacks for the times your training gets interrupted?

wouterzwerink · 2023-07-10T20:05:26Z

@Blaizzy I seem to have missed your question, sorry!
The training interruptions are not due to neptune at all!
The interruptions are from using spot instances. We train with fault tolerance, so the training continues after the interruption. However, to keep neptune fault tolerant, we have to use async mode instead of offline mode.
So I don't need help with this, but thanks for asking! Looking forward to this feature once it is complete
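
For anyone in the same situation, the async-mode approach depends on the restarted job knowing which run to continue. A minimal sketch of one way to do that is to persist an identifier next to the training checkpoints. The helper name `load_or_create_run_id` and the file location are hypothetical, and this is not taken from our actual training code.

```python
import os
import uuid


def load_or_create_run_id(path):
    """Return the identifier stored at `path`, creating one on first use."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fh:
            stored = fh.read().strip()
        if stored:
            return stored
    # First start (or empty file): generate and persist a new identifier.
    run_id = uuid.uuid4().hex
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(run_id)
    return run_id
```

The returned value could then be passed as `custom_run_id` to `neptune.init_run` in async mode. As noted earlier in this thread, `custom_run_id` does not work in offline mode, so this does not remove the need for the feature.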

Blaizzy · 2023-07-11T09:13:26Z

@wouterzwerink great to hear!

If anything pops up feel free to let me know. I'll be happy to help :)

pprobst · 2024-02-15T17:57:02Z

I am interested in this feature. It'd be very useful for multi-script programs.

wouterzwerink · 2024-02-15T23:25:51Z

Since it's been a while, I'll add that I'm still very interested in this feature.

kamil-kaczmarek assigned Herudaio Oct 20, 2021

Herudaio added the feature request label Feb 7, 2022

SiddhantSadangi added the api label Oct 10, 2023

SiddhantSadangi assigned parthpankajtiwary and unassigned Herudaio Feb 15, 2024

SiddhantSadangi assigned AurimasGr and unassigned parthpankajtiwary Aug 12, 2024

SiddhantSadangi assigned dzwiedziu and unassigned AurimasGr Oct 11, 2024
