geowatch.utils.process_context module
Defines the ProcessContext object, which is what mlops expects jobs to
be wrapped in.
Todo
[ ] Make “most” telemetry opt-in
- class geowatch.utils.process_context.ProcessContext(name=None, type='process', args=None, config=None, extra=None, track_emissions=False, request_all_telemetry=True, request_most_telemetry=True)
Bases: object
Context manager to track the context under which a result was computed.
This tracks things such as the start / end time, the command line that can reproduce the process (assuming an appropriate environment), the configuration the process was run with, the details of the machine the process was run on, the power usage / carbon emissions of the process, and other information.
- Parameters:
args (str | List[str]) – This should be the sys.argv or the command line string that can be used to rerun the process
config (Dict) – This should be a configuration dictionary (likely based on sys.argv)
name (str) – the name of this process
type (str) – The type of this process (usually keep the default of process)
request_all_telemetry (bool) – if False, all telemetry is disabled. This is forced to False if PROCESS_CONTEXT_DISABLE_ALL_TELEMETRY is in the environment.
request_most_telemetry (bool) – if False, most telemetry is disabled. This is forced to False if PROCESS_CONTEXT_DISABLE_MOST_TELEMETRY is in the environment.
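To make the idea of a tracked context concrete, here is a minimal sketch of a context manager that records a name, type, args, config, and start / stop timestamps. It is a hypothetical stand-in and not the geowatch implementation: the class name MiniProcessContext and the layout of the returned dictionary are assumptions made for illustration. It only mirrors the calling pattern used in the examples on this page, where start() returns the context and stop() returns the recorded object.

```python
import datetime
import sys


class MiniProcessContext:
    """Hypothetical stand-in; NOT geowatch's ProcessContext."""

    def __init__(self, name=None, type='process', args=None, config=None):
        # The dictionary layout here is an assumption for illustration.
        self.properties = {
            'name': name,
            'args': list(sys.argv) if args is None else args,
            'config': {} if config is None else config,
            'start_timestamp': None,
            'stop_timestamp': None,
        }
        self.obj = {'type': type, 'properties': self.properties}

    @staticmethod
    def _now():
        # Timezone-aware UTC timestamp in ISO 8601 format
        return datetime.datetime.now(datetime.timezone.utc).isoformat()

    def start(self):
        self.properties['start_timestamp'] = self._now()
        return self

    def stop(self):
        self.properties['stop_timestamp'] = self._now()
        return self.obj

    def __enter__(self):
        return self.start()

    def __exit__(self, exc_type, exc_value, traceback):
        self.stop()


# Explicit start / stop usage
ctx = MiniProcessContext(name='demo', args=['demo.py', '--x=1'], config={'x': 1})
record = ctx.start().stop()

# Equivalent usage as a context manager
with MiniProcessContext(name='demo') as ctx2:
    pass  # the wrapped work would run here
```

Recording args and config next to the timestamps is what lets a downstream tool associate a result with the settings that produced it.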
Note
This module provides telemetry, which records user-identifiable information. While useful, it does raise ethical concerns about user privacy, and the people running this code have a right to know about it and opt out. In the future we will change our policy to opt-in, but for system stability, we are not changing defaults.
Note
There are two levels of telemetry.
Environment telemetry. These are things like the machine the code was run on. Use PROCESS_CONTEXT_DISABLE_MOST_TELEMETRY=0 to opt out.
The start / stop / sys.argv / config objects are necessary for mlops to do anything, but they can leak information because they may contain system paths. Emissions tracking is also in this category. Use PROCESS_CONTEXT_DISABLE_ALL_TELEMETRY to opt out.
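The interaction between the two environment variables and the two constructor flags can be sketched as follows. This is not the geowatch implementation: resolve_telemetry_flags is a hypothetical helper, and it assumes (a) that the mere presence of a variable in the environment forces the matching flag to False, whatever its value, and (b) that disabling all telemetry also disables most telemetry.

```python
import os


def resolve_telemetry_flags(request_all=True, request_most=True, environ=None):
    """
    Hypothetical helper (not part of geowatch).

    Returns the effective (request_all, request_most) pair after the
    environment overrides are applied.
    """
    if environ is None:
        environ = os.environ
    # Assumption: presence of the variable is enough, its value is ignored.
    if 'PROCESS_CONTEXT_DISABLE_ALL_TELEMETRY' in environ:
        request_all = False
    if 'PROCESS_CONTEXT_DISABLE_MOST_TELEMETRY' in environ:
        request_most = False
    # Assumption: "all" is a superset of "most".
    if not request_all:
        request_most = False
    return request_all, request_most


# A user opting out of environment telemetry only
flags = resolve_telemetry_flags(
    environ={'PROCESS_CONTEXT_DISABLE_MOST_TELEMETRY': '0'})
```

Passing environ explicitly keeps the helper easy to test without mutating the real process environment.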
CommandLine
xdoctest -m geowatch.utils.process_context ProcessContext
Example
>>> from geowatch.utils.process_context import *
>>> import ubelt as ub
>>> import torch
>>> import rich
>>> device = torch.device(0) if torch.cuda.is_available() else torch.device('cpu')
>>> # Adding things like disk info and tracking emission usage
>>> self = ProcessContext(track_emissions='offline')
>>> obj1 = self.start().stop()
>>> self.add_disk_info('.')
>>> self.add_device_info(device)
>>> #
>>> # Telemetry can be mostly disabled
>>> self = ProcessContext(track_emissions='offline', request_most_telemetry=False)
>>> obj2 = self.start().stop()
>>> self.add_disk_info('.')
>>> self.add_device_info(device)
>>> # Telemetry can be completely disabled
>>> self = ProcessContext(track_emissions='offline', request_all_telemetry=False)
>>> obj3 = self.start().stop()
>>> self.add_disk_info('.')
>>> self.add_device_info(device)
>>> rich.print('full_telemetry = {}'.format(ub.urepr(obj1, nl=3)))
>>> rich.print('some_telemetry = {}'.format(ub.urepr(obj2, nl=3)))
>>> rich.print('no_telemetry = {}'.format(ub.urepr(obj3, nl=3)))
Example
>>> from geowatch.utils.process_context import *
>>> # flush can measure intermediate progress
>>> self = ProcessContext(track_emissions='offline')
>>> self.add_disk_info('.')
>>> obj1 = self.start().flush()
>>> obj1_orig = obj1.copy()
>>> obj2 = self.stop()
- write_invocation(invocation_fpath)
Write a helper file that contains a locally reproducible invocation of this process.
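As an illustration of what such a helper file might contain, the sketch below writes a small shell script that re-runs a given argument list. The function write_invocation_sketch, its argv parameter, and the script layout are hypothetical; the format geowatch actually writes is not documented here and may differ.

```python
import shlex
import sys
from pathlib import Path


def write_invocation_sketch(invocation_fpath, argv=None):
    """
    Hypothetical helper (not the geowatch method): write a shell script
    that reproduces a command line.
    """
    if argv is None:
        # Assumption: re-run the current script with the same interpreter
        argv = [sys.executable] + list(sys.argv)
    # shlex.join quotes each argument so spaces and special characters
    # survive a round trip through the shell.
    command = shlex.join(argv)
    fpath = Path(invocation_fpath)
    fpath.write_text('#!/bin/bash\n' + command + '\n')
    return fpath
```

For example, write_invocation_sketch('invocation.sh', ['python', '-m', 'mymod', '--name', 'a b']) would write a script whose command line is python -m mymod --name 'a b', where mymod is a placeholder module name.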