geowatch.mlops.manager module
This is the CLI for expt_state.
It synchronizes DVC states across machines.
This is a new Phase 2 variant of this script.
Example
    export DVC_DATA_DPATH=$(geowatch_dvc --tags="phase3_data")
    export DVC_EXPT_DPATH=$(geowatch_dvc --tags="phase3_expt")
    cd $DVC_EXPT_DPATH

    python -m geowatch.mlops.manager "status" --dataset_codes "Aligned-Drop4-2022-08-08-TA1-S2-WV-PD-ACC"

    python -m geowatch.mlops.manager "status" --dataset_codes "Drop6"
    python -m geowatch.mlops.manager "push packages" --dataset_codes "Drop6"
    python -m geowatch.mlops.manager "push packages" --dataset_codes "Drop6-MeanYear10GSD"

    python -m geowatch.mlops.manager "pull packages" --dataset_codes "Aligned-Drop4-2022-08-08-TA1-S2-WV-PD-ACC"

    python -m geowatch.mlops.manager "push packages"
    python -m geowatch.mlops.manager "status packages"

    python -m geowatch.mlops.manager "status" --dataset_codes Drop4-SC

    python -m geowatch.mlops.manager "list" --dataset_codes Drop4-BAS
    python -m geowatch.mlops.manager "list" --dataset_codes Aligned-Drop4-2022-08-08-TA1-S2-WV-PD-ACC
    python -m geowatch.mlops.manager "list" --dataset_codes Drop6 Drop4-BAS
    python -m geowatch.mlops.manager "list" --dataset_codes Drop6-MeanYear10GSD
    python -m geowatch.mlops.manager "list" --dataset_codes Drop6 Drop6-MeanYear10GSD-V2
    python -m geowatch.mlops.manager "list" --dataset_codes Drop6 Drop6-MedianSummer10GSD
    python -m geowatch.mlops.manager "push packages" --dataset_codes Drop6-MeanYear10GSD --yes
    python -m geowatch.mlops.manager "push packages" --dataset_codes Drop6-MeanYear10GSD-V2 --yes
    python -m geowatch.mlops.manager "push packages" --dataset_codes Drop6-MedianSummer10GSD --yes
    python -m geowatch.mlops.manager "push packages" --dataset_codes Drop6-NoWinterMedian10GSD --yes
    python -m geowatch.mlops.manager "push packages" --dataset_codes Drop7-MedianNoWinter10GSD --yes
    python -m geowatch.mlops.manager "push packages" --dataset_codes Drop7-MedianNoWinter10GSD-NoMask --yes
    python -m geowatch.mlops.manager "push packages" --dataset_codes Drop7-Cropped2GSD --yes
    python -m geowatch.mlops.manager "push packages" --dataset_codes Drop7-Cropped2GSD-V2 --yes

    HACK_SAVE_ANYWAY=1 python -m geowatch.mlops.manager "push packages" --dataset_codes Drop7-Cropped2GSD --yes

    python -m geowatch.mlops.manager "list packages" --dataset_codes Drop7-Cropped2GSD

    python -m geowatch.mlops.manager "status" --dataset_codes Drop6-MeanYear10GSD-V2
    python -m geowatch.mlops.manager "pull packages" --dataset_codes Drop6-MeanYear10GSD --yes
    python -m geowatch.mlops.manager "pull packages" --dataset_codes Drop6-MeanYear10GSD-V2 --yes
    python -m geowatch.mlops.manager "pull packages" --dataset_codes Drop6-MedianSummer10GSD --yes
    python -m geowatch.mlops.manager "pull packages" --dataset_codes Drop6-NoWinterMedian10GSD --yes
    python -m geowatch.mlops.manager "pull packages" --dataset_codes Drop7-MedianNoWinter10GSD --yes
    python -m geowatch.mlops.manager "pull packages" --dataset_codes Drop7-MedianNoWinter10GSD-NoMask --yes

    python -m geowatch.mlops.manager "pull packages" --dataset_codes Drop6-MeanYear10GSD-V2 --yes

    python -m geowatch.mlops.manager "list packages" --dataset_codes Drop7-MedianNoWinter10GSD
    python -m geowatch.mlops.manager "list packages" --dataset_codes Drop7-MedianNoWinter10GSD-NoMask
    # On training machine
    python -m geowatch.mlops.manager "push packages" --dataset_codes Drop6
    python -m geowatch.mlops.manager "push packages" --dataset_codes "Aligned-Drop4-2022-08-08-TA1-S2-WV-PD-ACC"

    # On testing machine
    python -m geowatch.mlops.manager "pull packages" --dataset_codes Drop6
    python -m geowatch.mlops.manager "pull packages" --dataset_codes Drop7-MedianNoWinter10GSD-NoMask Drop7-MedianNoWinter10GSD --yes
    python -m geowatch.mlops.manager "list packages" --dataset_codes Drop7-MedianNoWinter10GSD-NoMask Drop7-MedianNoWinter10GSD Drop7-MedianNoWinter10GSD-iMERIT --yes
    python -m geowatch.mlops.manager "status"

    # Run evals on testing machine
    python -m geowatch.mlops.manager "evaluate" --dataset_codes "Aligned-Drop4-2022-08-08-TA1-S2-WV-PD-ACC"

    # On testing machine
    python -m geowatch.mlops.manager "push evals"

    # On analysis machine
    python -m geowatch.mlops.manager "pull evals"
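The training, testing, and analysis hand-off above can also be scripted. The following is a minimal sketch that only builds and prints the command lines; it assumes nothing beyond the subcommands and flags shown in the examples. `build_manager_command` is a hypothetical helper for illustration and is not part of geowatch.

```python
import shlex


def build_manager_command(command, dataset_codes=None, yes=False):
    """
    Build the argv for ``python -m geowatch.mlops.manager``.

    Hypothetical helper: it only uses the flags that appear in the
    examples above (``--dataset_codes`` and ``--yes``).
    """
    argv = ['python', '-m', 'geowatch.mlops.manager', command]
    if dataset_codes:
        argv += ['--dataset_codes', *dataset_codes]
    if yes:
        argv.append('--yes')
    return argv


# Commands grouped by the machine they would run on, in workflow order.
workflow = {
    'training': [
        build_manager_command('push packages', ['Drop6']),
    ],
    'testing': [
        build_manager_command('pull packages', ['Drop6']),
        build_manager_command(
            'evaluate', ['Aligned-Drop4-2022-08-08-TA1-S2-WV-PD-ACC']),
        build_manager_command('push evals'),
    ],
    'analysis': [
        build_manager_command('pull evals'),
    ],
}

for machine, commands in workflow.items():
    for argv in commands:
        # Printing keeps the sketch side-effect free. To execute a command,
        # pass ``argv`` to ``subprocess.run(argv, check=True)`` instead.
        print(f'[{machine}] {shlex.join(argv)}')
```

Building the argument list separately from running it makes it easy to review the exact commands before anything touches the DVC remote.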
Todo
Make the Experiment Evaluation Reporter more robust and generalize it to more problems.
It should quickly show the best models for various metrics, and it should be easy for the user to inspect them further. For example, say the best model of interest was:
    MODEL_OF_INTEREST="Drop4_BAS_Retrain_V002_epoch=45-step=23552"
    MODEL_OF_INTEREST="Drop4_BAS_Continue_15GSD_BGR_V004_epoch=78-step=323584"

    # TODO:
    # There is a problem with multiple .pt suffixes, just don't use any

    # You should be able to pull things with respect to that model
    python -m geowatch.mlops.manager "pull packages" --model_pattern="${MODEL_OF_INTEREST}*"
    python -m geowatch.mlops.manager "pull evals" --model_pattern="${MODEL_OF_INTEREST}*"
    python -m geowatch.mlops.manager "status" --model_pattern="${MODEL_OF_INTEREST}*"

    python -m geowatch.mlops.manager "status" --dataset_codes=Drop6
    python -m geowatch.mlops.manager "add packages" --dataset_codes=Drop6
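The `--model_pattern="${MODEL_OF_INTEREST}*"` usage relies on wildcard matching against model names. A minimal sketch of that kind of selection follows. It assumes shell-style glob semantics as implemented by Python's `fnmatch` module; how the manager actually matches patterns is not shown on this page. The middle candidate name is hypothetical.

```python
import fnmatch

model_of_interest = 'Drop4_BAS_Retrain_V002_epoch=45-step=23552'

# Candidate package file names. The first and last stems come from the
# text above; the middle entry is hypothetical and only for illustration.
candidates = [
    'Drop4_BAS_Retrain_V002_epoch=45-step=23552.pt',
    'Drop4_BAS_Retrain_V002_epoch=46-step=24064.pt',
    'Drop4_BAS_Continue_15GSD_BGR_V004_epoch=78-step=323584.pt',
]

# Appending "*" lets the pattern match any suffix, such as ".pt".
pattern = model_of_interest + '*'

# fnmatchcase is used so that matching is case-sensitive on every platform.
selected = [name for name in candidates if fnmatch.fnmatchcase(name, pattern)]
print(selected)
```

Under these assumptions, only the checkpoint whose name starts with the model of interest is selected, which is why the trailing `*` matters when the stored file carries a suffix.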
- class geowatch.mlops.manager.ManagerConfig(*args, **kwargs)
Bases: DataConfig
Manage trained models in the GeoWATCH experiment DVC repo.
Certain parts of these names have special nomenclature to make them easier to work with in Python and Bash.
Valid options: []
- Parameters:
*args – positional arguments for this data config
**kwargs – keyword arguments for this data config
- default = {'command': <Value(None)>, 'dataset_codes': <Value('*')>, 'dvc_remote': <Value('aws')>, 'expt_dvc_dpath': <Value('auto')>, 'expt_pattern': <Value('*')>, 'model_pattern': <Value('*')>, 'yes': <Value(False)>}
- main(**kwargs)
- class geowatch.mlops.manager.DVCExptManager(expt_dvc_dpath, dvc_remote='aws', dataset_codes='*', model_pattern='*', expt_pattern='*')
Bases: NiceRepr
Implements an API around our DVC structure, which can be described as follows.
Todo
[ ] If we can somehow generate the output paths based on the pipeline, then we will be in a very good position.
Notes
- <expt_dvc_dpath>
- A breakdown of the packages dir is:
packages/<expt_name>/<model_name.pt>
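To make the `packages/<expt_name>/<model_name.pt>` layout concrete, here is a minimal sketch that splits such a relative path into its parts. `split_package_path` is a hypothetical helper, not a geowatch function, and the experiment name in the usage line is chosen only for illustration.

```python
from pathlib import PurePosixPath


def split_package_path(rel_path):
    """
    Split ``packages/<expt_name>/<model_name.pt>`` into named parts.

    Hypothetical helper that assumes the path is relative to the
    experiment DVC directory and follows the layout described above.
    """
    parts = PurePosixPath(rel_path).parts
    if len(parts) != 3 or parts[0] != 'packages':
        raise ValueError(f'not a package path: {rel_path!r}')
    _, expt_name, model_fname = parts
    return {'expt_name': expt_name, 'model_fname': model_fname}


info = split_package_path(
    'packages/Drop4_BAS_Retrain_V002/'
    'Drop4_BAS_Retrain_V002_epoch=45-step=23552.pt')
print(info)
```

Rejecting paths that do not have exactly three components keeps malformed entries from being silently misread as experiments.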
Example
    >>> # xdoctest: +REQUIRES(env:DVC_EXPT_DPATH)
    >>> from geowatch.mlops.manager import *  # NOQA
    >>> import geowatch
    >>> expt_dvc_dpath = geowatch.find_dvc_dpath(tags='phase2_expt')
    >>> dataset_codes = ['Drop4-BAS']
    >>> manager = DVCExptManager(expt_dvc_dpath=expt_dvc_dpath, dataset_codes=dataset_codes)
    >>> manager.list()
    >>> manager.summarize()

    self = manager.stats[0]
    self.list()
    util_pandas.pandas_truncate_items(self.staging_table(), paths=0, max_length=32)[0]
- add_packages(yes=None)
TODO: break this up into smaller components.
- class geowatch.mlops.manager.ExperimentState(expt_dvc_dpath, dataset_code='*', dvc_remote=None, data_dvc_dpath=None, model_pattern='*', expt_pattern='*', storage_dpath=None)
Bases: NiceRepr
- VERSIONED_COLUMNS = ['type', 'has_dvc', 'has_raw', 'needs_pull', 'is_link', 'is_broken', 'unprotected', 'needs_push', 'raw', 'dvc', 'dataset_code']
- STAGING_COLUMNS = ['ckpt_exists', 'is_packaged', 'is_copied', 'needs_package', 'needs_copy']
- staging_rows()
Staging items are the result of non-deterministic processes like training. They are not versioned or recomputable. They are the things in the training directory that need to be repackaged or copied into the versioned folder.
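The names in `STAGING_COLUMNS` hint at how a checkpoint moves from the training directory into the versioned folder. The sketch below shows one plausible reading, in which `needs_package` and `needs_copy` are derived from the three state flags. The derivation rules and the `staging_actions` helper are assumptions made for illustration; they are not taken from the implementation of `staging_rows`.

```python
def staging_actions(ckpt_exists, is_packaged, is_copied):
    """
    Derive the two "needs" flags from the three state flags.

    ASSUMPTION: these rules are a guess based on the column names alone,
    not the logic used by ExperimentState.staging_rows.
    """
    # A checkpoint that has not been repackaged yet still needs packaging.
    needs_package = ckpt_exists and not is_packaged
    # A package that is not yet in the versioned folder still needs a copy.
    needs_copy = is_packaged and not is_copied
    return {'needs_package': needs_package, 'needs_copy': needs_copy}


# A fresh checkpoint straight out of training.
print(staging_actions(ckpt_exists=True, is_packaged=False, is_copied=False))
# A packaged model that has not been copied into the DVC repo yet.
print(staging_actions(ckpt_exists=True, is_packaged=True, is_copied=False))
```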
- versioned_rows(with_attrs=1, types=None, notypes=None)
Versioned items are things that are tracked with DVC. These are packages and evaluation measures.
- versioned_table(**kw)
Get a list of dictionaries with information for each known evaluation.
Information includes its real path if it exists, its dvc path if it exists and what sort of actions need to be done to synchronize it.
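As a rough illustration of how synchronization actions could follow from the path information, the sketch below derives `needs_pull` and `needs_push` from `has_raw` and `has_dvc`. It assumes `has_raw` means the real file exists locally and `has_dvc` means its `.dvc` sidecar file exists. Both the interpretation and the `sync_actions` helper are hypothetical and may differ from what `versioned_rows` computes.

```python
def sync_actions(has_raw, has_dvc):
    """
    Decide which synchronization step a versioned item needs.

    ASSUMPTION: ``has_raw`` means the real file is present locally and
    ``has_dvc`` means its ``.dvc`` sidecar file is present.
    """
    return {
        # Tracked by DVC but the data is missing locally: pull it.
        'needs_pull': has_dvc and not has_raw,
        # Data exists locally but is not tracked yet: add it, then push it.
        'needs_push': has_raw and not has_dvc,
    }


# Hypothetical rows covering the four combinations of the two flags.
rows = [
    {'has_raw': True, 'has_dvc': True},
    {'has_raw': False, 'has_dvc': True},
    {'has_raw': True, 'has_dvc': False},
    {'has_raw': False, 'has_dvc': False},
]
for row in rows:
    row.update(sync_actions(row['has_raw'], row['has_dvc']))
    print(row)
```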
- gather_packages(yes=None)
This does what repackage used to do. Repackages checkpoints as torch packages, copies them to the DVC repo, and then adds them to DVC.
- add_packages(yes=None)
This does what repackage used to do. Repackages checkpoints as torch packages, copies them to the DVC repo, and then adds them to DVC.
- push_packages(yes=None)
This does what repackage used to do. Repackages checkpoints as torch packages, copies them to the DVC repo, and then adds them to DVC.
    >>> # xdoctest: +REQUIRES(env:DVC_EXPT_DPATH)
    >>> from geowatch.mlops.manager import *  # NOQA
    >>> import geowatch
    >>> expt_dvc_dpath = geowatch.find_dvc_dpath(tags='phase2_expt')
    >>> data_dvc_dpath = geowatch.find_dvc_dpath(tags='phase2_data')
    >>> dataset_code = 'Aligned-Drop4-2022-08-08-TA1-S2-L8-ACC'
    >>> # Pass data_dvc_dpath by keyword: the third positional parameter is dvc_remote.
    >>> self = ExperimentState(expt_dvc_dpath, dataset_code, data_dvc_dpath=data_dvc_dpath)
    >>> self.summarize()
- geowatch.mlops.manager.checkpoint_filepath_info(fname)
Finds information encoded in the checkpoint/model file path.
Todo
We need to ensure this info is encoded inside the file header as well!
CommandLine
xdoctest -m geowatch.mlops.manager checkpoint_filepath_info
Example
    >>> from geowatch.mlops.manager import *  # NOQA
    >>> fnames = [
    >>>     'epoch1_step10.foo',
    >>>     'epoch=2-step=10.foo',
    >>>     'epoch=3-step=10-v2.foo',
    >>>     'epoch=4-step=10',
    >>>     'epoch=5-step=10-v2',
    >>>     'junkepoch=6-step=10.foo',
    >>>     'junk/epoch=7-step=10-v2.foo',
    >>>     'junk-epoch=8-step=10',
    >>>     'junk_epoch=9-step=10-v2',
    >>>     'epoch10_val_loss.925.ckpt.ckpt',
    >>>     'epoch11_val_loss1.925.ckpt',
    >>>     'epoch=12_val_loss=1.925.ckpt',
    >>>     'epoch=25-val_loss=1.995.ckpt',
    >>> ]
    >>> for fname in fnames:
    >>>     info = checkpoint_filepath_info(fname)
    >>>     print(f'info={info}')
    info={'epoch': 1, 'step': 10, 'ckpt_ver': 'v0'}
    info={'epoch': 2, 'step': 10, 'ckpt_ver': 'v0'}
    info={'epoch': 3, 'step': 10, 'ckpt_ver': 'v2'}
    info={'epoch': 4, 'step': 10, 'ckpt_ver': 'v0'}
    info={'epoch': 5, 'step': 10, 'ckpt_ver': 'v2'}
    info={'epoch': 6, 'step': 10, 'ckpt_ver': 'v0'}
    info={'epoch': 7, 'step': 10, 'ckpt_ver': 'v2'}
    info={'epoch': 8, 'step': 10, 'ckpt_ver': 'v0'}
    info={'epoch': 9, 'step': 10, 'ckpt_ver': 'v2'}
    info={'epoch': 10, 'val_loss': 0.925, 'ckpt_ver': 'v0', 'step': None}
    info={'epoch': 11, 'val_loss': 1.925, 'ckpt_ver': 'v0', 'step': None}
    info={'epoch': 12, 'val_loss': 1.925, 'ckpt_ver': 'v0', 'step': None}
    info={'epoch': 25, 'val_loss': 1.995, 'ckpt_ver': 'v0', 'step': None}
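For readers who want to see how such information can be recovered from a file name, the following is a simplified sketch that handles only the `epoch=<int>-step=<int>[-v<int>]` form from the example above. `parse_ckpt_name` is a hypothetical stand-in, not the implementation of `checkpoint_filepath_info`, and it deliberately returns `None` for the other naming styles that the real function accepts.

```python
import re

# Matches names such as "junk/epoch=7-step=10-v2.foo".
# The trailing "-v<int>" version suffix is optional.
CKPT_PATTERN = re.compile(
    r'epoch=(?P<epoch>\d+)-step=(?P<step>\d+)(?:-(?P<ckpt_ver>v\d+))?')


def parse_ckpt_name(fname):
    """
    Parse ``epoch=<int>-step=<int>[-v<int>]`` out of a checkpoint name.

    Hypothetical, simplified stand-in for checkpoint_filepath_info.
    Returns None when the name does not use this form.
    """
    match = CKPT_PATTERN.search(fname)
    if match is None:
        return None
    return {
        'epoch': int(match.group('epoch')),
        'step': int(match.group('step')),
        # Names without an explicit version suffix are reported as "v0",
        # mirroring the outputs in the example above.
        'ckpt_ver': match.group('ckpt_ver') or 'v0',
    }


for fname in ['epoch=3-step=10-v2.foo', 'junkepoch=6-step=10.foo',
              'epoch1_step10.foo']:
    print(fname, '->', parse_ckpt_name(fname))
```

Using `search` rather than `match` lets the pattern skip any directory or prefix in front of `epoch=`, which is what allows names like `junk/epoch=7-step=10-v2.foo` to parse.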