Using the MLOps tool to run GeoWATCH

NOTE: THE PATH FORMAT IS IN HIGH FLUX. DOCS MAY BE OUTDATED

MLOps is a wrapper around the cmd_queue library that provides a single entrypoint for all steps of the GeoWATCH TA-2 pipeline, starting with a model checkpoint and the program data:

  1. Predict BAS on low-res data (S2/L8)

  2. Pixel evaluation metrics on BAS

  3. Convert to polygons in IARPA site model geojson format

  4. Score on IARPA metrics [using a heuristic for SC phases]

  5. Visualize BAS

  6. Create cropped SC dataset from BAS predictions with high-res data (S2/WV/PD)

  7. Predict SC on high-res data

  8. Pixel evaluation metrics on SC

  9. Convert to polygons in IARPA site model geojson format (again)

  10. Score on IARPA metrics (again)

  11. Visualize SC

You can invoke any one of these things through python -m geowatch... without MLOps, but it handles all the plumbing of feeding jobs’ input/output to each other and spawning them to use all your available compute resources.

using DVC and geowatch_dvc

(Note: geowatch_dvc will be replaced by sdvc from the simple_dvc package.)

geowatch_dvc is an alias for python -m geowatch.cli.find_dvc. MLOps uses it to manage paths to your DVC repos.

To get started: geowatch_dvc list

Do you see a data and expt repo with exists=True? If not,

a) you have an existing local DVC checkout, but it’s not listed

geowatch_dvc add --name=my_data_repo --path=path/to/smart_data_dvc --hardware=hdd --priority=100 --tags=phase2_data
geowatch_dvc add --name=my_expt_repo --path=path/to/smart_expt_dvc --hardware=hdd --priority=100 --tags=phase2_expt

b) you don’t have an existing local DVC checkout

clone the source repos: https://gitlab.kitware.com/smart/smart_data_dvc https://gitlab.kitware.com/smart/smart_expt_dvc

and put them in the recommended places.

these can also be overridden by environment variables; do so at your own risk

if this worked, you can:

cd $(geowatch_dvc --tags="phase2_expt")

cd doesn’t work when the terminal is too narrow for 1 line, because it’ll prettyprint with linebreaks

using mlops to checkout a model

geowatch mlops is an alias for python -m geowatch.cli.mlops_cli or python -m geowatch.mlops.expt_manager. The entrypoint in code is watch/mlops/expt_manager.py.

Let’s choose a model and do stuff with it!

geowatch mlops --help

MODEL="Drop4_BAS_Retrain_V002"

geowatch mlops "pull packages" --model_pattern="${MODEL}*"
geowatch mlops "pull evals" --model_pattern="${MODEL}*"
geowatch mlops "status"
python -m geowatch.mlops.expt_manager "list" --model_pattern="${MODEL_OF_INTEREST}*"  # for more info
python -m geowatch.mlops.expt_manager "report" --model_pattern="${MODEL_OF_INTEREST}*"  # for more info

The “volatile” table shows model predictions, which are not versioned for space reasons, but are an intermediate product of running pixel and polygon evaluations. The “versioned” table shows models and evaluations which should now exist in your local DVC repo.

the “versioned” table in more detail:

  • type - “pkg_fpath” (model) or “eval_*”

  • has_raw - the file exists

  • has_dvc - the .dvc file exists

  • needs_pull - the .dvc file is not backed by data on this machine

  • needs_push - the .dvc file is not backed by data on the remote

TODO does “staging” still exist?

models on disk live in $(geowatch_dvc --tags="phase2_expt")/models/fusion/${DATASET}/packages.

predictions on disk live in $(geowatch_dvc --tags="phase2_expt")/models/fusion/${DATASET}/pred.

evals on disk live in $(geowatch_dvc --tags="phase2_expt")/models/fusion/${DATASET}/eval.

using mlops to run an evaluation

geowatch mlops “evaluate” should work as an alias but doesn’t right now

if you haven’t used this TEST_DATASET yet, remember to unzip its splits.zip first

DATASET_CODE=Aligned-Drop4-2022-08-08-TA1-S2-L8-ACC
DATA_DVC_DPATH=$(geowatch_dvc --tags="phase2_data")
DVC_EXPT_DPATH=$(geowatch_dvc --tags="phase2_expt")
TEST_DATASET=$DATA_DVC_DPATH/$DATASET_CODE/data.kwcoco.json
python -m geowatch.mlops.schedule_evaluation \
    --params="
        matrix:
            trk.pxl.model:
                - $DVC_EXPT_DPATH/models/fusion/Aligned-Drop4-2022-08-08-TA1-S2-L8-ACC/packages/Drop4_BAS_Retrain_V002/Drop4_BAS_Retrain_V002_epoch=31-step=16384.pt.pt
            trk.pxl.data.test_dataset:
                - $TEST_DATASET
            trk.pxl.data.window_scale_space: 15GSD
            trk.pxl.data.time_sampling:
                - "contiguous"
            trk.pxl.data.input_scale_space:
                - "15GSD"
            trk.poly.thresh:
                - 0.07
                - 0.1
                - 0.12
                - 0.175
            crop.src:
                - /home/joncrall/remote/toothbrush/data/dvc-repos/smart_data_dvc/online_v1/kwcoco_for_sc_fielded.json
            crop.regions:
                - trk.poly.output
            act.pxl.data.test_dataset:
                - /home/joncrall/remote/toothbrush/data/dvc-repos/smart_expt_dvc/models/fusion/Aligned-Drop4-2022-08-08-TA1-S2-L8-ACC/crop/online_v1_kwcoco_for_sc_fielded/trk_poly_id_0408400f/crop_f64d5b9a/crop_id_59ed6e1b/crop.kwcoco.json
            act.pxl.data.input_scale_space:
                - 3GSD
            act.pxl.data.time_steps:
                - 3
            act.pxl.data.chip_overlap:
                - 0.35
            act.poly.thresh:
                - 0.01
            act.poly.use_viterbi:
                - 0
            act.pxl.model:
                - $DVC_EXPT_DPATH/models/fusion/Aligned-Drop4-2022-08-08-TA1-S2-WV-PD-ACC/packages/Drop4_SC_RGB_scratch_V002/Drop4_SC_RGB_scratch_V002_epoch=99-step=50300-v1.pt.pt
        include:
            - act.pxl.data.chip_dims: 256,256
              act.pxl.data.window_scale_space: 3GSD
              act.pxl.data.input_scale_space: 3GSD
              act.pxl.data.output_scale_space: 3GSD
    " \
    --enable_pred_trk_pxl=1 \
    --enable_pred_trk_poly=1 \
    --enable_eval_trk_pxl=1 \
    --enable_eval_trk_poly=1 \
    --enable_crop=0 \
    --enable_pred_act_pxl=0 \
    --enable_pred_act_poly=0 \
    --enable_eval_act_pxl=0 \
    --enable_eval_act_poly=0 \
    --enable_viz_pred_trk_poly=1 \
    --enable_viz_pred_act_poly=0 \
    --enable_links=1 \
    --devices="0,1" --queue_size=2 \
    --queue_name='secondary-eval' \
    --backend=serial --skip_existing=0 \
    --run=0

the enable pred, eval, crop, and viz flags are the various stages of the GeoWATCH TA2 system. They are all enabled by default. Any params for a step with --enable_*=0 will be ignored, and duplicate params will be grid-searched over.

The params are namespaced for convenience

  • trk.pxl - BAS predict and pixel evaluate

  • trk.poly - BAS tracking and IARPA poly evaluate

  • crop - create SC dataset

  • act.pxl - SC predict and pixel evaluate

  • act.poly - SC tracking and IARPA poly evaluate

First, dry-run with run=0, and if it looks ok, set run=1. –backend=tmux is also highly recommended instead of serial, it makes the running queue much easier to debug (backend=”slurm” is also available if your machine supports it). --virtualenv_cmd="conda activate watch" or something to that effect is also needed if your bashrc does not automatically drop you into the watch virtualenv.

debugging/examining a running mlops invocation

This assumes you’re using –backend=tmux.

check on a running queue with

tmux a
<C-b> s # switch between sessions
<C-b> d # quit back to main window

You’ll see each session start with the command source /long/path/_cmd_queue_schedule/${QUEUE_NAME}_etc/etc/etc.sh. The paths will also be printed to the main window for convenience, along with the commands to kill any leftover tmux sessions or drop into them. These sourced scripts define what is being run in each tmux session (“job”). In serial mode, they will all run in the main window.

When each job finishes, the result will be cached and its dependencies will start running.

What does an mlops invocation run?

You can dig through the sourced scripts to piece together what a full GeoWATCH TA2 run looks like, minus the boilerplate and error handling. (There are plenty of examples in the mlops source code, but here’s how to tell exactly what you were running.) For example, the pipeline above turns into:

# prediction (omitted)
# python -m geowatch.tasks.fusion.predict ...

# tracking
ROOT="/home/local/KHQ/matthew.bernstein/data/dvc-repos/smart_expt_dvc-ssd/models/fusion/Aligned-Drop4-2022-08-08-TA1-S2-L8-ACC/pred/trk/Drop4_BAS_Retrain_V002_epoch=31-step=16384.pt.pt/Aligned-Drop4-2022-08-08-TA1-S2-L8-ACC_data.kwcoco"
PXL_EXPT="${ROOT}/trk_pxl_b788335d"
TRK_EXPT="trk_poly_ca4372e1"
python -m geowatch.cli.run_tracker \
        "${TRK_ROOT}/pred.kwcoco.json" \
        --default_track_fn saliency_heatmaps \
        --track_kwargs '{"thresh": 0.12, "moving_window_size": null, "polygon_fn": "heatmaps_to_polys"}' \
        --clear_annots \
        --out_sites_dir "${TRK_ROOT}/${TRK_EXPT}/sites" \
        --out_site_summaries_dir "${TRK_ROOT}/${TRK_EXPT}/site-summaries" \
        --out_sites_fpath "${TRK_ROOT}/${TRK_EXPT}/site_tracks_manifest.json" \
        --out_site_summaries_fpath "${TRK_ROOT}/${TRK_EXPT}/site_summary_tracks_manifest.json" \
        --out_kwcoco "${TRK_ROOT}/${TRK_EXPT}/tracks.kwcoco.json"

# pxl eval
python -m geowatch.tasks.fusion.evaluate \
        --true_dataset=/home/local/KHQ/matthew.bernstein/data/dvc-repos/smart_data_dvc-ssd/Aligned-Drop4-2022-08-08-TA1-S2-L8-ACC/data.kwcoco.json \
        --pred_dataset="${TRK_ROOT}/pred.kwcoco.json" \
        --eval_dpath="${TRK_ROOT}"
        --score_space=video \
        --draw_curves=True \
        --draw_heatmaps=True \
        --viz_thresh=0.2 \
        --workers=2

# iarpa poly eval
python -m geowatch.cli.run_metrics_framework \
        --merge=True \
        --name "Drop4_BAS_Retrain_V002_epoch=31-step=16384.pt.pt-trk_pxl_b788335d-${TRK_EXPT}" \
        --true_site_dpath "/home/local/KHQ/matthew.bernstein/data/dvc-repos/smart_data_dvc-ssd/annotations/site_models" \
        --true_region_dpath "/home/local/KHQ/matthew.bernstein/data/dvc-repos/smart_data_dvc-ssd/annotations/region_models" \
        --pred_sites "${TRK_ROOT}/${TRK_EXPT}/site_tracks_manifest.json" \
        --tmp_dir "${TRK_ROOT}/${TRK_EXPT}/_tmp" \
        --out_dir "${TRK_ROOT}/${TRK_EXPT}" \
        --merge_fpath "${TRK_ROOT}/${TRK_EXPT}/merged/summary2.json" \

# viz
geowatch visualize \
        "${TRK_ROOT}/${TRK_EXPT}/tracks.kwcoco.json" \
        --channels="red|green|blue,salient" \
        --stack=only \
        --workers=avail/2 \
        --workers=avail/2 \
        --extra_header="\ntrk_pxl_b788335d-${TRK_EXPT}" \
        --animate=True