SMART Activity Characterization Tutorial

This document is a tutorial on Activity Characterization (AC) in the context of the SMART project. It covers:

  1. Accessing AC data via DVC.

  2. Modifying the data / predicting features on the data.

  3. Training a model with geowatch.tasks.fusion.

  4. Evaluating a model with geowatch.mlops.

1. Access AC Data

Due to an issue that I don’t fully understand, the pre-clustered and pre-cropped Drop7 AC training data is in a different DVC repo.

Navigate to where you would like to store it and grab the DVC repo.

git clone git@gitlab.kitware.com:smart/smart_drop7.git

To ensure the commands in this tutorial are runnable, be sure to register this new repo with geowatch using the “drop7_data” tag. (Important: this assumes you have not changed directories after running git clone; make sure the path in the following command is set correctly. Also, change the hardware or name params to your liking; the only thing that matters is that the tag is exactly “drop7_data” and the path is correct.)

geowatch_dvc add drop7_data_ssd --path="$(pwd)/smart_drop7" --tags drop7_data --hardware ssd
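
You can sanity check the registration by listing the registered repos (assuming a recent geowatch version; the list subcommand prints each registered path and its tags):

geowatch_dvc list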

Now that you have that set up, pull the data:

AC_DATA_DVC_DPATH=$(geowatch_dvc --tags drop7_data)
# Make sure this prints the expected path to the repo, otherwise the rest of
# the tutorial will not work.
echo "AC_DATA_DVC_DPATH=$AC_DATA_DVC_DPATH"

# Navigate to the DVC repo
cd $AC_DATA_DVC_DPATH

# Run DVC pull on Drop7-Cropped2GSD to grab the cropped raw bands.
# (in the future I may add precomputed team features here)
dvc pull -r aws -R Drop7-Cropped2GSD
# Or, if you have a faster local remote configured, pull from it instead, e.g.:
# dvc pull -r toothbrush_ssd -R Drop7-Cropped2GSD
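
After the pull completes, each region should have its own subdirectory containing a kwcoco file (i.e. Drop7-Cropped2GSD/<REGION_ID>/<REGION_ID>.kwcoco.zip). A quick check:

# Verify that per-region folders and kwcoco files arrived
ls Drop7-Cropped2GSD
ls Drop7-Cropped2GSD/KR_R001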

2. Modify AC Data

The raw-bands AC data is training-ready as-is, but you may want to compute team features on it, or update the annotations in some way.

The following is a loose (untested) way of accomplishing this. Using prepare_teamfeats requires that your feature is registered with it (which hopefully it is).

AC_DATA_DVC_DPATH=$(geowatch_dvc --tags drop7_data)

export CUDA_VISIBLE_DEVICES="0,1"
DVC_EXPT_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware='auto')
BUNDLE_DPATH=$AC_DATA_DVC_DPATH/Drop7-Cropped2GSD
python -m geowatch.cli.prepare_teamfeats \
    --base_fpath "$BUNDLE_DPATH"/*/imganns-*[0-9].kwcoco.zip \
    --expt_dvc_dpath="$DVC_EXPT_DPATH" \
    --with_landcover=1 \
    --with_invariants2=1 \
    --with_sam=1 \
    --with_materials=0 \
    --with_depth=0 \
    --with_cold=0 \
    --skip_existing=1 \
    --assets_dname=teamfeats \
    --gres=0, --tmux_workers=1 --backend=tmux --run=0
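
Note that --run=0 only builds and prints the command queue so you can inspect it; set --run=1 once you are happy with the printed commands.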

Alternatively, we can write a bash script that loops over regions and submits jobs to cmd-queue, which can then be inspected before being executed. You can get pretty fancy here.

TODO: show example of actually doing a feature computation here.
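
In the meantime, here is a loose, untested sketch of what a single-region feature job might look like. Everything here is a hypothetical placeholder: FEATURE_MODEL_FPATH stands in for whatever model produces your feature, and geowatch.tasks.fusion.predict is used as a generic predictor (your task may have its own predict entrypoint with different flags).

# Hypothetical example (untested): predict features for one region
REGION_ID=KR_R001
DST_BUNDLE_DPATH=$AC_DATA_DVC_DPATH/Drop7-Cropped2GSD   # same destination bundle as in the loop below
FEATURE_MODEL_FPATH=/path/to/your/feature_model.pt      # hypothetical model path
python -m geowatch.tasks.fusion.predict \
    --package_fpath "$FEATURE_MODEL_FPATH" \
    --test_dataset "$DST_BUNDLE_DPATH/$REGION_ID/$REGION_ID.kwcoco.zip" \
    --pred_dataset "$DST_BUNDLE_DPATH/$REGION_ID/pred-$REGION_ID.kwcoco.zip"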

REGION_IDS=(KR_R001 KR_R002 AE_R001 PE_R001 US_R007 BH_R001 BR_R001 BR_R002 BR_R004 BR_R005 CH_R001 LT_R001 NZ_R001 US_C010 US_C011 US_C012 US_C016 US_R001 US_R004 US_R005 US_R006)

# Grab the regular DVC repo to get access to the truth
TRUTH_DVC_DPATH=$(geowatch_dvc --tags='phase2_data' --hardware='auto')

# Destination bundle for the modified per-region kwcoco files (adjust as needed)
DST_BUNDLE_DPATH=$AC_DATA_DVC_DPATH/Drop7-Cropped2GSD

# Create a new queue
python -m cmd_queue new "modify_ac_queue"

for REGION_ID in "${REGION_IDS[@]}"; do

    python -m cmd_queue submit --jobname="feature-$REGION_ID" -- modify_ac_queue \
        ... THE COMMAND TO COMPUTE YOUR FEATURE ...

    python -m cmd_queue submit --jobname="reproject-$REGION_ID" --depends="feature-$REGION_ID" -- modify_ac_queue \
        geowatch reproject_annotations \
            --src "$DST_BUNDLE_DPATH/$REGION_ID/$REGION_ID.kwcoco.zip" \
            --dst "$DST_BUNDLE_DPATH/$REGION_ID/imgannots-$REGION_ID.kwcoco.zip" \
            --io_workers="avail/2" \
            --region_models="$TRUTH_DVC_DPATH/annotations/drop6_hard_v1/region_models/${REGION_ID}.geojson" \
            --site_models="$TRUTH_DVC_DPATH/annotations/drop6_hard_v1/site_models/${REGION_ID}_*.geojson"

done

# Show the generated script
python -m cmd_queue show "modify_ac_queue"

# Execute the generated script
python -m cmd_queue run --workers=8 "modify_ac_queue"

Lastly, after you update the per-region kwcoco files, you will need to write new kwcoco train/validation splits that reference the updated files (the splits that already exist in the repo only reference the raw bands).

# TODO:
# * Modify the suffix depending on the team feats
# * Modify the base fpath to be correct.
python -m geowatch.cli.prepare_splits \
    --base_fpath "$AC_DATA_DVC_DPATHVC_DATA_DPATH"/Drop7-Cropped2GSD/*/imgannots-*.kwcoco.zip \
    --dst_dpath "$AC_DATA_DVC_DPATH"/Drop7-Cropped2GSD \
    --suffix=rawbands --run=1 --workers=2
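
If that worked, the new split files should exist at the top level of the bundle (assuming --suffix=rawbands; substitute your own suffix). A quick sanity check:

kwcoco stats "$AC_DATA_DVC_DPATH"/Drop7-Cropped2GSD/data_train_rawbands_split6.kwcoco.zip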

Note: see ../../scripts/prepare_drop7.sh for details on how this dataset was initially computed.

3. Train an AC Model

The following is a training run that I recently ran. I have no idea whether its params are good, but it provides an example of how to train an AC model.

Be sure to grab a pretrained model to start from:

DVC_EXPT_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware='auto')
python -m geowatch.utils.simple_dvc request \
    "$DVC_EXPT_DPATH"/models/fusion/Drop7-Cropped2GSD/packages/Drop7-Cropped2GSD_SC_bgrn_split6_V08/Drop7-Cropped2GSD_SC_bgrn_split6_V08_epoch336_step28982.pt
export CUDA_VISIBLE_DEVICES=1
DVC_DATA_DPATH=$(geowatch_dvc --tags='drop7_data' --hardware='auto')
DVC_EXPT_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware='auto')
echo "DVC_EXPT_DPATH = $DVC_EXPT_DPATH"
WORKDIR=$DVC_EXPT_DPATH/training/$HOSTNAME/$USER
DATASET_CODE=Drop7-Cropped2GSD
KWCOCO_BUNDLE_DPATH=$DVC_DATA_DPATH/$DATASET_CODE
TRAIN_FPATH=$KWCOCO_BUNDLE_DPATH/data_train_rawbands_split6.kwcoco.zip
VALI_FPATH=$KWCOCO_BUNDLE_DPATH/data_vali_rawbands_split6.kwcoco.zip
CHANNELS="(L8,S2):(blue|green|red|nir),(WV):(blue|green|red),(WV,WV1):pan"
EXPERIMENT_NAME=Drop7-Cropped2GSD_SC_bgrn_split6_V11
DEFAULT_ROOT_DIR=$WORKDIR/$DATASET_CODE/runs/$EXPERIMENT_NAME
TARGET_LR=1e-4
WEIGHT_DECAY=$(python -c "print($TARGET_LR * 0.01)")
echo "WEIGHT_DECAY = $WEIGHT_DECAY"
MAX_STEPS=80000
WATCH_GRID_WORKERS=0 python -m geowatch.tasks.fusion fit --config "
data:
    # select_videos is left unset; pass a kwcoco video query here to train on a subset
    num_workers            : 5
    train_dataset          : $TRAIN_FPATH
    vali_dataset           : $VALI_FPATH
    window_dims            : '224,224'
    time_steps             : 9
    time_sampling          : soft4
    time_kernel            : '(-1.08y,-1y,-0.25y,-0.08y,0.0y,0.08y,0.25y,1y,1.08y)'
    window_resolution      : 2.0GSD
    input_resolution       : 2.0GSD
    output_resolution      : 2.0GSD
    neg_to_pos_ratio       : 1.0
    batch_size             : 2
    normalize_perframe     : false
    normalize_peritem      : 'blue|green|red|nir|pan'
    max_epoch_length       : 1000000
    channels               : '$CHANNELS'
    min_spacetime_weight   : 0.6
    temporal_dropout       : 0.5
    mask_low_quality       : False
    mask_samecolor_method  : None
    observable_threshold   : 0.1
    quality_threshold      : 0.0
    weight_dilate          : 10
    use_centered_positives : True
    use_grid_positives     : False
    use_grid_negatives     : False
    normalize_inputs       : 1024
    balance_areas          : True
model:
    class_path: MultimodalTransformer
    init_args:
        #saliency_weights      : '1:1'
        #class_weights         : auto
        tokenizer              : linconv
        arch_name              : smt_it_stm_p16
        decoder                : mlp
        positive_change_weight : 1
        negative_change_weight : 0.01
        stream_channels        : 16
        class_loss             : 'dicefocal'
        saliency_loss          : 'focal'
        saliency_head_hidden   : 6
        change_head_hidden     : 6
        class_head_hidden      : 6
        global_change_weight   : 0.00
        global_class_weight    : 1.00
        global_saliency_weight : 0.00001
        multimodal_reduce      : learned_linear
optimizer:
    class_path: torch.optim.AdamW
    init_args:
        lr           : $TARGET_LR
        weight_decay : $WEIGHT_DECAY
        betas:
            - 0.85
            - 0.998
lr_scheduler:
  class_path: torch.optim.lr_scheduler.OneCycleLR
  init_args:
    max_lr: $TARGET_LR
    total_steps: $MAX_STEPS
    anneal_strategy: cos
    pct_start: 0.3
    div_factor: 10
    final_div_factor: 10000
    cycle_momentum: false
trainer:
    accumulate_grad_batches: 48
    default_root_dir     : $DEFAULT_ROOT_DIR
    accelerator          : gpu
    devices              : 0,
    limit_val_batches    : 256
    limit_train_batches  : 2048
    num_sanity_val_steps : 0
    max_epochs           : 560
    callbacks:
        - class_path: pytorch_lightning.callbacks.ModelCheckpoint
          init_args:
              monitor: val_loss
              mode: min
              save_top_k: 5
              filename: '{epoch}-{step}-{val_loss:.3f}'  # note: lightning appends .ckpt itself
              save_last: true

torch_globals:
    float32_matmul_precision: auto

initializer:
    init: $DVC_EXPT_DPATH/models/fusion/Drop7-Cropped2GSD/packages/Drop7-Cropped2GSD_SC_bgrn_split6_V08/Drop7-Cropped2GSD_SC_bgrn_split6_V08_epoch336_step28982.pt
"

4. Evaluate an AC Model with MLOps

The following code runs an AC-only mlops evaluation, using the ground-truth polygons as a proxy for the polygons that would come out of BAS. This provides a consistent way to compare models, but a full BAS+SV+AC evaluation is still needed for final scoring (TODO: add this).

The following command only runs over KR_R001, KR_R002, and CH_R001; add more regions as necessary.

This also references three existing baseline SC models (one enabled and two commented out in the grid below), which you will need to pull from the DVC expt repo to compare your model against. Put the path to your packaged model in the grid and adjust parameters as desired.
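
If you trained your own model in the previous step, its checkpoints need to be repackaged into standalone .pt files before they can be referenced here. A hedged sketch using the mlops manager (the exact subcommand phrasing may differ between geowatch versions, so verify against --help):

# Sketch: package recent checkpoints into .pt files (verify subcommand on your version)
python -m geowatch.mlops.manager "repackage checkpoints" --dataset_codes Drop7-Cropped2GSD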

python -m geowatch.mlops.manager "list" --dataset_codes Drop7-Cropped2GSD

HIRES_DVC_DATA_DPATH=$(geowatch_dvc --tags='drop7_data' --hardware=auto)
TRUTH_DVC_DATA_DPATH=$(geowatch_dvc --tags='phase2_data' --hardware=auto)
DVC_EXPT_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware=auto)

kwcoco stats \
    $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R001/KR_R001.kwcoco.zip \
    $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R002/KR_R002.kwcoco.zip \
    $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/CH_R001/CH_R001.kwcoco.zip

geowatch stats $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R001/KR_R001.kwcoco.zip
geowatch stats $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R002/KR_R002.kwcoco.zip
geowatch stats $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/CH_R001/CH_R001.kwcoco.zip
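
These stats invocations are just sanity checks that the evaluation kwcoco files exist and contain the expected images, sensors, and annotations before launching the full pipeline.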

python -m geowatch.mlops.schedule_evaluation --params="
    matrix:
        ########################
        ## AC/SC PIXEL PARAMS ##
        ########################

        sc_pxl.test_dataset:
          - $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R001/KR_R001.kwcoco.zip
          - $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R002/KR_R002.kwcoco.zip
          - $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/CH_R001/CH_R001.kwcoco.zip

        sc_pxl.package_fpath:
            - $DVC_EXPT_DPATH/models/fusion/Drop4-SC/packages/Drop4_tune_V30_8GSD_V3/Drop4_tune_V30_8GSD_V3_epoch=2-step=17334.pt.pt
            #- $DVC_EXPT_DPATH/models/fusion/Drop7-Cropped2GSD/packages/Drop7-Cropped2GSD_SC_bgrn_split6_V07/Drop7-Cropped2GSD_SC_bgrn_split6_V07_epoch73_step6364.pt
            #- $DVC_EXPT_DPATH/models/fusion/Drop7-Cropped2GSD/packages/Drop7-Cropped2GSD_SC_bgrn_split6_V11/Drop7-Cropped2GSD_SC_bgrn_split6_V11_epoch444_step19135.pt

        sc_pxl.tta_fliprot: 0.0
        sc_pxl.tta_time: 0.0
        sc_pxl.chip_overlap: 0.3
        #sc_pxl.input_space_scale: 2GSD
        #sc_pxl.window_space_scale: 2GSD
        #sc_pxl.output_space_scale: 2GSD
        #sc_pxl.time_span: 6m
        #sc_pxl.time_sampling: auto
        #sc_pxl.time_steps: 12
        #sc_pxl.chip_dims: auto
        sc_pxl.set_cover_algo: null
        sc_pxl.resample_invalid_frames: 3
        sc_pxl.observable_threshold: 0.0
        sc_pxl.mask_low_quality: true
        sc_pxl.drop_unused_frames: true
        sc_pxl.num_workers: 12
        sc_pxl.batch_size: 1
        sc_pxl.write_workers: 0

        ########################
        ## AC/SC POLY PARAMS  ##
        ########################

        sc_poly.thresh: 0.07
        sc_poly.boundaries_as: polys
        #sc_poly.resolution: 2GSD
        sc_poly.min_area_square_meters: 7200

        #############################
        ## AC/SC POLY EVAL PARAMS  ##
        #############################

        sc_poly_eval.true_site_dpath: $TRUTH_DVC_DATA_DPATH/annotations/drop6/site_models
        sc_poly_eval.true_region_dpath: $TRUTH_DVC_DATA_DPATH/annotations/drop6/region_models

        ##################################
        ## HIGH LEVEL PIPELINE CONTROLS ##
        ##################################
        sc_pxl.enabled: 1
        sc_pxl_eval.enabled: 1
        sc_poly.enabled: 1
        sc_poly_eval.enabled: 1
        sc_poly_viz.enabled: 0

    submatrices:
        - sc_pxl.test_dataset: $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R001/KR_R001.kwcoco.zip
          sc_poly.site_summary: $TRUTH_DVC_DATA_DPATH/annotations/drop6/region_models/KR_R001.geojson
        - sc_pxl.test_dataset: $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R002/KR_R002.kwcoco.zip
          sc_poly.site_summary: $TRUTH_DVC_DATA_DPATH/annotations/drop6/region_models/KR_R002.geojson
        - sc_pxl.test_dataset: $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/CH_R001/CH_R001.kwcoco.zip
          sc_poly.site_summary: $TRUTH_DVC_DATA_DPATH/annotations/drop6/region_models/CH_R001.geojson
    " \
    --pipeline=sc \
    --root_dpath="$DVC_EXPT_DPATH/_demo_ac_eval" \
    --queue_name "_demo_ac_eval" \
    --devices="0,1" \
    --backend=tmux --tmux_workers=6 \
    --cache=1 --skip_existing=1 --run=1
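
Tip: before committing to a long run, set --run=0 so the evaluation queue is built and displayed without executing, which makes it easy to verify paths and parameters.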

After the mlops evaluation completes, you can inspect the results with mlops aggregate to produce reports and gain insight.

DVC_EXPT_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware=auto)
python -m geowatch.mlops.aggregate \
    --pipeline=sc \
    --target "
        - $DVC_EXPT_DPATH/_demo_ac_eval
    " \
    --output_dpath="$DVC_EXPT_DPATH/_demo_ac_eval/aggregate" \
    --resource_report=0 \
    --eval_nodes="
        - sc_poly_eval
    " \
    --plot_params="
        enabled: 0
        stats_ranking: 0
        min_variations: 1
        params_of_interest:
            - params.sc_poly.thresh
    " \
    --stdout_report="
        top_k: 13
        per_group: 1
        macro_analysis: 0
        analyze: 0
        print_models: True
        reference_region: final
        concise: 0
        show_csv: 0
    "

    # To limit aggregation to specific regions, you can additionally pass:
    #     --rois="KR_R002,NZ_R001,CH_R001,KR_R001"