SMART Activity Characterization Tutorial¶
This document is a tutorial on Activity Characterization (AC) in the context of the SMART project. It goes over:
Accessing AC data via DVC
Modifying the data / predicting features on the data
Training a model with geowatch.tasks.fusion
Evaluating a model with geowatch.mlops
1. Access AC Data¶
Due to an issue that I don’t fully understand, the pre-clustered and pre-cropped Drop7 AC training data lives in a separate DVC repo.
Navigate to where you would like to store it and clone the repo:
git clone git@gitlab.kitware.com:smart/smart_drop7.git
To ensure the commands in this tutorial are runnable, register this new repo with geowatch under the “drop7_data” tag. Important: the following command assumes you have not changed directories since running git clone, so make sure the path is set correctly. You may change the hardware or name parameters to your liking; the only thing that matters is that the tag is exactly “drop7_data” and the path is correct.
geowatch_dvc add drop7_data_ssd --path="$(pwd)/smart_drop7" --tags drop7_data --hardware ssd
Now that the repo is registered, pull the data:
AC_DATA_DVC_DPATH=$(geowatch_dvc --tags drop7_data)
# Make sure this prints the expected path to the repo, otherwise the rest of
# the tutorial will not work.
echo "AC_DATA_DVC_DPATH=$AC_DATA_DVC_DPATH"
# Navigate to the DVC repo
cd $AC_DATA_DVC_DPATH
# Run DVC pull on Drop7-Cropped2GSD to grab the cropped raw bands.
# (in the future I may add precomputed team features here)
dvc pull -r aws -R Drop7-Cropped2GSD
# Alternate remote; only needed if you have access to it:
dvc pull -r toothbrush_ssd -R Drop7-Cropped2GSD
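As a quick sanity check after the pull, you can verify that the per-region folders you expect actually exist in the bundle. This is a plain-bash sketch: the region list is illustrative, and it assumes you are still in the DVC repo root.

```shell
# Check that expected per-region folders exist after `dvc pull`.
# The region names here are examples; adjust to the regions you need.
BUNDLE=Drop7-Cropped2GSD
MISSING=""
for REGION_ID in KR_R001 KR_R002; do
    if [ ! -d "$BUNDLE/$REGION_ID" ]; then
        MISSING="$MISSING $REGION_ID"
    fi
done
if [ -n "$MISSING" ]; then
    echo "missing regions:$MISSING (re-run dvc pull?)"
else
    echo "all expected regions present"
fi
```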
2. Modify AC Data¶
The raw-bands AC data is training-ready as-is, but you may want to compute team features on it, or update the annotations in some way.
The following is a loose (untested) way of accomplishing this. Using prepare_teamfeats requires that your feature is registered with it (which hopefully it is).
AC_DATA_DVC_DPATH=$(geowatch_dvc --tags drop7_data)
export CUDA_VISIBLE_DEVICES="0,1"
DVC_DATA_DPATH=$(geowatch_dvc --tags='phase2_data' --hardware=auto)
DVC_EXPT_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware='auto')
# NOTE: this bundle path appears to reference an older drop; adjust it to your target bundle.
BUNDLE_DPATH=$AC_DATA_DVC_DPATH/Drop6-MeanYear10GSD-V2
python -m geowatch.cli.prepare_teamfeats \
    --base_fpath "$AC_DATA_DVC_DPATH"/imganns-*[0-9].kwcoco.zip \
    --expt_dvc_dpath="$DVC_EXPT_DPATH" \
    --with_landcover=1 \
    --with_invariants2=1 \
    --with_sam=1 \
    --with_materials=0 \
    --with_depth=0 \
    --with_cold=0 \
    --skip_existing=1 \
    --assets_dname=teamfeats \
    --gres=0, --tmux_workers=1 --backend=tmux --run=0
# --run=0 is a dry run: inspect the generated commands, then set --run=1 to execute.
Alternatively, we can write a bash script that loops over regions and submits jobs to cmd-queue, which can then be inspected before being executed. You can get pretty fancy here.
TODO: show example of actually doing a feature computation here.
REGION_IDS=(KR_R001 KR_R002 AE_R001 PE_R001 US_R007 BH_R001 BR_R001 BR_R002 BR_R004 BR_R005 CH_R001 LT_R001 NZ_R001 US_C010 US_C011 US_C012 US_C016 US_R001 US_R004 US_R005 US_R006)
# Grab the regular DVC repo to get access to the truth
TRUTH_DVC_DPATH=$(geowatch_dvc --tags='phase2_data' --hardware='auto')
# Create a new queue
python -m cmd_queue new "modify_ac_queue"
for REGION_ID in "${REGION_IDS[@]}"; do
    python -m cmd_queue submit --jobname="feature-$REGION_ID" -- modify_ac_queue \
        ... THE COMMAND TO COMPUTE YOUR FEATURE ...
    python -m cmd_queue submit --jobname="reproject-$REGION_ID" --depends="feature-$REGION_ID" -- modify_ac_queue \
        geowatch reproject_annotations \
            --src "$DST_BUNDLE_DPATH/$REGION_ID/$REGION_ID.kwcoco.zip" \
            --dst "$DST_BUNDLE_DPATH/$REGION_ID/imgannots-$REGION_ID.kwcoco.zip" \
            --io_workers="avail/2" \
            --region_models="$TRUTH_DVC_DPATH/annotations/drop6_hard_v1/region_models/${REGION_ID}.geojson" \
            --site_models="$TRUTH_DVC_DPATH/annotations/drop6_hard_v1/site_models/${REGION_ID}_*.geojson"
done
# Show the generated script
python -m cmd_queue show "modify_ac_queue"
# Execute the generated script
python -m cmd_queue run --workers=8 "modify_ac_queue"
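For reference, the path wiring inside the loop can be sketched in plain bash. `DST_BUNDLE_DPATH` is a hypothetical stand-in value here (your script must define it for your setup); the point is the naming convention: the reprojected output gets an `imgannots-` prefix next to the raw per-region file.

```shell
# Sketch of the per-region path construction used in the loop above.
# DST_BUNDLE_DPATH is a hypothetical stand-in; define it for your setup.
DST_BUNDLE_DPATH=/data/Drop7-Cropped2GSD
REGION_ID=KR_R001
SRC_FPATH="$DST_BUNDLE_DPATH/$REGION_ID/$REGION_ID.kwcoco.zip"
DST_FPATH="$DST_BUNDLE_DPATH/$REGION_ID/imgannots-$REGION_ID.kwcoco.zip"
echo "reproject: $SRC_FPATH -> $DST_FPATH"
```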
Lastly, after you update the per-region kwcoco files, you will need to write new kwcoco train/validation splits that reference the updated files (the splits that already exist in the repo only reference the raw bands).
# TODO:
# * Modify the suffix depending on the team feats
# * Modify the base fpath to be correct.
python -m geowatch.cli.prepare_splits \
    --base_fpath "$AC_DATA_DVC_DPATH"/Drop7-Cropped2GSD/*/imgannots-*.kwcoco.zip \
    --dst_dpath "$AC_DATA_DVC_DPATH"/Drop7-Cropped2GSD \
    --suffix=rawbands --run=1 --workers=2
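The split files that prepare_splits writes follow a predictable naming pattern built from the `--suffix` argument. As a sketch (assuming the split6 numbering used elsewhere in this tutorial), the names line up with the train/vali paths used in the training section:

```shell
# Sketch: split filenames produced for a given --suffix.
SUFFIX=rawbands
TRAIN_SPLIT="data_train_${SUFFIX}_split6.kwcoco.zip"
VALI_SPLIT="data_vali_${SUFFIX}_split6.kwcoco.zip"
echo "$TRAIN_SPLIT"
echo "$VALI_SPLIT"
```

If you computed team features and used a different suffix, substitute it here to predict the filenames your training config should point at.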
Note: see ../../scripts/prepare_drop7.sh for details on how this dataset was initially computed.
3. Train an AC Model¶
The following is a training run that I recently ran. I make no claim that its parameters are well tuned, but it provides an example of how to train an AC model.
Be sure to grab a pretrained model to start from:
DVC_EXPT_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware='auto')
python -m geowatch.utils.simple_dvc request \
    "$DVC_EXPT_DPATH"/models/fusion/Drop7-Cropped2GSD/packages/Drop7-Cropped2GSD_SC_bgrn_split6_V08/Drop7-Cropped2GSD_SC_bgrn_split6_V08_epoch336_step28982.pt
export CUDA_VISIBLE_DEVICES=1
DVC_DATA_DPATH=$(geowatch_dvc --tags='drop7_data' --hardware='auto')
DVC_EXPT_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware='auto')
echo "DVC_EXPT_DPATH = $DVC_EXPT_DPATH"
WORKDIR=$DVC_EXPT_DPATH/training/$HOSTNAME/$USER
DATASET_CODE=Drop7-Cropped2GSD
KWCOCO_BUNDLE_DPATH=$DVC_DATA_DPATH/$DATASET_CODE
TRAIN_FPATH=$KWCOCO_BUNDLE_DPATH/data_train_rawbands_split6.kwcoco.zip
VALI_FPATH=$KWCOCO_BUNDLE_DPATH/data_vali_rawbands_split6.kwcoco.zip
CHANNELS="(L8,S2):(blue|green|red|nir),(WV):(blue|green|red),(WV,WV1):pan"
EXPERIMENT_NAME=Drop7-Cropped2GSD_SC_bgrn_split6_V11
DEFAULT_ROOT_DIR=$WORKDIR/$DATASET_CODE/runs/$EXPERIMENT_NAME
TARGET_LR=1e-4
WEIGHT_DECAY=$(python -c "print($TARGET_LR * 0.01)")
echo "WEIGHT_DECAY = $WEIGHT_DECAY"
MAX_STEPS=80000
WATCH_GRID_WORKERS=0 python -m geowatch.tasks.fusion fit --config "
data:
    #select_videos          : $SELECT_VIDEOS
    num_workers            : 5
    train_dataset          : $TRAIN_FPATH
    vali_dataset           : $VALI_FPATH
    window_dims            : '224,224'
    time_steps             : 9
    time_sampling          : soft4
    time_kernel            : '(-1.08y,-1y,-0.25y,-0.08y,0.0y,0.08y,0.25y,1y,1.08y)'
    window_resolution      : 2.0GSD
    input_resolution       : 2.0GSD
    output_resolution      : 2.0GSD
    neg_to_pos_ratio       : 1.0
    batch_size             : 2
    normalize_perframe     : false
    normalize_peritem      : 'blue|green|red|nir|pan'
    max_epoch_length       : 1000000
    channels               : '$CHANNELS'
    min_spacetime_weight   : 0.6
    temporal_dropout       : 0.5
    mask_low_quality       : False
    mask_samecolor_method  : None
    observable_threshold   : 0.1
    quality_threshold      : 0.0
    weight_dilate          : 10
    use_centered_positives : True
    use_grid_positives     : False
    use_grid_negatives     : False
    normalize_inputs       : 1024
    balance_areas          : True
model:
    class_path: MultimodalTransformer
    init_args:
        #saliency_weights      : '1:1'
        #class_weights         : auto
        tokenizer              : linconv
        arch_name              : smt_it_stm_p16
        decoder                : mlp
        positive_change_weight : 1
        negative_change_weight : 0.01
        stream_channels        : 16
        class_loss             : 'dicefocal'
        saliency_loss          : 'focal'
        saliency_head_hidden   : 6
        change_head_hidden     : 6
        class_head_hidden      : 6
        global_change_weight   : 0.00
        global_class_weight    : 1.00
        global_saliency_weight : 0.00001
        multimodal_reduce      : learned_linear
optimizer:
    class_path: torch.optim.AdamW
    init_args:
        lr           : $TARGET_LR
        weight_decay : $WEIGHT_DECAY
        betas:
            - 0.85
            - 0.998
lr_scheduler:
    class_path: torch.optim.lr_scheduler.OneCycleLR
    init_args:
        max_lr           : $TARGET_LR
        total_steps      : $MAX_STEPS
        anneal_strategy  : cos
        pct_start        : 0.3
        div_factor       : 10
        final_div_factor : 10000
        cycle_momentum   : false
trainer:
    accumulate_grad_batches : 48
    default_root_dir        : $DEFAULT_ROOT_DIR
    accelerator             : gpu
    devices                 : 0,
    limit_val_batches       : 256
    limit_train_batches     : 2048
    num_sanity_val_steps    : 0
    max_epochs              : 560
    callbacks:
        - class_path: pytorch_lightning.callbacks.ModelCheckpoint
          init_args:
              monitor: val_loss
              mode: min
              save_top_k: 5
              filename: '{epoch}-{step}-{val_loss:.3f}.ckpt'
              save_last: true
torch_globals:
    float32_matmul_precision: auto
initializer:
    init: $DVC_EXPT_DPATH/models/fusion/Drop7-Cropped2GSD/packages/Drop7-Cropped2GSD_SC_bgrn_split6_V08/Drop7-Cropped2GSD_SC_bgrn_split6_V08_epoch336_step28982.pt
"
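Two numbers in this config interact in a way worth spelling out: weight decay is set to 1% of the target learning rate (the python -c line above), and the optimizer sees an effective batch of batch_size times accumulate_grad_batches. A quick plain-bash sketch of that arithmetic:

```shell
# Effective batch size implied by the config above:
# batch_size (2) * accumulate_grad_batches (48).
BATCH_SIZE=2
ACCUM_GRAD_BATCHES=48
EFFECTIVE_BATCH=$((BATCH_SIZE * ACCUM_GRAD_BATCHES))
echo "effective batch size = $EFFECTIVE_BATCH"   # 96
```

If you change batch_size to fit your GPU memory, consider adjusting accumulate_grad_batches to keep the effective batch (and hence the learning-rate regime) roughly the same.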
4. Evaluate an AC Model with MLOps¶
The following code runs an AC-only mlops evaluation using the ground truth polygons as a proxy for the polygons that come out of BAS. This provides a consistent way to compare models, but a full evaluation of BAS+SV+AC is needed for final evaluation (TODO, add this).
The following command only runs over KR_R001, KR_R002, and CH_R001; add more regions as necessary.
This also includes 3 existing baseline SC models (two of them commented out), which you will need to pull from the DVC expt repo, to compare your model against. Put the path to your packaged model in the grid and adjust parameters as desired.
python -m geowatch.mlops.manager "list" --dataset_codes Drop7-Cropped2GSD
HIRES_DVC_DATA_DPATH=$(geowatch_dvc --tags='drop7_data' --hardware=auto)
TRUTH_DVC_DATA_DPATH=$(geowatch_dvc --tags='phase2_data' --hardware=auto)
DVC_EXPT_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware=auto)
kwcoco stats \
    $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R001/KR_R001.kwcoco.zip \
    $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R002/KR_R002.kwcoco.zip \
    $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/CH_R001/CH_R001.kwcoco.zip
geowatch stats $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R001/KR_R001.kwcoco.zip
geowatch stats $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R002/KR_R002.kwcoco.zip
geowatch stats $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/CH_R001/CH_R001.kwcoco.zip
python -m geowatch.mlops.schedule_evaluation --params="
    matrix:
        ########################
        ## AC/SC PIXEL PARAMS ##
        ########################
        sc_pxl.test_dataset:
            - $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R001/KR_R001.kwcoco.zip
            - $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R002/KR_R002.kwcoco.zip
            - $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/CH_R001/CH_R001.kwcoco.zip
        sc_pxl.package_fpath:
            - $DVC_EXPT_DPATH/models/fusion/Drop4-SC/packages/Drop4_tune_V30_8GSD_V3/Drop4_tune_V30_8GSD_V3_epoch=2-step=17334.pt.pt
            #- $DVC_EXPT_DPATH/models/fusion/Drop7-Cropped2GSD/packages/Drop7-Cropped2GSD_SC_bgrn_split6_V07/Drop7-Cropped2GSD_SC_bgrn_split6_V07_epoch73_step6364.pt
            #- $DVC_EXPT_DPATH/models/fusion/Drop7-Cropped2GSD/packages/Drop7-Cropped2GSD_SC_bgrn_split6_V11/Drop7-Cropped2GSD_SC_bgrn_split6_V11_epoch444_step19135.pt
        sc_pxl.tta_fliprot: 0.0
        sc_pxl.tta_time: 0.0
        sc_pxl.chip_overlap: 0.3
        #sc_pxl.input_space_scale: 2GSD
        #sc_pxl.window_space_scale: 2GSD
        #sc_pxl.output_space_scale: 2GSD
        #sc_pxl.time_span: 6m
        #sc_pxl.time_sampling: auto
        #sc_pxl.time_steps: 12
        #sc_pxl.chip_dims: auto
        sc_pxl.set_cover_algo: null
        sc_pxl.resample_invalid_frames: 3
        sc_pxl.observable_threshold: 0.0
        sc_pxl.mask_low_quality: true
        sc_pxl.drop_unused_frames: true
        sc_pxl.num_workers: 12
        sc_pxl.batch_size: 1
        sc_pxl.write_workers: 0
        ########################
        ## AC/SC POLY PARAMS  ##
        ########################
        sc_poly.thresh: 0.07
        sc_poly.boundaries_as: polys
        #sc_poly.resolution: 2GSD
        sc_poly.min_area_square_meters: 7200
        #############################
        ## AC/SC POLY EVAL PARAMS  ##
        #############################
        sc_poly_eval.true_site_dpath: $TRUTH_DVC_DATA_DPATH/annotations/drop6/site_models
        sc_poly_eval.true_region_dpath: $TRUTH_DVC_DATA_DPATH/annotations/drop6/region_models
        ##################################
        ## HIGH LEVEL PIPELINE CONTROLS ##
        ##################################
        sc_pxl.enabled: 1
        sc_pxl_eval.enabled: 1
        sc_poly.enabled: 1
        sc_poly_eval.enabled: 1
        sc_poly_viz.enabled: 0
    submatrices:
        - sc_pxl.test_dataset: $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R001/KR_R001.kwcoco.zip
          sc_poly.site_summary: $TRUTH_DVC_DATA_DPATH/annotations/drop6/region_models/KR_R001.geojson
        - sc_pxl.test_dataset: $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/KR_R002/KR_R002.kwcoco.zip
          sc_poly.site_summary: $TRUTH_DVC_DATA_DPATH/annotations/drop6/region_models/KR_R002.geojson
        - sc_pxl.test_dataset: $HIRES_DVC_DATA_DPATH/Drop7-Cropped2GSD/CH_R001/CH_R001.kwcoco.zip
          sc_poly.site_summary: $TRUTH_DVC_DATA_DPATH/annotations/drop6/region_models/CH_R001.geojson
    " \
    --pipeline=sc \
    --root_dpath="$DVC_EXPT_DPATH/_demo_ac_eval" \
    --queue_name "_demo_ac_eval" \
    --devices="0,1" \
    --backend=tmux --tmux_workers=6 \
    --cache=1 --skip_existing=1 --run=1
After the mlops evaluation completes, you can inspect your results with mlops aggregate to produce reports and gain insight.
DVC_EXPT_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware=auto)
python -m geowatch.mlops.aggregate \
    --pipeline=sc \
    --target "
        - $DVC_EXPT_DPATH/_demo_ac_eval
    " \
    --output_dpath="$DVC_EXPT_DPATH/_demo_ac_eval/aggregate" \
    --resource_report=0 \
    --eval_nodes="
        - sc_poly_eval
    " \
    --plot_params="
        enabled: 0
        stats_ranking: 0
        min_variations: 1
        params_of_interest:
            - params.sc_poly.thresh
    " \
    --stdout_report="
        top_k: 13
        per_group: 1
        macro_analysis: 0
        analyze: 0
        print_models: True
        reference_region: final
        concise: 0
        show_csv: 0
    "
#\
#--rois="KR_R002,NZ_R001,CH_R001,KR_R001"