geowatch.cli.queue_cli.prepare_ta2_dataset module

Builds a multi-region dataset.

An end-to-end script that calls all the scripts needed to:

  • Pull the STAC catalog that points to processed large image tiles

  • Create a virtual uncropped kwcoco dataset that points to the large image tiles

  • Crop the dataset to create an aligned TA2 dataset
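
Each step consumes the output of the previous one, so the steps form a simple dependency chain. The following is a minimal sketch of that ordering using only the standard library; the stage names are hypothetical labels chosen for illustration, not identifiers used by geowatch.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names; each one maps to the stages it depends on.
stages = {
    'stac_query': set(),                  # pull the STAC catalog
    'virtual_kwcoco': {'stac_query'},     # build the uncropped kwcoco dataset
    'align_crop': {'virtual_kwcoco'},     # crop to the aligned TA2 dataset
}

# Resolve an execution order that respects the dependencies.
order = list(TopologicalSorter(stages).static_order())
for index, name in enumerate(order, start=1):
    print(f'{index}. {name}')
```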

See also

~/code/geowatch/scripts/prepare_drop4.sh
~/code/geowatch/scripts/prepare_drop5.sh

CommandLine

# Create a demo region file and define variables that point to the relevant
# paths, which are written to your ~/.cache folder by default
xdoctest -m geowatch.demo.demo_region demo_khq_region_fpath
REGION_FPATH="$HOME/.cache/geowatch/demo/annotations/KHQ_R001.geojson"
SITE_GLOBSTR="$HOME/.cache/geowatch/demo/annotations/KHQ_R001_sites/*.geojson"

# The "name" of the new dataset
DATASET_SUFFIX=Demo-TA2-KHQ

# Set this to where you want to build the dataset
DEMO_DPATH=$PWD/prep_ta2_demo

mkdir -p "$DEMO_DPATH"

# This is a string code indicating what STAC endpoint we will pull from
SENSORS="sentinel-2-l2a"

# Depending on the STAC endpoint, some parameters may need to change:
# collated - True for IARPA endpoints, usually False for public data
# requester_pays - True for public Landsat
# api_key - A secret for non-public data

export SMART_STAC_API_KEY=""
export GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR

# Construct the TA2-ready dataset
python -m geowatch.cli.queue_cli.prepare_ta2_dataset \
    --dataset_suffix="$DATASET_SUFFIX" \
    --cloud_cover=100 \
    --stac_query_mode=auto \
    --sensors "$SENSORS" \
    --api_key=env:SMART_STAC_API_KEY \
    --collated False \
    --requester_pays=True \
    --dvc_dpath="$DEMO_DPATH" \
    --aws_profile=iarpa \
    --region_globstr="$REGION_FPATH" \
    --site_globstr="$SITE_GLOBSTR" \
    --fields_workers=8 \
    --convert_workers=0 \
    --align_workers=26 \
    --cache=1 \
    --skip_existing=0 \
    --ignore_duplicates=1 \
    --target_gsd=30 \
    --visualize=True \
    --max_products_per_region=10 \
    --backend=serial \
    --run=0

geowatch visualize $HOME/data/dvc-repos/smart_watch_dvc/Aligned-Drop2-TA1-2022-02-24/data.kwcoco_c9ea8bb9.json
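
The same invocation can also be assembled from Python, which may be convenient when the options are generated by another script. The following is a sketch only: `build_argv` is a hypothetical helper, only a subset of the flags above is included, and the command is printed rather than executed.

```python
import shlex
import sys

# A subset of the options from the command above.
options = {
    'dataset_suffix': 'Demo-TA2-KHQ',
    'cloud_cover': 100,
    'stac_query_mode': 'auto',
    'sensors': 'sentinel-2-l2a',
    'api_key': 'env:SMART_STAC_API_KEY',
    'requester_pays': True,
    'target_gsd': 30,
    'backend': 'serial',
    'run': 0,
}


def build_argv(options):
    """Hypothetical helper: turn a dict into ``--key=value`` arguments."""
    module = 'geowatch.cli.queue_cli.prepare_ta2_dataset'
    argv = [sys.executable, '-m', module]
    argv += [f'--{key}={value}' for key, value in options.items()]
    return argv


argv = build_argv(options)
# Print a shell-quoted command instead of running it.
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run` would execute the command; with `--run=0`, as in the command above, the invocation is a dry run.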

Todo

Handle the GE01 and WV01 platforms.

CommandLine

xdoctest -m geowatch.cli.queue_cli.prepare_ta2_dataset __doc__:0

Example

>>> from geowatch.cli.queue_cli.prepare_ta2_dataset import *  # NOQA
>>> import ubelt as ub
>>> dpath = ub.Path.appdir('geowatch/tests/prep_ta2_dataset').delete().ensuredir()
>>> from geowatch.geoannots import geomodels
>>> # Write dummy regions / sites
>>> for rng in [0, 1, 3]:
>>>     region, sites = geomodels.RegionModel.random(rng=rng, with_sites=True)
>>>     region_dpath = (dpath / 'region_models').ensuredir()
>>>     site_dpath = (dpath / 'site_models').ensuredir()
>>>     region_fpath = region_dpath / (region.region_id + '.geojson')
>>>     region_fpath.write_text(region.dumps())
>>>     for site in sites:
>>>         site_fpath = site_dpath / (site.site_id + '.geojson')
>>>         site_fpath.write_text(site.dumps())
>>> # Prepare config and test a dry run
>>> kwargs = PrepareTA2Config()
>>> kwargs['dataset_suffix'] = 'DEMO_DOCTEST'
>>> kwargs['run'] = 0
>>> kwargs['stac_query_mode'] = 'auto'
>>> kwargs['regions'] = region_dpath
>>> kwargs['sites'] = site_dpath
>>> kwargs['backend'] = 'serial'
>>> kwargs['visualize'] = 1
>>> kwargs['collated'] = [True]
>>> kwargs['out_dpath'] = '.'
>>> cmdline = 0
>>> PrepareTA2Config.main(cmdline=cmdline, **kwargs)

class geowatch.cli.queue_cli.prepare_ta2_dataset.PrepareTA2Config(*args, **kwargs)

Bases: CMDQueueConfig

Valid options: []

Parameters:
  • *args – positional arguments for this data config

  • **kwargs – keyword arguments for this data config

default = {'align_aux_workers': <Value(0)>, 'align_keep': <Value('img')>, 'align_skip_previous_errors': <Value(False)>, 'align_tries': <Value(2)>, 'align_workers': <Value(0)>, 'api_key': <Value('env:SMART_STAC_API_KEY')>, 'asset_timeout': <Value('4hours')>, 'aws_profile': <Value(None)>, 'backend': <Value('tmux')>, 'cache': <Value(True)>, 'cloud_cover': <Value(10)>, 'collated': <Value([True])>, 'convert_workers': <Value(0)>, 'dataset_suffix': <Value(None)>, 'exclude_channels': <Value(None)>, 'fields_workers': <Value('min(avail,max(all/2,8))')>, 'final_union': <Value(False)>, 'force_min_gsd': <Value(None)>, 'force_nodata': <Value(None)>, 'hack_lazy': <Value(False)>, 'ignore_duplicates': <Value(True)>, 'image_timeout': <Value('8hours')>, 'include_channels': <Value(None)>, 'max_products_per_region': <Value(None)>, 'max_regions': <Value(None)>, 'other_session_handler': <Value('ask')>, 'out_dpath': <Value('auto')>, 'print_commands': <Value('auto')>, 'print_queue': <Value('auto')>, 'propogate_strategy': <Value('NEW-SMART')>, 'qa_encoding': <Value(None)>, 'query_workers': <Value(0)>, 'queue_name': <Value('prep-ta2-dataset')>, 'regions': <Value('annotations/region_models')>, 'remove_broken': <Value(True)>, 'reproject_annotations': <Value(True)>, 'requester_pays': <Value(False)>, 'rpc_align_method': <Value('orthorectify')>, 'run': <Value(False)>, 's3_fpath': <Value(None)>, 'select_images': <Value(False)>, 'sensor_to_time_window': <Value(None)>, 'sensors': <Value('L2')>, 'sites': <Value(None)>, 'skip_existing': <Value(False)>, 'skip_populate_errors': <Value(False)>, 'slurm_options': <Value(None)>, 'splits': <Value(False)>, 'stac_query_mode': <Value(None)>, 'target_gsd': <Value(10)>, 'tmux_workers': <Value(8)>, 'unsigned_nodata': <Value(256)>, 'verbose': <Value(0)>, 'virtualenv_cmd': <Value(None)>, 'visualize': <Value(False)>, 'visualize_only_boxes': <Value(True)>, 'with_textual': <Value('auto')>}
main(**kwargs)
geowatch.cli.queue_cli.prepare_ta2_dataset.main(cmdline=False, **kwargs)
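
Two of the defaults listed above are string codes rather than literal values: `api_key` is `'env:SMART_STAC_API_KEY'` and `fields_workers` is `'min(avail,max(all/2,8))'`. The sketch below shows one plausible way such strings could be interpreted. Both helpers are hypothetical and are not geowatch's implementation. It assumes that the `env:` prefix means "read this environment variable", that `all` means the total CPU count, and that `avail` means the number of currently usable CPUs.

```python
import os


def resolve_secret(value, environ=None):
    """Hypothetical: resolve an ``env:NAME`` string from the environment."""
    environ = os.environ if environ is None else environ
    if isinstance(value, str) and value.startswith('env:'):
        # Returns None when the variable is unset.
        return environ.get(value[len('env:'):])
    return value


def sketch_coerce_workers(expr, avail=None):
    """Hypothetical: evaluate a worker-count expression such as
    ``'min(avail,max(all/2,8))'``."""
    if isinstance(expr, int):
        return expr
    total = os.cpu_count() or 1
    names = {
        'all': total,                               # assumed: total CPUs
        'avail': total if avail is None else avail,  # assumed: usable CPUs
        'min': min,
        'max': max,
    }
    # Builtins are disabled so only the names above are visible, but eval
    # should still never be used on untrusted input.
    value = eval(expr, {'__builtins__': {}}, names)
    return max(int(value), 0)


key = resolve_secret('env:SMART_STAC_API_KEY', {'SMART_STAC_API_KEY': 'abc'})
workers = sketch_coerce_workers('min(avail,max(all/2,8))', avail=4)
print(key, workers)
```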