geowatch.tasks.fusion.datamodules.kwcoco_dataset module¶
Defines KWCocoVideoDataset, a torch Dataset for kwcoco image and video data.
The configurable input parameters are defined in the KWCocoVideoDatasetConfig, which is used to resolve kwargs passed to the main KWCocoVideoDataset class. These parameters give the developer fine-grained control over how sampling is done. At the most basic level the developer should specify the following (a minimal sketch combining them appears after this list):
window_space_scale - the size of the window (possibly in a virtual sample space) used to build the virtual sample grid.
input_space_scale - the scale of the inputs (defaults to the window space scale, but could be different).
time_kernel - or (time_sampling / time_dims) to indicate how many frames are sampled over time and how they are distributed.
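As a quick orientation, here is a minimal sketch that combines these three knobs on toy data. The specific values are illustrative, not recommendations; the doctests below are authoritative.
>>> # Minimal sketch: the three core sampling knobs together (illustrative values)
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import KWCocoVideoDataset
>>> import kwcoco
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes1', num_frames=10)
>>> dataset = KWCocoVideoDataset(
>>>     coco_dset, channels='r|g|b',
>>>     window_dims=(128, 128),        # pixel size of each sampled window
>>>     window_space_scale=1.2,        # scale of the virtual grid the windows are placed on
>>>     input_space_scale='native',    # scale the inputs themselves are sampled at
>>>     time_dims=3,                   # number of frames per item
>>>     time_kernel='-1y,0,1y',        # desired temporal offsets of those frames
>>>     time_sampling='uniform')
>>> item = dataset[0]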
The following doctests provide a crash course on what sort of sampling parameters are available.
CommandLine
xdoctest -m geowatch.tasks.fusion.datamodules.kwcoco_dataset __doc__:0 --show
xdoctest -m geowatch.tasks.fusion.datamodules.kwcoco_dataset __doc__:1 --show
Example
>>> # Basic Data Sampling
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import ndsampler
>>> import kwcoco
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes1', num_frames=10)
>>> sampler = ndsampler.CocoSampler(coco_dset)
>>> self = KWCocoVideoDataset(sampler, time_dims=4, window_dims=(300, 300),
>>> channels='r|g|b')
>>> self.disable_augmenter = True
>>> index = self.sample_grid['targets'][self.sample_grid['positives_indexes'][0]]
>>> item = self[index]
>>> # Summarize batch item in text
>>> summary = self.summarize_item(item)
>>> print('item summary: ' + ub.urepr(summary, nl=2))
>>> # Draw batch item
>>> canvas = self.draw_item(item)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(canvas)
>>> kwplot.show_if_requested()
Example
>>> # Basic Data Sampling (using the full image as the window)
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import ndsampler
>>> import kwcoco
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes1', num_frames=10)
>>> sampler = ndsampler.CocoSampler(coco_dset)
>>> self = KWCocoVideoDataset(sampler, window_dims='full', channels='r|g|b')
>>> self.disable_augmenter = True
>>> index = self.sample_grid['targets'][self.sample_grid['positives_indexes'][0]]
>>> item = self[index]
>>> # Summarize batch item in text
>>> summary = self.summarize_item(item)
>>> print('item summary: ' + ub.urepr(summary, nl=2))
>>> # Draw batch item
>>> canvas = self.draw_item(item)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(canvas)
>>> kwplot.show_if_requested()
Example
>>> # Demo toy data without augmentation
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import kwcoco
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes2-multispectral', num_frames=10)
>>> channels = 'B10,B8a|B1,B8'
>>> self = KWCocoVideoDataset(coco_dset, time_dims=4, window_dims=(300, 300),
>>> channels=channels,
>>> input_space_scale='native',
>>> output_space_scale=None,
>>> window_space_scale=1.2,
>>> augment_space_shift_rate=0.5,
>>> use_grid_negatives=False,
>>> use_grid_positives=False,
>>> use_centered_positives=True,
>>> absolute_weighting=True,
>>> time_sampling='uniform',
>>> time_kernel='-1year,0,1month,1year',
>>> modality_dropout=0.5,
>>> channel_dropout=0.5,
>>> temporal_dropout=0.7,
>>> temporal_dropout_rate=1.0)
>>> # Add weights to annots
>>> annots = self.sampler.dset.annots()
>>> annots.set('weight', 2 + np.random.rand(len(annots)) * 10)
>>> self.disable_augmenter = False
>>> index = self.sample_grid['targets'][self.sample_grid['positives_indexes'][3]]
>>> item = self[index]
>>> summary = self.summarize_item(item)
>>> print('item summary: ' + ub.urepr(summary, nl=3))
>>> canvas = self.draw_item(item, overlay_on_image=0, rescale=0, max_dim=1024)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(canvas)
>>> kwplot.show_if_requested()
Example
>>> # Demo toy data with augmentation
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import kwcoco
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes2-multispectral', num_frames=10)
>>> channels = 'B10,B8a|B1,B8'
>>> self = KWCocoVideoDataset(coco_dset, time_dims=3, window_dims=(300, 300),
>>> channels=channels,
>>> input_space_scale='native',
>>> output_space_scale=None,
>>> window_space_scale=1.2,
>>> time_sampling='soft2+distribute',
>>> time_kernel='-1y,0,1y',
>>> modality_dropout=0.5,
>>> temporal_dropout=0.5)
>>> assert not self.disable_augmenter
>>> index = self.sample_grid['targets'][self.sample_grid['positives_indexes'][3]]
>>> item = self[index]
>>> assert item['target']['allow_augment']
>>> print('item summary: ' + ub.urepr(self.summarize_item(item), nl=3))
>>> canvas = self.draw_item(item, overlay_on_image=0, rescale=0)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(canvas)
>>> kwplot.show_if_requested()
SeeAlso:
For notes on spaces, see: ~/code/geowatch/docs/source/manual/development/coding_conventions.rst
Known Issues¶
[ ] FIXME: sensorchan codes should exclude non-specified sensors immediately before temporal sampling. Currently temporal sampling is given everything. E.g. (L8,S2):red|green|blue should not allow WV to be included in sampling.
Roadmap¶
[ ] Get external feedback and suggestions.
[ ] Accept albumentations json or more concise spec for custom augmentation
[ ] Optimize fixed channel case.
[ ] Optimize fixed image size case.
[ ] Optimize fixed video size case.
[ ] Optimize the spacetime grid sampler.
[ ] Allow input resolution to be specified as a fixed pixel size.
[ ] Don't force computing width / height if window_space_dims is "full".
- class geowatch.tasks.fusion.datamodules.kwcoco_dataset.KWCocoVideoDatasetConfig(*args, **kwargs)[source]¶
Bases:
DataConfig
This is the configuration for a single dataset that could be used for train, test, or validation.
In the future this might be convertible to, or handled by, omegaconf.
The core spacetime parameters are:
window_space_scale
input_space_scale
output_space_scale
time_steps
time_sampling
chip_dims / window_space_dims
This dataset defines an implicit grid of where it will sample, and it uses these "targets" to request data from ndsampler, which is what gives us amortized random access to the dataset. A minimal sketch follows the list below.
- The logic contained in this class concerns:
interacting with the spacetime grids sampler to build a grid
target-level augmentations
data-level augmentations (need more of these)
interacting with ndsampler to read data associated with a grid point
balanced sampling over the targets
mapping targets to a HeterogeneousBatchItem in the most general case.
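A minimal sketch of that grid/target flow (attribute and key names mirror the doctests elsewhere in this module; the constructor values are illustrative):
>>> # Sketch: walk the implicit sample grid and fetch items from its "targets"
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import KWCocoVideoDataset
>>> import kwcoco
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes1', num_frames=10)
>>> dataset = KWCocoVideoDataset(coco_dset, time_dims=2, window_dims=(128, 128), channels='r|g|b')
>>> targets = dataset.sample_grid['targets']   # grid points built at construction time
>>> item = dataset[targets[0]]                 # a target dict can be passed directly to __getitem__
>>> item = dataset[0]                          # or use an integer index, which is balanced into a target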
Valid options: []
- Parameters:
*args – positional arguments for this data config
**kwargs – keyword arguments for this data config
- default = {'absolute_weighting': <Value(False)>, 'augment_space_rot': <Value(True)>, 'augment_space_shift_rate': <Value(0.9)>, 'augment_space_xflip': <Value(True)>, 'augment_space_yflip': <Value(True)>, 'augment_time_resample_rate': <Value(0.8)>, 'balance_areas': <Value(False)>, 'balance_options': <Value(None)>, 'channel_dropout': <Value(0.0)>, 'channel_dropout_rate': <Value(0.0)>, 'channels': <Value(None)>, 'chip_dims': <Value(128)>, 'chip_overlap': <Value(0.0)>, 'default_class_behavior': <Value('background')>, 'dist_weights': <Value(0)>, 'downweight_nan_regions': <Value(True)>, 'dynamic_fixed_resolution': <Value(None)>, 'exclude_sensors': <Value(None)>, 'failed_sample_policy': <Value('warn')>, 'fixed_resolution': <Value(None)>, 'force_bad_frames': <Value(False)>, 'ignore_dilate': <Value(0)>, 'include_sensors': <Value(None)>, 'input_space_scale': <Value(None)>, 'mask_low_quality': <Value(False)>, 'mask_nan_bands': <Value('')>, 'mask_samecolor_bands': <Value('red')>, 'mask_samecolor_method': <Value(None)>, 'mask_samecolor_values': <Value(0)>, 'max_epoch_length': <Value(None)>, 'min_spacetime_weight': <Value(0.9)>, 'modality_dropout': <Value(0.0)>, 'modality_dropout_rate': <Value(0.0)>, 'neg_to_pos_ratio': <Value(1.0)>, 'normalize_perframe': <Value(False)>, 'normalize_peritem': <Value(None)>, 'num_balance_trees': <Value(16)>, 'observable_threshold': <Value(0.0)>, 'output_space_scale': <Value(None)>, 'output_type': <Value('heterogeneous')>, 'prenormalize_inputs': <Value(None)>, 'quality_threshold': <Value(0.0)>, 'reduce_item_size': <Value(False)>, 'resample_invalid_frames': <Value(3)>, 'reseed_fit_random_generators': <Value(True)>, 'sampler_backend': <Value(None)>, 'sampler_workdir': <Value(None)>, 'sampler_workers': <Value('avail/2')>, 'select_images': <Value(None)>, 'select_videos': <Value(None)>, 'set_cover_algo': <Value(None)>, 'temporal_dropout': <Value(0.0)>, 'temporal_dropout_rate': <Value(1.0)>, 'time_kernel': <Value(None)>, 'time_sampling': <Value('contiguous')>, 'time_span': <Value(None)>, 'time_steps': <Value(2)>, 'upweight_centers': <Value(True)>, 'upweight_time': <Value(None)>, 'use_centered_positives': <Value(False)>, 'use_cloudmask': <Value(None)>, 'use_grid_cache': <Value(True)>, 'use_grid_negatives': <Value(True)>, 'use_grid_positives': <Value(True)>, 'use_grid_valid_regions': <Value(True)>, 'weight_dilate': <Value(0)>, 'window_space_scale': <Value(None)>}¶
- normalize()¶
- class geowatch.tasks.fusion.datamodules.kwcoco_dataset.TruthMixin[source]¶
Bases:
object
Methods related to drawing truth rasters / training objectives
- CommandLine:
LINE_PROFILE=1 xdoctest -m geowatch.tasks.fusion.datamodules.kwcoco_dataset TruthMixin:0 --bench
Example
>>> # xdoctest: +REQUIRES(--bench)
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import KWCocoVideoDataset
>>> import ndsampler
>>> import geowatch
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes2', num_frames=10)
>>> sampler = ndsampler.CocoSampler(coco_dset)
>>> self = KWCocoVideoDataset(sampler, mode="fit", time_dims=4, window_dims=(196, 196),
>>>                           channels='r|g|b', neg_to_pos_ratio=0)
>>> for index in ub.ProgIter(range(1000)):
>>>     self.getitem(index)
- class geowatch.tasks.fusion.datamodules.kwcoco_dataset.GetItemMixin[source]¶
Bases:
TruthMixin
This mixin defines what is needed for the getitem method.
- getitem(index)[source]¶
This is the same as __getitem__, except that it raises an error when sampling fails; __getitem__ catches and handles that error (a sketch of this relationship follows the parameter list below).
- Parameters:
index (int | Dict) – index or target
- Returns:
Dict
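A hedged sketch of that relationship. This is only an illustration of the contract, not the actual implementation; the real __getitem__ also consults the configured failed_sample_policy.
>>> # Sketch (hypothetical): __getitem__ is roughly a forgiving wrapper around getitem
>>> def forgiving_getitem(dataset, index):
>>>     try:
>>>         return dataset.getitem(index)   # raises on failure (e.g. FailedSample)
>>>     except Exception:
>>>         return None                     # the real code warns / resamples per failed_sample_policy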
CommandLine
LINE_PROFILE=1 xdoctest -m geowatch.tasks.fusion.datamodules.kwcoco_dataset GetItemMixin.getitem
CommandLine
xdoctest -m geowatch.tasks.fusion.datamodules.kwcoco_dataset GetItemMixin.getitem --show
Example
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import kwcoco
>>> import geowatch
>>> coco_dset = geowatch.coerce_kwcoco('geowatch-msi-dates-geodata-heatmap', num_frames=5, image_size=(128, 128), num_videos=1)
>>> # Remove two annotations to test new time weights
>>> aids = coco_dset.images().take([0]).annots[0].lookup('id')
>>> coco_dset.remove_annotations(aids)
>>> #
>>> # Each sensor uses all of its own channels
>>> channels = 'auto'
>>> self = KWCocoVideoDataset(coco_dset, time_dims=5,
>>>                           window_resolution='0.09GSD',
>>>                           input_resolution='0.09GSD',
>>>                           window_dims=(128, 128),
>>>                           channels=channels,
>>>                           balance_areas=True,
>>>                           weight_dilate=3,
>>>                           normalize_perframe=False)
>>> self.disable_augmenter = True
>>> # Pretend that some external object has given us information about desired class weights
>>> # this could be frequency based, but we will use random weights here.
>>> dataset_stats = self.cached_dataset_stats()
>>> import kwarray
>>> rng = kwarray.ensure_rng(0)
>>> class_keys = dataset_stats['class_freq']
>>> catname_to_weight = {c: rng.rand() for c in class_keys}
>>> catname_to_weight['star'] = 2.0
>>> self.catname_to_weight = catname_to_weight
>>> #
>>> index = 0
>>> index = target = self.sample_grid['targets'][self.sample_grid['positives_indexes'][4]]
>>> item = self[index]
>>> # xdoctest: +REQUIRES(--show)
>>> canvas = self.draw_item(item)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(canvas)
>>> kwplot.show_if_requested()
- class geowatch.tasks.fusion.datamodules.kwcoco_dataset.IntrospectMixin[source]¶
Bases:
object
Methods for introspection / visualization of data
- draw_item(item, item_output=None, combinable_extra=None, max_channels=5, max_dim=224, norm_over_time='auto', overlay_on_image=False, draw_weights=True, rescale='auto', classes=None, show_summary_text=True, **kwargs)[source]¶
Visualize an item produced by this DataSet.
Each channel will be a row, and each column will be a timestep.
- Parameters:
item (Dict) – An item returned from the torch Dataset.
overlay_on_image (bool) – if True, the truth and prediction are drawn on top of an image; otherwise they are drawn on a black image.
max_dim (int) – max dimension to resize each grid cell to.
max_channels (int) – maximum number of channel rows to draw
item_output (Dict) – Special task keys that we know how to plot. These should be some sort of binary or class prediction from the network. I’m not sure how best to pass the details of how they should be interpreted.
- Known keys: change_probs, saliency_probs, class_probs, pred_ltrb (see the sketch after this parameter list).
classes (kwcoco.CategoryTree | None) – The classes that any "class_probs" in the item_output dictionary correspond to. If unspecified, the classes from the datamodule are used.
show_summary_text (bool) – if True, draw additional summary debug information. Defaults to True.
**kwargs – additional arguments to BatchVisualizationBuilder.
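A short sketch of passing item_output to draw_item. This mirrors the second doctest below; self._build_demo_outputs fabricates predictions with the known keys, and the constructor values here are illustrative.
>>> # Sketch: fabricate network-like outputs and visualize them alongside the item
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import KWCocoVideoDataset
>>> import kwcoco
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes2-multispectral', num_frames=5)
>>> self = KWCocoVideoDataset(coco_dset, mode='test', time_dims=3, window_dims=(128, 128), channels='auto')
>>> self.disable_augmenter = True
>>> item = self[self.sample_grid['targets'][0]]
>>> item_output = self._build_demo_outputs(item)   # dict with keys such as class_probs / saliency_probs
>>> canvas = self.draw_item(item, item_output=item_output, overlay_on_image=False)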
Note
The self.requested_tasks attribute controls the task labels returned by getitem, and hence what can be visualized here.
Note
In the future, the returned HeterogeneousBatchItem will control how it is drawn, removing this responsibility from the dataset itself.
Example
>>> # Basic Data Sampling with lots of small objects
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import geowatch
>>> anchors = np.array([[0.1, 0.1]])
>>> size = (96, 96)
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes1', num_frames=4, num_tracks=40, anchors=anchors, image_size=size)
>>> self = KWCocoVideoDataset(coco_dset, time_dims=4, window_dims=size, default_class_behavior='ignore')
>>> self._notify_about_tasks(predictable_classes=['star', 'eff'])
>>> self.requested_tasks['change'] = False
>>> index = self.sample_grid['targets'][self.sample_grid['positives_indexes'][0]]
>>> item = self[index]
>>> canvas = self.draw_item(item, draw_weights=False)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> label_to_color = {
>>>     node: data['color']
>>>     for node, data in self.predictable_classes.graph.nodes.items()}
>>> label_to_color = ub.sorted_keys(label_to_color)
>>> legend_img = kwplot.make_legend_img(label_to_color)
>>> legend_img = kwimage.imresize(legend_img, scale=4.0)
>>> show_canvas = kwimage.stack_images([canvas, legend_img], axis=1)
>>> kwplot.imshow(show_canvas)
>>> kwplot.show_if_requested()
Example
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import kwcoco
>>> import kwarray
>>> import rich
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes2-multispectral', num_frames=5)
>>> channels = 'B10|B8a|B1|B8|B11'
>>> combinable_extra = [['B10', 'B8', 'B8a']]  # special behavior
>>> # combinable_extra = None  # uncomment for raw behavior
>>> mode = 'fit'
>>> mode = 'test'
>>> coco_dset.clear_annotations()
>>> self = KWCocoVideoDataset(coco_dset, mode=mode, time_dims=5, window_dims=(530, 610), channels=channels, balance_areas=True)
>>> #index = len(self) // 4
>>> #index = self.sample_grid['targets'][self.sample_grid['positives_indexes'][5]]
>>> index = self.sample_grid['targets'][0]
>>> # More controlled settings for debug
>>> self.disable_augmenter = True
>>> item = self[index]
>>> item_output = self._build_demo_outputs(item)
>>> rich.print('item summary: ' + ub.urepr(self.summarize_item(item), nl=3))
>>> canvas = self.draw_item(item, item_output, combinable_extra=combinable_extra, overlay_on_image=1)
>>> canvas2 = self.draw_item(item, item_output, combinable_extra=combinable_extra, max_channels=3, overlay_on_image=0)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(canvas, fnum=1, pnum=(1, 2, 1))
>>> kwplot.imshow(canvas2, fnum=1, pnum=(1, 2, 2))
>>> kwplot.show_if_requested()
- summarize_item(item, stats=False)[source]¶
Return debugging stats about the item
- Parameters:
item (dict) – an item returned by __getitem__
stats (bool) – if True, include statistics on input data.
- Returns:
a summary of the item
- Return type:
dict
Example
>>> # xdoctest: +SKIP
>>> from geowatch.tasks.fusion.datamodules import kwcoco_dataset
>>> import kwcoco
>>> import geowatch
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes1', num_frames=10)
>>> self = kwcoco_dataset.KWCocoVideoDataset(
>>>     coco_dset, time_dims=4, window_dims=(300, 300),
>>>     channels='r|g|b')
>>> self.disable_augmenter = True
>>> index = self.sample_grid['targets'][self.sample_grid['positives_indexes'][0]]
>>> item = self[index]
>>> item_summary = self.summarize_item(item, stats=True)
>>> print(f'item_summary = {ub.urepr(item_summary, nl=-1)}')
- class geowatch.tasks.fusion.datamodules.kwcoco_dataset.BalanceMixin[source]¶
Bases:
object
Helpers to build the sample grid and balance it
CommandLine
LINE_PROFILE=1 xdoctest -m geowatch.tasks.fusion.datamodules.kwcoco_dataset BalanceMixin:1 --bench
Example
>>> # Test the legacy neg_to_pos_ratio setting (todo: use more general balance_options)
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import KWCocoVideoDataset
>>> import ndsampler
>>> import geowatch
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes2', num_frames=10, rng=0)
>>> sampler = ndsampler.CocoSampler(coco_dset)
>>> num_samples = 50
>>> neg_to_pos_ratio = 0
>>> self = KWCocoVideoDataset(sampler, mode="fit", time_dims=4, window_dims=(300, 300),
>>>                           channels='r|g|b', neg_to_pos_ratio=neg_to_pos_ratio)
>>> self.reseed(0)
>>> num_targets = len(self.sample_grid['targets'])
>>> positives_indexes = self.sample_grid['positives_indexes']
>>> negatives_indexes = self.sample_grid['negatives_indexes']
>>> print('dataset positive ratio:', len(positives_indexes) / num_targets)
>>> print('dataset negative ratio:', len(negatives_indexes) / num_targets)
>>> print('specified neg_to_pos_ratio:', neg_to_pos_ratio)
>>> sampled_indexes = [self[x]['resolved_index'] for x in range(num_samples)]
>>> num_positives = sum([x in positives_indexes for x in sampled_indexes])
>>> num_negatives = num_samples - num_positives
>>> print('sampled positive ratio:', num_positives / num_samples)
>>> print('sampled negative ratio:', num_negatives / num_samples)
>>> assert all([x in positives_indexes for x in sampled_indexes])
>>> assert num_negatives == 0
>>> assert num_positives > num_negatives
>>> #...
>>> neg_to_pos_ratio = .1
>>> self = KWCocoVideoDataset(sampler, time_dims=4, window_dims=(300, 300),
>>>                           channels='r|g|b', neg_to_pos_ratio=neg_to_pos_ratio)
>>> self.reseed(0)
>>> num_targets = len(self.sample_grid['targets'])
>>> positives_indexes = self.sample_grid['positives_indexes']
>>> negatives_indexes = self.sample_grid['negatives_indexes']
>>> print('dataset positive ratio:', len(positives_indexes) / num_targets)
>>> print('dataset negative ratio:', len(negatives_indexes) / num_targets)
>>> print('specified neg_to_pos_ratio:', neg_to_pos_ratio)
>>> sampled_indexes = [self[x]['resolved_index'] for x in range(num_samples)]
>>> num_positives = sum([x in positives_indexes for x in sampled_indexes])
>>> num_negatives = num_samples - num_positives
>>> print('sampled positive ratio:', num_positives / num_samples)
>>> print('sampled negative ratio:', num_negatives / num_samples)
>>> assert num_negatives > 0
>>> assert num_positives > num_negatives
Example
>>> # xdoctest: +REQUIRES(--bench)
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import KWCocoVideoDataset
>>> import ndsampler
>>> import geowatch
>>> import kwcoco
>>> coco_fpath = '/media/joncrall/flash1/smart_phase3_data/Drop8-Cropped2GSD-V1/data_vali_rawbands_split6_n004_f9b08cce.kwcoco.zip'
>>> coco_fpath = '/media/joncrall/flash1/smart_drop7/Drop7-Cropped2GSD-V2/data_vali_rawbands_split6.kwcoco.zip'
>>> coco_dset = kwcoco.CocoDataset(coco_fpath)
>>> self = KWCocoVideoDataset(coco_dset, mode="fit", time_dims=4, window_dims=(300, 300),
>>>                           channels='red|green|blue', neg_to_pos_ratio=1.0)
- class geowatch.tasks.fusion.datamodules.kwcoco_dataset.PreprocessMixin[source]¶
Bases:
object
Methods related to dataset preprocessing
- cached_dataset_stats(num=None, num_workers=0, batch_size=2, with_intensity=True, with_class=True)[source]¶
Compute the normalization stats and cache them (a usage sketch follows the Todo list below).
Todo
[ ] Does this dataset have access to the workdir?
[ ] Cacher needs to depend on any part of the config of this dataset that could impact the pixel intensity distribution.
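A short usage sketch on toy data. The parameters here are illustrative; the getitem doctest in this module uses the same call to fetch class frequencies.
>>> # Sketch: compute (or load cached) normalization / class stats on toy data
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import KWCocoVideoDataset
>>> import kwcoco
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes2-multispectral', num_frames=3)
>>> self = KWCocoVideoDataset(coco_dset, time_dims=2, window_dims=(128, 128), channels='auto')
>>> dataset_stats = self.cached_dataset_stats(num=2, num_workers=0)
>>> print(sorted(dataset_stats['class_freq']))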
- compute_dataset_stats(num=None, num_workers=0, batch_size=2, with_intensity=True, with_class=True, with_vidid=True)[source]¶
- Parameters:
num (int | None) – number of input items to compute stats for
CommandLine
xdoctest -m geowatch.tasks.fusion.datamodules.kwcoco_dataset KWCocoVideoDataset.compute_dataset_stats:2
Example
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import kwcoco
>>> dct_dset = coco_dset = kwcoco.CocoDataset.demo('vidshapes2-multispectral', num_frames=3)
>>> self = KWCocoVideoDataset(dct_dset, time_dims=2, window_dims=(256, 256), channels='auto')
>>> self.compute_dataset_stats(num_workers=2)
Example
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import kwcoco
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes2')
>>> self = KWCocoVideoDataset(coco_dset, time_dims=2, window_dims=(256, 256), channels='auto')
>>> stats = self.compute_dataset_stats()
>>> assert stats['class_freq']['star'] > 0 or stats['class_freq']['superstar'] > 0 or stats['class_freq']['eff'] > 0
>>> #assert stats['class_freq']['background'] > 0
Example
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import geowatch
>>> from geowatch.tasks.fusion import datamodules
>>> num = 1
>>> datamodule = datamodules.KWCocoVideoDataModule(
>>>     train_dataset='vidshapes-geowatch', window_dims=64, time_steps=3,
>>>     num_workers=0, batch_size=3, channels='auto',
>>>     normalize_inputs=num)
>>> datamodule.setup('fit')
>>> self = datamodule.torch_datasets['train']
>>> coco_dset = self.sampler.dset
>>> print({c.get('sensor_coarse') for c in coco_dset.images().coco_images})
>>> print({c.channels.spec for c in coco_dset.images().coco_images})
>>> num_workers = 0
>>> batch_size = 6
>>> s = (self.compute_dataset_stats(num=num))
>>> print('s = {}'.format(ub.urepr(s, nl=3)))
>>> stats1 = self.compute_dataset_stats(num=num, with_intensity=False)
>>> stats2 = self.compute_dataset_stats(num=num, with_class=False)
>>> stats3 = self.compute_dataset_stats(num=num, with_class=False, with_intensity=False)
- class geowatch.tasks.fusion.datamodules.kwcoco_dataset.MiscMixin[source]¶
Bases:
object
TODO: better groups
- reseed(rng='auto')[source]¶
Reinitialize the random number generator
Todo
HELP WANTED: Lack of determinism likely comes from this module and the order in which it gives data to predict. It would be very nice if we could fix that.
- property coco_dset¶
- make_loader(subset=None, batch_size=1, num_workers=0, shuffle=False, pin_memory=False, collate_fn='identity')[source]¶
Use this to make the dataloader so that we ensure the correct worker init function is used.
- Parameters:
subset (None | Dataset) – if specified, the loader is made for this dataset instead of self.
collate_fn (callable | str) – Can be 'identity' or 'stack' or a callable. The normal torch default is 'stack', but for heterogeneous batch item support, we default to 'identity' (see the note after the example below).
Example
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import kwcoco
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes2-multispectral', num_frames=5)
>>> self = KWCocoVideoDataset(coco_dset, time_dims=3, window_dims=(530, 610), channels='auto')
>>> loader = self.make_loader(batch_size=2)
>>> batch = next(iter(loader))
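Because the default collate_fn is 'identity', the batch above is simply a list of item dictionaries rather than stacked tensors; conceptually the collation step is a no-op, as the hedged sketch below illustrates (the helper name is hypothetical):
>>> # Sketch (hypothetical): what 'identity' collation amounts to
>>> def identity_collate(items):
>>>     return list(items)   # leave heterogeneous items as-is; the model handles batching
>>> print(type(batch))       # continuing from the example above: a plain list of items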
- class geowatch.tasks.fusion.datamodules.kwcoco_dataset.BackwardCompatMixin[source]¶
Bases:
object
Backwards compatibility for modified properties. (These may eventually be deprecated).
- property new_sample_grid¶
- class geowatch.tasks.fusion.datamodules.kwcoco_dataset.KWCocoVideoDataset(sampler, mode='fit', test_with_annot_info=False, autobuild=True, **kwargs)[source]¶
Bases:
Dataset, GetItemMixin, BalanceMixin, PreprocessMixin, IntrospectMixin, MiscMixin, SpacetimeAugmentMixin, BackwardCompatMixin, SMARTDataMixin
Accepted keyword arguments are specified in KWCocoVideoDatasetConfig.
Example
>>> # Native Data Sampling
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import ndsampler
>>> import kwcoco
>>> import geowatch
>>> coco_dset = geowatch.coerce_kwcoco('geowatch-multisensor-msi', geodata=True)
>>> print({c.get('sensor_coarse') for c in coco_dset.images().coco_images})
>>> print({c.channels.spec for c in coco_dset.images().coco_images})
>>> sampler = ndsampler.CocoSampler(coco_dset)
>>> self = KWCocoVideoDataset(sampler, time_dims=4, window_dims=(100, 200),
>>>                           input_space_scale='native',
>>>                           window_space_scale='0.05GSD',
>>>                           output_space_scale='native',
>>>                           channels='auto',
>>>                           )
>>> self.disable_augmenter = True
>>> target = self.sample_grid['targets'][self.sample_grid['positives_indexes'][3]]
>>> item = self[target]
>>> canvas = self.draw_item(item, overlay_on_image=0, rescale=0, max_channels=3)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(canvas)
>>> kwplot.show_if_requested()
Example
>>> # Target GSD Data Sampling
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import ndsampler
>>> import kwcoco
>>> import geowatch
>>> coco_dset = geowatch.coerce_kwcoco('geowatch', geodata=True)
>>> print({c.get('sensor_coarse') for c in coco_dset.images().coco_images})
>>> print({c.channels.spec for c in coco_dset.images().coco_images})
>>> sampler = ndsampler.CocoSampler(coco_dset)
>>> self = KWCocoVideoDataset(sampler, window_dims=(100, 100), time_dims=5,
>>>                           input_space_scale='0.35GSD',
>>>                           window_space_scale='0.7GSD',
>>>                           output_space_scale='0.2GSD',
>>>                           channels='auto',
>>>                           )
>>> self.disable_augmenter = True
>>> index = self.sample_grid['targets'][self.sample_grid['positives_indexes'][3]]
>>> Box = kwimage.Box
>>> index['space_slice'] = Box.from_slice(index['space_slice']).translate((30, 0)).quantize().to_slice()
>>> item = self[index]
>>> #print('item summary: ' + ub.urepr(self.summarize_item(item), nl=3))
>>> canvas = self.draw_item(item, overlay_on_image=1, rescale=0, max_channels=3)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(canvas)
>>> kwplot.show_if_requested()
- Parameters:
sampler (kwcoco.CocoDataset | ndsampler.CocoSampler) – kwcoco dataset
mode (str) – fit or predict
autobuild (bool) – if False, defer potentially expensive initialization. In this case the user must call ._init() (see the sketch after this parameter list).
**kwargs – see KWCocoVideoDatasetConfig for valid options; these options will be stored in the .config attribute.
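A hedged sketch of the deferred-initialization flow implied by autobuild (the constructor values are illustrative):
>>> # Sketch: skip expensive setup at construction time, then finish it explicitly
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import KWCocoVideoDataset
>>> import kwcoco
>>> coco_dset = kwcoco.CocoDataset.demo('vidshapes1', num_frames=10)
>>> self = KWCocoVideoDataset(coco_dset, time_dims=2, window_dims=(128, 128),
>>>                           channels='r|g|b', autobuild=False)
>>> self._init()     # builds the sample grid and other deferred state
>>> item = self[0]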
- geowatch.tasks.fusion.datamodules.kwcoco_dataset.more_demos()[source]¶
CommandLine
USE_RTREE=1 DVC_DPATH=1 XDEV_PROFILE=1 xdoctest -m geowatch.tasks.fusion.datamodules.kwcoco_dataset more_demos:0
USE_RTREE=0 DVC_DPATH=1 XDEV_PROFILE=1 xdoctest -m geowatch.tasks.fusion.datamodules.kwcoco_dataset more_demos:0
Example
>>> # xdoctest: +REQUIRES(env:DVC_DPATH)
>>> # Demo with real data
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import geowatch
>>> import kwcoco
>>> dvc_dpath = geowatch.find_dvc_dpath(tags='phase2_data', hardware='auto')
>>> coco_fpath = dvc_dpath / 'Drop6/data_vali_split1.kwcoco.zip'
>>> coco_dset = kwcoco.CocoDataset(coco_fpath)
>>> ##'red|green|blue',
>>> self = KWCocoVideoDataset(
>>>     coco_dset,
>>>     time_dims=7, window_dims=(196, 196),
>>>     window_overlap=0,
>>>     channels="(S2,L8):blue|green|red|nir",
>>>     input_space_scale='3.3GSD',
>>>     window_space_scale='3.3GSD',
>>>     output_space_scale='1GSD',
>>>     prenormalize_inputs=True,
>>>     #normalize_peritem='nir',
>>>     dist_weights=0,
>>>     quality_threshold=0,
>>>     neg_to_pos_ratio=0, time_sampling='soft2',
>>> )
>>> self.requested_tasks['change'] = 1
>>> self.requested_tasks['saliency'] = 1
>>> self.requested_tasks['class'] = 0
>>> self.requested_tasks['boxes'] = 1
>>> index = self.sample_grid['targets'][self.sample_grid['positives_indexes'][3]]
>>> index['allow_augment'] = False
>>> item = self[index]
>>> target = item['target']
>>> #for idx in range(100):
... #    self[idx]
>>> print('item summary: ' + ub.urepr(self.summarize_item(item), nl=3))
>>> # xdoctest: +REQUIRES(--show)
>>> canvas = self.draw_item(item, max_channels=10, overlay_on_image=0, rescale=1)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(canvas, fnum=1)
>>> kwplot.show_if_requested()
Example
>>> # xdoctest: +REQUIRES(env:DVC_DPATH)
>>> # This shows how you can use the dataloader to sample an arbitrary
>>> # spacetime volume.
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import geowatch
>>> import kwcoco
>>> dvc_dpath = geowatch.find_dvc_dpath(tags='phase2_data', hardware='auto')
>>> #coco_fpath = dvc_dpath / 'Drop4-BAS/data_vali.kwcoco.json'
>>> coco_fpath = dvc_dpath / 'Drop6/data_vali_split1.kwcoco.zip'
>>> coco_dset = kwcoco.CocoDataset(coco_fpath)
>>> ##'red|green|blue',
>>> self = KWCocoVideoDataset(
>>>     coco_dset,
>>>     time_dims=7, window_dims=(196, 196),
>>>     window_overlap=0,
>>>     channels="(S2,L8):blue|green|red|nir",
>>>     input_space_scale='3.3GSD',
>>>     window_space_scale='3.3GSD',
>>>     output_space_scale='1GSD',
>>>     #normalize_peritem='nir',
>>>     dist_weights=0,
>>>     quality_threshold=0,
>>>     neg_to_pos_ratio=0, time_sampling='soft2',
>>> )
>>> self.requested_tasks['change'] = 1
>>> self.requested_tasks['saliency'] = 1
>>> self.requested_tasks['class'] = 0
>>> self.requested_tasks['boxes'] = 1
>>> target = {
>>>     'video_id': 3,
>>>     'gids': [529, 555, 607, 697, 719, 730, 768],
>>>     'main_idx': 3,
>>>     'space_slice': (slice(0, 65, None), slice(130, 195, None)),
>>> }
>>> item = self[target]
Example
>>> # xdoctest: +REQUIRES(env:DVC_DPATH)
>>> # Tests the hard negative sampling
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import * # NOQA
>>> import geowatch
>>> import kwcoco
>>> dvc_dpath = geowatch.find_dvc_dpath(tags='phase2_data', hardware='auto')
>>> coco_fpath = dvc_dpath / 'Drop6-MeanYear10GSD/data.kwcoco.zip'
>>> coco_dset = kwcoco.CocoDataset(coco_fpath)
>>> ##'red|green|blue',
>>> self = KWCocoVideoDataset(
>>>     coco_dset,
>>>     time_dims=5, window_dims=(196, 196),
>>>     window_overlap=0,
>>>     channels="(S2,L8):blue|green|red",
>>>     fixed_resolution='10GSD',
>>>     normalize_peritem=True,
>>>     use_grid_negatives='cleared',
>>>     use_grid_positives=False,
>>>     use_centered_positives=True,
>>>     time_kernel='(-2y,-1y,0,1y,2y)',
>>> )
>>> self.requested_tasks['change'] = 1
>>> self.requested_tasks['saliency'] = 1
>>> self.requested_tasks['class'] = 0
>>> self.requested_tasks['boxes'] = 1
>>> # Check that all of the negative regions are from cleared videos
>>> videos = self.sampler.dset.videos()
>>> vidid_to_cleared = ub.udict(ub.dzip(videos.lookup('id'), videos.lookup('cleared', False)))
>>> assert self.config['use_grid_negatives'] == 'cleared'
>>> positive_idxs = self.sample_grid['positives_indexes']
>>> negative_idxs = self.sample_grid['negatives_indexes']
>>> targets = self.sample_grid['targets']
>>> negative_video_ids = {targets[x]['video_id'] for x in negative_idxs}
>>> positive_video_ids = {targets[x]['video_id'] for x in positive_idxs}
>>> assert all(vidid_to_cleared.subdict(negative_video_ids).values())
>>> index = 0
>>> item = self[index]
>>> target = item['target']
>>> print('item summary: ' + ub.urepr(self.summarize_item(item), nl=3))
>>> # xdoctest: +REQUIRES(--show)
>>> canvas = self.draw_item(item, max_channels=10, overlay_on_image=0, rescale=1)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(canvas, fnum=1)
>>> kwplot.show_if_requested()
- exception geowatch.tasks.fusion.datamodules.kwcoco_dataset.FailedSample[source]¶
Bases:
Exception
Used to indicate that a sample should be skipped.
- class geowatch.tasks.fusion.datamodules.kwcoco_dataset.Modality(sensor: str, channels: str, domain: str)[source]¶
Bases:
NamedTuple
A modality consists of a domain, a sensor, and a FusedChannelSpec
Create new instance of Modality(sensor, channels, domain)
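A small illustrative construction (the sensor, channel, and domain values here are hypothetical):
>>> # Sketch: a Modality is just a NamedTuple of (sensor, channels, domain)
>>> from geowatch.tasks.fusion.datamodules.kwcoco_dataset import Modality
>>> mod = Modality(sensor='S2', channels='red|green|blue', domain='0')
>>> print(mod.sensor, mod.channels, mod.domain)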