Roadmap¶
Features¶
[ ] Per-domain normalization: keep track of mean/std per “domain”, which can be user-defined.
[ ] Lineage tracking: design the storage format and parser.
[X] Bounding Box Head: Add support for bounding box heads and associated loss functions. Issue: !9
[ ] Multi-head objectives: Allow multiple non-weight-tied versions of the same head to have different losses. Issue: !10
[X] Balanced Sampling: Better balanced API and code structure.
[ ] Augmentation: Better augmentation API with more options (time symmetry).
[ ] Online Hard Negative Mining: Track per-item loss and update sampling probabilities to target some degree of medium to hard difficulty.
[ ] OTS Models: Off-the-shelf Model wrapper API.
[ ] Distillation: Easy distillation (i.e. student / teacher networks) by setting heatmaps from existing predictions as truth targets. Issue: !17
[X] SMQTK: Integrate with SMQTK by providing it feature descriptors derived from our network activations / heatmaps. Issue: !14
[ ] Train Monitoring: Log more weight statistics like rank, magnitude, etc. in tensorboard to better understand the training process.
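As a rough illustration of the Online Hard Negative Mining item above, the sketch below tracks a smoothed loss per item and turns it into sampling weights that peak near a target difficulty quantile. It is a minimal pure-Python sketch, not the planned implementation: the class name `LossTrackingSampler` and the `momentum`, `target_quantile`, `sharpness`, and `floor` parameters are all hypothetical, and preferring the 75th percentile over the very hardest items is an assumption (the hardest items are often label noise).

```python
import math
import random
from bisect import bisect_right


class LossTrackingSampler:
    """Hypothetical helper: keeps an exponential moving average (EMA) of the
    loss for each item and samples items whose difficulty rank is close to
    a target quantile."""

    def __init__(self, num_items, momentum=0.9, target_quantile=0.75,
                 sharpness=4.0, floor=0.05, seed=None):
        self.momentum = momentum                # EMA smoothing factor
        self.target_quantile = target_quantile  # assumed "medium to hard" target
        self.sharpness = sharpness              # how quickly weight falls off
        self.floor = floor                      # no item is ever starved
        self.ema_loss = [None] * num_items      # None means "never seen"
        self.rng = random.Random(seed)

    def update(self, index, loss):
        """Record the latest loss observed for one item."""
        prev = self.ema_loss[index]
        if prev is None:
            self.ema_loss[index] = float(loss)
        else:
            self.ema_loss[index] = (
                self.momentum * prev + (1.0 - self.momentum) * float(loss)
            )

    def weights(self):
        """Return one unnormalized sampling weight per item."""
        seen = sorted(v for v in self.ema_loss if v is not None)
        if not seen:
            return [1.0] * len(self.ema_loss)
        out = []
        for v in self.ema_loss:
            if v is None:
                # Unseen items get the peak weight so they are visited soon.
                out.append(self.floor + 1.0)
                continue
            # Difficulty rank in (0, 1]: fraction of seen items with loss <= v.
            rank = bisect_right(seen, v) / len(seen)
            dist = rank - self.target_quantile
            out.append(self.floor + math.exp(-self.sharpness * dist * dist))
        return out

    def sample(self, k):
        """Draw k item indices (with replacement) using the current weights."""
        population = range(len(self.ema_loss))
        return self.rng.choices(population, weights=self.weights(), k=k)
```

In a training loop, `update` would be called with the per-item loss after each batch, and `sample` (or the raw `weights`) would feed whatever sampler the dataloader uses.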
Quality of Life¶
[ ] Manual specification of input mean / std at train time.
[X] Manual specification of input mean / std at predict time.
[ ] Better checkpoint / package management CLI tools
[ ] Remove old nomenclature (which may involve swapping scriptconfig aliases with the main variable).
[ ] Refactor tracking API. It’s the odd duck; everything else follows very similar patterns.
Bugs¶
[X] Delayed Image #1 - bottom-left pixel bug
[ ] Callbacks with DDP can cause a system freeze; we can work around this by disabling our callbacks, but that results in other limitations.
Performance¶
[ ] Delayed Image #2 - memoize the optimization
[ ] JIT the network - or otherwise build an efficient inference structure
[ ] Improve augmentation efficiency - Dataloaders can be bottlenecks depending on params
[ ] NDsampler Zarr / HDF5 backend - Zarr is newer, HDF5 works similarly.
[ ] On-disk stitching. Allow predictions to be stitched into context directly on disk (perhaps using a Zarr/HDF5 container?) instead of always in memory (keep the in-memory option though).
Research¶
[ ] Design experiment to determine if continual learning helps in this context.
[ ] Design experiment to compare heterogeneous network to divided-attention network.
[ ] Reproduce and integrate ScaleMAE.
[ ] Can we find a better way to use SAM as a foundational model feature?
[ ] Support “soft” targets for instance segmentation loss.
[ ] Build new KWCoco datasets
[ ] QFabric
[ ] Black Marble
[X] Support for point-based annotations at train time. Build a loss function.
Compatibility¶
[ ] Further subdivide and sequester software dependencies (open ended).
[X] Upgrade pytorch lightning / jsonargparse to latest versions.
Documentation¶
[ ] External review / revision.
[ ] Document how to effectively use MLOps (and potentially improve on it).
- Partial Progress:
MLOps used across more projects, which serve as independent examples of its capabilities
System Design¶
[ ] Use the MLOps directory structure in smartflow. This will ultimately allow us to gain the caching advantages of mlops with the horizontal scaling of smartflow.
[ ] Ensure smartflow output can be connected to mlops aggregate.
[ ] Extend mlops to make it easier to test and evaluate ensembles.
[ ] Extend mlops with teamfeats nodes.
[ ] Smartflow tiling to split up regions, run prediction on smaller regions, and then consolidate stitching.
[ ] Better support for training on AWS / HPC systems: https://www.reddit.com/r/MachineLearning/comments/18mfi70/p_kubernetes_plugin_for_mounting_datasets_to/
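The Smartflow tiling item above (split a region, predict on the pieces, then consolidate the stitching) can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not smartflow or mlops code: `tile_region` and `stitch` are hypothetical names, tiles are axis-aligned pixel boxes, and overlapping predictions are consolidated by simple averaging.

```python
def tile_region(width, height, tile, overlap=0):
    """Yield (x0, y0, x1, y1) boxes that together cover a width x height grid.

    Hypothetical helper: neighbouring tiles share `overlap` pixels, and the
    last tile along each axis is shifted back so it ends flush with the border.
    """
    if tile <= 0 or not (0 <= overlap < tile):
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    stride = tile - overlap

    def starts(extent):
        if extent <= tile:
            return [0]
        offsets = list(range(0, extent - tile, stride))
        offsets.append(extent - tile)  # final tile touches the far edge
        return offsets

    for y0 in starts(height):
        for x0 in starts(width):
            yield (x0, y0, min(x0 + tile, width), min(y0 + tile, height))


def stitch(width, height, tile_predictions):
    """Average per-tile predictions back into one full-size grid.

    `tile_predictions` is an iterable of ((x0, y0, x1, y1), rows), where
    `rows` is a list of lists holding the prediction for that box.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for (x0, y0, x1, y1), rows in tile_predictions:
        for r, y in enumerate(range(y0, y1)):
            for c, x in enumerate(range(x0, x1)):
                total[y][x] += rows[r][c]
                count[y][x] += 1
    # Pixels that no tile covered stay at 0.0.
    return [
        [total[y][x] / count[y][x] if count[y][x] else 0.0
         for x in range(width)]
        for y in range(height)
    ]
```

Each box could be dispatched as an independent job, which is where horizontal scaling would come in; only the final `stitch` step needs to see every tile.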
Algorithmic Exploration¶
[ ] Improve High Resolution “Tracking” (Polygon Extraction / Classification).
[ ] Measure uncertainty.
[ ] Recurrent transformers that can look at previous predictions in a different context, and then update the predictions.
[ ] Add decoder to predict unobserved events.
User Interface¶
[ ] Lightning Extension that replaces the rich progress bar with a textual TUI. The idea is that the engineer can manually tweak hyperparameters, or request status / visualizations on the fly.