geowatch.tasks.metrics.merge_iarpa_metrics module¶

Code to consolidate and merge IARPA results across regions.

class geowatch.tasks.metrics.merge_iarpa_metrics.RegionResult(region_id: str, region_model: Dict, site_models: List[Dict], bas_dpath: ubelt.util_path.Path | None = None, sc_dpath: ubelt.util_path.Path | None = None, unbounded_site_status: Literal['completed', 'partial', 'overall'] | None = None)[source]¶

Bases: object

region_id: str¶

region_model: Dict¶

site_models: List[Dict]¶

bas_dpath: Path | None = None¶

sc_dpath: Path | None = None¶

unbounded_site_status: Literal['completed', 'partial', 'overall'] | None = None¶

classmethod from_dpath_and_anns_root(region_dpath, true_site_dpath, true_region_dpath, unbounded_site_status='overall')[source]¶

property bas_df¶

index:: region_id, rho, tau
columns:: same as merge_bas_metrics_results

property site_ids: List[str]¶

There are a few possible sets of sites it would make sense to return here:

- all gt sites
- “eligible” gt sites that could be matched against, i.e. with status == “predicted*”. This depends on the temporal_unbounded handling choice of completed, partial, or overall.
- “matched” gt sites with at least 1 observation matched to at least 1 observation in a proposed site.

Currently we are returning “matched” for consistency with the metrics framework, but we should consider trying “eligible” to decouple BAS and SC metrics; i.e. it would no longer be possible to do worse on SC by doing better on BAS.

property sc_df¶

Notes

index:: region_id, site_id, [predicted] phase (w/o No Activity), incl. special site_id __avg__, averaged as follows:

- F1: micro (or option for macro)
- TIoU: ~micro over all truth-prediction pairs, skipping undetected truth sites
- TE(p): micro
- confusion: micro

columns:: F1, TIoU, TE, TEp, [true] phase (incl. No Activity)

The confusion matrix and F1 scores apparently ignore subsites, so we must do the same: https://smartgitlab.com/TE/metrics-and-test-framework/-/issues/24

Example

>>> from sklearn.metrics import f1_score, confusion_matrix
>>> f1 = f1_score(['a,a', 'a'], ['a,a', 'b'], labels=['a', 'b'],
...               average=None)
>>> confusion_matrix(['a,a', 'a'], ['a,a', 'b'], labels=['a', 'b'])
array([[0, 1],
       [0, 0]])

property sc_te_df¶

More detailed temporal error results; main value is included in sc_df.

Notes

index:: region_id, (site | __micro__), (ac | ap), phase
columns::

- mean days (all detections) (main value)
- std days (all)
- mean days (early detections)
- std days (early)
- mean days (late detections)
- std days (late)
- all detections
- early
- late
- perfect
- missing proposals
- missing truth sites

property sc_phasetable¶

Currently used only for Gantt chart viz. Could be used to recalculate all SC metrics for micro-average.

This excludes gt sites with no matched proposals and proposals with no matched gt sites.

geowatch.tasks.metrics.merge_iarpa_metrics.merge_bas_metrics_results(bas_results: List[RegionResult], fbetas: List[float])[source]¶

Merge BAS results and return as a pd.DataFrame

with MultiIndex([‘region_id’, ‘rho’, ‘tau’]) incl. special region_ids __micro__, __macro__

and columns

min_area                  int64
tp sites                  int64
tp exact                  int64
tp under                  int64
tp under (IoU)            int64
tp under (IoT)            int64
tp over                   int64
fp sites                  int64
fp area                 float64
ffpa                    float64
proposal area           float64
fpa                     float64
fn sites                  int64
truth annotations         int64
truth sites               int64
proposed annotations      int64
proposed sites            int64
total sites               int64
truth slices              int64
proposed slices           int64
precision               float64
recall (PD)             float64
F1                      float64
spatial FAR             float64
temporal FAR            float64
images FAR              float64
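The difference between the special `__micro__` and `__macro__` rows can be illustrated with a minimal sketch. It assumes the usual definitions (micro pools the counts over regions before scoring, macro scores each region and then takes an unweighted mean); the module's actual implementation is not shown on this page and may differ. The region names, counts, and the helper `prf` are hypothetical.

```python
import pandas as pd

# Hypothetical per-region counts for one (rho, tau) setting; values are made up.
index = pd.MultiIndex.from_tuples(
    [("R1", 0.5, 0.2), ("R2", 0.5, 0.2)],
    names=["region_id", "rho", "tau"],
)
df = pd.DataFrame(
    {"tp sites": [8, 1], "fp sites": [2, 1], "fn sites": [2, 3]},
    index=index,
)


def prf(row):
    """Hypothetical helper: precision, recall and F1 from site counts."""
    tp, fp, fn = row["tp sites"], row["fp sites"], row["fn sites"]
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return pd.Series({"precision": precision, "recall (PD)": recall, "F1": f1})


# Score each region separately.
per_region = df.apply(prf, axis=1)

# Micro (assumed): sum the counts over regions per (rho, tau), then score once.
micro = df.groupby(level=["rho", "tau"]).sum().apply(prf, axis=1)

# Macro (assumed): unweighted mean of the per-region scores per (rho, tau).
macro = per_region.groupby(level=["rho", "tau"]).mean()

print(micro)
print(macro)
```

With these made-up counts the larger region dominates the micro row, while the macro row gives both regions equal weight.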

geowatch.tasks.metrics.merge_iarpa_metrics.merge_sc_metrics_results(sc_results: List[RegionResult])[source]¶

Merge SC results and return as a pd.DataFrame

with MultiIndex([‘region_id’, ‘phase’]) incl. special region_ids:

- __micro__: micro-avg over regions (normalize by n_sites per region)
- __macro__: macro-avg over regions

In neither case do we weight by the length/size of individual sites.

and columns

F1                      float64
TIoU                    float64
TE                      float64
TEp                     float64
No Activity               int64
Site Preparation          int64
Active Construction       int64
Post Construction         int64
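One possible reading of the two special rows is a site-count-weighted mean for `__micro__` and a plain mean for `__macro__`. The sketch below illustrates only that reading; it is an assumption, not the module's confirmed behavior, and the scores and site counts are made up.

```python
import numpy as np

# Hypothetical per-region TIoU scores and number of sites per region.
region_tiou = np.array([0.8, 0.4])
n_sites = np.array([4, 1])

# Assumed micro: each region contributes in proportion to its site count.
micro_tiou = float(np.average(region_tiou, weights=n_sites))

# Assumed macro: every region counts equally.
macro_tiou = float(region_tiou.mean())

print(micro_tiou, macro_tiou)
```

In both cases each site counts once, regardless of its duration or area, which matches the statement that sites are not weighted by length/size.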

Notes

For the confusion matrix, rows are pred and cols are true.
The confusion matrix is never normalized, so macro == micro.
F1 is only defined for SP (Site Preparation) and AC (Active Construction).
TE is temporal error of current phase.
TEp is temporal error of next predicted phase.
Merged TE(p) is RMSE, so nonnegative, but regions’ TE(p) can be negative.
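The notes on the merged values can be sketched as follows. This assumes the RMSE is taken over the signed per-region errors and that merged confusion matrices are plain sums of raw counts; both are assumptions for illustration, and all numbers are made up.

```python
import numpy as np

# Hypothetical signed per-region temporal errors in days (early < 0 < late).
region_te = np.array([-3.0, 4.0])

# RMSE squares each error first, so the merged value cannot be negative
# even though the individual inputs can be.
merged_te = float(np.sqrt(np.mean(region_te ** 2)))

# Hypothetical unnormalized confusion matrices (rows: pred, cols: true).
cm_r1 = np.array([[5, 1], [0, 2]])
cm_r2 = np.array([[1, 0], [2, 3]])

# Raw counts merge by element-wise addition, so no weighting scheme is involved.
merged_cm = cm_r1 + cm_r2

print(merged_te)
print(merged_cm)
```

Because the counts are never divided by a per-region total, there is no distinction between micro and macro for the confusion matrix under this reading.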

geowatch.tasks.metrics.merge_iarpa_metrics.merge_metrics_results(region_dpaths, true_site_dpath, true_region_dpath, fbetas)[source]¶

Merge metrics results from multiple regions.

Parameters:

region_dpaths – List of directories containing the subdirs bas/ and, optionally, phase_activity/
true_site_dpath, true_region_dpath – Path to GT annotations repo
merge_dpath – Directory to save merged results.

Returns:

(bas_df, sc_df) Two pd.DataFrames that are saved as {out_dpath}/(bas|sc)_df.pkl
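Since the results are stored as pickled DataFrames, they can be read back with `pd.read_pickle`. The round trip below is a minimal sketch that uses a temporary directory as a stand-in for the output directory and a made-up one-row frame in place of real results.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    out_dpath = Path(tmp)  # hypothetical stand-in for the real output directory

    # Made-up stand-in for a merged BAS result.
    bas_df = pd.DataFrame(
        {"F1": [0.5]},
        index=pd.Index(["__micro__"], name="region_id"),
    )
    bas_df.to_pickle(out_dpath / "bas_df.pkl")

    # Reading the file back restores the index and dtypes.
    loaded = pd.read_pickle(out_dpath / "bas_df.pkl")

print(loaded)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.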

geowatch.tasks.metrics.merge_iarpa_metrics.iarpa_bas_color_legend()[source]¶