geowatch.tasks.metrics.merge_iarpa_metrics module
Code to consolidate and merge IARPA results across regions.
- class geowatch.tasks.metrics.merge_iarpa_metrics.RegionResult(region_id: str, region_model: Dict, site_models: List[Dict], bas_dpath: ubelt.util_path.Path | None = None, sc_dpath: ubelt.util_path.Path | None = None, unbounded_site_status: Literal['completed', 'partial', 'overall'] | None = None)
Bases: object
- classmethod from_dpath_and_anns_root(region_dpath, true_site_dpath, true_region_dpath, unbounded_site_status='overall')
- property bas_df
- index:
region_id, rho, tau
- columns:
same as merge_bas_metrics_results
- property site_ids: List[str]
There are a few possible sets of sites it would make sense to return here:
- all gt sites
- “eligible” gt sites that could be matched against, i.e. with status == “predicted*”. This depends on the temporal_unbounded handling choice of completed, partial, or overall.
- “matched” gt sites with at least 1 observation matched to at least 1 observation in a proposed site.
Currently we are returning “matched” for consistency with the metrics framework, but we should consider trying “eligible” to decouple BAS and SC metrics; i.e. it would no longer be possible to do worse on SC by doing better on BAS.
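The three candidate sets can be sketched on toy data as follows. This is only an illustration: the record fields (`status`, `n_matched_obs`) and the status values are hypothetical stand-ins, not the real site-model schema or the property's actual implementation.

```python
# Hypothetical ground-truth site records. Field names and status values are
# illustrative assumptions, not the real site-model schema.
gt_sites = [
    {"site_id": "R1_0001", "status": "predicted_a", "n_matched_obs": 3},
    {"site_id": "R1_0002", "status": "predicted_b", "n_matched_obs": 0},
    {"site_id": "R1_0003", "status": "other", "n_matched_obs": 0},
]

# 1. All gt sites.
all_ids = [s["site_id"] for s in gt_sites]

# 2. "Eligible" gt sites: status matches the pattern "predicted*".
eligible_ids = [
    s["site_id"] for s in gt_sites if s["status"].startswith("predicted")
]

# 3. "Matched" gt sites: at least one observation matched to a proposal.
matched_ids = [s["site_id"] for s in gt_sites if s["n_matched_obs"] >= 1]

print(all_ids, eligible_ids, matched_ids)
```

In this toy case the "matched" set is a strict subset of the "eligible" set, which is why a change in BAS matching can change the set of sites that SC is scored on.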
- property sc_df
Notes
- index:
region_id, site_id, [predicted] phase (w/o No Activity), incl. special site_id __avg__:
  F1: micro (or option for macro)
  TIoU: ~micro over all truth-prediction pairs, skipping undetected truth sites
  TE(p): micro
  confusion: micro
- columns:
F1, TIoU, TE, TEp, [true] phase (incl. No Activity)
The confusion matrix and F1 scores apparently ignore subsites, so we must do the same: https://smartgitlab.com/TE/metrics-and-test-framework/-/issues/24
Example
>>> from sklearn.metrics import f1_score, confusion_matrix
>>> f1 = f1_score(['a,a', 'a'], ['a,a', 'b'], labels=['a', 'b'],
>>>               average=None)
>>> confusion_matrix(['a,a', 'a'], ['a,a', 'b'], labels=['a', 'b'])
array([[0, 1],
       [0, 0]])
- property sc_te_df
More detailed temporal error results; main value is included in sc_df.
Notes
- index:
region_id, (site | __micro__), (ac | ap), phase
- columns:
mean days (all detections) <-- main value
std days (all)
mean days (early detections)
std days (early)
mean days (late detections)
std days (late)
all detections
early
late
perfect
missing proposals
missing truth sites
- property sc_phasetable
Currently used only for Gantt chart viz. Could be used to recalculate all SC metrics for micro-average.
This excludes gt sites with no matched proposals and proposals with no matched gt sites.
- geowatch.tasks.metrics.merge_iarpa_metrics.merge_bas_metrics_results(bas_results: List[RegionResult], fbetas: List[float])
Merge BAS results and return as a pd.DataFrame
with MultiIndex([‘region_id’, ‘rho’, ‘tau’]) incl. special region_ids __micro__, __macro__
and columns
min_area int64
tp sites int64
tp exact int64
tp under int64
tp under (IoU) int64
tp under (IoT) int64
tp over int64
fp sites int64
fp area float64
ffpa float64
proposal area float64
fpa float64
fn sites int64
truth annotations int64
truth sites int64
proposed annotations int64
proposed sites int64
total sites int64
truth slices int64
proposed slices int64
precision float64
recall (PD) float64
F1 float64
spatial FAR float64
temporal FAR float64
images FAR float64
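To illustrate how the special __micro__ and __macro__ rows can differ, here is a minimal sketch on toy counts at a single (rho, tau) operating point. It assumes the usual definitions (micro pools the counts across regions before scoring, macro averages the per-region scores); the helper `prf` and the numbers are hypothetical, and this is not necessarily how merge_bas_metrics_results computes them.

```python
import pandas as pd

# Toy per-region site counts at one (rho, tau) operating point.
counts = pd.DataFrame(
    {"tp sites": [8, 1], "fp sites": [2, 1], "fn sites": [2, 3]},
    index=pd.Index(["R1", "R2"], name="region_id"),
)


def prf(df):
    """Hypothetical helper: precision, recall and F1 from tp/fp/fn counts."""
    precision = df["tp sites"] / (df["tp sites"] + df["fp sites"])
    recall = df["tp sites"] / (df["tp sites"] + df["fn sites"])
    f1 = 2 * precision * recall / (precision + recall)
    return pd.DataFrame(
        {"precision": precision, "recall (PD)": recall, "F1": f1}
    )


per_region = prf(counts)

# Macro (assumed): average the per-region scores.
macro = per_region.mean().rename("__macro__")

# Micro (assumed): pool the counts over regions first, then score once.
micro = prf(counts.sum().to_frame().T).iloc[0].rename("__micro__")

merged = pd.concat([per_region, macro.to_frame().T, micro.to_frame().T])
merged.index.name = "region_id"
print(merged)
```

With these numbers the small region R2 pulls the macro F1 well below the micro F1, because macro gives every region equal weight regardless of how many sites it contains.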
- geowatch.tasks.metrics.merge_iarpa_metrics.merge_sc_metrics_results(sc_results: List[RegionResult])
Merge SC results and return as a pd.DataFrame
with MultiIndex([‘region_id’, ‘phase’]) incl. special region_ids
  __micro__: micro-avg over regions (normalize by n_sites per region)
  __macro__: macro-avg over regions
In neither case do we weight by the length/size of individual sites.
- and columns:
F1 float64
TIoU float64
TE float64
TEp float64
No Activity int64
Site Preparation int64
Active Construction int64
Post Construction int64
Notes
For confusion matrix, rows are pred and cols are true.
Confusion matrix is never normalized, so macro == micro.
F1 is only defined for SP (Site Preparation) and AC (Active Construction).
TE is temporal error of current phase.
TEp is temporal error of next predicted phase.
Merged TE(p) is RMSE, so nonnegative, but regions’ TE(p) can be negative.
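The sign behavior of merged TE(p) can be seen in a small worked sketch. The per-region values below are made up, and the RMSE here is unweighted; the actual merge may weight regions differently (for example by number of sites in the __micro__ case).

```python
import math

# Hypothetical signed temporal errors in days, one value per region.
region_te = {"R1": -3.0, "R2": 4.0}


def rmse(values):
    """Root-mean-square of a collection of signed errors (unweighted)."""
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))


merged_te = rmse(region_te.values())
print(merged_te)  # sqrt((9 + 16) / 2)
```

Because each error is squared before averaging, the merged value cannot be negative even though R1's error is, and the direction (early vs. late) of the per-region errors is lost in the merged row.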
- geowatch.tasks.metrics.merge_iarpa_metrics.merge_metrics_results(region_dpaths, true_site_dpath, true_region_dpath, fbetas)
Merge metrics results from multiple regions.
- Parameters:
region_dpaths – List of directories, each containing the subdir bas/ and, optionally, phase_activity/
true_site_dpath, true_region_dpath – Paths to the GT annotations repo
merge_dpath – Directory to save merged results.
- Returns:
(bas_df, sc_df): two pd.DataFrames, which are saved as {out_dpath}/(bas|sc)_df.pkl
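Since the merged tables are saved as pickles, they can be loaded back with pandas. The sketch below writes a tiny stand-in table to a temporary directory and reads it again; the table contents and the temporary `out_dpath` are placeholders, and only the file name pattern and index names come from the description above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for a merged BAS table; the real one has many more columns.
index = pd.MultiIndex.from_tuples(
    [("R1", 0.5, 0.2), ("__micro__", 0.5, 0.2)],
    names=["region_id", "rho", "tau"],
)
toy_bas_df = pd.DataFrame({"F1": [0.8, 0.7]}, index=index)

with tempfile.TemporaryDirectory() as tmp:
    out_dpath = Path(tmp)  # placeholder for the real output directory
    toy_bas_df.to_pickle(out_dpath / "bas_df.pkl")
    # Load the table back, as one would for a previously merged result.
    bas_df = pd.read_pickle(out_dpath / "bas_df.pkl")

# Select the micro-averaged row at one (rho, tau) operating point.
micro_f1 = bas_df.loc[("__micro__", 0.5, 0.2), "F1"]
print(micro_f1)
```

Pickle files can execute arbitrary code when loaded, so pd.read_pickle should only be used on files from a trusted source.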