geowatch.mlops.aggregate module

Loads results from an evaluation, aggregates them, and reports text or visual results.

This is the main entry point for the mlops.aggregate CLI. It contains the logic to consolidate rows of results into macro averages and to compute a parameter hashid (param_hashid) for each row. It also contains the basic text report logic (although maybe that should be moved out?). It relies on several other files in this directory:

  • aggregate_loader.py - handles the loading of individual rows from mlops output

  • aggregate_plots.py - handles plotting relationships between parameters and metrics

  • smart_global_helper.py - quick-and-dirty project-specific logic that ideally won’t get in the way of general use cases, but should eventually be factored out.

Todo

  • [ ] The package_fpath (i.e. model_cols) reporting uses heuristics to shorten the path to the package, but we shouldn’t do this. We should add a new column that indicates it is a shortened name for the model; otherwise it is confusing.

class geowatch.mlops.aggregate.AggregateLoader(*args, **kwargs)[source]

Bases: DataConfig

Base config that will be mixed into the AggregateEvluationConfig. This config only defines the parts related to constructing the Aggregator objects (i.e. loading the tables).

Valid options: []

Parameters:
  • *args – positional arguments for this data config

  • **kwargs – keyword arguments for this data config

coerce_aggregators()[source]
default = {'cache_resolved_results': <Value(True)>, 'display_metric_cols': <Value('auto')>, 'eval_nodes': <Value(None)>, 'io_workers': <Value('avail')>, 'pipeline': <Value('joint_bas_sc')>, 'primary_metric_cols': <Value('auto')>, 'target': <Value(None)>}
normalize()
class geowatch.mlops.aggregate.AggregateEvluationConfig(*args, **kwargs)[source]

Bases: AggregateLoader

Aggregates results from multiple DAG evaluations.

Valid options: []

Parameters:
  • *args – positional arguments for this data config

  • **kwargs – keyword arguments for this data config

coerce_aggregators()[source]
default = {'cache_resolved_results': <Value(True)>, 'custom_query': <Value(None)>, 'display_metric_cols': <Value('auto')>, 'embed': <Value(False)>, 'eval_nodes': <Value(None)>, 'export_tables': <Value(False)>, 'inspect': <Value(None)>, 'io_workers': <Value('avail')>, 'output_dpath': <Value('./aggregate')>, 'pipeline': <Value('joint_bas_sc')>, 'plot_params': <Value(False)>, 'primary_metric_cols': <Value('auto')>, 'query': <Value(None)>, 'resource_report': <Value(False)>, 'rois': <Value('auto')>, 'snapshot': <Value(False)>, 'stdout_report': <Value(True)>, 'symlink_results': <Value(False)>, 'target': <Value(None)>}
main(**kwargs)

Aggregate entry point.

Loads results for each evaluation node_type, constructs aggregator objects, and then executes user-specified commands, which may include filtering, macro-averaging, reporting, plotting, etc.

normalize()
geowatch.mlops.aggregate.main(cmdline=True, **kwargs)[source]

Aggregate entry point.

Loads results for each evaluation node_type, constructs aggregator objects, and then executes user-specified commands, which may include filtering, macro-averaging, reporting, plotting, etc.

geowatch.mlops.aggregate.run_aggregate(config)[source]
class geowatch.mlops.aggregate.TopResultsReport(region_id_to_summary, top_param_lut)[source]

Bases: object

Object to hold the result of Aggregator.report_best().

class geowatch.mlops.aggregate.AggregatorAnalysisMixin[source]

Bases: object

Analysis methods for Aggregator.

macro_analysis()[source]
varied_param_counts(min_variations=2, dropna=False)[source]
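As an illustration of what varied_param_counts computes, here is a minimal stdlib sketch; the row and column names are invented for the example, and the real method operates on the aggregator’s pandas tables rather than plain dicts:

```python
# Hypothetical stdlib sketch of varied_param_counts: count distinct
# values per parameter and keep parameters that vary at least
# `min_variations` times. None stands in for NaN in this sketch.
def varied_param_counts(rows, min_variations=2, dropna=False):
    counts = {}
    keys = set().union(*(row.keys() for row in rows))
    for key in sorted(keys):
        values = [row.get(key) for row in rows]
        if dropna:
            values = [v for v in values if v is not None]
        distinct = set(values)
        if len(distinct) >= min_variations:
            counts[key] = len(distinct)
    return counts

rows = [
    {'tracker.thresh': 0.1, 'model': 'A'},
    {'tracker.thresh': 0.2, 'model': 'A'},
    {'tracker.thresh': 0.3, 'model': 'B'},
]
print(varied_param_counts(rows))  # parameters with >= 2 distinct values
```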
dump_varied_parameter_report()[source]

Write the varied parameter report to disk

varied_parameter_report(concise=True, concise_value_char_threshold=80)[source]

Dump a machine and human readable varied parameter report.

Parameters:

concise (bool) – if True, sacrifice row homogeneity for shorter encodings

analyze(metrics_of_interest=None)[source]

Does a stats analysis on each varied parameter. Note this makes independence assumptions that may not hold in general.

report_best(top_k=100, shorten=True, per_group=None, verbose=1, reference_region=None, print_models=False, concise=False, show_csv=False, grouptop=None) → TopResultsReport[source]

Report the top k pointwise results for each region / macro-region.

Note

Results are chosen per-region independently. To get comparable results for a specific set of parameters, choose a reference_region, which could be a macro region.

Parameters:
  • top_k (int) – number of top results for each region

  • shorten (bool) – if True, shorten the columns by removing non-ambiguous prefixes with respect to a known node eval_type.

  • concise (bool) – if True, remove certain columns that communicate context for a more concise report.

  • reference_region (str | None) – if specified, filter the top results in all other regions to only be with respect to the top results in this region (or macro region). Can be set to the special key “final” to choose the last region, which is typically a macro region.

  • show_csv (bool) – also print as a CSV suitable for copy/paste into Google Sheets.

  • grouptop (str | List[str]) – if specified, a list of columns that are “suboptimized”, meaning that we group the table by these columns (e.g. the model column) and then only consider the best-scoring results within each group. This can help remove clutter when attempting to choose between values of a specific parameter.
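The grouptop behavior can be sketched with plain Python; the rows and column names here are invented for the example, and the real implementation works on the aggregator’s pandas tables:

```python
# Sketch of the "grouptop" idea: group rows by a column (e.g. the model
# column) and keep only the best-scoring row per group before ranking.
def grouptop_filter(rows, group_col, metric_col):
    best = {}
    for row in rows:
        key = row[group_col]
        if key not in best or row[metric_col] > best[key][metric_col]:
            best[key] = row
    return list(best.values())

rows = [
    {'model': 'A', 'f1': 0.6},
    {'model': 'A', 'f1': 0.8},
    {'model': 'B', 'f1': 0.7},
]
survivors = grouptop_filter(rows, 'model', 'f1')
# one best row per model survives
```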

Todo

This might need to become a class that builds the TopResultsReport as it is getting somewhat complex.

Returns:

contains: region_id_to_summary (T1=Dict[str, DataFrame]):

mapping from region_id to top k results

top_param_lut (T2=Dict[str, DataFrame]):

mapping from param hash to invocation details

Return type:

TopResultsReport

Example

>>> from geowatch.mlops.aggregate import *  # NOQA
>>> agg = Aggregator.demo(rng=0, num=100).build()
>>> agg.report_best(print_models=True, top_k=3)
>>> agg.report_best(print_models=True, top_k=3, grouptop='special:model')
>>> agg.report_best(print_models=True, top_k=3, grouptop='special:model', reference_region='region1')
resource_summary_table()[source]

Summarize resource usage of the pipeline.

resource_summary_table_friendly()[source]
report_resources()[source]
make_summary_analysis(config)[source]

Builds symlinks to results node paths based on region and param hashids.

build_plotter(rois=None, plot_config=None)[source]
plot_all(rois=None, plot_config=None)[source]
class geowatch.mlops.aggregate.Aggregator(table, output_dpath=None, node_type=None, primary_metric_cols='auto', display_metric_cols='auto', dag=None)[source]

Bases: NiceRepr, AggregatorAnalysisMixin, _AggregatorDeprecatedMixin

Stores multiple data frames that separate metrics, parameters, and other information using consistent pandas indexing. Can be filtered to comparable subsets of choice. Can also handle building macro-averaged results over different “regions” with the same parameters.

Set the config based on your problem.

Example

>>> from geowatch.mlops.aggregate import *  # NOQA
>>> agg = Aggregator.demo(rng=0, num=3).build()
>>> print(f'agg.config = {ub.urepr(agg.config, nl=1)}')
>>> print('--- The table of only metrics ---')
>>> print(agg.metrics)
>>> print('--- The table of resource utilization ---')
>>> print(agg.resources)
>>> print('--- The table of explicitly requested hyperparameters (to distinguish from defaults) ---')
>>> print(agg.requested_params)
>>> print('--- The table of resolved hyperparameters ---')
>>> print(agg.resolved_params)
>>> print('--- The table with unique indexes for each experiment ---')
>>> print(agg.index)
>>> print('--- The entire joined table ---')
>>> print(agg.table)
Parameters:
  • table (pandas.DataFrame) – a table with a specific column structure (e.g. built by the aggregate_loader). See the demo for an example. Needs more docs here.

  • output_dpath (None | PathLike) – Path where output aggregate results should be written

  • node_type (str | None) – should not need to specify this anymore. This should just be the “node” column in the table.

  • primary_metric_cols (List[str] | Literal[‘auto’]) – if “auto”, then the “node_type” must be known by the global helpers. Otherwise list the metric columns in the priority that should be used to rank the rows.

  • display_metric_cols (List[str] | Literal[‘auto’]) – if “auto”, then the “node_type” must be known by the global helpers. Otherwise list the metric columns in the order they should be displayed (after the primary metrics).

  • dag (geowatch.mlops.Pipeline) – The pipeline that the evaluation table corresponds to. Only needed if introspection is necessary. If all “auto” params are specified, this should not be needed.

classmethod demo(num=10, rng=None)[source]

Construct a demo aggregator for testing.

This gives an example of the very particular column format that is expected as input to the aggregator.

Parameters:
  • num (int) – number of rows

  • rng (int | None) – random number generator / state

Returns:

Aggregator

Example

>>> from geowatch.mlops.aggregate import *  # NOQA
>>> agg = Aggregator.demo(rng=0, num=100)
>>> print(agg.table)
>>> agg.build()
>>> agg.analyze()
>>> agg.resource_summary_table()
>>> agg.report_best()
build()[source]

Inspect the aggregator’s table and build supporting information

Returns:

returns self for method chaining

Return type:

Self

property primary_macro_region
filterto(index=None, models=None, param_hashids=None, query=None)[source]

Build a new aggregator with a subset of rows from this one.

Parameters:
  • index (List | pd.Index) – a subset of pandas row indexes to restrict to

  • models (List[str]) – list of effective model names (not paths) to restrict to.

  • param_hashids (List[str]) – list of parameter hashids to restrict to

  • query (str) – A custom query string, currently parsed by our_hack_query(), which can either be a DataFrame.query or a simple eval using df as the dataframe variable (i.e. agg.table), and should resolve to flags or indexes that indicate which rows to take. See the example for demo usage.

Returns:

A new aggregator with a subset of data

Return type:

Aggregator

Example

>>> from geowatch.mlops.aggregate import *  # NOQA
>>> agg = Aggregator.demo(rng=0, num=100)
>>> agg.build()
>>> subagg = agg.filterto(query='df["context.demo_node.uuid"].str.startswith("c")')
>>> assert len(subagg) > 0, 'query should return something'
>>> assert subagg.table['context.demo_node.uuid'].str.startswith('c').all()
>>> assert not agg.table['context.demo_node.uuid'].str.startswith('c').all()
>>> print(subagg.table['context.demo_node.uuid'])
FIXME:

On 2024-02-12 CI failed this test with: assert len(subagg) > 0, ‘query should return something’ AssertionError: query should return something. It is not clear where the non-determinism came from.

Another instance on 2024-04-19. Job log is: https://gitlab.kitware.com/computer-vision/geowatch/-/jobs/9652752

This is likely because of unseeded UUIDs, which should now be fixed.

compress(flags)[source]
property metrics
property resources
property index
property requested_params
property specified_params
property resolved_params
property default_vantage_points
build_effective_params()[source]

Consolidate / clean up / expand information

THIS COMPUTES THE param_hashid COLUMN!

The “effective params” normalize the full set of given parameters so we can compute a more consistent param_hashid. This is done by condensing paths (which is a debatable design decision) as well as by mapping non-hashable data to strings.

Populates:

  • self.hashid_to_effective_params : Dict[str, Dict[str, Any]]

  • self.mappings

  • self.effective_params
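A rough sketch of the param_hashid idea: hash the key-sorted, stringified effective parameters so that the same parameters always map to the same id regardless of ordering. This is an assumption-laden illustration, not the real implementation, which condenses paths and handles non-hashable values as described above:

```python
import hashlib

# Hypothetical sketch of deriving a param_hashid from effective params:
# normalize values to strings, sort by key so the hash is
# order-independent, then take a short hex digest.
def make_param_hashid(effective_params, digest_size=8):
    normalized = sorted((k, str(v)) for k, v in effective_params.items())
    text = repr(normalized).encode('utf-8')
    return hashlib.sha256(text).hexdigest()[:digest_size]

params_a = {'model': 'package_v1', 'thresh': 0.1}
params_b = {'thresh': 0.1, 'model': 'package_v1'}  # same params, other order
assert make_param_hashid(params_a) == make_param_hashid(params_b)
```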

find_macro_comparable(verbose=0)[source]

Search for groups that have the same parameters over multiple regions.

We determine whether two rows have the same parameters by using the param_hashid, so the details of how it is computed (and which parameters are ignored when computing it, e.g. paths to datasets) have a big impact on the behavior of this function.

SeeAlso:

Aggregator.build_effective_params() - the method that determines what parameters go into the param_hashid, and how to normalize them.

gather_macro_compatable_groups(regions_of_interest)[source]

Given a set of ROIs, find groups in the comparable regions that contain all of the requested ROIs.
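The core grouping logic can be sketched as follows; the row structure is invented for the example, while the real method works on the aggregator’s pandas tables:

```python
# Illustrative sketch of finding macro-comparable groups: group rows by
# param_hashid and keep only groups whose regions cover all requested
# ROIs.
def gather_groups(rows, rois):
    rois = set(rois)
    groups = {}
    for row in rows:
        groups.setdefault(row['param_hashid'], []).append(row)
    return {
        hashid: group for hashid, group in groups.items()
        if rois.issubset({row['region_id'] for row in group})
    }

rows = [
    {'param_hashid': 'aaaa', 'region_id': 'region1', 'metric': 0.5},
    {'param_hashid': 'aaaa', 'region_id': 'region2', 'metric': 0.7},
    {'param_hashid': 'bbbb', 'region_id': 'region1', 'metric': 0.9},
]
comparable = gather_groups(rows, rois=['region1', 'region2'])
# only the 'aaaa' group covers both requested regions
```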

build_macro_tables(rois=None, **kwargs)[source]

Builds one or more macro tables

build_single_macro_table(rois, average='mean')[source]

Builds a single macro table for a choice of regions.

A macro table is a table of parameters and metrics macro-averaged over multiple regions of interest.

There are some hard-coded values in this function, but the core idea is general; they just need to be parameterized correctly.

Parameters:
  • rois (List[str]) – names of regions to average together

  • average (str) – mean or gmean

Return type:

DataFrame | None
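A minimal sketch of the averaging choice, assuming “mean” is an arithmetic mean and “gmean” a geometric mean over per-region metric values; the real method also aggregates parameter columns and operates on pandas DataFrames:

```python
import math

# Sketch of the macro-averaging step: combine a metric across regions
# with either an arithmetic mean or a geometric mean ("gmean").
def macro_average(values, average='mean'):
    if average == 'mean':
        return sum(values) / len(values)
    elif average == 'gmean':
        # geometric mean via exp(mean(log(v))); assumes positive values
        return math.exp(sum(math.log(v) for v in values) / len(values))
    raise KeyError(average)

per_region_f1 = [0.5, 0.7, 0.9]
print(macro_average(per_region_f1, 'mean'))
print(macro_average(per_region_f1, 'gmean'))
```

The geometric mean is less forgiving of a single near-zero region score, which is one reason to offer both options.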

geowatch.mlops.aggregate.inspect_node(subagg, id, row, group_agg, agg_group_dpath)[source]
geowatch.mlops.aggregate.aggregate_param_cols(df, aggregator=None, hash_cols=None, allow_nonuniform=False)[source]

Aggregates parameter columns. The specified hash_cols should be dataset-specific columns to be hashed. All other columns should be effectively the same; otherwise we will warn.

Parameters:

hash_cols (None | List[str]) – columns whose values should be hashed together.

Returns:

a single row representing the combined rows

Return type:

pandas.Series

Todo

  • [ ] optimize this

  • [ ] Rectify with ~/code/watch/geowatch/utils/util_pandas.py :: aggregate_columns

Example

>>> from geowatch.mlops.aggregate import *  # NOQA
>>> import pandas as pd
>>> agg = Aggregator.demo(num=3)
>>> agg.build()
>>> df = pd.concat([agg.table] * 3).reset_index()
>>> import scipy.stats.mstats
>>> gmean = scipy.stats.mstats.gmean
>>> aggregator = {'metrics.demo_node.metric1': gmean}
>>> hash_cols = 'param_hashid'
>>> allow_nonuniform = True
>>> hash_cols = ['region_id'] + agg.test_dset_cols
>>> agg_row = aggregate_param_cols(df, aggregator=aggregator, hash_cols=hash_cols, allow_nonuniform=allow_nonuniform)
>>> print(agg_row)
geowatch.mlops.aggregate.macro_aggregate(agg, group, aggregator, average='mean')[source]

Helper function

geowatch.mlops.aggregate.hash_param(row, version=1)[source]

Rule of thumb for probability of a collision:
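The rule of thumb itself is elided above, but a standard birthday-bound estimate (not necessarily the exact rule the docstring intends) says that hashing n items into b bits collides with probability roughly 1 - exp(-n^2 / 2^(b+1)):

```python
import math

# Standard birthday-bound approximation for hash collisions:
# with n items hashed into b bits, P(collision) ~= 1 - exp(-n^2 / 2^(b+1)).
def approx_collision_probability(num_items, num_bits):
    return 1 - math.exp(-(num_items ** 2) / 2 ** (num_bits + 1))

# e.g. an 8-hex-digit hashid has 32 bits; ~10k rows already carries
# a percent-level collision risk
p = approx_collision_probability(10_000, 32)
```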

geowatch.mlops.aggregate.hash_regions(rois)[source]
geowatch.mlops.aggregate.nan_eq(a, b)[source]
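A minimal sketch of what NaN-tolerant equality could look like, assuming nan_eq treats two NaNs as equal (plain == never does, which breaks naive row comparisons):

```python
import math

# Sketch of NaN-aware scalar equality: two NaNs compare equal,
# everything else falls back to ordinary ==.
def nan_eq(a, b):
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b

assert float('nan') != float('nan')        # plain equality fails on NaN
assert nan_eq(float('nan'), float('nan'))  # NaN-aware equality holds
```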