geowatch.utils.simple_dvc module

A simplified Python DVC API

class geowatch.utils.simple_dvc.SimpleDVC(dvc_root=None, remote=None)[source]

Bases: NiceRepr

A Simple DVC API

Parameters:
  • dvc_root (Path) – path to DVC repo directory

  • remote (str) – dvc remote to sync to by default

classmethod init(dpath, no_scm=False, force=False, verbose=0)[source]

Initialize a DVC repo in a path

property cache_dir
classmethod demo_dpath(reset=False)[source]
classmethod coerce(dvc_path, **kw)[source]

Given a path inside DVC, finds the root.

classmethod find_root(path=None)[source]

Given a path, search its ancestors to find the root of a dvc repo.

Returns:

Path | None
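The ancestor search described above can be illustrated with the standard library alone. This is a minimal sketch, not the geowatch implementation: the helper name find_dvc_root is hypothetical, and it assumes a repo root is recognized by a ".dvc" directory directly inside it.

```python
from pathlib import Path
import tempfile


def find_dvc_root(path=None):
    """Return the nearest ancestor that contains a ``.dvc`` directory, or None.

    Hypothetical helper: assumes a DVC repo root is marked by a ``.dvc`` directory.
    """
    start = Path.cwd() if path is None else Path(path)
    start = start.resolve()
    # Check the starting path itself, then each parent up to the filesystem root.
    for candidate in [start, *start.parents]:
        if (candidate / ".dvc").is_dir():
            return candidate
    return None


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp).resolve() / "repo"
    nested = repo / "data" / "images"
    nested.mkdir(parents=True)
    (repo / ".dvc").mkdir()
    found_inside_repo = find_dvc_root(nested) == repo
```

Returning None rather than raising matches the documented "Path | None" return type and leaves the error policy to the caller.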

add(path, verbose=0)[source]
Parameters:

path (str | PathLike | Iterable[str | PathLike]) – a single or multiple paths to add

remove(path, verbose=0)[source]
Parameters:

path (str | PathLike | Iterable[str | PathLike]) – a single or multiple paths to remove

check_ignore(path, details=0, verbose=0)[source]
git_pull()[source]
git_push()[source]
git_commit(message)[source]
git_commitpush(message='', pull_on_fail=True)[source]

TODO: better name here?

push(path, remote=None, recursive=False, jobs=None, verbose=0)[source]

Push the content tracked by .dvc files to remote storage.

Parameters:
  • path (Path | List[Path]) – one or more file paths, each of which should have an associated .dvc sidecar file, or, if recursive is True, a directory containing multiple tracked files.

  • remote (str) – the name of the remote registered in the .dvc/config to push to

  • recursive (bool) – if True, items in path may be directories.

  • jobs (int) – number of parallel workers
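To make the roles of these parameters concrete, the sketch below maps them onto an argument list for the "dvc push" command, using that command's --remote, --recursive, and --jobs options. Whether SimpleDVC builds its call this way is an assumption; the helper name build_push_args is hypothetical, and nothing is executed here.

```python
import os


def build_push_args(path, remote=None, recursive=False, jobs=None):
    """Build an argument list for ``dvc push`` (hypothetical helper)."""
    # Accept either a single path or a collection of paths.
    paths = [path] if isinstance(path, (str, os.PathLike)) else list(path)
    args = ["push"]
    if remote is not None:
        args += ["--remote", remote]
    if recursive:
        args += ["--recursive"]
    if jobs is not None:
        args += ["--jobs", str(jobs)]
    # Targets come last, converted to plain strings.
    args += [os.fspath(p) for p in paths]
    return args


single_target = build_push_args("data.zip.dvc")
many_targets = build_push_args(["a.dvc", "b.dvc"], remote="aws", recursive=True, jobs=4)
```

The remote name "aws" is a placeholder; in practice it would be a remote registered in .dvc/config.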

pull(path, remote=None, recursive=False, jobs=None, verbose=0)[source]
request(path, remote=None)[source]

Requests to ensure that a specific file from DVC exists.

For each file that does not exist, check whether it has an associated .dvc sidecar file. If any sidecar file is missing, an error is raised. Otherwise, an attempt is made to pull the missing files.

Parameters:

path (Path | List[Path]) – one or more file paths that should have an associated .dvc sidecar file.
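The decision logic described for request can be sketched with the standard library. This is an illustration under stated assumptions, not the geowatch code: the helper plan_request is hypothetical, it only looks for a sidecar named "<path>.dvc" next to each file, and it returns the sidecars that would need pulling instead of pulling them.

```python
from pathlib import Path
import tempfile


def plan_request(paths):
    """Return sidecars to pull for missing paths; raise if a sidecar is absent.

    Hypothetical helper: assumes the sidecar for ``x`` is a sibling ``x.dvc``.
    """
    to_pull = []
    missing_sidecars = []
    for path in map(Path, paths):
        if path.exists():
            continue  # already present, nothing to do
        sidecar = path.with_name(path.name + ".dvc")
        if sidecar.exists():
            to_pull.append(sidecar)
        else:
            missing_sidecars.append(sidecar)
    if missing_sidecars:
        raise FileNotFoundError(f"No sidecar files for: {missing_sidecars}")
    return to_pull


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "present.txt").write_text("data")
    (root / "absent.txt.dvc").write_text("outs: []\n")
    planned = [p.name for p in plan_request([root / "present.txt", root / "absent.txt"])]
    try:
        plan_request([root / "untracked.txt"])
        raised = False
    except FileNotFoundError:
        raised = True
```

Collecting every missing sidecar before raising means that a single error can report all problem paths at once.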

unprotect(path)[source]
is_tracked(path)[source]
classmethod find_file_tracker(path)[source]
find_dir_tracker(path)[source]
read_dvc_sidecar(sidecar_fpath)[source]
resolve_cache_paths(sidecar_fpath)[source]

Given a .dvc file, enumerate the paths in the cache associated with it.

Parameters:

sidecar_fpath (PathLike | str) – path to the .dvc file

find_sidecar_paths(dpath)[source]
Parameters:

dpath (Path | str) – directory in dvc repo to search

Yields:

ub.Path – existing dvc sidecar files
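A generator with this shape can be sketched using pathlib. The helper name find_sidecars is hypothetical, and the recursive search is an assumption; the documentation above does not state whether subdirectories are searched.

```python
from pathlib import Path
import tempfile


def find_sidecars(dpath):
    """Yield existing ``*.dvc`` files under ``dpath`` (hypothetical helper)."""
    for fpath in Path(dpath).rglob("*.dvc"):
        # Skip directories such as the repo-level ``.dvc`` metadata folder.
        if fpath.is_file():
            yield fpath


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".dvc").mkdir()
    (root / "sub").mkdir()
    (root / "a.txt.dvc").write_text("outs: []\n")
    (root / "sub" / "b.dvc").write_text("outs: []\n")
    sidecar_names = sorted(p.name for p in find_sidecars(root))
```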

resolve_sidecar(path)[source]

Given a path in a DVC repo, resolve it to the sidecar file that it corresponds to. If the input is itself a .dvc file, return it.

If it is inside a directory that corresponds to a dvc repo, search for that.

Parameters:

path (Path | str) – directory or file in dvc repo to search

Yields:

ub.Path – existing dvc sidecar files
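One way to picture this resolution is shown below. It is a sketch under assumptions, not the geowatch implementation: the helper resolve_sidecar_sketch is hypothetical, it assumes a tracked file or directory "x" has a sibling sidecar "x.dvc", and it returns a single match (or None) rather than yielding.

```python
from pathlib import Path
import tempfile


def resolve_sidecar_sketch(path):
    """Return the sidecar that tracks ``path``, or None (hypothetical helper)."""
    path = Path(path)
    if path.suffix == ".dvc":
        return path  # already a sidecar file
    # Try the path itself, then each ancestor directory that may be tracked.
    for candidate in [path, *path.parents]:
        if not candidate.name:
            continue  # filesystem root or "." has no name to extend
        sidecar = candidate.with_name(candidate.name + ".dvc")
        if sidecar.exists():
            return sidecar
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images.dvc").write_text("outs: []\n")       # tracks the images/ directory
    (root / "labels.json.dvc").write_text("outs: []\n")  # tracks a single file
    via_dir = resolve_sidecar_sketch(root / "images" / "tile_01" / "a.png").name
    via_file = resolve_sidecar_sketch(root / "labels.json").name
    passthrough = resolve_sidecar_sketch(root / "images.dvc").name
```

The queried data paths do not need to exist on disk, which is consistent with resolving files that have not been pulled yet.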

class geowatch.utils.simple_dvc.SimpleDVC_CLI(description='', sub_clis=None, version=None)[source]

Bases: ModalCLI

A DVC CLI that uses our simplified (and more permissive) interface.

The main advantage is that you can run these commands outside a DVC repo as long as you point to a valid in-repo path.

class Add(*args, **kwargs)[source]

Bases: DataConfig

Add data to the DVC repo.

Valid options: []

Parameters:
  • *args – positional arguments for this data config

  • **kwargs – keyword arguments for this data config

classmethod main(cmdline=1, **kwargs)[source]
default = {'paths': <Value([])>}
class Request(*args, **kwargs)[source]

Bases: DataConfig

Pull data if the requested file doesn’t exist.

Valid options: []

Parameters:
  • *args – positional arguments for this data config

  • **kwargs – keyword arguments for this data config

classmethod main(cmdline=1, **kwargs)[source]
default = {'paths': <Value([])>, 'remote': <Value(None)>}
class CacheDir(*args, **kwargs)[source]

Bases: DataConfig

Print the cache directory

Valid options: []

Parameters:
  • *args – positional arguments for this data config

  • **kwargs – keyword arguments for this data config

classmethod main(cmdline=1, **kwargs)[source]
default = {'dvc_root': <Value('.')>}