hydrodata.data_catalog.data_access module

Functions to access data from data catalog information.

get_catalog_entries()

get_catalog_entry()

get_file_paths()

get_numpy_data()

get_ndarray()

get_table_names()

get_table_rows()

get_table_row()

get_yaml_data_catalog()

grid_to_latlng()

latlng_to_grid()

get_huc_bbox()

get_huc_from_latlng()

get_huc_from_xy()

hydrodata.data_catalog.data_access.construct_string_from_qparams(entry, options)

Constructs the query parameters from the entry and options provided.

Parameters:
Returns:

data – the requested data.

Return type:

numpy array

hydrodata.data_catalog.data_access.get_catalog_entries(*args, **kwargs) List[ModelTableRow]

Get data catalog entry rows selected by filter options.

Parameters:
  • args – Optional positional parameter that must be a dict with filter options.

  • kwargs – Supports multiple named parameters with filter option values.

Returns:

A list of ModelTableRow entries that match the filter options.

A filter option is any column name in the data_catalog_entry table. The value of the filter option argument must be a value of that column in the table.

For example,

entries = get_catalog_entries(dataset="NLDAS2", file_type="pfb", period="daily")

# 9 forcing variables and all metadata

assert len(entries) == 9

assert len(entries[0].column_names()) == 20

assert entries[0]["variable"] == "precipitation"

Note, it is better to filter by dimension options like “variable” instead of free-form volatile options like “dataset_var”.
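The calling convention described above (one optional positional dict plus named parameters) can be pictured with a small sketch. The helper merge_filter_options is hypothetical and is not part of hydrodata; it only illustrates how a dict and keyword arguments might be combined into one set of filter options, under the assumption that keyword values take precedence.

```python
def merge_filter_options(*args, **kwargs):
    """Hypothetical helper: combine an optional positional dict with keyword filters."""
    if len(args) > 1:
        raise ValueError("At most one positional argument is allowed.")
    options = {}
    if args:
        if not isinstance(args[0], dict):
            raise ValueError("The positional argument must be a dict of filter options.")
        options.update(args[0])
    # Assumption: keyword arguments override values given in the dict.
    options.update(kwargs)
    return options


merged = merge_filter_options({"dataset": "NLDAS2", "period": "hourly"}, period="daily")
```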

hydrodata.data_catalog.data_access.get_catalog_entry(*args, **kwargs) ModelTableRow

Get a single data catalog entry row selected by filter options.

Parameters:
  • args – Optional positional parameter that must be a dict with filter options.

  • kwargs – Supports multiple named parameters with filter option values.

Returns:

A single ModelTableRow entry that matches the filter options, or None if no entry matches the filter options.

Raises:

ValueError if the filter options do not uniquely select a single data_catalog_entry.

This method is the same as get_catalog_entries() except that it returns a single row rather than a list of rows.
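The "one row, None, or ValueError" behaviour can be sketched on plain dicts. The functions select_rows and select_one and the sample rows are hypothetical stand-ins for illustration only; they do not reproduce how hydrodata queries its data model.

```python
def select_rows(rows, **filters):
    """Return the rows whose columns equal every filter value."""
    return [row for row in rows if all(row.get(k) == v for k, v in filters.items())]


def select_one(rows, **filters):
    """Return the single matching row, None if nothing matches, else raise ValueError."""
    matches = select_rows(rows, **filters)
    if len(matches) > 1:
        raise ValueError("The filter options match more than one row.")
    return matches[0] if matches else None


# Hypothetical rows standing in for data_catalog_entry records.
rows = [
    {"id": 1, "dataset": "NLDAS2", "variable": "precipitation"},
    {"id": 2, "dataset": "NLDAS2", "variable": "air_temp"},
]

entry = select_one(rows, variable="air_temp")
missing = select_one(rows, variable="no_such_variable")
```

Calling select_one(rows, dataset="NLDAS2") raises ValueError in this sketch because both rows match.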

hydrodata.data_catalog.data_access.get_file_path(entry, *args, **kwargs) List[str]

Get the file path for a data catalog entry.

Parameters:
  • entry – Either a ModelTableRow or the ID number of a data_catalog_entry. If None, use the entry found by the filters.

  • args – Optional positional parameter that must be a dict with data filter options.

  • kwargs – Supports multiple named parameters with data filter option values.

Returns:

A single file path for the data catalog entry.

Raises:

ValueError if there is no path or multiple paths for the data catalog entry.

hydrodata.data_catalog.data_access.get_file_paths(entry, *args, **kwargs) List[str]

Get a list of file paths from the data catalog identified by the filter options.

Parameters:
  • entry – Either a ModelTableRow or the ID number of a data_catalog_entry.

  • args – Optional positional parameter that must be a dict with data filter options.

  • kwargs – Supports multiple named parameters with data filter option values.

Returns:

A list of absolute path names to files of the data catalog entry after looping through time range and replacing substitution keys.

A data filter option must be one of the following:
  • start_time: A time as either a datetime object or a string in the form YYYY-MM-DD. Start of the date range for data.

  • end_time: A time as either a datetime object or a string in the form YYYY-MM-DD. End of the date range for data.

  • site_id: An observation point site id.

If only start_time is specified, then only the data at that time is used to substitute into path names. If both start_time and end_time are specified, then the dates in the date range are used to substitute into path names.

For example, get 3 daily files between 9/30/2021 and 10/3/2021:

entry = get_catalog_entry(dataset="NLDAS2", file_type="pfb", period="daily", variable="precipitation")

paths = get_file_paths(entry, start_time="2021-09-30", end_time="2021-10-03")

assert len(paths) == 3
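As a rough illustration of looping through a time range and substituting into path names, the sketch below expands a template once per day. The function daily_paths, the template string, and the {date} substitution key are all assumptions made for this example; the real catalog paths and keys may differ. The end date is treated as exclusive, which matches the 3 files in the example above.

```python
from datetime import datetime, timedelta


def daily_paths(template, start_time, end_time=None):
    """Expand a path template once per day in [start_time, end_time)."""
    start = datetime.strptime(start_time, "%Y-%m-%d")
    # Assumption: with no end_time, only the start day is used.
    end = datetime.strptime(end_time, "%Y-%m-%d") if end_time else start + timedelta(days=1)
    paths = []
    day = start
    while day < end:
        paths.append(template.format(date=day.strftime("%Y%m%d")))
        day += timedelta(days=1)
    return paths


# Hypothetical template; not an actual hydrodata path.
paths = daily_paths("/example/precip.{date}.pfb", "2021-09-30", "2021-10-03")
```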

hydrodata.data_catalog.data_access.get_huc_bbox(grid: str, huc_id_list: List[str]) List[int]

Get the grid bounding box containing all the HUC ids.

Parameters:
  • grid – A grid id from the data catalog (e.g. conus1 or conus2).

  • huc_id_list – A list of HUC id strings of HUCs in the grid.

Returns:

A bounding box as a list of int (i_min, j_min, i_max, j_max)

Raises:
  • ValueError if the HUC ids are not all at the same level (same length).

  • ValueError if grid is not valid.
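The same-level rule and the idea of one box enclosing several HUCs can be sketched as follows. Both helpers are hypothetical and operate on made-up per-HUC boxes; how hydrodata actually locates each HUC in the grid is not shown here.

```python
def check_same_level(huc_id_list):
    """Raise ValueError unless every HUC id has the same length (level)."""
    levels = {len(huc_id) for huc_id in huc_id_list}
    if len(levels) > 1:
        raise ValueError("All HUC ids must be at the same level.")
    return levels.pop() if levels else None


def union_bbox(boxes):
    """Combine (i_min, j_min, i_max, j_max) boxes into one enclosing box."""
    i_mins, j_mins, i_maxs, j_maxs = zip(*boxes)
    return [min(i_mins), min(j_mins), max(i_maxs), max(j_maxs)]


level = check_same_level(["1019000404", "1019000405"])
# Made-up boxes for two HUCs, for illustration only.
bbox = union_bbox([(10, 20, 30, 40), (5, 25, 28, 60)])
```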

hydrodata.data_catalog.data_access.get_huc_from_latlng(grid: str, level: int, lat: float, lng: float) str

Get a HUC id at a lat/lng point for a given grid and level.

Parameters:
  • grid – Grid name (e.g. conus1 or conus2).

  • level – HUC level (length of the HUC id to be returned).

  • lat – Latitude of the point.

  • lng – Longitude of the point.

Returns:

The HUC id string containing the lat/lng point or None.

hydrodata.data_catalog.data_access.get_huc_from_xy(grid: str, level: int, x: int, y: int) str

Get a HUC id at an xy point for a given grid and level.

Parameters:
  • grid – Grid name (e.g. conus1 or conus2).

  • level – HUC level (length of the HUC id to be returned).

  • x – x coordinate in the grid.

  • y – y coordinate in the grid.

Returns:

The HUC id string containing the xy point or None.
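Because the level is the length of the HUC id and HUC codes are hierarchical, a coarser HUC id is a prefix of a finer one. A minimal sketch, using a hypothetical helper huc_at_level that is not part of hydrodata:

```python
def huc_at_level(huc_id, level):
    """Truncate a HUC id to a coarser level (hypothetical helper)."""
    if level > len(huc_id):
        raise ValueError("Cannot derive a finer HUC level from a coarser id.")
    return huc_id[:level]


parent = huc_at_level("14010001", 4)
```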

hydrodata.data_catalog.data_access.get_ndarray(entry, *args, **kwargs) ndarray

Get a numpy ndarray from files in /hydrodata with the applied data filters. If a time_values data filter option is provided as an array, this array will be populated with the date strings of the date dimension of the ndarray that is returned. This is useful for graphing use cases.

Parameters:
  • entry – Either a ModelTableRow or the ID number of a data_catalog_entry. If None then get entry from options.

  • args – Optional positional parameter that must be a dict with data filter options.

  • kwargs – Supports multiple named parameters with data filter option values.

Returns:

A numpy ndarray containing the data loaded from the files identified by the entry and sliced by the data filter options.

Raises:

ValueError if both grid_bounds and latlng_bounds are specified as data filters.

A data filter option must be one of the following:
  • start_time: A time as either a datetime object or a string in the form YYYY-MM-DD. Start of the date range for data.

  • end_time: A time as either a datetime object or a string in the form YYYY-MM-DD. End of the date range for data.

  • site_id: An observation point site id.

  • grid_bounds: An array (or string representing an array) of points [left, bottom, right, top] in xy grid coordinates in the grid of the data.

  • latlng_bounds: An array (or string representing an array) of points [left, bottom, right, top] in lat/lng coordinates mapped with the grid of the data.

  • depth: A number 0-n of the depth index of the data to return.

  • z: A value of the z dimension to be used as a filter to remove this dimension.

  • level: A HUC level integer when reading HUC boundary files.

For gridded results the returned numpy array has dimensions:
  • [hour, y, x] period is hourly without z dimension

  • [day, y, x] period is daily without z dimension

  • [month, y, x] period is monthly without z dimension

  • [y, x] period is static or blank without z dimension

  • [hour, z, y, x] period is hourly with z dimension

  • [day, z, y, x] period is daily with z dimension

  • [month, z, y, x] period is monthly with z dimension

  • [z, y, x] period is static or blank with z dimension

If the dataset has ensembles then there is an ensemble dimension at the beginning.
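The dimension rules listed above can be summarised in a small sketch. The function expected_dims is hypothetical and only restates the list; it does not inspect any real dataset.

```python
def expected_dims(period, has_z=False, has_ensemble=False):
    """Return the dimension names in order, following the rules listed above."""
    time_names = {"hourly": "hour", "daily": "day", "monthly": "month"}
    dims = []
    if has_ensemble:
        dims.append("ensemble")
    # Static or blank periods have no time dimension.
    if period in time_names:
        dims.append(time_names[period])
    if has_z:
        dims.append("z")
    dims.extend(["y", "x"])
    return dims


daily_dims = expected_dims("daily")
static_z_dims = expected_dims("static", has_z=True)
```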

Both start_time and end_time must be in the form “YYYY-MM-DD HH:MM:SS” or “YYYY-MM-DD” or a datetime object.

If only start_time is specified, then only that month/day/hour is returned. The start_time is inclusive and the end_time is exclusive (the data returned is earlier than that time).
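A minimal sketch of accepting the time forms described above and applying an inclusive start with an exclusive end. The helpers parse_time and in_range are hypothetical names chosen for this example and are not taken from hydrodata.

```python
from datetime import datetime


def parse_time(value):
    """Accept a datetime, 'YYYY-MM-DD HH:MM:SS', or 'YYYY-MM-DD'."""
    if isinstance(value, datetime):
        return value
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized time value: {value!r}")


def in_range(t, start_time, end_time):
    """True when start_time <= t < end_time (inclusive start, exclusive end)."""
    return parse_time(start_time) <= parse_time(t) < parse_time(end_time)


first_day_included = in_range("2021-09-30", "2021-09-30", "2021-10-03")
end_day_included = in_range("2021-10-03", "2021-09-30", "2021-10-03")
```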

If either grid_bounds or latlng_bounds is specified then the result is sliced by the x,y values in the bounds.

If z is specified then the result is sliced by the z dimension.

For example, to get data from the 3 daily files between 9/30/2021 and 10/3/2021:

bounds = [200, 200, 300, 250]

entry = get_catalog_entry(dataset="NLDAS2", file_type="pfb", period="daily", variable="precipitation")

data = get_ndarray(entry, start_time="2021-09-30", end_time="2021-10-03", grid_bounds=bounds)

# The result has 3 days in the time dimension and is sliced to an x,y extent of 100x50 at origin 200, 200 in the conus1 grid.
# Dimensions are ordered [day, y, x] and numpy shapes are tuples.

assert data.shape == (3, 50, 100)
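The shape arithmetic can be checked with a short sketch. It assumes the [day, y, x] ordering listed for daily data and that the right and top bounds are exclusive; expected_shape is a hypothetical helper, not a hydrodata function.

```python
def expected_shape(n_times, grid_bounds):
    """Shape of a [time, y, x] result sliced by [left, bottom, right, top]."""
    left, bottom, right, top = grid_bounds
    # y extent comes before x extent because the array is indexed [time, y, x].
    return (n_times, top - bottom, right - left)


shape = expected_shape(3, [200, 200, 300, 250])
```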

hydrodata.data_catalog.data_access.get_numpy_data(*args, **kwargs) ndarray

Get a numpy ndarray from files in /hydrodata with the applied data filters. If a time_values data filter option is provided as an array, this array will be populated with the date strings of the date dimension of the ndarray that is returned. This is useful for graphing use cases.

Parameters:
  • entry – Either a ModelTableRow or the ID number of a data_catalog_entry. If None then get entry from options.

  • args – Optional positional parameter that must be a dict with data filter options.

  • kwargs – Supports multiple named parameters with data filter option values.

Returns:

A numpy ndarray containing the data loaded from the files identified by the entry and sliced by the data filter options.

Raises:

ValueError if both grid_bounds and latlng_bounds are specified as data filters.
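The mutual-exclusion rule for the two kinds of bounds might be checked along these lines. The function check_bounds_options is a hypothetical illustration of the documented behaviour and not the library's own validation code.

```python
def check_bounds_options(options):
    """Raise ValueError if both kinds of bounds are present in the filter options."""
    has_grid = options.get("grid_bounds") is not None
    has_latlng = options.get("latlng_bounds") is not None
    if has_grid and has_latlng:
        raise ValueError("Specify either grid_bounds or latlng_bounds, not both.")
    return options


ok = check_bounds_options({"grid_bounds": [200, 200, 300, 250]})
```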

A data filter option must be one of the following:
  • start_time: A time as either a datetime object or a string in the form YYYY-MM-DD. Start of the date range for data.

  • end_time: A time as either a datetime object or a string in the form YYYY-MM-DD. End of the date range for data.

  • site_id: An observation point site id.

  • grid_bounds: An array (or string representing an array) of points [left, bottom, right, top] in xy grid coordinates in the grid of the data.

  • latlng_bounds: An array (or string representing an array) of points [left, bottom, right, top] in lat/lng coordinates mapped with the grid of the data.

  • depth: A number 0-n of the depth index of the data to return.

  • z: A value of the z dimension to be used as a filter to remove this dimension.

  • level: A HUC level integer when reading HUC boundary files.

For gridded results the returned numpy array has dimensions:
  • [hour, y, x] period is hourly without z dimension

  • [day, y, x] period is daily without z dimension

  • [month, y, x] period is monthly without z dimension

  • [y, x] period is static or blank without z dimension

  • [hour, z, y, x] period is hourly with z dimension

  • [day, z, y, x] period is daily with z dimension

  • [month, z, y, x] period is monthly with z dimension

  • [z, y, x] period is static or blank with z dimension

If the dataset has ensembles then there is an ensemble dimension at the beginning.

Both start_time and end_time must be in the form “YYYY-MM-DD HH:MM:SS” or “YYYY-MM-DD” or a datetime object.

If only start_time is specified, then only that month/day/hour is returned. The start_time is inclusive and the end_time is exclusive (the data returned is earlier than that time).

If either grid_bounds or latlng_bounds is specified then the result is sliced by the x,y values in the bounds.

If z is specified then the result is sliced by the z dimension.
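Slicing by x,y bounds can be pictured on a plain nested list indexed as grid[y][x]. This sketch assumes the right and top bounds are exclusive; slice_by_bounds and the toy grid are hypothetical and stand in for the numpy slicing that the function performs.

```python
def slice_by_bounds(grid, grid_bounds):
    """Slice a 2D list indexed as grid[y][x] by [left, bottom, right, top]."""
    left, bottom, right, top = grid_bounds
    return [row[left:right] for row in grid[bottom:top]]


# A small 4 (y) by 6 (x) grid whose cells record their own (y, x) position.
grid = [[(y, x) for x in range(6)] for y in range(4)]
window = slice_by_bounds(grid, [1, 0, 4, 2])
```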

For example, to get data from the 3 daily files between 9/30/2021 and 10/3/2021:

bounds = [200, 200, 300, 250]

data = get_numpy_data(dataset="NLDAS2", file_type="pfb", period="daily", variable="precipitation",
                      start_time="2021-09-30", end_time="2021-10-03", grid_bounds=bounds)

# The result has 3 days in the time dimension and is sliced to an x,y extent of 100x50 at origin 200, 200 in the conus1 grid.
# Dimensions are ordered [day, y, x] and numpy shapes are tuples.

assert data.shape == (3, 50, 100)

hydrodata.data_catalog.data_access.get_table_names() List[str]

Return the list of table names in the data model.

hydrodata.data_catalog.data_access.get_table_row(table_name: str, *args, **kwargs) ModelTableRow

Get one row of a data model table filtered by columns from that table.

Parameters:
  • table_name – The name of a table in the data model.

  • args – Optional positional parameter that must be a dict with filter options.

  • kwargs – Supports multiple named parameters with filter option values.

Returns:

A single ModelTableRow entry of the specified table_name that matches the filter options, or None if no row is found.

Raises:

ValueError if the filter options are ambiguous and this matches more than one row.

For example,

row = get_table_row("variable", variable_type="atmospheric")

assert row["variable"] == "air_temp"

hydrodata.data_catalog.data_access.get_table_rows(table_name: str, *args, **kwargs) List[ModelTableRow]

Get rows of a data model table filtered by columns from that table.

Parameters:
  • table_name – The name of a table in the data model.

  • args – Optional positional parameter that must be a dict with filter options.

  • kwargs – Supports multiple named parameters with filter option values.

Returns:

A list of ModelTableRow entries of the specified table_name that match the filter options.

For example,

rows = get_table_rows("variable", variable_type="atmospheric")

assert rows[0]["variable"] == "air_temp"

hydrodata.data_catalog.data_access.get_yaml_data_catalog() dict

Get the parsed yaml data catalog.

Returns:

The hydrodata_catalog.yaml file loaded into a dict.

The parsed YAML representation of the data catalog is useful for non-Python clients, such as JavaScript, that want access to the data model information.
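One way a parsed catalog dict could be handed to a non-Python client is to serialize it as JSON. The dict below is a made-up, heavily simplified stand-in; the real structure of hydrodata_catalog.yaml is not shown in this reference and may look quite different.

```python
import json

# Hypothetical stand-in for the dict returned by get_yaml_data_catalog().
catalog = {"variable": {"precipitation": {"variable_type": "atmospheric"}}}

# Serialize for a non-Python client such as a browser application.
payload = json.dumps(catalog, sort_keys=True)

# A client (or a test) can parse the payload back into the same structure.
restored = json.loads(payload)
```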

hydrodata.data_catalog.data_access.grid_to_latlng(grid: str, *args) List[float]

Convert grid x,y coordinates to lat,lng.

Parameters:
  • grid – The name of a grid dimension from the data catalog grid table (e.g. conus1 or conus2, or smapgrid).

  • args – A list of numbers of (x,y) values that are coordinates in the grid (may be int or float).

Returns:

An array of lat,lng points converted from each of the (x,y) grid coordinates in args.

Note, this may be used to convert a single point or a bounds of 2 points or a large array of points.

This conversion is fast. It is about 12.8K points/second (8 min for all points in conus1).

For example,

(lat, lng) = grid_to_latlng("conus1", 10, 10)

latlng_bounds = grid_to_latlng("conus1", *[0, 0, 20, 20])

(lat, lng) = grid_to_latlng("conus1", 10.5, 10.5)
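The flat argument list used in these calls groups naturally into coordinate pairs, and the quoted throughput gives a rough run-time estimate. Both helpers below are hypothetical. The conus1 size of 3342 x 1888 cells is an assumption that is not stated in this reference.

```python
def to_pairs(*args):
    """Group a flat sequence of numbers into (first, second) coordinate pairs."""
    if len(args) % 2 != 0:
        raise ValueError("Coordinates must be given in pairs.")
    return [(args[i], args[i + 1]) for i in range(0, len(args), 2)]


def estimated_seconds(n_points, points_per_second=12_800):
    """Rough conversion time at the throughput quoted above."""
    return n_points / points_per_second


pairs = to_pairs(*[0, 0, 20, 20])
# Assumption: conus1 has 3342 x 1888 grid cells.
minutes_for_conus1 = estimated_seconds(3342 * 1888) / 60
```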

hydrodata.data_catalog.data_access.latlng_to_grid(grid: str, *args) List[float]

Convert lat,lng coordinates to grid x,y.

Parameters:
  • grid – The name of a grid dimension from the data catalog grid table (e.g. conus1 or conus2, or smapgrid).

  • args – A list of floating-point pairs of (lat,lng) values.

Returns:

An array of x,y integer points converted from each of the (lat,lng) grid coordinates in args.

Note, this may be used to convert a single point or a bounds of 2 points or a large array of points.

This conversion is fast. It is about 12.8K points/second (8 min for all points in conus1).

For example,

(x, y) = latlng_to_grid("conus1", 31.759219, -115.902573)

grid_bounds = latlng_to_grid("conus1", *[31.651836, -115.982367, 31.759219, -115.902573])