hydrodata.model_evaluation.utils module
Model evaluation utility functions.
Note that these functions are not intended for stand-alone use; they serve as helper steps within the model_evaluation.evaluate method.
- hydrodata.model_evaluation.utils.aggregate_dataframe(input_df, input_type, aggregate_level, aggregate_method)
Aggregate a daily DataFrame to a different time period (weekly, monthly, yearly).
- Parameters:
input_df (DataFrame) – Pandas DataFrame with at least site_id, x, y, and columns for daily time series.
input_type (str) – Description of what type of input the DataFrame is sourced from: ‘observation’ or ‘simulation’.
aggregate_level (str) – Level at which to aggregate time series. Options include ‘day’, ‘week’, ‘month’, etc.
aggregate_method (str) – Type of aggregation to conduct. Options include ‘mean’, ‘min’, ‘max’, etc.
- Returns:
Pandas DataFrame with site_id, x, y, and columns for time series at the aggregated level.
- Return type:
DataFrame
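As an illustration of the aggregation described above, here is a minimal sketch that collapses daily columns into monthly columns. It is not the module's implementation: the helper name aggregate_wide_daily, the 'YYYY-MM-DD' column labels, and the 'YYYY-MM' output labels are all assumptions made for this example.

```python
import pandas as pd


def aggregate_wide_daily(input_df, aggregate_method="mean"):
    """Hypothetical helper: collapse daily date columns into monthly columns."""
    id_cols = ["site_id", "x", "y"]
    # Assumption: every non-identifier column is labelled 'YYYY-MM-DD'.
    date_cols = [c for c in input_df.columns if c not in id_cols]
    # Map each daily label (e.g. '2020-01-15') to its month ('2020-01').
    months = pd.to_datetime(date_cols).strftime("%Y-%m").to_numpy()
    # Transpose so rows are days, then group the rows by month.
    values = input_df[date_cols].T
    aggregated = values.groupby(months).agg(aggregate_method).T
    return pd.concat([input_df[id_cols], aggregated], axis=1)


daily = pd.DataFrame(
    {
        "site_id": ["a"],
        "x": [1],
        "y": [2],
        "2020-01-01": [1.0],
        "2020-01-02": [3.0],
        "2020-02-01": [5.0],
    }
)
monthly = aggregate_wide_daily(daily, aggregate_method="mean")
```

Passing 'min' or 'max' as aggregate_method works the same way in this sketch, since pandas accepts those names in agg.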
- hydrodata.model_evaluation.utils.calculate_stats(observations, simulations, metrics, aggregate_level, aggregate_method, date_start, date_end, statistics_df, debug=False)
Calculate evaluation metric statistics.
- Parameters:
observations (DataFrame) – Pandas DataFrame that includes site ID, x and y CONUS grid mapping values, and columns for each time point to use for evaluation comparison. This DataFrame should already be aggregated to the aggregate_level defined.
simulations (ndarray) – Array of shape (t, y, x), where y and x match the dimensions of the CONUS1 or CONUS2 grid, depending on which grid the x, y values in the observations DataFrame refer to. The t dimension should match the number of time values in the ‘daily’ observations DataFrame series. Do not aggregate this array to aggregate_level beforehand; aggregation happens within this function. If the array does not cover the full CONUS1 or CONUS2 grid, its spatial extent should match the extent specified by the grid_bounds parameter.
metrics (list) – List of string names of metrics to use for evaluation. Must be present in METRICS_DICT dictionary in the model_evaluation.py module.
aggregate_level (str) – Level at which to aggregate time series. Options include ‘day’, ‘week’, ‘month’, etc.
aggregate_method (str) – Type of aggregation to conduct. Options include ‘mean’, ‘min’, ‘max’, etc.
date_start (str) – ‘YYYY-MM-DD’ date indicating beginning of time range.
date_end (str) – ‘YYYY-MM-DD’ date indicating end of time range.
statistics_df (DataFrame) – DataFrame containing site ID, x and y CONUS grid mapping values, along with empty columns for each of the evaluation metrics defined in metrics.
debug (bool; default=False) – Whether to show debugging print statements.
- Returns:
Pandas DataFrame including calculated evaluation metrics for each site.
- Return type:
DataFrame
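The per-site metric calculation can be pictured with the sketch below. The contents of METRICS_DICT are not shown in this reference, so the EXAMPLE_METRICS dictionary, its 'bias' and 'rmse' entries, and the score_site helper are hypothetical stand-ins. Dropping time steps where either series is NaN is also an assumption.

```python
import numpy as np

# Hypothetical stand-in for METRICS_DICT: metric name -> function(obs, sim).
EXAMPLE_METRICS = {
    "bias": lambda obs, sim: float(np.mean(sim - obs)),
    "rmse": lambda obs, sim: float(np.sqrt(np.mean((sim - obs) ** 2))),
}


def score_site(obs, sim, metrics):
    """Hypothetical helper: evaluate the named metrics for one site."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    # Assumption: time steps with a missing value in either series are skipped.
    valid = ~(np.isnan(obs) | np.isnan(sim))
    return {name: EXAMPLE_METRICS[name](obs[valid], sim[valid]) for name in metrics}


scores = score_site([1.0, 2.0, np.nan, 4.0], [2.0, 2.0, 3.0, 6.0], ["bias", "rmse"])
```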
- hydrodata.model_evaluation.utils.collect_simulations(config_dict)
Collect and pre-process simulations data according to file format provided.
- Parameters:
config_dict (dict) – Dictionary from YAML configuration file with the simulations information for a given observation type.
- Returns:
NumPy array to use as input for evaluation comparison.
- Return type:
ndarray
- hydrodata.model_evaluation.utils.create_dataframe_from_dates(data, date_start, date_end)
Given a 1D NumPy array, create a DataFrame with columns spanning the given date range.
- Parameters:
data (1D array) – 1-dimensional array containing the simulation results for a single x, y geo-location.
date_start (str) – ‘YYYY-MM-DD’ date indicating beginning of time range.
date_end (str) – ‘YYYY-MM-DD’ date indicating end of time range.
- Returns:
DataFrame with the columns as dates and values as data from input array.
- Return type:
DataFrame
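A minimal sketch of this kind of conversion is shown below. The helper name frame_from_dates, the inclusive treatment of date_end, and the 'YYYY-MM-DD' column labels are assumptions for illustration, not confirmed behavior of the module.

```python
import numpy as np
import pandas as pd


def frame_from_dates(data, date_start, date_end):
    """Hypothetical helper: one-row DataFrame with one column per day."""
    # Assumption: both date_start and date_end are included in the range.
    dates = pd.date_range(date_start, date_end, freq="D")
    if len(dates) != len(data):
        raise ValueError(
            f"expected {len(dates)} values for the date range, got {len(data)}"
        )
    return pd.DataFrame([data], columns=dates.strftime("%Y-%m-%d"))


frame = frame_from_dates(np.array([1.0, 2.0, 3.0]), "2020-01-01", "2020-01-03")
```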
- hydrodata.model_evaluation.utils.filter_to_mapped_sites(df)
Filter site mapping data for only non-NaN values in mapping.
- Parameters:
df (DataFrame) – Pandas DataFrame consisting of site ID, x, and y values representing the CONUS x and y coordinate mapping for each site.
- Returns:
Pandas DataFrame consisting of site ID, x, and y values representing the CONUS x and y coordinate mapping for each site. Only sites with valid CONUS mapping values are included.
- Return type:
DataFrame
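In pandas, a filter like this can be written with dropna. The sketch below assumes the mapping columns are named x and y; the helper name keep_mapped_sites is hypothetical.

```python
import numpy as np
import pandas as pd


def keep_mapped_sites(df):
    """Hypothetical helper: drop sites whose x or y mapping is NaN."""
    # Assumption: the grid-mapping columns are named 'x' and 'y'.
    return df.dropna(subset=["x", "y"]).reset_index(drop=True)


mapping = pd.DataFrame(
    {"site_id": ["a", "b", "c"], "x": [10.0, np.nan, 30.0], "y": [5.0, 7.0, np.nan]}
)
mapped_only = keep_mapped_sites(mapping)
```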
- hydrodata.model_evaluation.utils.filter_to_sites_in_bounds(mapped_df, grid_bounds)
Filter site list and grid-mapping DataFrame to only sites within grid_bounds. Adjust bound values to be relative to indexing on subset domain.
- Parameters:
mapped_df (DataFrame) – Pandas DataFrame consisting of site ID, x, and y values representing the CONUS x and y coordinate mapping for each site.
grid_bounds (list) – List consisting of [xmin, ymin, xmax, ymax] CONUS grid bounds for domain.
- Returns:
subset_site_list (list) – List of string site ID values restricted to sites within provided grid_bounds.
remapped_df (DataFrame) – Pandas DataFrame consisting of site ID, x, and y values representing the CONUS x and y coordinate mapping for each site. DataFrame is subset to only sites within provided grid_bounds.
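The filtering and re-indexing step might look like the sketch below. It makes two assumptions this reference does not confirm: the bounds are treated as half-open (xmin and ymin inclusive, xmax and ymax exclusive), and the adjustment consists of subtracting xmin and ymin from each site's x and y. The helper name sites_in_bounds is hypothetical.

```python
import pandas as pd


def sites_in_bounds(mapped_df, grid_bounds):
    """Hypothetical helper: keep sites inside grid_bounds, re-indexed to the subset."""
    xmin, ymin, xmax, ymax = grid_bounds
    # Assumption: lower bounds are inclusive, upper bounds are exclusive.
    inside = (
        (mapped_df["x"] >= xmin)
        & (mapped_df["x"] < xmax)
        & (mapped_df["y"] >= ymin)
        & (mapped_df["y"] < ymax)
    )
    remapped_df = mapped_df.loc[inside].copy()
    # Shift coordinates so (xmin, ymin) becomes (0, 0) on the subset domain.
    remapped_df["x"] -= xmin
    remapped_df["y"] -= ymin
    return remapped_df["site_id"].tolist(), remapped_df


mapped = pd.DataFrame({"site_id": ["a", "b"], "x": [5, 50], "y": [5, 50]})
subset_site_list, remapped = sites_in_bounds(mapped, [2, 3, 10, 10])
```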
- hydrodata.model_evaluation.utils.get_full_site_list(observation_type)
Return list of site IDs with daily data available for a particular site type.
- Parameters:
observation_type (str) – Type of observation. Examples include ‘streamflow’, ‘wtd’, ‘swe’.
- Returns:
site_list – List of site ID strings for all sites with data available for the specific site type.
- Return type:
list
- hydrodata.model_evaluation.utils.get_network_site_list(observation_type, network_names)
Return list of site IDs for desired network of observation sites.
- Parameters:
observation_type (str) – Type of observation. Examples include ‘streamflow’, ‘wtd’, ‘swe’.
network_names (list) – List of names of site networks. Can be a list with a single network name. Each network must have a matching .csv file listing the site ID values that comprise the network. This .csv file must be located in either STREAMFLOW_NETWORK_DIR or GROUNDWATER_NETWORK_DIR (as applicable) and be named ‘network_name’.csv.
- Returns:
site_list – List of site ID strings for sites belonging to named network.
- Return type:
list
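Because site IDs are handled as strings, reading a network file with an explicit string dtype keeps any leading zeros intact. The sketch below uses in-memory text in place of real network files. The site_id column header and the read_network_sites helper are assumptions, since the layout of the .csv files is not described here.

```python
import io

import pandas as pd


def read_network_sites(csv_sources):
    """Hypothetical helper: merge site IDs from several network CSV sources."""
    site_list = []
    for source in csv_sources:
        # dtype=str stops pandas from parsing IDs such as '01234' as integers.
        ids = pd.read_csv(source, dtype=str)["site_id"]
        for site_id in ids:
            if site_id not in site_list:  # keep first occurrence, preserve order
                site_list.append(site_id)
    return site_list


network_a = io.StringIO("site_id\n01234\n05678\n")
network_b = io.StringIO("site_id\n05678\n09999\n")
network_sites = read_network_sites([network_a, network_b])
```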
- hydrodata.model_evaluation.utils.initialize_output(sites_df, metrics, aggregation_levels)
Initialize DataFrame table to store output.
- Parameters:
sites_df (DataFrame) – Pandas DataFrame consisting of at least site_id, x, and y CONUS grid mapping values.
metrics (list) – List of string names of metrics to use for evaluation. Must be present in METRICS_DICT dictionary in the model_evaluation.py module.
aggregation_levels (list) – List of aggregation levels to calculate evaluations for each metric in metrics. Options currently include: ‘day’, ‘week’, ‘month’, ‘calendar year’.
- Returns:
DataFrame containing site ID, x and y CONUS grid mapping values, along with empty columns for each of the evaluation metrics defined in metrics.
- Return type:
DataFrame
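One way to picture the output table is sketched below. The column naming scheme (metric name and aggregation level joined by an underscore) and the helper name empty_results_table are assumptions; the actual layout produced by initialize_output may differ.

```python
import numpy as np
import pandas as pd


def empty_results_table(sites_df, metrics, aggregation_levels):
    """Hypothetical helper: site columns plus one NaN column per metric and level."""
    out = sites_df[["site_id", "x", "y"]].copy()
    for level in aggregation_levels:
        for metric in metrics:
            # Assumption: columns are named '<metric>_<level>'.
            out[f"{metric}_{level}"] = np.nan
    return out


sites = pd.DataFrame({"site_id": ["a", "b"], "x": [1, 2], "y": [3, 4]})
table = empty_results_table(sites, ["rmse", "bias"], ["day", "month"])
```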
- hydrodata.model_evaluation.utils.map_observations(observation_type, conus_version, site_list)
Get CONUS grid mapping values for sites.
- Parameters:
observation_type (str) – Type of observation. Examples include ‘streamflow’, ‘wtd’, ‘swe’.
conus_version (str) – Reference for whether simulations are on CONUS1 or CONUS2 grid. Options include ‘conus1’ and ‘conus2’.
site_list (list) – List of site ID strings to query CONUS x, y coordinates for.
- Returns:
Pandas DataFrame consisting of site ID, x, and y values representing the CONUS x and y coordinate mapping for each site. Only sites with valid CONUS mapping values are included.
- Return type:
DataFrame
- hydrodata.model_evaluation.utils.prepare_emulator_simulations(filepath, emulator_var, trace=1)
Prepare ParFlow emulator output for comparison to observations.
- Parameters:
filepath (str) – Full path to location of ParFlow emulator output file.
emulator_var (str) – Name of variable to slice from ParFlow emulator output file.
trace (int; default=1) – Value of trace to slice from ParFlow emulator output file.
- Returns:
a (NumPy array) – Array containing the extracted simulations data.
date_start (str) – ‘YYYY-MM-DD’ date indicating beginning of time range.
date_end (str) – ‘YYYY-MM-DD’ date indicating end of time range.
- hydrodata.model_evaluation.utils.prepare_netcdf_simulations(filepath, var)
Prepare output in NetCDF file for comparison to observations.
- Parameters:
filepath (str) – Full path to location of NetCDF output file.
var (str) – Name of variable to slice from NetCDF file.
- Returns:
a (NumPy array) – Array containing the extracted simulations data.
date_start (str) – ‘YYYY-MM-DD’ date indicating beginning of time range.
date_end (str) – ‘YYYY-MM-DD’ date indicating end of time range.
- hydrodata.model_evaluation.utils.prepare_simulations_time_series(simulations, x, y, date_start, date_end, aggregate_level, aggregate_method)
Pre-process simulations data to be in comparable format to aggregated observations.
- Parameters:
simulations (ndarray) – Array of shape (t, y, x), where y and x match the dimensions of the CONUS1 or CONUS2 grid, depending on which grid the x, y values in the observations DataFrame refer to. The t dimension should match the number of time values in the ‘daily’ observations DataFrame series. Do not aggregate this array to aggregate_level beforehand; aggregation happens within this function. If the array does not cover the full CONUS1 or CONUS2 grid, its spatial extent should match the extent specified by the grid_bounds parameter.
x (int) – x coordinate used to index the simulation array. The origin (0, 0) is the lower-left corner of the grid. The value is the count of CONUS grid cells away from the origin.
y (int) – y coordinate used to index the simulation array. The origin (0, 0) is the lower-left corner of the grid. The value is the count of CONUS grid cells away from the origin.
date_start (str) – ‘YYYY-MM-DD’ date indicating beginning of time range.
date_end (str) – ‘YYYY-MM-DD’ date indicating end of time range.
aggregate_level (str) – Level at which to aggregate time series. Options include ‘day’, ‘week’, ‘month’, etc.
aggregate_method (str) – Type of aggregation to conduct. Options include ‘mean’, ‘min’, ‘max’, etc.
- Returns:
1-dimensional array containing the aggregated simulation results for the given x, y geo-location.
- Return type:
1D array
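The two steps described above, extracting one grid cell's time series and aggregating it, can be sketched as follows. This is an illustration only: the helper name site_series is hypothetical, the sketch handles just the monthly case, date_end is assumed to be inclusive, and row 0 of the y axis is assumed to correspond to the lower edge of the grid.

```python
import numpy as np
import pandas as pd


def site_series(simulations, x, y, date_start, date_end, aggregate_method="mean"):
    """Hypothetical helper: monthly aggregate of the (t, y, x) array at one cell."""
    # Assumption: date_end is inclusive and t equals the number of days.
    dates = pd.date_range(date_start, date_end, freq="D")
    # Array axes are ordered (t, y, x), so y is indexed before x.
    series = pd.Series(simulations[:, y, x], index=dates)
    months = series.index.strftime("%Y-%m").to_numpy()
    return series.groupby(months).agg(aggregate_method).to_numpy()


sims = np.zeros((4, 2, 3))
sims[:, 1, 2] = [1.0, 3.0, 5.0, 7.0]
monthly_values = site_series(sims, x=2, y=1, date_start="2020-01-30", date_end="2020-02-02")
```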
- hydrodata.model_evaluation.utils.query_attributes_conus_xy(observation_type, conus_version, site_list)
Query site attributes database table for CONUS grid values.
- Parameters:
observation_type (str) – Type of observation. Examples include ‘streamflow’, ‘wtd’, ‘swe’.
conus_version (str) – Reference for whether simulations are on CONUS1 or CONUS2 grid. Options include ‘conus1’ and ‘conus2’.
site_list (list) – List of site ID strings to query CONUS x, y coordinates for.
- Returns:
Pandas DataFrame consisting of site ID, x, and y values representing the CONUS x and y coordinate mapping for each site.
- Return type:
DataFrame