hydrodata.point_observations.pandas.collect_observations module
- hydrodata.point_observations.pandas.collect_observations.get_citation_information(data_source, site_ids=None)
Print and/or return citation information for the requested data source.
- Parameters:
data_source (str) – Source from which data originates. Options include: ‘usgs_nwis’, ‘usda_nrcs’, and ‘ameriflux’.
site_ids (list; default=None) – If provided, the list of sites for which to return data DOIs. Only supported when data_source == ‘ameriflux’.
- Returns:
Nothing is returned unless data_source == ‘ameriflux’ and the parameter site_ids is provided.
- Return type:
None or DataFrame of site-specific DOIs
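The site_ids restriction described above can be checked before calling the function. The following is a minimal sketch, not part of hydrodata: check_citation_request and fetch_citations are hypothetical helpers, and the site identifier is only illustrative. How the library itself reacts to site_ids with a non-AmeriFlux source is not stated here, so the pre-check simply rejects that combination.

```python
# Data sources listed in the parameter description above.
VALID_SOURCES = {"usgs_nwis", "usda_nrcs", "ameriflux"}


def check_citation_request(data_source, site_ids=None):
    """Hypothetical pre-check; not part of hydrodata."""
    if data_source not in VALID_SOURCES:
        raise ValueError(f"unsupported data_source: {data_source!r}")
    if site_ids is not None and data_source != "ameriflux":
        raise ValueError("site_ids is only supported for data_source='ameriflux'")
    return {"data_source": data_source, "site_ids": site_ids}


def fetch_citations(data_source, site_ids=None):
    """Validate the arguments, then call the documented function."""
    kwargs = check_citation_request(data_source, site_ids)
    # Imported lazily so the pre-check works without hydrodata installed.
    from hydrodata.point_observations.pandas.collect_observations import (
        get_citation_information,
    )
    return get_citation_information(**kwargs)


# 'US-Ha1' is only an illustrative AmeriFlux site identifier.
request = check_citation_request("ameriflux", site_ids=["US-Ha1"])
```

With hydrodata installed, fetch_citations("ameriflux", site_ids=[...]) would be expected to return the DataFrame of site-specific DOIs, while calls without site_ids return None.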
- hydrodata.point_observations.pandas.collect_observations.get_pandas_observations(data_source, variable, temporal_resolution, aggregation, depth_level=None, date_start=None, date_end=None, latitude_range=None, longitude_range=None, site_ids=None, state=None, min_num_obs=1, return_metadata=False, all_attributes=False)
Collect observations data into a Pandas DataFrame.
Observations are collected from HydroData for the specified data source, variable, temporal resolution, and aggregation. Optional arguments can be supplied for date bounds, geographic bounds, the minimum number of observations required per site, and/or whether site metadata should also be returned (in a separate DataFrame).
- Parameters:
data_source (str) – Source from which requested data originated. Currently supported: ‘usgs_nwis’, ‘usda_nrcs’, ‘ameriflux’.
variable (str) – Description of type of data requested. Currently supported: ‘streamflow’, ‘wtd’, ‘swe’, ‘precipitation’, ‘temperature’, ‘soil moisture’, ‘latent heat flux’, ‘sensible heat flux’, ‘shortwave radiation’, ‘longwave radiation’, ‘vapor pressure deficit’, ‘wind speed’.
temporal_resolution (str) – Collection frequency of data requested. Currently supported: ‘daily’, ‘hourly’, and ‘instantaneous’. Please see the README documentation for allowable combinations with variable.
aggregation (str) – Additional information specifying the aggregation method for the variable to be returned. Options include descriptors such as ‘average’ and ‘total’. Please see the README documentation for allowable combinations with variable.
depth_level (int; default=None) – Depth level in inches at which the measurement is taken. Required when variable = ‘soil moisture’.
date_start (str; default=None) – ‘YYYY-MM-DD’ date indicating beginning of time range.
date_end (str; default=None) – ‘YYYY-MM-DD’ date indicating end of time range.
latitude_range (tuple; default=None) – Latitude range bounds for the geographic domain; lesser value is provided first.
longitude_range (tuple; default=None) – Longitude range bounds for the geographic domain; lesser value is provided first.
site_ids (list; default=None) – List of desired (string) site identifiers.
state (str; default=None) – Two-letter postal code state abbreviation.
min_num_obs (int; default=1) – Minimum number of observations a site must have to be included.
return_metadata (bool; default=False) – Whether to additionally return a DataFrame containing site metadata.
all_attributes (bool; default=False) – Whether to include all available attributes in the returned metadata DataFrame.
db_path (str) – Full path to location of point observations database.
- Returns:
data_df (DataFrame) – Stacked observations data for a single variable, filtered to only sites that (optionally) have the minimum number of observations specified, within the defined geographic and/or date range.
metadata_df (DataFrame; optional) – Metadata about the sites present in data_df for the desired variable. Returned only when return_metadata=True.
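The parameter rules above (date format, range ordering, depth_level for soil moisture) can be sketched as a request builder that runs before the query. This is an illustration under stated assumptions rather than the library's own validation: build_observation_request and collect are hypothetical helpers, the streamflow/daily/average combination is assumed to be allowable (the README is the authority), and the two DataFrames are assumed to come back as a tuple when return_metadata=True.

```python
from datetime import datetime

# Optional keyword arguments taken from the documented signature.
ALLOWED_OPTIONAL = {
    "depth_level", "date_start", "date_end", "latitude_range",
    "longitude_range", "site_ids", "state", "min_num_obs",
    "return_metadata", "all_attributes",
}


def build_observation_request(data_source, variable, temporal_resolution,
                              aggregation, **optional):
    """Hypothetical helper: assemble and sanity-check keyword arguments."""
    unknown = set(optional) - ALLOWED_OPTIONAL
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    request = {
        "data_source": data_source,
        "variable": variable,
        "temporal_resolution": temporal_resolution,
        "aggregation": aggregation,
        **optional,
    }
    # depth_level is documented as necessary for soil moisture.
    if variable == "soil moisture" and request.get("depth_level") is None:
        raise ValueError("depth_level is required for variable='soil moisture'")
    # Dates must parse as year-month-day; strptime raises ValueError otherwise.
    for key in ("date_start", "date_end"):
        if request.get(key) is not None:
            datetime.strptime(request[key], "%Y-%m-%d")
    # Range tuples list the lesser value first.
    for key in ("latitude_range", "longitude_range"):
        bounds = request.get(key)
        if bounds is not None and bounds[0] > bounds[1]:
            raise ValueError(f"{key}: lesser value must come first")
    return request


def collect(request):
    """Run the query; requires hydrodata and access to its database."""
    from hydrodata.point_observations.pandas.collect_observations import (
        get_pandas_observations,
    )
    result = get_pandas_observations(**request)
    if request.get("return_metadata"):
        data_df, metadata_df = result  # assumed tuple return
        return data_df, metadata_df
    return result, None


request = build_observation_request(
    "usgs_nwis", "streamflow", "daily", "average",
    date_start="2020-01-01", date_end="2020-12-31",
    state="CO", min_num_obs=30, return_metadata=True,
)
```

Calling collect(request) would then return data_df together with metadata_df, or data_df and None when return_metadata is left at its default.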