hydrodata.point_observations.pandas.collect_observations module
- hydrodata.point_observations.pandas.collect_observations.get_citation_information(data_source, site_ids=None)
- Print and/or return citation information for the requested data source.
- Parameters:
- data_source (str) – Source from which data originates. Options include: ‘usgs_nwis’, ‘usda_nrcs’, and ‘ameriflux’. 
- site_ids (list; default None) – If provided, the specific list of sites to return data DOIs for. This is only supported if data_source == ‘ameriflux’. 
 
- Returns:
- Nothing is returned unless data_source == ‘ameriflux’ and the parameter site_ids is provided. 
- Return type:
- None or DataFrame of site-specific DOIs 
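
The site_ids restriction above can be checked before calling the function. The following is a minimal sketch, not part of the library: citation_request and fetch_citations are hypothetical helper names, the pre-check is an assumption about how a caller might guard the call, and the site identifier in the usage line is illustrative only.

```python
SUPPORTED_SOURCES = ("usgs_nwis", "usda_nrcs", "ameriflux")


def citation_request(data_source, site_ids=None):
    """Hypothetical helper: validate arguments and return keyword arguments."""
    if data_source not in SUPPORTED_SOURCES:
        raise ValueError(f"unsupported data_source: {data_source!r}")
    # The documentation states site_ids is only supported for 'ameriflux'.
    if site_ids is not None and data_source != "ameriflux":
        raise ValueError("site_ids is only supported when data_source == 'ameriflux'")
    kwargs = {"data_source": data_source}
    if site_ids is not None:
        kwargs["site_ids"] = list(site_ids)
    return kwargs


def fetch_citations(data_source, site_ids=None):
    """Hypothetical wrapper around the documented function."""
    # Imported lazily so the validation helper works without hydrodata installed.
    from hydrodata.point_observations.pandas.collect_observations import (
        get_citation_information,
    )
    return get_citation_information(**citation_request(data_source, site_ids))


# Illustrative arguments; the site identifier is an example value only.
request = citation_request("ameriflux", site_ids=["US-Ha1"])
```

With hydrodata installed, fetch_citations("ameriflux", ["US-Ha1"]) would pass the same arguments to get_citation_information, which per the description above returns a DataFrame of site-specific DOIs in that case.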
 
- hydrodata.point_observations.pandas.collect_observations.get_pandas_observations(data_source, variable, temporal_resolution, aggregation, depth_level=None, date_start=None, date_end=None, latitude_range=None, longitude_range=None, site_ids=None, state=None, min_num_obs=1, return_metadata=False, all_attributes=False)
- Collect observations data into a pandas DataFrame.
- Observations are collected from HydroData for the specified data source, variable, temporal resolution, and aggregation. Optional arguments can set date bounds, geographic bounds, the minimum number of observations per site, and whether site metadata is also returned (in a separate DataFrame).
- Parameters:
- data_source (str) – Source from which requested data originated. Currently supported: ‘usgs_nwis’, ‘usda_nrcs’, ‘ameriflux’. 
- variable (str) – Description of type of data requested. Currently supported: ‘streamflow’, ‘wtd’, ‘swe’, ‘precipitation’, ‘temperature’, ‘soil moisture’, ‘latent heat flux’, ‘sensible heat flux’, ‘shortwave radiation’, ‘longwave radiation’, ‘vapor pressure deficit’, ‘wind speed’. 
- temporal_resolution (str) – Collection frequency of data requested. Currently supported: ‘daily’, ‘hourly’, and ‘instantaneous’. Please see the README documentation for allowable combinations with variable. 
- aggregation (str) – Additional information specifying the aggregation method for the variable to be returned. Options include descriptors such as ‘average’ and ‘total’. Please see the README documentation for allowable combinations with variable. 
- depth_level (int; default=None) – Depth, in inches, at which the measurement is taken. Required when variable is ‘soil moisture’. 
- date_start (str; default=None) – ‘YYYY-MM-DD’ date indicating beginning of time range. 
- date_end (str; default=None) – ‘YYYY-MM-DD’ date indicating end of time range. 
- latitude_range (tuple; default=None) – Latitude bounds for the geographic domain; provide the lesser value first. 
- longitude_range (tuple; default=None) – Longitude bounds for the geographic domain; provide the lesser value first. 
- site_ids (list; default=None) – List of desired (string) site identifiers. 
- state (str; default=None) – Two-letter postal code state abbreviation. 
- min_num_obs (int; default=1) – Minimum number of observations a site must have to be included. 
- return_metadata (bool; default=False) – Whether to additionally return a DataFrame containing site metadata. 
- all_attributes (bool; default=False) – Whether to include all available attributes in the returned metadata DataFrame. 
- db_path (str) – Full path to location of point observations database. 
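
Several of the parameters above carry constraints (date format, range ordering, depth_level for soil moisture) that can be checked before a request is made. The sketch below is one way to do that and is not part of the library: build_query and its private helpers are hypothetical names, and the example values are assumed to be an allowable combination (the README is the authority on that).

```python
from datetime import datetime


def _parse_date(text):
    # Raises ValueError if text cannot be parsed as a year-month-day date.
    return datetime.strptime(text, "%Y-%m-%d").date()


def _check_range(name, bounds):
    low, high = bounds
    if low > high:
        raise ValueError(f"{name} must list the lesser value first, got {bounds}")


def build_query(data_source, variable, temporal_resolution, aggregation, **optional):
    """Hypothetical helper: check documented constraints, return keyword arguments."""
    if variable == "soil moisture" and optional.get("depth_level") is None:
        raise ValueError("depth_level is required when variable is 'soil moisture'")

    start, end = optional.get("date_start"), optional.get("date_end")
    start_date = _parse_date(start) if start is not None else None
    end_date = _parse_date(end) if end is not None else None
    if start_date is not None and end_date is not None and start_date > end_date:
        raise ValueError("date_start must not be later than date_end")

    for name in ("latitude_range", "longitude_range"):
        if optional.get(name) is not None:
            _check_range(name, optional[name])

    query = {
        "data_source": data_source,
        "variable": variable,
        "temporal_resolution": temporal_resolution,
        "aggregation": aggregation,
    }
    # Leave out unset options so the function's own defaults apply.
    query.update({key: value for key, value in optional.items() if value is not None})
    return query


# Example values only; the bounding box and dates are illustrative.
query = build_query(
    "usgs_nwis", "streamflow", "daily", "average",
    date_start="2020-01-01", date_end="2020-12-31",
    latitude_range=(39.0, 41.0), longitude_range=(-109.0, -102.0),
    min_num_obs=30,
)
```

With hydrodata installed, the resulting dictionary could be passed on as get_pandas_observations(**query).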
 
- Returns:
- data_df (DataFrame) – Stacked observations data for a single variable, filtered to sites that have at least the specified minimum number of observations within the defined geographic and/or date range. 
- metadata_df (DataFrame; optional) – Metadata about the sites present in data_df for the desired variable.
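
The effect of min_num_obs on the returned data can be illustrated in plain Python. This is a sketch of the filtering idea only, not the library's implementation: the (site_id, date, value) record layout, the site names, and the choice to count only non-missing values are all assumptions made for the example.

```python
from collections import Counter


def filter_sites_by_count(records, min_num_obs=1):
    """Keep records from sites with at least min_num_obs non-missing values."""
    # Assumption: an observation counts only if its value is not missing.
    counts = Counter(site for site, _date, value in records if value is not None)
    keep = {site for site, n in counts.items() if n >= min_num_obs}
    return [record for record in records if record[0] in keep]


# Hypothetical records: (site_id, date, value).
records = [
    ("site_a", "2020-01-01", 1.2),
    ("site_a", "2020-01-02", 1.4),
    ("site_b", "2020-01-01", 0.7),
    ("site_b", "2020-01-02", None),
]

# site_a has two non-missing values and is kept; site_b has one and is dropped.
filtered = filter_sites_by_count(records, min_num_obs=2)
```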