airbase package¶
- class airbase.AirbaseClient(connect=True)[source]¶
Bases: object
The central point for requesting Airbase data.
- Parameters
connect (bool) – (optional) Immediately test network connection and download available countries and pollutants. If False, .connect() must be called before making data requests. Default True.
- Example
>>> client = AirbaseClient()
>>> r = client.request(["NL", "DE"], pl=["O3", "NO2"])
>>> r.download_to_directory("data/raw")
Generating CSV download links...
100%|██████████| 4/4 [00:09<00:00, 2.64s/it]
Generated 5164 CSV links ready for downloading
Downloading CSVs to data/raw...
100%|██████████| 5164/5164 [43:39<00:00, 1.95it/s]
>>> r.download_metadata("data/metadata.tsv")
Writing metadata to data/metadata.tsv...
- property all_countries¶
All countries available from AirBase.
- property all_pollutants¶
All pollutants available from AirBase.
- connect(timeout=None)[source]¶
Download the available countries and pollutants for validation.
- Parameters
timeout (float) – Raise ConnectionError if the server takes longer than timeout seconds to respond.
- Returns
self
- static download_metadata(filepath, verbose=True)[source]¶
Download the metadata file.
See http://discomap.eea.europa.eu/map/fme/AirQualityExport.htm.
- Parameters
filepath (str) – Where to save the metadata TSV.
verbose (bool) – (optional) Print status messages. Default True.
- property pollutants_per_country¶
The pollutants available in each country from AirBase.
- request(country=None, pl=None, shortpl=None, year_from='2013', year_to='2022', source='All', update_date=None, verbose=True, preload_csv_links=False)[source]¶
Initialize an AirbaseRequest for a query.
Pollutants can be specified either by name (pl) or by code (shortpl). If no pollutants are specified, data for all available pollutants will be requested. If a pollutant is not available for a country, then we simply do not try to download those CSVs.
Requests proceed in two steps: First, links to individual CSVs are requested from the Airbase server. Then these links are used to download the individual CSVs.
See http://discomap.eea.europa.eu/map/fme/AirQualityExport.htm.
- Parameters
country (str|list) – (optional) 2-letter country code or a list of them. If a list, data will be requested for each country. Raises ValueError if a country is not available on the server. If None, data for all countries will be requested. See self.all_countries.
pl (str|list) – (optional) The pollutant(s) to request data for. Must be one of the pollutants in self.all_pollutants. Cannot be used in conjunction with shortpl.
shortpl (str|list) – (optional) The pollutant code(s) to request data for. Will be applied to each country requested. Cannot be used in conjunction with pl.
year_from (str) – (optional) The first year of data. Cannot be earlier than 2013. Default 2013.
year_to (str) – (optional) The last year of data. Cannot be later than the current year. Default <current year>.
source (str) – (optional) One of “E1a”, “E2a” or “All”. E2a (UTD) data are only available for years where E1a data have not yet been delivered (this will normally be the most recent year). Default “All”.
update_date (str|datetime) – (optional) Format “yyyy-mm-dd hh:mm:ss”. To be used when only files created or updated after a certain date are of interest.
verbose (bool) – (optional) print status messages to stderr. Default True.
preload_csv_links (bool) – (optional) Request all the csv download links from the Airbase server at object initialization. Default False.
- Return AirbaseRequest
The initialized AirbaseRequest.
- Example
>>> client = AirbaseClient()
>>> r = client.request(["NL", "DE"], pl=["O3", "NO2"])
>>> r.download_to_directory("data/raw")
Generating CSV download links...
100%|██████████| 4/4 [00:09<00:00, 2.64s/it]
Generated 5164 CSV links ready for downloading
Downloading CSVs to data/raw...
100%|██████████| 5164/5164 [43:39<00:00, 1.95it/s]
>>> r.download_metadata("data/metadata.tsv")
Writing metadata to data/metadata.tsv...
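The rule that pl and shortpl are mutually exclusive, and that pollutant names are checked against the known pollutants, could be sketched roughly as below. This is only an illustration: resolve_shortpl is a hypothetical helper, not part of airbase, and the pollutant table is sample data in the name-to-code shape described for pollutants_from_summary.

```python
def resolve_shortpl(pl=None, shortpl=None, *, known):
    """Hypothetical helper: turn pl/shortpl arguments into a list of codes.

    `known` maps pollutant name -> code. It is sample data here,
    not something fetched from the server.
    """
    if pl is not None and shortpl is not None:
        raise ValueError("Specify either 'pl' or 'shortpl', not both")
    if shortpl is not None:
        # Accept a single code or a list of codes.
        return [shortpl] if isinstance(shortpl, str) else list(shortpl)
    if pl is None:
        # No pollutant filter: None stands for "all available pollutants".
        return None
    names = [pl] if isinstance(pl, str) else list(pl)
    unknown = [name for name in names if name not in known]
    if unknown:
        raise ValueError(f"Unknown pollutant(s): {unknown}")
    return [known[name] for name in names]


SAMPLE_POLLUTANTS = {"O3": "7", "NO3": "46"}  # illustrative subset
codes = resolve_shortpl(pl=["O3", "NO3"], known=SAMPLE_POLLUTANTS)
```

Returning None for "no filter" is a design choice of this sketch; the library may represent that case differently.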
- search_pollutant(query, limit=None)[source]¶
Search for a pollutant’s shortpl number based on its name.
- Parameters
query (str) – The pollutant to search for.
limit (int) – (optional) Max number of results.
- Return list[dict]
The best pollutant matches. Pollutants are dicts with keys “pl” and “shortpl”.
- Example
>>> AirbaseClient().search_pollutant("o3", limit=2)
[{"pl": "O3", "shortpl": "7"}, {"pl": "NO3", "shortpl": "46"}]
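One way such a search might behave is a case-insensitive substring match that ranks shorter names first. The sketch below is an assumption made for illustration, not the library's actual matching logic; the function is a stand-in that takes the pollutant table as an argument, and the table itself is sample data.

```python
def search_pollutant(query, pollutants, limit=None):
    """Hypothetical stand-in: case-insensitive substring search.

    Shorter names sort first, so an exact match outranks longer
    names that merely contain the query.
    """
    needle = query.lower()
    names = [name for name in pollutants if needle in name.lower()]
    names.sort(key=lambda name: (len(name), name))
    hits = [{"pl": name, "shortpl": pollutants[name]} for name in names]
    return hits[:limit]  # limit=None keeps every hit


SAMPLE = {"NO3": "46", "O3": "7", "NO2": "8"}  # illustrative data
matches = search_pollutant("o3", SAMPLE, limit=2)
```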
- class airbase.AirbaseRequest(country=None, shortpl=None, year_from='2013', year_to='2022', source='All', update_date=None, verbose=True, preload_csv_links=False)[source]¶
Bases: object
Handler for Airbase data requests.
Requests proceed in two steps: First, links to individual CSVs are requested from the Airbase server. Then these links are used to download the individual CSVs.
See http://discomap.eea.europa.eu/map/fme/AirQualityExport.htm.
- Parameters
country (str|list) – 2-letter country code or a list of them. If a list, data will be requested for each country.
shortpl (str|list) – (optional) The pollutant code(s) to request data for. Will be applied to each country requested. If None, all available pollutants will be requested. If a pollutant is not available for a country, the CSVs for that combination are skipped.
year_from (str) – (optional) The first year of data. Cannot be earlier than 2013. Default 2013.
year_to (str) – (optional) The last year of data. Cannot be later than the current year. Default <current year>.
source (str) – (optional) One of “E1a”, “E2a” or “All”. E2a (UTD) data are only available for years where E1a data have not yet been delivered (this will normally be the most recent year). Default “All”.
update_date (str|datetime) – (optional) Format “yyyy-mm-dd hh:mm:ss”. To be used when only files created or updated after a certain date are of interest.
verbose (bool) – (optional) print status messages to stderr. Default True.
preload_csv_links (bool) – (optional) Request all the csv download links from the Airbase server at object initialization. Default False.
- download_metadata(filepath)[source]¶
Download the metadata TSV file.
See http://discomap.eea.europa.eu/map/fme/AirQualityExport.htm.
- Parameters
filepath (str) – Where to save the TSV
- download_to_directory(dir, skip_existing=True, raise_for_status=True)[source]¶
Download into a directory, preserving original file structure.
- Parameters
dir (str) – The directory to save files in (must exist)
skip_existing (bool) – (optional) Don’t re-download files if they exist in dir. If False, existing files in dir may be overwritten. Default True.
raise_for_status (bool) – (optional) Raise exceptions if download links return “bad” HTTP status codes. If False, a warnings.warn() will be issued instead. Default True.
- Returns
self
- download_to_file(filepath, raise_for_status=True)[source]¶
Download data into one large CSV.
Directory where the new CSV will be created must exist.
- Parameters
filepath (str) – The path to the new CSV.
raise_for_status (bool) – (optional) Raise exceptions if download links return “bad” HTTP status codes. If False, a warnings.warn() will be issued instead. Default True.
- Returns
self
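The raise_for_status switch used by both download methods can be pictured with a small stand-alone sketch. check_status is a hypothetical helper, and treating every status of 400 or above as “bad” is an assumption of this example rather than something the documentation states.

```python
import warnings


def check_status(url, status, raise_for_status=True):
    """Hypothetical helper: return True if the response is usable."""
    if status < 400:
        return True
    message = f"{url} returned HTTP status {status}"
    if raise_for_status:
        raise RuntimeError(message)
    # Lenient mode: report the problem and let the caller skip this URL.
    warnings.warn(message)
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = check_status("https://example.invalid/a.csv", 404, raise_for_status=False)
```

With raise_for_status=False a long download can continue past a few broken links, at the cost of an incomplete result that has to be checked afterwards.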
Submodules¶
airbase.resources module¶
Global variables for URL templating
airbase.util module¶
Utility functions for processing the raw Portal responses, url templating, etc.
- airbase.util.countries_from_summary(summary)[source]¶
Get the list of unique countries from the summary.
- Parameters
summary (list[dict]) – The E1a summary.
- Return list[str]
The available countries.
- airbase.util.link_list_url(country, shortpl=None, year_from='2013', year_to='2022', source='All', update_date=None)[source]¶
Generate the URL where the download links for a query can be found.
- Parameters
country (str) – The 2-letter country code. See AirbaseClient.all_countries for options.
shortpl (str) – (optional) The pollutant number. Leave blank to get all pollutants. See AirbaseClient.pollutants_per_country for options.
year_from (str) – (optional) The first year of data. Cannot be earlier than 2013. Default 2013.
year_to (str) – (optional) The last year of data. Cannot be later than the current year. Default <current year>.
source (str) – (optional) One of “E1a”, “E2a” or “All”. E2a (UTD) data are only available for years where E1a data have not yet been delivered (this will normally be the most recent year). Default “All”.
update_date (str|datetime) – (optional) Format “yyyy-mm-dd hh:mm:ss”. To be used when only files created or updated after a certain date are of interest.
- Return str
The URL which will yield the list of relevant CSV download links.
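URL templating of this kind can be sketched with the standard library. Everything specific below is an assumption made for illustration: the base URL is a placeholder, the query parameter names are invented, and build_link_list_url is a hypothetical function that only mirrors the documented arguments.

```python
from datetime import datetime
from urllib.parse import urlencode

BASE_URL = "https://example.invalid/links"  # placeholder, not the real endpoint


def build_link_list_url(country, shortpl=None, year_from="2013",
                        year_to="2022", source="All", update_date=None):
    """Hypothetical URL builder mirroring the documented parameters."""
    if isinstance(update_date, datetime):
        # Documented format: yyyy-mm-dd hh:mm:ss
        update_date = update_date.strftime("%Y-%m-%d %H:%M:%S")
    query = {
        "country": country,
        "pollutant": shortpl or "",  # blank means all pollutants
        "year_from": year_from,
        "year_to": year_to,
        "source": source,
        "update_date": update_date or "",
    }
    # urlencode escapes the space and colons in the timestamp.
    return f"{BASE_URL}?{urlencode(query)}"


url = build_link_list_url("NL", shortpl="7",
                          update_date=datetime(2021, 1, 2, 3, 4, 5))
```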
- airbase.util.pollutants_from_summary(summary)[source]¶
Get the list of unique pollutants from the summary.
- Parameters
summary (list[dict]) – The E1a summary.
- Return dict
The available pollutants, with name (“pl”) as key and pollutant number (“shortpl”) as value.
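The summary helpers can be illustrated on a few hand-written records. The record layout is an assumption: the “pl” and “shortpl” keys come from the descriptions above, while the “ct” country key and the three function names are hypothetical. The last function shows one way the per-country grouping behind pollutants_per_country could be derived from the same records.

```python
SUMMARY = [  # illustrative records; the "ct" key is an assumption
    {"ct": "NL", "pl": "O3", "shortpl": "7"},
    {"ct": "NL", "pl": "NO3", "shortpl": "46"},
    {"ct": "DE", "pl": "O3", "shortpl": "7"},
]


def countries(summary):
    """Unique country codes, sorted for a stable result."""
    return sorted({row["ct"] for row in summary})


def pollutants(summary):
    """Map pollutant name ("pl") to pollutant number ("shortpl")."""
    return {row["pl"]: row["shortpl"] for row in summary}


def pollutants_per_country(summary):
    """Group the pollutant records by country code."""
    grouped = {}
    for row in summary:
        grouped.setdefault(row["ct"], []).append(
            {"pl": row["pl"], "shortpl": row["shortpl"]}
        )
    return grouped
```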
airbase.fetch module¶
Helper functions encapsulating async HTTP request and file IO
- airbase.fetch.fetch_json(url, *, timeout=None, encoding=None)[source]¶
Request url and read response’s body as JSON
- Parameters
url (str) – requested url
timeout (Optional[float]) – maximum time to complete request (seconds)
encoding (Optional[str]) – text encoding used for decoding the response’s body
- Returns
decoded text from response’s body as JSON
- Return type
list[dict[str, str]]
- airbase.fetch.fetch_text(url, *, timeout=None, encoding=None)[source]¶
Request url and read response’s body
- Parameters
url (str) – requested url
timeout (Optional[float]) – maximum time to complete request (seconds)
encoding (Optional[str]) – text encoding used for decoding the response’s body
- Returns
decoded text from response’s body
- Return type
str
- airbase.fetch.fetch_to_directory(urls, root, *, skip_existing=True, encoding=None, progress=False, raise_for_status=True, max_concurrent=10)[source]¶
Request a list of URLs and write each response to a different file
- Parameters
urls (list[str]) – requested urls
root (pathlib.Path) – directory to write all responses
skip_existing (bool) – Do not re-download url if the corresponding file is found in root
encoding (Optional[str]) – text encoding used for decoding each response’s body
progress (bool) – show progress bar
raise_for_status (bool) – Raise exceptions if download links return “bad” HTTP status codes. If False, a warnings.warn() will be issued instead.
max_concurrent (int) – maximum concurrent requests
- Return type
None
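The skip_existing behaviour amounts to deciding, before any request is made, which URLs still need a file under root. A minimal sketch follows, assuming each URL maps to a file named after the last segment of its path; plan_downloads is a hypothetical helper, and the library may lay out files differently.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def plan_downloads(urls, root, skip_existing=True):
    """Hypothetical helper: map each URL to root/<file name>.

    URLs whose target file already exists are dropped when
    skip_existing is True.
    """
    plan = {}
    for url in urls:
        target = Path(root) / Path(urlparse(url).path).name
        if skip_existing and target.exists():
            continue
        plan[url] = target
    return plan


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.csv").write_text("already here")
    urls = [
        "https://example.invalid/data/a.csv",
        "https://example.invalid/data/b.csv",
    ]
    pending = plan_downloads(urls, tmp)
    everything = plan_downloads(urls, tmp, skip_existing=False)
```

The resulting url-to-path mapping has the same shape as the dict[url, path] input that fetcher accepts.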
- airbase.fetch.fetch_to_file(urls, path, *, encoding=None, progress=False, raise_for_status=True, max_concurrent=10)[source]¶
Request a list of URLs and write all responses into a single text file
- Parameters
urls (list[str]) – requested urls
path (pathlib.Path) – text file for all combined responses
encoding (Optional[str]) – text encoding used for decoding each response’s body
progress (bool) – show progress bar
raise_for_status (bool) – Raise exceptions if download links return “bad” HTTP status codes. If False, a warnings.warn() will be issued instead.
max_concurrent (int) – maximum concurrent requests
- Return type
None
- airbase.fetch.fetch_unique_lines(urls, *, encoding=None, progress=False, raise_for_status=True, max_concurrent=10)[source]¶
Request a list of URLs and return only the unique lines among all the responses
- Parameters
urls (list[str]) – requested urls
encoding (Optional[str]) – text encoding used for decoding each response’s body
progress (bool) – show progress bar
raise_for_status (bool) – Raise exceptions if download links return “bad” HTTP status codes. If False, a warnings.warn() will be issued instead.
max_concurrent (int) – maximum concurrent requests
- Returns
unique lines among all the responses
- Return type
set[str]
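The deduplication step can be shown without any network access. In this sketch the response bodies are hard-coded strings, unique_lines is a hypothetical helper, and dropping empty lines is an assumption of the example.

```python
def unique_lines(texts):
    """Collect the distinct non-empty lines across several response bodies."""
    lines = set()
    for text in texts:
        lines.update(line for line in text.splitlines() if line)
    return lines


responses = ["a.csv\nb.csv\n", "b.csv\n\nc.csv\n"]  # stand-ins for response bodies
links = unique_lines(responses)
```

A set discards the order in which the lines arrived, which matches the documented set[str] return type.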
- async airbase.fetch.fetcher(urls: list[str], *, encoding: str | None = None, progress: bool = DEFAULT.progress, raise_for_status: bool = DEFAULT.raise_for_status, max_concurrent: int = DEFAULT.max_concurrent) AsyncIterator[str] [source]¶
- async airbase.fetch.fetcher(urls: dict[str, pathlib.Path], *, encoding: str | None = None, progress: bool = DEFAULT.progress, raise_for_status: bool = DEFAULT.raise_for_status, max_concurrent: int = DEFAULT.max_concurrent) AsyncIterator[pathlib.Path]
Request multiple URLs. If a dict[url, path] is provided, write each response’s text to its path; if only a list[url] is provided, return the decoded text of each response.
- Parameters
urls – requested urls
encoding – text encoding used for decoding each response’s body
progress – show progress bar
raise_for_status – Raise exceptions if download links return “bad” HTTP status codes. If False, a warnings.warn() will be issued instead.
max_concurrent – maximum concurrent requests
- Returns
url text or path to downloaded text, one by one as the requests are completed
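The general pattern of yielding results one by one while capping the number of requests in flight can be sketched with asyncio alone. This is not the library's implementation: fake_get replaces the real HTTP call with a short sleep, and bounded_fetch and collect are hypothetical names.

```python
import asyncio


async def fake_get(url):
    """Stand-in for an HTTP request; sleeps instead of using the network."""
    await asyncio.sleep(0.01)
    return f"body of {url}"


async def bounded_fetch(urls, max_concurrent=10):
    """Yield response texts one by one, in completion order."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(url):
        async with semaphore:  # at most max_concurrent requests in flight
            return await fake_get(url)

    tasks = [asyncio.create_task(limited(url)) for url in urls]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect(urls):
    """Drain the async iterator into a list."""
    return [text async for text in bounded_fetch(urls, max_concurrent=2)]


texts = asyncio.run(collect(["u1", "u2", "u3"]))
```

Because results arrive in completion order, callers should not assume the output order matches the order of the input URLs.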