airbase package

class airbase.AirbaseClient[source]

Bases: object

The central point for requesting Airbase data.

Example
>>> client = AirbaseClient()
>>> r = client.request(["NL", "DE"], pl=["O3", "NO2"])
>>> r.download_to_directory("data/raw")
Generating CSV download links...
100%|██████████| 4/4 [00:09<00:00,  2.64s/it]
Generated 5164 CSV links ready for downloading
Downloading CSVs to data/raw...
100%|██████████| 5164/5164 [43:39<00:00,  1.95it/s]
>>> r.download_metadata("data/metadata.tsv")
Writing metadata to data/metadata.tsv...
Return type

None

all_countries

All countries available from AirBase

all_pollutants

All pollutants available from AirBase

static download_metadata(filepath, verbose=True)[source]

Download the metadata file.

See http://discomap.eea.europa.eu/map/fme/AirQualityExport.htm.

Parameters
  • filepath (str | pathlib.Path) –

  • verbose (bool) –

Return type

None

pollutants_per_country: dict[str, list[airbase.airbase.PollutantDict]]

The pollutants available in each country from AirBase.

request(country=None, pl=None, shortpl=None, year_from='2013', year_to='2022', source='All', update_date=None, verbose=True, preload_csv_links=False)[source]

Initialize an AirbaseRequest for a query.

Pollutants can be specified either by name (pl) or by code (shortpl). If no pollutants are specified, data for all available pollutants will be requested. If a pollutant is not available for a country, then we simply do not try to download those CSVs.

Requests proceed in two steps: First, links to individual CSVs are requested from the Airbase server. Then these links are used to download the individual CSVs.

See http://discomap.eea.europa.eu/map/fme/AirQualityExport.htm.

Parameters
  • country (Optional[Union[str, list[str]]]) – (optional), 2-letter country code or a list of them. If a list, data will be requested for each country. Will raise ValueError if a country is not available on the server. If None, data for all countries will be requested. See self.all_countries.

  • pl (Optional[Union[str, list[str]]]) – (optional) The pollutant(s) to request data for. Must be one of the pollutants in self.all_pollutants. Cannot be used in conjunction with shortpl.

  • shortpl (Optional[Union[str, list[str]]]) – (optional). The pollutant code(s) to request data for. Will be applied to each country requested. Cannot be used in conjunction with pl.

  • year_from (str) – (optional) The first year of data. Cannot be earlier than 2013. Default 2013.

  • year_to (str) – (optional) The last year of data. Cannot be later than the current year. Default <current year>.

  • source (str) – (optional) One of “E1a”, “E2a” or “All”. E2a (UTD) data are only available for years where E1a data have not yet been delivered (this will normally be the most recent year). Default “All”.

  • update_date (Optional[Union[str, datetime.datetime]]) – (optional). Format “yyyy-mm-dd hh:mm:ss”. To be used when only files created or updated after a certain date are of interest.

  • verbose (bool) – (optional) print status messages to stderr. Default True.

  • preload_csv_links (bool) – (optional) Request all the csv download links from the Airbase server at object initialization. Default False.

Returns

The initialized AirbaseRequest.

Return type

airbase.airbase.AirbaseRequest
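The update_date parameter accepts either a string or a datetime in the “yyyy-mm-dd hh:mm:ss” format. The following is a minimal sketch of how such a value could be normalised with the standard library before it is passed to request(). The helper name format_update_date is hypothetical and is not part of airbase.

```python
from datetime import datetime

# Format string matching the documented "yyyy-mm-dd hh:mm:ss" layout.
UPDATE_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def format_update_date(value):
    """Hypothetical helper (not part of airbase): normalise update_date.

    Accepts a datetime or an already formatted string and returns a
    string in the documented format. A malformed string raises ValueError.
    """
    if isinstance(value, datetime):
        return value.strftime(UPDATE_DATE_FORMAT)
    # Round-trip through strptime purely to validate the layout.
    datetime.strptime(value, UPDATE_DATE_FORMAT)
    return value
```

A value produced this way, for example format_update_date(datetime(2021, 3, 1, 12, 30)), could then be supplied as update_date.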

search_pollutant(query, limit=None)[source]

Search for a pollutant’s shortpl number based on its name.

Parameters
  • query (str) – The pollutant to search for.

  • limit (Optional[int]) – (optional) Max number of results.

Returns

The best pollutant matches. Pollutants are dicts with keys “pl” and “shortpl”.

Example
>>> AirbaseClient().search_pollutant("o3", limit=2)
[{"pl": "O3", "shortpl": "7"}, {"pl": "NO3", "shortpl": "46"}]
Return type

list[airbase.airbase.PollutantDict]

class airbase.AirbaseRequest(country=None, shortpl=None, year_from='2013', year_to='2022', source='All', update_date=None, verbose=True, preload_csv_links=False)[source]

Bases: object

Handler for Airbase data requests.

Requests proceed in two steps: First, links to individual CSVs are requested from the Airbase server. Then these links are used to download the individual CSVs.

See http://discomap.eea.europa.eu/map/fme/AirQualityExport.htm.

Parameters
  • country (str | list[str] | None) – 2-letter country code or a list of them. If a list, data will be requested for each country.

  • shortpl (str | list[str] | None) – (optional). The pollutant code to request data for. Will be applied to each country requested. If None, all available pollutants will be requested. If a pollutant is not available for a country, then we simply do not try to download those CSVs.

  • year_from (str) – (optional) The first year of data. Cannot be earlier than 2013. Default 2013.

  • year_to (str) – (optional) The last year of data. Cannot be later than the current year. Default <current year>.

  • source (str) – (optional) One of “E1a”, “E2a” or “All”. E2a (UTD) data are only available for years where E1a data have not yet been delivered (this will normally be the most recent year). Default “All”.

  • update_date (str | datetime | None) – (optional). Format “yyyy-mm-dd hh:mm:ss”. To be used when only files created or updated after a certain date are of interest.

  • verbose (bool) – (optional) print status messages to stderr. Default True.

  • preload_csv_links (bool) – (optional) Request all the csv download links from the Airbase server at object initialization. Default False.

Return type

None
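The two-step flow described above (first collect CSV links, then download them) can be sketched without any network access. This is an illustration under stated assumptions, not the library's implementation: plan_requests and collect_csv_links are hypothetical names, the link-list fetcher is injected as a plain callable, and the availability mapping stands in for the information in AirbaseClient.pollutants_per_country.

```python
def plan_requests(countries, shortpls, available):
    """Pair each country with the requested pollutant codes it offers.

    `available` maps a country code to the set of pollutant codes that
    exist for it; pairs with no data are skipped instead of requested.
    """
    return [
        (country, shortpl)
        for country in countries
        for shortpl in shortpls
        if shortpl in available.get(country, ())
    ]


def collect_csv_links(pairs, fetch_link_list):
    """Step 1: one link-list request per (country, pollutant) pair.

    `fetch_link_list(country, shortpl)` is an injected callable that
    returns an iterable of CSV URLs. Duplicate links are dropped.
    """
    links = set()
    for country, shortpl in pairs:
        links.update(fetch_link_list(country, shortpl))
    return sorted(links)


def download_all(links, download_one):
    """Step 2: download every CSV link gathered in step 1."""
    return [download_one(link) for link in links]
```

Injecting the two callables keeps the sketch testable offline; in real use they would wrap HTTP requests.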

download_metadata(filepath)[source]

Download the metadata TSV file.

See http://discomap.eea.europa.eu/map/fme/AirQualityExport.htm.

Parameters

filepath (str | pathlib.Path) – Where to save the TSV

Return type

None

download_to_directory(dir, skip_existing=True, raise_for_status=True)[source]

Download into a directory, preserving original file structure.

Parameters
  • dir (str | pathlib.Path) – The directory to save files in (must exist)

  • skip_existing (bool) – (optional) Don’t re-download files if they exist in dir. If False, existing files in dir may be overwritten. Default True.

  • raise_for_status (bool) – (optional) Raise exceptions if download links return “bad” HTTP status codes. If False, a warnings.warn() will be issued instead. Default True.

Returns

self

Return type

airbase.airbase.AirbaseRequest
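The raise_for_status switch chooses between raising an exception and issuing a warning. A minimal sketch of that pattern follows. The helper check_status is hypothetical, and both the exception type (RuntimeError) and the reading of “bad” as any status of 400 or above are assumptions made for illustration.

```python
import warnings


def check_status(url, status, raise_for_status=True):
    """Hypothetical helper: react to an HTTP status code.

    Returns True for a usable response. For a "bad" status (assumed
    here to mean >= 400) it either raises or warns and returns False.
    """
    if status < 400:
        return True
    message = f"{url} returned HTTP status {status}"
    if raise_for_status:
        raise RuntimeError(message)
    # Non-fatal path: report the problem and let the caller continue.
    warnings.warn(message)
    return False
```

With raise_for_status=False a long download can continue past individual failures, and the warnings record which links were skipped.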

download_to_file(filepath, raise_for_status=True)[source]

Download data into one large CSV.

Directory where the new CSV will be created must exist.

Parameters
  • filepath (str | pathlib.Path) – The path to the new CSV.

  • raise_for_status (bool) – (optional) Raise exceptions if download links return “bad” HTTP status codes. If False, a warnings.warn() will be issued instead. Default True.

Returns

self

Return type

airbase.airbase.AirbaseRequest

Submodules

airbase.resources module

Global variables for URL templating

airbase.util module

Utility functions for processing the raw Portal responses, url templating, etc.

Generate the URL where the download links for a query can be found.

Parameters
  • country (str | None) – The 2-letter country code. See AirbaseClient.all_countries for options.

  • shortpl (Optional[str]) – (optional) The pollutant number. Leave blank to get all pollutants. See AirbaseClient.pollutants_per_country for options.

  • year_from (str) – (optional) The first year of data. Cannot be earlier than 2013. Default 2013.

  • year_to (str) – (optional) The last year of data. Cannot be later than the current year. Default <current year>.

  • source (str) – (optional) One of “E1a”, “E2a” or “All”. E2a (UTD) data are only available for years where E1a data have not yet been delivered (this will normally be the most recent year). Default “All”.

  • update_date (Optional[Union[str, datetime.datetime]]) – (optional). Format “yyyy-mm-dd hh:mm:ss”. To be used when only files created or updated after a certain date are of interest.

Returns

The URL which will yield the list of relevant CSV download links.

Return type

str

airbase.util.string_safe_list(obj: None) -> list[None][source]
airbase.util.string_safe_list(obj: Union[str, Iterable[str]]) -> list[str]

Turn an (iterable) object into a list. If it is a string or not iterable, put the whole object into a list of length 1.

Parameters

obj

Return type

list
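The behaviour described for string_safe_list can be illustrated with a small stand-alone re-implementation. This is a sketch written from the description above, not the library's source, and the name string_safe_list_sketch is hypothetical.

```python
from collections.abc import Iterable


def string_safe_list_sketch(obj):
    """Illustrative re-implementation based on the documented behaviour.

    Strings and non-iterable objects (including None) are wrapped in a
    one-element list; any other iterable is converted to a list.
    """
    if isinstance(obj, str) or not isinstance(obj, Iterable):
        return [obj]
    return list(obj)
```

The string check comes first because a str is itself iterable and would otherwise be split into single characters.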

airbase.fetch module

Helper functions encapsulating async HTTP request and file IO

airbase.fetch.fetch_json(url, *, timeout=None, encoding=None)[source]

Request url and read response’s body as JSON

Parameters
  • url (str) – requested url

  • timeout (Optional[float]) – maximum time to complete request (seconds)

  • encoding (Optional[str]) – text encoding used for decoding the response’s body

Returns

decoded text from response’s body as JSON

Return type

list[dict[str, str]]

airbase.fetch.fetch_text(url, *, timeout=None, encoding=None)[source]

Request url and read response’s body

Parameters
  • url (str) – requested url

  • timeout (Optional[float]) – maximum time to complete request (seconds)

  • encoding (Optional[str]) – text encoding used for decoding the response’s body

Returns

decoded text from response’s body

Return type

str

airbase.fetch.fetch_to_directory(urls, root, *, skip_existing=True, encoding=None, progress=False, raise_for_status=True, max_concurrent=10)[source]

Request a list of URLs and write each response to a different file

Parameters
  • urls (list[str]) – requested urls

  • root (pathlib.Path) – directory to write all responses

  • skip_existing (bool) – Do not re-download url if the corresponding file is found in root

  • encoding (Optional[str]) – text encoding used for decoding each response’s body

  • progress (bool) – show progress bar

  • raise_for_status (bool) – Raise exceptions if download links return “bad” HTTP status codes. If False, a warnings.warn() will be issued instead.

  • max_concurrent (int) – maximum concurrent requests

Return type

None
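The effect of skip_existing can be sketched as a filtering step that runs before any request is made. The helper pending_downloads is hypothetical, and deriving the file name from the last URL path segment is an assumption; the library may map URLs to paths differently.

```python
from pathlib import Path


def pending_downloads(urls, root, skip_existing=True):
    """Hypothetical helper: map each URL to its target path under `root`.

    With skip_existing=True, URLs whose target file already exists are
    left out, so they are not downloaded again.
    """
    root = Path(root)
    pending = {}
    for url in urls:
        # Assumption: the file name is the last segment of the URL path.
        target = root / url.rsplit("/", 1)[-1]
        if skip_existing and target.exists():
            continue
        pending[url] = target
    return pending
```

The resulting dict[url, path] has the same shape as the mapping accepted by the fetcher overload documented further down.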

airbase.fetch.fetch_to_file(urls, path, *, encoding=None, progress=False, raise_for_status=True, max_concurrent=10)[source]

Request a list of URLs and write all responses into a single text file

Parameters
  • urls (list[str]) – requested urls

  • path (pathlib.Path) – text file for all combined responses

  • encoding (Optional[str]) – text encoding used for decoding each response’s body

  • progress (bool) – show progress bar

  • raise_for_status (bool) – Raise exceptions if download links return “bad” HTTP status codes. If False, a warnings.warn() will be issued instead.

  • max_concurrent (int) – maximum concurrent requests

Return type

None

airbase.fetch.fetch_unique_lines(urls, *, encoding=None, progress=False, raise_for_status=True, max_concurrent=10)[source]

Request a list of URLs and return only the unique lines among all the responses

Parameters
  • urls (list[str]) – requested urls

  • encoding (Optional[str]) – text encoding used for decoding each response’s body

  • progress (bool) – show progress bar

  • raise_for_status (bool) – Raise exceptions if download links return “bad” HTTP status codes. If False, a warnings.warn() will be issued instead.

  • max_concurrent (int) – maximum concurrent requests

Returns

unique lines among all the responses

Return type

set[str]
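The de-duplication that fetch_unique_lines describes can be illustrated on response bodies that are already in memory. This sketch leaves out the HTTP part entirely, and the function name unique_lines is hypothetical.

```python
def unique_lines(texts):
    """Collect the distinct lines found across several response bodies."""
    lines = set()
    for text in texts:
        # splitlines() handles both "\n" and "\r\n" line endings.
        lines.update(text.splitlines())
    return lines
```

Because the result is a set, the original order of the lines is not preserved.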

async airbase.fetch.fetcher(urls: list[str], *, encoding: str | None = 'None', progress: bool = 'DEFAULT.progress', raise_for_status: bool = 'DEFAULT.raise_for_status', max_concurrent: int = 'DEFAULT.max_concurrent') -> AsyncIterator[str][source]
async airbase.fetch.fetcher(urls: dict[str, pathlib.Path], *, encoding: str | None = 'None', progress: bool = 'DEFAULT.progress', raise_for_status: bool = 'DEFAULT.raise_for_status', max_concurrent: int = 'DEFAULT.max_concurrent') -> AsyncIterator[pathlib.Path]

Request multiple URLs and write the response text into individual paths if a dict[url, path] is provided, or return the decoded text from each request if only a list[url] is provided.

Parameters
  • urls – requested urls

  • encoding – text encoding used for decoding each response’s body

  • progress – show progress bar

  • raise_for_status – Raise exceptions if download links return “bad” HTTP status codes. If False, a warnings.warn() will be issued instead.

  • max_concurrent – maximum concurrent requests

Returns

url text or path to downloaded text, one by one as the requests are completed

airbase.summary module

class airbase.summary.DB[source]

Bases: object

DB containing the available countries and pollutants

classmethod countries()[source]

Get the list of unique countries from the summary.

Returns

list of available country codes

Return type

list[str]

classmethod cursor()[source]

db cursor as a “self closing” context manager

Return type

Iterator[sqlite3.Cursor]
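A “self closing” cursor of this kind is typically built with contextlib.contextmanager. The following is a minimal sketch using an in-memory SQLite database. The module-level connection and the function name are illustrative assumptions, not the library's code.

```python
import sqlite3
from contextlib import contextmanager

# Illustrative in-memory database standing in for the packaged summary DB.
connection = sqlite3.connect(":memory:")


@contextmanager
def cursor():
    """Yield a cursor and close it when the with-block exits."""
    cur = connection.cursor()
    try:
        yield cur
    finally:
        # Runs on normal exit and when the block raises.
        cur.close()
```

Inside a with cursor() as cur: block the cursor is usable as normal; once the block ends, any further call on it raises sqlite3.ProgrammingError.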

db = <sqlite3.Connection object>

classmethod pollutants()[source]

Get the list of unique pollutants from the summary.

Parameters

summary – The E1a summary.

Returns

The available pollutants, as a dictionary with names as keys and IDs as values, e.g. {“NO”: “38”, …}

Return type

dict[str, str]

classmethod pollutants_per_country()[source]

Get the available pollutants per country from the summary.

Returns

All available pollutants per country, as a dictionary with country codes as keys and a dictionary of pollutant names/IDs (e.g. {“NO”: 38, …}) as values.

Return type

dict[str, dict[str, int]]

classmethod search_pollutant(query, *, limit=None)[source]

Search for a pollutant’s ID number based on its name.

Parameters
  • query (str) – The pollutant to search for.

  • limit (Optional[int]) – (optional) Max number of results.

Returns

The best pollutant matches, as a dictionary with names as keys and IDs as values, e.g. {“NO”: 38, …}

Return type

dict[str, int]
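A name-based lookup of this shape can be sketched against an in-memory SQLite table. Everything below is an assumption made for illustration: the table name and columns, the LIKE-based matching, and the ranking by name length are not taken from the library. The three sample rows reuse pollutant IDs that appear elsewhere on this page.

```python
import sqlite3

# Illustrative table; the real summary DB schema is not documented here.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pollutants (name TEXT, id INTEGER)")
db.executemany(
    "INSERT INTO pollutants VALUES (?, ?)",
    [("NO", 38), ("O3", 7), ("NO3", 46)],
)


def search_pollutant_sketch(query, limit=None):
    """Return {name: id} for names containing `query`, shortest names first.

    SQLite's LIKE is case-insensitive for ASCII, so "o3" matches "O3".
    """
    sql = (
        "SELECT name, id FROM pollutants "
        "WHERE name LIKE ? ORDER BY length(name), name"
    )
    params = [f"%{query}%"]
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    return dict(db.execute(sql, params).fetchall())
```

Ranking by name length puts an exact match such as "O3" ahead of longer names that merely contain the query.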