airbase.parquet_api package

class airbase.parquet_api.AggregationType(value)[source]

Bases: str, Enum

Interval at which the collected values are reported:
1. Hourly data.
2. Daily data.
3. Variable intervals (observations at other intervals, such as weekly or monthly).

https://eeadmz1-downloads-webapp.azurewebsites.net/content/documentation/How_To_Downloads.pdf

Daily = 'day'
Hourly = 'hour'
Other = 'var'
VariableIntervals = 'var'
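
Other and VariableIntervals share the value 'var', so one of them is an alias of the other. The sketch below is a local mirror of the documented members, not the library source; the definition order (and therefore which name is canonical) is an assumption.

```python
from enum import Enum


class AggregationType(str, Enum):
    """Local mirror of the documented members; not the library source."""

    Hourly = "hour"
    Daily = "day"
    VariableIntervals = "var"
    Other = "var"  # same value as VariableIntervals, so Enum treats it as an alias


# Aliases resolve to the same member object.
assert AggregationType.Other is AggregationType.VariableIntervals

# Lookup by value returns the member; the str mixin makes it equal to the raw string.
hourly = AggregationType("hour")
assert hourly is AggregationType.Hourly
assert hourly == "hour"

# Iteration skips aliases, so only the distinct members appear.
distinct_values = [member.value for member in AggregationType]
```

Because of the str mixin, a member can be placed directly where the API expects the strings 'hour', 'day' or 'var'.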
class airbase.parquet_api.Client(*, timeout=None, max_concurrent=10)[source]

Bases: AbstractAsyncContextManager

Handle for requests to Parquet downloads API v1 https://eeadmz1-downloads-api-appservice.azurewebsites.net/swagger/index.html

Parameters:
  • timeout (float | None) –

  • max_concurrent (int) –
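
How Client enforces max_concurrent is not described here. A common way to cap the number of in-flight requests in asyncio code is a semaphore; the sketch below illustrates that general technique under this assumption. The names run_limited and job are hypothetical, and asyncio.sleep stands in for an HTTP request.

```python
import asyncio


async def run_limited(n_tasks: int, max_concurrent: int) -> int:
    """Run n_tasks dummy jobs and return the peak number running at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    running = 0
    peak = 0

    async def job() -> None:
        nonlocal running, peak
        async with semaphore:  # waits while max_concurrent jobs hold the semaphore
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stands in for an HTTP request
            running -= 1

    await asyncio.gather(*(job() for _ in range(n_tasks)))
    return peak


peak = asyncio.run(run_limited(n_tasks=25, max_concurrent=10))
```

All 25 jobs are scheduled at once, but the semaphore keeps the number running at the same time at or below the limit.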

async city(payload)[source]

post request to /City

Parameters:

payload (tuple[str, ...]) –

Return type:

CityJSON

async country()[source]

get request to /Country

Return type:

CountryJSON

async download_binary(url, path)[source]

get request to url, write the response body content (in binary form) into a binary file, and return path (exactly as the input)

Parameters:
  • url (str) –

  • path (Path) –

Return type:

Path

async download_metadata(path)[source]

download the compressed metadata file and return the path to the uncompressed csv

Parameters:

path (Path) –

Return type:

Path

async download_summary(payload)[source]

post request to /DownloadSummary

Parameters:

payload (ParquetDataJSON) –

Return type:

DownloadSummaryJSON

async download_urls(payload)[source]

post request to /ParquetFile/urls

Parameters:

payload (ParquetDataJSON) –

Return type:

str

async pollutant()[source]

get request to /Property

Return type:

PollutantJSON

class airbase.parquet_api.Dataset(value)[source]

Bases: IntEnum

1. Unverified data (Up-To-Date/UTD/E2a) transmitted continuously from the beginning of 2024.
2. Verified data (E1a) from 2013 to 2023, reported by countries by 30 September each year for the previous year.
3. Historical Airbase data delivered between 2002 and 2012, before Air Quality Directive 2008/50/EC entered into force.

https://eeadmz1-downloads-webapp.azurewebsites.net/content/documentation/How_To_Downloads.pdf

Airbase = 3
E1a = 2
E2a = 1
Historical = 3
UDT = 1
Unverified = 1
Verified = 2
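
Several names map to each integer value, so they behave as aliases of one member. The sketch below is a local mirror of the documented values (the definition order is an assumption) and groups every name by its value.

```python
from enum import IntEnum


class Dataset(IntEnum):
    """Local mirror of the documented values; definition order is an assumption."""

    Unverified = 1
    UDT = 1
    E2a = 1
    Verified = 2
    E1a = 2
    Historical = 3
    Airbase = 3


# __members__ includes aliases, unlike plain iteration over the enum.
names_by_value = {}
for name, member in Dataset.__members__.items():
    names_by_value.setdefault(int(member), []).append(name)

# Lookup by integer returns the canonical (first defined) member.
historical = Dataset(3)
```

Since Dataset is an IntEnum, members compare equal to plain integers, for example Dataset.E1a == 2.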
class airbase.parquet_api.ParquetData(country, dataset, pollutant=None, city=None, frequency=None, source='API')[source]

Bases: NamedTuple

info needed for requesting the URLs for a country and dataset; the request can be further restricted by pollutant, city and frequency

Create new instance of ParquetData(country, dataset, pollutant, city, frequency, source)

Parameters:
  • country (str) –

  • dataset (Dataset) –

  • pollutant (frozenset[str] | None) –

  • city (str | None) –

  • frequency (AggregationType | None) –

  • source (str) –

city: str | None

Alias for field number 3

country: str

Alias for field number 0

dataset: Dataset

Alias for field number 1

frequency: AggregationType | None

Alias for field number 4

payload()[source]
Return type:

ParquetDataJSON

pollutant: frozenset[str] | None

Alias for field number 2

source: str

Alias for field number 5
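
"Alias for field number N" is the standard NamedTuple wording: each attribute is also reachable by tuple index. The sketch below uses a hypothetical stand-in, ParquetDataSketch, with the documented field order and defaults; dataset and frequency are plain int and str here, and "NO" and "PM10" are placeholder values.

```python
from typing import FrozenSet, NamedTuple, Optional


class ParquetDataSketch(NamedTuple):
    """Hypothetical stand-in with the documented field order and defaults."""

    country: str
    dataset: int
    pollutant: Optional[FrozenSet[str]] = None
    city: Optional[str] = None
    frequency: Optional[str] = None
    source: str = "API"


info = ParquetDataSketch("NO", 2, pollutant=frozenset({"PM10"}))

# Attribute access and positional access are interchangeable.
assert info.country == info[0]
assert info.pollutant == info[2]

# NamedTuples are immutable; _replace returns a modified copy.
daily = info._replace(frequency="day")
```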

class airbase.parquet_api.Session(*, progress=False, raise_for_status=True)[source]

Bases: AbstractAsyncContextManager

Parameters:
  • progress (bool) – (optional, default False) Show progress bars

  • raise_for_status (bool) – (optional, default True) Raise exceptions if any request from the summary, url_to_files or download_to_directory methods returns a “bad” HTTP status code. If False, a warnings.warn() will be issued instead.
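
The raise-or-warn switch described for raise_for_status can be pictured with a small helper. This is a sketch only: check_status and BadStatusError are hypothetical names, and treating status codes of 400 and above as "bad" is an assumption.

```python
import warnings


class BadStatusError(Exception):
    """Hypothetical exception type used only in this sketch."""


def check_status(status: int, *, raise_for_status: bool = True) -> bool:
    """Return True for a good status; otherwise raise or warn, depending on the flag."""
    if status < 400:
        return True
    message = f"request failed with HTTP status {status}"
    if raise_for_status:
        raise BadStatusError(message)
    warnings.warn(message, stacklevel=2)
    return False


# With raise_for_status=False the failure becomes a warning and execution continues.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = check_status(404, raise_for_status=False)
```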

add_expected(numberFiles, size)[source]

add to the expected download files and size in Mb

Parameters:
  • numberFiles (int) –

  • size (int) –

Return type:

None

add_urls(more_urls)[source]

add to the unique URLs ready for download

Parameters:

more_urls (Iterable[str]) –

Return type:

int

async cities(*countries)[source]

city names id and notation from API

Parameters:

countries (str) –

Return type:

defaultdict[str, set[str]]

clear()[source]

reset URLs and expected download values

Return type:

None

client: Client = <airbase.parquet_api.client.Client object>
countries[source]

request country codes from API

async download_metadata(path, skip_existing=True)[source]

download station metadata into the given path.

Parameters:
  • path (Path) – pathlib.Path to the station metadata (parent directory must exist)

  • skip_existing (bool) – (optional, default True) Don’t re-download metadata if path already exists. If False, path may be overwritten.

Return type:

None

async download_to_directory(root_path, *, country_subdir=True, skip_existing=True)[source]

download into a directory

Parameters:
  • root_path (Path) – The directory to save files in (must exist)

  • country_subdir (bool) – (optional, default True) Download files for different countries to different root_path sub directories. If False, download all files to root_path

  • skip_existing (bool) – (optional, default True) Don’t re-download files if they exist in root_path. If False, existing files in root_path may be overwritten. Empty files will be re-downloaded regardless of this option.

Return type:

None

NOTE: call url_to_files first to retrieve the URLs to download, or add the URLs directly with add_urls.
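
The skip_existing rule, including the special case for empty files, can be written as a small predicate. This is a sketch of the documented behaviour, not the library's code; should_download and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def should_download(path: Path, *, skip_existing: bool = True) -> bool:
    """Decide whether a target file needs to be downloaded (again)."""
    if not path.exists():
        return True
    if path.stat().st_size == 0:
        return True  # empty files are fetched again regardless of skip_existing
    return not skip_existing


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty = root / "empty.parquet"
    empty.touch()
    full = root / "full.parquet"
    full.write_bytes(b"data")
    missing = root / "missing.parquet"

    decisions = {
        "missing": should_download(missing),
        "empty": should_download(empty),
        "full": should_download(full),
        "full_overwrite": should_download(full, skip_existing=False),
    }
```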

property expected_files: int

expected number of files to download

property expected_size: int

expected download size in Mb

property number_of_urls: int

number of unique URLs ready for download

pollutants[source]

request pollutant ids and notations from the API

remove_url(url)[source]

remove URL from unique URLs ready for download

Parameters:

url (str) –

Return type:

None

async summary(*download_infos)[source]

aggregated summary from multiple requests

Parameters:

download_infos (ParquetData) – info about requested urls

Return type:

None

async url_to_files(*download_infos)[source]

make multiple requests for file URLs and keep only the unique URLs from the responses

Parameters:

download_infos (ParquetData) – info about requested urls

Return type:

None

property urls: Iterable[str]

unique URLs ready for download

async airbase.parquet_api.download(dataset, root_path, *, countries, pollutants=None, cities=None, frequency=None, summary_only=False, metadata=False, country_subdir=True, overwrite=False, quiet=True, raise_for_status=False, session=<airbase.parquet_api.session.Session object>)[source]

request file urls by country|city/pollutant and download unique files

Parameters:
  • dataset (Dataset) – Dataset.Historical, Dataset.Verified or Dataset.Unverified.

  • root_path (Path) – The directory to save files in (must exist).

  • countries (frozenset[str] | set[str]) – Request observations for these countries.

  • pollutants (frozenset[str] | set[str] | None) – (optional, default None) Limit requests to these specific pollutants.

  • cities (frozenset[str] | set[str] | None) – (optional, default None) Limit requests to these specific cities.

  • summary_only (bool) – (optional, default False) Request total files/size, nothing will be downloaded.

  • metadata (bool) – (optional, default False) Download station metadata into root_path/"metadata.csv".

  • country_subdir (bool) – (optional, default True) Download files for different countries to different root_path sub directories. If False, download all files to root_path

  • overwrite (bool) – (optional, default False) Re-download existing files in root_path. If False, existing files will be skipped. Empty files will be re-downloaded regardless of this option.

  • quiet (bool) – (optional, default True) Disable progress bars.

  • raise_for_status (bool) – (optional, default False) Raise exceptions if any request returns a “bad” HTTP status code. If False, a warnings.warn() will be issued instead.

  • frequency (AggregationType | None) –

  • session (Session) –

airbase.parquet_api.request_info_by_city(dataset, *cities, pollutants=None, frequency=None)[source]

download info, one city at a time

Parameters:
  • dataset (Dataset) –

  • pollutants (frozenset[str] | set[str] | None) –

  • frequency (AggregationType | None) –

Return type:

set[airbase.parquet_api.dataset.ParquetData]

airbase.parquet_api.request_info_by_country(dataset, *countries, pollutants=None, frequency=None)[source]

download info, one country at a time

Parameters:
  • dataset (Dataset) –

  • pollutants (frozenset[str] | set[str] | None) –

  • frequency (AggregationType | None) –

Return type:

set[airbase.parquet_api.dataset.ParquetData]
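
The signatures above accept pollutants as a set but return a set of tuples whose pollutant field is a frozenset. One plausible reason is hashability: a tuple can only be a set element if all its fields are hashable. The sketch below illustrates that idea with hypothetical names (Info, info_by_country); it is an assumption about the approach, not the library's implementation.

```python
from typing import FrozenSet, NamedTuple, Optional, Set


class Info(NamedTuple):
    """Hypothetical, reduced stand-in for ParquetData."""

    country: str
    dataset: int
    pollutant: Optional[FrozenSet[str]] = None
    frequency: Optional[str] = None


def info_by_country(dataset, *countries, pollutants=None, frequency=None) -> Set[Info]:
    """Build one request-info tuple per country; duplicates collapse in the set."""
    frozen = None if pollutants is None else frozenset(pollutants)
    return {Info(country, dataset, frozen, frequency) for country in countries}


# "NO" is passed twice on purpose; the resulting set holds it once.
infos = info_by_country(2, "NO", "SE", "NO", pollutants={"PM10", "O3"})
```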

Submodules

airbase.parquet_api.client module

Client for Parquet downloads API v1 https://eeadmz1-downloads-api-appservice.azurewebsites.net/swagger/index.html

class airbase.parquet_api.client.Client(*, timeout=None, max_concurrent=10)[source]

Bases: AbstractAsyncContextManager

Handle for requests to Parquet downloads API v1 https://eeadmz1-downloads-api-appservice.azurewebsites.net/swagger/index.html

Parameters:
  • timeout (float | None) –

  • max_concurrent (int) –

async city(payload)[source]

post request to /City

Parameters:

payload (tuple[str, ...]) –

Return type:

CityJSON

async country()[source]

get request to /Country

Return type:

CountryJSON

async download_binary(url, path)[source]

get request to url, write the response body content (in binary form) into a binary file, and return path (exactly as the input)

Parameters:
  • url (str) –

  • path (Path) –

Return type:

Path

async download_metadata(path)[source]

download the compressed metadata file and return the path to the uncompressed csv

Parameters:

path (Path) –

Return type:

Path

async download_summary(payload)[source]

post request to /DownloadSummary

Parameters:

payload (ParquetDataJSON) –

Return type:

DownloadSummaryJSON

async download_urls(payload)[source]

post request to /ParquetFile/urls

Parameters:

payload (ParquetDataJSON) –

Return type:

str

async pollutant()[source]

get request to /Property

Return type:

PollutantJSON

airbase.parquet_api.client.extract_metadata_csv(archive, metadata)[source]

extract metadata CSV from zip file

Parameters:
  • archive (Path) –

  • metadata (Path) –

Return type:

Path
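
Extracting a CSV from a zip archive can be done with the standard zipfile module. The sketch below shows one way, under the assumption that the archive contains at least one member ending in .csv and that the first such member is the one wanted. extract_first_csv is a hypothetical name, and the archive contents are made-up test data.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_first_csv(archive: Path, metadata: Path) -> Path:
    """Copy the first .csv member of a zip archive to `metadata` and return that path."""
    with zipfile.ZipFile(archive) as zf:
        csv_members = [name for name in zf.namelist() if name.lower().endswith(".csv")]
        if not csv_members:
            raise FileNotFoundError(f"no CSV member in {archive}")
        metadata.write_bytes(zf.read(csv_members[0]))
    return metadata


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "metadata.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("stations.csv", "station,country\nS1,NO\n")  # made-up content
    out = extract_first_csv(archive, Path(tmp) / "metadata.csv")
    extracted_name = out.name
    extracted_text = out.read_text()
```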

airbase.parquet_api.dataset module

class airbase.parquet_api.dataset.AggregationType(value)[source]

Bases: str, Enum

Interval at which the collected values are reported:
1. Hourly data.
2. Daily data.
3. Variable intervals (observations at other intervals, such as weekly or monthly).

https://eeadmz1-downloads-webapp.azurewebsites.net/content/documentation/How_To_Downloads.pdf

Daily = 'day'
Hourly = 'hour'
Other = 'var'
VariableIntervals = 'var'
class airbase.parquet_api.dataset.Dataset(value)[source]

Bases: IntEnum

1. Unverified data (Up-To-Date/UTD/E2a) transmitted continuously from the beginning of 2024.
2. Verified data (E1a) from 2013 to 2023, reported by countries by 30 September each year for the previous year.
3. Historical Airbase data delivered between 2002 and 2012, before Air Quality Directive 2008/50/EC entered into force.

https://eeadmz1-downloads-webapp.azurewebsites.net/content/documentation/How_To_Downloads.pdf

Airbase = 3
E1a = 2
E2a = 1
Historical = 3
UDT = 1
Unverified = 1
Verified = 2
class airbase.parquet_api.dataset.ParquetData(country, dataset, pollutant=None, city=None, frequency=None, source='API')[source]

Bases: NamedTuple

info needed for requesting the URLs for a country and dataset; the request can be further restricted by pollutant, city and frequency

Create new instance of ParquetData(country, dataset, pollutant, city, frequency, source)

Parameters:
  • country (str) –

  • dataset (Dataset) –

  • pollutant (frozenset[str] | None) –

  • city (str | None) –

  • frequency (AggregationType | None) –

  • source (str) –

city: str | None

Alias for field number 3

country: str

Alias for field number 0

dataset: Dataset

Alias for field number 1

frequency: AggregationType | None

Alias for field number 4

payload()[source]
Return type:

ParquetDataJSON

pollutant: frozenset[str] | None

Alias for field number 2

source: str

Alias for field number 5

airbase.parquet_api.dataset.request_info_by_city(dataset, *cities, pollutants=None, frequency=None)[source]

download info, one city at a time

Parameters:
  • dataset (Dataset) –

  • pollutants (frozenset[str] | set[str] | None) –

  • frequency (AggregationType | None) –

Return type:

set[airbase.parquet_api.dataset.ParquetData]

airbase.parquet_api.dataset.request_info_by_country(dataset, *countries, pollutants=None, frequency=None)[source]

download info, one country at a time

Parameters:
  • dataset (Dataset) –

  • pollutants (frozenset[str] | set[str] | None) –

  • frequency (AggregationType | None) –

Return type:

set[airbase.parquet_api.dataset.ParquetData]

airbase.parquet_api.session module

class airbase.parquet_api.session.Session(*, progress=False, raise_for_status=True)[source]

Bases: AbstractAsyncContextManager

Parameters:
  • progress (bool) – (optional, default False) Show progress bars

  • raise_for_status (bool) – (optional, default True) Raise exceptions if any request from the summary, url_to_files or download_to_directory methods returns a “bad” HTTP status code. If False, a warnings.warn() will be issued instead.

add_expected(numberFiles, size)[source]

add to the expected download files and size in Mb

Parameters:
  • numberFiles (int) –

  • size (int) –

Return type:

None

add_urls(more_urls)[source]

add to the unique URLs ready for download

Parameters:

more_urls (Iterable[str]) –

Return type:

int

async cities(*countries)[source]

city names id and notation from API

Parameters:

countries (str) –

Return type:

defaultdict[str, set[str]]

clear()[source]

reset URLs and expected download values

Return type:

None

client: Client = <airbase.parquet_api.client.Client object>
countries[source]

request country codes from API

async download_metadata(path, skip_existing=True)[source]

download station metadata into the given path.

Parameters:
  • path (Path) – pathlib.Path to the station metadata (parent directory must exist)

  • skip_existing (bool) – (optional, default True) Don’t re-download metadata if path already exists. If False, path may be overwritten.

Return type:

None

async download_to_directory(root_path, *, country_subdir=True, skip_existing=True)[source]

download into a directory

Parameters:
  • root_path (Path) – The directory to save files in (must exist)

  • country_subdir (bool) – (optional, default True) Download files for different countries to different root_path sub directories. If False, download all files to root_path

  • skip_existing (bool) – (optional, default True) Don’t re-download files if they exist in root_path. If False, existing files in root_path may be overwritten. Empty files will be re-downloaded regardless of this option.

Return type:

None

NOTE: call url_to_files first to retrieve the URLs to download, or add the URLs directly with add_urls.

property expected_files: int

expected number of files to download

property expected_size: int

expected download size in Mb

property number_of_urls: int

number of unique URLs ready for download

pollutants[source]

request pollutant ids and notations from the API

remove_url(url)[source]

remove URL from unique URLs ready for download

Parameters:

url (str) –

Return type:

None

async summary(*download_infos)[source]

aggregated summary from multiple requests

Parameters:

download_infos (ParquetData) – info about requested urls

Return type:

None

async url_to_files(*download_infos)[source]

make multiple requests for file URLs and keep only the unique URLs from the responses

Parameters:

download_infos (ParquetData) – info about requested urls

Return type:

None

property urls: Iterable[str]

unique URLs ready for download

async airbase.parquet_api.session.download(dataset, root_path, *, countries, pollutants=None, cities=None, frequency=None, summary_only=False, metadata=False, country_subdir=True, overwrite=False, quiet=True, raise_for_status=False, session=<airbase.parquet_api.session.Session object>)[source]

request file urls by country|city/pollutant and download unique files

Parameters:
  • dataset (Dataset) – Dataset.Historical, Dataset.Verified or Dataset.Unverified.

  • root_path (Path) – The directory to save files in (must exist).

  • countries (frozenset[str] | set[str]) – Request observations for these countries.

  • pollutants (frozenset[str] | set[str] | None) – (optional, default None) Limit requests to these specific pollutants.

  • cities (frozenset[str] | set[str] | None) – (optional, default None) Limit requests to these specific cities.

  • summary_only (bool) – (optional, default False) Request total files/size, nothing will be downloaded.

  • metadata (bool) – (optional, default False) Download station metadata into root_path/"metadata.csv".

  • country_subdir (bool) – (optional, default True) Download files for different countries to different root_path sub directories. If False, download all files to root_path

  • overwrite (bool) – (optional, default False) Re-download existing files in root_path. If False, existing files will be skipped. Empty files will be re-downloaded regardless of this option.

  • quiet (bool) – (optional, default True) Disable progress bars.

  • raise_for_status (bool) – (optional, default False) Raise exceptions if any request returns a “bad” HTTP status code. If False, a warnings.warn() will be issued instead.

  • frequency (AggregationType | None) –

  • session (Session) –

airbase.parquet_api.session.pollutant_id_from_url(url)[source]
numeric pollutant id from urls like

  • http://dd.eionet.europa.eu/vocabulary/aq/pollutant/1
  • http://dd.eionet.europa.eu/vocabularyconcept/aq/pollutant/44/view

Parameters:

url (str) –

Return type:

int
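
A regular expression is one way to pull the numeric id out of both URL shapes quoted above. The sketch below is a minimal illustration, not the library's implementation; pollutant_id_sketch and the pattern are assumptions.

```python
import re

# Digits that follow "/pollutant/" and are terminated by "/" or the end of the URL.
_POLLUTANT_ID = re.compile(r"/pollutant/(\d+)(?:/|$)")


def pollutant_id_sketch(url: str) -> int:
    """Return the numeric pollutant id embedded in a vocabulary URL."""
    match = _POLLUTANT_ID.search(url)
    if match is None:
        raise ValueError(f"no pollutant id in {url!r}")
    return int(match.group(1))


first = pollutant_id_sketch("http://dd.eionet.europa.eu/vocabulary/aq/pollutant/1")
second = pollutant_id_sketch(
    "http://dd.eionet.europa.eu/vocabularyconcept/aq/pollutant/44/view"
)
```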

airbase.parquet_api.types module

type annotations from https://eeadmz1-downloads-api-appservice.azurewebsites.net/swagger/index.html

class airbase.parquet_api.types.CityData[source]

Bases: TypedDict

part of /City response

cityName: str
countryCode: str
class airbase.parquet_api.types.CountryData[source]

Bases: TypedDict

part of /Country response

countryCode: str
countryName: str
class airbase.parquet_api.types.DownloadSummaryJSON[source]

Bases: TypedDict

full /DownloadSummary response

numberFiles: int
size: int
class airbase.parquet_api.types.ParquetDataJSON[source]

Bases: TypedDict

request payload to /DownloadSummary, /ParquetFile and /ParquetFile/urls

aggregationType: NotRequired[Literal['hour', 'day', 'var'] | AggregationType]
cities: list[str]
countries: list[str]
dataset: Literal[0, 1, 2] | Dataset
dateTimeEnd: NotRequired[str]
dateTimeStart: NotRequired[str]
pollutants: list[str]
source: NotRequired[str]
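
Keys marked NotRequired may be absent from the payload altogether. The sketch below builds a plain dict with the documented key names and leaves optional keys out when they are unset. build_payload is a hypothetical helper, and sending empty lists for unused filters is an assumption about what the API accepts.

```python
import json
from typing import Any, Dict, List, Optional


def build_payload(
    countries: List[str],
    dataset: int,
    pollutants: Optional[List[str]] = None,
    cities: Optional[List[str]] = None,
    aggregation_type: Optional[str] = None,
    source: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a request body using the documented ParquetDataJSON key names."""
    payload: Dict[str, Any] = {
        "countries": sorted(countries),
        "cities": sorted(cities or []),
        "pollutants": sorted(pollutants or []),
        "dataset": dataset,
    }
    # NotRequired keys are omitted entirely instead of being sent as null.
    if aggregation_type is not None:
        payload["aggregationType"] = aggregation_type
    if source is not None:
        payload["source"] = source
    return payload


minimal = build_payload(["NO"], 2)
restricted = build_payload(["NO"], 2, pollutants=["PM10"], aggregation_type="day")
body = json.dumps(restricted)  # the dict is JSON serializable as is
```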
class airbase.parquet_api.types.PollutantDict[source]

Bases: TypedDict

part of Pollutant response

id: str
notation: str