{ "cells": [ { "cell_type": "markdown", "id": "4b6d4f7a-6bc3-4885-8fda-e40eacc48a0d", "metadata": {}, "source": [ "# Climatology" ] }, { "cell_type": "markdown", "id": "49705b35-8af5-4de8-8848-845158fbf1ba", "metadata": {}, "source": [ "## Overview\n", "The climato method from the lntime module enables regression of a signal from a dataset:\n", " - on annual or semi-annual cycles\n", " - on polynomials of any order\n", " - on any user-defined function\n", "\n", "It returns an instance of the *Coeffs_climato* class.\n", "\n", "\n", "The time sampling can be arbitrary.\n", "\n", "Basically, the climato method solves a least squares problem **Y = A.X**, where **Y** is a time-dependent signal and **X** is a matrix of defined temporal functions (annual cycle, mean, trend, acceleration, arbitrary functions, etc.). **A** thus contains the coefficients to apply to each of these temporal functions to minimize the residuals **Y - A.X**.\n", "\n", "Input dataset format\n", " - One dimension must correspond to time. The time variable does not necessarily need to be a coordinate; it can be a variable in the dataset.\n", " - There can be any number of dimensions independent of time (latitude, longitude, depth, models, etc.).\n", "\n", "\n", "## Coeffs_climato Class\n", "### Initialisation\n", "```python\n", "clim = Coeffs_climato(ds, dim='time', var=None, Nmin=6, cycle=True, order=1)\n", "```\n", "- **ds**: dataset or dataarray containing the data \n", "- **dim**: name of the dimension over which to perform the regressions \n", "- **var**: name of the variable or coordinate containing time information. If None, `var = dim` \n", "- **Nmin**: minimum number of valid data points required to compute the coefficients. This number is added to the number of regression functions. For example, if `Nmin=0` and you're computing annual and semi-annual cycles, trend, and mean, you need at least 6 valid points; otherwise, the coefficients will be set to NaN. 
\n", "- **cycle**: use annual and semi-annual cycle functions by default \n", "- **order**: polynomial order for regression. If `order = -1`, no polynomial regression is applied.\n", "\n", "### Adding custom functions:\n", "```python\n", "clim.add_coeffs(coefficients, func, *var, ref=None, scale=pd.to_timedelta(\"1D\").asm8, **kwargs)\n", "```\n", "- **coefficients**: names of the coefficients corresponding to the output of the `func` function \n", "- **func**: function that takes a time series as input and returns a dataarray \n", "- **var**: variables passed as parameters to the `func` \n", "- **ref**: temporal origin used in computing the function \n", "- **scale**: time scaling used in computing the function \n", "- **kwargs**: additional parameters passed to the function\n", "\n", "**The computed function is actually evaluated as**: `func((x - ref) / scale)`\n", "\n", "### Coefficient computation:\n", "```python\n", "coeffs = clim.solve(measure=None, chunk=None, weight=None, t_min=None, t_max=None)\n", "```\n", "- **measure**: name of the variable in the input dataset `ds` for which to compute the climatology. If the input is a dataArray, leave this blank. \n", "- **chunk**: enables parallel processing for large datasets with multiple dimensions \n", "- **weight**: weighting matrix for the least squares computation. If `None`, all measurements are equally weighted \n", "- **t_min, t_max**: start and end times for the climatology calculation. If `None`, no time limit is applied. \n", "\n", "**This method returns an instance of the *Signal_climato* class.**\n", " \n", "## Signal_climato Class\n", "This class is used to work with climatological coefficients. It is returned by the *solve* method from the *Coeffs_climato* class, but can also be constructed from previously saved coefficients. 
In that case, it must be properly initialized with all the parameters of the functions used to perform the regression.\n", "\n", "### Initialization\n", "```python\n", "signal = Signal_climato(coeffs, dim='time', var=None, cycle=True, order=1, ref=None, ds=None, measure=None)\n", "```\n", "- **coeffs**: coefficients computed by the *Coeffs_climato* class \n", "- **dim**: name of the dimension over which to perform the regressions \n", "- **var**: name of the variable or coordinate containing time information. If `None`, `var = dim` \n", "- **cycle**: use annual and semi-annual cycle functions by default \n", "- **order**: polynomial order for regression. If `order = -1`, no polynomial regression is applied \n", "- **ref**: default reference for built-in functions (annual cycle and polynomial) \n", "- **ds**: dataset or dataarray used for coefficient computation (optional; only used for residual or interpolation calculations) \n", "- **measure**: if `ds` is defined and is a dataset, name of the variable containing the measurements used to compute the coefficients \n", "\n", "\n", "### Adding custom functions:\n", "Same method as for coefficient calculation. These functions must be defined in a way that is consistent with those used during coefficient computation. \n", "This step is unnecessary when the instance is obtained from the *solve* method of the *Coeffs_climato* class.\n", "\n", "### Outputs\n", "```python\n", "signal.climatology(coefficients=None, x=None)\n", "```\n", "Regressed climatological function\n", "- **coefficients**: names of coefficients to include in the output time series. If `None`, all coefficients are used \n", "- **x**: time points for computation. If `None`, time values from the input dataset are used (`x = ds[var]`)\n", "\n", "```python\n", "signal.residuals(coefficients=None)\n", "```\n", "Residuals between measurements and the regressed climatological function \n", "- **coefficients**: names of coefficients to consider for residuals calculation. 
If `None`, all coefficients are used \n", "\n", "```python\n", "signal.signal(x=None, coefficients=None, method='linear')\n", "```\n", "Signal combining interpolated residuals and the regressed climatological function. If residuals contain NaNs, they are interpolated using the chosen method \n", "- **x**: time points for computation. If `None`, time values from the input dataset are used (`x = ds[var]`) \n", "- **coefficients**: names of coefficients to consider. If `None`, all are used \n", "- **method**: interpolation method used for the requested time points and for NaN values, chosen from the options accepted by `scipy.interpolate.interp1d`\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "d2355246-b710-4010-a8c3-76b2cdb8a82e", "metadata": { "tags": [] }, "outputs": [], "source": [ "import lenapy\n", "import xarray as xr\n", "import os.path\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from lenapy.constants import *\n" ] }, { "cell_type": "markdown", "id": "d61ab661-5d6a-457c-a96e-8613d2338bd7", "metadata": {}, "source": [ "## Example on a Simple Time Series" ] }, { "cell_type": "code", "execution_count": 2, "id": "b09abf32-44e7-4108-8338-4d68ead6be5f", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset> Size: 2MB\n",
"Dimensions: (time: 216, latitude: 30, longitude: 30)\n",
"Coordinates:\n",
" * time (time) datetime64[ns] 2kB 2005-01-14T23:46:17.343750 ... 2022-...\n",
" * latitude (latitude) float64 240B 30.5 31.5 32.5 33.5 ... 57.5 58.5 59.5\n",
" * longitude (longitude) float64 240B -29.5 -28.5 -27.5 ... -2.5 -1.5 -0.5\n",
"Data variables:\n",
" ohc (time, latitude, longitude) float64 2MB ...\n",
" gohc (time) float64 2kB ...<xarray.Dataset> Size: 260kB\n",
"Dimensions: (N_PROF: 8124)\n",
"Dimensions without coordinates: N_PROF\n",
"Data variables:\n",
" latitude (N_PROF) float64 65kB ...\n",
" longitude (N_PROF) float64 65kB ...\n",
" time (N_PROF) datetime64[ns] 65kB ...\n",
" ohc (N_PROF) float64 65kB 3.301e+10 3.092e+10 ... 3.439e+10 3.549e+10<xarray.Dataset> Size: 6kB\n",
"Dimensions: (time: 241)\n",
"Coordinates:\n",
" * time (time) datetime64[ns] 2kB 2004-01-01 2004-02-01 ... 2024-01-01\n",
"Data variables:\n",
" latitude (time) float64 2kB 45.0 45.0 45.0 45.0 ... 45.0 45.0 45.0 45.0\n",
" longitude (time) float64 2kB -15.0 -15.0 -15.0 -15.0 ... -15.0 -15.0 -15.0<xarray.Dataset> Size: 600B\n",
"Dimensions: (coeffs: 10)\n",
"Coordinates:\n",
" * coeffs (coeffs) <U13 520B 'cosAnnual' 'sinAnnual' ... 'GradLat' 'GradLon'\n",
"Data variables:\n",
" ohc (coeffs) float64 80B ...