This is the fourth post in the TIMES Series, where we break down everything about Vortex TIMES. In this one, we show how we open the TIMES dataset using Python and how to produce a few interesting plots and analyses. Check out other posts in the series from the table of contents here.
Vortex data is downloadable in various formats, with .txt being the most common for time series. Each series provides wind data at different heights. You can open the file in a spreadsheet or upload it to visualize the data in different wind resource assessment software.
At Vortex, we usually use Python to open these files, leveraging many post-processing functions programmed with the pandas or xarray modules.
Our technical team colleagues have shared a Python notebook to guide you on how to read a .txt file from TIMES at different heights and visualize the data for one or multiple heights using interesting plots.
We hope you find it useful!
The following Jupyter notebook includes examples of how to read a TIMES time series into Python using pandas and xarray, and how to join the time series for different heights into a single xarray dataset. We also include some examples of simple summaries and visualizations that can be produced from the result.
The file at 100m for the Sample TIMES run was downloaded from the Vortex interface for this exercise, in the local time zone (UTC+2). The first three lines give information about the site coordinates, the file height, the time zone and the request time:
Lat=-32.556797 Lon=20.691242 Hub-Height=100.0 Timezone=02.00 (file requested on 2023-12-20 14:16:13 UTC0)
VORTEX (www.vortexfdc.com) - Computed at 333m resolution based on ERA5 data
Next, the time series data is stored in fixed-width columns (i.e., the columns are separated by one or more white spaces so that they stay aligned). The first two columns contain the date and time, and the remaining columns the variables of the dataset, including their units (as explained in this previous post from the TIMES Blog Series).
YYYYMMDD HHMM M(m/s) D(deg) SD(m/s) T(C) De(k/m3) PRE(hPa) RiNumber RH(%) RMOL(1/m) VertM(m/s)
20100101 0200 6.81 73.3 0.54 14.4 1.01 834.3 0.37 81.1 0.0004 -0.04
20100101 0210 6.99 75.6 0.60 14.5 1.01 834.3 0.70 81.2 0.0006 -0.09
20100101 0220 6.72 69.6 0.41 14.8 1.01 834.3 0.14 82.1 0.0009 -0.07
...
We can open it in Python as a pandas DataFrame using pandas.read_csv() with the following options:
import pandas as pd
file = 'example_txt/vortex.times.677619.6m 100m UTC+02.0 ERA5.txt'
df = pd.read_csv(file, sep=r"\s+", skiprows=3, header=0, names=None,
parse_dates={'time': [0, 1]}, index_col='time')
df
time | M(m/s) | D(deg) | SD(m/s) | T(C) | De(k/m3) | PRE(hPa) | RiNumber | RH(%) | RMOL(1/m) | VertM(m/s)
---|---|---|---|---|---|---|---|---|---|---
2010-01-01 02:00:00 | 6.81 | 73.3 | 0.54 | 14.4 | 1.01 | 834.3 | 0.37 | 81.1 | 0.0004 | -0.04 |
2010-01-01 02:10:00 | 6.99 | 75.6 | 0.60 | 14.5 | 1.01 | 834.3 | 0.70 | 81.2 | 0.0006 | -0.09 |
2010-01-01 02:20:00 | 6.72 | 69.6 | 0.41 | 14.8 | 1.01 | 834.3 | 0.14 | 82.1 | 0.0009 | -0.07 |
2010-01-01 02:30:00 | 6.91 | 67.4 | 0.42 | 14.4 | 1.01 | 834.2 | 0.65 | 83.6 | 0.0011 | -0.05 |
2010-01-01 02:40:00 | 6.42 | 71.0 | 0.50 | 14.1 | 1.01 | 834.2 | 0.40 | 84.8 | 0.0013 | 0.07 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2023-10-30 01:10:00 | 18.84 | 74.0 | 1.32 | 0.1 | 1.07 | 837.8 | 0.21 | 86.6 | 0.0000 | -0.42 |
2023-10-30 01:20:00 | 17.19 | 73.6 | 1.44 | 0.3 | 1.07 | 837.7 | 0.40 | 88.6 | 0.0000 | -0.02 |
2023-10-30 01:30:00 | 19.76 | 74.9 | 1.40 | 0.2 | 1.07 | 837.8 | -4.26 | 90.4 | 0.0000 | -0.54 |
2023-10-30 01:40:00 | 17.59 | 72.6 | 1.71 | -0.0 | 1.07 | 837.7 | 0.53 | 92.3 | 0.0000 | 0.28 |
2023-10-30 01:50:00 | 18.66 | 73.8 | 1.46 | -0.0 | 1.07 | 837.7 | -0.09 | 94.0 | 0.0000 | 0.84 |
727200 rows × 10 columns
However, using only this function we lose the information about the coordinates and the UTC offset stored in the file header. That is why we define a slightly more complete function that reads the coordinates, converts the index from local time to UTC and returns a DataFrame with more standard variable names.
import numpy as np
from typing import Dict, Tuple


def read_vortex_times_pandas(filename: str) -> Tuple[pd.DataFrame, Dict[str, float]]:
    """
    Read typical vortex time series from TIMES product

    We return a pandas dataframe at UTC+0 (universal time zone)

    Parameters
    ----------
    filename: str

    Returns
    -------
    data: pd.DataFrame
        Time series with time index
    coords: Dict[str, float]
        Coordinates (keys: lat, lon and lev)
    """
    patterns = {'Lat=': 'lat', 'Lon=': 'lon', 'Hub-Height=': 'lev'}
    # Look at the first line of the text file (the header)
    # and manually search for the values of Lat=, Lon=, Hub-Height=
    # and Timezone= to save the coordinates and the UTC offset
    utc = np.nan
    coords = {}
    with open(filename, 'r') as f:
        header = f.readline()
    for info in header.split(' '):
        for pattern, keyword in patterns.items():
            if pattern in info:
                coords[keyword] = float(info.replace(pattern, ''))
        if 'Timezone=' in info:
            utc = float(info.replace('Timezone=', ''))
    if np.isnan(utc):
        raise ValueError('Could not read UTC from the header: ' + header)
    data = pd.read_csv(filename, sep=r"\s+", skiprows=3, header=0, names=None,
                       parse_dates={'time': [0, 1]}, index_col='time')
    data.dropna(inplace=True)
    # Shift the local-time index back by the UTC offset to get UTC+0
    data.index = data.index - pd.Timedelta(utc, unit='h')
    vars_new_names = {
        'M(m/s)': 'M', 'D(deg)': 'Dir', 'SD(m/s)': 'SD', 'T(C)': 'T',
        'De(k/m3)': 'D', 'PRE(hPa)': 'P', 'RiNumber': 'RI',
        'RH(%)': 'RH', 'RMOL(1/m)': 'RMOL', 'VertM(m/s)': 'W'
    }
    data = data.rename(columns=vars_new_names)
    return data, coords
df, coords = read_vortex_times_pandas(file)
print(coords)
df
{'lat': -32.556797, 'lon': 20.691242, 'lev': 100.0}
time | M | Dir | SD | T | D | P | RI | RH | RMOL | W
---|---|---|---|---|---|---|---|---|---|---
2010-01-01 00:00:00 | 6.81 | 73.3 | 0.54 | 14.4 | 1.01 | 834.3 | 0.37 | 81.1 | 0.0004 | -0.04 |
2010-01-01 00:10:00 | 6.99 | 75.6 | 0.60 | 14.5 | 1.01 | 834.3 | 0.70 | 81.2 | 0.0006 | -0.09 |
2010-01-01 00:20:00 | 6.72 | 69.6 | 0.41 | 14.8 | 1.01 | 834.3 | 0.14 | 82.1 | 0.0009 | -0.07 |
2010-01-01 00:30:00 | 6.91 | 67.4 | 0.42 | 14.4 | 1.01 | 834.2 | 0.65 | 83.6 | 0.0011 | -0.05 |
2010-01-01 00:40:00 | 6.42 | 71.0 | 0.50 | 14.1 | 1.01 | 834.2 | 0.40 | 84.8 | 0.0013 | 0.07 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2023-10-29 23:10:00 | 18.84 | 74.0 | 1.32 | 0.1 | 1.07 | 837.8 | 0.21 | 86.6 | 0.0000 | -0.42 |
2023-10-29 23:20:00 | 17.19 | 73.6 | 1.44 | 0.3 | 1.07 | 837.7 | 0.40 | 88.6 | 0.0000 | -0.02 |
2023-10-29 23:30:00 | 19.76 | 74.9 | 1.40 | 0.2 | 1.07 | 837.8 | -4.26 | 90.4 | 0.0000 | -0.54 |
2023-10-29 23:40:00 | 17.59 | 72.6 | 1.71 | -0.0 | 1.07 | 837.7 | 0.53 | 92.3 | 0.0000 | 0.28 |
2023-10-29 23:50:00 | 18.66 | 73.8 | 1.46 | -0.0 | 1.07 | 837.7 | -0.09 | 94.0 | 0.0000 | 0.84 |
727200 rows × 10 columns
df['M'].describe()
count    727200.000000
mean          8.120100
std           3.921388
min           0.010000
25%           5.180000
50%           7.690000
75%          10.500000
max          27.680000
Name: M, dtype: float64
import matplotlib.pyplot as plt
# Plot the first 300 timestamps of the wind speed
df['M'].iloc[:300].plot()
plt.show()
# Plot the wind speed histogram for the full period with 0.2 m/s bins
df['M'].plot.hist(bins=np.arange(0, 30, 0.2), density=True)
plt.show()
Converting from pandas dataframe to xarray dataset allows us to store the coordinates (and other attributes) in a single object.
import xarray as xr
def convert_to_xarray(df: pd.DataFrame,
                      coords: Dict[str, float] = None) -> xr.Dataset:
    """
    Convert a dataframe to a xarray object.

    Parameters
    ----------
    df: pd.DataFrame
    coords: Dict[str, float]
        Info about lat, lon, lev so that the new dimensions can be added

    Returns
    -------
    xr.Dataset
        With added coordinates
    """
    # Simply use the to_xarray function to convert from dataframe to dataset
    # It sets the dataframe's index as the dataset's coordinates
    ds: xr.Dataset = df.to_xarray()
    # Add the other coordinates (which were not the dataframe index: lat, lon, lev)
    if coords is not None:
        coords_dict = {name: [float(val)] for name, val in coords.items()
                       if name not in ds.dims}
        ds = ds.assign_coords(coords_dict)
    return ds
ds = convert_to_xarray(df, coords=coords)
print(ds)
<xarray.Dataset>
Dimensions:  (time: 727200, lat: 1, lon: 1, lev: 1)
Coordinates:
  * time     (time) datetime64[ns] 2010-01-01 ... 2023-10-29T23:50:00
  * lat      (lat) float64 -32.56
  * lon      (lon) float64 20.69
  * lev      (lev) float64 100.0
Data variables:
    M        (time) float64 6.81 6.99 6.72 6.91 6.42 ... 17.19 19.76 17.59 18.66
    Dir      (time) float64 73.3 75.6 69.6 67.4 71.0 ... 73.6 74.9 72.6 73.8
    SD       (time) float64 0.54 0.6 0.41 0.42 0.5 ... 1.32 1.44 1.4 1.71 1.46
    T        (time) float64 14.4 14.5 14.8 14.4 14.1 ... 0.1 0.3 0.2 -0.0 -0.0
    D        (time) float64 1.01 1.01 1.01 1.01 1.01 ... 1.07 1.07 1.07 1.07
    P        (time) float64 834.3 834.3 834.3 834.2 ... 837.7 837.8 837.7 837.7
    RI       (time) float64 0.37 0.7 0.14 0.65 0.4 ... 0.21 0.4 -4.26 0.53 -0.09
    RH       (time) float64 81.1 81.2 82.1 83.6 84.8 ... 88.6 90.4 92.3 94.0
    RMOL     (time) float64 0.0004 0.0006 0.0009 0.0011 ... 0.0 0.0 0.0 0.0
    W        (time) float64 -0.04 -0.09 -0.07 -0.05 ... -0.02 -0.54 0.28 0.84
For a single time series this is not very different from using pandas, but it allows us to merge files for different heights and analyze them together easily.
# I have downloaded files for the Sample TIMES run for heights 10, 50, 100, 150 and 200m
# All of them at local UTC, and stored in the folder `example_txt` with the default names
# Therefore, all 5 files that I want to merge follow this pattern:
file_template = 'example_txt/vortex.times.677619.6m <LEV>m UTC+02.0 ERA5.txt'
# with <LEV> representing the height in meters
# We will build a dataset for each height and store them in this list
all_heights = []
for lev in [10, 50, 100, 150, 200]:
# the name of the file can be deduced by changing <LEV> to the height value
file_lev = file_template.replace('<LEV>', str(lev))
# I use the read_vortex_times_pandas function to read the dataframe and the coordinates
df_lev, coords_lev = read_vortex_times_pandas(file_lev)
# Then, I convert the dataframe to an xarray dataset (that includes info about the lev)
ds_lev = convert_to_xarray(df_lev, coords=coords_lev)
# Append this dataset to the list
all_heights.append(ds_lev)
# Concatenate each dataset of the list over the 'lev' dimension
ds_full = xr.concat(all_heights, dim='lev')
print(ds_full)
<xarray.Dataset>
Dimensions:  (time: 727200, lev: 5, lat: 1, lon: 1)
Coordinates:
  * time     (time) datetime64[ns] 2010-01-01 ... 2023-10-29T23:50:00
  * lat      (lat) float64 -32.56
  * lon      (lon) float64 20.69
  * lev      (lev) float64 10.0 50.0 100.0 150.0 200.0
Data variables:
    M        (lev, time) float64 2.92 2.98 2.72 3.02 ... 18.66 21.01 19.16 20.3
    Dir      (lev, time) float64 76.5 82.2 63.9 62.4 ... 74.9 76.2 74.1 75.3
    SD       (lev, time) float64 0.69 0.53 0.27 0.25 ... 1.15 1.03 1.34 1.22
    T        (lev, time) float64 14.3 13.9 14.4 14.0 ... -0.5 -0.8 -1.0 -0.8
    D        (lev, time) float64 1.02 1.02 1.02 1.02 ... 1.06 1.06 1.06 1.06
    P        (lev, time) float64 843.2 843.2 843.2 843.1 ... 827.3 827.3 827.2
    RI       (lev, time) float64 0.03 0.25 0.14 -0.03 ... 0.45 0.59 -0.02 -0.05
    RH       (lev, time) float64 85.7 86.0 86.2 86.5 ... 92.3 94.0 95.8 97.3
    RMOL     (lev, time) float64 0.0004 0.0006 0.0009 0.0011 ... 0.0 0.0 0.0 0.0
    W        (lev, time) float64 -0.02 -0.03 -0.03 -0.07 ... -0.57 0.13 0.88
# RMOL is a surface variable and does not depend on height (the values are
# identical at every level), so we keep a single level to drop the lev dimension
ds_full['RMOL'] = ds_full['RMOL'].isel(lev=0)
# Plot the first 200 timestamps for all heights
ds_full['M'].isel(time=range(200)).plot(hue='lev')
plt.show()
# Plot the mean wind speed for each month at all heights
ds_full['M'].groupby('time.month').mean(dim='time').plot.scatter(hue='lev')
plt.show()
# Interpolate at height 90m + compute the monthly means + plot the time series
ds_full['M'].interp(lev=90).resample(time='M').mean().plot()
plt.show()
# Plot the wind standard deviation at all heights for one day
ds_full['SD'].sel(time='2012-08-01').plot()
plt.show()
Thank you for joining us on this Python-powered exploration of TIMES data. We hope the shared notebook proves helpful in opening our product files. If you have any questions or need further assistance, feel free to contact us.
Stay tuned for more insights. Explore the entire series here. Your feedback drives our commitment to advancing atmospheric modeling with Vortex TIMES.
For pricing options, you can either contact us directly or sign up for a free account on our Vortex Interface.
We are open to suggestions about other topics that could be added to the analysis; leave your comments below.