This is the fourth post in the TIMES Series, where we break down everything about Vortex TIMES. In this one, we show how we open a TIMES dataset using Python and how to produce a few interesting plots and analyses. Check out the other posts in the series from the table of contents here.
Vortex data is downloadable in various formats, with .txt being the most common for time series. Each series provides wind data at different heights. You can open the file in a spreadsheet or upload it to visualize the data in different wind resource assessment software.
At Vortex, we usually use Python to open these files, leveraging many post-processing functions programmed with the pandas or xarray modules.
Our technical team colleagues have shared a Python notebook to guide you on how to read a .txt file from TIMES at different heights and visualize the data for one or multiple heights using interesting plots.
We hope you find it useful!
The following Jupyter notebook includes examples of how to read a TIMES time series into Python using pandas and xarray, and how to join the time series for different heights into a single xarray dataset. We also include some examples of simple summaries and visualizations that can be made from the output.
The 100 m file for the Sample TIMES run was downloaded from the Vortex interface for this exercise, in the local time zone (UTC+2). Three lines at the top of the file give the site coordinates, the file height, the time zone and the request time:
Lat=-32.556797 Lon=20.691242 Hub-Height=100.0 Timezone=02.00 (file requested on 2023-12-20 14:16:13 UTC0)
VORTEX (www.vortexfdc.com) - Computed at 333m resolution based on ERA5 data
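As a minimal sketch, the `Key=number` pairs in the first header line can be pulled out with a regular expression. The helper name `parse_header` is hypothetical, and the pattern assumes the header keeps the layout shown above.

```python
import re
from typing import Dict

# First header line, copied from the sample file shown above
HEADER = ("Lat=-32.556797 Lon=20.691242 Hub-Height=100.0 Timezone=02.00 "
          "(file requested on 2023-12-20 14:16:13 UTC0)")


def parse_header(line: str) -> Dict[str, float]:
    """Return every Key=number pair found in the header line as floats.

    Hypothetical helper: it assumes keys contain only letters, digits,
    underscores or hyphens, and values are plain decimal numbers.
    """
    pairs = re.findall(r"([\w-]+)=(-?\d+(?:\.\d+)?)", line)
    return {key: float(value) for key, value in pairs}


header_info = parse_header(HEADER)
print(header_info)
```

The request time in parentheses contains no `=` sign, so it is ignored by the pattern.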
Next, the time series data is saved in fixed-width columns (i.e., the columns are separated by one or more white spaces so that they are aligned). The first two columns give the date and time, and the remaining columns hold the variables of the dataset, with their units (as explained in this previous post from the TIMES Blog Series).
YYYYMMDD HHMM M(m/s) D(deg) SD(m/s) T(C) De(k/m3) PRE(hPa) RiNumber RH(%) RMOL(1/m) VertM(m/s)
20100101 0200 6.81 73.3 0.54 14.4 1.01 834.3 0.37 81.1 0.0004 -0.04
20100101 0210 6.99 75.6 0.60 14.5 1.01 834.3 0.70 81.2 0.0006 -0.09
20100101 0220 6.72 69.6 0.41 14.8 1.01 834.3 0.14 82.1 0.0009 -0.07
...
We can open it in Python as a pandas DataFrame using pandas.read_csv() with the following options:
import pandas as pd
file = 'example_txt/vortex.times.677619.6m 100m UTC+02.0 ERA5.txt'
df = pd.read_csv(file, sep=r"\s+", skiprows=3, header=0, names=None,
parse_dates={'time': [0, 1]}, index_col='time')
df
time | M(m/s) | D(deg) | SD(m/s) | T(C) | De(k/m3) | PRE(hPa) | RiNumber | RH(%) | RMOL(1/m) | VertM(m/s) |
---|---|---|---|---|---|---|---|---|---|---|
2010-01-01 02:00:00 | 6.81 | 73.3 | 0.54 | 14.4 | 1.01 | 834.3 | 0.37 | 81.1 | 0.0004 | -0.04 |
2010-01-01 02:10:00 | 6.99 | 75.6 | 0.60 | 14.5 | 1.01 | 834.3 | 0.70 | 81.2 | 0.0006 | -0.09 |
2010-01-01 02:20:00 | 6.72 | 69.6 | 0.41 | 14.8 | 1.01 | 834.3 | 0.14 | 82.1 | 0.0009 | -0.07 |
2010-01-01 02:30:00 | 6.91 | 67.4 | 0.42 | 14.4 | 1.01 | 834.2 | 0.65 | 83.6 | 0.0011 | -0.05 |
2010-01-01 02:40:00 | 6.42 | 71.0 | 0.50 | 14.1 | 1.01 | 834.2 | 0.40 | 84.8 | 0.0013 | 0.07 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2023-10-30 01:10:00 | 18.84 | 74.0 | 1.32 | 0.1 | 1.07 | 837.8 | 0.21 | 86.6 | 0.0000 | -0.42 |
2023-10-30 01:20:00 | 17.19 | 73.6 | 1.44 | 0.3 | 1.07 | 837.7 | 0.40 | 88.6 | 0.0000 | -0.02 |
2023-10-30 01:30:00 | 19.76 | 74.9 | 1.40 | 0.2 | 1.07 | 837.8 | -4.26 | 90.4 | 0.0000 | -0.54 |
2023-10-30 01:40:00 | 17.59 | 72.6 | 1.71 | -0.0 | 1.07 | 837.7 | 0.53 | 92.3 | 0.0000 | 0.28 |
2023-10-30 01:50:00 | 18.66 | 73.8 | 1.46 | -0.0 | 1.07 | 837.7 | -0.09 | 94.0 | 0.0000 | 0.84 |
727200 rows × 10 columns
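Recent pandas releases deprecate combining date columns through a nested `parse_dates` argument. If your version warns about it, one possible alternative is to read the two date columns as strings and build the index with `pd.to_datetime`. This is only a sketch: the in-memory `SAMPLE` text is a shortened stand-in for the real file, and its third header line is a placeholder because that line is not shown in this post.

```python
import io

import pandas as pd

# Shortened stand-in for a TIMES file (assumption: line 3 is a placeholder)
SAMPLE = """Lat=-32.556797 Lon=20.691242 Hub-Height=100.0 Timezone=02.00 (file requested on 2023-12-20 14:16:13 UTC0)
VORTEX (www.vortexfdc.com) - Computed at 333m resolution based on ERA5 data
placeholder-for-the-third-header-line
YYYYMMDD HHMM M(m/s) D(deg)
20100101 0200 6.81 73.3
20100101 0210 6.99 75.6
20100101 0220 6.72 69.6
"""

# Keep the date and time columns as strings so leading zeros survive
df_raw = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", skiprows=3, header=0,
                     dtype={"YYYYMMDD": str, "HHMM": str})

# Build the timestamps explicitly from the two string columns
time = pd.to_datetime(df_raw["YYYYMMDD"] + df_raw["HHMM"], format="%Y%m%d%H%M")

df_alt = df_raw.drop(columns=["YYYYMMDD", "HHMM"])
df_alt.index = pd.DatetimeIndex(time, name="time")
print(df_alt)
```

Passing an explicit `format` avoids any guessing about how the eight-digit date and four-digit time should be interpreted.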
However, with this function alone we lose the coordinates and the UTC offset stored in the file header. That is why we defined a more complete function that reads the coordinates, converts the local time zone to UTC and returns a DataFrame with more standard variable names.
import numpy as np
from typing import Dict, Tuple
def read_vortex_times_pandas(filename: str) -> Tuple[pd.DataFrame, Dict[str, float]]:
    """
    Read a typical Vortex time series from the TIMES product.
    We return a pandas DataFrame at UTC+0 (universal time zone).

    Parameters
    ----------
    filename: str

    Returns
    -------
    data: pd.DataFrame
        Time series with time index
    coords: Dict[str, float]
        Coordinates (keys: lat, lon and lev)
    """
    patterns = {'Lat=': 'lat', 'Lon=': 'lon', 'Hub-Height=': 'lev'}
    # Look at the first line of the text file (the header)
    # and search for the values of Lat=, Lon=, Hub-Height=
    # and Timezone= to save the coordinates and the UTC offset
    utc = np.nan
    coords = {}
    with open(filename, 'r') as f:
        header = f.readline()
    for info in header.split(' '):
        for pattern, keyword in patterns.items():
            if pattern in info:
                coords[keyword] = float(info.replace(pattern, ''))
            elif 'Timezone=' in info:
                utc = float(info.replace('Timezone=', ''))
    if np.isnan(utc):
        raise ValueError('Could not read UTC from the header: ' + header)
    # Read the data table, skipping the three header lines
    data = pd.read_csv(filename, sep=r"\s+", skiprows=3, header=0, names=None,
                       parse_dates={'time': [0, 1]}, index_col='time')
    data.dropna(inplace=True)
    # Shift the index from local time to UTC
    data.index = data.index - pd.Timedelta(utc, unit='h')
    vars_new_names = {
        'M(m/s)': 'M', 'D(deg)': 'Dir', 'SD(m/s)': 'SD', 'T(C)': 'T',
        'De(k/m3)': 'D', 'PRE(hPa)': 'P', 'RiNumber': 'RI',
        'RH(%)': 'RH', 'RMOL(1/m)': 'RMOL', 'VertM(m/s)': 'W'
    }
    data = data.rename(columns=vars_new_names)
    return data, coords
df, coords = read_vortex_times_pandas(file)
print(coords)
df
{'lat': -32.556797, 'lon': 20.691242, 'lev': 100.0}
time | M | Dir | SD | T | D | P | RI | RH | RMOL | W |
---|---|---|---|---|---|---|---|---|---|---|
2010-01-01 00:00:00 | 6.81 | 73.3 | 0.54 | 14.4 | 1.01 | 834.3 | 0.37 | 81.1 | 0.0004 | -0.04 |
2010-01-01 00:10:00 | 6.99 | 75.6 | 0.60 | 14.5 | 1.01 | 834.3 | 0.70 | 81.2 | 0.0006 | -0.09 |
2010-01-01 00:20:00 | 6.72 | 69.6 | 0.41 | 14.8 | 1.01 | 834.3 | 0.14 | 82.1 | 0.0009 | -0.07 |
2010-01-01 00:30:00 | 6.91 | 67.4 | 0.42 | 14.4 | 1.01 | 834.2 | 0.65 | 83.6 | 0.0011 | -0.05 |
2010-01-01 00:40:00 | 6.42 | 71.0 | 0.50 | 14.1 | 1.01 | 834.2 | 0.40 | 84.8 | 0.0013 | 0.07 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2023-10-29 23:10:00 | 18.84 | 74.0 | 1.32 | 0.1 | 1.07 | 837.8 | 0.21 | 86.6 | 0.0000 | -0.42 |
2023-10-29 23:20:00 | 17.19 | 73.6 | 1.44 | 0.3 | 1.07 | 837.7 | 0.40 | 88.6 | 0.0000 | -0.02 |
2023-10-29 23:30:00 | 19.76 | 74.9 | 1.40 | 0.2 | 1.07 | 837.8 | -4.26 | 90.4 | 0.0000 | -0.54 |
2023-10-29 23:40:00 | 17.59 | 72.6 | 1.71 | -0.0 | 1.07 | 837.7 | 0.53 | 92.3 | 0.0000 | 0.28 |
2023-10-29 23:50:00 | 18.66 | 73.8 | 1.46 | -0.0 | 1.07 | 837.7 | -0.09 | 94.0 | 0.0000 | 0.84 |
727200 rows × 10 columns
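The time zone correction is a plain shift of the index by the offset read from the header, which is why the first timestamp moves from 02:00 to 00:00 in the table above. A small sketch of that step on a toy index follows. The last line, which marks the result as timezone-aware, is an optional extra and not something the function above does.

```python
import pandas as pd

# Toy local-time index mimicking the first three timestamps of the file
local_index = pd.date_range("2010-01-01 02:00", periods=3, freq="10min")

# Offset as read from "Timezone=02.00" in the header
utc_offset_hours = 2.0

# Local time = UTC + offset, so subtract the offset to get UTC
utc_index = local_index - pd.Timedelta(hours=utc_offset_hours)

# Optional: attach the UTC time zone explicitly to avoid ambiguity later
utc_aware = utc_index.tz_localize("UTC")
print(utc_aware)
```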
df['M'].describe()
count    727200.000000
mean          8.120100
std           3.921388
min           0.010000
25%           5.180000
50%           7.690000
75%          10.500000
max          27.680000
Name: M, dtype: float64
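Beyond `describe()`, a time index makes grouped summaries straightforward. The sketch below computes a mean diurnal cycle and monthly means. It uses a synthetic series (random Weibull-distributed values, an assumption made only so the example is self-contained) in place of `df['M']`.

```python
import numpy as np
import pandas as pd

# Synthetic 10-minute wind speed series covering 60 days (stand-in for df['M'])
rng = np.random.default_rng(0)
time = pd.date_range("2010-01-01", periods=6 * 24 * 60, freq="10min", name="time")
speed = pd.Series(rng.weibull(2.0, size=time.size) * 9.0, index=time, name="M")

# Mean wind speed for each hour of the day (diurnal cycle)
diurnal = speed.groupby(speed.index.hour).mean()

# Mean wind speed for each calendar month present in the series
monthly = speed.groupby(speed.index.month).mean()

print(diurnal.head())
print(monthly)
```

With the real data, replacing `speed` by `df['M']` should give the same kind of summary for the full period.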
import matplotlib.pyplot as plt
# Plot the first 300 timestamps of the wind speed
df['M'].iloc[:300].plot()
plt.show()
# Plot the wind speed histogram for the full period using 0.2 m/s bins
df['M'].plot.hist(bins=np.arange(0, 30, 0.2), density=True)
plt.show()
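With `density=True` the histogram bars are scaled so that their total area is 1, which makes histograms of different lengths comparable. A minimal check of that property with `np.histogram`, again on synthetic values rather than the real series, could look like this:

```python
import numpy as np

# Synthetic wind speeds (assumption: Weibull-shaped stand-in for df['M'])
rng = np.random.default_rng(1)
speeds = rng.weibull(2.0, size=10_000) * 9.0

# Same bin edges as the plot: 0.2 m/s wide, from 0 up to (but excluding) 30 m/s
edges = np.arange(0, 30, 0.2)
density, _ = np.histogram(speeds, bins=edges, density=True)

# Area under the histogram = sum of (bar height * bar width)
area = float((density * np.diff(edges)).sum())
print(round(area, 6))
```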
Converting from pandas dataframe to xarray dataset allows us to store the coordinates (and other attributes) in a single object.
import xarray as xr
def convert_to_xarray(df: pd.DataFrame,
                      coords: Dict[str, float] = None) -> xr.Dataset:
    """
    Convert a dataframe to an xarray object.

    Parameters
    ----------
    df: pd.DataFrame
    coords: Dict[str, float]
        Info about lat, lon, lev so that the new dimensions can be added

    Returns
    -------
    xr.Dataset
        With added coordinates
    """
    # Simply use the to_xarray function to convert from dataframe to dataset
    # It sets the dataframe's index as the dataset's coordinates
    ds: xr.Dataset = df.to_xarray()
    # Add the other coordinates (which were not the dataframe indices: lat, lon, lev)
    if coords is not None:
        coords_dict = {name: [float(val)] for name, val in coords.items()
                       if name not in ds.dims}
        ds = ds.assign_coords(coords_dict)
    return ds
ds = convert_to_xarray(df, coords=coords)
print(ds)
<xarray.Dataset>
Dimensions:  (time: 727200, lat: 1, lon: 1, lev: 1)
Coordinates:
  * time     (time) datetime64[ns] 2010-01-01 ... 2023-10-29T23:50:00
  * lat      (lat) float64 -32.56
  * lon      (lon) float64 20.69
  * lev      (lev) float64 100.0
Data variables:
    M        (time) float64 6.81 6.99 6.72 6.91 6.42 ... 17.19 19.76 17.59 18.66
    Dir      (time) float64 73.3 75.6 69.6 67.4 71.0 ... 73.6 74.9 72.6 73.8
    SD       (time) float64 0.54 0.6 0.41 0.42 0.5 ... 1.32 1.44 1.4 1.71 1.46
    T        (time) float64 14.4 14.5 14.8 14.4 14.1 ... 0.1 0.3 0.2 -0.0 -0.0
    D        (time) float64 1.01 1.01 1.01 1.01 1.01 ... 1.07 1.07 1.07 1.07
    P        (time) float64 834.3 834.3 834.3 834.2 ... 837.7 837.8 837.7 837.7
    RI       (time) float64 0.37 0.7 0.14 0.65 0.4 ... 0.21 0.4 -4.26 0.53 -0.09
    RH       (time) float64 81.1 81.2 82.1 83.6 84.8 ... 88.6 90.4 92.3 94.0
    RMOL     (time) float64 0.0004 0.0006 0.0009 0.0011 ... 0.0 0.0 0.0 0.0
    W        (time) float64 -0.04 -0.09 -0.07 -0.05 ... -0.02 -0.54 0.28 0.84
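Besides coordinates, an xarray dataset can carry free-form metadata in its `attrs` dictionaries. The sketch below builds a three-row toy dataset and attaches units and a source note. The attribute names `units` and `source` are illustrative choices, not something the notebook defines.

```python
import pandas as pd
import xarray as xr

# Toy dataframe with the first three wind speeds of the sample file
time = pd.date_range("2010-01-01", periods=3, freq="10min", name="time")
df_small = pd.DataFrame({"M": [6.81, 6.99, 6.72]}, index=time)

# Convert and add length-1 coordinates, as convert_to_xarray does
ds_small = df_small.to_xarray().assign_coords(
    lat=[-32.556797], lon=[20.691242], lev=[100.0]
)

# Variable-level and dataset-level metadata (illustrative attribute names)
ds_small["M"].attrs["units"] = "m/s"
ds_small.attrs["source"] = "Computed at 333m resolution based on ERA5 data"

print(ds_small)
```

Storing the units this way lets the variable names stay short (`M` instead of `M(m/s)`) without losing the information.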
For a single time series this is not very different from using pandas, but it allows us to merge files for different heights and analyze them easily.
# I have downloaded files for the Sample TIMES run for heights 10, 50, 100, 150 and 200 m
# All of them are in the local time zone and stored in the folder `example_txt` with the default names
# Therefore, all 5 files that I want to merge follow this naming pattern:
file_template = 'example_txt/vortex.times.677619.6m <LEV>m UTC+02.0 ERA5.txt'
# with <LEV> representing the height in meters
# We will build the dataset for each height and store them in this list
all_heights = []
for lev in [10, 50, 100, 150, 200]:
    # the name of the file can be deduced by changing <LEV> to the height value
    file_lev = file_template.replace('<LEV>', str(lev))
    # I use read_vortex_times_pandas to read the dataframe and the coordinates
    df_lev, coords_lev = read_vortex_times_pandas(file_lev)
    # Then, I convert the dataframe to an xarray dataset (that includes info about the lev)
    ds_lev = convert_to_xarray(df_lev, coords=coords_lev)
    # Append this dataset to the list
    all_heights.append(ds_lev)
# Concatenate each dataset of the list over the 'lev' dimension
ds_full = xr.concat(all_heights, dim='lev')
print(ds_full)
<xarray.Dataset>
Dimensions:  (time: 727200, lev: 5, lat: 1, lon: 1)
Coordinates:
  * time     (time) datetime64[ns] 2010-01-01 ... 2023-10-29T23:50:00
  * lat      (lat) float64 -32.56
  * lon      (lon) float64 20.69
  * lev      (lev) float64 10.0 50.0 100.0 150.0 200.0
Data variables:
    M        (lev, time) float64 2.92 2.98 2.72 3.02 ... 18.66 21.01 19.16 20.3
    Dir      (lev, time) float64 76.5 82.2 63.9 62.4 ... 74.9 76.2 74.1 75.3
    SD       (lev, time) float64 0.69 0.53 0.27 0.25 ... 1.15 1.03 1.34 1.22
    T        (lev, time) float64 14.3 13.9 14.4 14.0 ... -0.5 -0.8 -1.0 -0.8
    D        (lev, time) float64 1.02 1.02 1.02 1.02 ... 1.06 1.06 1.06 1.06
    P        (lev, time) float64 843.2 843.2 843.2 843.1 ... 827.3 827.3 827.2
    RI       (lev, time) float64 0.03 0.25 0.14 -0.03 ... 0.45 0.59 -0.02 -0.05
    RH       (lev, time) float64 85.7 86.0 86.2 86.5 ... 92.3 94.0 95.8 97.3
    RMOL     (lev, time) float64 0.0004 0.0006 0.0009 0.0011 ... 0.0 0.0 0.0 0.0
    W        (lev, time) float64 -0.02 -0.03 -0.03 -0.07 ... -0.57 0.13 0.88
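# The printed RMOL values are the same at 10 m and at 100 m in this sample,
# so keep a single level and drop the 'lev' dimension for this variable
ds_full['RMOL'] = ds_full['RMOL'].isel(lev=0)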
# Plot the first 200 timestamps for all heights
ds_full['M'].isel(time=range(200)).plot(hue='lev')
plt.show()
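Once the heights share one dataset, reducing over `time` gives a vertical profile in a single call. The sketch below shows the idea on a toy two-level dataset with made-up speeds. The helper `toy_level` is hypothetical, and `data_vars="all"` is passed explicitly so that variables without a `lev` dimension are stacked along it regardless of the xarray defaults in use.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2010-01-01", periods=4, freq="10min", name="time")


def toy_level(lev: float, speeds: list) -> xr.Dataset:
    """Hypothetical helper: one single-height dataset with a length-1 'lev' coordinate."""
    ds = pd.DataFrame({"M": speeds}, index=time).to_xarray()
    return ds.assign_coords(lev=[float(lev)])


# Made-up wind speeds at two heights (illustrative values only)
levels = [toy_level(10, [2.0, 3.0, 4.0, 3.0]),
          toy_level(100, [6.0, 7.0, 8.0, 7.0])]

# Stack along 'lev'; M becomes a (lev, time) variable
ds_toy = xr.concat(levels, dim="lev", data_vars="all")

# Mean wind speed at each height: the time dimension is averaged away
profile = ds_toy["M"].mean("time")
print(profile)
```

Applied to the real dataset, `ds_full['M'].mean('time')` should return one mean wind speed per height in the same way.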