Project 1: It’s About a Wind Turbine
This is my first Python project in Jupyter Notebook for ISTDSA. It’s not brilliant, but it is a great start for me, since I had no experience in this area before.
Let me explain my first project and share my experience for beginners. I took my dataset from Kaggle. The data has 13 features and 85066 rows. It is generally clean and was taken from a SCADA system at 10-minute resolution during 2020.
Dataset: wind turbines SCADA datasets (2020 wind turbine), www.kaggle.com
import pandas as pd
# Forward slashes avoid backslash-escape problems in the path string
df = pd.read_csv("./data/wt.csv")
df.head(5)
Which tools are used in this project
· Dataset: Kaggle
· Programming: Anaconda, Python, Jupyter Notebook
· Libraries & visualisation: NumPy, pandas, Matplotlib, seaborn
· Publishing & storage: Medium, GitHub
Data cleaning
I think this step is very important in data science and data engineering. Work through the steps below to get it right.
* Look at the columns and rename them
df.rename(columns={'unitlocation': 'unitLoc',
                   'ttimestamplocal': 'timestamp',
                   'wind direction Angle': 'windDirAng',
                   'pitch Angle': 'pitchAng',
                   'wheel hub temperature': 'wheelHubTem',
                   'ambient Temperature': 'ambientTem',
                   'Tower bottom ambient temperature': 'tBATem',
                   'failure time': 'failTime'},
          inplace=True)
df.columns
* The timestamp column has the object data type, but we need to convert it to DateTime and split it into separate columns such as year, month and hour
df['timestamp']=df['timestamp'].str.replace('\t', '')
df['timestamp']= pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S')
df["year"] = df.timestamp.apply(lambda x: x.year)
df["month"] = df.timestamp.apply(lambda x: x.strftime("%m"))
df["hour"] = df.timestamp.apply(lambda x: x.strftime("%H"))
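As an alternative to the `apply` calls above, pandas also offers the `.dt` accessor for datetime columns. The sketch below uses a tiny made-up frame called `sample` (not the real dataset) to show the idea. Note one difference: `.dt.month` and `.dt.hour` return integers, while `strftime("%m")` returns zero-padded strings.

```python
import pandas as pd

# Tiny stand-in for the SCADA data; the values are made up for illustration.
sample = pd.DataFrame({"timestamp": ["2020-01-01 00:00:00\t", "2020-06-15 13:10:00\t"]})

# Strip the stray tab characters, then parse the strings into datetimes.
sample["timestamp"] = sample["timestamp"].str.replace("\t", "", regex=False)
sample["timestamp"] = pd.to_datetime(sample["timestamp"], format="%Y-%m-%d %H:%M:%S")

# The .dt accessor exposes the date parts as integer columns.
sample["year"] = sample["timestamp"].dt.year
sample["month"] = sample["timestamp"].dt.month
sample["hour"] = sample["timestamp"].dt.hour

print(sample)
```

Integer columns are usually easier to sort and group by, which is why this variant may be worth considering.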
* Check for null values. If there are any, delete them or use another solution (such as filling them in) to make better use of the data.
df.info(show_counts=True)
RangeIndex: 85066 entries, 0 to 85065
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   unitLoc      85066 non-null  object
 1   timestamp    85066 non-null  datetime64[ns]
 2   windspeed    85066 non-null  float64
 3   power        85066 non-null  float64
 4   windDirAng   85066 non-null  float64
 5   rtr_rpm      85066 non-null  float64
 6   pitchAng     85066 non-null  float64
 7   generation   85066 non-null  int64
 8   wheelHubTem  85066 non-null  float64
 9   ambientTem   85066 non-null  float64
 10  tBATem       85066 non-null  float64
 11  failTime     85066 non-null  float64
dtypes: datetime64[ns](1), float64(9), int64(1), object(1)
memory usage: 7.8+ MB
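This dataset has no null values, so nothing needs to be removed here. For data that does have gaps, a minimal sketch of the two options mentioned above (deleting or filling) could look like this. The frame `sample` and its numbers are made up for illustration, and linear interpolation is just one possible filling strategy.

```python
import numpy as np
import pandas as pd

# Made-up readings with two gaps, standing in for the real SCADA columns.
sample = pd.DataFrame({
    "windspeed": [5.1, np.nan, 6.3, 7.0],
    "power": [310.0, 420.0, np.nan, 655.0],
})

# Count the missing values in each column.
null_counts = sample.isna().sum()
print(null_counts)

# Option 1: drop every row that has at least one missing value.
dropped = sample.dropna()

# Option 2: fill gaps by linear interpolation between neighbouring readings.
filled = sample.interpolate(method="linear")
```

Dropping is the simplest choice, but with 10-minute time series data interpolation often keeps more of the record usable.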
Turbine generation values
Now we visualize our data with seaborn and Matplotlib
import matplotlib.pyplot as plt
df.groupby(['timestamp','unitLoc']).generation.max().unstack().plot(figsize=(16,8), linewidth=4)
plt.xlabel('Term', fontsize=20)
plt.ylabel('Power', fontsize=20)
plt.title('WTG34&WTG40 Power generation in 2020', fontsize=20)
Correlation heatmap with seaborn
import seaborn as sn
# numeric_only=True skips text columns such as unitLoc (needs pandas 1.5 or newer)
corrMatrix = df.corr(numeric_only=True)
fig, ax = plt.subplots(figsize=(16,8))
sn.heatmap(corrMatrix,
           annot=True, #cmap="viridis",
           linewidths=.001, ax=ax, annot_kws={'fontsize':10},
           square=True, linecolor="#222")
plt.tight_layout()
plt.savefig('final.png', dpi=300)
Next we look at another Matplotlib plot type, the scatter plot, and focus on some details from the heatmap
* Power & Windspeed
WTG40 = df[df.unitLoc == "WTG40"]
WTG34 = df[df.unitLoc == "WTG34"]
# scatter plot power&windSpeed
plt.scatter(WTG40.power,WTG40.windspeed,color="red",label="WTG40",alpha= 0.5)
plt.scatter(WTG34.power,WTG34.windspeed,color="green",label="WTG34",alpha= 0.3)
plt.xlabel("power")
plt.ylabel("windspeed")
plt.legend()
plt.show()
As the wind speed increases, we see that the power produced increases with it.
However, once the wind speed reaches about 12.5 m/s, the power generated stays roughly constant.
At 12.5 m/s and above, protective measures can be taken so that the turbine is not damaged.
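One way to check where the output levels off is to average the power inside wind-speed bins, which gives a rough power curve. The sketch below runs on synthetic data: the cubic rise, the 2000 kW cap and the name `sample` are assumptions made for illustration, not values from the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data: power rises with wind speed, then is capped at a made-up
# rated value of 2000 kW. None of these numbers come from the real dataset.
rng = np.random.default_rng(0)
windspeed = rng.uniform(0, 20, size=2000)
power = np.minimum(2000.0, 2000.0 * (windspeed / 12.5) ** 3)
sample = pd.DataFrame({"windspeed": windspeed, "power": power})

# Group the readings into 1 m/s wide wind-speed bins and average the power.
bins = np.arange(0, 21, 1)
sample["ws_bin"] = pd.cut(sample["windspeed"], bins=bins, include_lowest=True)
power_curve = sample.groupby("ws_bin", observed=True)["power"].mean()
print(power_curve)
```

With the real data, the same grouping could be applied to `df.windspeed` and `df.power` for each turbine, and the bin where the mean stops growing would mark the plateau.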
* Windspeed & Wind direction angle
# scatter plot windspeed&windDirAng
plt.scatter(WTG40.windspeed,WTG40.windDirAng,color="red",label="WTG40",alpha= 0.5)
plt.scatter(WTG34.windspeed,WTG34.windDirAng,color="green",label="WTG34",alpha= 0.3)
plt.xlabel("windspeed")
plt.ylabel("windDirAng")
plt.legend()
plt.show()
Here we see from which angles the wind is stronger.
The turbines generally get more wind at angles of 180° and 0°.
The points are especially dense around 180°.
The WTG40 turbine seems to receive better winds between 250° and 325°.
* Power & Wind direction angle
# scatter plot power&windDirAng
plt.scatter(WTG40.power,WTG40.windDirAng,color="red",label="WTG40",alpha= 0.5)
plt.scatter(WTG34.power,WTG34.windDirAng,color="green",label="WTG34",alpha= 0.3)
plt.xlabel("power")
plt.ylabel("windDirAng")
plt.legend()
plt.show()
The turbines generally produce more energy at angles of 180° and 0°.
The points are especially dense around 180°.
It seems that most of the energy produced here comes from winds at about 180°.
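To put numbers on what the scatter plots suggest, the readings can be grouped into direction sectors. This is only a sketch on made-up values: the frame `sample`, the 30° sector width and the assumption that angles run from 0° to 360° are all illustrative choices.

```python
import numpy as np
import pandas as pd

# Made-up readings; the angles and power values are for illustration only.
sample = pd.DataFrame({
    "windDirAng": [2.0, 178.0, 181.0, 185.0, 270.0, 359.0],
    "power": [500.0, 900.0, 1100.0, 1000.0, 300.0, 450.0],
})

# Split the compass into twelve 30-degree sectors: [0, 30), [30, 60), ...
edges = np.arange(0, 361, 30)
sample["sector"] = pd.cut(sample["windDirAng"], bins=edges, right=False)

# Count readings and average the power in each sector that has data.
by_sector = sample.groupby("sector", observed=True)["power"].agg(["count", "mean"])
print(by_sector)
```

Keep in mind that 0° and 360° are the same direction, so the first and last sectors are neighbours even though they sit at opposite ends of the table.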
Finally, I did so many things in two weeks with ISTDSA. Thanks to everyone for their contribution.