In this lecture, we'll use another case study to learn about visualizing time-series data. We'll focus in particular on the candlestick plot, a special kind of bar chart commonly used on trading platforms, in foreign exchange, and in weather forecasts.
Candlestick charts originated in Japan during the 1700s, when a Japanese trader named Homma analyzed the price patterns of rice contracts for huge profits. His research on so-called price pattern recognition was widely credited and influenced trading in Japan.
A candlestick plot delivers important information to traders for a given time period. It shows the opening price, closing price, highest trading price and lowest trading price of a particular commodity over that period. The rectangular portion of the candle is called the Body. The lines above and below the Body are called the Upper and Lower Shadow respectively. The Highest Trading Price is marked by the top of the Upper Shadow and the Lowest Trading Price is marked by the bottom of the Lower Shadow.
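To make the anatomy of a candle concrete, here is a minimal sketch in plain Python. The `Candle` class and the prices are hypothetical, made up for illustration; they are not part of the stock data used later in this lecture.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    """One trading period described by its four OHLC prices (hypothetical helper)."""
    open: float
    high: float
    low: float
    close: float

    @property
    def body(self):
        # The body spans the opening and closing prices, lower value first.
        return (min(self.open, self.close), max(self.open, self.close))

    @property
    def upper_shadow(self):
        # From the top of the body up to the highest trading price.
        return (self.body[1], self.high)

    @property
    def lower_shadow(self):
        # From the lowest trading price up to the bottom of the body.
        return (self.low, self.body[0])


# Made-up prices for a single period.
candle = Candle(open=10.0, high=12.5, low=9.0, close=11.0)
print(candle.body)          # (10.0, 11.0)
print(candle.upper_shadow)  # (11.0, 12.5)
print(candle.lower_shadow)  # (9.0, 10.0)
```

Each property returns a (bottom, top) pair, which is how the candle would be drawn on a price axis.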
Candlesticks can display various trading patterns like bullish, bearish and many more. In particular, the green candle represents the bullish pattern in which the trading price increased over a certain time period. The bottom of the candle body shows the opening price and the top of the candle body shows the closing price.
Conversely, the red candle shows the bearish pattern in which the price dropped over a period of time. In this case the top of the candle body shows the opening price and the bottom of the candle body shows the closing price.
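The bullish and bearish rule described above can be sketched with pandas. The prices below are made up for illustration, and the "flat" label for periods where the close equals the open is an assumption of this sketch, not something a plotting library necessarily distinguishes.

```python
import pandas as pd

# Made-up prices for three trading days.
prices = pd.DataFrame(
    {
        "Open": [10.0, 11.0, 10.5],
        "Close": [11.0, 10.5, 10.5],
    },
    index=pd.to_datetime(["2008-01-02", "2008-01-03", "2008-01-04"]),
)

# A candle is bullish when the close is above the open, bearish when below.
change = prices["Close"] - prices["Open"]
prices["Direction"] = "flat"
prices.loc[change > 0, "Direction"] = "bullish"
prices.loc[change < 0, "Direction"] = "bearish"

print(prices)
```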
In case you're curious, the Investopedia website has a great overview of the variety of trading patterns related to candlestick plots.
There are many ways to make candlestick plots in Python; mpl_finance, plotly, and finplot are among the most common libraries. We'll focus on the mpl_finance option.
# Let's bring in pandas as pd as usual
import pandas as pd
# To let Python work with the dates in the dataset,
# we need to import the native datetime package
import datetime
# We also import matplotlib.pyplot as plt for making plots
import matplotlib.pyplot as plt
# and import matplotlib's date utilities as mdates
# https://matplotlib.org/3.3.2/api/dates_api.html
import matplotlib.dates as mdates
# Let's import the stock data using pd.read_csv,
# parse the Date column into the datetime type and use it as the index
stock_data = pd.read_csv('assets/stocks.csv', parse_dates = ['Date'], index_col = 'Date')
# and let's check it out.
stock_data.head()
# There are a few companies included in the stock data, so it's good practice to
# keep a list of the company names so we can make a candlestick plot for each company
company_list = stock_data['Ticker'].unique().tolist()
# We can run a for loop to look at the stock data more closely
# So for each company in company_list
for company in company_list:
    # we print out the stock data associated with that company
    print(stock_data.groupby('Ticker').get_group(company))
# So you'll see there are 2517 trading days from 2006 to 2015. For each trading day,
# the stock price of AAPL, MSFT, IBM, GOOG and an index called GSPC (the S&P 500) is given as 4 values,
# namely High, Low, Open and Close.
# This is really cool data!
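As a side note, a groupby object can also be iterated directly, which avoids calling get_group once per company. This is a minimal sketch on a made-up two-ticker sample; the prices and the `rows_per_ticker` dictionary are hypothetical and only mimic the layout of the stock data.

```python
import pandas as pd

# Made-up sample with the same layout: a Date index plus a Ticker column.
sample = pd.DataFrame(
    {
        "Ticker": ["AAPL", "AAPL", "MSFT"],
        "Open": [10.0, 11.0, 20.0],
        "Close": [11.0, 10.5, 21.0],
    },
    index=pd.to_datetime(["2008-01-02", "2008-01-03", "2008-01-02"]),
)
sample.index.name = "Date"

# Iterating over a groupby yields (key, sub-DataFrame) pairs.
rows_per_ticker = {}
for ticker, group in sample.groupby("Ticker"):
    rows_per_ticker[ticker] = len(group)
    print(ticker, "has", len(group), "rows")
```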
# Let's suppose we are interested in understanding how the stock prices of Apple,
# Microsoft, IBM, Google and GSPC fluctuated over the great recession between 2008 and 2010.
# Let's pause the video and think about how we could visualize the bullish and bearish trends for each of the 5 stocks.
# So here's my solution. We could first subset the time period to between
# 2008 and 2010 (both years included) and overwrite the old data
stock_data = stock_data.loc['2008':'2010']
# and then reset the index so that Date becomes a regular column again
stock_data = stock_data.reset_index()
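The subsetting step above relies on slicing a datetime index with year strings. Here is a minimal sketch on made-up data showing that both end years are included in full; it assumes the index is sorted by date.

```python
import pandas as pd

# Made-up closing prices on four dates around the period of interest.
dates = pd.to_datetime(["2007-12-31", "2008-01-02", "2010-12-31", "2011-01-03"])
sample = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0]}, index=dates)
sample.index.name = "Date"

# Slicing with year strings keeps every row from 2008 through the end of 2010.
subset = sample.loc["2008":"2010"]
# reset_index turns the Date index back into an ordinary column.
subset = subset.reset_index()

print(subset)
```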
# We can map a converter over the date column
# to turn each date into matplotlib's numeric date format (a number of days)
stock_data['Date'] = stock_data['Date'].map(mdates.date2num)
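To see what this conversion does, here is a small sketch using two made-up dates one day apart. date2num returns a floating point number of days, and num2date converts it back to a datetime.

```python
import datetime

import matplotlib.dates as mdates

# Two made-up dates, exactly one day apart.
day_one = datetime.datetime(2008, 1, 2)
day_two = datetime.datetime(2008, 1, 3)

num_one = mdates.date2num(day_one)
num_two = mdates.date2num(day_two)

# The numeric dates differ by one unit per calendar day.
print(num_two - num_one)

# num2date reverses the conversion (it returns a timezone-aware datetime).
round_trip = mdates.num2date(num_one)
print(round_trip.date())
```

Because one unit equals one day, the candle width passed to the plotting function later on is also measured in days.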
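# Cool! Next, let's import matplotlib's lines module,
# whose Line2D artist we'll use to build legend handles,
import matplotlib.lines as mlines
# and we are specifically using mpl_finance's candlestick_ohlc function to make candlestick plots.
# Note: mpl_finance is deprecated; in the newer mplfinance package the same function is available
# from mplfinance.original_flavor
from mpl_finance import candlestick_ohlc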
# Let's set up the visualization by creating our fig object using matplotlib's
# plt.gcf() (get current figure) function as usual,
fig = plt.gcf()
# and I'm going to set the size of it in inches to 15 by 18
fig.set_size_inches(15, 18)
# Now let's iterate over the list of companies and create a stacked subplot for each of the 5 companies.
for i in range(len(company_list)):
    # stock_group holds the rows of the stock data that belong to this company
    stock_group = stock_data.groupby('Ticker').get_group(company_list[i])
    # and we'll make a new subplot for each company
    ax = fig.add_subplot(5, 1, i+1)
    # Note that candlestick_ohlc expects a sequence of rows, each holding the date (as a matplotlib
    # date number) followed by the ohlc values. To recap, ohlc is just shorthand for open, high,
    # low and close, values that best represent time-series trends.
    # Let's pull out a list of the columns to be used, in that order.
    columns = ['Date', 'Open', 'High', 'Low', 'Close']
    # Cool! Now we'll use the candlestick_ohlc function (which is under the mpl_finance package)
    # and we pass through the ax object, and provide the plot with the date and ohlc trade price values.
    # It's good to adjust the width of the "candles", so I use a width of 0.5 and set the color
    # for bullish (colorup) to be green; bearish candles keep the default colordown, which is red
    candlestick_ohlc(ax,
                     stock_group[columns].values,
                     width = 0.5,
                     colorup = 'g')
    # Tell matplotlib to treat the x axis values as dates
    ax.xaxis_date()
    # Let's create a legend for each plot using a blue star as a logo for each company
    blue_star = mlines.Line2D([], [], color = 'blue', marker = '*',
                              markersize = 15, label = company_list[i])
    # and use the blue star as the handle and put the legend in the lower right corner
    plt.legend(handles = [blue_star], loc='lower right')
    # and set the x and y axis labels to Date and Price respectively
    plt.xlabel("Date")
    plt.ylabel("Price")
#plt.savefig('my_figure.png')
# and show the graph
plt.show()
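If you're wondering what a candlestick function draws under the hood, here is a rough sketch that builds candles from plain matplotlib primitives. The `draw_candles` helper and the quotes are hypothetical, and this is not mpl_finance's actual implementation; coloring a candle with equal open and close as bullish is an assumption of the sketch.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up (day, open, high, low, close) rows.
quotes = [
    (1, 10.0, 12.0, 9.5, 11.5),
    (2, 11.5, 11.8, 10.0, 10.2),
    (3, 10.2, 11.0, 10.0, 10.8),
]


def draw_candles(ax, quotes, width=0.5, colorup="g", colordown="r"):
    """Draw one shadow line and one body per row; return the colors used."""
    colors = []
    for x, open_, high, low, close in quotes:
        color = colorup if close >= open_ else colordown
        # Shadows: a thin vertical line from the lowest to the highest price.
        ax.vlines(x, low, high, colors=color, linewidth=1)
        # Body: a bar spanning the opening and closing prices.
        ax.bar(x, abs(close - open_), width=width,
               bottom=min(open_, close), color=color)
        colors.append(color)
    return colors


fig, ax = plt.subplots()
colors = draw_candles(ax, quotes)
print(colors)
plt.close(fig)
```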
Cool! Across the 3-year time frame, you can see the direction the stock price moved over each candle's time period from the color and position of the candlestick. When the candlestick is green, the price moved upward; when the candlestick is red, the price moved downward.
You can clearly see that a majority of the candles are red during the year 2008. This reflects the overall pattern that the prices of all 5 stocks plummeted from the beginning of 2008 to the end of the first quarter of 2009. Apple Inc. and IBM recovered their losses by the end of 2010, while the other three still traded lower at the end of 2010 than at the beginning of 2008. The size of the candles is also important, so I invite you to make your own interpretations.
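One way to check a reading like this numerically is to compare each ticker's first and last closing price in the period. The sketch below uses made-up tickers (AAA and BBB) and made-up prices, not the real stock data.

```python
import pandas as pd

# Made-up closing prices for two hypothetical tickers on three dates.
closes = pd.DataFrame(
    {
        "Ticker": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
        "Close": [100.0, 60.0, 120.0, 50.0, 30.0, 40.0],
    },
    index=pd.to_datetime(["2008-01-02", "2009-03-31", "2010-12-31"] * 2),
).sort_index()

# First and last closing price per ticker, in date order.
grouped = closes.groupby("Ticker")["Close"]
first_close = grouped.first()
last_close = grouped.last()

# Percentage change over the whole period, and whether the loss was recovered.
pct_change = ((last_close / first_close - 1) * 100).round(2)
recovered = last_close >= first_close

print(pd.DataFrame({"pct_change": pct_change, "recovered": recovered}))
```

Applied to the real data, the same comparison would show which tickers ended 2010 above their early 2008 level.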
# If you are interested in making your own comical infographics, you can get an
# xkcd-style plot by wrapping your plotting code in a with plt.xkcd() statement
with plt.xkcd():
    # So we'll just copy over all of our plotting code and let it pass through the xkcd formatting filter
    fig = plt.gcf()
    fig.set_size_inches(15, 18)
    for i in range(len(company_list)):
        stock_group = stock_data.groupby('Ticker').get_group(company_list[i])
        ax = fig.add_subplot(5, 1, i+1)
        columns = ['Date', 'Open', 'High', 'Low', 'Close']
        candlestick_ohlc(ax,
                         stock_group[columns].values,
                         width = 0.7,
                         colorup = 'g')
        ax.xaxis_date()
        blue_star = mlines.Line2D([], [], color = 'blue', marker = '*',
                                  markersize = 15, label = company_list[i])
        plt.legend(handles = [blue_star], loc='lower right')
        plt.xlabel("Date")
        plt.ylabel("Price")
In this lecture I introduced you to visualizing time-series data using candlestick plots, which have a number of important use cases for recognizing trade patterns. In particular, we went through some useful matplotlib tricks such as mdates and mlines. You don't have to use all of these for your assignments. I've actually applied them quite a bit myself, out of curiosity about precipitation and solar levels in my hometown, Macau. So it's worth taking time to practice creating stylish graphs and annotations in your visualizations, since it's a pretty powerful way to interact with data and, as data scientists, to interact with others.