Video 1: PyMC3 Sampling

Now let's use PyMC3 on an example about online advertising. Online advertising keeps growing in popularity as internet penetration rises around the world. We can think of PyMC3 here as a black box that combines the data with our prior beliefs (we can't tell in advance exactly where each iteration of the sampler will go). What comes out is a set of posterior samples, and we infer knowledge from their trajectory and their distribution. In this example, our goal is to find a distribution that best represents the cost an individual online advertiser needs to pay advertisement providers.

I'm going to make the problem a bit easier for teaching purposes, so our research question is this: Can we use data on whether a user clicked on an advertisement, together with the advertisement provider, to predict the cost a company needs to pay for online advertising to reach 10,000 users? We'll see whether we can find such a cost distribution.

Our dataset records the name of the advertisement, the provider that runs it, whether the user clicked on it, and the cost of an advertisement impression. The clicked column is 1 if the user clicked on the advertisement after the impression and 0 otherwise.
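The data table itself is not reproduced in this transcript. To make the column layout concrete, here is a minimal sketch with simulated records. The provider names come from the example; the click-through rates and costs are made-up assumptions, not the course data.

```python
import numpy as np

# Simulated stand-in for the course dataset (NOT the real data): one row per
# ad impression with the provider, a 0/1 clicked flag and the impression cost.
rng = np.random.default_rng(42)
n_rows = 1000
provider = rng.choice(["Google", "Yahoo"], size=n_rows)
assumed_ctr = np.where(provider == "Google", 0.05, 0.03)  # assumed true CTRs
clicked = (rng.random(n_rows) < assumed_ctr).astype(int)   # 1 = clicked, 0 = not
cost = rng.gamma(shape=2.0, scale=0.005, size=n_rows)      # assumed cost per impression ($)

# Per-provider counts: all a click-through-rate model needs from the clicked column.
summary = {}
for name in ("Google", "Yahoo"):
    mask = provider == name
    summary[name] = {
        "impressions": int(mask.sum()),
        "clicks": int(clicked[mask].sum()),
        "mean_cost": float(cost[mask].mean()),
    }
    print(name, summary[name])
```

For a Bernoulli model of clicks, the per-provider click and impression counts carry all the information in the clicked column, which is why the summary can be this compact.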

Model specification:
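The model cell is not reproduced here. A minimal sketch, assuming a Beta(1, 1) prior on each provider's click-through rate and a Bernoulli likelihood for the clicked column: in PyMC3 this would be a `pm.Beta` prior and a `pm.Bernoulli` likelihood with `observed=` set to the clicked values, inside a `with pm.Model():` block. So that the sketch runs without PyMC3, it uses NumPy and the conjugate Beta posterior instead, on simulated (hypothetical) clicks.

```python
import numpy as np

# Assumed model (a sketch, not the notebook's exact cell):
#   ctr[provider] ~ Beta(1, 1)                 flat prior on the click-through rate
#   clicked_i     ~ Bernoulli(ctr[provider])   one 0/1 outcome per impression
# Beta is conjugate to Bernoulli, so the posterior is
#   Beta(1 + clicks, 1 + impressions - clicks)
# and we can draw from it exactly, without MCMC.
rng = np.random.default_rng(0)

# Simulated 0/1 click outcomes (hypothetical, not the course data).
data = {
    "Google": rng.binomial(1, 0.05, size=600),
    "Yahoo": rng.binomial(1, 0.03, size=400),
}
ALPHA_PRIOR, BETA_PRIOR = 1.0, 1.0

def posterior_ctr(clicked, draws=5000):
    """Exact posterior draws of the click-through rate for one provider."""
    clicks, impressions = clicked.sum(), clicked.size
    return rng.beta(ALPHA_PRIOR + clicks, BETA_PRIOR + impressions - clicks, size=draws)

posterior = {name: posterior_ctr(x) for name, x in data.items()}
for name, draws in posterior.items():
    print(f"{name}: posterior mean CTR = {draws.mean():.4f}")
```

Conjugacy is a convenience of this simple model; once a model has no closed-form posterior, a general-purpose sampler such as PyMC3's is what makes inference possible.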

Question 1: Why do we want to use PyMC3 to generate posterior samples?

There are two main reasons I chose PyMC3 for the Bayesian modeling process. First, specifying the prior distributions and the likelihood in PyMC3 is convenient because the distribution names in its syntax are easy to understand. Second, a PyMC3 model is written as ordinary Python code, which we call Pythonic.

Question 2: What is needed to perform a simulation in PyMC3?
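In general, a simulation needs three things: a model (prior plus likelihood), the observed data, and a sampler with its settings (number of draws, tuning steps, chains). The sketch below makes those ingredients explicit with a hand-written random-walk Metropolis sampler in NumPy. This is an illustrative substitute, not what PyMC3 runs: `pm.sample` uses NUTS by default for continuous parameters and adapts its step size during tuning, which this sketch does not. The data (30 clicks in 600 impressions) are hypothetical.

```python
import numpy as np

def log_posterior(p, clicks, n):
    """Unnormalized log posterior of a CTR: Beta(1, 1) prior + Bernoulli likelihood.

    The flat prior contributes a constant inside (0, 1), so only the likelihood remains.
    """
    if not 0.0 < p < 1.0:
        return -np.inf  # zero prior density outside (0, 1)
    return clicks * np.log(p) + (n - clicks) * np.log1p(-p)

def metropolis(clicks, n, draws=4000, tune=1000, step=0.01, start=0.5, seed=0):
    """Random-walk Metropolis: needs a log density, a starting value, a proposal
    scale, a number of discarded tuning (burn-in) iterations and a number of kept draws."""
    rng = np.random.default_rng(seed)
    p, lp = start, log_posterior(start, clicks, n)
    samples = np.empty(draws)
    for i in range(tune + draws):
        proposal = p + step * rng.normal()
        lp_proposal = log_posterior(proposal, clicks, n)
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if np.log(rng.random()) < lp_proposal - lp:
            p, lp = proposal, lp_proposal
        if i >= tune:
            samples[i - tune] = p
    return samples

# Hypothetical data: 30 clicks in 600 impressions.
samples = metropolis(clicks=30, n=600)
# The exact posterior mean is 31 / 602 (about 0.0515); the estimate should be close.
print(f"posterior mean CTR = {samples.mean():.4f}")
```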

Question 3: How to determine the sample size and chains in the sampling process?

Cool! Before we go further with the models, let's talk a bit more about how they work. First of all, why did I choose 3 chains in the pm.sample function? Setting chains=3 runs 3 independent simulations of the click-through rate for the Google and Yahoo advertisements, each starting from a different point but using the same model and the same data. Raising chains from 3 to 10 therefore gives us more independent runs to compare, not more experiments or more data. A chain has converged when its trajectory settles down and wanders around the same region as the other chains, that is, the target posterior distribution; a chain that drifts away or gets stuck somewhere else has not converged. (In PyMC3, "divergences" means something more specific: warnings from the NUTS sampler that it could not accurately explore part of the posterior.) Because comparing chains is how we detect convergence problems, I'd recommend more chains when the problem is complex or you're not very confident about your prior knowledge and the data you collected, and fewer chains when the problem is simpler and you have good knowledge before the simulation.
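To see what running several chains buys us, here is a sketch that starts three random-walk Metropolis chains far apart and checks whether they end up in the same place. It is a NumPy stand-in: PyMC3 chooses starting points itself and uses NUTS by default, while here the dispersed starting values and the data (30 clicks in 600 impressions) are hypothetical choices made to make the idea visible.

```python
import numpy as np

def log_posterior(p, clicks=30, n=600):
    """Unnormalized log posterior of a CTR under a Beta(1, 1) prior (hypothetical data)."""
    if not 0.0 < p < 1.0:
        return -np.inf
    return clicks * np.log(p) + (n - clicks) * np.log1p(-p)

def run_chain(start, seed, draws=3000, tune=1000, step=0.01):
    """One random-walk Metropolis chain; the tuning (burn-in) draws are discarded."""
    rng = np.random.default_rng(seed)
    p, lp = start, log_posterior(start)
    kept = []
    for i in range(tune + draws):
        prop = p + step * rng.normal()
        lp_prop = log_posterior(prop)
        if np.log(rng.random()) < lp_prop - lp:
            p, lp = prop, lp_prop
        if i >= tune:
            kept.append(p)
    return np.array(kept)

# Three chains started far apart, mirroring chains=3 in pm.sample.
starts = [0.05, 0.5, 0.9]
chains = np.stack([run_chain(s, seed) for seed, s in enumerate(starts)])
print(chains.shape)         # (3, 3000): one row per chain
print(chains.mean(axis=1))  # chain means should agree if every chain has converged
```

If one row's mean sat far from the others, that chain would not have converged, and more chains make such a disagreement more likely to show up.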

So what about increasing the number of draws in each chain, say to 20,000? I've used PyMC3 for a couple of years, and in my experience longer chains make it more likely that every chain eventually settles into the posterior, and they reduce the Monte Carlo error of your estimates. The trace plot will look more jagged as the trajectory gets longer, but that is mostly because more draws are squeezed into the same plot width, not a sign that the sampler is doing worse.
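One way to quantify the benefit of longer chains is the Monte Carlo standard error (MCSE) of a chain's mean. The sketch below estimates it by batch means on an autocorrelated AR(1) series used as a stand-in for an MCMC trace; the AR(1) choice and its parameter are assumptions made to keep the example short. Real PyMC3 traces are also autocorrelated, which is why their effective sample size is smaller than the number of draws.

```python
import numpy as np

def ar1_chain(n, phi=0.9, seed=0):
    """Autocorrelated stand-in for an MCMC trace: x_t = phi * x_{t-1} + noise."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = rng.normal() / np.sqrt(1 - phi**2)  # start in the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def batch_means_mcse(chain, n_batches=20):
    """MCSE of the chain mean from the spread of non-overlapping batch means."""
    usable = len(chain) // n_batches * n_batches
    batch_means = chain[:usable].reshape(n_batches, -1).mean(axis=1)
    return batch_means.std(ddof=1) / np.sqrt(n_batches)

for draws in (1_000, 20_000):
    print(draws, round(batch_means_mcse(ar1_chain(draws)), 4))
```

With 20 times more draws, the MCSE should shrink by roughly a factor of √20 (about 4.5), although the batch-means estimate itself is noisy.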

That's it! I hope this helps you gain some practical experience with PyMC3 model sampling. What you may find marvelous is the motivation behind PyMC3: this high-level language for specifying Bayesian models takes almost the same number of lines of code as writing out the math. There's very little extra going on here. Come back for PyMC3 posterior inference in the next video!


Video 2: PyMC3 Posterior Inference

In this lecture, we'll move on to posterior inference in PyMC3, again using the online advertisement data as a case study. Once we have the posterior distribution, we get a lot for free: from the samples we can compute the mean and standard deviation of the posterior distribution, credible intervals and everything else we need, which is fantastic. Let's pause here for a moment. As an introduction, we'll look at several short questions about which posterior diagnostics to use in the analysis.
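As a sketch of what "for free" means, the code below computes the posterior mean, standard deviation and a 94% highest density interval (HDI) directly from samples. The Beta(31, 571) posterior is hypothetical (30 clicks in 600 impressions under a flat prior). The 94% level matches ArviZ's default `hdi_prob`, and `az.hdi` performs this calculation for you.

```python
import numpy as np

def hdi(samples, prob=0.94):
    """Narrowest interval containing `prob` of the samples (highest density interval)."""
    s = np.sort(samples)
    n_in = int(np.floor(prob * len(s)))
    widths = s[n_in:] - s[: len(s) - n_in]  # width of every candidate interval
    i = int(np.argmin(widths))
    return s[i], s[i + n_in]

rng = np.random.default_rng(0)
# Hypothetical posterior for the CTR: Beta(1 + 30 clicks, 1 + 570 non-clicks).
ctr = rng.beta(31, 571, size=10_000)

lower, upper = hdi(ctr)
print(f"mean={ctr.mean():.4f} sd={ctr.std(ddof=1):.4f} 94% HDI=[{lower:.4f}, {upper:.4f}]")
```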

Question 1: What features are essential to interpret after sampling for almost every PyMC3 model?

The main plot we are going to analyze in this video is called the traceplot.

Question 2: How can we obtain the numerical values of posterior statistics?

Question 3: How can we understand the posterior results visually?

Question 4: How can we tell whether the sampled model is in good shape?
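A numerical complement to eyeballing the traceplot is R-hat, which compares the variance between chains with the variance within them. Below is a sketch of the classic split R-hat in NumPy, applied to two synthetic sets of chains: one where all chains sample the same distribution and one where a chain is stuck elsewhere. ArviZ's `az.rhat` computes a more robust rank-normalized version by default. Values near 1 are what we want; 1.01 is a commonly recommended threshold (older guidance used 1.1).

```python
import numpy as np

def split_rhat(chains):
    """Classic split R-hat for an array of shape (n_chains, n_draws)."""
    half = chains.shape[1] // 2
    # Split every chain in half so that within-chain drift also shows up.
    split = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    n = split.shape[1]
    within = split.var(axis=1, ddof=1).mean()     # W: mean within-chain variance
    between = n * split.mean(axis=1).var(ddof=1)  # B: variance of chain means, scaled by n
    var_hat = (n - 1) / n * within + between / n  # pooled estimate of the posterior variance
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(0)
good = rng.normal(0.05, 0.01, size=(3, 2000))  # synthetic: all chains sample the same region
bad = good + np.array([[0.0], [0.0], [0.03]])  # synthetic: third chain stuck elsewhere
print(round(split_rhat(good), 3), round(split_rhat(bad), 3))
```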

That's it for now! The power of Bayesian estimation is that we now have a full posterior distribution for the click-through rate of users viewing the advertisements. Not only can we visualize that distribution, we can also quantify how uncertain the click-through rate is and, most importantly, assess whether our model is in good shape. Fortunately, the traceplot shows that our model is in good shape, so we don't need to change the prior or the likelihood. But as models become more complex, as you'll see in week 2 and week 3, you may need to change the model, for example by starting from a different prior. The main point I want to get across is that this library, PyMC3, lets us get a lot of useful information out of a dataset, both numerically and visually, rather than just a simple point estimate.

Video 3: Introduction to ArviZ

The Bayesian workflow is more than just specifying the model and running inference. Those are certainly the most prominent components, but specifying models and generating samples is not everything a full Bayesian workflow needs.

Question 1: What does the ArviZ package do?

This is where ArviZ fits in: it is a Python package for visualization in the Bayesian workflow. It helps us evaluate prior distributions, check sampling diagnostics, and summarize the posterior distribution and the model fit.
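Evaluating a prior usually means asking what data it implies before we see any. Here is a minimal prior predictive sketch in NumPy that compares two hypothetical priors for the click-through rate by the click counts they imply for 1,000 impressions. In PyMC3, `pm.sample_prior_predictive` draws such simulations from a model, and ArviZ can then plot them.

```python
import numpy as np

rng = np.random.default_rng(0)
IMPRESSIONS = 1000
# Two candidate priors for the click-through rate (hypothetical choices).
priors = {"flat Beta(1, 1)": (1.0, 1.0), "informative Beta(2, 50)": (2.0, 50.0)}

ranges = {}
for name, (a, b) in priors.items():
    ctr = rng.beta(a, b, size=5000)                # draw CTRs from the prior
    clicks = rng.binomial(IMPRESSIONS, ctr)        # simulate the clicks each CTR implies
    ranges[name] = np.percentile(clicks, [3, 97])  # central 94% of simulated click counts
    lo, hi = ranges[name]
    print(f"{name}: {lo:.0f} to {hi:.0f} clicks per {IMPRESSIONS} impressions")
```

The flat prior treats anything from almost no clicks to almost every impression being clicked as plausible; seeing that range is exactly the kind of check that tells you whether a prior matches what you actually believe.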

Question 2: Why do we need ArviZ when performing a Bayesian analysis?

In PyMC3 we've looked at the traceplot (ArviZ provides that too), but we definitely need more visualizations to help characterize the uncertainty in whatever problem we are working on.

Question 3: Where can we find documentation and recent updates for the ArviZ package?

Question 4: How does ArviZ help carry out the 4-step Bayesian analysis?

So the whole point of the ArviZ Python package is to organize the data so that complex calculations become much easier for everybody. It helps us look at the shape of the prior distributions and explore the posterior results by running more diagnostics on them after simulating the model in PyMC3.
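ArviZ's `InferenceData` keeps each variable's draws as an array with `chain` and `draw` dimensions, and `az.summary` turns them into a table. Below is a rough NumPy imitation of that organization and summary, using hypothetical Beta posteriors. It reports an equal-tailed interval for simplicity, whereas `az.summary` reports an HDI by default along with diagnostics such as ESS and R-hat.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical posterior draws stored the way ArviZ organizes them:
# one array per variable with dimensions (chain, draw).
posterior = {
    "ctr_google": rng.beta(31, 571, size=(3, 2000)),
    "ctr_yahoo": rng.beta(13, 389, size=(3, 2000)),
}

def summarize(draws, prob=0.94):
    """Pool all chains and report mean, sd and an equal-tailed credible interval."""
    flat = draws.reshape(-1)
    tail = (1 - prob) / 2 * 100
    lo, hi = np.percentile(flat, [tail, 100 - tail])
    return {"mean": flat.mean(), "sd": flat.std(ddof=1), "lower": lo, "upper": hi}

table = {name: summarize(d) for name, d in posterior.items()}
for name, row in table.items():
    print(name, {k: round(float(v), 4) for k, v in row.items()})
```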

That concludes the short ArviZ tutorial! Most ArviZ tricks amount to importing the package and applying one of the many functions showcased in the ArviZ gallery (https://arviz-devs.github.io/arviz/examples/index.html). I encourage you to explore some ArviZ plots; doing so may help you choose the best visualization for presenting results to different target audiences.


Optional Video: Cost Distribution

We've now explored the PyMC3 and ArviZ packages in Python. Both are important in this course because they support Bayesian modeling from planning the analysis to executing models and interpreting posterior results. Now let's come back to where we started. Our research question is: Can we use data on whether a user clicked on an advertisement, together with the advertisement provider, to predict the cost a company needs to pay for online advertising to reach 10,000 users? Let's rewrite the model a bit, and this time aim to find the cost distribution for each provider.

Solution:
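The solution cell is not included in this transcript. The sketch below shows one way such a cost distribution could be built, under loudly stated assumptions: "reaching 10,000 users" is read as getting 10,000 clicks, each provider charges a fixed average cost per impression, and all click and cost figures are hypothetical. Each posterior draw of the click-through rate becomes a draw of the total cost via cost = 10,000 / CTR × cost per impression.

```python
import numpy as np

rng = np.random.default_rng(1)
TARGET_USERS = 10_000  # assumed to mean 10,000 clicks (hypothetical reading)

# Hypothetical per-provider inputs, not the course data.
providers = {
    "Google": {"clicks": 30, "impressions": 600, "cost_per_impression": 0.010},
    "Yahoo": {"clicks": 12, "impressions": 400, "cost_per_impression": 0.006},
}

def cost_distribution(clicks, impressions, cost_per_impression, draws=10_000):
    """Posterior draws of the cost of TARGET_USERS clicks.

    CTR ~ Beta(1 + clicks, 1 + impressions - clicks) (flat prior, Bernoulli likelihood);
    impressions needed = TARGET_USERS / CTR; cost = impressions needed * cost per impression.
    """
    ctr = rng.beta(1 + clicks, 1 + impressions - clicks, size=draws)
    return TARGET_USERS / ctr * cost_per_impression

costs = {name: cost_distribution(**spec) for name, spec in providers.items()}
for name, c in costs.items():
    lo, med, hi = np.percentile(c, [3, 50, 97])
    print(f"{name}: median ${med:,.0f}, 94% interval ${lo:,.0f} to ${hi:,.0f}")
```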

Great! Now the trace object contains everything, including the expected cost for an organization to launch an online advertisement that reaches 10,000 users.

(In-class activity: From your perspective, what other research questions in online advertising do you find interesting? Now is the time to propose alternative questions. You're more than welcome to write them down and compare them with your peers!)

From this example, we found that one of the most salient advantages of PyMC3 is that the model not only returns the posterior cost distributions for both online ad providers but also lets us quantify the variability of both posterior distributions, especially in the tails, which yields more realistic predictions. Note that the dataset I collected for both engines covers only a short timespan, so it may not represent the historical trend in online advertising costs for either engine. If you encounter online advertising problems as a data scientist or consultant, you may need longer-term longitudinal data to determine which search engine best suits your client.
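Posterior samples also let us ask decision-level questions about the tails. Here is a minimal sketch with hypothetical inputs, assuming (as a hypothetical reading of the research question) that reaching 10,000 users means 10,000 clicks at a fixed cost per impression. It estimates the posterior probability that one provider is cheaper and compares the distances from the median to the 3rd and 97th percentiles of the cost.

```python
import numpy as np

rng = np.random.default_rng(7)
TARGET_CLICKS = 10_000  # assumed reading of "reach 10,000 users"

def cost_draws(clicks, impressions, cost_per_impression, draws=20_000):
    """Cost of TARGET_CLICKS clicks for each posterior CTR draw (flat Beta prior)."""
    ctr = rng.beta(1 + clicks, 1 + impressions - clicks, size=draws)
    return TARGET_CLICKS / ctr * cost_per_impression

# Hypothetical inputs for two providers with the same cost per impression.
google = cost_draws(clicks=60, impressions=1000, cost_per_impression=0.01)
yahoo = cost_draws(clicks=20, impressions=1000, cost_per_impression=0.01)

p_google_cheaper = (google < yahoo).mean()  # posterior probability Google costs less
p3, p50, p97 = np.percentile(google, [3, 50, 97])
print(f"P(Google cheaper) = {p_google_cheaper:.3f}")
print(f"Google: lower tail {p50 - p3:,.0f}, upper tail {p97 - p50:,.0f} (dollars from the median)")
```

Because cost grows like 1/CTR, the upper tail is longer than the lower one: a pessimistic click-through rate raises the cost more than an optimistic one lowers it, which is tail information a single point estimate hides.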

Created in Deepnote