Difference makes the DIFFERENCE
In this project we will work with a fake advertising data set that indicates whether or not a particular internet user clicked on an advertisement. We will try to create a model that predicts whether or not a user will click on an ad based on that user's features.
This data set contains the following features:
Import a few libraries you think you'll need (or just import them as you go along!)
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn import model_selection
from sklearn import linear_model
from sklearn import metrics
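The three scikit-learn modules imported above map onto the modelling steps this project is heading toward: `model_selection` to split the data, `linear_model` to fit a classifier, and `metrics` to score it. The sketch below assumes logistic regression as the classifier and, because it has to run without advertising.csv, uses synthetic data from `sklearn.datasets.make_classification` as a hypothetical stand-in for the user features. The 30% test split and the `random_state` values are arbitrary choices.

```python
from sklearn import datasets, linear_model, metrics, model_selection

# Hypothetical stand-in data: 1,000 synthetic "users" with 4 numeric features
# and a binary label (1 = clicked, 0 = did not click). Not the advertising data.
X, y = datasets.make_classification(
    n_samples=1000, n_features=4, n_informative=3, n_redundant=0, random_state=42
)

# Hold out 30% of the rows so the model is scored on data it never saw.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Assumed classifier: logistic regression, fitted on the training split only.
model = linear_model.LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict on the held-out rows and summarise precision, recall and accuracy.
predictions = model.predict(X_test)
accuracy = metrics.accuracy_score(y_test, predictions)
print(metrics.classification_report(y_test, predictions))
print(f"accuracy: {accuracy:.3f}")
```

On the real data, `X` would presumably be the numeric feature columns of `ad_data` and `y` the 'Clicked on Ad' column.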
Read in the advertising.csv file and set it to a data frame called ad_data.
ad_data = pd.read_csv('/content/advertising.csv')
Check the head of ad_data
ad_data.head()
Use info() and describe() on ad_data.
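# info() prints its report directly; describe() is displayed only if it is the last expression in the cell
ad_data.info()
ad_data.describe()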
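For numeric columns, describe() returns one row per summary statistic: the count of non-null values, the mean, the standard deviation, the minimum, the 25th/50th/75th percentiles, and the maximum. A quick check on a small hypothetical Series (these are not values from advertising.csv) shows the layout:

```python
import pandas as pd

# Hypothetical toy column standing in for a numeric feature such as 'Age'.
ages = pd.Series([22, 35, 41, 29, 53, 38], name="Age")

# One entry per statistic: count, mean, std, min, 25%, 50%, 75%, max.
summary = ages.describe()
print(summary)
```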
Let's use seaborn to explore the data!
Try recreating the plots described below!
Create a histogram of the Age column.
sns.set_style("whitegrid")  # set_style is a function: call it rather than assigning to it
sns.histplot(data=ad_data, x='Age', bins=30)  # seaborn histogram
ad_data['Age'].hist(bins=30)  # alternative: the same histogram via pandas/Matplotlib
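Both calls above ask for 30 bins. With an integer `bins`, the range from the smallest to the largest value is split into that many equal-width intervals, which `numpy.histogram` makes explicit. The ages below are randomly generated placeholders rather than the real Age column:

```python
import numpy as np

# Hypothetical placeholder ages (the notebook would use ad_data['Age'] instead).
rng = np.random.default_rng(0)
ages = rng.integers(19, 62, size=1000)

# An integer bins value splits [min, max] into that many equal-width intervals.
counts, edges = np.histogram(ages, bins=30)
widths = np.diff(edges)

print(len(counts), "bins,", len(edges), "edges, width", round(widths[0], 3))
```

Thirty bins always means 31 edges, and every observation lands in exactly one bin, so the counts sum to the number of rows.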
Create a jointplot showing Area Income versus Age.
sns.jointplot(x = 'Age', y= 'Area Income', data = ad_data, hue = "Clicked on Ad")
Create a jointplot showing the KDE distributions of 'Daily Time Spent on Site' vs. 'Age'.
# With hue set, seaborn colors the KDE contours from palette=, not cmap=
sns.jointplot(data=ad_data, x="Age", y="Daily Time Spent on Site", kind="kde", hue="Clicked on Ad")
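A KDE plot smooths the data by placing a kernel (here a Gaussian bump) on every observation and averaging them. The sketch below implements that idea in one dimension with NumPy, using Scott's rule of thumb for the bandwidth, which is seaborn's default `bw_method`. The sample is simulated, and `gaussian_kde_1d` is a hypothetical helper, not a seaborn function; the joint plot applies the same idea in two dimensions.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Hypothetical helper: average of Gaussian bumps centred on each sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)


# Simulated ages standing in for a real column such as ad_data['Age'].
rng = np.random.default_rng(1)
samples = rng.normal(loc=36, scale=9, size=500)

# Scott's rule of thumb in one dimension: h = sample std * n ** (-1/5).
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

# Evaluate on a grid wide enough to capture the tails, then check the area.
grid = np.linspace(samples.min() - 30, samples.max() + 30, 2000)
density = gaussian_kde_1d(samples, grid, bandwidth)
area = density.sum() * (grid[1] - grid[0])  # Riemann sum, should be close to 1
print(f"bandwidth={bandwidth:.2f}, area={area:.4f}")
```

Because each kernel integrates to 1 and the estimate is their average, the curve is a proper density. A larger bandwidth gives a smoother curve; in seaborn it can be scaled with kdeplot's `bw_adjust` parameter.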
Create a jointplot of 'Daily Time Spent on Site' vs. 'Daily Internet Usage'
# "Green" is not a valid Matplotlib colormap name; with hue set, colors come from palette=
sns.jointplot(data=ad_data, x="Daily Time Spent on Site", y="Daily Internet Usage", hue="Clicked on Ad")
Finally, create a pairplot with the hue defined by the 'Clicked on Ad' column feature.
sns.pairplot(hue = "Clicked on Ad", data = ad_data)