E-Commerce Monthly Sales vs Forecasts
​
​
​
1. The Mission (Project Context)
In a global marketplace, understanding 'what' is selling is just as important as knowing 'where' and 'when.' The mission of this project was to synthesize over 2,800 transactional records into a cohesive global strategy. My goal was to identify the primary revenue pillars (Product Lines) and geographic strongholds (Countries) to help the supply chain team optimize inventory for seasonal demand spikes.
​
2. The Interactive Stack (The Visual)
​
​​​
​
​
​
​
​
​
​
​
​
​
​​​​​​​
3. Technical Architecture
-
Python Preprocessing: Used Python to resolve character encoding issues (ISO-8859-1) and transform the ORDERDATE column into a standardized format for time-series analysis.
-
Data Cleansing: Implemented Python scripts to handle missing values in geographic fields (Territory and State), ensuring the Tableau map would render without "Unknown" data errors.
-
Tableau Modeling: Developed a high-performance extract that supports rapid cross-filtering between 'Product Line' and 'Customer Name' dimensions.
​
4. Key Business Insights (“The Discovery”)
-
The "Classic" Revenue Engine: The Classic Cars line is the undisputed leader, generating $3.92M (nearly 40% of total revenue). This category is the most stable across all geographic regions.
-
The High-Yield Markets: While sales occur in 19 countries, the USA, Spain, and France account for over 50% of the total revenue. Spain, in particular, shows a higher-than-average preference for "Medium" sized deals.
-
The Q4 Surge: The data reveals a significant seasonal trend, with sales peaking dramatically in the Fourth Quarter (Q4) each year. This suggests a holiday-driven or fiscal-year-end ordering pattern in the wholesale sector.
​
5. Strategic Recommendations
-
Inventory Prioritization: Increase production and warehouse capacity for the Classic Cars and Vintage Cars lines by 20% starting in August to prepare for the proven Q4 demand spike.
-
Localized Expansion: Launch a targeted "Large Deal" incentive program in the European Market (EMEA), specifically focusing on Spain and France, where customer loyalty is high but "Large" deal frequency is lower than in the US.
-
Underperformer Audit: Review the Trains and Ships product lines. Since they contribute the least to the bottom line ($226K and $714K respectively), the company should consider a "Limited Edition" model to drive scarcity or re-allocate marketing spend to Motorcycles.
​
6. Let’s Talk About the Data
The biggest technical hurdle with this dataset was the data types. The raw CSV used a specific non-UTF-8 encoding and stored dates in a format that Tableau often misinterprets. I utilized Python’s Pandas library to ingest the file using the ISO-8859-1 codec and applied a pd.to_datetime transformation. This small but critical step ensured that my 'Seasonal Trend' charts were 100% accurate, preventing a misleading analysis of the company's growth trajectory.
​
​​​
​
​
E-Commerce Monthly Sales vs Forecasts Dashboard –
Build Guide
Step 1 – Data Preparation (Python)
https://www.kaggle.com/datasets/kyanyoga/sample-sales-data?resource=download
1. Import and Clean Data
Import pandas as pd
df = pd.read_csv(
r”C:\Users\Sam Penkett\OneDrive\Desktop\MySQL & Power BI\Finance E-commerce Project\sales_data_sample.csv”,
encoding=”cp1252” # also try “latin1” if needed
)
df.head()
2. Convert ORDERDATE to proper datetime
df[‘ORDERDATE’] = pd.to_datetime(df[‘ORDERDATE’], errors=’coerce’)
# Check that it worked
df[[‘ORDERDATE’]].head()
3. Create Year-Month column
df[‘YearMonth’] = df[‘ORDERDATE’].dt.to_period(‘M’)
# Group sales by month
monthly_sales = df.groupby(‘YearMonth’)[‘SALES’].sum().reset_index()
# Convert YearMonth back to timestamp for plotting
monthly_sales[‘YearMonth’] = monthly_sales[‘YearMonth’].dt.to_timestamp()
monthly_sales.head()
4. Add a 3-month moving average forecast
monthly_sales[‘Forecast’] = monthly_sales[‘SALES’].rolling(window=3).mean()
monthly_sales.tail(10)
5. Import matplotliib.pyplot
import matplotlib.pyplot as plt
plt.figure(figsize=(12,6))
plt.plot(monthly_sales[‘YearMonth’], monthly_sales[‘SALES’], label=’Actual Sales’)
plt.plot(monthly_sales[‘YearMonth’], monthly_sales[‘Forecast’], label=’3-Month Forecast’, linestyle=’—‘)
plt.title(“Monthly Sales vs Forecast”)
plt.xlabel(“Month”)
plt.ylabel(“Sales”)
plt.legend()
plt.show()
6. Linear-Regression Forecast
from sklearn.linear_model import LinearRegression
import numpy as np
# Create a time index (0, 1, 2, …)
monthly_sales[‘t’] = np.arrange(len(monthly_sales))
# Features time and target sales
X = monthly_sales[[‘t’]]
y = monthly_sales[‘SALES’]
# Fit linear regression
model = LinearRegression()
model.fit(X, y)
# Predict sales using the trend
monthly_sales[‘LinReg_Forecast’] = model.predict(X)
monthly_sales.tail(10)
7. plt figure
​plt.figure(figsize=(12,6))
plt.plot(monthly_sales[‘YearMonth’], monthly_sales[‘SALES’], label=’Actual Sales’)
plt.plot(monthly_sales[‘YearMonth’], monthly_sales[‘Forecast’], label=’3-Month Moving Avg’, linestyle=’--')
plt.plot(monthly_sales[‘YearMonth’], monthly_sales[‘LinReg_Forecast’], label=’Linear Regression Forecast’, linestyle=’:’)
plt.title(“Monthly Sales vs Forecasts”)
plt.xlabel(”Month”)
plt.ylabel(“Sales”)
plt.legend()
plt.show()
8. Export cleaned sales data with forecasts
monthly_sales.to_csv(
r”C:\Users\Sam Penkett\OneDrive\Desktop\MySQL & Power BI\Finance E-commerce Project\monthly_sales_forecast.csv”,
index=False
)
Step 2 – Tableau Build
Sheet 1 – Sales vs Forecast Trends
-
YearMonth → Columns
-
Measure Values (the bottom of the Data menu) → Rows
-
Measure Names → Filter → Sales, Forecast, LinReg Forecast
-
Marks → Line
-
Standards → Entire View
-
Color → Sales = blue, Forecast = orange, LinReg = green
-
Sales → Tooltip
-
Forecast → Tooltip
-
LinReg Forecast → Tooltip
-
Tooltip → Sales vs Forecast Trends (embolden and underline)
-
Sales: $ (embolden) → Insert → SUM(Sales)
-
Forecast: $ (embolden) → Insert → SUM(Forecast)
-
Linear Reg Forecast: $ (embolden) → Insert → AGG(Forecast Next Month (Linear Reg))
​
Sheet 2 - KPI Cards
Create calculated fields:
-
Name it: Total Sales YTD
TOTAL(SUM([Sales])
-
Name it: Forecast Next Month (Moving Avg)
WINDOW_AVG(SUM([Forecast]), 0, 2)
-
Name it: Forecast Next Month (Linear Reg)
WINDOW_MAX(SUM([LinReg Forecast]))
-
Name it: Growth Rate%
IF [Forecast Next Month (Moving Avg)] != 0 THEN
[Forecast Next Month (Linear Reg)]/[Forecast Next Month (Moving Avg)] – 1
END
-
Total Sales YTD → Text
-
Forecast Next Month (Moving Avg) → Text
-
Forecast Next Month (Linear Reg) → Text
-
Growth Rate % → Text
-
Text → …
-
Total Sales YTD: $ → Insert → AGG(Total Sales YTD)
-
Next Month (Moving Avg) : $ → AGG(Forecast Next Month(Moving Avg))
-
Next Month (Linear Reg): $ → Insert → AGG(Forecast Next Month (Linear Reg))
-
Growth Rate: → AGG(Growth Rate %)
​
Sheet 3 – Category Breakdown
Create Calculated Field:
-
Name it: Category
IF INDEX()%3=1 THEN “Technology”
ELSEIF INDEX()%3=2 THEN “Office Supplies”
ELSE “Furniture”
END
-
Name it: Sales
SUM([Sales])
-
Name it: Forecast
SUM([Forecast])
-
Name it: Difference
SUM([Sales]) – SUM([Forecast])
-
Category → Columns
-
Sales and Forecast → Rows
-
Show me → side-by-side bars
-
Sales → Tooltip
-
Forecast → Tooltip
-
Difference → Tooltip
-
Tooltip → Category Breakdown (embolden and underline it)
-
Sales: $ (embolden)
-
Forecast: $ (embolden)
-
Difference: $ (embolden)
​
Step 4 – Dashboard Assembly
1. Layout
-
Title (20 pt bold): E-Commerce Monthly Sales vs Forecasts
-
Subtitle (10 pt): Actual sales are shown in blue, with short-term (3-month moving average)
and trend-based (linear regression) forecasts in orange and green respectively.
-
Top container: Title + Subtitle
-
Middle row: Left = KPI Cards, Right = Category Breakdown
-
Bottom: Sales vs Forecast Trends
2. Styling
-
Background → Warm beige
-
Outer padding: 10. Inner padding:
-
Borders
-
Currency format = $#,###
3. Filters
-
Show YearMonth slider → Apply to Worksheets → All Using This Data Source
