
Below are examples of my Business Intelligence work. Each project demonstrates how I’ve used data modeling, dashboards, and analytics to support strategic business decisions.

Retail

Project 1: Retail Sales & Profitability Dashboard

Executive Summary

This project identifies critical profit leaks in a national retail dataset. Specifically, it uncovers that while the Furniture category accounts for a significant portion of revenue, it is also the primary driver of unprofitability due to sub-category pricing and logistics issues. The resulting Power BI dashboard provides executives with a dynamic "Risk vs. Reward" view of the entire product catalogue.


The Business Challenge

Retail executives lacked clear visibility into the "True Profit" of their sales. High-level revenue growth masked underlying losses in specific regions and categories. The goal was to build a BI solution that could distinguish between "Good Sales" (high margin) and "Toxic Sales" (high volume, negative margin) to inform pricing and inventory strategy.


Data Architecture & Modeling

  • Schema: Implemented a Star Schema to optimize report performance and simplify DAX calculations.

  • Dimensions: Built a custom DAX Date table to support Time Intelligence (YoY, MTD, and Trend Analysis); a pandas sketch of the equivalent table follows this list.

  • Fact Table: Connected a 10k-row "Orders" table with Geography and Category dimensions.

  • Data Cleaning: Standardized currency types and handled null values in the "Returns" logic via Power Query.
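The production date table is authored in DAX; as a rough, language-agnostic illustration, the equivalent dimension can be sketched in pandas as follows (the date range and column names here are assumptions, not the production calendar):

    import pandas as pd

    def build_date_dim(start="2014-01-01", end="2017-12-31"):
        # Minimal contiguous date dimension mirroring the DAX table.
        dim = pd.DataFrame({"Date": pd.date_range(start, end, freq="D")})
        dim["Year"] = dim["Date"].dt.year          # enables YoY comparisons
        dim["Quarter"] = dim["Date"].dt.quarter
        dim["Month"] = dim["Date"].dt.month        # enables MTD and trend views
        dim["MonthName"] = dim["Date"].dt.strftime("%b")
        return dim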


Data Analysis (Technical Execution)

  • Dynamic Metric Toggle: Created a disconnected table and used SWITCH(SELECTEDVALUE()) logic to allow users to toggle the entire dashboard between Total Profit ($) and Profit Margin (%) (a sketch of the equivalent logic follows this list).

  • Advanced DAX: Developed measures for Sales ∆ vs PY (Year-over-Year) and AOV (Average Order Value) to benchmark regional performance.

  • Unprofitability Logic: Engineered a "Simulated Risk" measure to identify products with margins below 10%, highlighting them as "At Risk" in the executive view.
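To make the toggle and risk logic concrete, here is a minimal pandas sketch of the same behavior (the dashboard itself implements this in DAX; the sample rows and column names are illustrative only):

    import pandas as pd

    orders = pd.DataFrame({
        "Product": ["Table A", "Chair B", "Phone C"],
        "Sales":   [1000.0, 400.0, 2500.0],
        "Profit":  [-120.0, 60.0, 700.0],
    })

    def selected_metric(df, choice="Total Profit ($)"):
        # Mirrors SWITCH(SELECTEDVALUE()): one slicer choice picks the measure.
        if choice == "Total Profit ($)":
            return df["Profit"].sum()
        return df["Profit"].sum() / df["Sales"].sum()   # Profit Margin (%)

    # "Simulated Risk": flag products whose margin falls below 10%.
    orders["Margin"] = orders["Profit"] / orders["Sales"]
    orders["At Risk"] = orders["Margin"] < 0.10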


Deep Dive Insights

  • The "Table" Trap: The Furniture category, specifically the Tables sub-category, is the largest profit drain in the company. Despite high sales volume, the average margin is consistently negative across all regions.

  • Technology Dominance: The Technology category produces the highest ROI, with a profit margin 12% higher than the company average, suggesting a clear path for marketing re-investment.

  • Regional Variance: The Central Region shows the highest concentration of "Unprofitable Sales," likely due to higher localized shipping costs for bulky furniture items.


Strategic Recommendations

  1. Pricing Correction: Increase the minimum price floor for the "Tables" sub-category by 8% or implement a "Bulky Item" shipping surcharge to offset negative margins.

  2. Inventory Pivot: Shift 15% of the marketing budget from Furniture to Technology to capitalize on the higher-margin sales and improve overall company EBITDA.

  3. Regional Audit: Conduct a logistics review in the Central Region to investigate why shipping costs are eroding profits more aggressively than in the East or West regions.


Technical Challenges & Solutions

  • Challenge: The raw data did not have a "Previous Year" column for comparison.

  • Solution: I authored a Time Intelligence measure using CALCULATE and SAMEPERIODLASTYEAR. I ensured the measure was "safe" by wrapping it in an IF(HASONEVALUE()) check to prevent confusing totals at the grand-total level.
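As a rough pandas rendering of what that measure computes (column names are assumptions; the production version is pure DAX):

    import pandas as pd

    def sales_vs_prior_year(orders: pd.DataFrame) -> pd.DataFrame:
        # Sum sales by (year, month), then align each month with the same
        # month one year earlier -- the SAMEPERIODLASTYEAR comparison.
        monthly = (orders.assign(Year=orders["OrderDate"].dt.year,
                                 Month=orders["OrderDate"].dt.month)
                         .groupby(["Year", "Month"], as_index=False)["Sales"].sum())
        prior = monthly.assign(Year=monthly["Year"] + 1)  # re-label as "next year"
        merged = monthly.merge(prior, on=["Year", "Month"],
                               suffixes=("", "_PY"), how="left")
        merged["Sales Δ vs PY"] = merged["Sales"] - merged["Sales_PY"]
        return merged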


Retail Sales and Profitability Dashboard.png

Telecom

Project 2: Call Center Performance Dashboard

Executive Summary

This project analyzes over 5,000 service interactions to optimize agent performance and customer satisfaction. By visualizing call distributions across topics and timeframes, I identified that while the center maintains a healthy volume, there are significant variances in resolution quality between different agents and topics. The resulting Tableau dashboard—featuring dedicated analysis for Average Handling Time (AHT), CSAT%, and First Call Resolution (FCR)—provides a diagnostic tool for supervisors to balance operational speed with high-quality customer outcomes.


The Business Challenge

The organization lacked a granular view of agent-level performance. They were unable to see if faster answer times were actually leading to better customer sentiment or if "Speed of Answer" was being prioritized at the expense of issue resolution. The goal was to move beyond simple volume metrics to identify "Resolution Gaps" and highlight top-performing agents to establish a "Best Practice" baseline for the team.


Data Architecture & Modeling

  • Schema: Utilized a flat-file structure optimized within Tableau for the 5,000 interaction records.

  • Time-String Transformation: The raw AvgTalkDuration field was stored as a string, so I authored a calculated field that parses its minutes and seconds into a numeric duration (see the sketch after this list).

  • Fixed Level of Detail (LOD) Coloring: Set up three color-coding buckets, one per visual rule: the Call Color Bucket, the AHT Color Bucket, and the CSAT Color Bucket.
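The parsing itself lives in a Tableau calculated field; as an illustration of the transformation, the same fix in pandas might look like this (assuming an "HH:MM:SS" string format):

    import pandas as pd

    calls = pd.DataFrame({"AvgTalkDuration": ["00:03:45", "00:02:10", None]})
    # Convert the duration string to numeric seconds so it can be averaged.
    calls["TalkSeconds"] = pd.to_timedelta(calls["AvgTalkDuration"]).dt.total_seconds()
    aht_seconds = calls["TalkSeconds"].mean()   # Average Handling Time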


Data Analysis (Technical Execution)

  • CSAT Calculation: Defined the CSAT% metric using a specific threshold logic: SUM(IF [Satisfaction rating] >= 4 THEN 1 ELSE 0 END) / COUNT([Satisfaction rating]). This isolates "Success" as ratings of 4 or 5 stars (sketched after this list).

  • AHT Benchmarking: Built a custom color-coding system (AHT Color Bucket) that flags calls exceeding 230 seconds as "High" (Red) and those below 225 seconds as "Low" (Green) to visualize efficiency.
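For illustration, both rules can be sketched in pandas (the dashboard uses Tableau calculated fields; the middle "Amber" bucket label and the sample rows are assumptions):

    import numpy as np
    import pandas as pd

    calls = pd.DataFrame({"SatisfactionRating": [5, 3, 4, 1],
                          "TalkSeconds": [240.0, 210.0, 228.0, 260.0]})

    # CSAT%: share of calls rated 4 or 5 stars.
    csat_pct = (calls["SatisfactionRating"] >= 4).mean() * 100

    # AHT Color Bucket: Red above 230s, Green below 225s, Amber in between.
    calls["AHTBucket"] = np.select(
        [calls["TalkSeconds"] > 230, calls["TalkSeconds"] < 225],
        ["High (Red)", "Low (Green)"],
        default="Mid (Amber)")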


Deep Dive Insights

  • The "Quality over Speed" Paradox: Data analysis of individual agents (such as Martha and Dan) reveals that agents with slightly higher Average Handling Times often maintain the highest Overall CSAT%, suggesting that rushing calls negatively impacts sentiment.

  • FCR Performance: The center achieves a 71.2% Overall FCR (First Call Resolution). However, the remaining unresolved calls (roughly 29%) are highly correlated with specific friction-heavy topics, such as Technical Support.

  • Volume vs. Sentiment: Using the Call Color Bucket, I discovered that "High Volume" agents do not necessarily have lower satisfaction scores, indicating that experience and efficiency can coexist.


Strategic Recommendations

  • Targeted Coaching: Utilize the "Bottom 3" agent flags in the AHT and CSAT% worksheets to provide specific training for agents currently in the "High" AHT bucket (above 230s).

  • Resolution-First Incentives: Transition agent KPIs from "Speed of Answer" to Overall FCR%, as the data shows that customers prioritize a resolved issue over a shorter wait time.

  • Technical Knowledge Base: Given the friction in specific topics, I recommend a specialized routing system for high-complexity queries to be handled by agents with a proven high CSAT% in those categories.


Technical Challenges & Solutions

  • Challenge: Aggregating Duration Strings. Tableau cannot natively average "HH:MM:SS" formats.

    • Solution: I utilized a combination of STR and INT functions to create a numeric Talk Duration (Seconds) measure, which served as the foundation for all time-based KPIs.

  • Challenge: Visualizing Mixed Performance Tiers. It was difficult to see which agents were over-performing vs. under-performing at a glance.

    • Solution: I engineered Color Buckets (CSAT and AHT) using nested IF-ELSEIF logic, which dynamically changes the color of bars and charts to highlight performance risks instantly.

Call Center Performance Dashboard.png

Human Resources

Project 3: HR Analytics - Employee Attrition Analysis Project

Executive Summary

This project identifies the key drivers behind employee turnover within a global organization. By analyzing demographic data, performance ratings, and tenure, I developed a predictive-style dashboard that highlights "At-Risk" departments. The analysis uncovered that attrition is not merely a compensation issue but is heavily influenced by overtime frequency and distance from the office, providing HR leaders with a data-backed roadmap for retention.


The Business Challenge

The HR department was experiencing an unexplained spike in turnover among mid-level engineers. Traditional exit interviews were providing anecdotal evidence, but leadership lacked a quantitative view of the "Attrition Profile." The challenge was to integrate disparate employee data points into a single "Risk Scorecard" to identify attrition patterns before they impact the company's bottom line.


Data Architecture & Modeling

  • Data Source: Integrated employee records, including sentiment surveys, payroll data, and historical performance reviews.

  • Schema: Implemented a Star Schema with a centralized Fact_Attrition table. Dimension tables include Dim_Employee, Dim_Date, and Dim_Department to allow for seamless cross-filtering.

  • Security: Implemented Row-Level Security (RLS) logic to ensure that Department Heads can only see data for their specific teams, protecting sensitive salary information.


Data Analysis (Technical Execution)

  • Attrition Rate Calculation: Developed a dynamic DAX measure to calculate the monthly attrition rate: (Count of Terminations / Average Headcount); see the sketch after this list.

  • Overtime Correlation: Used a Scatter Plot with a Trend Line to visualize the direct linear relationship between weekly overtime hours and the probability of resignation.

  • Demographic Binning: Created "Age Groups" and "Tenure Buckets" using DAX SWITCH functions to identify if turnover was concentrated among "New Hires" (0-2 years) or "Vested Employees" (5+ years).
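A compact pandas sketch of the rate formula and the bucket logic (the middle tenure band is an assumption; only the 0-2 and 5+ groups are named above):

    import pandas as pd

    def monthly_attrition_rate(terminations: int, headcount_start: int,
                               headcount_end: int) -> float:
        # Count of Terminations / Average Headcount for the month.
        return terminations / ((headcount_start + headcount_end) / 2)

    def tenure_bucket(years: float) -> str:
        # Mirrors the DAX SWITCH binning into named tenure groups.
        if years < 2:
            return "New Hire (0-2 yrs)"
        elif years < 5:
            return "Established (2-5 yrs)"   # middle band assumed
        return "Vested (5+ yrs)"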


Deep Dive Insights

  • The Overtime Trigger: Employees working more than 15% overtime are 3.5x more likely to leave than those with standard hours, regardless of their performance rating.

  • The 'Distance' Factor: Attrition rates significantly increase for employees living more than 20 miles from the office, suggesting that "Commute Fatigue" is a silent driver of turnover.

  • Departmental Hotspots: The Sales department showed a 22% higher attrition rate compared to R&D, linked specifically to lower "Job Involvement" scores in annual engagement surveys.


Strategic Recommendations

  • Hybrid Work Pilot: Implement a flexible work-from-home policy specifically for employees in the "High Commute" bracket (20+ miles) to reduce burnout.

  • Overtime Audit: HR should mandate a departmental audit whenever a team’s average overtime exceeds 10% for two consecutive quarters to prevent "mass exodus" events.

  • Stay Interviews: Launch "Stay Interviews" for high-performing employees in the 1-3 year tenure bracket, as this was identified as the highest-risk period for voluntary resignation.


Technical Challenges & Solutions

  • Challenge: Calculating "Active Headcount" at any point in time is difficult because it requires comparing an employee's hire date and term date against a moving calendar.

  • Solution: I authored a "Snapshot" DAX measure using CALCULATE and FILTER to count employees where the Hire Date <= Max Date and the Term Date >= Max Date (or is null). This allowed for accurate month-over-month headcount reporting (a pandas sketch of this check follows the list).

  • Challenge: The dataset had highly imbalanced classes (fewer "Left" than "Stayed").

  • Solution: I focused the visuals on Percentages and Rates rather than raw counts. This ensured that a small department with 2 exits was flagged as higher risk than a large department with 4 exits.
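The headcount snapshot reduces to a simple date predicate; in pandas terms (column names illustrative):

    import pandas as pd

    def active_headcount(employees: pd.DataFrame, as_of: pd.Timestamp) -> int:
        # Active = hired on or before the date, and not yet terminated.
        hired = employees["HireDate"] <= as_of
        still_employed = employees["TermDate"].isna() | (employees["TermDate"] >= as_of)
        return int((hired & still_employed).sum())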

HR Analytics - Employee Attrition Analysis Project.png

E-Commerce

Project 4: E-Commerce Monthly Sales vs Forecasts

Executive Summary

This project analyzes over 2,800 global sales transactions to identify revenue drivers and market opportunities. By combining Python’s computational power with Tableau’s visualization capabilities, I transformed raw CSV data into a strategic tool that identifies Classic Cars as the core revenue pillar (contributing nearly $4M) and the USA as the dominant market. The analysis uncovers seasonal sales spikes and provides a granular view of deal sizes to optimize inventory and marketing spend.


The Business Challenge

The organization lacked a unified view of its global sales performance across multiple product lines and territories. Key stakeholders struggled to identify:

  • Which product lines were underperforming relative to their market potential.

  • The geographic distribution of "Large" versus "Small" deal sizes.

  • Seasonal trends that affect shipping and logistics planning.

The goal was to build a robust analysis pipeline that could handle messy CSV data and output an interactive dashboard for regional managers.


Data Architecture & Modeling

  • Data Source: sales_data_sample.csv (Transactions from 2003–2005).

  • Python Preprocessing: Used Python (Pandas) to handle encoding issues (ISO-8859-1), convert string-based dates into standard datetime objects, and manage null values in geographic fields like STATE and TERRITORY (condensed in the sketch after this list).

  • Dimensionality: I structured the data to support a Star Schema approach within Tableau, ensuring that PRODUCTLINE, COUNTRY, and YEAR_ID acted as the primary dimensions for filtering the SALES fact.
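A condensed sketch of that preprocessing step (the real script handles more edge cases; the fill values below are assumptions):

    import pandas as pd

    df = pd.read_csv("sales_data_sample.csv", encoding="ISO-8859-1")
    df["ORDERDATE"] = pd.to_datetime(df["ORDERDATE"])   # string -> datetime
    df["STATE"] = df["STATE"].fillna("N/A")             # missing on many intl orders
    df["TERRITORY"] = df["TERRITORY"].fillna("NA")      # nulls appear to be N. America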


Data Analysis (Technical Execution)

  • Python Aggregation: Used Python to calculate descriptive statistics, revealing an average order value of $3,553 alongside a large standard deviation, indicating wide variation in transaction size.

  • Tableau Calculations: Created custom "Year-over-Year" growth measures and an "Average Days to Ship" metric (where shipping data was present) to evaluate operational speed.

  • Customer Segmentation: Applied binning techniques to group customers by total spend, distinguishing "Wholesale" giants from "Boutique" retailers.
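As an illustration, the segmentation can be expressed as a quantile cut on total customer spend (the cut points and labels below are assumptions, not the production rules):

    import pandas as pd

    def segment_customers(df: pd.DataFrame) -> pd.Series:
        spend = df.groupby("CUSTOMERNAME")["SALES"].sum()
        # Bottom half = Boutique retailers, top decile = Wholesale giants.
        return pd.qcut(spend, q=[0, 0.5, 0.9, 1.0],
                       labels=["Boutique", "Mid-Market", "Wholesale"])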


Deep Dive Insights

  • The "Classic" Dominance: The Classic Cars line is the clear market leader, generating over $3.9M in revenue—more than double the next category (Vintage Cars at $1.9M).

  • Geographic Concentration: The USA remains the primary revenue driver ($3.6M), followed by a strong European presence in Spain ($1.2M) and France ($1.1M).

  • Deal Size Strategy: While "Small" deals are higher in frequency, "Large" deals (> $7,000) contribute a disproportionate amount of profit margin, suggesting a need for a dedicated "Key Account Management" strategy.

  • The "Trains" Opportunity: The Trains product line is currently the lowest performer ($226K). Analysis suggests this isn't a lack of demand, but a lack of regional marketing focus outside of North America.


Strategic Recommendations

  • Product Pivot: Re-allocate 15% of the marketing budget from the stagnant "Trains" category into high-growth "Classic Cars" and "Motorcycles" to capitalize on existing momentum.

  • EMEA Expansion: Spain and France show higher-than-average "Medium" deal sizes. I recommend localized sales campaigns in these regions to convert "Medium" accounts into "Large" wholesale partners.

  • Inventory Optimization: Since sales peak in the fourth quarter (Q4) consistently, inventory for high-demand product codes (e.g., S10_1678) should be bolstered by 20% by late August.


Technical Challenges & Solutions

  • Challenge: The raw CSV file had encoding issues that prevented standard loading.

  • Solution: Used Python’s pd.read_csv(encoding='ISO-8859-1') to successfully ingest the data without character loss.

  • Challenge: Geographic fields (STATE, POSTALCODE) had significant missing data for international orders.

  • Solution: Within Tableau, I used a hierarchy of COUNTRY > CITY for mapping, bypassing the incomplete STATE field to maintain a professional, error-free geographic visualization.

E-Commerce Monthly Sales vs Forecasts.png

A.I.

Project 5: AI-Driven Retail Insights Dashboard (Power BI + ChatGPT)

Executive Summary

This project bridges the gap between raw data and executive action by integrating Power BI with the OpenAI (ChatGPT) API. While traditional dashboards require manual interpretation, this solution uses Python-driven AI to generate automated "Executive Summaries" and sentiment-based strategy pivots. By analyzing sales performance alongside customer feedback, the system identifies not just what is happening, but why it is happening, providing a significant competitive advantage in decision-making speed.


The Business Challenge

Retail executives often suffer from "Dashboard Fatigue"—they have access to plenty of charts but lack the time to synthesize them into a coherent strategy. The challenge was to create a "Self-Explaining Dashboard" that could:

  1. Identify hidden correlations between customer sentiment and sales drops.

  2. Provide natural language summaries of complex monthly performance.

  3. Automate the diagnostic process that usually takes analysts hours to prepare.


Data Architecture & Modeling

  • Hybrid Schema: I utilized a robust Star Schema as the foundation, connecting Fact tables (Sales, Returns) to Dimensions (Product, Geography, Date).

  • AI Transformation Layer: I engineered a specific "Insight Table" using Python scripts within Power Query. This layer batches data and sends it to the GPT-4 model, returning structured text responses that are stored directly in the model for instant loading.

  • Geospatial Integration: Used a custom TopoJSON map of US States to allow the AI to correlate regional cultural sentiment with local product preferences.


Data Analysis (Technical Execution)

  • Natural Language Generation (NLG): Authored DAX measures that concatenate top-performing metrics into a single string, which is then passed to the AI to "write" the daily briefing.

  • Sentiment Scoring: Implemented a Python-based sentiment analysis script that bins customer feedback into Positive, Neutral, and Negative categories, which are then cross-referenced with "Return Rates" (the binning step is sketched after this list).

  • Advanced Visualizations: Leveraged the "AI Decomposition Tree" and "Key Influencers" visuals to allow the AI to determine the primary drivers of profit loss automatically.
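The binning step reduces to simple thresholding once feedback has been scored; a minimal sketch, assuming a polarity score in [-1, 1] has already been produced by the scoring model:

    import pandas as pd

    def sentiment_bucket(polarity: float) -> str:
        # Cutoffs are illustrative assumptions, not the production values.
        if polarity >= 0.2:
            return "Positive"
        if polarity <= -0.2:
            return "Negative"
        return "Neutral"

    feedback = pd.DataFrame({"polarity": [0.8, -0.5, 0.05]})
    feedback["sentiment"] = feedback["polarity"].apply(sentiment_bucket)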


Deep Dive Insights

  • The "Silent Warning": The AI detected a pattern where "Negative Shipping Sentiment" in the Midwest was a leading indicator of a 15% sales decline in the following month—allowing the logistics team to intervene before the revenue hit.

  • Product-Sentiment Correlation: While 'Technology' had the highest sales, the AI flagged that 'Office Supplies' had the highest "Sentiment-per-Dollar," indicating a loyal customer base that would be receptive to a price increase without churn.

  • Automated Root-Cause: The dashboard successfully identified that a spike in returns for 'Furniture' was specifically tied to an "Assembly Instructions" complaint found in the text data, rather than product quality.


Strategic Recommendations

  • Automated Pivot: Based on AI sentiment analysis, the business should re-allocate 10% of the marketing budget from "Underperforming/Low Sentiment" regions to "High Sentiment" hubs to maximize Return on Ad Spend (ROAS).

  • Operational Feedback Loop: Feed the AI-generated "Customer Pain Points" directly to the Product Development team to prioritize the 2025 product roadmap based on actual user feedback.

  • Proactive Customer Recovery: Identify customers in the "Negative Sentiment/High Value" quadrant for automated outreach and discount offers before they officially churn.


Technical Challenges & Solutions

  • Challenge: API Latency & Costs. Calling ChatGPT for every row in a 50,000-row dataset would be too slow and expensive.

  • Solution: I developed a Summary Batching technique. I used Python to aggregate the data by "Category" and "Region" before sending it to the API. This reduced the API calls from thousands to just twelve, maintaining high-speed dashboard performance while keeping costs near zero (see the sketch after this list).

  • Challenge: Data Security. Ensuring sensitive customer information isn't passed to a public AI model.

  • Solution: I implemented a Data Masking script in Python that stripped PII (Personally Identifiable Information) and anonymized Customer IDs before the data left the Power BI environment for processing.
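A minimal sketch of the combined masking-and-batching flow (column names are assumptions, and send_to_llm is a hypothetical stand-in for the actual API call):

    import hashlib
    import pandas as pd

    def mask_pii(df: pd.DataFrame) -> pd.DataFrame:
        # Drop direct identifiers and anonymize the customer key before
        # anything leaves the Power BI environment.
        out = df.drop(columns=["CustomerName", "Email"], errors="ignore")
        out["CustomerID"] = out["CustomerID"].astype(str).map(
            lambda cid: hashlib.sha256(cid.encode()).hexdigest()[:12])
        return out

    def build_batch_prompts(df: pd.DataFrame) -> list:
        # One prompt per Category x Region group instead of one per row.
        grouped = df.groupby(["Category", "Region"], as_index=False).agg(
            sales=("Sales", "sum"), returns=("Returns", "sum"))
        return [f"Summarize: {row.Category} in {row.Region} sold "
                f"${row.sales:,.0f} with {row.returns} returns."
                for row in grouped.itertuples()]

    # for prompt in build_batch_prompts(mask_pii(raw)): send_to_llm(prompt)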

AI-Driven Retail Insights Dashboard (Power BI + ChatGPT).png

Project 6: Olist Customer Insights Dashboard (AI-Enhanced)

Executive Summary

This project provides a 360-degree diagnostic of the Olist marketplace, synthesizing data from 2016 to 2018. By integrating eight disparate datasets, I identified that while the platform maintains a healthy $13.6M in revenue and a 4.09/5.0 average review score, there is a significant geographical "delivery tax" affecting the North and Northeast regions. The analysis reveals that delivery delays are the single largest driver of negative sentiment, with late orders receiving a 63% lower satisfaction rating than on-time deliveries.


The Business Challenge

Olist acts as a bridge between small Brazilian sellers and large marketplaces. The primary challenge was Logistics Transparency. Management lacked a way to visualize how the vast geography of Brazil impacted the "Estimated vs. Actual" delivery window. They needed to answer:

  1. Which product categories are most susceptible to shipping delays?

  2. How much does a one-day delay in delivery impact the final Review Score?

  3. Where should Olist recruit new sellers to minimize "long-haul" shipping costs?


Data Architecture & Modeling

  • Relational Schema: I constructed a comprehensive Star Schema by joining eight CSV files. I used the Orders table as the central fact, linked to Customers, Sellers, Order_Items, Payments, and Reviews.

  • Transformation: I utilized a mapping file to translate Portuguese category names into English, ensuring the dashboard was business-ready for international stakeholders.

  • Geospatial Join: I linked the Geolocation dataset to both Sellers and Customers by zip-code prefix to calculate "As-the-Crow-Flies" shipping distances.
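The distance itself comes from the haversine formula; a NumPy sketch, with illustrative column names for the joined coordinates:

    import numpy as np

    def haversine_km(lat1, lng1, lat2, lng2):
        # Great-circle ("as-the-crow-flies") distance in kilometres.
        lat1, lng1, lat2, lng2 = map(np.radians, [lat1, lng1, lat2, lng2])
        a = (np.sin((lat2 - lat1) / 2) ** 2
             + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
        return 6371 * 2 * np.arcsin(np.sqrt(a))   # Earth radius ~= 6371 km

    # After joining customer and seller coordinates onto each order:
    # orders["ship_km"] = haversine_km(orders["cust_lat"], orders["cust_lng"],
    #                                  orders["sell_lat"], orders["sell_lng"])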


Data Analysis (Technical Execution)

  • Logistics KPI Engineering: Developed a calculated field for "Delivery Delta" (Actual Delivery Date - Estimated Delivery Date) to categorize orders as Early, On-Time, or Late (sketched below the list).

  • Sentiment Correlation: Built a dual-axis visualization in Tableau to track the moving average of Review Scores against Delivery Time in days.

  • Payment Mix Analysis: Created a breakdown of payment types, discovering that Credit Cards drive 75% of revenue, while Boleto (bank slips) introduces an average 2-day delay in "Order Approval" time.
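The Delivery Delta field reduces to a date subtraction plus a three-way label; sketched in pandas using the public Olist column names (dates assumed already parsed to datetimes):

    import numpy as np
    import pandas as pd

    def add_delivery_delta(orders: pd.DataFrame) -> pd.DataFrame:
        delta = (orders["order_delivered_customer_date"]
                 - orders["order_estimated_delivery_date"]).dt.days
        orders["delivery_delta_days"] = delta
        # Negative = early, zero = on the estimated day, positive = late.
        orders["delivery_status"] = np.select(
            [delta < 0, delta == 0], ["Early", "On-Time"], default="Late")
        return orders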


Deep Dive Insights

  • The Revenue Pillars: Health & Beauty and Watches & Gifts are the top-grossing categories, together exceeding $2.4M in sales.

  • The Satisfaction Floor: Analysis shows a "Critical Threshold" at Day 14. If an order takes longer than 14 days to arrive, the probability of a 1-star review increases by 400%, regardless of product quality.

  • Geographic Bottlenecks: The Southeast (São Paulo) is the most efficient hub, but shipments to the North region (Amazonas) have an average lead time of 20+ days, leading to the highest churn rates in the dataset.


Strategic Recommendations

  • Regional Seller Onboarding: Olist should launch a recruitment campaign for sellers in the Northeast (Bahia/Pernambuco) to serve local customers, reducing the reliance on cross-country shipping from the South.

  • Dynamic "Estimated Delivery" Dates: Adjust the estimation algorithm for the North region to be more conservative. The data shows customers are happier with a long estimate that is met, rather than a short estimate that is missed.

  • Boleto Incentives: Offer a small discount for Credit Card payments to phase out Boleto use, which would shave an average of 48 hours off the total fulfillment cycle.


Technical Challenges & Solutions

  • Challenge: The Geolocation dataset contained over 1 million rows, which caused severe performance degradation and "spinning wheels" in Tableau.

  • Solution: I performed Data Aggregation at the zip-code prefix level during the ETL phase. This reduced the coordinate points from 1,000,000 to roughly 19,000, maintaining geographic integrity while increasing dashboard responsiveness by 85% (see the sketch after this list).

  • Challenge: Duplicate Order_ID entries in the Items table (where one order has multiple products) were causing "Double Counting" of total revenue.

  • Solution: I utilized LOD Expressions (Level of Detail) in Tableau—specifically {FIXED [Order Id] : MIN([Payment Value])}—to ensure that the financial metrics remained accurate regardless of how many items were in a single basket.
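Both fixes have straightforward pandas analogues, shown here for illustration (the production dedup is the Tableau LOD above; column names follow the public Olist schema):

    import pandas as pd

    def aggregate_geolocation(geo: pd.DataFrame) -> pd.DataFrame:
        # Collapse ~1M coordinate rows to one averaged point per zip prefix.
        return geo.groupby("geolocation_zip_code_prefix", as_index=False).agg(
            lat=("geolocation_lat", "mean"), lng=("geolocation_lng", "mean"))

    def order_revenue(items_with_payments: pd.DataFrame) -> float:
        # One payment value per order_id, mirroring {FIXED [Order Id] : MIN(...)},
        # so multi-item baskets are not double counted.
        return items_with_payments.groupby("order_id")["payment_value"].min().sum()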

AI-Driven Delivery & Customer Experience Dashboard.png
