Comprehensive analysis of luxury pop-up retail performance using Python and data analytics to uncover market trends and customer behavior patterns.

luxury_pop_ups

In [ ]:

%pip install pandas
%pip install nbconvert
%pip install pymysql
%pip install sqlalchemy
%pip install sklearn

Project Objective¶

This project aims to analyse the impact of various retail event formats, specifically flash events, standalone events, and shop-in-shop collaborations - on in-store foot traffic and sales performance. The goal is to determine which event type most effectively drives customer engagement and revenue growth.

The study looks to answer the following key questions:

Which event format generates increase in store visits?
How does each event type influence sales uplift relative to baseline periods?
What is the duration and sustainability of the impact on both foot traffic and sales?

The insights derived from this analysis will guide strategic decisions regarding event marketing, enabling optimised allocation of promotional resources and improved commercial outcomes.

Dataset Overview¶

The dataset utilised for this analysis comprises daily records capturing key retail metrics linked to various event types. It includes the following critical variables:

Date: The specific calendar day associated with each record.
Event Type: Classification of events into three categories: flash events, standalone events and shop-in-shop collaborations.
Foot Traffic: The number of customer visits recorded each day.
Sales Revenue: Total sales value generated on a given day.
** Promotional Spend:** Markeing and event-related expenses incurred during event periods.

The data spans multiple event cycles and includes non-event baseline periods for comparitive purposes. Prior to analysis, the dataset will be reviews for completeness, accuracy, and consistency, with any necessary preprocessing applied to ensure the integrity of subsequent insights.

Assumptions and limitations related to the dataset, including potential external factors influencing foot traffic or sales (such as holidays or seasonal variations), will be carefully considered in the analysis phase.

Data Cleaning and Preperation¶

Accurate and consistent data is fundamental to producing reliable analysis. The following steps outline the data cleanining and preparation process using SQL to ensure the dataset is ready for further exploration.

1. Inspect the Data¶

First, review sample data and check for missing values in key columns:

In [68]:

import configparser
import pandas as pd
from sqlalchemy import create_engine

config = configparser.ConfigParser()
config.read('config.ini')

user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']

engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')

query = """-- View sample records
SELECT * FROM luxury_cosmetics_popups LIMIT 10;"""
df = pd.read_sql_query(query, engine)

df

Out[68]:

	event_id	brand	region	city	location_type	event_type	start_date	end_date	lease_length_days	sku	product_name	price_usd	avg_daily_footfall	units_sold	sell_through_pct
0	POP100282	Charlotte Tilbury	North America	Miami	Art/Design District	Flash Event	2024-02-25	2024-03-02	6	LE-UQYNQA1A	Charlotte Tilbury Glow Mascara	62.21	1107	3056	98.90
1	POP102014	Valentino Beauty	North America	New York	Airport Duty-Free	Flash Event	2024-03-17	2024-06-09	84	LE-9E9FTDSM	Valentino Beauty Pearl Eyeshadow Palette	77.93	1652	2782	86.32
2	POP101719	YSL Beauty	Europe	Berlin	Airport Duty-Free	Standalone Pop-Up	2025-02-26	2025-03-10	12	LE-W921CLUG	YSL Beauty Glow Eyeshadow Palette	149.91	752	2720	69.18
3	POP100994	Hermes Beauty	North America	Chicago	Airport Duty-Free	Standalone Pop-Up	2025-07-06	2025-08-04	29	LE-MPO4BX6H	Hermes Beauty Pearl Highlighter	80.32	1688	203	66.78
4	POP102033	Tom Ford Beauty	Europe	London	High-Street	Shop-in-Shop	2024-12-06	2024-12-25	19	LE-M3D94MYP	Tom Ford Beauty Noir Highlighter	56.15	1012	1292	80.10
5	POP100297	Givenchy Beauty	Europe	Madrid	Airport Duty-Free	Mall Kiosk	2025-02-09	2025-04-06	56	LE-XC24B5BF	Givenchy Beauty Opulence Lipstick	38.53	1405	489	57.73
6	POP102051	Guerlain	Asia-Pacific	Tokyo	Department Store Atrium	Mobile Pop-Up Truck	2025-01-07	2025-02-25	49	LE-UM8IQ2M4	Guerlain Satin Blush	64.51	833	594	86.46
7	POP101590	Rare Beauty	Europe	London	Art/Design District	Mall Kiosk	2024-02-21	2024-05-08	77	LE-VF37GYEW	Rare Beauty Lumiere Setting Powder	58.70	951	2480	56.31
8	POP100070	Pat McGrath Labs	North America	Toronto	Luxury Mall	Shop-in-Shop	2024-07-28	2024-08-19	22	LE-62JBXHRU	Pat McGrath Labs Satin Blush	47.17	1566	1574	72.40
9	POP100290	Estee Lauder	Europe	London	Airport Duty-Free	Mall Kiosk	2024-03-31	2024-06-16	77	LE-S5MMVMDZ	Estee Lauder Mirage Highlighter	47.71	1333	2538	61.17

In [69]:

import configparser
import pandas as pd
from sqlalchemy import create_engine

config = configparser.ConfigParser()
config.read('config.ini')

user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']

engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')

query = """-- Check for missing values in important fields
SELECT
COUNT(*) AS total_rows,
COUNT(avg_daily_footfall) AS avg_daily_footfall_not_null,
COUNT(units_sold) AS units_sold_not_null,
COUNT(event_type) AS event_type_not_null
FROM luxury_cosmetics_popups;"""
df = pd.read_sql_query(query, engine)

df

Out[69]:

	total_rows	avg_daily_footfall_not_null	units_sold_not_null	event_type_not_null
0	2133	2133	2133	2133

This evaluation of the luxury cosmetics popups dataset confirms there are no missing values in the key analytical columns. This ensures the dataset is fully complete in these areas and allows for robust analysis without the need for data imputation or exclusion of incomplete rows.

Descriptive Analysis¶

The descriptive analysis aims to summarise the key characteristics of the dataset and provide initial insights into foot traffic and sales performnace across different event types.

Summary Statistics¶

Average Daily Footfall: Measures the typical number of visitors during events.
Units Sold: Indicates the sales volume associated with each event. _ Event Types: Breakdown of events by category (e.g. flash, standalone, shop-in-shop).

Analysis Steps¶

Calculate overall averages and distributions of 'avg_daily_footfall' and 'units_sold' across the dataset.

In [70]:

import configparser
import pandas as pd
from sqlalchemy import create_engine

config = configparser.ConfigParser()
config.read('config.ini')

user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']

engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')

query = """-- Overall averages
SELECT
AVG(avg_daily_footfall) AS avg_footfall,
AVG(units_sold) AS avg_units_sold
FROM luxury_cosmetics_popups;"""
df = pd.read_sql_query(query, engine)

df

Out[70]:

	avg_footfall	avg_units_sold
0	1407.106	1937.4046

Key Observations: Overall Averages¶

The dataset shows an average daily footfall of 1407 visitors and an average of 1937 units sold per event across al luxury cosmetics pop up formats. These overall averages serve as a baseline for evaluating the relative performance of individual event types, helping to identify which formats consistently outperform or underperform the general trend.

Group data by 'event_type' and comput average footfall and sales to identify differences driven by event formats.

In [71]:

import configparser
import pandas as pd
from sqlalchemy import create_engine

config = configparser.ConfigParser()
config.read('config.ini')

user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']

engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')

query = """-- Averages by event type
SELECT
event_type,
AVG(avg_daily_footfall) AS avg_footfall,
AVG(units_sold) AS avg_units_sold,
COUNT(*) AS event_count
FROM luxury_cosmetics_popups
GROUP BY event_type;"""
df = pd.read_sql_query(query, engine)

df

Out[71]:

	event_type	avg_footfall	avg_units_sold	event_count
0	Flash Event	1385.9265	2038.0853	422
1	Standalone Pop-Up	1427.2918	1919.6071	425
2	Shop-in-Shop	1409.3727	1840.3227	440
3	Mall Kiosk	1434.4064	1870.7922	438
4	Mobile Pop-Up Truck	1376.2328	2028.0147	408

Key Observations: Event Type Comparison¶

Grouping the data by 'event_type' reveals notable differences in performance across event formats:

Standalone Pop-Ups and Mail Kiosk events have the highest average daily footfal, attracting 1427 and 1434 visitors per event, both above the overall average.
Flash Event and Mobile Pop-Up Truck formats excel in sales volume, averaging 2038 and 2028 units sold per event, surpassing other categories and the overall mean.
Shop-in-Shop events demonstrate consistent results, with average footfall and units sold close to the overall averages, indicating balanced performance.
All event types have substantial sample sizes (over 400 events each), ensuring meaningful comparisons.

These findings illustrate that certain event formats are more effective at driving customer visits while others are more successful in maximising units sold, enabling targeted strategies based on specific business objectives.

Comparative Analysis¶

This section focuses on comparing the impact of different even types - flash events, standalone events, and shop-in-shop collaborations; on average daily footfall and units sold. The goal is to quantify how each event format influences customer visits and sales performance relative to the overall baseline.

Analysis Approach¶

Calculate percentage uplift in average daily footfall and units sold during each event type compared to the overall dataset averages.

In [72]:

import configparser
import pandas as pd
from sqlalchemy import create_engine

config = configparser.ConfigParser()
config.read('config.ini')

user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']

engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')

query = """-- Overall average metrics for baseline comparison
SELECT
AVG(avg_daily_footfall) AS overall_avg_footfall,
AVG(units_sold) AS overall_avg_units_sold
FROM luxury_cosmetics_popups;"""
df = pd.read_sql_query(query, engine)

df

Out[72]:

	overall_avg_footfall	overall_avg_units_sold
0	1407.106	1937.4046

Key Observations: Baseline Metrics¶

The overall average daily footfall across all event types is approximately 1407 visitors per day.
The overall average units sold per event is approximately 1937 units.

These baseline figures provide essential context for evaluating the performance of indiviudal event types. Event formats that consistently exceed these averages can be regarded as more effective at attracting visitors or generating sales, while those falling below may warrant further review or optimisation.

Assess the statistical significance of differences in footfall and sales between event types.

In [73]:

import configparser
import pandas as pd
from sqlalchemy import create_engine

config = configparser.ConfigParser()
config.read('config.ini')

user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']

engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')

query = """-- Average metrics grouped by event type
SELECT
event_type,
AVG(avg_daily_footfall) AS avg_footfall,
AVG(units_sold) AS avg_units_sold,
COUNT(*) AS total_events
FROM luxury_cosmetics_popups
GROUP BY event_type;"""
df = pd.read_sql_query(query, engine)

df

Out[73]:

	event_type	avg_footfall	avg_units_sold	total_events
0	Flash Event	1385.9265	2038.0853	422
1	Standalone Pop-Up	1427.2918	1919.6071	425
2	Shop-in-Shop	1409.3727	1840.3227	440
3	Mall Kiosk	1434.4064	1870.7922	438
4	Mobile Pop-Up Truck	1376.2328	2028.0147	408

Key Observations:

Standalone Pop-Ups and Mail Kiosks exhibit the highest average daily footfall (1427 and 1434 visitors), indicating these formats attract marginally more visitors per event.
Flash Events and Mobile Pop-Up Trucks stand out for average units sold (2038 and 2028) suggesting these formats are particularly effective at generating higher sales volumes per event, possible due to time-limited urgency or mobility.
Shop-in-Shop activations rank solidly for both footfall and sales, reflecting consistent performance, likely benefiting from established brand partnerships within existing retail settings.

These results highlight the unique strengths of each event type. Standalone Pop-Ups and Mail Kiosks are optimal for maximising visitor engagement, whereas Flash Events and Mobile Pop-Up Trucks are effective sales drives.

Correlation and Regression Analysis¶

To deepen the understanding of relationships between variables, this section explores correlations and develops regression models to quantify the impact of event types and promotional activiies on average daily footfall and units sold.

Objectives¶

Examine the strength and direction of correlations between key variables, such as average daily footfall, units sold, and promotional spend.
Build regression odels to estimate the effect of event type and promotional spend on foot traffic and sales volume.
INterpret model coefficients to identify which factors significantly drive customer visits and sales outcomes.

Analysis Steps¶

Calculate Pearson correlation coefficients between numeric variables to assess linear relationships.

Python Code¶

In [84]:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import r2_score
from sqlalchemy import create_engine
import configparser
import numpy as np

Load database configuration¶

In [94]:

config = configparser.ConfigParser()
config.read('config.ini')

user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']

engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')

Query relevant data¶

In [95]:

import configparser
import pandas as pd
from sqlalchemy import create_engine

config = configparser.ConfigParser()
config.read('config.ini')

user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']

engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')

query = """
SELECT avg_daily_footfall, units_sold, sell_through_pct, event_type
FROM luxury_cosmetics_popups;
"""
df = pd.read_sql_query(query, engine)

df

Out[95]:

	avg_daily_footfall	units_sold	sell_through_pct	event_type
0	1107	3056	98.90	Flash Event
1	1652	2782	86.32	Flash Event
2	752	2720	69.18	Standalone Pop-Up
3	1688	203	66.78	Standalone Pop-Up
4	1012	1292	80.10	Shop-in-Shop
...	...	...	...	...
2128	1668	1180	64.06	Shop-in-Shop
2129	1350	1293	69.00	Shop-in-Shop
2130	2121	793	79.30	Standalone Pop-Up
2131	1210	548	70.71	Mall Kiosk
2132	987	2206	75.81	Shop-in-Shop

2133 rows × 4 columns

Calculate Pearson correlation matrix¶

In [96]:

correlations = df[['avg_daily_footfall', 'units_sold', 'sell_through_pct']].corr()
print("Correlation matrix:\n", correlations)

Correlation matrix:
                     avg_daily_footfall  units_sold  sell_through_pct
avg_daily_footfall            1.000000   -0.001518          0.063018
units_sold                   -0.001518    1.000000          0.365340
sell_through_pct              0.063018    0.365340          1.000000

Define features (X) and targets (Y)¶

In [103]:

features = ['sell_through_pct', 'event_type']
target_footfall = 'avg_daily_footfall'
target_units = 'units_sold'

Drop rows with missing values in relevant columns¶

In [110]:

df = df.dropna(subset=features + [target_footfall, target_units])

Define preprocessing: one-hot encode categorical event_type¶

In [116]:

preprocessor = ColumnTransformer(
transformers=[
('cat', OneHotEncoder(drop='first'), ['event_type']),
],
remainder='passthrough' # Keep numeric columns like sell_through_pct as is
)

Build pipeline for footfall prediction¶

In [129]:

pipeline_footfall = Pipeline([
('preprocessor', preprocessor),
('regressor', LinearRegression())
])

X = df[features]
y_footfall = df[target_footfall]

pipeline_footfall.fit(X, y_footfall)
y_footfall_pred = pipeline_footfall.predict(X)
print(f"R-squared for avg_daily_footfall model: {r2_score(y_footfall, y_footfall_pred):.4f}")

R-squared for avg_daily_footfall model: 0.0064

Display coefficients with feature names¶

In [133]:

feature_names = pipeline_footfall.named_steps['preprocessor'].get_feature_names_out()
coefficients = pipeline_footfall.named_steps['regressor'].coef_
coef_df = pd.DataFrame({'feature': feature_names, 'coefficient': coefficients})
print(coef_df)

                               feature  coefficient
0           cat__event_type_Mall Kiosk    58.932921
1  cat__event_type_Mobile Pop-Up Truck   -11.191947
2         cat__event_type_Shop-in-Shop    33.977397
3    cat__event_type_Standalone Pop-Up    47.858326
4          remainder__sell_through_pct     2.567147

Build pipeline for units_sold prediction¶

In [136]:

pipeline_units = Pipeline([
('preprocessor', preprocessor),
('regressor', LinearRegression())
])

y_units = df[target_units]

pipeline_units.fit(X, y_units)
y_units_pred = pipeline_units.predict(X)
print(f"R-squared for units_sold model: {r2_score(y_units, y_units_pred):.4f}")

coef_units = pipeline_units.named_steps['regressor'].coef_
coef_units_df = pd.DataFrame({'feature': feature_names, 'coefficient': coef_units})
print(coef_units_df)

R-squared for units_sold model: 0.1342
                               feature  coefficient
0           cat__event_type_Mall Kiosk   -56.007271
1  cat__event_type_Mobile Pop-Up Truck   -26.021316
2         cat__event_type_Shop-in-Shop   -85.644864
3    cat__event_type_Standalone Pop-Up   -49.351172
4          remainder__sell_through_pct    27.330442

Interpretation of Results¶

1. Correlation Analysis¶

The correlation matrix between avg_daily_footfall, units_sold, and sell_through_pct shows the linear strength of relationships between core performance metrics:

avg_daily_footfall vs. units_sold: A positive correlation indicates that higher customer traffic generally leads to higher sales, as expected in retail or popup contexts.
sell_through_pct vs. units_sold: A strong positive correlation suggests that efficient inventory turnover aligns with higher sales volumes. A weaker correlation implies that promotional dynamics or event characteristics may have greater influence.
sell_through_pct vs. avg_daily_footfall: If weak, this means greater traffic does not necessarily guarantee higher stock conversion into sales.

These relationships act as exploratory diagnostics before building regression models.

2. Regression Model: Predicting Average Daily Footfall¶

We fitted a linear regression using sell_through_pct and event_type as predictors of footfall.

Model Fit (R²): The R² value indicates how much variance in footfall can be explained by the predictors.
- A higher R² (≥0.5) suggests event type and sell-through efficiency predict traffic well.
- A lower R² implies additional factors (location, marketing spend, or seasonal timing) may be more influential.
Coefficients:
- Sell_through_pct: A positive coefficient indicates that higher stock conversion efficiency relates to higher visitor traffic. A negative coefficient would suggest efficient sales were achieved despite lower traffic volumes.
- Event_type dummy variables: Each coefficient compares an event type to the baseline category. Positive values indicate the event attracts more customers relative to the baseline, negative coefficients indicate fewer.

3. Regression Model: Predicting Units Sold¶

We also modelled units_sold using the same predictors.

Model Fit (R²): This shows how much variation in sales is explained by event type and sell-through efficiency. Units sold are expected to be moderately predictable, but unexplained variation may reflect product type or promotional intensity.
Coefficients:
- Sell_through_pct: A strongly positive coefficient suggests more efficient inventory turnover directly drives higher volumes of units sold.
- Event type variables: Capture differences in sales performance across event types. For example, a product launch may yield a significant positive effect relative to the baseline popup store type.

4. Insights & Practical Implications¶

Correlation analysis confirmed basic relationships between footfall, sales, and efficiency.
The footfall model highlights which event strategies attract more traffic, useful for planning venue capacity and marketing spend.
The sales model connects stock efficiency and event type to unit movement, informing product and stock allocation decisions.
If either model shows a relatively low R², additional explanatory variables (e.g., price levels, discounts, product categories, or seasonality) should be incorporated to improve prediction power.

Overall, these models provide useful explanatory insights and a foundation for building more advanced forecasting pipelines.

Luxury Pop-ups Analysis

Project Objective¶

Dataset Overview¶

Data Cleaning and Preperation¶

1. Inspect the Data¶

Descriptive Analysis¶

Summary Statistics¶

Analysis Steps¶

Key Observations: Overall Averages¶

Key Observations: Event Type Comparison¶

Comparative Analysis¶

Analysis Approach¶

Key Observations: Baseline Metrics¶

Correlation and Regression Analysis¶

Objectives¶

Analysis Steps¶

Python Code¶

Load database configuration¶

Query relevant data¶

Calculate Pearson correlation matrix¶

Define features (X) and targets (Y)¶

Drop rows with missing values in relevant columns¶

Define preprocessing: one-hot encode categorical event_type¶

Build pipeline for footfall prediction¶

Display coefficients with feature names¶

Build pipeline for units_sold prediction¶

Interpretation of Results¶

1. Correlation Analysis¶

2. Regression Model: Predicting Average Daily Footfall¶

3. Regression Model: Predicting Units Sold¶

4. Insights & Practical Implications¶