Luxury Pop-ups Analysis
Comprehensive analysis of luxury pop-up retail performance using Python and data analytics to uncover market trends and customer behavior patterns.
%pip install pandas
%pip install nbconvert
%pip install pymysql
%pip install sqlalchemy
%pip install sklearn
Project Objective¶
This project aims to analyse the impact of various retail event formats, specifically flash events, standalone events, and shop-in-shop collaborations - on in-store foot traffic and sales performance. The goal is to determine which event type most effectively drives customer engagement and revenue growth.
The study looks to answer the following key questions:
- Which event format generates increase in store visits?
- How does each event type influence sales uplift relative to baseline periods?
- What is the duration and sustainability of the impact on both foot traffic and sales?
The insights derived from this analysis will guide strategic decisions regarding event marketing, enabling optimised allocation of promotional resources and improved commercial outcomes.
Dataset Overview¶
The dataset utilised for this analysis comprises daily records capturing key retail metrics linked to various event types. It includes the following critical variables:
- Date: The specific calendar day associated with each record.
- Event Type: Classification of events into three categories: flash events, standalone events and shop-in-shop collaborations.
- Foot Traffic: The number of customer visits recorded each day.
- Sales Revenue: Total sales value generated on a given day.
- ** Promotional Spend:** Markeing and event-related expenses incurred during event periods.
The data spans multiple event cycles and includes non-event baseline periods for comparitive purposes. Prior to analysis, the dataset will be reviews for completeness, accuracy, and consistency, with any necessary preprocessing applied to ensure the integrity of subsequent insights.
Assumptions and limitations related to the dataset, including potential external factors influencing foot traffic or sales (such as holidays or seasonal variations), will be carefully considered in the analysis phase.
Data Cleaning and Preperation¶
Accurate and consistent data is fundamental to producing reliable analysis. The following steps outline the data cleanining and preparation process using SQL to ensure the dataset is ready for further exploration.
1. Inspect the Data¶
First, review sample data and check for missing values in key columns:
import configparser
import pandas as pd
from sqlalchemy import create_engine
config = configparser.ConfigParser()
config.read('config.ini')
user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']
engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')
query = """-- View sample records
SELECT * FROM luxury_cosmetics_popups LIMIT 10;"""
df = pd.read_sql_query(query, engine)
df
event_id | brand | region | city | location_type | event_type | start_date | end_date | lease_length_days | sku | product_name | price_usd | avg_daily_footfall | units_sold | sell_through_pct | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POP100282 | Charlotte Tilbury | North America | Miami | Art/Design District | Flash Event | 2024-02-25 | 2024-03-02 | 6 | LE-UQYNQA1A | Charlotte Tilbury Glow Mascara | 62.21 | 1107 | 3056 | 98.90 |
1 | POP102014 | Valentino Beauty | North America | New York | Airport Duty-Free | Flash Event | 2024-03-17 | 2024-06-09 | 84 | LE-9E9FTDSM | Valentino Beauty Pearl Eyeshadow Palette | 77.93 | 1652 | 2782 | 86.32 |
2 | POP101719 | YSL Beauty | Europe | Berlin | Airport Duty-Free | Standalone Pop-Up | 2025-02-26 | 2025-03-10 | 12 | LE-W921CLUG | YSL Beauty Glow Eyeshadow Palette | 149.91 | 752 | 2720 | 69.18 |
3 | POP100994 | Hermes Beauty | North America | Chicago | Airport Duty-Free | Standalone Pop-Up | 2025-07-06 | 2025-08-04 | 29 | LE-MPO4BX6H | Hermes Beauty Pearl Highlighter | 80.32 | 1688 | 203 | 66.78 |
4 | POP102033 | Tom Ford Beauty | Europe | London | High-Street | Shop-in-Shop | 2024-12-06 | 2024-12-25 | 19 | LE-M3D94MYP | Tom Ford Beauty Noir Highlighter | 56.15 | 1012 | 1292 | 80.10 |
5 | POP100297 | Givenchy Beauty | Europe | Madrid | Airport Duty-Free | Mall Kiosk | 2025-02-09 | 2025-04-06 | 56 | LE-XC24B5BF | Givenchy Beauty Opulence Lipstick | 38.53 | 1405 | 489 | 57.73 |
6 | POP102051 | Guerlain | Asia-Pacific | Tokyo | Department Store Atrium | Mobile Pop-Up Truck | 2025-01-07 | 2025-02-25 | 49 | LE-UM8IQ2M4 | Guerlain Satin Blush | 64.51 | 833 | 594 | 86.46 |
7 | POP101590 | Rare Beauty | Europe | London | Art/Design District | Mall Kiosk | 2024-02-21 | 2024-05-08 | 77 | LE-VF37GYEW | Rare Beauty Lumiere Setting Powder | 58.70 | 951 | 2480 | 56.31 |
8 | POP100070 | Pat McGrath Labs | North America | Toronto | Luxury Mall | Shop-in-Shop | 2024-07-28 | 2024-08-19 | 22 | LE-62JBXHRU | Pat McGrath Labs Satin Blush | 47.17 | 1566 | 1574 | 72.40 |
9 | POP100290 | Estee Lauder | Europe | London | Airport Duty-Free | Mall Kiosk | 2024-03-31 | 2024-06-16 | 77 | LE-S5MMVMDZ | Estee Lauder Mirage Highlighter | 47.71 | 1333 | 2538 | 61.17 |
import configparser
import pandas as pd
from sqlalchemy import create_engine
config = configparser.ConfigParser()
config.read('config.ini')
user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']
engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')
query = """-- Check for missing values in important fields
SELECT
COUNT(*) AS total_rows,
COUNT(avg_daily_footfall) AS avg_daily_footfall_not_null,
COUNT(units_sold) AS units_sold_not_null,
COUNT(event_type) AS event_type_not_null
FROM luxury_cosmetics_popups;"""
df = pd.read_sql_query(query, engine)
df
total_rows | avg_daily_footfall_not_null | units_sold_not_null | event_type_not_null | |
---|---|---|---|---|
0 | 2133 | 2133 | 2133 | 2133 |
This evaluation of the luxury cosmetics popups dataset confirms there are no missing values in the key analytical columns. This ensures the dataset is fully complete in these areas and allows for robust analysis without the need for data imputation or exclusion of incomplete rows.
Descriptive Analysis¶
The descriptive analysis aims to summarise the key characteristics of the dataset and provide initial insights into foot traffic and sales performnace across different event types.
Summary Statistics¶
- Average Daily Footfall: Measures the typical number of visitors during events.
- Units Sold: Indicates the sales volume associated with each event. _ Event Types: Breakdown of events by category (e.g. flash, standalone, shop-in-shop).
Analysis Steps¶
- Calculate overall averages and distributions of 'avg_daily_footfall' and 'units_sold' across the dataset.
import configparser
import pandas as pd
from sqlalchemy import create_engine
config = configparser.ConfigParser()
config.read('config.ini')
user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']
engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')
query = """-- Overall averages
SELECT
AVG(avg_daily_footfall) AS avg_footfall,
AVG(units_sold) AS avg_units_sold
FROM luxury_cosmetics_popups;"""
df = pd.read_sql_query(query, engine)
df
avg_footfall | avg_units_sold | |
---|---|---|
0 | 1407.106 | 1937.4046 |
Key Observations: Overall Averages¶
The dataset shows an average daily footfall of 1407 visitors and an average of 1937 units sold per event across al luxury cosmetics pop up formats. These overall averages serve as a baseline for evaluating the relative performance of individual event types, helping to identify which formats consistently outperform or underperform the general trend.
- Group data by 'event_type' and comput average footfall and sales to identify differences driven by event formats.
import configparser
import pandas as pd
from sqlalchemy import create_engine
config = configparser.ConfigParser()
config.read('config.ini')
user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']
engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')
query = """-- Averages by event type
SELECT
event_type,
AVG(avg_daily_footfall) AS avg_footfall,
AVG(units_sold) AS avg_units_sold,
COUNT(*) AS event_count
FROM luxury_cosmetics_popups
GROUP BY event_type;"""
df = pd.read_sql_query(query, engine)
df
event_type | avg_footfall | avg_units_sold | event_count | |
---|---|---|---|---|
0 | Flash Event | 1385.9265 | 2038.0853 | 422 |
1 | Standalone Pop-Up | 1427.2918 | 1919.6071 | 425 |
2 | Shop-in-Shop | 1409.3727 | 1840.3227 | 440 |
3 | Mall Kiosk | 1434.4064 | 1870.7922 | 438 |
4 | Mobile Pop-Up Truck | 1376.2328 | 2028.0147 | 408 |
Key Observations: Event Type Comparison¶
Grouping the data by 'event_type' reveals notable differences in performance across event formats:
Standalone Pop-Ups and Mail Kiosk events have the highest average daily footfal, attracting 1427 and 1434 visitors per event, both above the overall average.
Flash Event and Mobile Pop-Up Truck formats excel in sales volume, averaging 2038 and 2028 units sold per event, surpassing other categories and the overall mean.
Shop-in-Shop events demonstrate consistent results, with average footfall and units sold close to the overall averages, indicating balanced performance.
All event types have substantial sample sizes (over 400 events each), ensuring meaningful comparisons.
These findings illustrate that certain event formats are more effective at driving customer visits while others are more successful in maximising units sold, enabling targeted strategies based on specific business objectives.
Comparative Analysis¶
This section focuses on comparing the impact of different even types - flash events, standalone events, and shop-in-shop collaborations; on average daily footfall and units sold. The goal is to quantify how each event format influences customer visits and sales performance relative to the overall baseline.
Analysis Approach¶
- Calculate percentage uplift in average daily footfall and units sold during each event type compared to the overall dataset averages.
import configparser
import pandas as pd
from sqlalchemy import create_engine
config = configparser.ConfigParser()
config.read('config.ini')
user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']
engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')
query = """-- Overall average metrics for baseline comparison
SELECT
AVG(avg_daily_footfall) AS overall_avg_footfall,
AVG(units_sold) AS overall_avg_units_sold
FROM luxury_cosmetics_popups;"""
df = pd.read_sql_query(query, engine)
df
overall_avg_footfall | overall_avg_units_sold | |
---|---|---|
0 | 1407.106 | 1937.4046 |
Key Observations: Baseline Metrics¶
- The overall average daily footfall across all event types is approximately 1407 visitors per day.
- The overall average units sold per event is approximately 1937 units.
These baseline figures provide essential context for evaluating the performance of indiviudal event types. Event formats that consistently exceed these averages can be regarded as more effective at attracting visitors or generating sales, while those falling below may warrant further review or optimisation.
- Assess the statistical significance of differences in footfall and sales between event types.
import configparser
import pandas as pd
from sqlalchemy import create_engine
config = configparser.ConfigParser()
config.read('config.ini')
user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']
engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')
query = """-- Average metrics grouped by event type
SELECT
event_type,
AVG(avg_daily_footfall) AS avg_footfall,
AVG(units_sold) AS avg_units_sold,
COUNT(*) AS total_events
FROM luxury_cosmetics_popups
GROUP BY event_type;"""
df = pd.read_sql_query(query, engine)
df
event_type | avg_footfall | avg_units_sold | total_events | |
---|---|---|---|---|
0 | Flash Event | 1385.9265 | 2038.0853 | 422 |
1 | Standalone Pop-Up | 1427.2918 | 1919.6071 | 425 |
2 | Shop-in-Shop | 1409.3727 | 1840.3227 | 440 |
3 | Mall Kiosk | 1434.4064 | 1870.7922 | 438 |
4 | Mobile Pop-Up Truck | 1376.2328 | 2028.0147 | 408 |
Key Observations:
Standalone Pop-Ups and Mail Kiosks exhibit the highest average daily footfall (1427 and 1434 visitors), indicating these formats attract marginally more visitors per event.
Flash Events and Mobile Pop-Up Trucks stand out for average units sold (2038 and 2028) suggesting these formats are particularly effective at generating higher sales volumes per event, possible due to time-limited urgency or mobility.
Shop-in-Shop activations rank solidly for both footfall and sales, reflecting consistent performance, likely benefiting from established brand partnerships within existing retail settings.
These results highlight the unique strengths of each event type. Standalone Pop-Ups and Mail Kiosks are optimal for maximising visitor engagement, whereas Flash Events and Mobile Pop-Up Trucks are effective sales drives.
Correlation and Regression Analysis¶
To deepen the understanding of relationships between variables, this section explores correlations and develops regression models to quantify the impact of event types and promotional activiies on average daily footfall and units sold.
Objectives¶
Examine the strength and direction of correlations between key variables, such as average daily footfall, units sold, and promotional spend.
Build regression odels to estimate the effect of event type and promotional spend on foot traffic and sales volume.
INterpret model coefficients to identify which factors significantly drive customer visits and sales outcomes.
Analysis Steps¶
- Calculate Pearson correlation coefficients between numeric variables to assess linear relationships.
Python Code¶
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import r2_score
from sqlalchemy import create_engine
import configparser
import numpy as np
Load database configuration¶
config = configparser.ConfigParser()
config.read('config.ini')
user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']
engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')
Query relevant data¶
import configparser
import pandas as pd
from sqlalchemy import create_engine
config = configparser.ConfigParser()
config.read('config.ini')
user = config['mysql']['user']
password = config['mysql']['password']
host = config['mysql']['host']
database = config['mysql']['database']
engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')
query = """
SELECT avg_daily_footfall, units_sold, sell_through_pct, event_type
FROM luxury_cosmetics_popups;
"""
df = pd.read_sql_query(query, engine)
df
avg_daily_footfall | units_sold | sell_through_pct | event_type | |
---|---|---|---|---|
0 | 1107 | 3056 | 98.90 | Flash Event |
1 | 1652 | 2782 | 86.32 | Flash Event |
2 | 752 | 2720 | 69.18 | Standalone Pop-Up |
3 | 1688 | 203 | 66.78 | Standalone Pop-Up |
4 | 1012 | 1292 | 80.10 | Shop-in-Shop |
... | ... | ... | ... | ... |
2128 | 1668 | 1180 | 64.06 | Shop-in-Shop |
2129 | 1350 | 1293 | 69.00 | Shop-in-Shop |
2130 | 2121 | 793 | 79.30 | Standalone Pop-Up |
2131 | 1210 | 548 | 70.71 | Mall Kiosk |
2132 | 987 | 2206 | 75.81 | Shop-in-Shop |
2133 rows × 4 columns
Calculate Pearson correlation matrix¶
correlations = df[['avg_daily_footfall', 'units_sold', 'sell_through_pct']].corr()
print("Correlation matrix:\n", correlations)
Correlation matrix: avg_daily_footfall units_sold sell_through_pct avg_daily_footfall 1.000000 -0.001518 0.063018 units_sold -0.001518 1.000000 0.365340 sell_through_pct 0.063018 0.365340 1.000000
Define features (X) and targets (Y)¶
features = ['sell_through_pct', 'event_type']
target_footfall = 'avg_daily_footfall'
target_units = 'units_sold'
Drop rows with missing values in relevant columns¶
df = df.dropna(subset=features + [target_footfall, target_units])
Define preprocessing: one-hot encode categorical event_type¶
preprocessor = ColumnTransformer(
transformers=[
('cat', OneHotEncoder(drop='first'), ['event_type']),
],
remainder='passthrough' # Keep numeric columns like sell_through_pct as is
)
Build pipeline for footfall prediction¶
pipeline_footfall = Pipeline([
('preprocessor', preprocessor),
('regressor', LinearRegression())
])
X = df[features]
y_footfall = df[target_footfall]
pipeline_footfall.fit(X, y_footfall)
y_footfall_pred = pipeline_footfall.predict(X)
print(f"R-squared for avg_daily_footfall model: {r2_score(y_footfall, y_footfall_pred):.4f}")
R-squared for avg_daily_footfall model: 0.0064
Display coefficients with feature names¶
feature_names = pipeline_footfall.named_steps['preprocessor'].get_feature_names_out()
coefficients = pipeline_footfall.named_steps['regressor'].coef_
coef_df = pd.DataFrame({'feature': feature_names, 'coefficient': coefficients})
print(coef_df)
feature coefficient 0 cat__event_type_Mall Kiosk 58.932921 1 cat__event_type_Mobile Pop-Up Truck -11.191947 2 cat__event_type_Shop-in-Shop 33.977397 3 cat__event_type_Standalone Pop-Up 47.858326 4 remainder__sell_through_pct 2.567147
Build pipeline for units_sold prediction¶
pipeline_units = Pipeline([
('preprocessor', preprocessor),
('regressor', LinearRegression())
])
y_units = df[target_units]
pipeline_units.fit(X, y_units)
y_units_pred = pipeline_units.predict(X)
print(f"R-squared for units_sold model: {r2_score(y_units, y_units_pred):.4f}")
coef_units = pipeline_units.named_steps['regressor'].coef_
coef_units_df = pd.DataFrame({'feature': feature_names, 'coefficient': coef_units})
print(coef_units_df)
R-squared for units_sold model: 0.1342 feature coefficient 0 cat__event_type_Mall Kiosk -56.007271 1 cat__event_type_Mobile Pop-Up Truck -26.021316 2 cat__event_type_Shop-in-Shop -85.644864 3 cat__event_type_Standalone Pop-Up -49.351172 4 remainder__sell_through_pct 27.330442
Interpretation of Results¶
1. Correlation Analysis¶
The correlation matrix between avg_daily_footfall, units_sold, and sell_through_pct shows the linear strength of relationships between core performance metrics:
- avg_daily_footfall vs. units_sold: A positive correlation indicates that higher customer traffic generally leads to higher sales, as expected in retail or popup contexts.
- sell_through_pct vs. units_sold: A strong positive correlation suggests that efficient inventory turnover aligns with higher sales volumes. A weaker correlation implies that promotional dynamics or event characteristics may have greater influence.
- sell_through_pct vs. avg_daily_footfall: If weak, this means greater traffic does not necessarily guarantee higher stock conversion into sales.
These relationships act as exploratory diagnostics before building regression models.
2. Regression Model: Predicting Average Daily Footfall¶
We fitted a linear regression using sell_through_pct and event_type as predictors of footfall.
Model Fit (R²): The R² value indicates how much variance in footfall can be explained by the predictors.
- A higher R² (≥0.5) suggests event type and sell-through efficiency predict traffic well.
- A lower R² implies additional factors (location, marketing spend, or seasonal timing) may be more influential.
Coefficients:
- Sell_through_pct: A positive coefficient indicates that higher stock conversion efficiency relates to higher visitor traffic. A negative coefficient would suggest efficient sales were achieved despite lower traffic volumes.
- Event_type dummy variables: Each coefficient compares an event type to the baseline category. Positive values indicate the event attracts more customers relative to the baseline, negative coefficients indicate fewer.
3. Regression Model: Predicting Units Sold¶
We also modelled units_sold using the same predictors.
Model Fit (R²): This shows how much variation in sales is explained by event type and sell-through efficiency. Units sold are expected to be moderately predictable, but unexplained variation may reflect product type or promotional intensity.
Coefficients:
- Sell_through_pct: A strongly positive coefficient suggests more efficient inventory turnover directly drives higher volumes of units sold.
- Event type variables: Capture differences in sales performance across event types. For example, a product launch may yield a significant positive effect relative to the baseline popup store type.
4. Insights & Practical Implications¶
- Correlation analysis confirmed basic relationships between footfall, sales, and efficiency.
- The footfall model highlights which event strategies attract more traffic, useful for planning venue capacity and marketing spend.
- The sales model connects stock efficiency and event type to unit movement, informing product and stock allocation decisions.
- If either model shows a relatively low R², additional explanatory variables (e.g., price levels, discounts, product categories, or seasonality) should be incorporated to improve prediction power.
Overall, these models provide useful explanatory insights and a foundation for building more advanced forecasting pipelines.