Chapter · 03

Models & Results

Three classifiers, one binary target. Logistic Regression establishes a baseline; Random Forest and XGBoost contest the top — and Random Forest wins on every weighted metric.

10 · Approach

Predict, then explain.

The target variable is the engineered binary feature is_positive_review: positive (4–5 stars) vs. not positive (1–3 stars). The data was split 80/20 with stratification to preserve the class distribution, using a fixed random state of 42 for reproducibility. Tree‑based models received the numerical features directly, without scaling; Logistic Regression was also fit without a separate scaling step, relying on its optimizer and a raised iteration cap to converge. Class imbalance was addressed through class‑weight adjustments and through algorithms that are inherently robust to skew.
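The split described above can be sketched with scikit-learn as follows. This is a minimal illustration under assumptions, not the project's actual code: the data is synthetic, the feature matrix and the roughly 64% positive rate are stand-ins, and only the 80/20 ratio, the stratification, and the random state of 42 come from the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 1,000 rows, 3 numeric features,
# and a binary label playing the role of is_positive_review.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (rng.random(1000) < 0.64).astype(int)

# 80/20 split, stratified on the label, with the fixed random state from the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)

# Stratification keeps the positive rate nearly identical in both partitions.
print(f"train positive rate: {y_train.mean():.3f}")
print(f"test positive rate:  {y_test.mean():.3f}")
```

Without `stratify=y`, the two rates could drift apart by chance, which would make the holdout metrics harder to compare across models.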

10.4 · Machine Learning Models

The three classifiers compared.

§ 10.4.1

BASELINE

Logistic Regression

0.7087 accuracy

Selected as the baseline classification model due to its simplicity, interpretability, and suitability for binary outcomes. Trained with up to 2,000 iterations to ensure convergence. Predictions evaluated using accuracy, precision, recall, and F1 score; a classification report and confusion matrix were produced.

Config

max_iter = 2000
weighted metrics
binary target: is_positive_review
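A baseline of this kind might look like the sketch below. The synthetic data and its label rule are assumptions made for illustration; only max_iter = 2000, the weighted averaging, and the use of a classification report and confusion matrix are taken from the text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_fscore_support,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): the label depends on two features plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=2000) > -0.4).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)

# Raised iteration cap, as in the config above.
baseline = LogisticRegression(max_iter=2000)
baseline.fit(X_train, y_train)
y_pred = baseline.predict(X_test)

# Weighted averaging: per-class scores weighted by class support.
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted"
)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
print(classification_report(y_test, y_pred, target_names=["not positive", "positive"]))
print(confusion_matrix(y_test, y_pred))  # rows: actual, columns: predicted
```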

§ 10.4.2

BEST MODEL

Random Forest

0.7540 accuracy

Selected for its ability to capture nonlinear relationships, handle mixed feature types, and remain robust to noise. Provides feature importance scores used to verify the research hypotheses on delivery and review behavior signals.

Config

n_estimators = 300
max_depth = None
n_jobs = parallel
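In scikit-learn terms, the configuration above could be written as in the following sketch. It assumes that "parallel" corresponds to n_jobs=-1 (all available cores) and adds a random_state for repeatability; neither detail is confirmed by the text, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption) with a nonlinear interaction between
# the first two features, which a linear model cannot represent directly.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

forest = RandomForestClassifier(
    n_estimators=300,   # from the config
    max_depth=None,     # from the config: trees grow until leaves are pure
    n_jobs=-1,          # assumption: "parallel" read as "use all cores"
    random_state=42,    # assumption: added for repeatability
)
forest.fit(X, y)

# With max_depth=None, individual trees are free to grow deep.
depths = [tree.get_depth() for tree in forest.estimators_]
print(f"trees: {len(forest.estimators_)}, depth range: {min(depths)}-{max(depths)}")
print("impurity-based importances:", forest.feature_importances_.round(3))
```

Unbounded depth lets each tree fit its bootstrap sample closely; averaging over 300 such trees is what keeps the variance of the ensemble in check.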

§ 10.4.3

BOOSTING

XGBoost

0.7392 accuracy

Configured with the multi:softmax objective. Strong on structured datasets and capable of capturing complex nonlinear relationships through boosting; subsampling and column sampling reduce variance.

Config

n_estimators = 400
learning_rate = 0.05
max_depth = 8
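Expressed as parameters for XGBoost's native training API, the configuration above might look like the fragment below. It is a sketch under assumptions: num_class is not mentioned in the text but is required by multiclass objectives, and the subsampling values are not given in the text, so they are deliberately left unset.

```python
# Parameter names follow XGBoost's native API; values marked "assumption"
# are not stated in the text.
xgb_params = {
    "objective": "multi:softmax",  # from the text
    "num_class": 2,                # assumption: required by multi:* objectives
    "learning_rate": 0.05,         # from the config
    "max_depth": 8,                # from the config
}

# n_estimators = 400 in the scikit-learn style wrapper corresponds to the
# number of boosting rounds in the native API.
num_boost_round = 400

# Row subsampling ("subsample") and column sampling ("colsample_bytree") are
# mentioned in the text, but their values are not reported, so they are omitted.
```

For a two-class target, binary:logistic is the more common objective; multi:softmax with two classes returns predicted class labels rather than probabilities.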

10.7 · Comparative Results

Random Forest leads on accuracy, precision, and balanced F1.

FIG. 5.1 Accuracy Comparison. Random Forest achieves 75.40% accuracy, with XGBoost a close second at 73.92% and the Logistic Regression baseline at 70.87%.

FIG. 5.2 Precision / Recall / F1 (Weighted). Random Forest dominates on all three weighted metrics, with XGBoost close behind. Logistic Regression remains a useful interpretable baseline.

Logistic Regression achieves a baseline accuracy of 0.7087 with moderate precision and F1. Random Forest delivers the strongest overall performance, with the highest accuracy (0.7540) and precision (0.7606); its ensemble structure models nonlinear interactions effectively. XGBoost is competitive at 0.7392 accuracy and 0.7510 precision, handling class imbalance well through boosting. Overall, ensemble methods are better suited to predicting customer review outcomes because they capture complex feature interactions.

Confusion Matrices

Where each model gets it right — and where it slips.

Logistic Regression

70.87% accuracy · n = 18,889 test predictions

                     Pred · Not Positive    Pred · Positive
Actual · Not Pos.    TN 4,090 (21.7%)       FP 2,780 (14.7%)
Actual · Positive    FN 2,740 (14.5%)       TP 9,279 (49.1%)

Random Forest

75.40% accuracy · n = 18,889 test predictions

                     Pred · Not Positive    Pred · Positive
Actual · Not Pos.    TN 4,980 (26.4%)       FP 1,890 (10.0%)
Actual · Positive    FN 2,802 (14.8%)       TP 9,217 (48.8%)

XGBoost

73.92% accuracy · n = 18,889 test predictions

                     Pred · Not Positive    Pred · Positive
Actual · Not Pos.    TN 4,720 (25.0%)       FP 2,150 (11.4%)
Actual · Positive    FN 2,778 (14.7%)       TP 9,241 (48.9%)

NOTE · Counts are reconstructed from the reported weighted metrics on the 20% holdout (≈ 18.9k rows) and are approximate: the accuracies they imply (70.78%, 75.16%, and 73.91%) agree with the published figures to within about 0.25 percentage points.
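The relationship between the confusion-matrix counts and the weighted metrics can be checked directly. The sketch below uses only the counts listed above; the function name and the dictionary layout are illustrative choices, not part of the original analysis.

```python
# (TN, FP, FN, TP) as listed in the confusion matrices above.
matrices = {
    "Logistic Regression": (4090, 2780, 2740, 9279),
    "Random Forest": (4980, 1890, 2802, 9217),
    "XGBoost": (4720, 2150, 2778, 9241),
}

def weighted_metrics(tn, fp, fn, tp):
    """Accuracy plus support-weighted precision, recall, and F1."""
    total = tn + fp + fn + tp
    # Per class: (precision, recall, support).
    per_class = [
        (tn / (tn + fn), tn / (tn + fp), tn + fp),  # class "not positive"
        (tp / (tp + fp), tp / (tp + fn), tp + fn),  # class "positive"
    ]
    precision = sum(p * s for p, _, s in per_class) / total
    recall = sum(r * s for _, r, s in per_class) / total
    f1 = sum(2 * p * r / (p + r) * s for p, r, s in per_class) / total
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

results = {name: weighted_metrics(*counts) for name, counts in matrices.items()}
for name, metrics in results.items():
    print(f"{name:20s} " + "  ".join(f"{k}={v:.4f}" for k, v in metrics.items()))
```

Weighted recall coincides with accuracy by construction, so in the weighted view only precision and F1 carry information beyond the accuracy figure.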

10.8 · Feature Importance

Which signals each model actually leans on.

Three lenses on the same dataset: a linear view (Logistic), an impurity‑based view (Random Forest), and a split‑frequency view (XGBoost) — converging on delivery performance and review behavior.
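The first two lenses can be illustrated on toy data, as in the sketch below. Everything here is an assumption made for demonstration: the data is synthetic, the feature names are hypothetical, and the forest is smaller than the one in the study. The split‑frequency view is not run; in XGBoost it is typically read from the booster's get_score method with importance_type="weight".

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical feature names; only the first feature drives the label.
feature_names = ["signal", "noise_a", "noise_b"]
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] + 0.3 * rng.normal(size=1000) > 0).astype(int)

logit = LogisticRegression(max_iter=2000).fit(X, y)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

def ranked(scores):
    """Return (feature, score) pairs sorted from most to least important."""
    order = np.argsort(scores)[::-1]
    return [(feature_names[i], float(scores[i])) for i in order]

# Linear view: coefficients ranked by absolute magnitude.
linear_view = ranked(np.abs(logit.coef_[0]))
# Impurity-based view: mean decrease in impurity across the trees.
impurity_view = ranked(forest.feature_importances_)

print("linear:  ", linear_view)
print("impurity:", impurity_view)
```

Coefficient magnitudes are only comparable across features that share a scale, which holds for this toy data but should be checked before reading a linear ranking on raw, unscaled inputs.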

10.8.1 · Logistic Regression

Linear coefficients

is_positive_review and has_review_comment — most influential predictors
delivery_time_days and is_late contribute significantly
Product attributes (product_volume_cm3, product_weight_g) have moderate influence
Seller and customer identifiers carry low importance

FIG. 6.1 Logistic Regression Feature Importance. Top coefficients by absolute magnitude. Customer behavior (writing a review comment) and delivery performance play central roles.

10.8.2 · Random Forest

Impurity‑based ranking

is_positive_review as the dominant feature
delay_days, delivery_time_days, and freight_ratio as key operational predictors
payment_value and price as moderately important financial variables
Product dimensions and weights contribute smaller but meaningful effects

FIG. 6.2 Random Forest Feature Importance. Impurity‑based importance highlights delay_days, delivery_time_days, and freight_ratio as key operational predictors, consistent with the Chapter 9 EDA findings.

10.8.3 · XGBoost

Split‑frequency ranking

is_positive_review overwhelmingly dominates the model
has_review_comment and is_late follow at a much smaller scale
Delivery‑related variables (delay_days, delivery_time_days) remain relevant
Product and seller attributes have minimal influence

FIG. 6.3 XGBoost Feature Importance. XGBoost concentrates importance on a small number of highly predictive features, reflecting boosting's ability to focus on the strongest signals.

10.9 · Hypothesis Verification

Five hypotheses, five supports.

RQ1

Operational and delivery‑related variables have the strongest influence on customer review outcomes.

Supported. delay_days, is_late, and delivery_time_days consistently appear among top predictors across all models.

RQ2

Machine learning models can achieve reliable predictive accuracy for review classification.

Supported. All three models reach roughly 71–75% accuracy (0.7087 to 0.7540); Random Forest leads at 75.40%.

RQ3

Ensemble models outperform linear models.

Supported. Random Forest achieves the highest accuracy (0.7540) and the strongest precision and F1, with XGBoost close behind; both ensembles outperform the Logistic Regression baseline (0.7087).

RQ4

Delivery performance and customer review behavior contribute most strongly to predictions.

Supported. Delivery (delay_days, is_late) and behavior (has_review_comment) rank top consistently.

RQ5

Insights from predictive models can guide operational improvements that reduce negative reviews.

Supported. Dominance of delivery features points to logistics investment as the highest‑leverage intervention.