𝗧𝗶𝘁𝗮𝗻𝗶𝗰 𝗦𝘂𝗿𝘃𝗶𝘃𝗮𝗹 𝗣𝗿𝗲𝗱𝗶𝗰𝘁𝗶𝗼𝗻 𝗪𝗼𝗿𝗸𝗳𝗹𝗼𝘄
The Titanic project is a great first machine learning project. It walks you through the full data science workflow.
The goal is to predict which passengers survived. I used features such as age, gender, and ticket fare.
I started with data cleaning in Pandas. I handled the missing values in the Age and Cabin columns.
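A minimal Pandas sketch of this cleaning step. The rows are made up and only mimic the Kaggle Titanic columns. Filling Age with the median and marking missing Cabin values as "Unknown" are assumptions, not necessarily the exact choices made in this project.

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the Kaggle Titanic columns (not real data).
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, "C123", np.nan],
})

# Assumption: fill Age with the median, which is less sensitive to outliers than the mean.
df["Age"] = df["Age"].fillna(df["Age"].median())

# Assumption: Cabin is mostly missing, so mark gaps with a placeholder instead of guessing.
df["Cabin"] = df["Cabin"].fillna("Unknown")

print(df)
print("Missing values left:", int(df.isna().sum().sum()))  # 0
```

Because Cabin is missing for most passengers, it is often more useful as a "known or not" flag than as a value to impute.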
Next came Exploratory Data Analysis (EDA). I found some key patterns:
- Women survived at a higher rate than men.
- First-class passengers survived more often than those in second and third class.
- Age and fare were also linked to survival.
I used Matplotlib and Seaborn for charts.
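Patterns like these come from grouping on a column and averaging the 0/1 Survived label, which gives a survival rate per group. A sketch on made-up rows, assuming the Kaggle column names Sex, Pclass, and Survived; the printed rates describe only these toy rows, not the real dataset.

```python
import pandas as pd

# Made-up toy rows (not the real dataset); column names follow the Kaggle schema.
df = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male", "female"],
    "Pclass": [1, 3, 1, 3, 3, 1],
    "Survived": [1, 0, 1, 0, 0, 1],
})

# The mean of a 0/1 column is the share of 1s, i.e. the survival rate per group.
by_sex = df.groupby("Sex")["Survived"].mean()
by_class = df.groupby("Pclass")["Survived"].mean()

print(by_sex)
print(by_class)
```

With Seaborn, `sns.barplot(data=df, x="Sex", y="Survived")` draws the same per-group means as bars.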
I created new features to help the model:
- FamilySize
- CabinKnown
- IsAlone
- Title (Mr, Mrs, Miss, etc., taken from the passenger's name)
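One way to derive these four features, assuming the Kaggle column names SibSp (siblings/spouses aboard), Parch (parents/children aboard), Cabin, and Name. The names follow the dataset's "Surname, Title. Given names" format; the exact rules used in the project may differ.

```python
import numpy as np
import pandas as pd

# A few rows in the Kaggle Titanic format (illustrative only).
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris",
             "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
             "Heikkinen, Miss. Laina"],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Cabin": [np.nan, "C85", np.nan],
})

# FamilySize: siblings/spouses + parents/children + the passenger themself.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
# IsAlone: 1 when the passenger travelled with no family aboard.
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)
# CabinKnown: 1 when a cabin number was recorded.
df["CabinKnown"] = df["Cabin"].notna().astype(int)
# Title: the text between the comma and the first period after it.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)

print(df[["FamilySize", "IsAlone", "CabinKnown", "Title"]])
```

Rare titles (for example military or noble ones) are often grouped into a single "Rare" category so the encoder does not create many near-empty columns.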
I built a preprocessing pipeline. It scaled the numeric features and encoded the categorical ones.
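A sketch of such a pipeline with scikit-learn's ColumnTransformer. The column lists, the imputation strategies, and the choice of StandardScaler plus OneHotEncoder are assumptions; the four rows are made up.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["Age", "Fare", "FamilySize"]
categorical_cols = ["Sex", "Pclass", "Title"]

# Numbers: fill gaps with the median, then standardize to mean 0, std 1.
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categories: fill gaps with the most common value, then one-hot encode.
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols),
])

# Made-up rows for illustration.
X = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.1],
    "FamilySize": [2, 2, 1, 2],
    "Sex": ["male", "female", "female", "female"],
    "Pclass": [3, 1, 3, 1],
    "Title": ["Mr", "Mrs", "Miss", "Mrs"],
})

Xt = preprocess.fit_transform(X)
print(Xt.shape)  # (4, 10): 3 scaled numbers + 2 Sex + 2 Pclass + 3 Title columns
```

In practice the ColumnTransformer would sit in a Pipeline together with the classifier, so scaling and encoding are fit only on the training folds during cross-validation.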
I tested several algorithms, and XGBoost performed best on this data. I tuned it with Optuna, which searched over hyperparameters such as the learning rate and the maximum tree depth.
I evaluated the model with accuracy and a classification report (precision, recall, and F1-score for each class).
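A small sketch of both metrics with scikit-learn. The labels are hypothetical, chosen by hand for illustration; they are not results from this project.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical labels for 8 held-out passengers (1 = survived); not real results.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 0, 1, 0, 0, 1, 1, 1]

# Accuracy: share of passengers predicted correctly (6 of 8 here).
print(accuracy_score(y_true, y_pred))  # 0.75

# Per-class precision, recall and F1, as text and as a dict for programmatic use.
print(classification_report(y_true, y_pred, target_names=["died", "survived"]))
report = classification_report(
    y_true, y_pred, target_names=["died", "survived"], output_dict=True
)
```

The per-class view matters here because fewer passengers survived than died, so accuracy alone can hide weak recall on the "survived" class.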
I used SHAP to explain the predictions. It shows how much each feature pushes an individual prediction up or down, and which features matter most overall. Your model is no longer a black box.
This project covers everything from raw data to model interpretation. Use it to practice your skills.