1200px-RMS_Titanic_3.jpeg

Titanic Data

A Quick Look Under the Smokestacks

This is a short exploration of a dataset from the Titanic’s ill-fated maiden voyage, containing records for 891 of the roughly 1,300 passengers aboard when the ship sank (more than 2,200 people in all, counting crew).  The analysis uses the Pandas and Seaborn libraries for Python.

I loaded the data frame and displayed summary information on the variables it contains.

Titanic1.png
Titanic2.png
Titanic3.png

We can see that most variables have a full set of 891 values, one for every passenger in the dataset.  We are missing embarkation info for two passengers, age info for 177 passengers, and, rather notably, accommodation (deck) info for the majority of people.
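The screenshots above are not reproduced as text, so here is a minimal sketch of how such a missing-value tally might be produced.  The column names (`age`, `embarked`, `deck`, `fare`) and the helper `missing_summary` are assumptions for illustration, and the sample rows are made up rather than taken from the real data.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return non-null and missing counts per column, most-missing first."""
    summary = pd.DataFrame({
        "non_null": df.notna().sum(),
        "missing": df.isna().sum(),
    })
    return summary.sort_values("missing", ascending=False)


# Tiny hypothetical sample (NOT the real Titanic records).
sample = pd.DataFrame({
    "age": [22.0, None, 35.0, None],
    "embarked": ["S", "C", None, "S"],
    "deck": [None, "C", None, None],
    "fare": [7.25, 71.28, 8.05, 53.10],
})

summary = missing_summary(sample)
print(summary)
```

On the full dataset, the same call would surface the gaps in age, embarkation, and deck described above.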

A quick look at the age range for those designated as children reveals that, in this dataset, children are defined as those age 15 and younger.
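One way to run this kind of check is to take the minimum and maximum age within each group.  The sketch below assumes a column named `who` holding the labels "man", "woman", and "child", plus an `age` column; both names and the sample values are hypothetical.

```python
import pandas as pd


def age_range_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Minimum and maximum recorded age for each value of `who`."""
    # Missing ages are skipped automatically by min() and max().
    return df.groupby("who")["age"].agg(["min", "max"])


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "who": ["child", "child", "man", "woman", "child", "man"],
    "age": [2.0, 15.0, 16.0, 38.0, 0.75, None],
})

ranges = age_range_by_group(sample)
print(ranges)
```

If the maximum for "child" never exceeds 15 while adults start at 16, that gives us the cutoff.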

Titanic4.png

I examined the distribution of passengers across several categorical variables, starting with port of embarkation.

Titanic5.png

Most people aboard the ship when it sank had boarded at the port where it was originally moored in Southampton, England, with smaller numbers having embarked during its stops in Cherbourg, France and Queenstown, Ireland.

Titanic6.png

There were clearly more men aboard than women or children.  The disparity between men and women may have had much to do with the social norms of the time.  In the western world of 1912, men had far more freedom of movement than women did; women faced many restrictions, or societal expectations of restraint, on the activities they engaged in alone, especially if they were unwed.  So there were probably a great many men aboard the Titanic who were not traveling with women, and it is reasonable to guess that, by contrast, most women aboard were accompanied by men; most were likely married and traveling with their husbands (and children, if they had any).  Furthermore, of the women traveling without an adult male (some with children, others not), most were likely coming to the United States to reunite with husbands who had immigrated earlier and found work; this was a very common practice at the time.  There would likely have been comparatively few single women on board, and also few married women whose travel was independent of their spouses.

karl_broback_P.webp

Karl Rudolf Brobäck, a single Swedish man, who perished in the sinking.

On a related note, we can see the numbers of men and women traveling without any family members present, and the percentages of all men and women aboard that those numbers represent.

Titanic21.png

There were many men traveling completely alone; in fact, this was true of the majority of men in this dataset (approximately 76 percent).  Fewer than half of the women in the dataset were traveling alone, but at about 45 percent this was still a sizable proportion.  We don’t know how many of these women were traveling to meet husbands who were already stateside, but it is reasonable to guess that most were, given the social customs discussed above.  We would need information not present in the data to determine this for certain, however.
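For readers who want to reproduce figures like these, here is a rough sketch of the calculation.  It treats a passenger as alone when `sibsp` and `parch` are both zero; the `who` column, the helper name `alone_share`, and the sample rows are assumptions for illustration.

```python
import pandas as pd


def alone_share(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of each `who` group traveling without family."""
    is_alone = (df["sibsp"] + df["parch"]) == 0
    grouped = is_alone.groupby(df["who"])
    out = pd.DataFrame({"alone": grouped.sum(), "total": grouped.size()})
    out["pct_alone"] = (100 * out["alone"] / out["total"]).round(1)
    return out


# Hypothetical rows, not the real passenger list.
sample = pd.DataFrame({
    "who": ["man", "man", "man", "man", "woman", "woman"],
    "sibsp": [0, 0, 0, 1, 0, 1],
    "parch": [0, 0, 0, 0, 0, 2],
})

shares = alone_share(sample)
print(shares)
```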

We can also look at the relative percentages of men and women out of all people traveling without family.  For this, we restrict our readout to people who had values of zero for both the “parch” variable (number of parents and children aboard) and the “sibsp” variable (number of siblings and spouses aboard), and then perform counts of men, women, and children within that selection.
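The filter just described might look something like the sketch below.  The “parch” and “sibsp” names come from the dataset; the `who` column, the function name, and the sample rows are assumptions made for the example.

```python
import pandas as pd


def composition_of_solo_travelers(df: pd.DataFrame) -> pd.DataFrame:
    """Among passengers with no family aboard, count each `who` group."""
    solo = df[(df["parch"] == 0) & (df["sibsp"] == 0)]
    counts = solo["who"].value_counts()
    pct = (100 * counts / counts.sum()).round(1)
    return pd.DataFrame({"count": counts, "pct": pct})


# Made-up rows: four solo travelers and two traveling with family.
sample = pd.DataFrame({
    "who": ["man", "man", "man", "woman", "man", "child"],
    "sibsp": [0, 0, 0, 0, 1, 0],
    "parch": [0, 0, 0, 0, 0, 2],
})

solo_mix = composition_of_solo_travelers(sample)
print(solo_mix)
```

Note the difference from a within-group percentage: here the denominator is everyone traveling without family, not all men or all women.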

Titanic22.png

Of those without family, just over three-quarters were men, and just under a quarter were women.  (To satisfy the curious and concerned, the relatively small number of children traveling without any family would have been accompanied by adult caretakers.)

Titanic23.png
Mollybrown.jpeg

Margaret Brown, a socialite who is perhaps the most famous survivor of the Titanic.  She is popularly known today as "the Unsinkable Molly Brown."

gus-cohen.webp

Gurshon Cohen, a young man from Great Britain who lodged with five other men who were not family members of his.  He survived the disaster.

Looking more broadly at whether passengers were traveling alone or in the company of others and, if accompanied, how big their party was, there was a wide range of cases and combinations, as suggested by the long printout above.  There were siblings traveling together, children traveling in the company of nannies and without parents present, adults traveling with their parents, strangers sharing cabins, and all manner of other arrangements.  Looking at all of this in depth is beyond the scope of this analysis.

Titanic7.png

There were far more third class travelers aboard than those in the other classes.  Since the ship had many more third class cabins than lodging for first and second class, this makes perfect sense.

I used histograms to show the distribution of passengers across age and fare.

Titanic8.png

For those passengers with age info recorded, most were clustered in the 15-50 age range.  Since we are dealing with only a portion of the dataset here, the proportions may not match those we would see if everyone had been included, but they are probably not drastically off, either; this is 80 percent of the data, after all.  Assuming these numbers are fairly representative, travelers tended to be older teens (defined as adults in the dataset) and those in their 20s, 30s, and 40s.  It makes a good deal of sense that predominantly healthy people in the prime of their lives, seeking new opportunities and adventures, would be more likely to travel across an ocean than those in other age groups.  A great number of these people had no children with them, hence the much smaller numbers in the early age brackets.

Titanic9.png

I also produced an age histogram with a breakdown by sex.  In all but two 5-year age bins, we see that males outnumber females (the 10-15 bin has more girls than boys, and the 5-10 bin has an equal number of boys and girls).  It is clear from this that the overrepresentation of men aboard the ship was not confined to one or two adult age groups, but present in all of them.  This makes sense, as men’s greater degree of freedom in that era, mentioned above, was present regardless of age.
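The bin-by-bin comparison can also be done as a table rather than a plot.  The following is only a sketch: it assumes `age` and `sex` columns, uses 5-year bins labeled by their lower edge, and runs on invented rows.

```python
import pandas as pd


def age_bins_by_sex(df: pd.DataFrame, width: int = 5) -> pd.DataFrame:
    """Cross-tabulate fixed-width age bins (labeled by lower edge) against sex."""
    known = df.dropna(subset=["age"])  # passengers without a recorded age are left out
    bin_start = (known["age"] // width * width).astype(int).rename("age_bin_start")
    return pd.crosstab(bin_start, known["sex"])


# Hypothetical ages; the last passenger has no age and is excluded.
sample = pd.DataFrame({
    "age": [3, 7, 8, 12, 13, 14, 22, 24, None],
    "sex": ["male", "male", "female", "female", "female",
            "male", "male", "male", "female"],
})

table = age_bins_by_sex(sample)
print(table)
```

Each row of the table corresponds to one bar pair in the histogram, which makes it easy to spot bins where females equal or outnumber males.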

Titanic10.png
Titanic24.png

When it comes to fare, most passengers were clustered below roughly the 100-pound mark.  The median fare, which is robust to outliers and skew, was only about 14 and a half pounds, as we can see.  It is clear that despite the Titanic’s renowned luxury accommodations, the preponderance of passengers did not pay exorbitant rates for their travel.  This is likely due mainly to the large proportion of third class travelers aboard.
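To see why the median is the better summary for a skewed variable like fare, consider this small sketch.  The fares below are invented for illustration, and `fare_summary` is a hypothetical helper, not code from the original analysis.

```python
import pandas as pd


def fare_summary(fares: pd.Series) -> dict:
    """Median, mean, and share of fares under 100 for a series of fares."""
    return {
        "median": fares.median(),
        "mean": fares.mean(),
        "share_under_100": (fares < 100).mean(),
    }


# Invented fares with one very expensive ticket at the top.
fares = pd.Series([7.25, 7.9, 8.05, 13.0, 14.45, 26.0, 30.0, 80.0, 512.33])

stats = fare_summary(fares)
print(stats)
```

A single very high fare drags the mean well above what a typical passenger paid, while the median stays put.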

Titanic11.png

A look at the median fares for men, women, and children reveals that men typically paid a much lower fare than did women and children.  I drilled down to proportions of passenger classes among men, women, and children to further explore this.
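A sketch of both steps, the median fare per group and the class proportions within each group, might look like this.  The column names `who`, `pclass`, and `fare` are assumptions, as are the sample rows, which were chosen to show that a group can be more heavily third class and still have a higher median fare.

```python
import pandas as pd


def median_fare_by_group(df: pd.DataFrame) -> pd.Series:
    """Median fare for each value of `who`."""
    return df.groupby("who")["fare"].median()


def class_share_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Share of each passenger class within each `who` group (rows sum to 1)."""
    return pd.crosstab(df["who"], df["pclass"], normalize="index")


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "who": ["man", "man", "man", "man", "woman", "woman", "child", "child"],
    "pclass": [3, 3, 3, 1, 3, 1, 3, 3],
    "fare": [7.0, 8.0, 8.0, 50.0, 10.0, 80.0, 20.0, 30.0],
})

medians = median_fare_by_group(sample)
shares = class_share_by_group(sample)
print(medians)
print(shares)
```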

Titanic12.png
Titanic13.png

Among all three groups of people, the proportion traveling third class was higher than the proportions in the other two classes.  However, the actual third class proportions differed from each other.  There were more men traveling third class relative to all men in the dataset than there were third class women relative to all women, which explains the lower median fare for men.  However, there was an even greater proportion of third class children relative to all children than there was of third class men relative to the whole group of men, which would make one think that children should have a lower median fare than men.  Why men’s median fare was actually lower than children’s is not known.  It could be that fare took into account more than just the cost of accommodations, and things like amount of luggage were factored in.  If the children aboard had more baggage with them on average than men did, this might act to offset the low average cabin rates paid on their behalf.  Indeed, looking at median fares among men, women, and children within the various passenger classes shows that median rates for the same class differed between the three groups of people.  Without more information, these numbers are hard to fully explain.

Goodwinfamily.jpeg

The Goodwin family; all but baby Sidney pictured.  All eight family members died in the disaster.

We now turn our focus to the elephant in the room, survivorship.

Titanic14.png

The raw counts of survivors versus dead among the 891 people in this dataset show that more perished than survived, which was indeed true of the entire ship.

Examining survivorship filtered through some of the variables we explored above is fascinating.

Titanic15.png

We see that the vast majority of those who perished were men.  However, there are more men in our dataset than women and children, so it is helpful to look at the percentages of the different groups who died to see if any differentials exist in death rates.  In fact, we can already tell that the death rate for men was disproportionately high, given that the bar on our graph corresponding to dead men is much higher relative to dead women and children than the bar for living men is relative to living women and children.  It is still good, though, to obtain specific figures.

Titanic16.png

Some 84 percent of the men in our dataset died, whereas only 24 percent of women and 41 percent of children died; this represents a vast difference.  The “women and children first” ethic of rescue triage doubtless applied here.
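Death rates like these fall out of a one-line group average, since the mean of a 0/1 survival flag is the survival rate.  This sketch assumes a `survived` column coded 1 for survivors and 0 otherwise, plus the `who` column; the rows are invented.

```python
import pandas as pd


def death_rate_pct(df: pd.DataFrame, by: str = "who") -> pd.Series:
    """Percentage of each group that died, given a 0/1 `survived` column."""
    survival_rate = df.groupby(by)["survived"].mean()
    return (100 * (1 - survival_rate)).round(1)


# Hypothetical outcomes, not the real records.
sample = pd.DataFrame({
    "who": ["man"] * 4 + ["woman"] * 4 + ["child"] * 2,
    "survived": [0, 0, 0, 1, 1, 1, 1, 0, 1, 0],
})

rates = death_rate_pct(sample)
print(rates)
```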

It is interesting to see the noticeably higher mortality rate among children versus women.  Many children may have lacked the stamina necessary to survive such a taxing ordeal owing in large part to their developmental state, and perished despite their age group being a focal point of rescue efforts.

Looking at who survived and died with respect to passenger class is very telling.

Titanic17.png
StateRoom.jpeg

A first class stateroom on board the ship.

We can see, based on the normalized counts of living and dead in each passenger class, that third class passengers had a much greater probability of dying than surviving, and second class passengers had a slightly greater rate of death than survival.  Only among the first class passengers is the situation reversed, with those alive representing a noticeably greater proportion of the whole than those dead.  It is not hard to imagine why first class passengers were so much more likely to survive than their counterparts in the other classes.  Being VIPs, relatively speaking, they would likely have had the ear and attention of the ship’s crew and would have been catered to and prioritized during the emergency.  They also tended to have their berthing in the top three decks, which of course were the last to flood, and so would have had ample time to escape the foundering vessel.  The second class passengers’ greater probability of survival than that of the third class passengers follows similar logic.
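Normalized counts of this kind can be obtained by normalizing a cross-tabulation across each row, so that the shares within a class sum to one.  The sketch below assumes `pclass` and `survived` columns and uses invented rows.

```python
import pandas as pd


def outcome_share_by_class(df: pd.DataFrame) -> pd.DataFrame:
    """Share of each passenger class that died or survived (rows sum to 1)."""
    table = pd.crosstab(df["pclass"], df["survived"], normalize="index")
    return table.rename(columns={0: "died", 1: "survived"})


# Made-up outcomes for three classes.
sample = pd.DataFrame({
    "pclass": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "survived": [1, 1, 0, 1, 0, 0, 0, 0, 1],
})

by_class = outcome_share_by_class(sample)
print(by_class)
```

Normalizing within each class matters here because the classes differ so much in size; raw counts alone would mostly reflect how many third class passengers there were.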

Titanic18.png

If we keep our earlier assumption that age information for four-fifths of the dataset allows fairly, if not perfectly, accurate age-related calculations, we find that the average ages of the living and the dead were not vastly different from one another, whether measured by median or mean.  Age on the whole thus seems to be a rather poor predictor of survival.
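A comparison like this one can be sketched as a grouped aggregation that also reports how many ages were actually available in each group.  The `survived` and `age` column names and the sample values are assumptions.

```python
import pandas as pd


def age_by_survival(df: pd.DataFrame) -> pd.DataFrame:
    """Count of known ages, mean age, and median age for each outcome."""
    # count() ignores missing ages, so it shows how much data backs each average.
    return df.groupby("survived")["age"].agg(["count", "mean", "median"])


# Invented rows; one passenger who died has no recorded age.
sample = pd.DataFrame({
    "survived": [0, 0, 0, 1, 1, 1, 0],
    "age": [20.0, 30.0, 40.0, 10.0, 28.0, 46.0, None],
})

age_stats = age_by_survival(sample)
print(age_stats)
```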

Titanic19.png

Those surviving paid higher fares, on average, than those who died, which tracks with the differential survival by class that we witnessed before.

Since deck information is missing for most passengers, it is unlikely that most analyses involving this variable would be of much practical use to us.  I did make an attempt to look at fare with respect to deck, however.

A bit of research reveals that decks A, B, and C accommodated mainly first class passengers; the other decks had higher percentages of second class and third class passengers.  When doing this specific analysis, one might think that, despite all the passengers omitted, we would still obtain a rough echo of what we would find if everybody were included, since the respective configurations of the decks and cabin types are theoretically limiting factors, and would seem to preclude the possibility that the “sample” statistics would be heavily skewed versus those for the whole population.  A quick check of plots with median and mean fares for different decks can confirm if the analysis is likely to be even remotely usable.
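Before plotting, it can help to tabulate the same statistics alongside the number of passengers behind each one, since a deck with only a handful of recorded fares deserves less trust.  This is a sketch under assumed column names (`deck`, `fare`) with made-up rows.

```python
import pandas as pd


def fare_by_deck(df: pd.DataFrame) -> pd.DataFrame:
    """Count, median, and mean fare per deck, ignoring passengers with no deck."""
    known = df.dropna(subset=["deck"])
    # observed=True keeps only decks that actually appear if `deck` is categorical.
    return known.groupby("deck", observed=True)["fare"].agg(["count", "median", "mean"])


# Hypothetical rows; two passengers have no deck recorded.
sample = pd.DataFrame({
    "deck": ["A", "A", "B", "B", "B", None, None, "C"],
    "fare": [30.0, 40.0, 80.0, 100.0, 120.0, 8.0, 9.0, 90.0],
})

deck_stats = fare_by_deck(sample)
print(deck_stats)
```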

Adeck-promenade-deck.jpeg

The A-Deck Promenade.

Titanic20.png

Things look plausibly representative when it comes to decks B-G.  We can see that those staying on decks B and C had notably larger median fares than those on other decks, and the mean fares were even higher.  However, when it comes to deck A, we see a deviation from what we are expecting.  Deck A contained accommodations solely for first class passengers, and in fact had some of the most luxurious staterooms on the ship, so we would expect the mean and median fares for this deck to be comparable to those for decks B and C, if not higher.  This is not the case, however.  The mean and median fares for this deck come out much lower than those for B and C, and are actually more comparable to fares for decks D-G.  One plausible explanation is that lodging data for those staying in the deck A staterooms was mostly omitted, with people in the smaller rooms on that deck having their accommodation info disproportionately recorded, thus giving an inaccurate picture of the fares for this level of the ship.

It is difficult to say why so much lodging information is missing, and whether or not there were systematic factors at play.  Again, the deck designation for those with staterooms on deck A, in particular, seems to be mostly absent from these records.  Given the relative paucity of lodging info, any attempt to glean real insights related to this variable will probably prove abortive.  Perhaps the missing information can one day be entered into a revised version of this dataset.

Titanic-lifeboat.gif

One of the Titanic's lifeboats, full of escaping passengers.

©2021 by Mike Kane. Proudly created with Wix.com