Titanic Solution Report

Predict survival on the Titanic using Excel, Python, R & Random Forests

1.Description

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.

One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.

In this challenge, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy.
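
The claim that some groups survived at higher rates can be checked by grouping on a column and averaging the 0/1 outcome. The snippet below is only a sketch: the `toy` rows are invented for illustration and are not taken from train.csv, and the column names are assumed to match the ones used in the code in section 3.

```python
import pandas as pd

# Toy rows invented for illustration only; they are NOT taken from train.csv.
toy = pd.DataFrame({
    "Survived": [1, 1, 0, 0, 1, 0, 0, 1],
    "Sex": ["female", "female", "male", "male", "female", "male", "female", "male"],
    "Pclass": [1, 2, 3, 3, 3, 1, 3, 1],
})

# The mean of a 0/1 column is the fraction of rows equal to 1,
# so grouping and averaging gives a survival rate per group.
rate_by_sex = toy.groupby("Sex")["Survived"].mean()
rate_by_class = toy.groupby("Pclass")["Survived"].mean()

print(rate_by_sex)
print(rate_by_class)
```

Running the same two `groupby` lines on `data_train` instead of `toy` should give the rates for the real training set.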

2.DataSet

VARIABLE DESCRIPTIONS:
survival    Survival (0 = No; 1 = Yes)
pclass      Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
name        Name
sex         Sex
age         Age
sibsp       Number of Siblings/Spouses Aboard
parch       Number of Parents/Children Aboard
ticket      Ticket Number
fare        Passenger Fare
cabin       Cabin
embarked    Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
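
Before plotting, it can help to see which columns have gaps and to expand the coded values listed above. The following is a minimal sketch on invented rows, not on the real file. The capitalised column names are an assumption based on how the code in section 3 accesses them, and `EmbarkedPort` is a hypothetical column name introduced here.

```python
import pandas as pd

# Toy rows invented for illustration; they are NOT taken from train.csv.
toy = pd.DataFrame({
    "Pclass": [1, 3, 2, 3],
    "Age": [38.0, None, 26.0, None],
    "Cabin": ["C85", None, None, None],
    "Embarked": ["C", "S", "Q", "S"],
})

# Count missing values per column.
missing = toy.isnull().sum()
print(missing)

# Expand the one-letter port codes using the mapping from the variable list.
ports = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}
toy["EmbarkedPort"] = toy["Embarked"].map(ports)
print(toy[["Embarked", "EmbarkedPort"]])
```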

3.Code(Python)


# Imports
import pandas as pd
import numpy as np
from pandas import Series,DataFrame

data_train = pd.read_csv("train.csv")
# Uncomment to inspect the columns, dtypes and summary statistics.
# print(data_train.columns)
# data_train.info()
# print(data_train.describe())

import matplotlib.pyplot as plt
fig = plt.figure()
fig.set(alpha=0.3)

plt.subplot2grid((2,3), (0,0))
data_train.Survived.value_counts().plot(kind='bar')
plt.title(u"Survival (1 = survived)")
plt.ylabel(u'Count')
#plt.show()

plt.subplot2grid((2,3), (0,1))
data_train.Pclass.value_counts().plot(kind='bar')
plt.ylabel(u'Count')
plt.title(u'Pclass')

plt.subplot2grid((2,3), (0,2))
plt.scatter(data_train.Survived, data_train.Age)
plt.ylabel(u'Age')
# Pass the on/off flag positionally: the b= keyword was renamed to visible= in newer matplotlib.
plt.grid(True, which='major', axis='y')
plt.title(u'Age by survival (1 = survived)')

plt.subplot2grid((2,3), (1,0), colspan=2)
data_train.Age[data_train.Pclass == 1].plot(kind='kde')
data_train.Age[data_train.Pclass == 2].plot(kind='kde')
data_train.Age[data_train.Pclass == 3].plot(kind='kde')
plt.xlabel(u'Age')
plt.ylabel(u'Density')
plt.title(u'Age distribution by Pclass')
plt.legend((u'class_1', u'class_2',u'class_3'), loc='best')

plt.subplot2grid((2,3), (1, 2))
data_train.Embarked.value_counts().plot(kind='bar')
plt.title(u'Count of Embarked')
plt.ylabel(u'Count')
#plt.show()

fig = plt.figure()
fig.set(alpha=0.2)

# Count passengers in each class, split by outcome.
Survived_0 = data_train.Pclass[data_train.Survived == 0].value_counts()
Survived_1 = data_train.Pclass[data_train.Survived == 1].value_counts()
df = pd.DataFrame({u'Survived':Survived_1, u'Unsurvived':Survived_0})
# Draw on the figure created above; without ax=, df.plot() opens a new figure and leaves this one empty.
df.plot(kind='bar', stacked=True, ax=fig.gca())
plt.title(u'Survival by Pclass')
plt.xlabel(u'Pclass')
plt.ylabel(u'Count')
#plt.show()

fig = plt.figure()
fig.set(alpha=0.2)
# Count passengers at each port of embarkation, split by outcome.
Survived_0 = data_train.Embarked[data_train.Survived == 0].value_counts()
Survived_1 = data_train.Embarked[data_train.Survived == 1].value_counts()
df = pd.DataFrame({u'Survived':Survived_1, u'Unsurvived':Survived_0})
# Draw on the figure created above; without ax=, df.plot() opens a new figure and leaves this one empty.
df.plot(kind='bar', stacked=True, ax=fig.gca())
plt.title(u'Survival by Embarked')
plt.xlabel(u'Embarked')
plt.ylabel(u'Count')
#plt.show()

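fig = plt.figure()
fig.set(alpha=0.2)
# Count each outcome (0 or 1) separately for men and women.
Survived_m = data_train.Survived[data_train.Sex == 'male'].value_counts()
Survived_f = data_train.Survived[data_train.Sex == 'female'].value_counts()
df = pd.DataFrame({u'male':Survived_m, u'female':Survived_f})
# Draw on the figure created above; without ax=, df.plot() opens a new figure and leaves this one empty.
df.plot(kind='bar', stacked=True, ax=fig.gca())
plt.title('Survival by Sex')
# The x axis holds the Survived value (0 or 1); the bar height is the count.
plt.xlabel('Survived')
plt.ylabel('Count')
plt.show()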

fig = plt.figure()
fig.set(alpha=0.65)
# suptitle labels the whole figure; plt.title() here would attach the title to an implicit axes instead.
fig.suptitle(u'Survival by Pclass and Sex')

ax1 = fig.add_subplot(141)
# sort_index() orders the bars as 0 (Unsurvived), 1 (Survived), so the tick labels below always match.
data_train.Survived[(data_train.Sex == 'female') & (data_train.Pclass != 3)].value_counts().sort_index().plot(kind='bar', ax=ax1, label="female highclass", color='#FA2479')
ax1.set_xticklabels([u'Unsurvived', u'Survived'], rotation=0)
ax1.legend([u'female/highclass'], loc='best')

ax2 = fig.add_subplot(142, sharey=ax1)
data_train.Survived[(data_train.Sex == 'female') & (data_train.Pclass == 3)].value_counts().sort_index().plot(kind='bar', ax=ax2, label='female, low class', color='pink')
ax2.set_xticklabels([u"Unsurvived", u"Survived"], rotation=0)
ax2.legend([u"female/lowclass"], loc='best')
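
The subplots above assign tick labels by position, so they depend on the order in which `value_counts()` returns the outcomes, and that order follows frequency rather than the 0/1 label. One possible way to make the order explicit is sketched below. `survival_counts` is a hypothetical helper, not part of the original code, and the `toy` rows are invented rather than taken from train.csv.

```python
import pandas as pd

# Toy rows invented for illustration only; they are NOT taken from train.csv.
toy = pd.DataFrame({
    "Survived": [1, 1, 0, 1, 0, 0, 1, 0],
    "Sex": ["female", "female", "female", "female", "male", "male", "male", "male"],
    "Pclass": [1, 2, 3, 3, 1, 3, 2, 3],
})


def survival_counts(frame, sex, high_class):
    """Return counts indexed [0, 1] (died, survived) for one subgroup."""
    in_class = (frame.Pclass != 3) if high_class else (frame.Pclass == 3)
    mask = (frame.Sex == sex) & in_class
    # reindex fixes the order to 0, 1 and fills absent outcomes with 0,
    # so tick labels can be assigned without depending on frequency order.
    return frame.Survived[mask].value_counts().reindex([0, 1], fill_value=0)


female_high = survival_counts(toy, "female", high_class=True)
male_low = survival_counts(toy, "male", high_class=False)
print(female_high.tolist())
print(male_low.tolist())
```

With counts in a fixed order, the same pair of labels, `[u'Unsurvived', u'Survived']`, can be used for every subplot.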
