Code
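%matplotlib inline
import matplotlib
import pandas as pd

# Load only the columns needed for the plot.
df = pd.read_csv('/Users/yss/Downloads/Most-Recent-Cohorts-All-Data-Elements.csv', usecols=['INSTNM', 'REGION', 'ADM_RATE', 'SAT_AVG', 'COSTT4_A'])
savedf = df  # keep a reference to the unfiltered data

# Keep rows with a positive admission rate and average SAT score
# (comparisons with NaN are False, so missing values are dropped too).
# .copy() makes the later column assignments act on an independent frame.
df = df[(df.ADM_RATE > 0) & (df.SAT_AVG > 0)].copy()

def sat(sat):
    # Scale the average SAT score down by a factor of 1000.
    try:
        t = sat / 1000
    except TypeError:  # division raises TypeError, not ValueError, on non-numeric input
        t = 0
    return t

def expense(tuition):
    # Scale the cost figure down by a factor of 50000.
    try:
        t = tuition / 50000
    except TypeError:
        t = 0
    return t

# Select columns by name: usecols does not control column order, so positions are fragile.
df['SAT_AVG'] = df['SAT_AVG'].apply(sat)
df['COSTT4_A'] = df['COSTT4_A'].apply(expense)

x = df[['REGION', 'SAT_AVG', 'ADM_RATE', 'COSTT4_A']]
y = x.set_index('REGION')
z = y.groupby('REGION').mean()
z.plot.bar(stacked=True)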
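Code walkthrough
Each part of the code above is explained below.
First, pandas reads the columns we specify from the CSV file into a DataFrame, and we then filter it to keep only the rows with valid data.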
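df = pd.read_csv('/Users/yss/Downloads/Most-Recent-Cohorts-All-Data-Elements.csv', usecols=['INSTNM', 'REGION', 'ADM_RATE', 'SAT_AVG', 'COSTT4_A'])
savedf = df  # keep a reference to the unfiltered data
# Keep rows with a positive admission rate and average SAT score.
# .copy() makes the later column assignments act on an independent frame.
df = df[(df.ADM_RATE > 0) & (df.SAT_AVG > 0)].copy()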
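To see what the read-and-filter step keeps, here is a minimal sketch on a small in-memory CSV. The sample rows and the extra CITY column are made up for illustration; only the five column names come from the code above.

```python
import io
import pandas as pd

# Hypothetical miniature of the CSV; the real file has many more columns.
csv_text = """INSTNM,CITY,REGION,ADM_RATE,SAT_AVG,COSTT4_A
College A,Town1,1,0.5,1100,30000
College B,Town2,1,,1200,20000
College C,Town3,2,0.0,1000,25000
College D,Town4,2,0.8,,40000
College E,Town5,2,0.3,900,15000
"""

cols = ['INSTNM', 'REGION', 'ADM_RATE', 'SAT_AVG', 'COSTT4_A']
raw = pd.read_csv(io.StringIO(csv_text), usecols=cols)

# Comparisons against NaN evaluate to False, so rows with a missing
# admission rate or SAT score are dropped along with zero values.
mask = (raw['ADM_RATE'] > 0) & (raw['SAT_AVG'] > 0)
clean = raw[mask].copy()
kept = clean['INSTNM'].tolist()
```

Here `kept` contains only College A and College E: B and D have a missing value, and C has an admission rate of 0.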
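Next we define two functions that rescale the data: SAT_AVG is divided by 1000 and COSTT4_A by 50000, so that both land on roughly the same order of magnitude as ADM_RATE (a rate between 0 and 1). This keeps any one column from dominating the chart and makes it easier to read.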
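def sat(sat):
    # Scale the average SAT score down by a factor of 1000.
    try:
        t = sat / 1000
    except TypeError:  # division raises TypeError, not ValueError, on non-numeric input
        t = 0
    return t

def expense(tuition):
    # Scale the cost figure down by a factor of 50000.
    try:
        t = tuition / 50000
    except TypeError:
        t = 0
    return t

# Select columns by name: usecols does not control column order, so positions are fragile.
df['SAT_AVG'] = df['SAT_AVG'].apply(sat)
df['COSTT4_A'] = df['COSTT4_A'].apply(expense)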
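The same rescaling can also be written as plain column arithmetic, without helper functions. This is a sketch of an alternative, not the approach used in the post; the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample values, standing in for the filtered data.
df = pd.DataFrame({
    'SAT_AVG': [1000.0, 1250.0, 900.0],
    'COSTT4_A': [25000.0, 50000.0, 10000.0],
})

# Dividing a column by a scalar is applied element-wise.
scaled = df.assign(
    SAT_AVG=df['SAT_AVG'] / 1000,
    COSTT4_A=df['COSTT4_A'] / 50000,
)

# The element-wise apply() form gives the same values.
same = scaled['SAT_AVG'].equals(df['SAT_AVG'].apply(lambda v: v / 1000))

sat_scaled = scaled['SAT_AVG'].tolist()
cost_scaled = scaled['COSTT4_A'].tolist()
```

`assign` returns a new DataFrame, so the original values stay available for comparison.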
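Once that is done, we set the REGION field as the index, which also becomes the x-axis variable of the chart. We group by REGION, take the mean of each group, and finally draw the chart with z.plot.bar(stacked=True).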
x = df[['REGION', 'SAT_AVG', 'ADM_RATE', 'COSTT4_A']]
y = x.set_index('REGION')
z = y.groupby('REGION').mean()
z.plot.bar(stacked=True)
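The grouping step can be checked on a few hand-made rows. In this sketch the values are hypothetical and already scaled. It groups directly on the REGION column, which should give the same result as setting the index first. The "Agg" backend is an assumption made so the example runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, replaces %matplotlib inline
import pandas as pd

# Hypothetical, already-scaled rows for two regions.
df = pd.DataFrame({
    'REGION': [1, 1, 2, 2],
    'SAT_AVG': [1.0, 1.2, 0.9, 1.1],
    'ADM_RATE': [0.5, 0.7, 0.2, 0.4],
    'COSTT4_A': [0.4, 0.6, 1.0, 1.2],
})

# One row per region, one column per metric, each cell a group mean.
z = df.groupby('REGION')[['SAT_AVG', 'ADM_RATE', 'COSTT4_A']].mean()

# Each region becomes one bar, with the three means stacked on top of each other.
ax = z.plot.bar(stacked=True)
ax.set_ylabel('mean of scaled values')
```

Because the bars are stacked, the total height of each bar is the sum of the three means for that region, so the chart is best read segment by segment rather than by total height.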
Problems encountered
- In a virtual environment created with conda, after installing matplotlib with pip, the following exception was raised:
Traceback (most recent call last):
File "mp.py", line 2, in <module>
import matplotlib.pyplot as plt
File "/Users/yss/.anaconda/anaconda3/envs/py3.4/lib/python3.4/site-packages/matplotlib/pyplot.py", line 115, in <module>
_backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
File "/Users/yss/.anaconda/anaconda3/envs/py3.4/lib/python3.4/site-packages/matplotlib/backends/__init__.py", line 62, in pylab_setup
[backend_name], 0)
File "/Users/yss/.anaconda/anaconda3/envs/py3.4/lib/python3.4/site-packages/matplotlib/backends/backend_macosx.py", line 17, in <module>
from matplotlib.backends import _macosx
RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'. See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.
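The fix suggested on Stack Overflow resolves it.
Specifically, run the following shell command, which sets the default backend to TkAgg instead of the macOS one: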
echo backend: TkAgg > ~/.matplotlib/matplotlibrc
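A backend can also be chosen per script instead of through matplotlibrc, by calling matplotlib.use before pyplot is imported. The sketch below uses "Agg" (a non-interactive backend that renders to files) rather than TkAgg so that it runs without a GUI toolkit. That substitution is an assumption for illustration, not part of the original fix.

```python
import matplotlib

# "Agg" stands in for "TkAgg" here so the sketch needs no GUI toolkit.
# The call has to come before pyplot is imported.
matplotlib.use("Agg")

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["a", "b"], [1, 2])

# Report which backend is active for this process.
backend_name = matplotlib.get_backend()
```

Unlike the matplotlibrc change, this only affects the script that makes the call.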