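I keep seeing discussions online comparing Python 2 and Python 3, and my company has recently been considering whether to upgrade our spark-python big-data development environment to Python 3. This post records how Python 2.7.13 and Python 3.5.3 compare in several respects.
Environment setup
We continue with the environment configured in the previous posts.
Because the goal is to compare Python 2 and Python 3 side by side, simply upgrading the system Python is not enough. I use two tools, pyenv and virtualenv, to build isolated test environments.
References: "Setting up Python virtual environments with pyenv and virtualenv" and "Using pyenv to install multiple Python versions on one system"
The setup steps are as follows:
- Start by installing Tkinter; otherwise you will have to redo the later steps. Don't ask me how I know...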
sudo yum install tkinter -y
sudo yum install tk-devel tcl-devel -y
- Install the packages pyenv depends on
sudo yum install readline readline-devel readline-static -y
sudo yum install openssl openssl-devel openssl-static -y
sudo yum install sqlite-devel -y
sudo yum install bzip2-devel bzip2-libs -y
- Download and install pyenv, then install Python 2.7.13 and Python 3.5.3
git clone https://github.com/yyuu/pyenv.git ~/.pyenv
chmod -R 777 ~/.pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile
echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
exec $SHELL
source ~/.bash_profile
pyenv install --list
pyenv install -v 2.7.13
pyenv install -v 3.5.3
- Download and install pyenv-virtualenv, then create the two isolated environments
git clone https://github.com/yyuu/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv
echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bash_profile
source ~/.bash_profile
pyenv virtualenv 2.7.13 py2
pyenv virtualenv 3.5.3 py3
That is essentially all it takes to get two isolated Python environments. The test below shows the active Python switching from the CentOS 7 default of 2.7.5 to 2.7.13, and then to 3.5.
[kejun@localhost ~]$ python -V
Python 2.7.5
[kejun@localhost ~]$ pyenv activate py2
(py2) [kejun@localhost ~]$ python -V
Python 2.7.13
(py2) [kejun@localhost ~]$ pyenv deactivate
[kejun@localhost ~]$ pyenv activate py3
(py3) [kejun@localhost ~]$ python -V
Python 3.5.
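The same check can also be made from inside the interpreter, which is useful when a script needs to know which environment it is running in. The following is only a minimal sketch; the helper name `describe_interpreter` is hypothetical and not part of the original setup.

```python
import sys


def describe_interpreter(version_info=None):
    """Return a ('Python2' or 'Python3', 'major.minor.micro') pair."""
    # Default to the interpreter that is running this script.
    if version_info is None:
        version_info = sys.version_info
    major, minor, micro = version_info[0], version_info[1], version_info[2]
    label = "Python%d" % major
    return label, "%d.%d.%d" % (major, minor, micro)


if __name__ == "__main__":
    # Run once inside py2 and once inside py3 to compare the output.
    print("%s (%s)" % describe_interpreter())
```

Comparing `sys.version_info` fields is generally more robust than searching the free-form `sys.version` string, because that string also contains compiler and build details.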
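Detailed tests:
We installed the third-party packages commonly used for data analysis and ran both an installation test and a sample test for each. The sample test script is listed at the end of this post.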
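| Category | Tool | Purpose |
|---|---|---|
| Data collection | scrapy | Web scraping and crawling |
| Data collection | scrapy-redis | Distributed crawling |
| Data collection | selenium | Web testing, browser automation |
| Data processing | beautifulsoup | HTML parsing library; can use lxml as its parser |
| Data processing | lxml | XML parsing library |
| Data processing | xlrd | Reading Excel files |
| Data processing | xlwt | Writing Excel files |
| Data processing | xlutils | Simple format changes to Excel files |
| Data processing | pywin32 | Reading and writing Excel files with complex custom formatting |
| Data processing | Python-docx | Reading and writing Word files |
| Data analysis | numpy | Matrix-based numerical computing library |
| Data analysis | pandas | Table-based statistical analysis library |
| Data analysis | scipy | Scientific computing library with support for higher-level abstractions and complex models |
| Data analysis | statsmodels | Statistical modeling and econometrics toolkit |
| Data analysis | scikit-learn | Machine learning library |
| Data analysis | gensim | Natural language processing library |
| Data analysis | jieba | Chinese word segmentation library |
| Data storage | MySQL-python | MySQL read/write interface library |
| Data storage | mysqlclient | MySQL read/write interface library |
| Data storage | SQLAlchemy | ORM layer for databases |
| Data storage | pymssql | SQL Server read/write interface library |
| Data storage | redis | Redis read/write interface |
| Data storage | PyMongo | MongoDB read/write interface |
| Data presentation | matplotlib | Popular data visualization library |
| Data presentation | seaborn | Attractive data visualization library built on matplotlib |
| Utilities | jupyter | Web-based Python IDE, commonly used for data analysis |
| Utilities | chardet | Character encoding detection |
| Utilities | ConfigParser | Reading and writing configuration files |
| Utilities | requests | HTTP library for network access |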
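Before running the sample tests, the installation test can be approximated by simply trying to import each package. The sketch below is an illustration rather than the script used for this post; the helper `check_imports` is a hypothetical name, and the module list covers only part of the table. Note that some import names differ from the pip package names.

```python
import importlib


def check_imports(module_names):
    """Try to import each module and record whether it succeeded."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:
            # Catch broadly: a half-installed package may fail with
            # something other than ImportError.
            results[name] = False
    return results


if __name__ == "__main__":
    # Import names, not pip names: beautifulsoup -> bs4,
    # scikit-learn -> sklearn, PyMongo -> pymongo.
    modules = ["scrapy", "selenium", "bs4", "lxml", "numpy", "pandas",
               "scipy", "sklearn", "jieba", "redis", "pymongo", "requests"]
    for name, ok in sorted(check_imports(modules).items()):
        print("%-12s %s" % (name, "ok" if ok else "MISSING"))
```

Running it once in `py2` and once in `py3` gives a quick picture of which packages installed cleanly in each environment.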
# encoding=utf-8
import sys
import platform
import traceback
import gc
import ctypes
STD_OUTPUT_HANDLE = -11
FOREGROUND_BLACK = 0x0
FOREGROUND_BLUE = 0x01  # text color contains blue.
FOREGROUND_GREEN = 0x02  # text color contains green.
FOREGROUND_RED = 0x04  # text color contains red.
FOREGROUND_INTENSITY = 0x08  # text color is intensified.
class WinPrint:
    """
    Prints colored text on the Windows console.
    """

    def __init__(self):
        # Resolve the console handle here rather than in the class body:
        # ctypes.windll exists only on Windows, so touching it at class
        # definition time would fail on CentOS.
        self.std_out_handle = ctypes.windll.kernel32.GetStdHandle(STD_OUTPUT_HANDLE)

    def set_cmd_color(self, color, handle=None):
        if handle is None:
            handle = self.std_out_handle
        return ctypes.windll.kernel32.SetConsoleTextAttribute(handle, color)

    def reset_color(self):
        self.set_cmd_color(FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE)

    def print_red_text(self, print_text):
        self.set_cmd_color(FOREGROUND_RED | FOREGROUND_INTENSITY)
        print(print_text)
        self.reset_color()

    def print_green_text(self, print_text):
        self.set_cmd_color(FOREGROUND_GREEN | FOREGROUND_INTENSITY)
        print(print_text)
        self.reset_color()
class UnixPrint:
    """
    Prints colored text on CentOS using ANSI escape codes.
    """

    def print_red_text(self, print_text):
        print('\033[1;31m%s\033[0m' % print_text)

    def print_green_text(self, print_text):
        print('\033[1;32m%s\033[0m' % print_text)
py_env = "Python2" if sys.version_info[0] == 2 else "Python3"
sys_ver = "Windows" if platform.system().find("indows") > -1 else "Centos"
my_print = WinPrint() if platform.system().find("indows") > -1 else UnixPrint()
def check(sys_ver, py_env):
    """
    Decorator that gives every test the same pass/fail output.
    It also exercises a decorator that takes arguments; the arguments are not strictly required.
    """
    def _check(func):
        def __check():
            try:
                func()
                my_print.print_green_text(
                    "[%s,%s]: %s pass." % (sys_ver, py_env, func.__name__))
            except Exception:
                traceback.print_exc()
                my_print.print_red_text(
                    "[%s,%s]: %s fail." % (sys_ver, py_env, func.__name__))
        return __check
    return _check
def make_requirement(filepath, filename):
    """
    Strip version pins from a pip requirements file.
    """
    import os
    result = []
    with open(os.path.join(filepath, filename), "r") as f:
        data = f.readlines()
    for line in data:
        # readlines() keeps the trailing newline; strip it before re-adding one
        line = line.rstrip("\n")
        if line.find("==") > -1:
            result.append(line.split("==")[0] + "\n")
        else:
            result.append(line + "\n")
    with open(os.path.join(filepath, filename.split(".")[0] + "-clean.txt"),
              "w") as f1:
        f1.writelines(result)
@check(sys_ver, py_env)
def test_scrapy():
    from scrapy import signals
    from selenium import webdriver
    from scrapy.http import HtmlResponse
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
    from selenium.webdriver.common.keys import Keys
    from selenium.common.exceptions import NoSuchElementException
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

@check(sys_ver, py_env)
def test_matplotlib():
    import matplotlib.pyplot as plt
    l = [1, 2, 3, 4, 5]
    h = [20, 14, 38, 27, 9]
    w = [0.1, 0.2, 0.3, 0.4, 0.5]
    b = [1, 2, 3, 4, 5]
    fig = plt.figure()
    ax = fig.add_subplot(111)
    rects = ax.bar(l, h, w, b)
    # plt.show()

@check(sys_ver, py_env)
def test_beautifulSoup():
    from bs4 import BeautifulSoup
    html_str = "<html><meta/><head><title>Hello</title></head><body onload=crash()>Hi all<p></html>"
    soup = BeautifulSoup(html_str, "lxml")
    # print (soup.get_text())

@check(sys_ver, py_env)
def test_lxml():
    from lxml import html
    html_str = "<html><meta/><head><title>Hello</title></head><body onload=crash()>Hi all<p></html>"
    html.fromstring(html_str)

@check(sys_ver, py_env)
def test_xls():
    import xlrd
    import xlwt
    from xlutils.copy import copy
    excel_book2 = xlwt.Workbook()
    del excel_book2
    excel_book1 = xlrd.open_workbook("1.xlsx")
    del excel_book1
    import docx
    doc = docx.Document("1.docx")
    # print (doc)
    del doc
    gc.collect()
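@check(sys_ver, py_env)
def test_data_analysis():
    import pandas as pd
    import numpy as np
    data_list = np.array([x for x in range(100)])
    data_serial = pd.Series(data_list)
    # print (data_serial)
    # Import fft from scipy.fftpack: in newer SciPy releases scipy.fft is a
    # module, so "from scipy import fft" no longer yields a callable.
    from scipy.fftpack import fft
    b = fft(data_list)
    # print (b)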
@check(sys_ver, py_env)
def test_statsmodels():
    import statsmodels.api as sm
    data = sm.datasets.spector.load()
    data.exog = sm.add_constant(data.exog, prepend=False)
    # print data.exog

@check(sys_ver, py_env)
def test_sklearn():
    from sklearn import datasets
    iris = datasets.load_iris()
    data = iris.data
    # print(data.shape)

@check(sys_ver, py_env)
def test_gensim():
    import warnings
    warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
    from gensim import corpora
    from collections import defaultdict
    documents = ["Human machine interface for lab abc computer applications",
                 "A survey of user opinion of computer system response time",
                 "The EPS user interface management system",
                 "System and human system engineering testing of EPS",
                 "Relation of user perceived response time to error measurement",
                 "The generation of random binary unordered trees",
                 "The intersection graph of paths in trees",
                 "Graph minors IV Widths of trees and well quasi ordering",
                 "Graph minors A survey"]
    stoplist = set('for a of the and to in'.split())
    texts = [[word for word in document.lower().split() if word not in stoplist]
             for document in documents]
    frequency = defaultdict(int)
    for text in texts:
        for token in text:
            frequency[token] += 1
    texts = [[token for token in text if frequency[token] > 1]
             for text in texts]
    dictionary = corpora.Dictionary(texts)
    dictionary.save('deerwester.dict')
@check(sys_ver, py_env)
def test_jieba():
    import jieba
    seg_list = jieba.cut("我來到了北京參觀天安門。", cut_all=False)
    # print("Default Mode: " + "/ ".join(seg_list))  # accurate mode

@check(sys_ver, py_env)
def test_mysql():
    import MySQLdb as mysql
    # Test the pet_shop connection
    db = mysql.connect(host="xx", user="yy", passwd="12345678", db="zz")
    cur = db.cursor()
    sql = "select id from role;"
    cur.execute(sql)
    result = cur.fetchall()
    db.close()
    # print (result)
@check(sys_ver, py_env)
def test_SQLAlchemy():
    from sqlalchemy import Column, String, create_engine, Integer
    from sqlalchemy.orm import sessionmaker
    from sqlalchemy.ext.declarative import declarative_base
    engine = create_engine('mysql://xxx/yy', echo=False)
    DBSession = sessionmaker(bind=engine)
    Base = declarative_base()

    class rule(Base):
        __tablename__ = "role"
        id = Column(Integer, primary_key=True, autoincrement=True)
        role_name = Column(String(100))
        role_desc = Column(String(255))

    new_rule = rule(role_name="test_sqlalchemy", role_desc="forP2&P3")
    session = DBSession()
    session.add(new_rule)
    session.commit()
    session.close()

@check(sys_ver, py_env)
def test_redis():
    import redis
    pool = redis.Redis(host='127.0.0.1', port=6379)

@check(sys_ver, py_env)
def test_requests():
    import requests
    r = requests.get(url="http://www.cnblogs.com/kendrick/")
    # print (r.status_code)

@check(sys_ver, py_env)
def test_PyMongo():
    from pymongo import MongoClient
    conn = MongoClient("localhost", 27017)
if __name__ == "__main__":
    print("[%s,%s] start checking..." % (sys_ver, py_env))
    test_scrapy()
    test_beautifulSoup()
    test_lxml()
    test_matplotlib()
    test_xls()
    test_data_analysis()
    test_sklearn()
    test_mysql()
    test_SQLAlchemy()
    test_PyMongo()
    test_gensim()
    test_jieba()
    test_redis()
    test_requests()
    test_statsmodels()
    print("[%s,%s] finish checking." % (sys_ver, py_env))
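The `check` decorator in the script is a decorator that takes arguments: the outer function receives the arguments and returns the real decorator, which in turn wraps the test function. Here is a minimal standalone sketch of that pattern. The names `labelled`, `ok_test`, and `bad_test` are hypothetical, and both `functools.wraps` and the boolean return value are additions that the script above does not have.

```python
import functools


def labelled(sys_ver, py_env):
    """Decorator factory: returns a decorator bound to the two labels."""
    def decorator(func):
        @functools.wraps(func)  # keep func.__name__ on the wrapper
        def wrapper(*args, **kwargs):
            prefix = "[%s,%s]: %s" % (sys_ver, py_env, func.__name__)
            try:
                func(*args, **kwargs)
            except Exception:
                print(prefix + " fail.")
                return False
            print(prefix + " pass.")
            return True
        return wrapper
    return decorator


@labelled("Centos", "Python3")
def ok_test():
    return 1 + 1


@labelled("Centos", "Python3")
def bad_test():
    raise ValueError("simulated failure")


if __name__ == "__main__":
    ok_test()   # prints "[Centos,Python3]: ok_test pass."
    bad_test()  # prints "[Centos,Python3]: bad_test fail."
```

Returning a boolean from the wrapper is one possible design choice: it would let the main block count failures instead of relying only on the colored output.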