Reading and Writing Text Data

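Read and write text data in a variety of encodings.
Use open() with mode rt to read a text file.
To write a text file, use open() with mode wt.
To append to an existing file, use open() with mode at.
When no encoding is given, open() uses the platform's default encoding (the locale's preferred encoding, unless UTF-8 mode is enabled), so it is safest to pass encoding='utf-8' explicitly when reading.
sys.getdefaultencoding() reports the default encoding Python uses for strings (always 'utf-8' in Python 3), which is not necessarily the encoding open() falls back to: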

>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
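The distinction between the string default and the encoding that open() falls back to can be checked directly. This is a minimal sketch; the helper name describe_encodings is hypothetical and only used for illustration.

```python
import locale
import sys

def describe_encodings():
    """Return Python's string default encoding and the locale's preferred encoding."""
    return {
        # Always 'utf-8' in Python 3
        "string_default": sys.getdefaultencoding(),
        # What open() uses when no encoding is passed (outside UTF-8 mode)
        "open_default": locale.getpreferredencoding(False),
    }

info = describe_encodings()
print(info)
```

The second value depends on the machine, which is why passing encoding explicitly is the more portable choice.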

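Prefer the with statement when working with text files, so the file is closed automatically. Passing newline='' turns off newline translation: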

with open('somefile.txt', 'rt', newline='') as f:
    pass

If you know the file's encoding, pass it explicitly through the encoding argument:

with open('somefile.txt', 'rt', encoding='latin-1') as f:
    pass
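The three text modes mentioned above (wt, at, rt) can be combined in one round trip. The sketch below assumes a throwaway file inside a temporary directory; the function name write_append_read is hypothetical.

```python
import os
import tempfile

def write_append_read(path):
    """Write, append to, then read back a text file with an explicit encoding."""
    with open(path, 'wt', encoding='utf-8') as f:
        f.write('first line\n')
    # 'at' keeps the existing content and adds to the end
    with open(path, 'at', encoding='utf-8') as f:
        f.write('second line\n')
    with open(path, 'rt', encoding='utf-8') as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmpdir:
    text = write_append_read(os.path.join(tmpdir, 'somefile.txt'))
print(repr(text))
```

Using the same encoding for every open() call keeps the result independent of the platform default.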

Printing Output to a File

Pass the file keyword argument to print():

def print_out():
    with open('test.txt', 'wt') as f:
        print("Hello World", file=f)

Printing with a Different Separator or Line Ending

Use the sep and end keyword arguments of print():

>>> print('aaa', 90, 51, sep=',', end='!!!')
aaa,90,51!!!
>>> # Suppress the newline after each item
>>> for i in range(5):
...     print(i, end=' ')
...
0 1 2 3 4
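The file, sep, and end arguments can be used together. As a small sketch, the output below is captured in an in-memory io.StringIO buffer (an assumption made here so the result can be inspected; any writable text file works the same way).

```python
import io

buf = io.StringIO()
# Custom separator and line ending, redirected to the buffer
print('aaa', 90, 51, sep=',', end='!!!', file=buf)
# Unpacking the range prints all items in one call, separated by spaces
print(*range(5), sep=' ', file=buf)
captured = buf.getvalue()
print(repr(captured))
```

Unpacking with * is an alternative to the loop with end=' ' and avoids the trailing space.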

Reading and Writing Binary Data

Use open() with mode rb or wb to read or write binary data.
Indexing or iterating over a byte string yields integer byte values, whereas a text string yields one-character strings:

>>> t = "hello world"
>>> t[0]
'h'
>>> t = b"hello world"
>>> t[0]
104
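The same difference shows up when iterating, and slicing is a simple way to get a bytes object back instead of an integer. A minimal sketch:

```python
text = "hello"
data = b"hello"

text_items = list(text)   # one-character strings
byte_items = list(data)   # integers in range(256)
first_byte = data[0:1]    # slicing keeps the bytes type

print(text_items)
print(byte_items)
print(first_byte)
```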

To read or write text through a file opened in binary mode, decode or encode it explicitly:

def wr_encode():
    with open('test.bin', 'wb') as f:
        # Encode the text to bytes before writing
        f.write('hello world'.encode('utf-8'))

def rd_decode():
    with open('test.bin', 'rb') as f:
        data = f.read(16)
        # Decode the bytes back into text
        text = data.decode('utf-8')
    return text
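Encoding matters most for non-ASCII text, where the number of bytes differs from the number of characters. The sketch below assumes a temporary file and a hypothetical helper named roundtrip.

```python
import os
import tempfile

def roundtrip(text, encoding='utf-8'):
    """Write text as encoded bytes, read the bytes back, and decode them."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'test.bin')
        with open(path, 'wb') as f:
            written = f.write(text.encode(encoding))  # number of bytes written
        with open(path, 'rb') as f:
            data = f.read()
    return written, data.decode(encoding)

# 'caf\u00e9' has 4 characters; the last one takes 2 bytes in UTF-8
written, decoded = roundtrip('caf\u00e9')
print(written, decoded)
```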

Avoiding Overwriting an Existing File

Use mode x instead of w in open() so that writing fails if the file already exists:

>>> with open('test.txt', 'wt') as f:
...     f.write('Hello world')
>>> with open('test.txt', 'xt') as f:
...     f.write('Hello world')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    with open('test.txt', 'xt') as f:
FileExistsError: [Errno 17] File exists: 'test.txt'
>>> # Alternatively, check whether the file exists before writing
>>> import os
>>> os.path.exists('test.txt')
True
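In a program, the x mode is usually paired with a handler for FileExistsError. A minimal sketch, assuming a temporary directory and a hypothetical helper named write_once:

```python
import os
import tempfile

def write_once(path, text):
    """Write text to path only if the file does not exist yet.

    Returns True when the file was created, False when it already existed.
    """
    try:
        with open(path, 'xt') as f:
            f.write(text)
    except FileExistsError:
        return False
    return True

with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, 'test.txt')
    first = write_once(target, 'Hello world')
    second = write_once(target, 'Overwritten?')
    with open(target, 'rt') as f:
        content = f.read()

print(first, second, content)
```

Compared with checking os.path.exists() first, this avoids the window in which another process could create the file between the check and the open.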

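Iterating over Fixed-Size Records

Iterate over a collection of fixed-size records or chunks instead of iterating over a file line by line.
iter() with partial() keeps producing fixed-size chunks until the end of the file. If the file size is not an exact multiple of the record size, the last chunk has fewer bytes than the others.
b'' is the sentinel value returned when the end of the file is reached.
The file must be opened in binary mode (rb); in text mode read() returns str, which never equals the b'' sentinel.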

from functools import partial
RECORD_SIZE = 32
def size_rb():
    with open('somefile.data', 'rb') as f:
        records = iter(partial(f.read, RECORD_SIZE), b'')
        for r in records:
            print (r)
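The short final chunk is easy to see with a small in-memory stream. This sketch uses io.BytesIO in place of a real file and a 4-byte record size, both chosen only for illustration; read_records is a hypothetical helper.

```python
import io
from functools import partial

RECORD_SIZE = 4

def read_records(stream, size=RECORD_SIZE):
    """Collect fixed-size chunks from a binary stream until read() returns b''."""
    return list(iter(partial(stream.read, size), b''))

# 10 bytes with 4-byte records: two full records and one short one
records = read_records(io.BytesIO(b'abcdefghij'))
print(records)
```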

Reading Binary Data into a Mutable Buffer

Read binary data directly into a mutable buffer without any intermediate copying.
A file object's readinto() method fills a preallocated array with data.
It returns the number of bytes actually read.

import os
def read_into_buffer(filename):
    # Allocate a buffer the same size as the file
    buf = bytearray(os.path.getsize(filename))
    with open(filename, 'rb') as f:
        f.readinto(buf)
    return buf

if __name__ == '__main__':
    with open('sample.bin', 'wb') as f:
        f.write(b'Hello World')
    # Read the data into a buffer
    buf = read_into_buffer('sample.bin')
    with open('newsample.bin', 'wb') as f:
        f.write(buf)
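Because readinto() returns the number of bytes read, one small buffer can be reused for a whole stream. The sketch below assumes an in-memory io.BytesIO stream and a hypothetical helper named read_chunks; the bytes() copy is made only so the chunks can be collected and shown.

```python
import io

def read_chunks(stream, bufsize=4):
    """Read a binary stream through one reusable buffer of bufsize bytes."""
    buf = bytearray(bufsize)
    view = memoryview(buf)
    chunks = []
    while True:
        n = stream.readinto(buf)   # number of bytes actually placed in buf
        if n == 0:                 # 0 means end of stream
            break
        # Only the first n bytes are valid; copy them out for display
        chunks.append(bytes(view[:n]))
    return chunks

chunks = read_chunks(io.BytesIO(b'Hello World'))
print(chunks)
```

Checking the return value matters on the last read, where the buffer is only partly filled and still holds stale bytes from the previous read.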

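Memory-Mapping Binary Files

Use the mmap module to memory-map a file.
The mmap object returned by mmap() can also be used as a context manager, in which case the mapping is closed automatically on exit.
By default, the memory_map() function defined below opens a file for both reading and writing. Any modifications are copied back to the original file. For read-only access, pass mmap.ACCESS_READ as the access argument.
To modify the data locally without writing the changes back to the original file, use mmap.ACCESS_COPY.
When multiple Python interpreters memory-map the same file, the resulting mmap objects can be used to exchange data between the interpreters. That is, all of them can read and write the data at the same time, and changes made by one interpreter automatically appear in the others.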

import os
import mmap

def memory_map(filename, access=mmap.ACCESS_WRITE):
    size = os.path.getsize(filename)
    fd = os.open(filename, os.O_RDWR)
    return mmap.mmap(fd, size, access=access)

if __name__ == '__main__':
    size = 1000000
    with open('data', 'wb') as f:
        # Seek to the last byte so that writing it extends the file to `size` bytes
        f.seek(size-1)
        f.write(b'\x00')

    with memory_map('data') as m:
        print (len(m))
        m[0:11] = b'hello world'
        print (m[0:11])
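The effect of mmap.ACCESS_COPY can be checked by comparing the mapping with the file on disk. This is a sketch under the assumption of a small temporary file; demo_copy_access is a hypothetical name, and it opens the file with open() rather than the memory_map() helper above.

```python
import mmap
import os
import tempfile

def demo_copy_access():
    """Modify a copy-on-write mapping and show that the file stays unchanged."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'data')
        with open(path, 'wb') as f:
            f.write(b'\x00' * 16)
        with open(path, 'r+b') as f:
            # Length 0 maps the whole file; ACCESS_COPY keeps changes private
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
                m[0:5] = b'hello'
                in_memory = bytes(m[0:5])
        with open(path, 'rb') as f:
            on_disk = f.read(5)
    return in_memory, on_disk

in_memory, on_disk = demo_copy_access()
print(in_memory, on_disk)
```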

Manipulating Pathnames

Use os.path:

>>> import os
>>> path = '/Users/beazley/Data/data.csv'
>>> os.path.basename(path)
'data.csv'
>>> os.path.dirname(path)
'/Users/beazley/Data'
>>> os.path.join('tmp', 'data', os.path.basename(path))
'tmp/data/data.csv'
>>> path = '~/Data/data.csv'
>>> os.path.expanduser(path)
'/Users/gongyulei/Data/data.csv'
>>> os.path.splitext(path)
('~/Data/data', '.csv')
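These functions compose naturally. As a small sketch, the hypothetical helper split_path below breaks a path into its directory, file stem, and extension.

```python
import os

def split_path(path):
    """Split a path into (directory, stem, extension)."""
    directory, filename = os.path.split(path)
    stem, ext = os.path.splitext(filename)
    return directory, stem, ext

parts = split_path('/Users/beazley/Data/data.csv')
# splitext() only strips the last extension
archive = split_path('archive.tar.gz')
print(parts)
print(archive)
```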

Testing for the Existence of a File

>>> os.path.exists('/etc/passwd')
True
>>> os.path.exists('/tmp/spam')
False
>>> os.path.exists('/tmp/')
True
>>> os.path.isfile('/etc/passwd')
True
>>> os.path.isdir('/etc/passwd')
False
# Test whether the path is a symbolic link
>>> os.path.islink('/usr/local/bin/python3')
True
# Get the real path of the file
>>> os.path.realpath('/usr/local/bin/python3')
'/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/bin/python3.6'
# Get the file size in bytes
>>> os.path.getsize('/etc/passwd')
5925
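The paths above exist only on some systems. A portable sketch of the same tests, assuming a temporary directory and a hypothetical helper named classify:

```python
import os
import tempfile

def classify(path):
    """Return 'missing', 'directory', 'file', or 'other' for a path."""
    if not os.path.exists(path):
        return 'missing'
    if os.path.isdir(path):
        return 'directory'
    if os.path.isfile(path):
        return 'file'
    return 'other'

with tempfile.TemporaryDirectory() as tmpdir:
    file_path = os.path.join(tmpdir, 'spam.txt')
    with open(file_path, 'wt') as f:
        f.write('spam')
    results = (
        classify(tmpdir),
        classify(file_path),
        classify(os.path.join(tmpdir, 'nope')),
    )
    size = os.path.getsize(file_path)

print(results, size)
```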

Getting a Directory Listing

# os.listdir() returns the names of the entries in the directory
>>> os.listdir('.')
['ch5_10.py', 'ch5_2.py', 'ch5_4.py', 'ch5_8.py', 'ch5_9.py', 'data', 'newsample.bin', 'sample.bin', 'somefile.data', 'test.bin', 'test.gz', 'test.txt']
# Filter the listing
>>> names = [name for name in os.listdir() if os.path.isfile(os.path.join('.', name))]
>>> names
['ch5_10.py', 'ch5_2.py', 'ch5_4.py', 'ch5_8.py', 'ch5_9.py', 'data', 'newsample.bin', 'sample.bin', 'somefile.data', 'test.bin', 'test.gz', 'test.txt']
>>> names = [name for name in os.listdir() if os.path.isdir(os.path.join('.', name))]
>>> names
['test']
>>> pyfiles = [name for name in os.listdir() if name.endswith('.py')]
>>> pyfiles
['ch5_10.py', 'ch5_2.py', 'ch5_4.py', 'ch5_8.py', 'ch5_9.py']
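When the directory is not the current one, the bare names returned by os.listdir() have to be joined with the directory before testing them. A minimal sketch, assuming a temporary directory with made-up entries; list_entries is a hypothetical helper.

```python
import os
import tempfile

def list_entries(dirname):
    """Return (files, directories) found directly inside dirname."""
    names = sorted(os.listdir(dirname))
    # os.listdir() returns bare names, so join them with dirname before testing
    files = [n for n in names if os.path.isfile(os.path.join(dirname, n))]
    dirs = [n for n in names if os.path.isdir(os.path.join(dirname, n))]
    return files, dirs

with tempfile.TemporaryDirectory() as tmpdir:
    for fname in ('a.py', 'b.txt'):
        with open(os.path.join(tmpdir, fname), 'wt') as f:
            f.write('')
    os.mkdir(os.path.join(tmpdir, 'sub'))
    files, dirs = list_entries(tmpdir)
    pyfiles = [n for n in files if n.endswith('.py')]

print(files, dirs, pyfiles)
```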

Collecting Other File Metadata

import os
import glob

def main():
    pyfiles = glob.glob('*.py')
    print(pyfiles)
    # Collect (name, size in bytes, modification time) for each file
    name_sz_date = [(name, os.path.getsize(name), os.path.getmtime(name))
                    for name in pyfiles]
    for name, size, mtime in name_sz_date:
        print(name, size, mtime)

if __name__ == '__main__':
    main()
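os.path.getmtime() returns seconds since the epoch, which is hard to read as printed. The sketch below gets the same metadata from a single os.stat() call and formats the timestamp; file_info and the temporary file are assumptions made for illustration.

```python
import os
import tempfile
import time

def file_info(path):
    """Return (name, size in bytes, formatted modification time) for a file."""
    st = os.stat(path)
    stamp = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(st.st_mtime))
    return os.path.basename(path), st.st_size, stamp

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.py')
    with open(path, 'wb') as f:
        f.write(b'print(1)\n')
    name, size, stamp = file_info(path)

print(name, size, stamp)
```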

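Adding or Changing the Encoding of an Already Open File

To add Unicode encoding and decoding to a file that is already open in binary mode, wrap it with io.TextIOWrapper().
io.TextIOWrapper() is a text-handling layer that encodes and decodes Unicode.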

import urllib.request
import io

def encode_text():
    u = urllib.request.urlopen('http://www.python.org')
    f = io.TextIOWrapper(u, encoding='utf-8')
    # Reading through the wrapper decodes the bytes as UTF-8
    text = f.read()
    print (text)

if __name__ == '__main__':
    encode_text()
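The same wrapping works on any binary stream, so it can be tried without a network connection. This sketch assumes an in-memory io.BytesIO stream and a hypothetical helper named decode_stream; detach() is used so the underlying stream stays open after the text layer is dropped.

```python
import io

def decode_stream(binary_stream, encoding):
    """Read all text from a binary stream through a TextIOWrapper layer."""
    f = io.TextIOWrapper(binary_stream, encoding=encoding)
    text = f.read()
    # Remove the text layer without closing the underlying binary stream
    f.detach()
    return text

stream = io.BytesIO('caf\u00e9'.encode('latin-1'))
text = decode_stream(stream, 'latin-1')
still_open = not stream.closed
print(text, still_open)
```

Detaching and wrapping again with a different encoding is one way to change the encoding of a stream that is already open.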

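Creating Temporary Files and Directories

Create a temporary file or directory while the program runs, and have it destroyed automatically once you are done with it.
'w+t' opens the file in text mode for reading and writing; delete=False means the temporary file is not deleted when it is closed.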

from tempfile import NamedTemporaryFile, TemporaryDirectory

def create_tmp():
    # 'w+t' opens the file in text mode for reading and writing
    with NamedTemporaryFile('w+t', delete=False) as f:
        print ('filename is :', f.name)
        f.write('Hello World\n')
        f.write('Testing\n')
        f.seek(0)
        data = f.read()

    with TemporaryDirectory() as dirname:
        print ('dirname is :', dirname)

if __name__ == '__main__':
    create_tmp()
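For comparison, the sketch below leaves delete at its default, so the temporary file disappears when it is closed, just as the temporary directory does when its with block ends. The variable names are chosen only for illustration.

```python
import os
from tempfile import NamedTemporaryFile, TemporaryDirectory

# With the default delete=True, the file is removed when it is closed
with NamedTemporaryFile('w+t') as f:
    name = f.name
    f.write('Hello World\n')
    f.seek(0)
    data = f.read()
    existed_while_open = os.path.exists(name)
exists_after_close = os.path.exists(name)

# The directory and its contents are removed at the end of the block
with TemporaryDirectory() as dirname:
    dir_existed = os.path.isdir(dirname)
dir_exists_after = os.path.isdir(dirname)

print(existed_while_open, exists_after_close, dir_existed, dir_exists_after)
```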

You can also customize how temporary files are named and where they are created:

>>> from tempfile import NamedTemporaryFile
>>> f = NamedTemporaryFile(prefix='mytemp', suffix='.txt', dir='/tmp')
>>> f.name
'/tmp/mytempurb_uyz4.txt'
>>>

Serializing Python Objects

Use pickle.dump() to write an object and pickle.load() to read it back. The data can be lists, sets, class instances, and so on.

>>> import pickle
>>> f = open('test.txt', 'wb')
>>> pickle.dump([1, 2, 3], f)
>>> pickle.dump({'aaa', '111'}, f)
>>> f.close()  # flush the pickled data to disk before reading it back
>>> f = open('test.txt', 'rb')
>>> pickle.load(f)
[1, 2, 3]
>>> pickle.load(f)
{'111', 'aaa'}
>>> f.close()

Some objects cannot be pickled, such as open network connections and threads. A class can work around this by defining __getstate__() and __setstate__().
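As a minimal sketch of that workaround, the hypothetical Counter class below holds a threading.Lock, which cannot be pickled. It leaves the lock out of the pickled state and creates a fresh one on load.

```python
import pickle
import threading

class Counter:
    def __init__(self, n):
        self.n = n
        self.lock = threading.Lock()   # lock objects cannot be pickled

    def __getstate__(self):
        # Save only the picklable part of the state
        return {'n': self.n}

    def __setstate__(self, state):
        # Restore the saved value and create a new lock
        self.n = state['n']
        self.lock = threading.Lock()

original = Counter(3)
restored = pickle.loads(pickle.dumps(original))
print(restored.n, restored.lock is not original.lock)
```

Without the two methods, pickle would try to save the lock along with the other attributes and raise an error.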
