UnicodeDecodeError, invalid continuation byte

When reading a .csv file with the pandas library, the following error occurs:
My Code:

import pandas as pd
df = pd.read_csv('C:\\Users\\登亮\\Desktop\\test.csv', encoding='utf-8')

Error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte

Reason:
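In binary, 0xE9 looks like 1110 1001. If you read about UTF-8 on Wikipedia, you'll see that such a byte must be followed by two bytes of the form 10xx xxxx. (The byte 0xD3 from the error above is 1101 0011, which must be followed by one such byte.) So, for example: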

>>> b'\xe9\x80\x80'.decode('utf-8')
u'\u9000'
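To make the bit patterns above concrete, here is a minimal sketch that reports how many 10xx xxxx bytes a given lead byte expects. The function name `expected_continuation_bytes` is hypothetical, and the check is deliberately simplified: it only inspects the prefix bits and does not reject every byte that real UTF-8 decoders refuse.

```python
def expected_continuation_bytes(lead: int) -> int:
    """Return how many 10xxxxxx bytes should follow a UTF-8 lead byte.

    Simplified illustration: only the prefix bits are checked, so a few
    bytes that strict UTF-8 forbids (such as 0xC0) are not rejected here.
    """
    if lead & 0b1000_0000 == 0b0000_0000:
        return 0  # 0xxxxxxx: single-byte (ASCII) character
    if lead & 0b1110_0000 == 0b1100_0000:
        return 1  # 110xxxxx: start of a two-byte sequence
    if lead & 0b1111_0000 == 0b1110_0000:
        return 2  # 1110xxxx: start of a three-byte sequence
    if lead & 0b1111_1000 == 0b1111_0000:
        return 3  # 11110xxx: start of a four-byte sequence
    # 10xxxxxx is itself a continuation byte; anything else is not a lead byte
    raise ValueError(f"0x{lead:02x} cannot start a UTF-8 sequence")


print(expected_continuation_bytes(0xE9))  # byte discussed in the explanation
print(expected_continuation_bytes(0xD3))  # byte reported in the error message
```

If the bytes that actually follow do not match the 10xx xxxx pattern, the decoder reports "invalid continuation byte".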

But that's just the mechanical cause of the exception. In this case, you have a string that is almost certainly encoded in Latin-1. You can see how UTF-8 and Latin-1 look different:

>>> u'\xe9'.encode('utf-8')
b'\xc3\xa9'
>>> u'\xe9'.encode('latin-1')
b'\xe9'

(Note, I'm using a mix of Python 2 and 3 representation here. The input is valid in any version of Python, but your Python interpreter is unlikely to actually show both unicode and byte strings in this way.)

Solution:
Try calling read_csv with encoding='latin1', encoding='iso-8859-1', or encoding='cp1252'; these are the various encodings found on Windows.
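Rather than editing the call by hand for each guess, the candidates can be tried in order. The following is only a sketch: the helper name `first_working_encoding` and the default candidate order are assumptions, not part of pandas. Note that 'latin-1' can decode any byte sequence, so it belongs last, and a successful decode does not prove the text is displayed correctly.

```python
def first_working_encoding(raw: bytes,
                           candidates=("utf-8", "cp1252", "latin-1")) -> str:
    """Return the first candidate encoding that decodes `raw` without error.

    Hypothetical helper: the candidate list is an assumption and can be
    extended with whatever encodings are plausible for your files.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes; try the next
        return name
    raise ValueError("none of the candidate encodings could decode the data")


# A lone 0xE9 byte is not valid UTF-8, so the next candidate is chosen.
print(first_working_encoding(b"caf\xe9"))
```

To use it with pandas, read the file's raw bytes with open(path, 'rb'), pass them to the helper, and hand the returned name to read_csv as encoding=. Always inspect a few rows afterwards, since a wrong but permissive encoding loads without error and shows garbled characters.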
