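Pandas Function Quick Reference

pandas is one of the most important Python libraries for data analysis and the one most often used for data processing. Its functions fall into the following broad categories.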

1. Input and output

Reading data

  1. pd.read_csv(filename) - Reads from a CSV file
  2. pd.read_excel(filename) - Reads from an Excel file
  3. pd.read_sql(query, connection_object) - Reads from a SQL table or database
  4. pd.read_json(json_string) - Reads from a JSON-formatted string, URL, or file
  5. pd.read_html(url) - Parses an HTML URL, string, or file and extracts tables to a list of DataFrames
  6. pd.read_clipboard() - Takes the contents of your clipboard and passes it to read_csv()
  7. pd.DataFrame(dict) - Builds a DataFrame from a dict: keys become column names, values (lists) become the column data
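
As a minimal sketch of the readers above, the following builds the same small table twice: once with pd.read_csv and once from a dict. The CSV text, the column names, and the use of io.StringIO in place of a file on disk are assumptions made only for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory so the example needs no file on disk.
csv_text = "name,score\nalice,90\nbob,85\n"

# read_csv accepts a path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

# The same table from a dict: keys become column names, lists become column data.
df_dict = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})

print(df_csv)
print(df_dict)
```

The other readers follow the same pattern: give them a source and they return a DataFrame (read_html returns a list of them).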

Writing data

  1. df.to_excel(filename) - Writes to an Excel file
  2. df.to_csv(filename) - Writes to a CSV file
  3. df.to_sql(table_name, connection_object) - Writes to a SQL table
  4. df.to_json(filename) - Writes to a file in JSON format
  5. df.to_html(filename) - Saves as an HTML table
  6. df.to_clipboard() - Writes to the clipboard
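
A short round-trip sketch for the writers above. It relies on the fact that to_csv and to_json return a string when no path is passed, so nothing is written to disk; the sample data and variable names are assumptions.

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False)

# Reading the text back should reproduce the original columns and values.
restored = pd.read_csv(io.StringIO(csv_text))

# to_json likewise returns a string when no path is given.
json_text = df.to_json(orient="records")

print(csv_text)
print(json_text)
```

Passing index=False keeps the row index out of the CSV, which is usually what you want when the index is just the default 0..n-1 range.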

2. Creating test data

  1. pd.DataFrame(np.random.rand(20,5)) - Creates a DataFrame of random floats with 20 rows and 5 columns
  2. pd.Series(my_list) - Creates a Series from the iterable my_list
  3. df.index = pd.date_range('1900/1/30',periods = df.shape[0]) - Replaces the index with a daily date range, one date per row
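
The three calls above can be combined into one small sketch. The seeded generator (np.random.default_rng with seed 0) is an assumption added so the random values are reproducible; the list uses np.random.rand, which works the same way without a seed.

```python
import numpy as np
import pandas as pd

# Seeded generator so repeated runs give the same numbers (seed 0 is arbitrary).
rng = np.random.default_rng(0)

# 20 rows x 5 columns of floats drawn uniformly from [0, 1).
df = pd.DataFrame(rng.random((20, 5)))

# A Series built from a plain Python list.
s = pd.Series([1, 2, 3])

# Replace the default integer index with consecutive daily dates.
df.index = pd.date_range("1900/1/30", periods=df.shape[0])

print(df.head())
```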

3. Inspecting data

  1. df.head(n) - First n rows
  2. df.tail(n) - Last n rows
  3. df.shape - Number of rows and columns (an attribute, so no parentheses)
  4. df.info() - Index, datatype, and memory information
  5. df.describe() - Summary statistics for numerical columns
  6. s.value_counts(dropna = False) - Counts each unique value, including NA
  7. df.apply(pd.Series.value_counts) - Counts unique values for every column
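
A minimal sketch of the inspection calls, using a made-up four-row table with one missing value per column. Note that shape is read as an attribute rather than called.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "city": ["A", "B", "A", None],
    "temp": [20.5, 18.0, np.nan, 22.0],
})

# shape is an attribute holding (rows, columns).
print(df.shape)

# dropna=False keeps the missing value as its own category in the counts.
counts = df["city"].value_counts(dropna=False)

# describe() summarises numeric columns only; its count row ignores nulls.
summary = df.describe()

print(counts)
print(summary)
```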

4. Selecting data

  1. df[col] - Returns column col as a Series
  2. df[[col1, col2]] - Returns several columns as a new DataFrame
  3. s.iloc[0] - Selection by position
  4. s.loc[0] - Selection by index label
  5. df.iloc[0,:] - First row
  6. df.iloc[0,0] - First element of first column
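
The difference between iloc and loc only shows up when the index labels are not 0, 1, 2, ... in order. The sketch below uses a deliberately reversed index (an assumption made for illustration) so the two return different values.

```python
import pandas as pd

# Index labels deliberately reversed so position and label disagree.
s = pd.Series([10, 20, 30], index=[2, 1, 0])

first_by_position = s.iloc[0]  # element at position 0
by_label = s.loc[0]            # element whose index label is 0

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

first_row = df.iloc[0, :]   # first row, returned as a Series
first_cell = df.iloc[0, 0]  # first row of the first column
two_cols = df[["a", "b"]]   # a list of names returns a DataFrame

print(first_by_position, by_label)
```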

5. Data cleaning

  1. df.columns = ['a','b','c'] - Renames the columns
  2. pd.isnull(s) - Checks for null values and returns a boolean array
  3. pd.notnull(s) - Opposite of pd.isnull(s)
  4. df.dropna() - Drops all rows that contain null values
  5. df.dropna(axis=1) - Drops all columns that contain null values
  6. df.dropna(axis=1,thresh=n) - Drops all columns that have fewer than n non-null values
  7. df.fillna(x) - Replaces all null values with x
  8. s.fillna(s.mean()) - Replaces all null values with the mean (mean can be replaced with almost any function from the statistics section)
  9. s.astype(float) - Converts the datatype of the Series to float
  10. s.replace(1,'one') - Replaces all values equal to 1 with 'one'
  11. s.replace([1,3],['one','three']) - Replaces all 1 with 'one' and 3 with 'three'
  12. df.rename(columns=lambda x: x + 1) - Mass renaming of columns (this lambda assumes numeric column labels)
  13. df.rename(columns={'old_name': 'new_name'}) - Selective renaming
  14. df.set_index('column_one') - Changes the index
  15. df.rename(index=lambda x: x + 1) - Mass renaming of index
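
The dropna variants are easy to mix up, so here is a small sketch on a made-up 3x3 table. With axis=1 the operation works on columns, and thresh=n keeps a column only if it has at least n non-null values.

```python
import numpy as np
import pandas as pd

# Hypothetical table: column a has one null, b has two, c has none.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

rows_kept = df.dropna()                    # rows with no nulls at all
cols_kept = df.dropna(axis=1)              # columns with no nulls at all
thresh_kept = df.dropna(axis=1, thresh=2)  # columns with at least 2 non-null values

# Fill the gap in column a with that column's mean (the mean of 1 and 3).
filled = df["a"].fillna(df["a"].mean())

# Rename one column and leave the others alone.
renamed = df.rename(columns={"a": "alpha"})

print(thresh_kept)
```

None of these calls modify df in place; each returns a new object, so assign the result if you want to keep it.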

6. Filtering, sorting, and grouping

  1. df[df[col] > 0.5] - Rows where the col column is greater than 0.5
  2. df[(df[col] > 0.5) & (df[col] < 0.7)] - Rows where 0.7 > col > 0.5
  3. df.sort_values(col1) - Sorts values by col1 in ascending order
  4. df.sort_values(col2,ascending=False) - Sorts values by col2 in descending order
  5. df.sort_values([col1,col2], ascending=[True,False]) - Sorts values by col1 in ascending order, then col2 in descending order
  6. df.groupby(col) - Returns a groupby object for values from one column
  7. df.groupby([col1,col2]) - Returns a groupby object for values from multiple columns
  8. df.groupby(col1)[col2].mean() - Returns the mean of the values in col2, grouped by the values in col1 (mean can be replaced with almost any function from the statistics section). df.groupby(col1).mean()[col2] gives the same numbers but averages every column first, so it can fail when non-numeric columns are present
  9. df.pivot_table(index=col1,values=[col2,col3],aggfunc='mean') - Creates a pivot table that groups by col1 and calculates the mean of col2 and col3
  10. df.groupby(col1).agg('mean') - Finds the average across all columns for every unique col1 group
  11. df.apply(np.mean) - Applies a function to each column
  12. df.apply(np.max, axis=1) - Applies a function to each row
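
A compact sketch of filtering, sorting, grouping, and pivoting on one made-up table. The column names team, score, and bonus are assumptions chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["x", "y", "x", "y"],
    "score": [0.6, 0.4, 0.8, 0.9],
    "bonus": [1, 2, 3, 4],
})

# Boolean masks; each condition needs its own parentheses when combined with &.
high = df[df["score"] > 0.5]
band = df[(df["score"] > 0.5) & (df["score"] < 0.7)]

# Sort by team ascending, then by score descending within each team.
ranked = df.sort_values(["team", "score"], ascending=[True, False])

# Mean score per team; selecting the column before aggregating avoids
# averaging columns that are not needed.
mean_score = df.groupby("team")["score"].mean()

# The same kind of summary for two columns at once, as a pivot table.
pivot = df.pivot_table(index="team", values=["score", "bonus"], aggfunc="mean")

print(ranked)
print(pivot)
```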

7. Statistics

These can all be applied to a series as well.

  1. df.describe() - Summary statistics for numerical columns
  2. df.mean() - Returns the mean of all columns
  3. df.corr() - Returns the correlation between columns in a DataFrame
  4. df.count() - Returns the number of non-null values in each DataFrame column
  5. df.max() - Returns the highest value in each column
  6. df.min() - Returns the lowest value in each column
  7. df.median() - Returns the median of each column
  8. df.std() - Returns the standard deviation of each column
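
As a brief illustration that the same methods work on both a DataFrame and a single Series, the sketch below uses two made-up columns where y is exactly twice x. One detail worth knowing: std() in pandas uses the sample formula (ddof=1) by default.

```python
import pandas as pd

# Hypothetical data: y is exactly 2 * x, so the two columns are perfectly correlated.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 8.0]})

col_means = df.mean()       # one mean per column, returned as a Series
corr = df.corr()            # pairwise correlation matrix
x_std = df["x"].std()       # Series method; sample standard deviation (ddof=1)
x_median = df["x"].median()

print(col_means)
print(corr)
```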

8. Combining data

  1. pd.concat([df1, df2]) - Adds the rows in df2 to the end of df1 (columns should be identical). The older df1.append(df2) did the same thing but was removed in pandas 2.0
  2. pd.concat([df1, df2],axis=1) - Adds the columns in df2 to the end of df1 (rows should be identical)
  3. df1.join(df2,on=col1,how='inner') - SQL-style join of the columns in df1 with the columns in df2, matching the values of col1 in df1 against the index of df2. how can be one of 'left', 'right', 'outer', 'inner'
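
A minimal sketch of row-wise concatenation, column-wise concatenation, and an inner join. The frames and the key column are made up for illustration; set_index is used because join matches the on column against the other frame's index.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b"], "v1": [1, 2]})
df2 = pd.DataFrame({"key": ["b", "c"], "v2": [3, 4]})

# Stack rows; ignore_index=True renumbers the result 0..n-1.
stacked = pd.concat([df1, df1], ignore_index=True)

# Place columns side by side, aligned on the row index.
side_by_side = pd.concat([df1, df2[["v2"]]], axis=1)

# Inner join: df1's key column is matched against df2's index,
# so only keys present in both frames survive.
joined = df1.join(df2.set_index("key"), on="key", how="inner")

print(joined)
```

When both frames carry the key as an ordinary column, pd.merge(df1, df2, on="key") is an alternative that avoids the set_index step.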