Understanding and visualizing data with Python

Course syllabus

Week 1 - Introduction to data
Week 2 - Univariate data
Week 3 - Multivariate data
Week 4 - Populations and samples


Week 1

Course content

  • Data can be numbers, images, words, audio
  • Two key types of data: organic (process) data and data from designed data collection
  • i.i.d. means independent and identically distributed
  • When data are not i.i.d., the dependencies and differences between observations need to be accounted for in the analysis.
  • Variable types
  1. Quantitative: continuous vs discrete
    Quantitative continuous variables are numeric, measurable quantities that can take any value within an interval.
    Quantitative discrete variables are numeric, measurable quantities with a set range of countable values.
  2. Categorical: ordinal vs nominal
    Nominal variables consist of groups or names in which there is no inherent ordering.
    Ordinal variables consist of groups or names with an inherent ordering or ranking.
  • Data types in Python
  • Introduction to libraries and data management
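
As a small illustration of how the variable types above might map onto built-in Python data types, here is a minimal sketch. The record, its field names, and the education ordering are all hypothetical and chosen only for the example; they are not taken from the course material.

```python
# Hypothetical survey record; field names and values are illustrative only.
record = {
    "height_cm": 172.5,       # quantitative continuous -> float
    "num_children": 2,        # quantitative discrete   -> int
    "blood_type": "AB",       # categorical nominal     -> str (no ordering)
    "education": "bachelor",  # categorical ordinal     -> str plus an explicit order
}

# An ordinal variable needs its ranking stated explicitly, because plain
# string comparison would sort alphabetically rather than by rank.
# This ordering is an assumption made for the example.
EDUCATION_ORDER = ["high school", "bachelor", "master", "doctorate"]


def education_rank(level):
    """Return the position of `level` in the assumed ordering."""
    return EDUCATION_ORDER.index(level)


# Show which Python type each field ended up with.
print({name: type(value).__name__ for name, value in record.items()})

# Compare two ordinal values by rank instead of by spelling.
print(education_rank("bachelor") < education_rank("master"))
```

Nominal values such as blood type can only be tested for equality, while ordinal values can also be compared once an ordering like the one above has been defined.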

Week 2

Course content

  • categorical data, tables, bar charts and pie charts
  • histograms: shape, center, spread, outliers
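  • numerical summaries: Min, 1st quartile (25%), Median (50%), 3rd quartile (75%), Max
  • standard score and the empirical rule (68-95-99.7 rule)
    standard score = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}
    For roughly bell-shaped data, about 68% of observations fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.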
  • Boxplots
    Boxplots can hide gaps and clusters
  • Seaborn library (imported as sns)
    sns.distplot().set() (deprecated since seaborn 0.11 in favor of sns.histplot() and sns.displot())
    sns.boxplot()
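
The numerical summaries and the standard score listed above can be sketched with Python's standard `statistics` module. The data values are made up for illustration, and the helper names `five_number_summary` and `z_score` are hypothetical. Quartile conventions differ between tools; the `"inclusive"` method used here is one reasonable choice, not the only one.

```python
import statistics

# Made-up observations, used only to illustrate the calculations.
data = [2, 4, 4, 4, 5, 5, 7, 9]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def z_score(observation, mean, std_dev):
    """Standard score: distance from the mean in units of standard deviation."""
    return (observation - mean) / std_dev


mean = statistics.mean(data)
# Population standard deviation; use statistics.stdev for the sample version.
std_dev = statistics.pstdev(data)

print(five_number_summary(data))
print(z_score(9, mean, std_dev))

# Share of observations within 1 standard deviation of the mean. For
# bell-shaped data the empirical rule suggests roughly 68%; a small sample
# like this one will not match that figure closely.
within_one_sd = sum(abs(z_score(x, mean, std_dev)) <= 1 for x in data) / len(data)
print(within_one_sd)
```

A standard score of 2 means the observation lies two standard deviations above the mean, which under the empirical rule would be fairly unusual for bell-shaped data.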

Week 3

Course content

  • Gathering multivariate categorical data
  • Two-way (contingency) tables
  • Marginal and conditional distributions
  • Two univariate bar charts, side-by-side bar charts, stacked bar charts, mosaic plots
  • Association type: linear, quadratic, no association
  • Positive linear association, negative linear association
  • Association strength (weak, moderate, strong): measured for linear associations by the Pearson correlation (r for a sample, \rho for a population), a number between -1 and 1
    Correlation does not imply causation
  • Simpson's paradox
  • Multivariate data selection
  • Multivariate distributions
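
To make the Pearson correlation mentioned above concrete, here is a minimal sketch that computes it from its definition using only the standard library. The function name `pearson_r` and the example data are hypothetical. The quadratic case shows why the coefficient only measures linear association.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Made-up data illustrating three association types.
x = [1, 2, 3, 4, 5]
positive = [2 * v + 1 for v in x]     # perfect positive linear association
negative = [10 - 3 * v for v in x]    # perfect negative linear association
quadratic = [(v - 3) ** 2 for v in x]  # strong association, but not linear

print(pearson_r(x, positive))
print(pearson_r(x, negative))
print(pearson_r(x, quadratic))
```

The quadratic example has a correlation of zero even though y is fully determined by x, so a scatter plot is still worth inspecting before concluding there is no association.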

Week 4

Course content

  • Sampling from well-defined populations
    Option 1: Conducting a population census
    Option 2: Probability sampling
    Option 3: Non-probability sampling
  • Probability sampling
    Simple random sampling
    Complex samples
  • Non-probability sampling
  • Sampling distribution
  • Sampling variance
  • A sampling distribution is the distribution of all possible estimates that would arise from hypothetical repeated sampling. Larger sample sizes result in a sampling distribution with less variance, meaning that estimates are more precise.
  • Making population inference based on only one sample
  • Inference for non-probability samples
  • Complex samples (stratification)
  • The empirical rule of distribution
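
The claim above that larger samples give a sampling distribution with less variance can be checked with a small simulation. This is a sketch under stated assumptions: the population is synthetic (normally distributed with an arbitrary mean and standard deviation), the sample sizes and number of repetitions are arbitrary, and the function name `sampling_variance` is hypothetical.

```python
import random
import statistics


def sampling_variance(population, sample_size, n_repeats, rng):
    """Variance of the sample mean across repeated simple random samples."""
    means = [
        # rng.sample draws without replacement, i.e. a simple random sample.
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_repeats)
    ]
    return statistics.variance(means)


# Fixed seed so the simulation is reproducible.
rng = random.Random(0)

# Synthetic population; the mean of 50 and standard deviation of 10 are arbitrary.
population = [rng.gauss(50, 10) for _ in range(10_000)]

var_small = sampling_variance(population, sample_size=10, n_repeats=1000, rng=rng)
var_large = sampling_variance(population, sample_size=100, n_repeats=1000, rng=rng)

# The larger sample size should give the smaller sampling variance.
print(var_small, var_large)
```

In practice only one sample is drawn, so the sampling variance is estimated from that single sample rather than observed through repetition as it is here.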