Information Processing for IoT

Outline

5.1 Information Theory
5.2 Information Technology
5.3 Data quality
5.4 Data cleaning
5.5 Data fusion
5.6 Data storage
5.7 Data mining
5.8 Multimedia information processing

5.3 Data Quality

Uncertain Data

  • Data uncertainty occurs during:
    ① Data collection
    ② Data transmission
    ③ Data processing

Causes of Data Uncertainty

  • Environmental factors
  • Low battery power
  • Packet losses

Classification of Data Uncertainty

  • Source classification, by where the uncertainty comes from (key point)
    ① Undesirable uncertainty: noisy sensor data, imprecise GPS data, unreliable extracted/integrated data
    ② Desirable uncertainty (introduced on purpose, typically to protect privacy): medical data with generalized attributes, cloaked trajectory data
  • Granularity classification
    ① Tuple uncertainty
    ② Attribute uncertainty
  • Correlation classification
    ① Independent uncertainty
    ② Correlated uncertainty
    ③ Uncertainty with local correlations

Meaning of Data Quality (key point)

  • Generally, you have a problem if the data doesn’t mean what you think it does, or should.
  • Data quality problems are expensive and pervasive.

Conventional Definition of Data Quality

  • Accuracy: the data was recorded correctly
  • Completeness: all data was recorded
  • Uniqueness: each entity was recorded once
  • Timeliness: the data is kept up to date
  • Consistency: the data agrees with itself
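
Four of these dimensions can be checked mechanically, as the sketch below tries to show. The records, field names, and tolerances are assumptions made up for illustration. Accuracy is left out because it needs ground truth to compare against.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical records; every field name here is an assumption for illustration.
RECORDS = [
    {"id": 1, "temp_c": 21.5, "temp_f": 70.7, "ts": datetime(2024, 1, 1, 12, 0)},
    {"id": 2, "temp_c": None, "temp_f": None, "ts": datetime(2024, 1, 1, 12, 0)},
    {"id": 2, "temp_c": 22.0, "temp_f": 71.6, "ts": datetime(2024, 1, 1, 12, 1)},
    {"id": 3, "temp_c": 20.0, "temp_f": 80.0, "ts": datetime(2023, 12, 1, 9, 0)},
]

def incomplete(rows):
    """Completeness: ids of rows with a missing (None) field."""
    return [r["id"] for r in rows if any(v is None for v in r.values())]

def duplicated(rows):
    """Uniqueness: ids that were recorded more than once."""
    counts = Counter(r["id"] for r in rows)
    return sorted(i for i, n in counts.items() if n > 1)

def stale(rows, now, max_age):
    """Timeliness: ids of rows older than max_age."""
    return [r["id"] for r in rows if now - r["ts"] > max_age]

def inconsistent(rows, tol=0.1):
    """Consistency: ids whose Fahrenheit value disagrees with the Celsius value."""
    return [
        r["id"] for r in rows
        if r["temp_c"] is not None and r["temp_f"] is not None
        and abs(r["temp_c"] * 9 / 5 + 32 - r["temp_f"]) > tol
    ]
```

Each function returns the ids that violate one dimension, so a single record can show up in several reports.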

5.4 Data Cleaning

Data cleaning is the process of detecting and correcting (or removing) errors and inconsistencies in data in order to improve the quality of the data.
Its aim is to identify data that is incomplete, incorrect, inaccurate, irrelevant, and so on.

Data Cleaning Tasks (key point)

  • Fill in missing values
  • Identify outliers and smooth out noisy data
  • Correct inconsistent data
  • Resolve redundancy caused by data integration

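Methods to Handle Noisy Data

  • Binning: sort the data into bins and smooth the values within each bin, which evens out the extreme values
  • Regression: smooth the data by fitting it to a regression function
  • Clustering: group the data into clusters, detect the elements that do not belong to any major cluster, and remove them
  • Combined inspection: combine computer detection with human inspection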
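
Binning is the easiest of these to illustrate. The sketch below assumes equal-frequency bins and smoothing by the bin mean, which is one common variant. The notes do not say which variant is meant, and the function name is made up for this example.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size items,
    and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, 3)
# Bins: [4, 8, 15], [21, 21, 24], [25, 28, 34] with means 9, 22 and 29.
```

Smoothing by bin boundaries works the same way, except that each value is replaced by the nearer of its bin's minimum and maximum.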

Sensor Cleaning Pipeline


The pipeline uses the temporal and spatial characteristics of sensor data.

Step 1: Point
  • Operates on: a single value of the sensor stream
  • Purpose: filter individual values
    ① Errant (dirty / faulty) RFID tags
    ② Obvious outliers
    ③ Conversion of raw data into tuples
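
A minimal sketch of a Point-stage filter follows. The raw line format `id,timestamp,value`, the set of known tags, and the valid value range are all assumptions chosen for the example, not something the notes specify.

```python
from typing import NamedTuple, Optional

class Reading(NamedTuple):
    sensor_id: str
    timestamp: float
    value: float

# Assumed for illustration: the tags we trust and a plausible value range.
KNOWN_TAGS = {"tag-01", "tag-02"}
VALID_RANGE = (-40.0, 85.0)

def point_filter(raw: str) -> Optional[Reading]:
    """Turn one raw 'id,timestamp,value' line into a tuple,
    or return None if the value should be dropped."""
    parts = raw.strip().split(",")
    if len(parts) != 3:
        return None                      # malformed line
    sensor_id, ts, val = parts
    if sensor_id not in KNOWN_TAGS:
        return None                      # errant tag
    try:
        reading = Reading(sensor_id, float(ts), float(val))
    except ValueError:
        return None                      # non-numeric field
    low, high = VALID_RANGE
    if not (low <= reading.value <= high):
        return None                      # obvious outlier
    return reading
```

The filter looks at one value at a time and keeps no state, which is what sets this stage apart from the later, window-based stages.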


Step 2: Smoothing
  • Purpose: interpolate (insert) lost readings
    ① Temporal interpolation
    ② Outlier detection
  • Method: window-based queries
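
Temporal interpolation over a window might look like the sketch below. It assumes readings arrive on a regular tick, with `None` marking a lost reading, and it fills each gap with the mean of the surviving readings in a symmetric window. The window shape and the function name are assumptions.

```python
def smooth(readings, window=2):
    """Fill each lost reading (None) with the mean of the original,
    non-missing readings at most `window` positions away.
    A gap with no surviving neighbour in the window stays None."""
    filled = []
    for i, value in enumerate(readings):
        if value is not None:
            filled.append(value)
            continue
        lo = max(0, i - window)
        hi = min(len(readings), i + window + 1)
        neighbours = [v for v in readings[lo:hi] if v is not None]
        filled.append(sum(neighbours) / len(neighbours) if neighbours else None)
    return filled
```

For example, `smooth([20.0, None, 22.0], window=1)` fills the gap with 21.0. Only original readings are used as neighbours, so an interpolated value never feeds another interpolation.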


Step 3: Merge
  • Purpose: spatial interpolation
  • Example: within one spatial granule, average the readings from the different motes, ignoring any individual reading that lies more than two standard deviations from the mean.
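
The example above translates almost directly into code. This sketch assumes the population standard deviation (`statistics.pstdev`); the notes do not say whether the population or the sample deviation is intended.

```python
from statistics import mean, pstdev

def merge(readings):
    """Spatial average for one granule: drop readings more than two
    standard deviations from the mean, then average what is left."""
    mu = mean(readings)
    sigma = pstdev(readings)
    kept = [r for r in readings if abs(r - mu) <= 2 * sigma]
    return mean(kept)

# Six motes in the same granule; the last one is clearly off.
granule = [20.0, 21.0, 19.0, 20.0, 20.0, 50.0]
merged = merge(granule)  # 50.0 is ignored, the other five average to 20.0
```

With very few motes, one bad reading inflates the standard deviation enough to survive the two-deviation test, so the rule only helps when a granule has several readings.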
Step 4: Arbitrate
  • Purpose: remove
    ① Conflicting readings
    ② Duplicate readings (de-duplication)
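
One possible arbitration rule is sketched below, as an assumption rather than the method the notes describe: when several readers report the same tag in one time window, the tag is assigned to the reader that saw it most often, and repeated reports collapse into one entry.

```python
def arbitrate(observations):
    """observations: iterable of (tag_id, reader_id, read_count) for one window.
    Returns {tag_id: reader_id}, keeping for each tag the reader with the
    highest read count. Ties go to the reader seen first."""
    best = {}
    for tag, reader, count in observations:
        if tag not in best or count > best[tag][1]:
            best[tag] = (reader, count)
    return {tag: reader for tag, (reader, _) in best.items()}
```

For instance, `arbitrate([("a", "r1", 3), ("a", "r2", 7)])` returns `{"a": "r2"}`, which resolves the conflict between the two readers.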
Step 5: Virtualize
  • Purpose: multi-source integration

5.5 Data Fusion

  • Concept (key point)
    Data fusion combines data from multiple sources and gathers that information in order to achieve inferences that are more efficient and potentially more accurate than those achieved by means of a single source.

  • Fill-in-the-blank points
    Sensors only give an estimate of the measured physical property.
    The nature of the errors often determines the preferred fusion algorithm.

Three Processing Architectures

  • Data-level fusion: direct fusion of the sensor data
  • Feature-level fusion: representation of the sensor data as feature vectors, which are then fused
  • Decision-level fusion: processing of each sensor's data to achieve high-level inferences or decisions, which are then combined
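
Decision-level fusion is the simplest of the three to sketch, because each sensor has already produced its own decision. The example below combines the decisions by majority vote. The vote is an assumed combination rule, and the labels are made up.

```python
from collections import Counter

def fuse_decisions(decisions):
    """Return the most common decision.
    Ties go to the decision encountered first."""
    return Counter(decisions).most_common(1)[0][0]

# Three hypothetical detectors, each reporting its own conclusion.
verdict = fuse_decisions(["fire", "no fire", "fire"])
```

Other combination rules, such as weighting each sensor by its reliability, fit the same architecture.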



Data-level Fusion

  • Applicable when: the sensors are measuring the same physical phenomenon.
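
As a hedged illustration of data-level fusion, the sketch below fuses several measurements of the same quantity by inverse-variance weighting. It assumes independent, unbiased sensor errors with known variances. This is one standard choice under those assumptions, and a different error model would call for a different algorithm.

```python
def fuse_measurements(measurements):
    """measurements: list of (value, variance) pairs, each variance > 0.
    Returns (fused_value, fused_variance) under inverse-variance weighting."""
    weights = [1.0 / variance for _, variance in measurements]
    total = sum(weights)
    fused = sum(w * value for w, (value, _) in zip(weights, measurements)) / total
    return fused, 1.0 / total

# Two hypothetical thermometers; the second is noisier and so counts for less.
value, variance = fuse_measurements([(10.0, 1.0), (20.0, 4.0)])
```

The fused variance is never larger than the smallest input variance, which is the sense in which fusing sources can be more accurate than relying on a single one.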

5.6 Data Storage

Database System

  • Database: a collection of persistent data
  • Data: known facts that can be recorded and that have an implicit meaning
  • Database Management System (DBMS): a software system that supports the creation, population, and querying of a database
  • Database System: DBMS + database

DBMS Functions

  • Define a database: specify its data types, structures, and constraints.
  • Construct or load the initial database contents on a secondary storage medium.
  • Manipulate the database:
    ① Retrieval, modification
    ② Accessing the database through Web applications
  • Share a database: allow multiple users and programs to access the database simultaneously.
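
The define, construct, and manipulate functions can be sketched with Python's built-in `sqlite3` module. The table and column names are assumptions for illustration, and an in-memory SQLite database is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only

# Define: data types, structure, and constraints.
conn.execute(
    "CREATE TABLE reading ("
    " sensor_id TEXT NOT NULL,"
    " ts INTEGER NOT NULL,"
    " temp_c REAL CHECK (temp_c BETWEEN -40 AND 85),"
    " PRIMARY KEY (sensor_id, ts))"
)

# Construct: load the initial contents.
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [("t1", 1, 21.5), ("t1", 2, 21.7), ("t2", 1, 19.0)],
)

# Manipulate: a modification followed by a retrieval.
conn.execute("UPDATE reading SET temp_c = 22.0 WHERE sensor_id = 't1' AND ts = 2")
rows = conn.execute(
    "SELECT sensor_id, AVG(temp_c) FROM reading GROUP BY sensor_id ORDER BY sensor_id"
).fetchall()
conn.commit()
```

Sharing is not shown here. A client-server DBMS is the usual choice when many users and programs need simultaneous access.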

Data Storage Solutions (key point)

  • Direct Attached Storage (DAS)
    Characteristics: storage devices are attached directly to servers, which are the only point of access.
  • Network Attached Storage (NAS)
    Characteristics: more reliable than DAS, but limited by LAN bandwidth.
  • Storage Area Network (SAN)
    Characteristics: more expensive.

5.7 Data Mining

Major Data Mining Tasks

  • Classification: predicting the class of an item
  • Association rule discovery
  • Clustering: finding the classes (groups) of items
  • Sequential pattern discovery
  • Deviation detection
  • Forecasting
  • Description
  • Link analysis: finding links and associations

Classification

  • Definition
    Find a model for the class attribute as a function of the values of the other attributes.

  • Test set
    A test set is used to determine the accuracy of the model.

  • Classification methods
    ① Decision trees
    ② Naive Bayesian classifiers
    ③ Classification using association rules
    ④ Neural networks
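
To make the roles of the model and the test set concrete, here is a minimal sketch that uses a decision stump (a decision tree with a single split) on one numeric attribute. The data, labels, and function names are made up for illustration. A real decision tree learner would choose splits by a purity measure and recurse.

```python
def train_stump(rows):
    """rows: list of (attribute_value, class_label).
    Returns (threshold, low_label, high_label) with the fewest training
    errors for the rule: low_label if x <= threshold else high_label."""
    labels = sorted({label for _, label in rows})
    best = None
    for threshold in sorted({x for x, _ in rows}):
        for low in labels:
            for high in labels:
                errors = sum((low if x <= threshold else high) != y for x, y in rows)
                if best is None or errors < best[0]:
                    best = (errors, threshold, low, high)
    return best[1:]

def predict(model, x):
    threshold, low, high = model
    return low if x <= threshold else high

def accuracy(model, rows):
    """Fraction of rows whose predicted class matches the recorded class."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)

train = [(18.0, "normal"), (20.0, "normal"), (22.0, "normal"), (30.0, "hot"), (33.0, "hot")]
test = [(19.0, "normal"), (25.0, "normal"), (31.0, "hot"), (35.0, "hot")]
model = train_stump(train)
```

On this data the stump fits the training set perfectly but gets one of the four test rows wrong (25.0 is labelled "normal"). That gap is why accuracy is measured on a separate test set.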

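Clustering

  • Definition
    Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that points in the same cluster are more similar to one another than to points in other clusters.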
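
The notes do not name a clustering algorithm. As one common example, the sketch below runs k-means (Lloyd's algorithm) on one-dimensional points, using absolute distance as the similarity measure and caller-supplied starting centers. All names are illustrative.

```python
def assign(points, centers):
    """Put every point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points and recomputing centers.
    Returns (final_centers, clusters_for_those_centers)."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = assign(points, centers)
        # A cluster that ends up empty keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, assign(points, centers)

centers, clusters = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
```

Unlike classification, no class labels are supplied here. The groups emerge from the similarity measure alone.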

5.8 Multimedia Information Processing

  • Definition
    Multimedia is a combination of text, graphics, sound, animation, and video that is delivered interactively to the user by electronic or digitally manipulated means.

Digital Image Processing

  • Digital image
    A digital image is a representation of a two-dimensional image as a finite set of digital values, called picture elements or pixels.
  • Pixel values
    Pixel values typically represent gray levels, colours, opacities, etc.
  • Fill-in-the-blank point: digitization implies that a digital image is an approximation of a real scene.
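
The approximation comes from two steps: sampling the scene on a finite grid and quantizing each sample to a finite set of gray levels. The sketch below models the scene as a hypothetical function `scene(x, y)` returning an intensity in [0.0, 1.0]. That interface is an assumption for illustration.

```python
def digitize(scene, width, height, levels=256):
    """Sample scene(x, y) at the pixel centres of a width x height grid
    and quantize each sample to an integer gray level in [0, levels - 1]."""
    image = []
    for row in range(height):
        y = (row + 0.5) / height          # pixel centre, in [0, 1]
        image.append([
            min(levels - 1, int(scene((col + 0.5) / width, y) * levels))
            for col in range(width)
        ])
    return image

def ramp(x, y):
    """Hypothetical scene: brightness rises smoothly from left to right."""
    return x

tiny = digitize(ramp, width=4, height=2, levels=4)
```

The smooth ramp becomes four discrete steps per row. Using more pixels and more gray levels makes the approximation finer but never exact.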

Major Tasks of Digital Image Processing

  • Improvement of pictorial information for human interpretation.
  • Processing of image data for storage, transmission, and representation for autonomous machine perception.

Processing level
