Outline

5.1 Information Theory
5.2 Information Technology
5.3 Data quality
5.4 Data cleaning
5.5 Data fusion
5.6 Data storage
5.7 Data mining
5.8 Multimedia information processing

5.3 Data Quality

Uncertain Data

Data uncertainty can arise during:

Stage
Data collection
Data transmission
Data processing

Causes of Data Uncertainty

Cause
Environmental factors
Low battery power
Packet loss

Classification of Data Uncertainty

Source Classification: by the source of the uncertainty (key point)

Name	Example
Undesirable uncertainty	Noisy sensor data
	Imprecise GPS data
	Unreliable extracted/integrated data
Desirable uncertainty	Medical data with generalized attributes
	Cloaked trajectory data

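Granularity Classification: by granularity

Name	Explanation
Tuple uncertainty	It is uncertain whether the tuple belongs in the relation at all
Attribute uncertainty	The tuple exists, but the value of one or more of its attributes is uncertain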

Correlations Classification: by correlation

Name
Independent uncertainty
Correlated uncertainty
Uncertainty with local correlations

Meaning of Data Quality (key point)

Generally, you have a problem if the data doesn't mean what you think it does, or what it should.
Data quality problems are expensive and pervasive.

Conventional Definition of Data Quality

Name	Explanation
Accuracy	The data was recorded correctly
Completeness	All relevant data was recorded
Uniqueness	Each entity is recorded once
Timeliness	The data is kept up to date
Consistency	The data agrees with itself
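Several of these dimensions can be checked mechanically. The sketch below is only an illustration: the record fields (`sensor_id`, `ts`, `temp_c`, `temp_f`), the sample values, and the one-hour freshness limit are assumptions, not part of the notes. Accuracy is left out because it needs ground truth to compare against.

```python
from datetime import datetime, timedelta

# Hypothetical sensor records; field names and values are made up for illustration.
records = [
    {"sensor_id": "s1", "ts": datetime(2024, 1, 1, 12, 0), "temp_c": 20.0, "temp_f": 68.0},
    {"sensor_id": "s1", "ts": datetime(2024, 1, 1, 12, 0), "temp_c": 20.0, "temp_f": 68.0},   # duplicate
    {"sensor_id": "s2", "ts": datetime(2024, 1, 1, 9, 0), "temp_c": None, "temp_f": 70.0},    # missing value, stale
    {"sensor_id": "s3", "ts": datetime(2024, 1, 1, 11, 50), "temp_c": 25.0, "temp_f": 50.0},  # C and F disagree
]
now = datetime(2024, 1, 1, 12, 5)


def quality_report(rows, now, max_age=timedelta(hours=1)):
    """Count violations of four data quality dimensions."""
    # Completeness: every field of the record has a value.
    incomplete = sum(1 for r in rows if any(v is None for v in r.values()))
    # Uniqueness: each (sensor, timestamp) pair is recorded once.
    keys = [(r["sensor_id"], r["ts"]) for r in rows]
    duplicates = len(keys) - len(set(keys))
    # Timeliness: the record is not older than max_age (assumed threshold).
    stale = sum(1 for r in rows if now - r["ts"] > max_age)
    # Consistency: the Celsius and Fahrenheit fields describe the same temperature.
    inconsistent = sum(
        1 for r in rows
        if r["temp_c"] is not None and r["temp_f"] is not None
        and abs(r["temp_c"] * 9 / 5 + 32 - r["temp_f"]) > 0.5
    )
    return {"incomplete": incomplete, "duplicates": duplicates,
            "stale": stale, "inconsistent": inconsistent}


report = quality_report(records, now)
print(report)
```

Each sample record was chosen to trip one check, so every counter in the report ends up at 1.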

5.4 Data Cleaning

Data cleaning is the process of detecting and correcting (or removing) errors and inconsistencies in data in order to improve its quality.
It aims to identify data that is incomplete, incorrect, inaccurate, or irrelevant.

Data Cleaning Tasks (key point)

Task
Fill in missing values
Identify outliers and smooth out noisy data
Correct inconsistent data
Resolve redundancy caused by data integration

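Methods to Handle Noisy Data

Name	Explanation
Binning	Partition the data into bins and process it bin by bin, smoothing out the extreme values
Regression	Fit the data to a regression function
Clustering	Group the data into clusters, then detect and remove elements that do not belong to any major cluster
Combined inspection	Combine computer and human inspection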
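As an illustration of binning, the sketch below uses equal-frequency bins and replaces each value by the mean of its bin. This is one common variant; the notes do not say which variant is intended, and the sample values and bin size are assumptions.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Made-up sample data, three values per bin.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, 3)
print(smoothed)  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```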

Sensor Cleaning Pipeline

The pipeline exploits the temporal and spatial characteristics of sensor data.

Step 1: Point

Operates on: a single value of the sensor stream
Purpose: filter individual values
① Readings from errant (dirty or faulty) RFID tags
② Obvious outliers
③ Conversion of raw data into tuples
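A minimal sketch of a Point stage, under assumptions: the raw format `tag,timestamp,value`, the set of known faulty tags, and the valid value range are all hypothetical and chosen only for the example.

```python
def point_stage(raw_readings, faulty_tags, low, high):
    """Filter single readings and convert the survivors into (tag, time, value) tuples."""
    cleaned = []
    for line in raw_readings:
        tag, timestamp, value = line.split(",")
        value = float(value)
        if tag in faulty_tags:            # errant (dirty or faulty) tag
            continue
        if not (low <= value <= high):    # obvious outlier
            continue
        cleaned.append((tag, int(timestamp), value))
    return cleaned


raw = ["t1,0,21.5", "t2,0,21.7", "bad,0,21.6", "t1,1,999.0"]
tuples = point_stage(raw, faulty_tags={"bad"}, low=-40.0, high=85.0)
print(tuples)  # [('t1', 0, 21.5), ('t2', 0, 21.7)]
```

Each reading is judged on its own here; no neighbouring values in time or space are consulted, which is what distinguishes this stage from the later ones.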


Step 2: Smoothing

Purpose: interpolate (fill in) lost readings
① Temporal interpolation
② Outlier detection
Method: window-based queries
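Window-based temporal interpolation can be sketched as follows. The representation is an assumption made for the example: the stream is a list with one slot per time step, `None` marks a lost reading, and a lost reading is filled with the mean of the known readings inside the window around it.

```python
def smooth_stage(readings, window):
    """Fill lost readings (None) with the mean of the known readings within `window` slots."""
    filled = []
    for i, value in enumerate(readings):
        if value is not None:
            filled.append(value)
            continue
        lo, hi = max(0, i - window), min(len(readings), i + window + 1)
        neighbours = [v for v in readings[lo:hi] if v is not None]
        # Leave the gap open if the window contains no known reading.
        filled.append(sum(neighbours) / len(neighbours) if neighbours else None)
    return filled


stream = [20.0, None, 22.0, 23.0, None, None, 26.0]
filled = smooth_stage(stream, window=1)
print(filled)  # [20.0, 21.0, 22.0, 23.0, 23.0, 26.0, 26.0]
```

A wider window fills longer gaps but smooths over genuine changes, so the window size is a trade-off rather than a fixed constant.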


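Step 3: Merge

Purpose: spatial interpolation
Example: within one spatial granule, compute the average of the readings from the different motes, ignoring any individual reading that lies more than two standard deviations from the mean.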
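The averaging rule in the example can be written down directly. This is a sketch, not the pipeline's actual implementation: the readings are made up, and the population standard deviation is an assumed choice.

```python
from statistics import mean, pstdev


def merge_stage(readings):
    """Average the readings of one spatial granule, ignoring values more than 2 std devs from the mean."""
    mu = mean(readings)
    sigma = pstdev(readings)
    kept = [r for r in readings if abs(r - mu) <= 2 * sigma]
    return mean(kept)


# Eight motes in the same granule; the last one reports a faulty value.
granule = [20.1, 19.9, 20.0, 20.2, 19.8, 20.0, 20.0, 60.0]
merged = merge_stage(granule)
print(round(merged, 2))
```

With very few motes no reading can lie more than two standard deviations from the mean, so the rule only has an effect once a granule contains enough readings.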


Step 4: Arbitrate

Purpose: remove
① Conflicting readings
② Duplicate readings (de-duplication)
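A possible sketch of arbitration, with an assumed conflict rule: when the same tag is reported at two locations at the same time, the location with the higher read count wins. The tuple layout `(tag, time, location, count)` and the location names are hypothetical.

```python
def arbitrate_stage(readings):
    """Drop exact duplicates, then keep one location per (tag, time): the one with the highest count."""
    best = {}
    for tag, time, location, count in set(readings):   # set() removes exact duplicates
        key = (tag, time)
        # Ties are not handled specially in this sketch.
        if key not in best or count > best[key][1]:
            best[key] = (location, count)
    return sorted((tag, time, loc) for (tag, time), (loc, _) in best.items())


readings = [
    ("t1", 0, "shelf_A", 5),
    ("t1", 0, "shelf_A", 5),   # duplicate
    ("t1", 0, "shelf_B", 2),   # conflicts with shelf_A
    ("t2", 0, "shelf_B", 4),
]
resolved = arbitrate_stage(readings)
print(resolved)  # [('t1', 0, 'shelf_A'), ('t2', 0, 'shelf_B')]
```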


Step 5: Virtualize

Purpose: multi-source integration


5.5 Data Fusion

Definition (key point)
Data fusion combines data from multiple sources in order to draw inferences that are more efficient, and potentially more accurate, than those achievable from a single source.
Fill-in-the-blank items
Sensors give only an estimate of the measured physical property.
The nature of the errors often determines the preferred fusion algorithm.

Three Processing Architectures

Name	Explanation
Data-level fusion	Direct fusion of the sensor data
Feature-level fusion	The sensor data is represented as feature vectors, which are then fused
Decision-level fusion	Each sensor's data is processed into high-level inferences or decisions, which are then combined
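To make decision-level fusion concrete, the sketch below combines per-sensor decisions by majority vote. Majority voting is an assumed combination rule chosen for simplicity; the notes do not prescribe one, and the labels are made up.

```python
from collections import Counter


def decision_level_fusion(decisions):
    """Combine the decisions made independently by each sensor using a majority vote."""
    return Counter(decisions).most_common(1)[0][0]


# Each sensor has already turned its own data into a decision.
fused = decision_level_fusion(["fire", "no_fire", "fire"])
print(fused)  # fire
```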


Data-level Fusion

Applicable when the sensors are measuring the same physical phenomenon.
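When several sensors measure the same quantity, one simple way to fuse their raw readings is an inverse-variance weighted average. This is a sketch under assumptions: the sensor errors are taken to be independent and unbiased with known variances, and the sample numbers are made up.

```python
def fuse(measurements):
    """Inverse-variance weighted average of (value, variance) pairs."""
    weights = [1.0 / variance for _, variance in measurements]
    fused_value = sum(w * value for w, (value, _) in zip(weights, measurements)) / sum(weights)
    fused_variance = 1.0 / sum(weights)
    return fused_value, fused_variance


# Two sensors measure the same temperature; the second one is more precise.
value, variance = fuse([(20.0, 4.0), (22.0, 1.0)])
print(value, variance)
```

The fused variance is smaller than that of either sensor alone, which is the sense in which fusion can be more accurate than a single source.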


5.6 Data Storage

Database System

Database: a collection of persistent data.
Data: known facts that can be recorded and that have an implicit meaning.
Database Management System (DBMS): a software system that supports the creation, population, and querying of a database.
Database System: DBMS + database.

DBMS Functions

Name	Explanation
Define	Specify a particular database in terms of its data types, structures, and constraints
Construct	Build or load the initial database contents on a secondary storage medium
Manipulate	Retrieve and modify data, including access through Web applications
Share	Allow multiple users and programs to access the database concurrently
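The define, construct, and manipulate functions can be tried out with Python's built-in `sqlite3` module. The table name, columns, and values below are assumptions made for the example, and the in-memory database does not demonstrate sharing between users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, used only for this demo

# Define: declare the structure, data types, and constraints.
conn.execute(
    "CREATE TABLE reading ("
    "sensor_id TEXT NOT NULL, ts INTEGER NOT NULL, value REAL, "
    "PRIMARY KEY (sensor_id, ts))"
)

# Construct: load the initial contents.
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [("s1", 0, 20.5), ("s1", 1, 20.7), ("s2", 0, 19.9)],
)

# Manipulate: modification followed by retrieval.
conn.execute(
    "UPDATE reading SET value = ? WHERE sensor_id = ? AND ts = ?",
    (21.0, "s1", 1),
)
rows = conn.execute(
    "SELECT ts, value FROM reading WHERE sensor_id = ? ORDER BY ts", ("s1",)
).fetchall()
conn.close()
print(rows)  # [(0, 20.5), (1, 21.0)]
```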

Data Storage Solutions (key point)

Name	Abbreviation
Direct Attached Storage	DAS
Network Attached Storage	NAS
Storage Area Network	SAN

Direct Attached Storage (DAS)
Characteristics: storage devices are attached directly to a server, which is the only point of access.


Network Attached Storage (NAS)
Characteristics: more reliable than DAS, but limited by LAN bandwidth.


Storage Area Network (SAN)
Characteristics: more expensive


5.7 Data Mining

Major Data Mining Tasks

Name	Explanation
Classification	Predict the class of an item
Association Rule Discovery
Clustering	Find the classes of items
Sequential Pattern Discovery
Deviation Detection
Forecasting
Description
Link Analysis	Find links and associations

Classification

Definition
Find a model for the class attribute as a function of the values of the other attributes.
Test set
A test set is used to determine the accuracy of the model.
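The roles of the model and the test set can be shown with a deliberately tiny classifier. Everything here is an assumption for illustration: the single attribute, the labels `hot` and `cold`, the data, and the midpoint-threshold model, which stands in for any real classification method.

```python
def train_threshold(train):
    """Learn a one-attribute model: the midpoint between the two class means."""
    hot = [x for x, label in train if label == "hot"]
    cold = [x for x, label in train if label == "cold"]
    return (sum(hot) / len(hot) + sum(cold) / len(cold)) / 2


def predict(threshold, x):
    """Predict the class attribute from the value of the other attribute."""
    return "hot" if x > threshold else "cold"


train = [(10.0, "cold"), (12.0, "cold"), (30.0, "hot"), (32.0, "hot")]
test = [(11.0, "cold"), (29.0, "hot"), (20.0, "hot"), (25.0, "hot")]

threshold = train_threshold(train)  # built from the training set only
correct = sum(1 for x, label in test if predict(threshold, x) == label)
accuracy = correct / len(test)      # measured on the held-out test set
print(threshold, accuracy)  # 21.0 0.75
```

The test records are never used to build the model, so the measured accuracy reflects how the model behaves on data it has not seen.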
Classification Methods

Name
Decision trees
Naive Bayesian classifiers
Association-rule-based classification
Neural networks

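Clustering

Definition
Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that points in the same cluster are more similar to one another than to points in other clusters.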
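One well-known way to find such clusters is k-means. The sketch below is a minimal one-dimensional version; the choice of k-means, the use of absolute distance as the (dis)similarity measure, the starting centroids, and the data points are all assumptions for the example.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D points; a smaller absolute distance means more similar."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [sum(c) / len(c) if c else old for c, old in zip(clusters, centroids)]
    return clusters, centroids


clusters, centroids = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centroids=[0.0, 5.0])
print(clusters)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
print(centroids)  # [2.0, 11.0]
```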

5.8 Multimedia Information Processing

Definition
Multimedia is a combination of text, graphics, sound, animation, and video that is delivered interactively to the user by electronic or digitally manipulated means.

Digital Image Processing

Digital image
A digital image is a representation of a two-dimensional image as a finite set of digital values, called picture elements or pixels.
Pixel values
Pixel values typically represent gray levels, colours, opacities, etc.
Fill-in-the-blank: digitization implies that a digital image is an approximation of a real scene.
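Why digitization is only an approximation can be seen in a small sketch that samples a continuous scene on a pixel grid and quantizes each sample to an integer gray level. The scene function, the grid size, and the 256 gray levels are assumptions made for the example.

```python
def digitize(scene, width, height, levels=256):
    """Sample a continuous scene on a width x height grid and quantize to integer gray levels."""
    image = []
    for row in range(height):
        y = (row + 0.5) / height                # sample at the pixel centre
        line = []
        for col in range(width):
            x = (col + 0.5) / width
            intensity = scene(x, y)             # real-valued brightness in [0, 1]
            line.append(min(levels - 1, int(intensity * levels)))
        image.append(line)
    return image


# Hypothetical scene: brightness increases smoothly from left to right.
image = digitize(lambda x, y: x, width=4, height=2)
print(image)  # [[32, 96, 160, 224], [32, 96, 160, 224]]
```

The smooth ramp of the scene becomes four discrete steps per row: both the sampling grid and the finite number of gray levels discard information.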

Major Tasks of Digital Image Processing

Improvement of pictorial information for human interpretation.
Processing of image data for storage, transmission, and representation for autonomous machine perception.

Processing level


Information Processing for IoT
