
有這樣一張表,表數(shù)據(jù)及結(jié)果如下:

可以看出,school_name的字段值有重復(fù)數(shù)據(jù)(Abraham Lincoln High School 和Agoura High School分別出現(xiàn)兩次),那么如何刪除這兩條數(shù)據(jù),從而只讓這兩個數(shù)值出現(xiàn)一次呢?
//刪除重復(fù)記錄,保存Id最小的一條
delete FROM `test` WHERE `school_name`
in
(SELECT `school_name`
FROM `test`
GROUP BY `school_name`
HAVING COUNT( * ) >1) and school_id
not in
(select min(school_id) from test group by school_id having count(* )>1)
//刪除重復(fù)記錄,保存Id最大的一條
delete FROM `test` WHERE `school_name`
in
(SELECT `school_name`
FROM `test`
GROUP BY `school_name`
HAVING COUNT( * ) >1) and school_id
not in
(select max(school_id) from test group by school_id having count(* )>1)
一.單個字段的操作
Select 重復(fù)字段 From 表 Group By 重復(fù)字段 Having Count(*)>1
查看是否有重復(fù)的數(shù)據(jù):
GROUP BY <列名序列>
HAVING <組條件表達式>
查詢出:根據(jù)dname分組,同時滿足having字句中組條件表達式(重復(fù)次數(shù)大于1)的那些組
count(*)與count(1) 其實沒有什么差別,用哪個都可以
count(*) 與 count(列名)的區(qū)別:
count(*)將返回表格中所有存在的行的總數(shù)包括值為null的行,然而count(列名)將返回表格中除去null以外的所有行的總數(shù)(有默認值的列也會被計入)
查詢?nèi)恐貜?fù)的數(shù)據(jù):
Select * From 表 Where 重復(fù)字段
In
(Select 重復(fù)字段 From 表 Group By 重復(fù)字段 Having Count(*)>1)
二.刪除全部重復(fù)數(shù)據(jù):
將上面的查詢select改為delete(這樣會出錯的)
DELETE
FROM
dept
WHERE
dname IN (
SELECT
dname
FROM
dept
GROUP BY
dname
HAVING
count(1) > 1
)
會出現(xiàn)如下錯誤:[Err] 1093 - You can't specify target table 'dept' for update in FROM clause
原因是:更新這個表的同時又查詢了這個表,查詢這個表的同時又去更新了這個表,可以理解為死鎖。
mysql不支持這種更新查詢同一張表的操作
解決辦法:把要更新的幾列數(shù)據(jù)查詢出來做為一個第三方表,然后篩選更新。
DELETE
FROM
dept
WHERE
dname IN (
SELECT
t.dname
FROM
(
SELECT
dname
FROM
dept
GROUP BY
dname
HAVING
count(1) > 1
) t
)
三.查詢表中多余重復(fù)數(shù)據(jù)
根據(jù)depno來判斷,除了rowid最小的一個
方法一
SELECT
*
FROM
dept
WHERE
dname IN (
SELECT
dname
FROM
dept
GROUP BY
dname
HAVING
COUNT(1) > 1
)
AND deptno NOT IN (
SELECT
MIN(deptno)
FROM
dept
GROUP BY
dname
HAVING
COUNT(1) > 1
)
這種寫法正確,但是查詢的速度太慢,可以試一下方法二
方法二
根據(jù)dname分組,查找出deptno最小的;然后再查找deptno不包含剛才查出來的,這樣就查詢出了所有的重復(fù)數(shù)據(jù)(除了deptno最小的那行)
SELECT *
FROM
dept
WHERE
deptno NOT IN (
SELECT
dt.minno
FROM
(
SELECT
MIN(deptno) AS minno
FROM
dept
GROUP BY
dname
) dt
)
第三種方法
SELECT
*
FROM
table_name AS ta
WHERE
ta.唯一鍵 <> ( SELECT max( tb.唯一鍵 ) FROM table_name AS tb WHERE ta.判斷重復(fù)的列 = tb.判斷重復(fù)的列 );
四.刪除重復(fù)數(shù)據(jù)只保留一條
第一種方法:
DELETE
FROM
dept
WHERE
dname IN (
SELECT
t.dname
FROM
(
SELECT
dname
FROM
dept
GROUP BY
dname
HAVING
count(1) > 1
) t
)
AND deptno NOT IN (
SELECT
dt.mindeptno
FROM
(
SELECT
min(deptno) AS mindeptno
FROM
dept
GROUP BY
dname
HAVING
count(1) > 1
) dt
)
第二種方法
與上面查詢的第二種方法對應(yīng),只是將select改為delete
DELETE
FROM
dept
WHERE
deptno NOT IN (
SELECT
dt.minno
FROM
(
SELECT
MIN(deptno) AS minno
FROM
dept
GROUP BY
dname
) dt
)
第三種方法
DELETE
FROM
table_name AS ta
WHERE
ta.唯一鍵 <> (
SELECT
t.maxid
FROM
(
SELECT max( tb.唯一鍵 ) AS maxid FROM table_name AS tb WHERE ta.判斷重復(fù)的列 = tb.判斷重復(fù)的列 )
t )
五.多個字段的操作(補充)
和單個字段相比,多個字段就是將group by 的字段增加為你想要的即可
DELETE
FROM
dept
WHERE
(dname, db_source) IN (
SELECT
t.dname,
t.db_source
FROM
(
SELECT
dname,
db_source
FROM
dept
GROUP BY
dname,
db_source
HAVING
count(1) > 1
) t
)
AND deptno NOT IN (
SELECT
dt.mindeptno
FROM
(
SELECT
min(deptno) AS mindeptno
FROM
dept
GROUP BY
dname,
db_source
HAVING
count(1) > 1
) dt
)
六.小結(jié)
上面的方法還有很多需要優(yōu)化的地方,數(shù)據(jù)量太大的話,執(zhí)行起來很慢
簡單優(yōu)化:
- 在經(jīng)常查詢的字段上加上索引
- 將*改為你需要查詢出來的字段,不要全部查詢出來
- 小表驅(qū)動大表用IN,大表驅(qū)動小表用EXISTS。
IN適合的情況是外表數(shù)據(jù)量小的情況,而不是外表數(shù)據(jù)大的情況.
因為IN會遍歷外表的全部數(shù)據(jù),假設(shè)a表100條,b表10000條那么遍歷次數(shù)就是100*10000次,而exists則是執(zhí)行100次去判斷a表中的數(shù)據(jù)是否在b表中存在,它只執(zhí)行了a.length次數(shù)。
至于哪一個效率高是要看情況的,因為in是在內(nèi)存中比較的,而exists則是進行數(shù)據(jù)庫查詢操作的。
