SQL實(shí)戰(zhàn)分析
數(shù)據(jù)來源:https://pan.baidu.com/s/1zBobf63JTbhNKxZ9P-Ju1Q
正式開始(只用到一個表,以下用x代替)
1.先看一下所有數(shù)據(jù)
select * from x
2.看過數(shù)據(jù)之后正式開始查找,接下來查看地點(diǎn)在北京且職位是數(shù)據(jù)分析師的數(shù)據(jù)
(即北京的數(shù)據(jù)分析師)
select * from x
where city = '北京' and positionName = '數(shù)據(jù)分析師'
3.接上面查找地點(diǎn)在北京或者職位是數(shù)據(jù)分析師的數(shù)據(jù)
select * from x
where city = '北京' or positionName = '數(shù)據(jù)分析師'
4.查找北京或者上海的數(shù)據(jù)分析師的數(shù)據(jù)
select * from x
where city in('上海','北京') and positionName = '數(shù)據(jù)分析師'
以上的邏輯判斷的都只用到了'=',按實(shí)際需求可以有 大于'>',小于'<',不等于'!='
5.再按上面舉個例子,查找公司ID超過超過500的數(shù)據(jù)
select * from x
where companyId >= 500
6.positionLables這一列描述很雜,下面我們模糊查找一下包含數(shù)據(jù)分析的數(shù)據(jù)
select * from x
where positionLables like '%數(shù)據(jù)分析%'
7.以城市為組看一下,不同城市的職位數(shù)量情況
select city,count(distinct positionId) from x
group by city
8.在7的條件下再加入招聘學(xué)歷限制
select city,workYear,count(distinct positionId) from x
group by city,workYear
9.有些數(shù)據(jù)拿到手,不一定能直接用,需要清洗,如本表的salary(格式為nk-nk)
我們需要得到平均工資,我們先把最低工資洗出來,此處用left函數(shù)截取,用location定位。
select salary,left(salary,locate('k',salary)-1) as bottomsalary from x
得到最低工資,再來看看最高工資
10.清洗最高工資
select salary,substr(salary,locate('-',salary)+1,(length(salary)-locate('-',salary))-1) as topsalary from x
11.結(jié)合起來再結(jié)合招聘學(xué)歷要求可以得到以下結(jié)果
select city,workYear,avg((bottomsalary+topsalary)/2) as avgsalary
from (select city,salary,workYear,left(salary,locate('K',salary)-1) as bottomsalary,substr(salary,locate('-',salary)+1,(length(salary)-locate('-',salary))-1) as topsalary
from 11dataanalyst) as t1 where salary not like '%以上%'
group by city,workYear
order by city,avgsalary
以上便得到,各個城市,不同學(xué)歷之間的平均工資。
12.查詢招聘數(shù)量top3的公司數(shù)據(jù)
select companyId,companyShortName,count(distinct positionId)
from 11dataanalyst
group by companyId
order by count(distinct positionId) desc limit 3
13.查詢出O2O、電子商務(wù)、互聯(lián)網(wǎng)金融這三個行業(yè),哪個行業(yè)的平均薪資最高.
select industryField,avg((bottomsalary+topsalary)/2)
from (select city,salary,industryField,left(salary,locate('K',salary)-1) as bottomsalary,substr(salary,locate('-',salary)+1,(length(salary)-locate('-',salary))-1) as topsalary
from 11dataanalyst) as t1 where salary not like '%以上%' and industryField in ('o2o','電子商務(wù)','互聯(lián)網(wǎng)')
group by industryField
order by avg((bottomsalary+topsalary)/2) desc