Machine Learning in a Week


Getting into machine learning (ML) can seem like an unachievable task from the outside.

And it definitely can be, if you attack it from the wrong end.

However, after dedicating one week to learning the basics of the subject, I found it to be much more accessible than I had anticipated.

This article is intended to give others who are interested in getting into ML a roadmap for getting started, drawing on the experience I gained during my intro week.


Background

Before my machine learning week, I had been reading about the subject for a while, and had gone through half of Andrew Ng’s course on Coursera and a few other theoretical courses. So I had a tiny bit of conceptual understanding of ML, though I was completely unable to turn any of that knowledge into code. This is what I wanted to change.

I wanted to be able to solve problems with ML by the end of the week, even though this meant skipping a lot of fundamentals and taking a top-down approach instead of a bottom-up one.

After asking for advice on Hacker News, I came to the conclusion that Python’s scikit-learn module was the best starting point. This module gives you a wealth of algorithms to choose from, reducing the actual machine learning to a few lines of code.
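To give a feel for what "a few lines of code" can look like, here is a minimal sketch. It is not taken from the article: the Iris dataset (bundled with scikit-learn) and the k-nearest-neighbours classifier are arbitrary choices made purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris is a small example dataset that ships with scikit-learn.
# It is used here only as a stand-in; the article does not mention it.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The "actual machine learning": create a model, fit it, score it.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators follow the same fit/score pattern, so swapping in a different algorithm usually means changing only the line that creates the model.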


Monday: Learning some practicalities

I started off the week by looking for video tutorials that involved scikit-learn. I finally landed on Sentdex’s tutorial on how to use ML for investing in stocks, which gave me the necessary knowledge to move on to the next step.

The good thing about the Sentdex tutorials is that the instructor takes you through all the steps of gathering the data. As you go along, you realize that fetching and cleaning up the data can be much more time consuming than doing the actual machine learning. So the ability to write scripts that scrape data from files or crawl the web is an essential skill for aspiring machine learning geeks.
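As a small illustration of the kind of clean-up script this involves, here is a sketch using only Python's standard library. The column names and the raw values are made up for the example and are not from the Sentdex tutorial.

```python
import csv
import io

# Made-up raw data with typical problems: stray whitespace, missing values,
# and numbers stored as text with thousands separators.
RAW_CSV = """age,income,education
34, 42000 ,bachelor
,38000,master
51,"55,000",high school
29,n/a,bachelor
"""

def clean_rows(raw_text):
    """Yield cleaned rows as dicts, skipping rows whose numbers cannot be parsed."""
    for row in csv.DictReader(io.StringIO(raw_text)):
        try:
            age = int(row["age"].strip())
            income = float(row["income"].strip().replace(",", ""))
        except ValueError:
            continue  # missing or malformed number: drop the row
        yield {"age": age, "income": income, "education": row["education"].strip()}

cleaned = list(clean_rows(RAW_CSV))
for row in cleaned:
    print(row)
```

Dropping unparseable rows is only one possible policy. Depending on the problem, filling in a default value or flagging the row for review may be more appropriate.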

I have re-watched several of the videos since, whenever I got stuck on a problem, and I’d recommend you do the same.

However, if you already know how to scrape data from websites, this tutorial might not be the perfect fit, as a lot of the videos revolve around data fetching. In that case, Udacity’s Intro to Machine Learning might be a better place to start.


Tuesday: Applying it to a real problem

On Tuesday I wanted to see if I could use what I had learned to solve an actual problem. As another developer in my coding cooperative was working on the Bank of England’s data visualization competition, I teamed up with him to check out the datasets the bank had released. The most interesting data was their household survey: an annual survey the bank performs on a few thousand households, covering money-related subjects.

The problem we decided to solve was the following:

Given a person’s education level, age and income, can the computer predict their gender?

I played around with the dataset, spent a few hours cleaning up the data, and used the scikit-learn map to find a suitable algorithm for the problem.

We ended up with a success rate of around 63%, which isn’t impressive at all. But the machine did at least manage to guess a little better than flipping a coin, which would have given a success rate of 50%.
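The comparison against a coin flip can be made explicit by scoring a trivial baseline next to the real model. The sketch below is an illustration under loud assumptions: the data is synthetic (it is not the Bank of England survey), the feature encodings are hypothetical, and logistic regression is my own choice, since the article does not say which algorithm was used. It makes no attempt to reproduce the 63% figure.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 20_000

# Synthetic stand-in data: NOT the Bank of England household survey.
education = rng.integers(0, 5, size=n)   # hypothetical 0-4 education scale
age = rng.integers(18, 80, size=n)
gender = rng.integers(0, 2, size=n)      # the label to predict (0 or 1)
# Income gets a weak, noisy dependence on the label, so there is
# something (but not much) for the model to learn.
income = (20_000 + 3_000 * education + 5_000 * gender
          + rng.normal(0, 8_000, size=n))

X = np.column_stack([education, age, income])
X_train, X_test, y_train, y_test = train_test_split(
    X, gender, test_size=0.25, random_state=0
)

# Baseline: always predict the most common class seen in training.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
# Model: scale the features, then fit a logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
model_acc = model.score(X_test, y_test)
print(f"Baseline accuracy: {baseline_acc:.2f}")
print(f"Model accuracy:    {model_acc:.2f}")
```

Reporting the baseline alongside the model keeps a modest result in perspective: what matters is the gap between the two numbers, not the model's accuracy on its own.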

Seeing results is like fuel for your motivation, so I’d recommend doing this yourself once you have a basic grasp of how to use scikit-learn.

It’s a pivotal moment when you realize that you can start using ML to solve real-life problems.


Wednesday: From the ground up

After playing around with various scikit-learn modules, I decided to try to write a linear regression algorithm from the ground up.

I wanted to do this because I felt (and still feel) that I really don’t understand what’s happening under the hood.

Luckily, the Coursera course goes into detail on how a few of the algorithms work, which came in very useful at this point. More specifically, it describes the underlying concepts of using linear regression with gradient descent.
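For readers who want to try the same exercise, here is a minimal sketch of linear regression with batch gradient descent in plain Python. It is not the author's implementation (the Coursera course uses Octave). The function name, learning rate, number of epochs and toy data are all assumptions chosen for illustration.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=5000):
    """Fit y ~ w * x + b by batch gradient descent on the squared error."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example with the current w, b.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the cost (1 / 2m) * sum(error^2) with respect to w and b.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / m
        grad_b = sum(errors) / m
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1, so the fit should land near w=2, b=1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_linear_regression(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

If the learning rate is set too high for the scale of the inputs, the updates overshoot and the parameters diverge. Experimenting with that value is a quick way to get a feel for what the algorithm is doing.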

This has definitely been the most effective learning technique, as it forces you to understand the steps that are going on ‘under the hood’. I strongly recommend you do this at some point.

I plan to write my own implementations of more complex algorithms as I go along, but I prefer doing this after I’ve played around with the respective algorithms in scikit-learn.


Thursday: Start competing

On Thursday, I started doing Kaggle’s introductory tutorials. Kaggle is a platform for machine learning competitions, where you can submit solutions to problems released by companies or organizations.

I recommend trying out Kaggle once you have a little bit of theoretical and practical understanding of machine learning. You’ll need this in order to start using Kaggle. Otherwise, it will be more frustrating than rewarding.

The Bag of Words tutorial guides you through every step you need to take in order to enter a submission to a competition, and also gives you a brief and exciting introduction to Natural Language Processing (NLP). I ended the tutorial with a much higher interest in NLP than I had when I started it.
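The core idea behind a bag-of-words representation is simple enough to sketch in plain Python: each text becomes a vector of word counts over a shared vocabulary, ignoring word order. The example sentences and function names below are made up for illustration, and the Kaggle tutorial itself may do this differently (for instance with a library vectorizer).

```python
from collections import Counter

def tokenize(text):
    """Lower-case the text and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_vocabulary(documents):
    """Map every distinct word in the documents to a column index."""
    words = sorted({token for doc in documents for token in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}

def vectorize(text, vocabulary):
    """Turn a text into a list of word counts, one slot per vocabulary word."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:  # words outside the vocabulary are ignored
            vector[vocabulary[word]] = count
    return vector

# Made-up toy documents, not data from the Kaggle tutorial.
reviews = ["great movie great cast", "boring movie"]
vocabulary = build_vocabulary(reviews)
vectors = [vectorize(review, vocabulary) for review in reviews]

print(vocabulary)  # {'boring': 0, 'cast': 1, 'great': 2, 'movie': 3}
print(vectors)     # [[0, 1, 2, 1], [1, 0, 0, 1]]
```

Count vectors like these can then be passed to an ordinary classifier, which is what makes the approach a convenient first step into NLP.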


Friday: Back to school

On Friday, I continued working on the Kaggle tutorials, and also started Udacity’s Intro to Machine Learning. I’m currently halfway through, and find it quite enjoyable.

It’s a lot easier than the Coursera course, as it doesn’t go into depth on the algorithms. But it’s also more practical, as it teaches you scikit-learn, which is a whole lot easier to apply to the real world than writing algorithms from the ground up in Octave, as you do in the Coursera course.


The road ahead

Doing this for a week hasn’t just been great fun; it has also made me more aware of how useful machine learning is in society. The more I learn about it, the more areas I see where it can be used to solve problems.

If you’re interested in getting into machine learning, I strongly recommend setting aside a few days or evenings and simply diving into it.

Choose a top-down approach if you’re not ready for the heavy stuff, and get into problem solving as quickly as possible.

Good luck!
