3.1「Stanford Algorithms」O(n log n) Algorithm for Counting Inversions I - Part 1

In this next series of videos, we'll get some more practice applying the divide and conquer algorithm design paradigm to various problems.

This will also give us a glimpse of the kinds of application domains to which it's been successfully applied.

We're gonna start by extending the merge sort algorithm to solve a problem involving counting the number of inversions of an array.

Before we tackle the specific problem of counting the number of inversions in an array, let me say a few words about the divide and conquer paradigm in general.

So again, you've already seen the totally canonical example of divide and conquer, namely merge sort.

So the following three conceptual steps will be familiar to you.

The first step, no prizes for guessing, is you divide the problem into smaller subproblems.

Sometimes this division happens only in your mind.

It's really more of a conceptual step than part of your code.

Sometimes you really do copy over parts of the input into say new arrays to pass on to your recursive calls.

The second step, again no prizes here, is you conquer the sub-problems just using recursion.

So for example, in Merge Sort, you conceptually divide the array into two different pieces.

And then you recursively conquer, or sort, the first half of the array.

And you do the same thing with the second half of the array.

Now, of course, it's not quite as easy as just doing these two steps: dividing the problem, and then solving the subproblems recursively.

Usually, you have some extra cleanup work after the recursive calls, to stitch together the solutions to the subproblems into one for the big problem, the problem that you actually care about.

Recall, for example, in Merge Sort, after our recursive calls, the left half of the array was sorted, the right half of the array was sorted.

But we still had to stitch those together, to merge them into a sorted version of the entire array.

So the third step is to combine the solutions to the subproblems into one for the original problem.

Generally the largest amount of ingenuity happens in the third step.

How do you actually quickly combine solutions to subproblems into one for the original problem? Sometimes you also get some cleverness in the first step, with division.

Sometimes it's as simple as just splitting an array in two.

But there are cases where the division step also has some ingenuity.
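As a rough sketch of how the three steps show up in code, here is Merge Sort in Python. The function name `merge_sort` and the choice to copy each half into a new list are illustrative assumptions, not something the lecture fixes.

```python
def merge_sort(a):
    """Return a sorted copy of the list a."""
    if len(a) <= 1:  # base case: nothing left to divide
        return list(a)

    # Step 1 (divide): split the input into two halves.
    mid = len(a) // 2
    left, right = a[:mid], a[mid:]

    # Step 2 (conquer): solve each half recursively.
    left, right = merge_sort(left), merge_sort(right)

    # Step 3 (combine): merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two
    merged.extend(right[j:])  # still has elements left
    return merged


print(merge_sort([1, 3, 5, 2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```

Here the divide step really does copy parts of the input into new lists; an index-based version would make the division purely conceptual.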

Now let's move on to the specific problem of counting inversions and see how to apply this divide and conquer paradigm.

So let's begin by defining the problem formally.

We're given as input an array A of length n.

And you can define the problem so that the array A contains any old distinct numbers.

But let's just keep things simple and assume that it contains the numbers one through n, the integers in that range, in some order.

That captures the essence of the problem.

And the goal is to compute the number of inversions of this array.

So what's an inversion, you may ask?

Well, an inversion is just a pair of array indices i and j, with i smaller than j, so that the earlier array entry, the i-th entry, is bigger than the later one, the j-th one.

So one thing that should be evident is that if the array contains these numbers in sorted order, if the array is simply one, two, three, four, all the way up to n, then the number of inversions is zero.

The converse you might also want to think through: if the array has any ordering of the numbers between one and n other than the sorted one, then it's going to have a nonzero number of inversions.
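The definition can be turned directly into a brute-force counter, which also makes it possible to check the "zero inversions only when sorted" claim on small inputs. This is a minimal sketch: the name `count_inversions_brute_force` is made up for illustration, and the double loop takes quadratic time, so it is not the O(n log n) algorithm this lecture is building toward.

```python
from itertools import permutations


def count_inversions_brute_force(a):
    """Count pairs of indices i < j with a[i] > a[j]."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


# Check every ordering of 1..5: which ones have zero inversions?
n = 5
zero_inversion_orders = [
    p for p in permutations(range(1, n + 1))
    if count_inversions_brute_force(p) == 0
]
print(zero_inversion_orders)  # [(1, 2, 3, 4, 5)]
```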

Let's look at another example.

So suppose we have an array of six entries, the numbers one through six in the following order: one, three, five, followed by two, four, six.

So how many inversions does this array have? So again what we need to look for are pairs of array entries so that the earlier or left entry is bigger than the later or right entry.

So one example, which we see right here, would be five and two.

Those are right next to each other and out of order: the earlier entry is bigger than the later one.

But there are others. There's three and two, for example; those are out of order.

And five and four are also out of order.

And I'll leave it to you to check that those are the only three pairs that are out of order.

So, summarizing, the inversions in this array of length six are (3, 2), (5, 2), and (5, 4), corresponding to the pairs of array indices (2, 4), (3, 4), and (3, 5), counting positions from one.
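To check that these really are the only three, one can enumerate every inversion of the example array, both as a pair of values and as a pair of positions. The helper name `list_inversions` is hypothetical, and positions are reported counting from one to match the lecture.

```python
def list_inversions(a):
    """Return (value pairs, 1-based index pairs) for all inversions of a."""
    values, positions = [], []
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            if a[i] > a[j]:  # earlier entry is bigger than the later one
                values.append((a[i], a[j]))
                positions.append((i + 1, j + 1))
    return values, positions


values, positions = list_inversions([1, 3, 5, 2, 4, 6])
print(values)     # [(3, 2), (5, 2), (5, 4)]
print(positions)  # [(2, 4), (3, 4), (3, 5)]
```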

Pictorially, we can think of it like this: we can first write down the numbers in order, one up to six.

And then we can write down the numbers again, but ordered in the way that they're given in the input array.

So, one, three, five, two, four, six.

And then we can connect the dots, meaning we connect one to one, we connect two to two, and so on.

It turns out, and I'll leave it for you to think this through, that the number of crossing pairs of line segments corresponds precisely to the number of inversions.

So we see that there are one, two, three crossing line segments.

And these are exactly in correspondence with the three inversions we found earlier: five and two, three and two, and five and four.
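The picture can be checked mechanically. In the sketch below, each number gets a segment from its place in the sorted top row to its place in the input ordering on the bottom row, and two segments cross exactly when their left-to-right order flips between the rows. The name `count_crossings` is an assumption for illustration, and the code assumes the entries are distinct.

```python
def count_crossings(a):
    """Count crossing segment pairs in the two-row picture for array a."""
    # bottom_pos[v] = position of value v in the input ordering (bottom row)
    bottom_pos = {v: k for k, v in enumerate(a)}
    top_row = sorted(a)  # the numbers written in sorted order
    crossings = 0
    for x in range(len(top_row)):
        for y in range(x + 1, len(top_row)):
            u, v = top_row[x], top_row[y]  # u is left of v in the top row
            if bottom_pos[u] > bottom_pos[v]:  # but right of v in the bottom row
                crossings += 1
    return crossings


print(count_crossings([1, 3, 5, 2, 4, 6]))  # 3
```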

Now, why would you want to solve this problem, you might ask?

Well, there are a few reasons that come up.

One would be to have a numerical similarity measure that quantifies how close two ranked lists are to each other.

So for example, suppose I took you and a friend, and I identified ten movies that both of you had seen.

And I asked each of you to order, or to rank these movies from your most favorite to your least favorite.

Now I can form an array, and compute inversions.

And it quantifies, in some sense, how dissimilar your two rankings are to each other.

So in more detail, in the first entry of this array, I would put down the ranking that your friend gave to your favorite movie.

So if your favorite movie was Star Wars or whatever, and your friend only thought it was the fifth best out of the ten, then I would write down a five in the first entry of this array.

Generally, I would take your second favorite movie.

I would look at how your friend ranked that.

I would put that in the second entry of the array and so on, all the way up to the tenth entry of the array, where I would put your friend's ranking of your least favorite movie.

Now, if you have exactly identical preferences, if you rank them exactly the same way, the number of inversions of this array would be zero.

And in general, the more inversions this array has, the more your two lists differ from each other.
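The construction of that array can be sketched as follows. The movie titles are placeholders, only four movies are used instead of ten to keep the example short, and the helper names `ranking_array` and `count_inversions_brute_force` are assumptions made for illustration.

```python
def ranking_array(yours, friends):
    """Entry k holds your friend's rank of your k-th favorite movie."""
    # friend_rank[m] = your friend's rank of movie m (1 = most favorite)
    friend_rank = {movie: r for r, movie in enumerate(friends, start=1)}
    return [friend_rank[movie] for movie in yours]


def count_inversions_brute_force(a):
    """Count pairs of indices i < j with a[i] > a[j] (quadratic time)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


# Both lists are ordered from most favorite to least favorite.
yours = ["Movie A", "Movie B", "Movie C", "Movie D"]
friends = ["Movie B", "Movie A", "Movie C", "Movie D"]

array = ranking_array(yours, friends)
print(array)                                # [2, 1, 3, 4]
print(count_inversions_brute_force(array))  # 1
```

Identical rankings give the array 1, 2, ..., n and therefore zero inversions; here the two lists disagree only on the order of the top two movies, which shows up as a single inversion.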

Now, why might you want to do this? Why might you want to know whether two different people ranked things in a similar way, had similar preferences?

Well, one reason might be what's called collaborative filtering.

Probably many of you have had the experience of going to a website, and if you've made a few purchases through this website, it starts recommending further purchases for you.

So one way to solve this problem, under the hood, is to look at your purchases, look at what you seem to like, find other people who have similar preferences and a similar history, look at things they've bought that you haven't, and then recommend new products to you based on what similar customers seem to have bought.

So this problem captures some of the essence of identifying which customers or which people are similar, based on data about what they prefer.

So just to make sure we're all on the same page, let me pause for a brief quiz.

We've already noticed that a given array will have zero inversions if and only if it's in sorted order, if it contains the numbers one through n in order.

So, on the other side, what is the largest number of inversions an array could possibly have?

Let's say, just for an array of size six, like the one in this example here.




O(n log n) Algorithm for Counting Inversions I - Question 1

What is the largest-possible number of inversions that a 6-element array can have?

A 15

B 21

C 36

D 64
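After committing to an answer, one way to check it is to try every ordering of the numbers one through six and record the largest inversion count seen. This is only a sanity-check sketch: the helper name `count_inversions_brute_force` is an assumption, and exhaustive search is feasible here only because six elements have just 720 orderings.

```python
from itertools import permutations


def count_inversions_brute_force(a):
    """Count pairs of indices i < j with a[i] > a[j] (quadratic time)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


# Try all 720 orderings of 1..6 and keep the largest count.
largest = max(
    count_inversions_brute_force(p) for p in permutations(range(1, 7))
)
print(largest)
```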
