Vertica的這些事(十六)——Vertica如何建表

大家看到題目可能會想,建表誰不會呀,還用講嘛?但是如何建表能使你的應(yīng)用查詢更加快呢?

Anatomy of a Projection

The [CREATE PROJECTION]statement defines the individual elements of a projection, as the following graphic shows.


image.png

The previous example contains the following significant elements:

Column List and Encoding

Lists every column in the projection and defines the encoding for each column. Unlike traditional database architectures, HP Vertica operates on encoded data representations. Therefore, HP recommends that you use data encoding because it results in less disk I/O.

Base Query

Identifies all the columns to incorporate in the projection through column name and table name references. The base query for large table projections can contain PK/FK joins to smaller tables.

Sort Order

The sort order optimizes for a specific query or commonalities in a class of queries based on the query predicate. The best sort orders are determined by the WHERE clauses. For example, if a projection's sort order is (x, y), and the query's WHERE clause specifies (x=1 AND y=2), all of the needed data is found together in the sort order, so the query runs almost instantaneously.
You can also optimize a query by matching the projection's sort order to the query's GROUP BY clause. If you do not specify a sort order, HP Vertica uses the order in which columns are specified in the column definition as the projection's sort order.
The ORDER BY clause specifies a projection's sort order, which localizes logically grouped values so that a disk read can pick up many results at once. For maximum performance, do not sort projections on LONG VARBINARY and LONG VARCHAR columns.

Segmentation

The segmentation clause determines whether a projection is segmented across nodes within the database. Segmentation distributes contiguous pieces of projections, calledsegments, for large and medium tables across database nodes. Segmentation maximizes database performance by distributing the load. Use SEGMENTED BY HASH to segment large table projections.
For small tables, use the UNSEGMENTED keyword to direct HP Vertica to replicate these tables, rather than segment them. Replication creates and stores identical copies of projections for small tables across all nodes in the cluster. Replication ensures high availability and recovery.
For maximum performance, do not segment projections on LONG VARBINARY and LONG VARCHAR columns.

以上來自官網(wǎng),理解如下:

Projection的解析
Sort Order
1、  order by 后表中插入的數(shù)據(jù)是有序的,所以order by 的列就源自于你在查詢語句時(shí)使用的where 字句的內(nèi)容。例如,如果字句查詢中有where x=1 and y=2,那么建立projection時(shí)order by (x, y)查詢的時(shí)候就會迅速定位到符合條件的數(shù)據(jù)
2、  group by 后面的字段,出現(xiàn)在order by 中也可以優(yōu)化查詢。
3、  order by 不要建立在LONG VARBINARY and LONG VARCHAR的列
Segmentation
1、  Segmentation by hash()就是按照某一列,打散數(shù)據(jù),把數(shù)據(jù)均勻的分布在各個(gè)節(jié)點(diǎn)上,對于大表,要記得使用。所以 hash里的列是主鍵最好,也就是說該列數(shù)據(jù)不重復(fù)的值越多,越適合做hash.
2、  Segmentation by 的列不要用LONG VARBINARY and LONG VARCHAR columns.

根據(jù)以上規(guī)則建表,可以更大程度優(yōu)化你的查詢性能。

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容