SpringBatch系列之并發(fā)并行能力

1、概要

大多數(shù)任務(wù)都能夠通過簡單的單進(jìn)程單線程任務(wù)處理好,但是還有一大部分現(xiàn)實訴求無法滿足。批量任務(wù)存在兩種并行模式

  • 單進(jìn)程、多線程
  • 多進(jìn)程

我們也可以細(xì)分為

  • 多線程Step(單進(jìn)程) Multi-thread Step
  • 并行Step(單進(jìn)程) Parallel Steps
  • 對Step進(jìn)行遠(yuǎn)程分塊(多進(jìn)程)Remote Chunking of Step
  • 對Step進(jìn)行分區(qū) Partitioning a Step

今天我們將通過兩個例子來解釋多線程和并行任務(wù)...目前還僅限于單進(jìn)程模式,后面會繼續(xù)通過示例的方式說明多線程模式

2、開啟并發(fā)并行之旅

項目依賴就不多說了,在之前的入門文章中已經(jīng)說明。但是我們還需要添加如下兩個依賴

<!-- https://mvnrepository.com/artifact/com.thoughtworks.xstream/xstream -->
<dependency>
      <groupId>com.thoughtworks.xstream</groupId>
      <artifactId>xstream</artifactId>
      <version>1.4.12</version>
</dependency>

        <!-- https://mvnrepository.com/artifact/org.springframework/spring-oxm -->
<dependency>
       <groupId>org.springframework</groupId>
        <artifactId>spring-oxm</artifactId>
</dependency>

2.1 準(zhǔn)備腳本

create table TRANSACTION
(
    ACCOUNT   varchar(32) null,
    AMOUNT    decimal     null,
    TIMESTAMP datetime    null
);

我們創(chuàng)建了一張表,用于儲存文件中的數(shù)據(jù)。

2.2、準(zhǔn)備CSV數(shù)據(jù)

5113971498870901,-546.68,2018-02-08 17:46:12
4041373995909987,-37.06,2018-02-02 21:10:33
3573694401052643,-784.93,2018-02-04 13:01:30
3543961469650122,925.44,2018-02-05 23:41:50
....

2.3、準(zhǔn)備XM文件

<transactions>
    <transaction>
        <account>633110684460535475</account>
        <amount>961.93</amount>
        <timestamp>2018-02-03 18:30:51</timestamp>
    </transaction>
    <transaction>
        <account>3555221131716404</account>
        <amount>759.62</amount>
        <timestamp>2018-02-12 20:02:01</timestamp>
    </transaction>
    <transaction>
        <account>30315923571992</account>
        <amount>648.92</amount>
        <timestamp>2018-02-12 23:16:45</timestamp>
    </transaction>
    ......
</transactions>

2.4、多線程Step

最簡單開啟spring batch并發(fā)處理能力的辦法就是將TaskExecutor添加到Step的配置中,如下

@Configuration
public class MultiThreadJobConfiguration extends BaseJobConfiguration {

    public FlatFileItemReader<Transaction> fileTransactionReader() {
        Resource resource = new FileSystemResource("csv/bigtransactions.csv");
        return new FlatFileItemReaderBuilder<Transaction>()
                .saveState(false)
                .resource(resource)
                .delimited()
                .names(new String[]{"account", "amount", "timestamp"})
                .fieldSetMapper(fieldSet -> {
                    Transaction transaction = new Transaction();
                    transaction.setAccount(fieldSet.readString("account"));
                    transaction.setAmount(fieldSet.readBigDecimal("amount"));
                    transaction.setTimestamp(fieldSet.readDate("timestamp", "yyyy-MM-dd HH:mm:ss"));
                    return transaction;
                })
                .build();
    }

    @Bean
    @StepScope
    public JdbcBatchItemWriter<Transaction> writer(@Qualifier("dataSource") DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Transaction>()
                .dataSource(dataSource)
                .beanMapped()
                .sql("INSERT INTO TRANSACTION (ACCOUNT, AMOUNT, TIMESTAMP) VALUES (:account, :amount, :timestamp)")
                .build();
    }

    @Bean("multithreadedJob")
    public Job multithreadedJob() {
        return this.jobs.get("multithreadedJob")
                .start(step1())
                .build();
    }

    @Bean
    public Step step1() {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.setCorePoolSize(4);
        taskExecutor.setMaxPoolSize(4);
        taskExecutor.afterPropertiesSet();

        return this.steps.get("multithreadedStep")
                .<Transaction, Transaction>chunk(1000)
                .reader(fileTransactionReader())
                .writer(writer(null))
                .taskExecutor(taskExecutor)
                .build();
    }
}

以上代碼說明,我們分了4個線程,read和writer按照每塊1000條數(shù)據(jù)執(zhí)行。使用我當(dāng)前的Intel? Core? i5-3210M CPU @ 2.50GHz × 4機(jī)器讀取60000萬條數(shù)據(jù)并且落地花費(fèi)時間1分半鐘。調(diào)整chunk大小,經(jīng)過測試也會發(fā)現(xiàn)對于性能也存在一定的影響,實際生產(chǎn)環(huán)境中使用需要調(diào)整優(yōu)化chunk大小。

2.5、并行Step

并行的代碼看起來稍微復(fù)雜一點(diǎn),個人理解并行任務(wù)和多線程并發(fā)任務(wù)沒有本質(zhì)區(qū)別,只是區(qū)別于不同的業(yè)務(wù)場景,并行任務(wù)區(qū)別于并發(fā)任務(wù)關(guān)鍵在于并行任務(wù)將一個大任務(wù)拆分為多個Flow,一個Flow可以串聯(lián)多個Flow,一個Flow可以包含多個Step.下面是一個例子,并行讀取兩個文件,一個csv文件,一個xml文件。

@Configuration
public class ParallelJobConfiguration extends BaseJobConfiguration {

    @Bean
    @StepScope
    public FlatFileItemReader<Transaction> fileTransactionReader() {
        Resource resource = new FileSystemResource("data/csv/bigtransactions.csv");
        return new FlatFileItemReaderBuilder<Transaction>()
                .saveState(false)
                .resource(resource)
                .delimited()
                .names(new String[]{"account", "amount", "timestamp"})
                .fieldSetMapper(fieldSet -> {
                    Transaction transaction = new Transaction();
                    transaction.setAccount(fieldSet.readString("account"));
                    transaction.setAmount(fieldSet.readBigDecimal("amount"));
                    transaction.setTimestamp(fieldSet.readDate("timestamp", "yyyy-MM-dd HH:mm:ss"));
                    return transaction;
                })
                .build();
    }

    @Bean
    @StepScope
    public StaxEventItemReader<Transaction> xmlTransactionReader() {
        Resource resource = new FileSystemResource("data/xml/bigtransactions.xml");
        Map<String, Class> map = new HashMap<>();
        map.put("transaction", Transaction.class);
        map.put("account", String.class);
        map.put("amount", BigDecimal.class);
        map.put("timestamp", Date.class);
        XStreamMarshaller marshaller = new XStreamMarshaller();
        marshaller.setAliases(map);
        String[] formats = {"yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd"};
        marshaller.setConverters(new DateConverter("yyyy-MM-dd HH:mm:ss", formats));

        return new StaxEventItemReaderBuilder<Transaction>()
                .name("xmlFileTransactionReader")
                .resource(resource)
                .addFragmentRootElements("transaction")
                .unmarshaller(marshaller)
                .build();
    }

    @Bean
    @StepScope
    public JdbcBatchItemWriter<Transaction> jdbcBatchItemWriter(@Qualifier("dataSource") DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Transaction>()
                .dataSource(dataSource)
                .beanMapped()
                .sql("INSERT INTO TRANSACTION (ACCOUNT, AMOUNT, TIMESTAMP) VALUES (:account, :amount, :timestamp)")
                .build();
    }


    @Bean("parallelJob")
    public Job parallelStepsJob() {

        return this.jobs.get("parallelJob")
                .start(parallelFlow())
                .end()
                .build();
    }

    @Bean
    public Flow parallelFlow() {
        return new FlowBuilder<Flow>("parallelFlow")
                .split(new SimpleAsyncTaskExecutor())
                .add(flow1(), flow2())
                .build();
    }

    @Bean
    public Flow flow1() {
        return new FlowBuilder<Flow>("flow1")
                .start(step1())
                .build();
    }

    @Bean
    public Flow flow2() {
        return new FlowBuilder<Flow>("flow2")
                .start(step2())
                .build();
    }

    @Bean("xmlStep")
    public Step step1() {
        return this.steps.get("xmlStep")
                .<Transaction, Transaction>chunk(1000)
                .reader(xmlTransactionReader())
                .writer(jdbcBatchItemWriter(null))
                .build();
    }

    @Bean("fileStep")
    public Step step2() {
        return this.steps.get("fileStep")
                .<Transaction, Transaction>chunk(1000)
                .reader(fileTransactionReader())
                .writer(jdbcBatchItemWriter(null))
                .build();
    }

2.6、運(yùn)行任務(wù)

# 執(zhí)行多線程任務(wù)
curl http://localhost:8080/launchMultiThreadjob

# 執(zhí)行并行任務(wù)
curl http://localhost:8080/launchParallelJobjob

# 或者通過瀏覽器打開上面的地址

3、參考文檔

4、源碼

https://github.com/cattles/fucking-great-springbatch

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容