1、概要
大多數(shù)任務(wù)都能夠通過簡單的單進(jìn)程單線程任務(wù)處理好,但是還有一大部分現(xiàn)實訴求無法滿足。批量任務(wù)存在兩種并行模式
- 單進(jìn)程、多線程
- 多進(jìn)程
我們也可以細(xì)分為
- 多線程Step(單進(jìn)程) Multi-thread Step
- 并行Step(單進(jìn)程) Parallel Steps
- 對Step進(jìn)行遠(yuǎn)程分塊(多進(jìn)程)Remote Chunking of Step
- 對Step進(jìn)行分區(qū) Partitioning a Step
今天我們將通過兩個例子來解釋多線程和并行任務(wù)...目前還僅限于單進(jìn)程模式,后面會繼續(xù)通過示例的方式說明多線程模式
2、開啟并發(fā)并行之旅
項目依賴就不多說了,在之前的入門文章中已經(jīng)說明。但是我們還需要添加如下兩個依賴
<!-- https://mvnrepository.com/artifact/com.thoughtworks.xstream/xstream -->
<dependency>
<groupId>com.thoughtworks.xstream</groupId>
<artifactId>xstream</artifactId>
<version>1.4.12</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.springframework/spring-oxm -->
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-oxm</artifactId>
</dependency>
2.1 準(zhǔn)備腳本
create table TRANSACTION
(
ACCOUNT varchar(32) null,
AMOUNT decimal null,
TIMESTAMP datetime null
);
我們創(chuàng)建了一張表,用于儲存文件中的數(shù)據(jù)。
2.2、準(zhǔn)備CSV數(shù)據(jù)
5113971498870901,-546.68,2018-02-08 17:46:12
4041373995909987,-37.06,2018-02-02 21:10:33
3573694401052643,-784.93,2018-02-04 13:01:30
3543961469650122,925.44,2018-02-05 23:41:50
....
2.3、準(zhǔn)備XM文件
<transactions>
<transaction>
<account>633110684460535475</account>
<amount>961.93</amount>
<timestamp>2018-02-03 18:30:51</timestamp>
</transaction>
<transaction>
<account>3555221131716404</account>
<amount>759.62</amount>
<timestamp>2018-02-12 20:02:01</timestamp>
</transaction>
<transaction>
<account>30315923571992</account>
<amount>648.92</amount>
<timestamp>2018-02-12 23:16:45</timestamp>
</transaction>
......
</transactions>
2.4、多線程Step
最簡單開啟spring batch并發(fā)處理能力的辦法就是將TaskExecutor添加到Step的配置中,如下
@Configuration
public class MultiThreadJobConfiguration extends BaseJobConfiguration {
public FlatFileItemReader<Transaction> fileTransactionReader() {
Resource resource = new FileSystemResource("csv/bigtransactions.csv");
return new FlatFileItemReaderBuilder<Transaction>()
.saveState(false)
.resource(resource)
.delimited()
.names(new String[]{"account", "amount", "timestamp"})
.fieldSetMapper(fieldSet -> {
Transaction transaction = new Transaction();
transaction.setAccount(fieldSet.readString("account"));
transaction.setAmount(fieldSet.readBigDecimal("amount"));
transaction.setTimestamp(fieldSet.readDate("timestamp", "yyyy-MM-dd HH:mm:ss"));
return transaction;
})
.build();
}
@Bean
@StepScope
public JdbcBatchItemWriter<Transaction> writer(@Qualifier("dataSource") DataSource dataSource) {
return new JdbcBatchItemWriterBuilder<Transaction>()
.dataSource(dataSource)
.beanMapped()
.sql("INSERT INTO TRANSACTION (ACCOUNT, AMOUNT, TIMESTAMP) VALUES (:account, :amount, :timestamp)")
.build();
}
@Bean("multithreadedJob")
public Job multithreadedJob() {
return this.jobs.get("multithreadedJob")
.start(step1())
.build();
}
@Bean
public Step step1() {
ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
taskExecutor.setCorePoolSize(4);
taskExecutor.setMaxPoolSize(4);
taskExecutor.afterPropertiesSet();
return this.steps.get("multithreadedStep")
.<Transaction, Transaction>chunk(1000)
.reader(fileTransactionReader())
.writer(writer(null))
.taskExecutor(taskExecutor)
.build();
}
}
以上代碼說明,我們分了4個線程,read和writer按照每塊1000條數(shù)據(jù)執(zhí)行。使用我當(dāng)前的Intel? Core? i5-3210M CPU @ 2.50GHz × 4機(jī)器讀取60000萬條數(shù)據(jù)并且落地花費(fèi)時間1分半鐘。調(diào)整chunk大小,經(jīng)過測試也會發(fā)現(xiàn)對于性能也存在一定的影響,實際生產(chǎn)環(huán)境中使用需要調(diào)整優(yōu)化chunk大小。
2.5、并行Step
并行的代碼看起來稍微復(fù)雜一點(diǎn),個人理解并行任務(wù)和多線程并發(fā)任務(wù)沒有本質(zhì)區(qū)別,只是區(qū)別于不同的業(yè)務(wù)場景,并行任務(wù)區(qū)別于并發(fā)任務(wù)關(guān)鍵在于并行任務(wù)將一個大任務(wù)拆分為多個Flow,一個Flow可以串聯(lián)多個Flow,一個Flow可以包含多個Step.下面是一個例子,并行讀取兩個文件,一個csv文件,一個xml文件。
@Configuration
public class ParallelJobConfiguration extends BaseJobConfiguration {
@Bean
@StepScope
public FlatFileItemReader<Transaction> fileTransactionReader() {
Resource resource = new FileSystemResource("data/csv/bigtransactions.csv");
return new FlatFileItemReaderBuilder<Transaction>()
.saveState(false)
.resource(resource)
.delimited()
.names(new String[]{"account", "amount", "timestamp"})
.fieldSetMapper(fieldSet -> {
Transaction transaction = new Transaction();
transaction.setAccount(fieldSet.readString("account"));
transaction.setAmount(fieldSet.readBigDecimal("amount"));
transaction.setTimestamp(fieldSet.readDate("timestamp", "yyyy-MM-dd HH:mm:ss"));
return transaction;
})
.build();
}
@Bean
@StepScope
public StaxEventItemReader<Transaction> xmlTransactionReader() {
Resource resource = new FileSystemResource("data/xml/bigtransactions.xml");
Map<String, Class> map = new HashMap<>();
map.put("transaction", Transaction.class);
map.put("account", String.class);
map.put("amount", BigDecimal.class);
map.put("timestamp", Date.class);
XStreamMarshaller marshaller = new XStreamMarshaller();
marshaller.setAliases(map);
String[] formats = {"yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd"};
marshaller.setConverters(new DateConverter("yyyy-MM-dd HH:mm:ss", formats));
return new StaxEventItemReaderBuilder<Transaction>()
.name("xmlFileTransactionReader")
.resource(resource)
.addFragmentRootElements("transaction")
.unmarshaller(marshaller)
.build();
}
@Bean
@StepScope
public JdbcBatchItemWriter<Transaction> jdbcBatchItemWriter(@Qualifier("dataSource") DataSource dataSource) {
return new JdbcBatchItemWriterBuilder<Transaction>()
.dataSource(dataSource)
.beanMapped()
.sql("INSERT INTO TRANSACTION (ACCOUNT, AMOUNT, TIMESTAMP) VALUES (:account, :amount, :timestamp)")
.build();
}
@Bean("parallelJob")
public Job parallelStepsJob() {
return this.jobs.get("parallelJob")
.start(parallelFlow())
.end()
.build();
}
@Bean
public Flow parallelFlow() {
return new FlowBuilder<Flow>("parallelFlow")
.split(new SimpleAsyncTaskExecutor())
.add(flow1(), flow2())
.build();
}
@Bean
public Flow flow1() {
return new FlowBuilder<Flow>("flow1")
.start(step1())
.build();
}
@Bean
public Flow flow2() {
return new FlowBuilder<Flow>("flow2")
.start(step2())
.build();
}
@Bean("xmlStep")
public Step step1() {
return this.steps.get("xmlStep")
.<Transaction, Transaction>chunk(1000)
.reader(xmlTransactionReader())
.writer(jdbcBatchItemWriter(null))
.build();
}
@Bean("fileStep")
public Step step2() {
return this.steps.get("fileStep")
.<Transaction, Transaction>chunk(1000)
.reader(fileTransactionReader())
.writer(jdbcBatchItemWriter(null))
.build();
}
2.6、運(yùn)行任務(wù)
# 執(zhí)行多線程任務(wù)
curl http://localhost:8080/launchMultiThreadjob
# 執(zhí)行并行任務(wù)
curl http://localhost:8080/launchParallelJobjob
# 或者通過瀏覽器打開上面的地址