SchemaCompatibilityException: Unable to validate the rewritten record

Spark 3.2.3
Hudi 0.11.0

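A Spark job writing to Hudi failed to commit. The .hoodie directory contains the .commit.requested and .inflight files for the new instant, but no completed .commit file: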

-rw-r--r--@ 1 lqq  staff  1572  5 23 09:54 20230512145004274.rollback
-rw-r--r--@ 1 lqq  staff     0  5 23 09:54 20230512145004274.rollback.inflight
-rw-r--r--@ 1 lqq  staff  1384  5 23 09:54 20230512145004274.rollback.requested
-rw-r--r--@ 1 lqq  staff     0  5 23 09:54 20230522173618331.commit.requested
-rw-r--r--@ 1 lqq  staff  3123  5 23 09:54 20230522173618331.inflight
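Each write is recorded as an instant on the Hudi timeline under .hoodie: first a .requested file, then an inflight file, then a completed file. In the listing, instant 20230512145004274 reached its completed .rollback file, while 20230522173618331 stopped at .commit.requested and .inflight. A minimal sketch for spotting such unfinished instants from a directory listing, assuming the input contains only timeline file names and that the suffix alone decides the state (file contents are ignored; the object and function names are hypothetical):

```scala
// Minimal sketch: find instants in a .hoodie listing that have no completed file.
// Assumption: a name ending in ".requested" or ".inflight" is pending,
// any other timeline file name (e.g. "<ts>.rollback", "<ts>.commit") is completed.
object PendingInstants {

  // True when the file name marks a finished instant.
  def isCompleted(fileName: String): Boolean =
    !fileName.endsWith(".requested") && !fileName.endsWith(".inflight")

  // Instant time = everything before the first '.', e.g. "20230522173618331".
  def instantTime(fileName: String): String = fileName.takeWhile(_ != '.')

  // Instants that have timeline files, none of which is a completed file.
  def pendingInstants(fileNames: Seq[String]): Set[String] =
    fileNames
      .groupBy(instantTime)
      .collect { case (ts, files) if !files.exists(isCompleted) => ts }
      .toSet

  def main(args: Array[String]): Unit = {
    // File names copied from the .hoodie listing above.
    val listing = Seq(
      "20230512145004274.rollback",
      "20230512145004274.rollback.inflight",
      "20230512145004274.rollback.requested",
      "20230522173618331.commit.requested",
      "20230522173618331.inflight"
    )
    println(pendingInstants(listing)) // Set(20230522173618331)
  }
}
```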

The log contains an error entry, but it does not print the underlying cause:

 ERROR HoodieSparkSqlWriter$: UPSERT failed with errors

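Digging into the source of HoodieSparkSqlWriter shows that the detailed errors are only logged at TRACE level, in commitAndPerformPostOperations: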

  private def commitAndPerformPostOperations(spark: SparkSession,
                                             schema: StructType,
                                             writeResult: HoodieWriteResult,
                                             parameters: Map[String, String],
                                             client: SparkRDDWriteClient[HoodieRecordPayload[Nothing]],
                                             tableConfig: HoodieTableConfig,
                                             jsc: JavaSparkContext,
                                             tableInstantInfo: TableInstantInfo
                                            ): (Boolean, common.util.Option[java.lang.String], common.util.Option[java.lang.String]) = {

    // ... (elided: the commit path taken when there are no write errors)
    } else {
      log.error(s"${tableInstantInfo.operation} failed with errors")
      if (log.isTraceEnabled) {
        log.trace("Printing out the top 100 errors")
        writeResult.getWriteStatuses.rdd.filter(ws => ws.hasErrors)
          .take(100)
          .foreach(ws => {
            log.trace("Global error :", ws.getGlobalError)
            if (ws.getErrors.size() > 0) {
              ws.getErrors.foreach(kt =>
                log.trace(s"Error for key: ${kt._1}", kt._2))
            }
          })
      }
      (false, common.util.Option.empty(), common.util.Option.empty())
    }

With TRACE enabled, this branch starts its output with:

23/05/25 11:57:45 TRACE HoodieSparkSqlWriter$: Printing out the top 100 errors
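The branch above only inspects write statuses that report errors, caps them at 100, and then prints one line per failed record key, all at TRACE, so at the default log level the failure looks silent. A simplified model of that logic, returning the messages as strings instead of logging them; Status is a hypothetical stand-in for Hudi's WriteStatus (not its real API), record keys are plain strings, and unlike the original the global error is only reported when present:

```scala
// Hypothetical, simplified stand-in for Hudi's WriteStatus (not the real class):
// an optional global error plus per-record-key errors.
final case class Status(globalError: Option[Throwable], errors: Map[String, Throwable]) {
  def hasErrors: Boolean = globalError.isDefined || errors.nonEmpty
}

object ErrorSummary {
  // Mirrors the TRACE branch: keep failed statuses, take at most `limit`,
  // then one message for the global error (if any) and one per failed key.
  def summarize(statuses: Seq[Status], limit: Int = 100): Seq[String] =
    statuses
      .filter(_.hasErrors)
      .take(limit)
      .flatMap { s =>
        val global = s.globalError.map(e => s"Global error: ${e.getMessage}").toList
        val perKey = s.errors.toSeq.map { case (key, e) => s"Error for key: $key: ${e.getMessage}" }
        global ++ perKey
      }

  def main(args: Array[String]): Unit = {
    // Hypothetical input: one clean status and one with a per-key error.
    val statuses = Seq(
      Status(None, Map.empty),
      Status(None, Map("key-1" -> new RuntimeException("schema mismatch")))
    )
    summarize(statuses).foreach(println)
  }
}
```

Returning strings keeps the reporting level a caller's decision rather than hard-wiring TRACE.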

After lowering the log level to TRACE (see http://www.itdecent.cn/u/c2bc3695bc47) and rerunning the job, the detailed error is printed:

org.apache.hudi.exception.SchemaCompatibilityException: Unable to validate the rewritten record {"gender": "male",  "id": 708075384135690,  "count": null} against schema {{"name":"id","type":["null","long"],{"name":"gender","type":["null","string"],"default":null},{"name":"count","type":["null","int"],"default":null}}
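Avro's schema resolution rules allow a value written as int to be read as long, float or double, but not the other way round, so long data cannot be narrowed into an int field. A small lookup encoding the primitive promotions listed in the Avro specification; treating Hudi's check as equivalent to these rules is an assumption, and canPromote is a hypothetical name, but it matches the diagnosis below (the table's count field is int, the new batch carries long):

```scala
// Sketch of Avro's primitive type promotion rules (writer type -> reader type),
// as listed in the Avro specification's schema resolution section.
// Assumption: Hudi's compatibility check behaves like these rules for primitive fields.
object AvroPromotion {

  private val promotions: Map[String, Set[String]] = Map(
    "int"    -> Set("long", "float", "double"),
    "long"   -> Set("float", "double"),
    "float"  -> Set("double"),
    "string" -> Set("bytes"),
    "bytes"  -> Set("string")
  )

  // Data written as `writer` can be read as `reader` if the types are equal
  // or Avro allows promoting writer to reader.
  def canPromote(writer: String, reader: String): Boolean =
    writer == reader || promotions.getOrElse(writer, Set.empty[String]).contains(reader)

  def main(args: Array[String]): Unit = {
    println(canPromote("int", "long")) // true: widening is allowed
    println(canPromote("long", "int")) // false: long cannot be narrowed to int
  }
}
```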

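Cause: the schemas are incompatible. The count field was previously written to the Hudi table as int, but the new batch declares it as long, so the write fails.
Fix: change the field back to int, or delete the Hudi table and write it again from scratch.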
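If you take the first option and cast count back to int, any value outside the Int range will not fit, and a plain Scala narrowing such as Long.toInt silently wraps around. A minimal sketch of checking the range first, in plain Scala over a hypothetical Seq[Option[Long]] of count values (None standing for null); in the actual Spark job the equivalent step would be casting the column before the write, which is not shown here:

```scala
// Sketch: narrow a nullable Long column back to Int without silent overflow.
// `count` values are modeled as Option[Long] (None = null); names are hypothetical.
object NarrowCount {

  // Right with the Int values, or Left describing the first value out of Int range.
  def toIntChecked(values: Seq[Option[Long]]): Either[String, Seq[Option[Int]]] =
    values.flatten.find(v => v < Int.MinValue || v > Int.MaxValue) match {
      case Some(bad) => Left(s"count value $bad does not fit in int")
      case None      => Right(values.map(_.map(_.toInt))) // safe: every value is in Int range
    }

  def main(args: Array[String]): Unit = {
    println(toIntChecked(Seq(Some(42L), None)))    // Right(List(Some(42), None))
    println(toIntChecked(Seq(Some(3000000000L)))) // Left(count value 3000000000 does not fit in int)
  }
}
```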
