Alpine container
Aliyun mirror repository version:
https://mirrors.aliyun.com/alpine/v3.16/main/
https://mirrors.aliyun.com/alpine/v3.16/community/
Required Docker version on the host:
docker version 20.10.8
Install the base environment with apk:
apk add python3 python3-dev py3-pip gcc g++ make --allow-untrusted
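Python version installed: Python 3.10
Note: installing PyMuPDF is not recommended. It is very slow to install, fails often, and has many pitfalls that are hard to work around. pdf2image is a workable replacement.
Introduction to pdf2image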
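As a rough idea of what pdf2image looks like in use, here is a minimal sketch that renders each page of a PDF to a PNG. The function names, the output naming scheme, and the dpi value are assumptions made for illustration; only convert_from_path comes from pdf2image, and it needs the poppler tools installed in the image.

```python
from pathlib import Path


def page_output_path(pdf_path, out_dir, page_number):
    """Build the PNG path for one page, e.g. out/report-003.png (hypothetical naming)."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}-{page_number:03d}.png"


def pdf_to_images(pdf_path, out_dir, dpi=150):
    """Render every page of a PDF to a PNG file and return the written paths."""
    # Imported lazily so the path helper above works without pdf2image installed.
    from pdf2image import convert_from_path

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # convert_from_path calls poppler under the hood and returns a list of PIL images.
    pages = convert_from_path(str(pdf_path), dpi=dpi)
    written = []
    for number, image in enumerate(pages, start=1):
        target = page_output_path(pdf_path, out_dir, number)
        image.save(str(target), "PNG")
        written.append(target)
    return written
```

Calling pdf_to_images("test.pdf", "out") would write one PNG per page into the out directory, assuming poppler is on the PATH.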
Without further ado, here is the Dockerfile:
FROM <base image address>
RUN echo 'https://mirrors.aliyun.com/alpine/v3.16/main/' > /etc/apk/repositories
RUN echo 'https://mirrors.aliyun.com/alpine/v3.16/community/' >> /etc/apk/repositories
RUN cat /etc/apk/repositories
RUN apk update --allow-untrusted
# Python 3 base environment
RUN apk add python3 python3-dev py3-pip gcc g++ make --allow-untrusted
# PDF to image conversion
RUN apk add poppler poppler-utils --allow-untrusted
RUN pip3 install poppler-utils
RUN pip3 install pdf2image
# PDF text extraction
RUN pip3 install PyPDF2
RUN pip3 install python-pptx
#libreoffice
RUN apk add libreoffice openjdk13-jre-headless freetype freetype-dev --allow-untrusted
RUN mkdir -p /usr/share/fonts
# Font file; without it, text in the documents fails to render
COPY docker/msyh.ttf /usr/share/fonts
RUN chmod 777 /usr/share/fonts
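The Dockerfile above installs PyPDF2 for text extraction. A minimal sketch of how that might be used inside the container follows. It assumes PyPDF2 2.x or later (where PdfReader and extract_text are available); the function names and the whitespace cleanup step are illustrative choices, not part of the original setup.

```python
def clean_text(raw):
    """Collapse the runs of whitespace that PDF extraction tends to produce."""
    return " ".join((raw or "").split())


def extract_pdf_text(pdf_path):
    """Return one cleaned string per page of the PDF."""
    # Imported lazily; assumes PyPDF2 2.x or later, which provides PdfReader.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text may yield an empty result for scanned pages; clean_text handles that.
    return [clean_text(page.extract_text()) for page in reader.pages]
```

For scanned PDFs that contain only images, this returns empty strings, which is one case where converting pages to images with pdf2image is the more useful route.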
LibreOffice command to convert pptx to pdf:
soffice --headless --convert-to pdf ./test.pptx --outdir ./
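If the conversion needs to be triggered from Python rather than from a shell, the same command can be wrapped with subprocess. This is a sketch only: the function names and the timeout value are assumptions, and the returned path assumes LibreOffice names the output after the source file's stem.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir="."):
    """Build the same soffice invocation as the shell command above."""
    return [
        "soffice", "--headless",
        "--convert-to", "pdf",
        str(source),
        "--outdir", str(out_dir),
    ]


def pptx_to_pdf(source, out_dir=".", timeout=120):
    """Run the conversion and return the expected path of the PDF."""
    # check=True raises CalledProcessError if soffice exits with a non-zero status.
    subprocess.run(build_convert_command(source, out_dir), check=True, timeout=timeout)
    # Assumption: the output keeps the source stem, e.g. test.pptx -> test.pdf.
    return Path(out_dir) / (Path(source).stem + ".pdf")
```

The timeout is there because a headless soffice process can hang on a malformed file; adjust it to the size of the documents being converted.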