Create a new Java project with Maven
Let's start by creating a new Java project named pdf_utils with Maven:
mvn archetype:generate \
    -DgroupId=com.pdf.pdf_utils \
    -DartifactId=pdf_utils \
    -DarchetypeArtifactId=maven-archetype-quickstart \
    -DarchetypeVersion=1.4 \
    -DinteractiveMode=false
Then open pdf_utils/pom.xml and add the following dependency to the dependencies section:
<dependencies>
    ...
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.27</version>
    </dependency>
    ...
</dependencies>
Also change the compiler's source and target versions:
<plugins>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
            <source>17</source>
            <target>17</target>
        </configuration>
    </plugin>
</plugins>
Then rename the generated class src/main/java/com/pdf/pdf_utils/App.java to PDFUtils.
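How to crop a PDF
Now we'll write a Java method that crops every page of a PDF document and saves the cropped result to a new PDF file. The crop coordinates, given in millimetres, are the method's input.
- Add helper functions to convert between mm <-> units
- Convert mm to units
- Extract the document's file name
- Crop each page
- Save the result and append "-cropped" to the output file's name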
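The unit conversion in the first two steps can be checked on its own. The sketch below assumes that a PDF "unit" is one point, i.e. 1/72 inch, which gives 25.4 / 72 ≈ 0.352778 mm per unit. The class UnitConversionSketch and its method names are hypothetical and are not part of PDFBox.

```java
// Hypothetical helper, not part of PDFBox: converts between millimetres and PDF points.
final class UnitConversionSketch {
    // Assumption: one PDF unit is one point (1/72 inch), and one inch is 25.4 mm.
    static final float MM_PER_POINT = 25.4f / 72f;

    // Millimetres to points
    static float mmToPoints(float mm) {
        return mm / MM_PER_POINT;
    }

    // Points to millimetres
    static float pointsToMm(float points) {
        return points * MM_PER_POINT;
    }

    public static void main(String[] args) {
        // An A4 page is 210 mm wide, which is roughly 595.28 points
        System.out.println(mmToPoints(210f));
    }
}
```

Deriving the factor from 25.4 / 72 instead of hard-coding 0.352778 keeps the origin of the number visible; the two agree to the precision used here.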
public String cropPDFMM(float x, float y, float width, float height, String srcFilePath) throws IOException {
    // helper functions to convert between mm <-> units (1 unit = 1/72 inch = 0.352778 mm)
    Function<Float, Float> mmToUnits = (Float a) -> a / 0.352778f;
    Function<Float, Float> unitsToMm = (Float a) -> a * 0.352778f;
    // convert mm to units
    float xUnits = mmToUnits.apply(x);
    float yUnits = mmToUnits.apply(y);
    float widthUnits = mmToUnits.apply(width);
    float heightUnits = mmToUnits.apply(height);
    // extract the doc's file name
    File srcFile = new File(srcFilePath);
    String fileName = srcFile.getName();
    int dotIndex = fileName.lastIndexOf('.');
    String fileNameWithoutExtension = (dotIndex == -1) ? fileName : fileName.substring(0, dotIndex);
    // crop each page; try-with-resources closes the document even if saving fails
    try (PDDocument doc = PDDocument.load(srcFile)) {
        int nrOfPages = doc.getNumberOfPages();
        PDRectangle newBox = new PDRectangle(
            xUnits,
            yUnits,
            widthUnits,
            heightUnits);
        for (int i = 0; i < nrOfPages; i++) {
            doc.getPage(i).setCropBox(newBox);
        }
        // save the result & append -cropped to the file name
        File outFile = new File(fileNameWithoutExtension + "-cropped.pdf");
        doc.save(outFile);
        return outFile.getCanonicalPath();
    }
}
Let's test cropPDFMM by calling it from the main function:
public static void main( String[] args )
{
    String srcFilePath = "/Users/user/.../file.pdf";
    PDFUtils app = new PDFUtils();
    try {
        ///// crop pdf
        float x = 18f;
        float y = 20f;
        float width = 140f;
        float height = 210f;
        String resultFilePath = app.cropPDFMM(x, y, width, height, srcFilePath);
        System.out.println( "Done!" );
    } catch (Exception e) {
        System.out.println(e);
    }
}
You should see a file named file-cropped.pdf in the current directory.
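The output lands in the current directory because only the bare file name of the source path is reused. A minimal sketch of that naming step follows; the class OutputNameSketch and the method croppedName are hypothetical names introduced for illustration.

```java
import java.io.File;

// Hypothetical sketch of how the output file name is derived from the source path.
final class OutputNameSketch {
    // Drops the directory part and the extension, then appends "-cropped.pdf".
    static String croppedName(String srcFilePath) {
        String fileName = new File(srcFilePath).getName();
        int dotIndex = fileName.lastIndexOf('.');
        String base = (dotIndex == -1) ? fileName : fileName.substring(0, dotIndex);
        return base + "-cropped.pdf";
    }

    public static void main(String[] args) {
        // The directory is discarded, so the result is a relative file name
        System.out.println(croppedName("/Users/user/pdfs/file.pdf"));
    }
}
```

Because the returned name is relative, new File(...) resolves it against the directory the JVM was started from, not against the source file's directory.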
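How to remove pages from a PDF
To remove specific pages from a document, we can pass a flat array of integer ranges. Each range consists of a start page and an end page ([startPage1, endPage1, startPage2, endPage2, ...]); page numbers are 1-based and both ends are inclusive. The function iterates over every page of the document and checks whether its page number falls inside any of the given ranges. If the page is not in any range, it is appended to a new document.
- Add a helper function to test whether a page is within a range
- Test each page number -> append the page to a temporary document
- Save the temporary document -> overwrite the input file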
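The range test can be tried without any PDF at hand. The sketch below assumes 1-based page numbers and inclusive range ends; the class PageRangeSketch and the method isInAnyRange are hypothetical names, and plain int arrays stand in for the document.

```java
// Hypothetical sketch: decides whether a 1-based page number falls inside a flat array of ranges.
final class PageRangeSketch {
    // ranges is laid out as {start1, end1, start2, end2, ...}; both ends are inclusive.
    // A trailing unpaired value is ignored.
    static boolean isInAnyRange(int pageNumber, int[] ranges) {
        for (int j = 0; j + 1 < ranges.length; j += 2) {
            if (pageNumber >= ranges[j] && pageNumber <= ranges[j + 1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] ranges = {1, 2, 5, 6};
        // For a 7-page document, pages 3, 4 and 7 are kept
        for (int page = 1; page <= 7; page++) {
            System.out.println(page + (isInAnyRange(page, ranges) ? " removed" : " kept"));
        }
    }
}
```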
public void removePages(String srcFilePath, Integer[] pageRanges) throws IOException {
    // a helper function to test if a page is within a range
    BiPredicate<Integer, Integer[]> pageInInterval = (Integer page, Integer[] allPages) -> {
        for (int j = 0; j < allPages.length; j+=2) {
            int startPage = allPages[j];
            int endPage = allPages[j+1];
            if (page >= startPage-1 && page < endPage) {
                return true;
            }
        }
        return false;
    };
    File srcFile = new File(srcFilePath);
    PDDocument pdfDocument = PDDocument.load(srcFile);
    PDDocument tmpDoc = new PDDocument();
    // test if a page is within a range
    // if not, append the page to a temp. doc.
    for (int i = 0; i < pdfDocument.getNumberOfPages(); i++) {
        if (pageInInterval.test(i, pageRanges)) {
            continue;
        }
        tmpDoc.addPage(pdfDocument.getPage(i));
    }
    // save the temporary doc.
    tmpDoc.save(new File(srcFilePath));
    tmpDoc.close();
    pdfDocument.close();
}
Let's test it by calling removePages from the main function:
///// remove pages
app.removePages(resultFilePath, new Integer[] {1, 21, 376, 428});
This overwrites the input (cropped) file.
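How to split a PDF
Now we'll introduce a function that splits a PDF into several separate PDFs, each containing a specified number of pages. The function expects two inputs: the path to the source PDF document and the number of pages wanted in each split file.
- Extract the source file's name
- For every nrOfPages pages
- Append them to a temporary document
- Save the temporary document under the source file's name + an index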
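The page arithmetic behind "every nrOfPages pages" is easy to get off by one, so here is a sketch of just that part. It assumes 1-based, inclusive page numbers and a shorter final chunk when the page count is not a multiple of the chunk size. The class SplitBoundsSketch and the method chunkBounds are hypothetical names.

```java
// Hypothetical sketch: computes the first and last page of every chunk when splitting a document.
final class SplitBoundsSketch {
    // Returns {firstPage, lastPage} pairs, 1-based and inclusive, each covering at most pagesPerFile pages.
    static int[][] chunkBounds(int totalPages, int pagesPerFile) {
        if (totalPages < 0 || pagesPerFile <= 0) {
            throw new IllegalArgumentException("totalPages must be >= 0 and pagesPerFile must be > 0");
        }
        int chunks = (totalPages + pagesPerFile - 1) / pagesPerFile; // ceiling division
        int[][] bounds = new int[chunks][2];
        for (int c = 0; c < chunks; c++) {
            int first = c * pagesPerFile + 1;
            bounds[c][0] = first;
            // The last chunk may be shorter than pagesPerFile
            bounds[c][1] = Math.min(first + pagesPerFile - 1, totalPages);
        }
        return bounds;
    }

    public static void main(String[] args) {
        // 45 pages in chunks of 20 gives 1-20, 21-40 and 41-45
        for (int[] b : chunkBounds(45, 20)) {
            System.out.println(b[0] + "-" + b[1]);
        }
    }
}
```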
public void splitPDF(String srcFilePath, int nrOfPages) throws IOException {
    // extract file's name
    File srcFile = new File(srcFilePath);
    String fileName = srcFile.getName();
    int dotIndex = fileName.lastIndexOf('.');
    String fileNameWithoutExtension = (dotIndex == -1) ? fileName : fileName.substring(0, dotIndex);
    PDDocument pdfDocument = PDDocument.load(srcFile);
    int totalPages = pdfDocument.getNumberOfPages();
    // extract every nrOfPages to a temporary document
    // append an index to its name and save it
    // page numbers are 1-based; <= keeps a final chunk that starts on the last page
    for (int i = 1; i <= totalPages; i+=nrOfPages) {
        Splitter splitter = new Splitter();
        int fromPage = i;
        // the end page is inclusive, so a chunk of nrOfPages pages ends at i + nrOfPages - 1
        int toPage = Math.min(i + nrOfPages - 1, totalPages);
        splitter.setStartPage(fromPage);
        splitter.setEndPage(toPage);
        splitter.setSplitAtPage(nrOfPages);
        List<PDDocument> lst = splitter.split(pdfDocument);
        PDDocument pdfDocPartial = lst.get(0);
        File f = new File(fileNameWithoutExtension + "-" + i + ".pdf");
        pdfDocPartial.save(f);
        pdfDocPartial.close();
    }
    pdfDocument.close();
}
Here is the complete main() function:
public static void main( String[] args ){
    String srcFilePath = "/Users/user/pdfs/file.pdf";
    PDFUtils app = new PDFUtils();
    try {
        ///// crop pdf
        float x = 18f;
        float y = 20f;
        float width = 140f;
        float height = 210f;
        String resultFilePath = app.cropPDFMM(x, y, width, height, srcFilePath);
        ///// remove pages
        app.removePages(resultFilePath, new Integer[] {1, 21, 376, 428});
        ///// split pages
        app.splitPDF(resultFilePath, 20);
        System.out.println( "Done!" );
    } catch (Exception e) {
        System.out.println(e);
    }
}
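Merge 2 pages per sheet (2-up)
- Create a temporary document
- Iterate over the pages of the original document -> get the left and right pages
- Create a new "output" page with the correct dimensions
- Append the left page at (0, 0)
- Append the right page, translated to (width of the left page, 0)
- Save the temporary document -> overwrite the source file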
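The geometry of one output sheet can be worked out separately from PDFBox. The sketch below assumes both pages sit on a common baseline, so the sheet is as wide as both pages together and as tall as the taller one. The class TwoUpLayoutSketch and its methods are hypothetical names; only java.awt.geom.AffineTransform from the JDK is used.

```java
import java.awt.geom.AffineTransform;

// Hypothetical sketch: computes the size of a 2-up sheet and the placement of the right-hand page.
final class TwoUpLayoutSketch {
    // Returns {width, height} of the output sheet: widths add up, the taller page sets the height.
    static float[] sheetSize(float leftWidth, float leftHeight, float rightWidth, float rightHeight) {
        return new float[] {leftWidth + rightWidth, Math.max(leftHeight, rightHeight)};
    }

    // The left page stays at the origin; the right page is shifted by the left page's width.
    static AffineTransform rightPageTransform(float leftWidth) {
        return AffineTransform.getTranslateInstance(leftWidth, 0.0);
    }

    public static void main(String[] args) {
        // Two A5 portrait pages (about 419.53 x 595.28 points each)
        float[] size = sheetSize(419.53f, 595.28f, 419.53f, 595.28f);
        AffineTransform right = rightPageTransform(419.53f);
        System.out.println("sheet: " + size[0] + " x " + size[1]);
        System.out.println("right page offset: " + right.getTranslateX() + ", " + right.getTranslateY());
    }
}
```

Two A5 portrait pages placed this way fill roughly one A4 landscape sheet.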
public void mergePages(String srcFilePath) throws IOException {
    // SOURCE: https://stackoverflow.com/questions/12093408/pdfbox-merge-2-portrait-pages-onto-a-single-side-by-side-landscape-page
    File srcFile = new File(srcFilePath);
    PDDocument pdfDocument = PDDocument.load(srcFile);
    PDDocument outPdf = new PDDocument();
    int pageCount = pdfDocument.getNumberOfPages();
    for (int i = 0; i < pageCount; i+=2) {
        PDPage page1 = pdfDocument.getPage(i);
        // With an odd page count the last sheet has no right page;
        // the left page's size is reused so the sheet keeps its width and the right half stays empty
        PDPage page2 = (i+1 < pageCount) ? pdfDocument.getPage(i+1) : null;
        PDRectangle pdf1Frame = page1.getCropBox();
        PDRectangle pdf2Frame = (page2 != null) ? page2.getCropBox() : pdf1Frame;
        PDRectangle outPdfFrame = new PDRectangle(pdf1Frame.getWidth()+pdf2Frame.getWidth(), Math.max(pdf1Frame.getHeight(), pdf2Frame.getHeight()));
        // Create output page with calculated frame and add it to the document
        COSDictionary dict = new COSDictionary();
        dict.setItem(COSName.TYPE, COSName.PAGE);
        dict.setItem(COSName.MEDIA_BOX, outPdfFrame);
        dict.setItem(COSName.CROP_BOX, outPdfFrame);
        dict.setItem(COSName.ART_BOX, outPdfFrame);
        PDPage newP = new PDPage(dict);
        outPdf.addPage(newP);
        // Source PDF pages have to be imported as form XObjects to be able to insert them at a specific point in the output page
        LayerUtility layerUtility = new LayerUtility(outPdf);
        PDFormXObject formPdf1 = layerUtility.importPageAsForm(pdfDocument, page1);
        AffineTransform afLeft = new AffineTransform();
        layerUtility.appendFormAsLayer(newP, formPdf1, afLeft, "left" + i);
        if (page2 != null) {
            PDFormXObject formPdf2 = layerUtility.importPageAsForm(pdfDocument, page2);
            AffineTransform afRight = AffineTransform.getTranslateInstance(pdf1Frame.getWidth(), 0.0);
            layerUtility.appendFormAsLayer(newP, formPdf2, afRight, "right" + i);
        }
    }
    outPdf.save(srcFile);
    outPdf.close();
    pdfDocument.close();
}
Update main() to test it:
...
///// 2 pages per sheet
app.mergePages(resultFilePath);
...
Here is the complete list of imports:
package com.pdf.pdf_utils;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.multipdf.LayerUtility;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import java.awt.geom.AffineTransform;
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Function;
The source code is available here.