Extracting Data from PDF to Excel - DEV365 Developer Community

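Imagine you are sitting in the office and someone hands you a stack of PDF forms full of names and numbers. Your next task is to collect all of that data and save it to an Excel spreadsheet. You could copy and paste the data into Excel by hand, but that is tedious and can take hours. Here I would like to recommend Spire.PDF for Java, which lets you extract data from PDF forms into Excel worksheets with just a few lines of code.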

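Spire.PDF for Java is a PDF API that enables Java applications to read, write, save, and print PDF documents without using Adobe Acrobat. With this Java PDF component, developers can use its rich feature set to create PDF files from scratch or to process existing ones. This article shows how to extract data from a PDF file and store it in an Excel worksheet in the following two ways: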

Convert PDF to Excel directly

Export table data from PDF to Excel

Install Spire.PDF for Java

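First, you need to add the spire.pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can import the JAR file by adding the following code to your project's pom.xml file.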

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>9.7.0</version>
    </dependency>
</dependencies>

Convert PDF to Excel

The steps to convert a PDF document to Excel are as follows:

Initialize an instance of the PdfDocument class.
Load the PDF document using the PdfDocument.loadFromFile(String) method.
Save the document to Excel using the PdfDocument.saveToFile(String, FileFormat) method.

import com.spire.pdf.FileFormat;
import com.spire.pdf.PdfDocument;

public class PdftoExcel {
    public static void main(String[] args) throws Exception {

        //Initialize an instance of PdfDocument class
        PdfDocument pdf = new PdfDocument();
        //Load the PDF document
        pdf.loadFromFile("Sample.pdf");

        //Save the PDF document to XLSX
        pdf.saveToFile("PdfToExcel.xlsx", FileFormat.XLSX);
    }
}
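
Since the scenario in the introduction involves many PDF forms rather than one, the conversion above would usually run over a whole folder. The following is a minimal sketch of that bookkeeping, using only the Java standard library. The class BatchConversionPlan and its methods are hypothetical names introduced for illustration and are not part of Spire.PDF; the actual conversion calls appear only as comments, copied from the example above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Spire.PDF): plans a batch PDF-to-XLSX run. */
class BatchConversionPlan {

    /** Returns the output path for a PDF: same folder, same base name, ".xlsx" extension. */
    static Path xlsxPathFor(Path pdf) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return pdf.resolveSibling(base + ".xlsx");
    }

    /** Lists the PDF files directly inside a folder (case-insensitive extension), sorted by path. */
    static List<Path> listPdfs(Path folder) throws IOException {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> entries = Files.list(folder)) {
            entries.filter(Files::isRegularFile)
                   .filter(p -> p.getFileName().toString()
                                 .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                   .sorted()
                   .forEach(result::add);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Folder to scan: first argument, or the current directory if none is given
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path pdf : listPdfs(folder)) {
            // With Spire.PDF on the classpath, the conversion shown earlier would go here:
            //   PdfDocument doc = new PdfDocument();
            //   doc.loadFromFile(pdf.toString());
            //   doc.saveToFile(xlsxPathFor(pdf).toString(), FileFormat.XLSX);
            System.out.println(pdf + " -> " + xlsxPathFor(pdf));
        }
    }
}
```

Keeping the path logic separate from the conversion call makes it easy to check which files would be produced before any document is actually converted.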

Export table data from PDF to Excel

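When you convert an entire PDF file to Excel, you may find that the table borders disappear and that the result contains extra data you do not need. To get a clean result that contains only the table contents, you can extract just the table data from a PDF page and export each table to its own Excel worksheet. The code uses the Workbook and Worksheet classes from the com.spire.xls package, so Spire.XLS for Java must also be on the classpath.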

import com.spire.pdf.PdfDocument;
import com.spire.pdf.utilities.PdfTable;
import com.spire.pdf.utilities.PdfTableExtractor;
import com.spire.xls.ExcelVersion;
import com.spire.xls.Workbook;
import com.spire.xls.Worksheet;

public class ExtractTableDataAndSaveInExcel {
    public static void main(String[] args) throws Exception {

        //Load a sample PDF document
        PdfDocument pdf = new PdfDocument("Sample1.pdf");

        //Create a PdfTableExtractor instance
        PdfTableExtractor extractor = new PdfTableExtractor(pdf);

        //Extract tables from the first page
        PdfTable[] pdfTables = extractor.extractTable(0);

        //Create a Workbook object
        Workbook wb = new Workbook();

        //Remove default worksheets
        wb.getWorksheets().clear();

        //If any tables are found
        if (pdfTables != null && pdfTables.length > 0) {

            //Loop through the tables
            for (int tableNum = 0; tableNum < pdfTables.length; tableNum++) {

                //Add a worksheet to workbook
                String sheetName = String.format("Table - %d", tableNum + 1);
                Worksheet sheet = wb.getWorksheets().add(sheetName);

                //Loop through the rows in the current table
                for (int rowNum = 0; rowNum < pdfTables[tableNum].getRowCount(); rowNum++) {

                    //Loop through the columns in the current table
                    for (int colNum = 0; colNum < pdfTables[tableNum].getColumnCount(); colNum++) {

                        //Extract data from the current table cell
                        String text = pdfTables[tableNum].getText(rowNum, colNum);

                        //Insert data into a specific cell
                        sheet.get(rowNum + 1, colNum + 1).setText(text);
                    }
                }

                //Auto fit column width
                for (int sheetColNum = 0; sheetColNum < sheet.getColumns().length; sheetColNum++) {
                    sheet.autoFitColumn(sheetColNum + 1);
                }
            }
        }

        //Save the workbook to an Excel file
        wb.saveToFile("ExportTableToExcel1.xlsx", ExcelVersion.Version2016);
    }
}
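
The example above writes every cell with setText, so numeric values arrive in Excel as text, and text extracted from a PDF cell may carry stray line breaks. The sketch below shows one possible way to tidy a cell string and to detect whether it holds a plain number before writing it. CellText and its methods are hypothetical helpers, not part of Spire.PDF or Spire.XLS, and the number format (comma as thousands separator, dot as decimal point) is an assumption that may not fit your documents. Which setter to use for numeric values should be confirmed in the Spire.XLS API reference.

```java
import java.math.BigDecimal;
import java.util.Optional;

/** Hypothetical helper (not part of Spire.PDF or Spire.XLS): cleans up extracted cell text. */
class CellText {

    /** Collapses line breaks and repeated whitespace that PDF extraction can leave in a cell. */
    static String normalize(String raw) {
        if (raw == null) {
            return "";
        }
        return raw.replaceAll("\\s+", " ").trim();
    }

    /**
     * Tries to read the cell as a plain decimal number such as "42", "-3.5" or "1,234.50".
     * Assumes "," is a thousands separator and "." is the decimal point.
     * Returns an empty Optional for anything else, including blank cells.
     */
    static Optional<BigDecimal> asNumber(String raw) {
        String text = normalize(raw);
        // Optional sign, then either grouped digits (1,234,567) or plain digits, then optional fraction
        if (!text.matches("[+-]?(\\d{1,3}(,\\d{3})+|\\d+)(\\.\\d+)?")) {
            return Optional.empty();
        }
        return Optional.of(new BigDecimal(text.replace(",", "")));
    }
}
```

Inside the innermost loop, the extracted text could be passed through CellText.normalize before it is written, and CellText.asNumber could decide whether the cell is stored as a number or as text. Note that identifiers with leading zeros, such as "007", would also be recognized as numbers, so columns of that kind are better left as text.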

Conclusion

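In this article, we demonstrated how to export data from PDF tables and store it in Excel using Java. With Spire.PDF for Java, you can also extract all text and images from a PDF file for other scenarios. You can check the PDF forum for more features for manipulating PDF files.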

Related topics:

Java Extract Text from a PDF Document