Reading invoices and receipts in a more structured format with Google Vision
#php #google #ocr #vision

I was recently tasked with extracting text from invoices using the Google Vision API.

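I had used Google Vision before, but mainly on pages from a book, where the text simply runs in straight lines from top to bottom. Standard Vision does a great job on that kind of input. Invoices are different: the purchased products might be on the left of the image and the prices on the right. In that case the raw Vision response reads down the right-hand "column" separately and places those values at the bottom of the text, which leaves the response unstructured and hard to read as plain text.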

The challenge was that my client needed the data to be as structured as possible to ensure data integrity.

I thought I would share my experience and the approach I took to complete this task.

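The first step is to read the image with Google Vision. This requires converting the image to a base64-encoded string so it can be sent to the API in the request body. I then used the Guzzle HTTP client to send a POST request to the Google Vision API: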

// Get image contents as base64
$image_base64 = base64_encode(file_get_contents("path/to/your/image.jpg"));

// Use Guzzle to get the OCR data
$client = new \GuzzleHttp\Client();
$yourApiKey = "YOUR API KEY HERE";
$response = $client->request('POST', "https://vision.googleapis.com/v1/images:annotate?key={$yourApiKey}", [
    'json' => [
        'requests' => [
            [
                'image' => [
                    'content' => $image_base64
                ],
                'features' => [
                    [
                        'type' => 'TEXT_DETECTION',
                        'maxResults' => 1,
                    ]
                ]
            ]
        ]
    ],
]);

// Get the response JSON from Vision
$response = json_decode((string) $response->getBody(), true);

Once I received the response from the API, I decoded the JSON and extracted the text annotations from the data.
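Before reading the annotations it can help to guard against the two cases where they are missing. The following is a minimal sketch, not part of the original code: `extractTextAnnotations` is a hypothetical helper name, and the sample array is hand-written to mimic the shape of a decoded response. It assumes that a per-image failure shows up as an `error` object on the response entry and that `textAnnotations` is simply absent when no text was detected.

```php
<?php
// Hypothetical helper (not from the original post): pull textAnnotations out
// of a decoded images:annotate response, failing loudly on API errors.
function extractTextAnnotations(array $decoded): array
{
    $first = $decoded['responses'][0] ?? [];

    // A per-image failure is reported in an "error" object on the entry.
    if (isset($first['error'])) {
        $message = $first['error']['message'] ?? 'unknown error';
        throw new RuntimeException('Vision API error: ' . $message);
    }

    // When no text is detected the textAnnotations key is absent.
    return $first['textAnnotations'] ?? [];
}

// Hand-written sample shaped like a decoded Vision response (made-up data).
$sample = [
    'responses' => [[
        'textAnnotations' => [
            ['description' => "Milk 1.20", 'boundingPoly' => ['vertices' => []]],
            ['description' => 'Milk', 'boundingPoly' => ['vertices' => []]],
        ],
    ]],
];

$annotations = extractTextAnnotations($sample);
echo count($annotations), "\n"; // prints 2
```

Returning an empty array for "no text found" lets the caller decide whether a blank image is an error, while a real API failure still stops the script.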

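Next, I needed to extract the vertices for the whole receipt from the textAnnotations array and calculate the "center" of the receipt text. I did this by using array_reduce to iterate over the $centervertices array, summing the x and y values and dividing by the number of vertices: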

// Get textAnnotations
$textAnnotations = $response['responses'][0]['textAnnotations'];

// Get vertices for the whole receipt (index 0 covers the full detected text)
$centervertices = $textAnnotations[0]['boundingPoly']['vertices'];

// Calculate the "center" of the receipt text.
// Vision can omit a coordinate whose value is 0, hence the `?? 0` fallback.
$centerA = [
    "x" => array_reduce($centervertices, function($carry, $item) {
        return $carry + ($item['x'] ?? 0);
    }, 0)/count($centervertices),
    "y" => array_reduce($centervertices, function($carry, $item) {
        return $carry + ($item['y'] ?? 0);
    }, 0)/count($centervertices)
];
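Since the same averaging is needed again for the longest string, it could be pulled into one function. This is only a sketch under assumptions: `polygonCenter` is a hypothetical name, and the sample box is made up. It treats a missing `x` or `y` key as 0, on the assumption that zero-valued coordinates can be left out of the JSON.

```php
<?php
// Hypothetical helper (not from the original post): average the vertices of a
// bounding polygon to get its center point.
function polygonCenter(array $vertices): array
{
    $n = count($vertices);
    if ($n === 0) {
        throw new InvalidArgumentException('A polygon needs at least one vertex.');
    }

    $sumX = 0;
    $sumY = 0;
    foreach ($vertices as $vertex) {
        // Assumption: a missing key means the coordinate is 0.
        $sumX += $vertex['x'] ?? 0;
        $sumY += $vertex['y'] ?? 0;
    }

    return ['x' => $sumX / $n, 'y' => $sumY / $n];
}

// Made-up box whose left edge sits at x = 0, so 'x' is omitted on two vertices.
$box = [
    ['y' => 20],
    ['x' => 100, 'y' => 20],
    ['x' => 100, 'y' => 60],
    ['y' => 60],
];

$center = polygonCenter($box);
echo $center['x'], ', ', $center['y'], "\n"; // prints "50, 40"
```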

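Then I needed to extract the vertices of the longest string from the textAnnotations array, calculate the "center" of that string, and calculate the angle at which the receipt is running. I used array_reduce to pick out the longest string, and atan2 and pi to calculate the angle.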

// Get vertices for the longest string (index 0 is skipped: it is the full text)
$vertices = array_reduce(array_slice($textAnnotations,1), function($carry, $item) {
    // $carry is null on the first pass, so the first item always wins
    return ($carry === null || strlen($item['description']) > strlen($carry['description'])) ? $item : $carry;
})['boundingPoly']['vertices'];

// Calculate the "center" of the longest string
$centerB = [
    "x" => array_reduce($vertices, function($carry, $item) {
        return $carry + ($item['x'] ?? 0);
    }, 0)/count($vertices),
    "y" => array_reduce($vertices, function($carry, $item) {
        return $carry + ($item['y'] ?? 0);
    }, 0)/count($vertices)
];

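// Calculate the angle the receipt is running: the direction from the center
// of the longest string to its first vertex, shifted by 180 degrees
$xDiff = ($vertices[0]["x"] ?? 0) - $centerB["x"];
$yDiff = ($vertices[0]["y"] ?? 0) - $centerB["y"];
$angle = (atan2($yDiff, $xDiff) * 180 / pi()) + 180;
// The 5 degree offset appears to allow for the angle this diagonal already
// makes with the baseline when a long string is perfectly level
$angle_to_rotate = -(pi() * ($angle-5) / 180);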
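To see what the formula produces, here is a small worked example on made-up boxes. It is a sketch, not the original code: `diagonalAngleDegrees` is a hypothetical name, the box dimensions are invented, and every vertex is assumed to carry both `x` and `y`. For a level box 200 px wide and 17.5 px tall the diagonal sits at roughly 5 degrees, which would explain why subtracting 5 leaves almost nothing to rotate. Tilting the same box by 10 degrees raises the result to roughly 15.

```php
<?php
// Hypothetical helper mirroring the formula above: the angle (in degrees) of
// the line from a box's center to its first vertex, shifted by 180.
function diagonalAngleDegrees(array $vertices): float
{
    $n = count($vertices);
    $centerX = array_sum(array_column($vertices, 'x')) / $n;
    $centerY = array_sum(array_column($vertices, 'y')) / $n;

    $xDiff = $vertices[0]['x'] - $centerX;
    $yDiff = $vertices[0]['y'] - $centerY;

    return (atan2($yDiff, $xDiff) * 180 / pi()) + 180;
}

// A level word box, 200 px wide and 17.5 px tall (made-up numbers).
$upright = [
    ['x' => 0, 'y' => 0],
    ['x' => 200, 'y' => 0],
    ['x' => 200, 'y' => 17.5],
    ['x' => 0, 'y' => 17.5],
];

// The same box rotated by 10 degrees around its own center (100, 8.75).
$theta = deg2rad(10);
$tilted = array_map(function ($v) use ($theta) {
    $x = $v['x'] - 100;
    $y = $v['y'] - 8.75;
    return [
        'x' => $x * cos($theta) - $y * sin($theta) + 100,
        'y' => $x * sin($theta) + $y * cos($theta) + 8.75,
    ];
}, $upright);

$uprightAngle = diagonalAngleDegrees($upright);
$tiltedAngle = diagonalAngleDegrees($tilted);

// Same correction as in the post: subtract 5 degrees, convert to radians, negate.
$uprightRotation = -(pi() * ($uprightAngle - 5) / 180);
$tiltedRotation = -(pi() * ($tiltedAngle - 5) / 180);

printf("upright: %.2f degrees\n", $uprightAngle);
printf("tilted:  %.2f degrees\n", $tiltedAngle);
```

The fixed offset only matches boxes with about this width-to-height ratio, so shorter strings would give a less accurate estimate. That is presumably why the longest string is used.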

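The last step is to rebuild the lines of text by remapping and sorting the rows after fixing the rotation. I used array_map to update the vertex coordinates and add a new 'lineSort' key to each row, then used usort to sort the array by the 'lineSort' key. I did the same for the words within each row, adding a new 'columnSort' key, sorting with usort, and then imploding the words with a space: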

// Remap/sort the rows after fixing rotation
$textAnnotationsSorted = array_map(function($row) use ($angle,$angle_to_rotate,$centerA) {
    $vertices = $row['boundingPoly']['vertices'];
    $new_vertices = array();
    foreach ($vertices as $vertex) {
        // Vision can omit a coordinate whose value is 0
        $x = ($vertex["x"] ?? 0) - $centerA["x"];
        $y = ($vertex["y"] ?? 0) - $centerA["y"];
        $new_x = ($x * cos($angle_to_rotate)) - ($y * sin($angle_to_rotate)) + $centerA["x"];
        $new_y = ($x * sin($angle_to_rotate)) + ($y * cos($angle_to_rotate)) + $centerA["y"];
        $new_vertices[] = ["x" => $new_x, "y" => $new_y];
    }
    $row['boundingPoly']['vertices'] = $new_vertices;
    $row['lineSort'] = $row['boundingPoly']['vertices'][0]['y'];
    return $row;
}, array_slice($textAnnotations,1));
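usort($textAnnotationsSorted, function($a, $b) {
    // The values are floats after rotation; <=> avoids the int cast that
    // would turn differences smaller than 1 into "equal"
    return $a['lineSort'] <=> $b['lineSort'];
});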
$textAnnotationsSorted = array_values($textAnnotationsSorted);

// Setup faux rows
$newRows = [];
$index = 0;

// Setup base Y vertical
$curY = $textAnnotationsSorted[0]['boundingPoly']['vertices'][0]['y'];

// Loop the sorted rows and append faux rows
foreach( array_values($textAnnotationsSorted) as $v ) {
    if( $v['boundingPoly']['vertices'][0]['y'] > $curY + 15) {
        $index++;
    }
    if( $v['boundingPoly']['vertices'][0]['y'] < $curY - 15 ) {
        $index--;
    }
    $newRows[$index][] = $v;
    $curY = $v['boundingPoly']['vertices'][0]['y'];
}

// Loop faux rows and sort columns
foreach( $newRows as &$row ) {
    $row = array_map(function($v){
        $v['columnSort'] = $v['boundingPoly']['vertices'][0]['x'];
        return $v;
    }, $row);
    usort($row, function($a, $b) {
        return $a['columnSort'] <=> $b['columnSort'];
    });
    $row = implode(" ", array_map(function($item) {
        return $item['description'];
    }, $row));
}
// Break the reference so later uses of $row cannot overwrite the last line
unset($row);
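The rotation and row-grouping steps can be sketched in isolation on made-up data. The helper names `rotatePoint` and `groupIntoLines` are hypothetical, and each word is reduced to a `text` plus the `x`/`y` of its first vertex rather than a full Vision annotation. The 15 px tolerance is the value used above; a sensible value will depend on image resolution and font size.

```php
<?php
// Hypothetical helpers (names are illustrative, not from the original post).

// Rotate a point around a center by $radians using the standard 2D rotation.
function rotatePoint(array $point, array $center, float $radians): array
{
    $x = $point['x'] - $center['x'];
    $y = $point['y'] - $center['y'];
    return [
        'x' => $x * cos($radians) - $y * sin($radians) + $center['x'],
        'y' => $x * sin($radians) + $y * cos($radians) + $center['y'],
    ];
}

// Group words into text lines: sort by y, start a new line whenever the y
// value jumps by more than $tolerance pixels, then order each line by x.
function groupIntoLines(array $words, float $tolerance = 15.0): array
{
    if ($words === []) {
        return [];
    }

    usort($words, function ($a, $b) {
        return $a['y'] <=> $b['y'];
    });

    $lines = [];
    $index = 0;
    $curY = $words[0]['y'];
    foreach ($words as $word) {
        if ($word['y'] > $curY + $tolerance) {
            $index++;
        }
        $lines[$index][] = $word;
        $curY = $word['y'];
    }

    return array_map(function (array $line) {
        usort($line, function ($a, $b) {
            return $a['x'] <=> $b['x'];
        });
        return implode(' ', array_column($line, 'text'));
    }, $lines);
}

// Made-up words: product names on the left, prices on the right.
$words = [
    ['text' => '1.20', 'x' => 300, 'y' => 12],
    ['text' => 'Bread', 'x' => 10, 'y' => 41],
    ['text' => 'Milk', 'x' => 10, 'y' => 10],
    ['text' => '2.50', 'x' => 300, 'y' => 43],
];

$lines = groupIntoLines($words);
echo implode("\n", $lines), "\n";
// prints:
// Milk 1.20
// Bread 2.50

// A quarter turn moves (1, 0) to approximately (0, 1) around the origin.
$rotated = rotatePoint(['x' => 1, 'y' => 0], ['x' => 0, 'y' => 0], M_PI / 2);
```

Comparing each word with the previous word's y value, rather than with the first word of the line, lets a line drift slightly without being split. The trade-off is that a steadily slanted run of words could be merged into a single line, which is why the rotation is corrected first.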

I was then able to echo the reformatted OCR data:

// Echo the newly ordered OCR data
echo implode("\n", $newRows);

I hope this tutorial helps others understand the process I used to return a more human-readable OCR response.

