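My most recent task was to extract text from invoices using the Google Vision API.
I had used Google Vision before, but mostly on pages from a book, where the text simply runs in straight lines from top to bottom. Vision does a great job out of the box, but I found that when reading an invoice with, for example, the purchased products on the left and the prices on the right of the image, the raw response reads down the right-hand "column" separately and places those values at the bottom of the text. That leaves the response unstructured and hard to read as plain text.
The challenge was that my client needed the data to be as structured as possible to ensure data integrity.
I thought I would share my experience and my approach to this task.
The first step is to have Google Vision read the image. That requires converting the image to a base64-encoded string so it can be sent to the API in the request body. I then used the Guzzle HTTP client to send a POST request to the Google Vision API: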
// Get image contents as base64
$image_base64 = base64_encode(file_get_contents("path/to/your/image.jpg"));
// Use Guzzle to get the OCR data
$client = new \GuzzleHttp\Client();
$yourApiKey = "YOUR API KEY HERE";
$response = $client->request('POST', "https://vision.googleapis.com/v1/images:annotate?key={$yourApiKey}", [
    'json' => [
        'requests' => [
            [
                'image' => [
                    'content' => $image_base64
                ],
                'features' => [
                    [
                        'type' => 'TEXT_DETECTION',
                        'maxResults' => 1,
                    ]
                ]
            ]
        ]
    ],
]);
// Get the response JSON from Vision (true = decode objects as associative arrays)
$response = json_decode( (string) $response->getBody(), true );
Once I received the API response, I decoded the JSON and extracted the text annotations from the data.
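Before indexing into the decoded array, it can help to check that the response actually contains annotations. The following is a minimal sketch, not part of my original code: `extractTextAnnotations` is a hypothetical helper name, and it assumes that a per-image failure is reported as an `error` object inside `responses[0]` and that the `textAnnotations` key is simply absent when no text is found.

```php
<?php
/**
 * Hypothetical helper: pull textAnnotations out of a decoded
 * images:annotate response, failing loudly on an error result.
 */
function extractTextAnnotations(array $decoded): array
{
    $first = $decoded['responses'][0] ?? null;
    if (!is_array($first)) {
        throw new RuntimeException('Vision response contained no results.');
    }
    // Assumption: per-image failures arrive as an "error" object in the body.
    if (isset($first['error'])) {
        $message = $first['error']['message'] ?? 'unknown error';
        throw new RuntimeException('Vision error: ' . $message);
    }
    // Assumption: no detected text means the key is missing, so default to [].
    return $first['textAnnotations'] ?? [];
}
```

With this in place, an empty array signals "no text detected" and the rest of the pipeline can be skipped instead of failing on an undefined index.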
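Next, I needed to pull the vertices for the whole receipt out of the textAnnotations array (its first entry covers all of the detected text) and calculate the "center" of the receipt text. I did this by using array_reduce to iterate over the $centervertices array, sum the x and y values, and divide by the number of vertices: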
// Get textAnnotations
$textAnnotations = $response['responses'][0]['textAnnotations'];
// Get vertices for the whole receipt
$centervertices = $textAnnotations[0]['boundingPoly']['vertices'];
// Calculate the "center" of the receipt text (average of the corner points).
// "?? 0" guards against a coordinate key being absent, which can happen when its value is 0.
$centerA = [
    "x" => array_reduce($centervertices, function($carry, $item) {
        return $carry + ($item['x'] ?? 0);
    }, 0)/count($centervertices),
    "y" => array_reduce($centervertices, function($carry, $item) {
        return $carry + ($item['y'] ?? 0);
    }, 0)/count($centervertices)
];
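Since the same averaging is needed again for the longest string, it could be pulled into a small function. This is only a sketch: `polyCenter` is a hypothetical name, and it assumes that a missing `x` or `y` key means the coordinate is 0 (as far as I know, the JSON response can omit zero-valued coordinates).

```php
<?php
/**
 * Hypothetical helper: average the corners of a boundingPoly.
 * Assumption: a missing "x" or "y" key stands for 0.
 */
function polyCenter(array $vertices): array
{
    $n = count($vertices);
    if ($n === 0) {
        throw new InvalidArgumentException('No vertices given.');
    }
    $sumX = 0;
    $sumY = 0;
    foreach ($vertices as $v) {
        $sumX += $v['x'] ?? 0;
        $sumY += $v['y'] ?? 0;
    }
    return ['x' => $sumX / $n, 'y' => $sumY / $n];
}

$center = polyCenter([
    ['y' => 10],                 // "x" omitted, treated as 0
    ['x' => 100, 'y' => 10],
    ['x' => 100, 'y' => 50],
    ['y' => 50],
]);
// $center['x'] is 50 and $center['y'] is 30
```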
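Then I needed to pull the vertices of the longest string out of the textAnnotations array, calculate the "center" of that string, and work out the angle at which the receipt is running. I used array_reduce to find the longest string, and atan2 and pi to calculate the angle.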
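// Get vertices for the longest string (skip entry 0, which is the whole text)
$words = array_slice($textAnnotations, 1);
$vertices = array_reduce($words, function($carry, $item) {
    return strlen($item['description']) > strlen($carry['description']) ? $item : $carry;
}, $words[0])['boundingPoly']['vertices']; // seed with the first word so $carry is never null
// Calculate the "center" of the longest string
$centerB = [
    "x" => array_reduce($vertices, function($carry, $item) {
        return $carry + ($item['x'] ?? 0);
    }, 0)/count($vertices),
    "y" => array_reduce($vertices, function($carry, $item) {
        return $carry + ($item['y'] ?? 0);
    }, 0)/count($vertices)
];
// Calculate the angle the receipt is running, measured in degrees
// from the center of the longest string to its first vertex
$xDiff = ($vertices[0]["x"] ?? 0) - $centerB["x"];
$yDiff = ($vertices[0]["y"] ?? 0) - $centerB["y"];
$angle = (atan2($yDiff, $xDiff) * 180 / pi()) + 180;
// Convert to radians and negate to get the correction; the 5 degree offset
// appears to allow for the corner sitting slightly off the string's center line
$angle_to_rotate = -(pi() * ($angle-5) / 180);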
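An alternative to the center-to-corner measurement is to read the skew directly from the top edge of the word box, which needs no fixed offset. This is not the method used in the code above; it is a sketch that assumes the vertices are ordered top-left, top-right, bottom-right, bottom-left. `topEdgeAngle` and `rotatePoint` are hypothetical names, and `rotatePoint` applies the standard 2D rotation that the remapping step later in this post also relies on.

```php
<?php
/**
 * Hypothetical helper: skew of a word box in radians, from its top edge.
 * Assumption: vertices run top-left, top-right, bottom-right, bottom-left.
 */
function topEdgeAngle(array $vertices): float
{
    $dx = ($vertices[1]['x'] ?? 0) - ($vertices[0]['x'] ?? 0);
    $dy = ($vertices[1]['y'] ?? 0) - ($vertices[0]['y'] ?? 0);
    return atan2($dy, $dx);
}

/** Hypothetical helper: rotate point $p about $center by $radians. */
function rotatePoint(array $p, array $center, float $radians): array
{
    $x = $p['x'] - $center['x'];
    $y = $p['y'] - $center['y'];
    return [
        'x' => $x * cos($radians) - $y * sin($radians) + $center['x'],
        'y' => $x * sin($radians) + $y * cos($radians) + $center['y'],
    ];
}

// A made-up box whose top edge rises 10 px over a 100 px run
$box = [
    ['x' => 0,   'y' => 10], ['x' => 100, 'y' => 0],
    ['x' => 102, 'y' => 20], ['x' => 2,   'y' => 30],
];
$skew   = topEdgeAngle($box);
$center = ['x' => 51, 'y' => 15];
$a = rotatePoint($box[0], $center, -$skew);
$b = rotatePoint($box[1], $center, -$skew);
// After rotating by -$skew the two top corners share the same y,
// up to floating-point rounding, so the top edge is horizontal.
```

Averaging this angle over several long words would likely be more robust than relying on a single box, at the cost of a little more code.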
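The final step was to rebuild the lines of text by remapping and sorting the rows after fixing the rotation. I used array_map to update the vertex coordinates and add a new 'lineSort' key to each row, then used usort to sort the array by 'lineSort'. Words whose y values fall within 15 pixels of the previous word are grouped into the same row. I did the same for the words in each row, adding a new 'columnSort' key, sorting with usort, and then joining the words with implode using a space: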
// Remap/sort the rows after fixing rotation
$textAnnotationsSorted = array_map(function($row) use ($angle_to_rotate, $centerA) {
    $vertices = $row['boundingPoly']['vertices'];
    $new_vertices = array();
    foreach ($vertices as $vertex) {
        // Move the receipt center to the origin, rotate, then move back
        $x = ($vertex["x"] ?? 0) - $centerA["x"];
        $y = ($vertex["y"] ?? 0) - $centerA["y"];
        $new_x = ($x * cos($angle_to_rotate)) - ($y * sin($angle_to_rotate)) + $centerA["x"];
        $new_y = ($x * sin($angle_to_rotate)) + ($y * cos($angle_to_rotate)) + $centerA["y"];
        $new_vertices[] = ["x" => $new_x, "y" => $new_y];
    }
    $row['boundingPoly']['vertices'] = $new_vertices;
    $row['lineSort'] = $row['boundingPoly']['vertices'][0]['y'];
    return $row;
}, array_slice($textAnnotations,1));
// Sort top to bottom. The rotated coordinates are floats, and usort casts the
// callback result to int, so use <=> instead of subtraction.
usort($textAnnotationsSorted, function($a, $b) {
    return $a['lineSort'] <=> $b['lineSort'];
});
$textAnnotationsSorted = array_values($textAnnotationsSorted);
// Setup faux rows
$newRows = [];
$index = 0;
// Setup base Y vertical
$curY = $textAnnotationsSorted[0]['boundingPoly']['vertices'][0]['y'];
// Loop the sorted rows and append faux rows
foreach( array_values($textAnnotationsSorted) as $v ) {
    if( $v['boundingPoly']['vertices'][0]['y'] > $curY + 15) {
        $index++;
    }
    if( $v['boundingPoly']['vertices'][0]['y'] < $curY - 15 ) {
        $index--;
    }
    $newRows[$index][] = $v;
    $curY = $v['boundingPoly']['vertices'][0]['y'];
}
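// Loop faux rows and sort columns
foreach( $newRows as &$row ) {
    $row = array_map(function($v){
        $v['columnSort'] = $v['boundingPoly']['vertices'][0]['x'];
        return $v;
    }, $row);
    // Sort left to right; <=> keeps fractional differences from being truncated to 0
    usort($row, function($a, $b) {
        return $a['columnSort'] <=> $b['columnSort'];
    });
    // Replace the row with its words joined by spaces
    $row = implode(" ", array_map(function($item) {
        return $item['description'];
    }, $row));
}
// Break the reference so later use of $row cannot overwrite the last row
unset($row);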
I could then echo the reformatted OCR data:
// Echo the newly ordered OCR data
echo implode("\n", $newRows);
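To see the grouping and sorting in isolation, here is a self-contained sketch run on made-up data rather than a real Vision response. `groupIntoLines` is a hypothetical helper, the input shape (`description`, `x`, `y`) is a simplification of the annotation structure, and the 15 pixel tolerance is an assumed, tunable value that will depend on image resolution. It uses arrow functions, so it needs PHP 7.4 or later.

```php
<?php
/**
 * Hypothetical helper: group word annotations into lines of text.
 * $words: list of ['description' => string, 'x' => number, 'y' => number]
 * $tolerance: largest vertical gap (px) between neighbours on the same line.
 */
function groupIntoLines(array $words, float $tolerance = 15.0): array
{
    if ($words === []) {
        return [];
    }
    // Sort top to bottom; <=> compares floats without truncating to int.
    usort($words, fn ($a, $b) => $a['y'] <=> $b['y']);

    $rows = [];
    $index = 0;
    $curY = $words[0]['y'];
    foreach ($words as $w) {
        if ($w['y'] > $curY + $tolerance) {
            $index++; // far enough below the previous word: start a new line
        }
        $rows[$index][] = $w;
        $curY = $w['y'];
    }

    // Sort each line left to right and join its words with spaces.
    return array_map(function (array $row): string {
        usort($row, fn ($a, $b) => $a['x'] <=> $b['x']);
        return implode(' ', array_column($row, 'description'));
    }, $rows);
}

$lines = groupIntoLines([
    ['description' => '4.50',   'x' => 300, 'y' => 12],
    ['description' => 'Coffee', 'x' => 10,  'y' => 10],
    ['description' => '2.25',   'x' => 300, 'y' => 41],
    ['description' => 'Bagel',  'x' => 10,  'y' => 40],
]);
echo implode("\n", $lines), "\n";
// Coffee 4.50
// Bagel 2.25
```

Because each word is compared with the previous word rather than with the first word of the line, a line that drifts gradually downward still stays together, which suits receipts that are not perfectly straightened.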
I hope this tutorial helps others understand my process for returning a more human-friendly OCR response.