I want to share my experience of unzipping large files stored in Azure Blob Storage with an Azure Function written in Node.js.
I chose to run the function on the Consumption plan because our unzip process only runs once a day, so the cost stays very low.
Unfortunately, you cannot rely on the default Azure Blob Storage trigger, because in my experience it is unreliable when many files are uploaded in parallel. There are better ways to trigger an Azure Function; my favorite is the Azure Event Grid trigger.
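If you wire the function up to Event Grid, the trigger is declared in the function's binding configuration. Here is a minimal sketch of what the function.json might look like, assuming the in-process Node.js programming model; the binding name eventGridEvent is my own choice, not taken from the original setup:

{
  "bindings": [
    {
      "type": "eventGridTrigger",
      "name": "eventGridEvent",
      "direction": "in"
    }
  ]
}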
Below is the complete function code. It reads, unzips, and writes back large files using Node.js streams, which keeps the memory footprint low.
I use this function to unzip files of roughly 150 MB.
const { BlobServiceClient } = require("@azure/storage-blob");
const unzipper = require("unzipper");
const ONE_MEGABYTE = 1024 * 1024;
const uploadOptions = { bufferSize: 4 * ONE_MEGABYTE, maxBuffers: 20 };
const AZURE_STORAGE_CONNECTION_STRING = process.env.IMPORT_AZURE_STORAGE_CONNECTION;
// Create the BlobServiceClient object which will be used to create a container client
const blobService = BlobServiceClient.fromConnectionString(AZURE_STORAGE_CONNECTION_STRING);
module.exports = async function (context, zipName) {
  context.log("Unzip: " + zipName);

  const importContainer = blobService.getContainerClient("zip-import");
  const processContainer = blobService.getContainerClient("zip-process");

  // Stream the zip blob through unzipper instead of buffering the whole archive in memory
  const blobClient = importContainer.getBlobClient(zipName);
  const downloadBlockBlobResponse = await blobClient.download();
  const zipStream = downloadBlockBlobResponse.readableStreamBody.pipe(unzipper.Parse({ forceStream: true }));

  // Each zip entry is itself a readable stream, so it can be uploaded directly
  for await (const entry of zipStream) {
    const blockBlobClient = processContainer.getBlockBlobClient(entry.path);
    try {
      await blockBlobClient.uploadStream(entry, uploadOptions.bufferSize, uploadOptions.maxBuffers);
      context.log(`Uploaded ${entry.path} from unzipped ${zipName}`);
    } catch (error) {
      throw new Error(`Error while uploading unzipped file ${entry.path}: ${error}`);
    }
  }
};
As you can see from the code above, the function reads the archive from the "zip-import" blob container, unzips it with unzipper, a streaming cross-platform unzip library, and writes the extracted files to the "zip-process" blob container.
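One practical detail: the function above receives zipName directly, while an Event Grid trigger hands you the whole event. A rough sketch of how a wrapper could derive the blob name from the subject of a Microsoft.Storage.BlobCreated event; the wrapper, its file layout, and the require("./unzip") path are my own assumptions and not part of the original code:

// Hypothetical Event Grid-triggered wrapper (not from the original post).
// For Microsoft.Storage.BlobCreated events the subject looks like:
//   /blobServices/default/containers/zip-import/blobs/archive.zip
const unzip = require("./unzip"); // the function shown above, assumed to be exported from unzip.js

module.exports = async function (context, eventGridEvent) {
  const subject = eventGridEvent.subject || "";
  const zipName = subject.split("/blobs/")[1];
  if (!zipName) {
    context.log.error(`Could not extract blob name from subject: ${subject}`);
    return;
  }
  await unzip(context, zipName);
};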
Happy unzipping!