Managing Amazon S3 Objects with Python - DEV365 Developer Community

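Amazon Simple Storage Service (S3) is a popular cloud storage service that provides scalable, secure object storage for all kinds of data. Managing S3 objects can be tedious, especially with large datasets. In this blog post, we'll look at how to use Python and the Boto3 library to manage S3 objects and list the objects created (that is, last modified) after a specific date and time.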

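To get started, we need to install Boto3, the Amazon Web Services (AWS) SDK for Python. Once it is installed, we can create an S3 client and set the region name and bucket name we want to work with. We'll also use the pytz library to set the time zone to Melbourne, Australia.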

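Next, we set the start date for listing S3 objects. In this example, the start date is 12:00 AM on March 5, 2023, Melbourne time. We use the datetime module together with the pytz time zone to build a timezone-aware start date.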
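As a quick sanity check on that start date, here is a minimal sketch that uses only the standard library. It assumes a fixed UTC+11 offset in place of pytz, which holds for Melbourne on March 5, 2023 because daylight saving time (AEDT) is in effect then. The names `AEDT` and `start_date_utc` are illustrative and do not appear in the script.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+11 offset stands in for Australia/Melbourne.
# This is only valid while daylight saving time (AEDT) is in effect.
AEDT = timezone(timedelta(hours=11), "AEDT")

# Midnight on March 5, 2023, Melbourne time, as a timezone-aware datetime
start_date = datetime(2023, 3, 5, tzinfo=AEDT)

# The same instant expressed in UTC, which is what S3 timestamps use
start_date_utc = start_date.astimezone(timezone.utc)

print(start_date.isoformat())      # 2023-03-05T00:00:00+11:00
print(start_date_utc.isoformat())  # 2023-03-04T13:00:00+00:00
```

Because both values are timezone-aware, they compare as equal instants, so the filter can compare the Melbourne start date directly against UTC timestamps.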

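Then we use a paginator to iterate over all objects in the S3 bucket and keep only those modified on or after the start date. S3 records a LastModified timestamp rather than a separate creation time, so that is the value we filter on. Each list_objects_v2 response returns at most 1,000 keys, and the paginator follows the continuation tokens for us, so large buckets are handled without manual bookkeeping.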
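The filtering step can be sketched on its own, without touching AWS. The helper name `filter_modified_since` and the hand-made `fake_pages` below are hypothetical. They only mimic the shape of the pages a `list_objects_v2` paginator yields, where a page without objects has no `Contents` key.

```python
from datetime import datetime, timezone


def filter_modified_since(pages, start_date):
    """Return objects whose LastModified is at or after start_date."""
    selected = []
    for page in pages:
        # A page that holds no objects has no 'Contents' key
        for obj in page.get("Contents", []):
            if obj["LastModified"] >= start_date:
                selected.append(obj)
    return selected


# Hand-made pages standing in for the output of paginator.paginate(...)
fake_pages = [
    {"Contents": [
        {"Key": "old.csv", "Size": 10,
         "LastModified": datetime(2023, 3, 1, tzinfo=timezone.utc)},
        {"Key": "new.csv", "Size": 20,
         "LastModified": datetime(2023, 3, 6, tzinfo=timezone.utc)},
    ]},
    {},  # an empty page
    {"Contents": [
        {"Key": "newer.csv", "Size": 30,
         "LastModified": datetime(2023, 3, 7, tzinfo=timezone.utc)},
    ]},
]

cutoff = datetime(2023, 3, 5, tzinfo=timezone.utc)
recent_keys = [o["Key"] for o in filter_modified_since(fake_pages, cutoff)]
print(recent_keys)  # ['new.csv', 'newer.csv']
```

Keeping the filter separate from the S3 call makes it easy to check the date logic before running it against a real bucket.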

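We use the astimezone() method to convert each object's UTC LastModified time to the Melbourne time zone and format it as a string that includes the time zone name and offset. We also convert each object's size from bytes to megabytes (dividing by 1024 × 1024) and store the filtered objects in a list.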
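Here is a small sketch of that conversion step in isolation. The function name `describe_object` is hypothetical, and the fixed UTC+11 `aedt` zone is an assumption standing in for the pytz Melbourne zone so that the example runs with the standard library alone.

```python
from datetime import datetime, timedelta, timezone


def describe_object(obj, tz):
    """Return a copy of obj with a local-time string and a size in MB."""
    local_time = obj["LastModified"].astimezone(tz)
    return {
        "Key": obj["Key"],
        # %Z is the time zone name, %z is the UTC offset
        "LastModified": local_time.strftime("%Y-%m-%d %H:%M:%S %Z%z"),
        # Bytes to megabytes, rounded to two decimal places
        "Size": round(obj["Size"] / (1024 * 1024), 2),
    }


# Assumption: fixed offset that matches Melbourne during daylight saving time
aedt = timezone(timedelta(hours=11), "AEDT")

sample = {
    "Key": "reports/a.csv",
    "LastModified": datetime(2023, 3, 4, 13, 0, tzinfo=timezone.utc),
    "Size": 1572864,  # 1.5 * 1024 * 1024 bytes
}
described = describe_object(sample, aedt)
print(described)
# {'Key': 'reports/a.csv', 'LastModified': '2023-03-05 00:00:00 AEDT+1100', 'Size': 1.5}
```

Returning a new dictionary, rather than overwriting the fields in place, leaves the original datetime available for any later comparison.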

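Finally, we print the list of objects with pprint and write it to a file named "s3_objects.txt" using open(). The output file includes each object's key, last modified time, and size in megabytes.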
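The output step might look like the sketch below. The helper `write_object_report` and the sample row are hypothetical, and the file is written to a temporary directory here so that the example does not touch the working directory.

```python
import os
import tempfile


def write_object_report(objects, path):
    """Write one line per object and return the number of lines written."""
    with open(path, "w", encoding="utf-8") as f:
        for obj in objects:
            f.write(
                f"Key: {obj['Key']}, Last Modified: {obj['LastModified']}, "
                f"Size: {obj['Size']} MB\n"
            )
    return len(objects)


# Sample row in the shape the script stores after formatting
rows = [
    {"Key": "reports/a.csv",
     "LastModified": "2023-03-05 00:00:00 AEDT+1100",
     "Size": 1.5},
]

report_path = os.path.join(tempfile.mkdtemp(), "s3_objects.txt")
written = write_object_report(rows, report_path)

with open(report_path, encoding="utf-8") as f:
    report_text = f.read()
print(report_text, end="")
# Key: reports/a.csv, Last Modified: 2023-03-05 00:00:00 AEDT+1100, Size: 1.5 MB
```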

Here is the complete Python script:

import boto3
import datetime
import pprint
import pytz

# Create an S3 client
s3 = boto3.client('s3', region_name='ap-southeast-2')

# Set the S3 bucket name
bucket_name = 'myuats3bucket'
# Use the Melbourne time zone for the start date and the output
tz = pytz.timezone('Australia/Melbourne')
# Set the start date for listing objects.
# pytz zones must be attached with localize(); passing tzinfo=tz to the
# datetime constructor would pick up the zone's historical LMT offset.
start_date = tz.localize(datetime.datetime(2023, 3, 5))
print(start_date.strftime('%Y-%m-%d %H:%M:%S %Z%z'))

# Set the prefix for filtering objects
prefix = ''

# Initialize the list of objects
object_list = []

# Use a paginator to iterate over all objects in the bucket.
# StartAfter is not passed here: it expects an object key, not a date,
# so the date filter is applied to LastModified below instead.
paginator = s3.get_paginator('list_objects_v2')
try:
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # A page has no 'Contents' key when it holds no objects.
        # Iterate over each object in the current page of results.
        for obj in page.get('Contents', []):
            # Add the object to the list if it was modified on or after the start date
            if obj['LastModified'] >= start_date:
                # Convert UTC LastModified time to Melbourne timezone
                melbourne_time = obj['LastModified'].astimezone(tz)
                obj['LastModified'] = melbourne_time.strftime('%Y-%m-%d %H:%M:%S %Z%z')
                # Convert size to MB
                obj['Size'] = round(obj['Size'] / (1024 * 1024), 2)
                object_list.append(obj)
except Exception as e:
    print("Error:", e)

# Print the list of objects
pprint.pprint(object_list)

# Open the file for writing
with open('s3_objects.txt', 'w') as f:
    # Write the formatted output to the file
    for obj in object_list:
        f.write(f"Key: {obj['Key']}, Last Modified: {obj['LastModified']}, Size: {obj['Size']} MB\n")

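In summary, using Python and the Boto3 library to manage Amazon S3 objects is an effective way to automate and simplify data management workflows. Being able to filter and format S3 objects by specific criteria can save time and reduce errors in your data processing pipeline. Whether you are working with small or large datasets, this approach can help you manage S3 objects more efficiently.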

Thank you for reading this blog post. We hope you found it helpful. If you have any questions or feedback, feel free to leave a comment below.