I have a web app served by apache2 with Python Flask in the backend. The app relies heavily on S3 object storage, and I'm using boto3 to interact with it. My issue is with the generate_presigned_post method when used in production. It returns the following structure:

{
 'url': 'https://eu-central-1.linodeobjects.com/my-s3-bucket', 
 'fields': {
   'ACL': 'private', 
   'key': 'foo.bar', 
   'AWSAccessKeyId': 'FOOBAR', 
   'policy': 'base64longhash...', 
   'signature': 'foobar'
 }
}

Every time I call this method in the same Python session, the policy field comes back longer (about a 1.5x increase in length for every subsequent request). After a few requests the policy gets really large (tens of MB) and the app breaks. If I restart the Python service, the policy size is reset.

After digging through the boto3 documentation and some threads on GitHub and here, I couldn't find anything that helped me reset the S3 connection without restarting the whole Python session. Restarting the apache2 service periodically is not a good approach, so my workaround was to call generate_presigned_post from a standalone script using subprocess and parse the string output back to JSON before using it. That is not ideal either, as I'd rather not spawn external scripts from inside apache. The main functions I use follow below:

import json
from subprocess import PIPE, Popen

import boto3

# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ENDPOINT_URL and AWS_BUCKET are set elsewhere (not shown)
AWS_BUCKET_PARAMS = {'ACL': 'private'}

# connect to my linode's s3 bucket
def awsSign():
    return boto3.client('s3', aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY, endpoint_url=AWS_ENDPOINT_URL)

# generate presigned post object for uploading files
def awsPostForm(file_path):
    s3 = awsSign()
    return s3.generate_presigned_post(AWS_BUCKET, file_path, AWS_BUCKET_PARAMS, [AWS_BUCKET_PARAMS], 1800)

# generate post object from external script (workaround: runs awsPostForm in a fresh interpreter)
def awsPostFormTerminal(file_path):
    cmd = ['python3', '-c', f'from utils import awsPostForm; print(awsPostForm("{file_path}"))']
    output = Popen(cmd, stdout=PIPE).communicate()[0]
    # the child prints a Python dict repr, so swap quotes to get valid JSON
    return json.loads(output.decode('utf-8').replace('\n', '').replace("'", '"'))

The problem happens regardless of whether I call awsSign() once or many times for a list of files.

In short, I'm looking for a better way to retrieve subsequent post forms from generate_presigned_post in the same Python session without the policy growing on every new request. Is there a proper way to restart the boto3 connection, are there parameters I missed when setting up the API calls, or is this something particular to Linode's S3 object storage service?

I also posted the same question on Stack Overflow.

If anyone can point me in the right direction, I'd appreciate it!

The policy is base64-encoded. I believe this is a JSON document.

Have you tried decoding the base64 policy after each request to identify what is different about it each time?

Is it possible one of the parameters you’re feeding in is somehow including details from a previous request?
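For anyone who wants to try this, here is a minimal sketch of decoding the policy so two consecutive responses can be compared. It assumes the policy is base64-encoded JSON with the usual "expiration" and "conditions" keys; the function names decode_policy and summarize_policy are made up for illustration and are not part of boto3.

```python
import base64
import json


def decode_policy(policy_b64):
    """Decode the base64 'policy' field of a presigned POST into a dict."""
    return json.loads(base64.b64decode(policy_b64).decode("utf-8"))


def summarize_policy(policy_b64):
    """Return a few numbers that make growth between requests easy to spot."""
    policy = decode_policy(policy_b64)
    return {
        "encoded_length": len(policy_b64),
        "expiration": policy.get("expiration"),
        "num_conditions": len(policy.get("conditions", [])),
    }
```

Calling summarize_policy(response['fields']['policy']) after each request and printing the result should show whether the conditions list is what keeps growing.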

Decoding the policy did the job! It turns out the AWS_BUCKET_PARAMS variable was being modified in place after being passed to generate_presigned_post, so each request carried all the data returned by the previous request as well. Copying the variable inside the function scope before sending the request fixed it: there are no more duplications and the returned object's size is stable. Thanks!
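A rough sketch of what that fix can look like, in case it helps someone else. The function name aws_post_form and the choice to pass the client and bucket in as arguments are my own for illustration (the original code builds the client with awsSign() and uses a module-level AWS_BUCKET); the only real point is that the shared dict is copied before every call.

```python
import copy

# Shared defaults; these must never be handed to boto3 directly
AWS_BUCKET_PARAMS = {'ACL': 'private'}


def aws_post_form(s3_client, bucket, file_path, expires_in=1800):
    """Generate a presigned POST using fresh copies of the shared params."""
    # Separate copies for fields and conditions, so anything the call adds
    # to either one cannot leak back into AWS_BUCKET_PARAMS
    fields = copy.deepcopy(AWS_BUCKET_PARAMS)
    conditions = [copy.deepcopy(AWS_BUCKET_PARAMS)]
    return s3_client.generate_presigned_post(
        bucket,
        file_path,
        Fields=fields,
        Conditions=conditions,
        ExpiresIn=expires_in,
    )
```

With this, something like aws_post_form(awsSign(), AWS_BUCKET, 'foo.bar') can be called repeatedly in the same session and AWS_BUCKET_PARAMS stays {'ACL': 'private'}.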

@SilvanaNobre brilliant!

Thanks for letting us know, your solution may help someone else in the same situation!


