
Waiter ObjectExists failed: Max attempts exceeded #299

Closed · anand086 opened this issue Jun 25, 2020 · 7 comments
Labels: enhancement (New feature or request), minor release (Will be addressed in the next minor release), question (Further information is requested)
Milestone: 1.7.0

@anand086

Hi,

Thank you for this package.

I am using awswrangler in Lambda to run a SQL query, get the result as a pandas DataFrame, write it to S3 as Parquet, and alter the table location. The Lambda function's execution result shows as failed, even though the location is updated in the Glue catalog. I have attached the Lambda script.

lambda_sql.py.zip

Error --

Function Logs:
START RequestId: b54c0c57-f5f4-4b5d-86d0-f9ba6d7a4c45 Version: $LATEST
[ERROR] WaiterError: Waiter ObjectExists failed: Max attempts exceeded
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 89, in lambda_handler
    update_location =  execute_sql(loc_update_sql, database, ctas=False)
  File "/var/task/lambda_function.py", line 18, in execute_sql
    return wr.athena.read_sql_query(sql,database=database, ctas_approach=ctas)
  File "/opt/python/awswrangler/athena.py", line 529, in read_sql_query
    return _resolve_query_without_cache(
  File "/opt/python/awswrangler/athena.py", line 650, in _resolve_query_without_cache
    s3.wait_objects_exist(paths=[path], use_threads=False, boto3_session=session)
  File "/opt/python/awswrangler/s3/_wait.py", line 100, in wait_objects_exist
    return _wait_objects(
  File "/opt/python/awswrangler/s3/_wait.py", line 33, in _wait_objects
    waiter.wait(Bucket=bucket, Key=key, WaiterConfig={"Delay": _delay, "MaxAttempts": max_attempts})
  File "/var/runtime/botocore/waiter.py", line 53, in wait
    Waiter.wait(self, **kwargs)
  File "/var/runtime/botocore/waiter.py", line 326, in wait
    raise WaiterError(
END RequestId: b54c0c57-f5f4-4b5d-86d0-f9ba6d7a4c45
REPORT RequestId: b54c0c57-f5f4-4b5d-86d0-f9ba6d7a4c45	Duration: 164794.78 ms	Billed Duration: 164800 ms	Memory Size: 512 MB	Max Memory Used: 205 MB	Init Duration: 1531.36 ms
anand086 added the question (Further information is requested) label Jun 25, 2020
igorborgest self-assigned this Jun 25, 2020
@igorborgest
Contributor

Hi @anand086!

Some questions:

  • Is this Lambda running inside a VPC? Or is there no VPC involved?
  • Are you running this query on some specific workgroup?
  • Could you please paste the related code snippet in this thread? (I'm not allowed to download your file)

Thanks

@anand086
Author

Hi @igorborgest

Is this Lambda running inside a VPC? Or is there no VPC involved?: No VPC is involved.
Are you running this query on some specific workgroup?: Yes, the primary workgroup.

I am trying to execute the "alter table set location" command using athena.read_sql_query, which doesn't seem to be the right way of modifying the location. I couldn't find any API for the Glue catalog, similar to awswrangler.catalog.add_parquet_partitions, to set the location.

import os
import logging
from datetime import datetime

import awswrangler as wr


logger = logging.getLogger(__name__)


def execute_sql(sql, database, ctas=False):

    return wr.athena.read_sql_query(sql, database=database, ctas_approach=ctas)

def get_table_location(database, table):

    # Strip the last path segment so a fresh dt=... suffix can be appended below.
    location = wr.catalog.get_table_location(database=database, table=table)
    if location.endswith("/"):
        location = location[:-1]

    return "/".join(location.split("/")[:-1])

def write_to_s3(df, path):

    return wr.s3.to_parquet(
        df=df,
        path=path,
        dataset=True,
        compression="snappy",
    )

def lambda_handler(event, context):

    database = os.environ["DATABASE"]
    table = os.environ["TABLE_NAME"]
    sql = os.environ["SQL"]

    logger.debug(sql)

    # get the table's current location
    table_location = get_table_location(database=database, table=table)

    # get the SQL result
    df = execute_sql(sql=sql, database=database, ctas=False)

    # define the S3 location
    S3_PARQUET = table_location + "/dt={}".format(datetime.utcnow().strftime("%Y-%m-%d-%H-%M"))
    logger.info("Final S3 Destination: %s", S3_PARQUET)

    # write the dataframe to S3
    write_df = write_to_s3(df=df, path=S3_PARQUET)
    logger.info("Parquet file created: %s", write_df["paths"])

    # update the table location
    loc_update_sql = "ALTER TABLE {}.{} SET LOCATION '{}'".format(database, table, S3_PARQUET)
    update_location = execute_sql(loc_update_sql, database, ctas=False)
    logger.info(update_location)

@igorborgest
Contributor

igorborgest commented Jun 25, 2020

The problem here is that wr.athena.read_sql_query() was designed to run your query and then READ the result as a Pandas DataFrame.

In your case you are running a query that has no output. For this situation I recommend this approach:

import awswrangler as wr

query_id = wr.athena.start_query_execution("ALTER TABLE ...", database="...")

# Optional, only if you want to wait until the query is done.
wr.athena.wait_query(query_id)

Ref: start_query_execution | wait_query
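
For the Lambda above, that would mean replacing the final execute_sql call with something like this (a sketch reusing the thread's loc_update_sql and database variables; wait_query blocks until Athena reports a terminal state for the query):

# Run the DDL without trying to read a result set back.
query_id = wr.athena.start_query_execution(loc_update_sql, database=database)

# Optional: block until the query finishes.
wr.athena.wait_query(query_id)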

I will tag this issue as an enhancement to improve this exception and also to create an "alter location" function in the catalog module.
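
In the meantime, the location can also be changed directly through the Glue API rather than via Athena DDL. A minimal sketch, not from this thread, using boto3's get_table/update_table (the exact set of read-only fields to strip from the get_table response may vary by boto3 version):

import boto3

def set_table_location(database, table, new_location):
    # Point a Glue table at a new S3 location without running Athena DDL.
    glue = boto3.client("glue")
    table_input = glue.get_table(DatabaseName=database, Name=table)["Table"]
    # get_table returns read-only fields that update_table rejects.
    for key in ("DatabaseName", "CreateTime", "UpdateTime", "CreatedBy",
                "IsRegisteredWithLakeFormation", "CatalogId"):
        table_input.pop(key, None)
    table_input["StorageDescriptor"]["Location"] = new_location
    glue.update_table(DatabaseName=database, TableInput=table_input)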

@anand086 Please, let me know how it goes

igorborgest added the enhancement (New feature or request) label Jun 25, 2020
igorborgest added this to the 1.7.0 milestone Jun 25, 2020
@anand086
Author

@igorborgest

Thank you, it works well. Appreciate your help.

Also, I just wanted to bring up a small documentation error on https://tinyurl.com/y83cv9bm. The input parameter is table, but the example uses name:

wr.catalog.get_table_location(database='default', name='my_table')
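
The corrected call, matching the actual parameter name, would be:

wr.catalog.get_table_location(database='default', table='my_table')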

@igorborgest
Contributor

Enhancement done. Release expected for version 1.7.0 next week.

@igorborgest
Contributor

@anand086, feel free to test our dev branch before the official release:

pip install git+https://github.com/awslabs/aws-data-wrangler.git@dev

igorborgest added the minor release (Will be addressed in the next minor release) and ready to release labels Jul 16, 2020
@igorborgest
Contributor

Released in 1.7.0!
