
Waiter ObjectExists failed: Max attempts exceeded #299

Closed · anand086 opened this issue Jun 25, 2020 · 7 comments
Labels: enhancement (New feature or request), minor release (Will be addressed in the next minor release), question (Further information is requested)
Milestone: 1.7.0

@anand086

Hi,

Thank you for this package.

I am using awswrangler in Lambda to run a SQL query, get the result as a pandas DataFrame, write it to S3 as Parquet, and alter the table location. The Lambda function's execution result shows as failed, even though the location is updated in the Glue catalog. I have attached the Lambda script.

lambda_sql.py.zip

Error --

Function Logs:
START RequestId: b54c0c57-f5f4-4b5d-86d0-f9ba6d7a4c45 Version: $LATEST
[ERROR] WaiterError: Waiter ObjectExists failed: Max attempts exceeded
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 89, in lambda_handler
    update_location =  execute_sql(loc_update_sql, database, ctas=False)
  File "/var/task/lambda_function.py", line 18, in execute_sql
    return wr.athena.read_sql_query(sql,database=database, ctas_approach=ctas)
  File "/opt/python/awswrangler/athena.py", line 529, in read_sql_query
    return _resolve_query_without_cache(
  File "/opt/python/awswrangler/athena.py", line 650, in _resolve_query_without_cache
    s3.wait_objects_exist(paths=[path], use_threads=False, boto3_session=session)
  File "/opt/python/awswrangler/s3/_wait.py", line 100, in wait_objects_exist
    return _wait_objects(
  File "/opt/python/awswrangler/s3/_wait.py", line 33, in _wait_objects
    waiter.wait(Bucket=bucket, Key=key, WaiterConfig={"Delay": _delay, "MaxAttempts": max_attempts})
  File "/var/runtime/botocore/waiter.py", line 53, in wait
    Waiter.wait(self, **kwargs)
  File "/var/runtime/botocore/waiter.py", line 326, in wait
    raise WaiterError(
END RequestId: b54c0c57-f5f4-4b5d-86d0-f9ba6d7a4c45
REPORT RequestId: b54c0c57-f5f4-4b5d-86d0-f9ba6d7a4c45	Duration: 164794.78 ms	Billed Duration: 164800 ms	Memory Size: 512 MB	Max Memory Used: 205 MB	Init Duration: 1531.36 ms
anand086 added the question (Further information is requested) label Jun 25, 2020
igorborgest self-assigned this Jun 25, 2020
@igorborgest
Contributor

Hi @anand086!

Some questions:

  • Is this Lambda running inside a VPC? Or is there no VPC involved?
  • Are you running this query on some specific workgroup?
  • Could you please paste the related code snippet in this thread? (I'm not allowed to download your file)

Thanks

@anand086
Author

Hi @igorborgest

Is this Lambda running inside a VPC? Or is there no VPC involved?: No VPC is involved.
Are you running this query on some specific workgroup?: Yes, the primary workgroup.

I am trying to execute the "alter table set location" command using athena.read_sql_query, which doesn't seem to be the right way of modifying the location. I couldn't find any API for the Glue catalog, similar to awswrangler.catalog.add_parquet_partitions, to set the location.

import os
import logging
from datetime import datetime

import awswrangler as wr


logger = logging.getLogger(__name__)


def execute_sql(sql, database, ctas=False):

    return wr.athena.read_sql_query(sql, database=database, ctas_approach=ctas)

def get_table_location(database, table):

    # Strip the last path segment so a fresh dt=... suffix can be appended below.
    location = wr.catalog.get_table_location(database=database, table=table)
    if location.endswith("/"):
        location = location[:-1]

    return "/".join(location.split("/")[:-1])

def write_to_s3(df, path):

    return wr.s3.to_parquet(
        df=df,
        path=path,
        dataset=True,
        compression="snappy",
    )

def lambda_handler(event, context):

    database = os.environ["DATABASE"]
    table = os.environ["TABLE_NAME"]
    sql = os.environ["SQL"]

    logger.debug(sql)

    # get the table's current location
    table_location = get_table_location(database=database, table=table)

    # get the SQL result
    df = execute_sql(sql=sql, database=database, ctas=False)

    # define the S3 location
    S3_PARQUET = table_location + "/dt={}".format(datetime.utcnow().strftime("%Y-%m-%d-%H-%M"))
    logger.info("Final S3 Destination: %s", S3_PARQUET)

    # write the dataframe to S3
    write_df = write_to_s3(df=df, path=S3_PARQUET)
    logger.info("Parquet file created: %s", write_df["paths"])

    # update the table location
    loc_update_sql = "ALTER TABLE {}.{} SET LOCATION '{}'".format(database, table, S3_PARQUET)
    update_location = execute_sql(loc_update_sql, database, ctas=False)
    logger.info(update_location)

@igorborgest
Contributor

igorborgest commented Jun 25, 2020

The problem here is that wr.athena.read_sql_query() was designed to run your query and then READ the result as a Pandas DataFrame.

In your case you are running a query that has no output. For this situation I recommend this approach:

import awswrangler as wr

query_id = wr.athena.start_query_execution("ALTER TABLE ...", database="...")

# Optional, only if you want to wait until the query is done.
wr.athena.wait_query(query_id)

Ref: start_query_execution | wait_query
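
For the Lambda above, that would mean replacing the final execute_sql call with something like this (a sketch reusing the thread's loc_update_sql and database variables; wait_query blocks until Athena reports a terminal state for the query):

# Run the DDL without trying to read a result set back.
query_id = wr.athena.start_query_execution(loc_update_sql, database=database)

# Optional: block until the query finishes.
wr.athena.wait_query(query_id)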

I will tag this issue as an enhancement to improve this exception and also to create an "alter location" function in the catalog module.
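
In the meantime, the location can also be changed directly through the Glue API rather than via Athena DDL. A minimal sketch, not from this thread, using boto3's get_table/update_table (the exact set of read-only fields to strip from the get_table response may vary by boto3 version):

import boto3

def set_table_location(database, table, new_location):
    # Point a Glue table at a new S3 location without running Athena DDL.
    glue = boto3.client("glue")
    table_input = glue.get_table(DatabaseName=database, Name=table)["Table"]
    # get_table returns read-only fields that update_table rejects.
    for key in ("DatabaseName", "CreateTime", "UpdateTime", "CreatedBy",
                "IsRegisteredWithLakeFormation", "CatalogId"):
        table_input.pop(key, None)
    table_input["StorageDescriptor"]["Location"] = new_location
    glue.update_table(DatabaseName=database, TableInput=table_input)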

@anand086 Please, let me know how it goes

igorborgest added the enhancement (New feature or request) label Jun 25, 2020
igorborgest added this to the 1.7.0 milestone Jun 25, 2020
@anand086
Author

@igorborgest

Thank you, it works well. Appreciate your help.

Also, I just wanted to bring up a small documentation error on https://tinyurl.com/y83cv9bm. The input parameter is table, but the example uses name:

wr.catalog.get_table_location(database='default', name='my_table')
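
The corrected call, matching the actual parameter name, would be:

wr.catalog.get_table_location(database='default', table='my_table')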

@igorborgest
Contributor

Enhancement done. Release expected for version 1.7.0 next week.

@igorborgest
Contributor

@anand086, feel free to test our dev branch before the official release:

pip install git+https://github.com/awslabs/aws-data-wrangler.git@dev

igorborgest added the minor release (Will be addressed in the next minor release) and ready to release labels Jul 16, 2020
@igorborgest
Contributor

Released in 1.7.0!
