[FIX]: MGPropertyGraph renumber_vertices_by_type fails when type is unspecified ('') #3058

alexbarghi-nv · 2022-12-07T16:53:58Z

Version

23.02

Which installation method(s) does this occur on?

Docker, Conda, Pip, Source

Describe the bug.

I called add_vertex_data without specifying a type name, then called renumber_vertices_by_type. The second call raised an IndexError:

IndexError: string index out of range
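
For context, the failing expression in the quantile_divisions frame of the log output can be exercised without a GPU. The sketch below is plain Python, not dask_cudf code: bump_last_division mirrors the chr(ord(...[0]) + 1) expression from the traceback, and bump_last_division_guarded is a hypothetical guard added only for illustration. It is not necessarily what rapidsai/cudf#12988 implements.

```python
def bump_last_division(last: str) -> str:
    """Mirror the traceback expression: chr(ord(last[0]) + 1).

    Indexing last[0] raises IndexError when last is the empty string,
    which is the default type name ('') in this report.
    """
    return chr(ord(last[0]) + 1)


def bump_last_division_guarded(last: str) -> str:
    """Hypothetical guard (an assumption, not the merged fix).

    For an empty string, return the smallest non-empty string so the
    result still sorts strictly after ''.
    """
    if not last:
        return chr(0)
    return chr(ord(last[0]) + 1)


if __name__ == "__main__":
    print(bump_last_division("a"))
    try:
        bump_last_division("")
    except IndexError as exc:
        print(f"IndexError: {exc}")
    print(repr(bump_last_division_guarded("")))
```

Any value that compares greater than the empty string would serve as the upper division here; chr(0) is just the smallest such choice.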

Minimum reproducible example

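    import cudf
    import cupy
    import dask_cudf
    from cugraph.experimental import MGPropertyGraph

    # Assumes a Dask CUDA cluster and cuGraph comms are already initialized.
    pG = MGPropertyGraph()

    # Edges are added without a type name, so they get the default type ('').
    pG.add_edge_data(
        dask_cudf.from_cudf(
            cudf.DataFrame(
                {
                    "src": cupy.array([0, 0, 1, 2, 2, 3], dtype="int32"),
                    "dst": cupy.array([1, 2, 4, 3, 4, 1], dtype="int32"),
                }
            ),
            npartitions=2,
        ),
        vertex_col_names=["src", "dst"],
    )

    # Vertices are also added without a type name.
    pG.add_vertex_data(
        dask_cudf.from_cudf(
            cudf.DataFrame(
                {
                    "prop1": [100, 200, 300, 400, 500],
                    "prop2": [5, 4, 3, 2, 1],
                    "id": cupy.array([0, 1, 2, 3, 4], dtype="int32"),
                }
            ),
            npartitions=2,
        ),
        vertex_col_name="id",
    )

    # Raises IndexError: string index out of range
    pG.renumber_vertices_by_type()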



### Relevant log output

```shell
/opt/conda/envs/rapids/lib/python3.9/site-packages/cugraph/dask/structure/mg_property_graph.py:1313: in renumber_vertices_by_type
    df = df.reset_index().sort_values(by=TCN)
/opt/conda/envs/rapids/lib/python3.9/contextlib.py:79: in inner
    return func(*args, **kwds)
/opt/conda/envs/rapids/lib/python3.9/site-packages/dask_cudf/core.py:225: in sort_values
    df = sorting.sort_values(
/opt/conda/envs/rapids/lib/python3.9/contextlib.py:79: in inner
    return func(*args, **kwds)
/opt/conda/envs/rapids/lib/python3.9/site-packages/dask_cudf/sorting.py:272: in sort_values
    divisions = quantile_divisions(df, by, npartitions)
/opt/conda/envs/rapids/lib/python3.9/contextlib.py:79: in inner
    return func(*args, **kwds)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

df = <dask_cudf.DataFrame | 22 tasks | 2 npartitions>, by = ['_TYPE_'], npartitions = 2

    @_dask_cudf_nvtx_annotate
    def quantile_divisions(df, by, npartitions):
        qn = np.linspace(0.0, 1.0, npartitions + 1).tolist()
        divisions = _approximate_quantile(df[by], qn).compute()
        columns = divisions.columns
    
        # TODO: Make sure divisions are correct for all dtypes..
        if (
            len(columns) == 1
            and df[columns[0]].dtype != "object"
            and not is_categorical_dtype(df[columns[0]].dtype)
        ):
            dtype = df[columns[0]].dtype
            divisions = divisions[columns[0]].astype("int64")
            divisions.iloc[-1] += 1
            divisions = sorted(
                divisions.drop_duplicates().astype(dtype).to_arrow().tolist(),
                key=lambda x: (x is None, x),
            )
        else:
            for col in columns:
                dtype = df[col].dtype
                if dtype != "object":
                    divisions[col] = divisions[col].astype("int64")
                    divisions[col].iloc[-1] += 1
                    divisions[col] = divisions[col].astype(dtype)
                else:
                    divisions[col].iloc[-1] = chr(
>                       ord(divisions[col].iloc[-1][0]) + 1
                    )
E                   IndexError: string index out of range

/opt/conda/envs/rapids/lib/python3.9/site-packages/dask_cudf/sorting.py:222: IndexError
```

### Environment details

```shell
Standard environment
```

Other/Misc.

n/a

Code of Conduct

I agree to follow cuGraph's Code of Conduct
I have searched the open bugs and have found no duplicates for this bug report

Fixes rapidsai#3058

See test for simple MRE. This fixes rapidsai/cugraph#3058

Authors:
- Erik Welch (https://github.com/eriknw)

Approvers:
- Lawrence Mitchell (https://github.com/wence-)

URL: #12988

Fixes #3058

Authors:
- Erik Welch (https://github.com/eriknw)
- Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
- Alex Barghi (https://github.com/alexbarghi-nv)

URL: #3352

alexbarghi-nv added the bug and ? - Needs Triage labels Dec 7, 2022

alexbarghi-nv assigned eriknw and rlratzel Dec 7, 2022

alexbarghi-nv added the Fix label and removed the bug and ? - Needs Triage labels Dec 7, 2022

alexbarghi-nv added this to the 23.02 milestone Dec 7, 2022

rlratzel removed their assignment Jan 5, 2023

BradReesWork modified the milestones: 23.02, 23.04 Jan 23, 2023

kingmesal mentioned this issue Feb 8, 2023

Improve PropertyGraph scalability #3254

Closed

kingmesal added the bug label and removed the Fix label Feb 9, 2023

eriknw mentioned this issue Mar 21, 2023

Fix sort_values when column is all empty strings rapidsai/cudf#12988

Merged

eriknw added a commit to eriknw/cugraph that referenced this issue Mar 22, 2023

Fix PropertyGraph.renumber_*_by_type with only default types

2de2644

Fixes rapidsai#3058

eriknw mentioned this issue Mar 22, 2023

Fix PropertyGraph.renumber_*_by_type with only default types #3352

Merged

rapids-bot bot closed this as completed in rapidsai/cudf#12988 Mar 22, 2023

rlratzel mentioned this issue May 24, 2024

Initial non-experimental PropertyGraph Release rapidsai/cugraph-pg#12

Open
