Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suffix artifactId with amd64/arm64 for the dist jars [skip ci] #7070

Merged
merged 7 commits into from
Nov 21, 2022

Conversation

NvTimLiu
Copy link
Collaborator

@NvTimLiu NvTimLiu commented Nov 15, 2022

To fix #6881

Build on x86_64/amd64 hosts to aggregate rapids-4-spark dist jar with -Dcpu_arch=amd64 Suffix 'amd64' for dist artifact, e.g., rapids-4-spark-amd64_2.12

Build on arm64/aarch64 hosts to aggregate rapids-4-spark dist jar with -Dcpu_arch=arm64 Suffix 'arm64' for dist artifact, e.g., rapids-4-spark-arm64_2.12

Signed-off-by: Tim Liu timl@nvidia.com

NOTE:

This PR should be merged along w/ many internal CICD updates, otherwise it may break regular CICD pipelines.
So please allow devops guys to merge this one at the appropriate time, thanks!

Build on x86_64/amd64 hosts to aggregate rapids-4-spark dist jar with -Dcpu_arch=amd64
Suffix 'amd64' for dist artifact, e.g., rapids-4-spark-amd64_2.12

Build on arm64/aarch64 hosts to aggregate rapids-4-spark dist jar with -Dcpu_arch=arm64
Suffix 'arm64' for dist artifact, e.g., rapids-4-spark-arm64_2.12

Signed-off-by: Tim Liu <timl@nvidia.com>
@NvTimLiu NvTimLiu added the build Related to CI / CD or cleanly building label Nov 15, 2022
@NvTimLiu NvTimLiu self-assigned this Nov 15, 2022
@NvTimLiu
Copy link
Collaborator Author

build

@NvTimLiu
Copy link
Collaborator Author

The solution is still under discussing. Will update scala/doc files when the final solution is made

build/buildall Outdated Show resolved Hide resolved
build/buildall Outdated Show resolved Hide resolved
pom.xml Show resolved Hide resolved
@pxLi pxLi marked this pull request as draft November 16, 2022 01:37
@pxLi pxLi marked this pull request as ready for review November 16, 2022 03:07
@pxLi
Copy link
Collaborator

pxLi commented Nov 16, 2022

If we would like to have UCX on arm, the build must support to correctly switch arch-based jucx dependency

also cc @abellina for help

Signed-off-by: Tim Liu <timl@nvidia.com>
@NvTimLiu
Copy link
Collaborator Author

build

@@ -24,7 +24,7 @@
<artifactId>rapids-4-spark-parent</artifactId>
<version>22.12.0-SNAPSHOT</version>
</parent>
<artifactId>rapids-4-spark_2.12</artifactId>
<artifactId>rapids-4-spark-${cpu_arch}_2.12</artifactId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on an earlier discussion, I thought it was not allowed to have a variable in the artifactId? Seems like we want to split dist into dist-amd64 and dist-arm64, and maybe they rely on a dist-common that builds an intermediate jar using the CPU architecture in the classifier, similar to how we use the Spark version in the classifier of our sql-plugin intermediate jars.

Copy link
Collaborator

@pxLi pxLi Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was not allowed to have a variable in the artifactId?

maven does not allow to put a var in project root module (rapids-4-spark-parent in our case),
but we could add it in submodule artifactID.

do you think directly use artifactID+{cpu.arch} might be easier than parallel world dist-common+classifier{cpu.arch}?

build/buildall Outdated
aarch64|arm64)
cpu_arch='arm64';;
*)
echo "Non-support cpu architecture: ${arch}"; exit 1;;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, applies to all the places where it was copied. Not must-fix.

Suggested change
echo "Non-support cpu architecture: ${arch}"; exit 1;;
echo "Unsupported CPU architecture: ${arch}"; exit 1;;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Signed-off-by: Tim Liu <timl@nvidia.com>
Signed-off-by: Tim Liu <timl@nvidia.com>
@@ -166,7 +166,11 @@ pipeline {
steps {
script {
// Retrieve PROJECT_VER from version-def.sh, e.g, '<major>.<minor>.<patch>-SNAPSHOT'
PROJECT_VER = sh(returnStdout: true, script: "bash $JENKINS_ROOT/version-def.sh | cut -d ',' -f 3 | cut -d ' ' -f 3")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to retrieve PROJECT_VER

The previous way of parsing the bash output by | cut -d ',' -f 3 ... from version-def.sh can not work well, if we change echo string in version-def.sh

The previous output from version-def is:
CUDF_VER: 22.12.0-SNAPSHOT, CUDA_CLASSIFIER: cuda11, PROJECT_VER: 22.12.0-SNAPSHOT, SPARK_VER: 3.1.1 ...

But | cut -d ',' -f 3 ... does not work if we :

add echo $arcg ahead,
--> bash output : amd64 CUDF_VER: 22.12.0-SNAPSHOT, CUDA_CLASSIFIER: cuda11, PROJECT_VER: 22.12.0-SNAPSHOT, SPARK_VER: 3.1.1 ..
--> PROJECT_VER=amd64 22.12.0-SNAPSHOT

or if remove CUDF_VER: 22.12.0-SNAPSHOT, from previous output:
--> bash output : CUDA_CLASSIFIER: cuda11, PROJECT_VER: 22.12.0-SNAPSHOT, SPARK_VER: 3.1.1 ..
--> PROJECT_VER=3.1.1

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please revert this file change too

Copy link
Collaborator Author

@NvTimLiu NvTimLiu Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, will update jenkins/Jenkinsfile-blossom.premerge in follow up issue PR changing the | cut -d ',' -f 3 ...

@NvTimLiu
Copy link
Collaborator Author

build

@NvTimLiu
Copy link
Collaborator Author

build

@pxLi
Copy link
Collaborator

pxLi commented Nov 17, 2022

NOTE:

This PR should be merged along w/ many internal CICD updates, otherwise it may break regular CICD pipelines.

So please allow devops guys to merge this one at the appropriate time, thanks!

@NvTimLiu
Copy link
Collaborator Author

NvTimLiu commented Nov 17, 2022

CI IT against the new artifact name rapids-4-spark-amd64_2.12-22.12.0-SNAPSHOT-cuda11.jar, though it failed on some tests

[2022-11-17T06:02:03.861Z] [INFO] Building jar: /home/jenkins/agent/workspace/jenkins-rapids_premerge-github-6095-ci-2/dist/target/rapids-4-spark-amd64_2.12-22.12.0-SNAPSHOT-cuda11.jar
...
[2022-11-17T06:05:32.164Z] AND PLUGIN JARS: /home/jenkins/agent/workspace/jenkins-rapids_premerge-github-6095-ci-2/dist/target/rapids-4-spark-amd64_2.12-22.12.0-SNAPSHOT-cuda11.jar:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-6095-ci-2/integration_tests/target/rapids-4-spark-integration-tests_2.12-22.12.0-SNAPSHOT-spark311.jar:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-6095-ci-2/integration_tests/target/dependency/parquet-hadoop-1.10.1-tests.jar

...

[2022-11-17T06:05:50.671Z] 22/11/17 06:05:49 WARN Executor: Issue communicating with driver in heartbeater
[2022-11-17T06:05:50.672Z] org.apache.spark.SparkException: Exception thrown in awaitResult: 
...
[2022-11-17T06:05:50.672Z] Caused by: java.lang.NullPointerException
[2022-11-17T06:05:50.672Z] 	at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:524)
[2022-11-17T06:05:50.672Z] 	at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:116)
[2022-11-17T06:05:50.672Z] 	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
[2022-11-17T06:05:50.672Z] 	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)

...
[2022-11-17T06:05:50.672Z] Caused by: java.lang.NullPointerException
[2022-11-17T06:05:50.672Z] 	at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:524)
[2022-11-17T06:05:50.672Z] 	at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:116)
[2022-11-17T06:05:50.672Z] 	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
[2022-11-17T06:05:50.672Z] 	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)

@NvTimLiu
Copy link
Collaborator Author

build

@pxLi
Copy link
Collaborator

pxLi commented Nov 18, 2022

build

build/buildall Outdated
aarch64|arm64)
cpu_arch='arm64';;
*)
echo "Unsupport CPU architecture: ${arch}"; exit 1;;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsupported

I think Jason already commented

@pxLi
Copy link
Collaborator

pxLi commented Nov 18, 2022

Convert to draft to avoid accidental merge, thx

@pxLi pxLi marked this pull request as draft November 18, 2022 10:05
@pxLi
Copy link
Collaborator

pxLi commented Nov 21, 2022

also filed #7116 to tracking potential jucx update

Signed-off-by: Tim Liu <timl@nvidia.com>
@pxLi pxLi changed the title Suffix artifactId with amd64/arm64 for the dist jars Suffix artifactId with amd64/arm64 for the dist jars [skip ci] Nov 21, 2022
@pxLi
Copy link
Collaborator

pxLi commented Nov 21, 2022

skip ci for recent typo fix

@NvTimLiu
Copy link
Collaborator Author

build

1 similar comment
@pxLi
Copy link
Collaborator

pxLi commented Nov 21, 2022

build

Copy link
Collaborator

@pxLi pxLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this and focus on corresponding CICD update.

If any issue, please help file new follow-ups, thanks!

@pxLi pxLi marked this pull request as ready for review November 21, 2022 01:20
@pxLi pxLi merged commit c7f87f4 into NVIDIA:branch-22.12 Nov 21, 2022
NvTimLiu added a commit to NvTimLiu/spark-rapids that referenced this pull request Nov 22, 2022
GaryShen2008 pushed a commit that referenced this pull request Nov 22, 2022
…]] (#7070)(#7120)" (#7135)

* Revert "Suffix artifactId with amd64/arm64 for the dist jars [skip ci] (#7070)"

This reverts commit c7f87f4.

* Revert "Build noSnapshots without cdh shims on arm CPU [skip ci] (#7120)"

This reverts commit b7a79d3.

Signed-off-by: Tim Liu <timl@nvidia.com>

Signed-off-by: Tim Liu <timl@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Related to CI / CD or cleanly building
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support RAPIDS Spark plugin on ARM
4 participants