
[conda] Unable to make a conda build #113

Closed
fg-mindee opened this issue Mar 7, 2021 · 56 comments · Fixed by #1414
Labels: topic: build (Related to dependencies and build), type: bug (Something isn't working)

@fg-mindee
Contributor

Unfortunately, one of the project's dependencies has no conda release and no way to make one. I opened an issue on their repo (pymupdf/PyMuPDF#938) to track this, but so far I haven't found a way to release the project on Anaconda with this dependency.

@fg-mindee fg-mindee added type: bug Something isn't working topic: build Related to dependencies and build labels Mar 7, 2021
@fg-mindee fg-mindee added this to the 1.0.0 milestone Mar 7, 2021
@fg-mindee fg-mindee self-assigned this Mar 7, 2021
@charlesmindee
Collaborator

Is it mandatory to support conda? If so, maybe we can switch to another pdf-reader lib.

@fg-mindee
Contributor Author

Not mandatory, but this is a very common means of installation for Python packages. We might investigate other options to replace the dependency, but we'll have to check for a performance drop first.

@fg-mindee
Contributor Author

For reference, my initial issue on PyMuPDF (pymupdf/PyMuPDF#938) was moved to this discussion: pymupdf/PyMuPDF#1137

@kchawla-pi

Why not simply do a

conda run pip install <missing in conda package>

Especially since there is only one package.
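For illustration, the suggestion above can be scripted. The sketch below only builds the argument list for `conda run -n <env> python -m pip install ...`; the helper name and the environment name `doctr-env` are hypothetical. Going through `python -m pip` ensures that the pip belonging to that environment's interpreter is the one that runs. Keep in mind that conda does not track the dependencies of packages installed this way, which is the kind of environment-compatibility concern raised later in the thread.

```python
import shlex
from typing import List, Sequence


def conda_run_pip_install(env_name: str, packages: Sequence[str]) -> List[str]:
    """Build argv that runs pip inside an existing conda environment (hypothetical helper)."""
    if not packages:
        raise ValueError("at least one package name is required")
    # `conda run -n <env>` executes the command inside that environment;
    # `python -m pip` picks the pip that belongs to the environment's interpreter.
    return ["conda", "run", "-n", env_name, "python", "-m", "pip", "install", *packages]


argv = conda_run_pip_install("doctr-env", ["pypdfium2"])
print(shlex.join(argv))  # conda run -n doctr-env python -m pip install pypdfium2
# To actually run it: subprocess.run(argv, check=True)
```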

@fg-mindee
Contributor Author

Hi @kchawla-pi,

So, since then, weasyprint has also turned out to be missing a conda build. As it happens, I was thinking about getting to the bottom of this yesterday. Worst-case scenario, we'll make some features optional (such as HTML compatibility through weasyprint) so that the core build is available on conda.

Also, please note that for now, the only important dependencies that would benefit from conda support (performance-wise) are PyTorch & TensorFlow 👍

Anyway, we'll provide some updates on this very topic soon!

@charlesmindee
Collaborator

So now with #829 we are just missing weasyprint, right @fg-mindee ?

@fg-mindee
Contributor Author

So now with #829 we are just missing weasyprint, right @fg-mindee ?

Nope, pypdfium2 also lacks a conda package. But that could be fixed, I'll ping them about this!
However, making doctr.io.pdf and doctr.io.html extras would do the trick 👍

And I think we should seriously consider that: especially for HTML, it's mostly people in need of training data who use it, so I would argue that most users don't benefit from weasyprint (which is also a problem for Mac users, see #815)

For PDFs, it's more important, so if we can get a conda build, our best course of action would probably be to move html/weasyprint to an extra! What do you think?
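Moving HTML/PDF support to an extra usually means the core package imports the optional dependency lazily and fails with a helpful message when it is absent. Below is a minimal sketch of that pattern. The helper names and the extra name `html` are assumptions for illustration, not docTR's actual API; only the distribution name `python-doctr` matches what is published on PyPI.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or explain which extra provides it (hypothetical helper)."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature; "
            f'install it with: pip install "python-doctr[{extra}]"'
        ) from exc


def render_html_to_pdf(url: str) -> bytes:
    """Hypothetical HTML entry point: weasyprint is imported only when this is called."""
    weasyprint = require_optional("weasyprint", extra="html")
    # With no target argument, write_pdf() returns the rendered PDF as bytes.
    return weasyprint.HTML(url=url).write_pdf()
```

With this layout, `import`ing the core package never touches weasyprint, so a conda build of the core would not need it at all.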

@fg-mindee
Contributor Author

I just checked and weasyprint does have a conda build now 🙌
https://anaconda.org/conda-forge/weasyprint

(But I still think we should move it to extra builds)

@mara004
Contributor

mara004 commented May 18, 2022

Sorry about the conda build - I never used conda myself and currently don't have the time/interest to learn it. Due to platform-specific binaries, the setup infrastructure of pypdfium2 is fairly complex already.
Perhaps a developer who is more familiar with conda can look into this at some point. I'd be happy to take a Pull Request that adds conda packaging to the release workflow.
That said, is there any reason you can't use pip?

@felixdittrich92
Contributor

@frgfm

@frgfm
Collaborator

frgfm commented May 22, 2022

@mara004 what do you mean by "any reason you can't use pip?"

pip installation is already available 👍
but conda builds are more specific to a given environment, so it's good if we can offer that means of installation as well. For the conda recipe, I don't know whether there are options to use pip (though I don't have experience with conda recipes that build the C or C++ extensions of a Python library)

@mara004
Contributor

mara004 commented May 22, 2022

what do you mean by "any reason you can't use pip?"

I'm not familiar with the conda environment, so perhaps that was a silly question to ask.
I basically meant: For what reason do we need an extra package on conda if the PyPI release can be used?
As @kchawla-pi wrote:

Why not simply do a

conda run pip install <missing in conda package>

but conda builds are more specific to a given environment

I'd be curious to know in what way exactly conda builds are more specific?

I have read the comparison of conda to pip on Wikipedia, but the problem described there can be solved with venv. pip allows dependency breakage, but very clearly warns about it, so I don't really see an issue in this regard...

@kchawla-pi

kchawla-pi commented May 22, 2022

Well, pip does not do sophisticated dependency resolution, unlike Conda. It's the same reason pipenv and poetry are used for package installations, but unlike Conda, they use PyPI's index. Each of these has its own algorithm for dependency resolution, with Pipenv being rather slow.

Conda is the de facto tool for data scientists in the Python ecosystem. Being able to use Mindee packages seamlessly with Conda would remove a big paper cut.

@mara004
Contributor

mara004 commented May 23, 2022

Okay, thanks for pointing this out!
To me personally, conda still seems like a reinvented wheel and duplicated packaging work, but if there are people who like and use it, I'm open to adding support if someone can implement it properly.

@frgfm
Collaborator

frgfm commented May 25, 2022

I can definitely second @kchawla-pi on that: I always try to find a conda installation before using pip, because it's much more careful about your existing env compatibility 👍

@mara004
Contributor

mara004 commented May 25, 2022

I tried to craft a package with conda-build recently but I'm afraid it didn't go very well at all. I managed to build a package for my host platform (Linux x86_64) but it took unendurably long for conda-build to set up the environment and assemble the package (and while doing so, the directory where I installed miniconda grew well above 3 GiB 🙄). I hope there are ways to speed up the process of running conda-build...

@kchawla-pi

Wow, that must be so frustrating. I don't know about Conda packaging, but now I'm pissed at conda for making your job so difficult. I will try to take a gander at it in June.

@mara004
Contributor

mara004 commented May 25, 2022

Well, I don't know, perhaps I was just doing it the wrong way, but all the same it hasn't been very obvious to me how to do it.

@frgfm
Collaborator

frgfm commented May 25, 2022

In my experience, conda build is always a long operation. Base conda is known for slow dependency resolution, so I personally use mamba (https://github.com/mamba-org/mamba), which is blazing fast for dependency installation (multi-threaded, rewritten in C++). I have to check whether that extends to package building as well.

@mara004
Contributor

mara004 commented May 26, 2022

I think the main problem is that, when running conda-build, it creates an isolated environment where all dependencies are installed. Now, if we want to craft more than one package, it would be essential that the environment can be reused so that dependencies don't need to be installed each time. Is there any option to do this?

@felixdittrich92
Contributor

felixdittrich92 commented Sep 2, 2022

@frgfm do you know an answer ? 😅

@mara004
Contributor

mara004 commented Sep 2, 2022

Even if we can get around the duration problem, I'll still need information about conda platform tags. We need an equivalent for each of the tags shown on https://pypi.org/project/pypdfium2/#files (section "Built Distributions").
Alternatively, perhaps a conda package could just wrap pip install somehow?
The easiest case would be if there were some tool to automatically convert wheels to conda packages, but I doubt this exists.
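To make the platform-tag question concrete: conda organizes packages by "subdir" (for example `linux-64`, `osx-arm64`, `win-64`) rather than by wheel platform tags. The sketch below maps common wheel platform tags onto those subdirs. The table only covers the architectures listed and is an assumption, not an official correspondence; tags without a single conda counterpart (musllinux, macOS `universal2`) map to `None`.

```python
import re
from typing import Optional

# Wheel platform-tag patterns; group 1 captures the CPU architecture.
_PATTERNS = [
    (re.compile(r"manylinux(?:1|2010|2014|_\d+_\d+)_(\w+)"), "linux"),
    (re.compile(r"macosx_\d+_\d+_(\w+)"), "osx"),
    (re.compile(r"win_(\w+)"), "win"),
]

# Assumed architecture -> conda subdir suffix, per operating system.
_ARCH = {
    "linux": {"x86_64": "64", "i686": "32", "aarch64": "aarch64",
              "ppc64le": "ppc64le", "s390x": "s390x", "armv7l": "armv7l"},
    "osx": {"x86_64": "64", "arm64": "arm64"},
    "win": {"amd64": "64", "arm64": "arm64"},
}


def wheel_tag_to_conda_subdir(platform_tag: str) -> Optional[str]:
    """Map e.g. 'manylinux_2_17_x86_64' to 'linux-64'; None if there is no counterpart."""
    tag = platform_tag.lower()
    if tag == "win32":  # 32-bit Windows has no architecture suffix in wheel tags
        return "win-32"
    for pattern, os_name in _PATTERNS:
        match = pattern.fullmatch(tag)
        if match:
            suffix = _ARCH[os_name].get(match.group(1))
            return f"{os_name}-{suffix}" if suffix else None
    return None  # e.g. musllinux_*: conda packages target glibc-based Linux


for tag in ["manylinux_2_17_x86_64", "musllinux_1_1_x86_64", "macosx_11_0_arm64", "win_amd64"]:
    print(tag, "->", wheel_tag_to_conda_subdir(tag))
```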

@mara004
Copy link
Contributor

mara004 commented Sep 2, 2022

@mara004
Contributor

mara004 commented Aug 19, 2023

Since it looks like the packages generated from pypdfium2-feedstock will not be made public (cf. AnacondaRecipes/pypdfium2-feedstock#1 (comment)), I will make a second attempt at building official conda packages for pypdfium2 in a conda branch, trying to accept or work around the python version problem (it remains to be decided how).

pypdfium2-feedstock currently requires manual interaction and native hosts [1]. I want to design this differently so we can build automatically in a workflow and without native hosts.

Footnotes

  1. This is only possible due to anaconda's extended CI capabilities, and might still end up not supporting some platforms we technically have cross-compiled binaries for.

@mara004
Contributor

mara004 commented Aug 20, 2023

Ok, so I think I have the local packaging part ready. It's really inelegant, but all I could do given conda's limitations.

Now the remaining parts we need are

  • CI integration
    • build in parallel for the multiple python versions (I'd suggest 3.8 through 3.11)
    • upload, supposedly to anaconda? (help wanted)
  • People who can test the built packages

Here's an archive of builds for python 3.11 which I generated locally: pypdfium2_conda_py311.zip
Also attaching a patch snapshot of the branch: pypdfium2_conda.patch.txt

Note that the packages will contain wrong __pycache__ files because they are not built natively, but I hope python will just regenerate them locally. (conda really should not bundle pycache in the first place...)

(PS: @kchawla-pi, now you can take a look at the code if you like ;) )

@mara004
Contributor

mara004 commented Aug 20, 2023

Oh, and I just discovered conda-build's --variants option: we can pass {python: [3.8, 3.9, 3.10, 3.11]} to build for multiple Python versions. That doesn't really fix the problem (we still end up with separate packages, although that isn't logically necessary), but it makes it easier to accommodate, and hints that there may be at least some upstream recognition of the problem.
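As a simplified model of why variants multiply the number of packages: each variant key contributes a list of values, and one build is produced per combination (their cartesian product). The sketch ignores conda-build features such as `zip_keys`, and `target_platform` is used only to illustrate a second axis. Versions are kept as strings on purpose, since an unquoted `3.10` in YAML is parsed as the float `3.1`.

```python
from itertools import product
from typing import Dict, List


def expand_variants(variants: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Expand a variant mapping into one dict per build (simplified model, no zip_keys)."""
    keys = sorted(variants)
    return [dict(zip(keys, combo)) for combo in product(*(variants[k] for k in keys))]


builds = expand_variants({
    "python": ["3.8", "3.9", "3.10", "3.11"],  # strings, not floats
    "target_platform": ["linux-64", "osx-arm64"],
})
print(len(builds))  # 8
for build in builds:
    print(build)
```

With four Python versions, an otherwise identical binary is built and stored four times per platform, which would help explain the build-time and size difference mentioned in the thread.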

@felixT2K
Contributor

felixT2K commented Aug 21, 2023

Hi @mara004
Thanks for the updates 👍

Uploading doesn't seem to be that complicated: https://levelup.gitconnected.com/publishing-your-python-package-on-conda-and-conda-forge-309a405740cf (manual upload)

And with CI (for example, from @frgfm's Holocron lib) 😅
https://github.com/frgfm/Holocron/blob/f78c6c58c0007e3d892fcaa1f1ff786cdbb5195f/.github/workflows/release.yml#L58
https://github.com/frgfm/Holocron/tree/main/.conda

Maybe @frgfm can help a bit more :)

@mara004
Contributor

mara004 commented Aug 21, 2023

Thanks, sorry for spamming this thread.

The performance difference is heavy, though. Building all wheels takes ~20 s on my device. Contrast this with conda builds, which take over 15 min [1].
(Also, the conda packages are 100 MiB compared to 30 MiB for the wheels, because of the Python version splitting.)

I've got a feeling I'm missing something here, but if that were true it's not obvious how to do it properly.
Please tell me if anyone knows how to speed this up (apart from CI parallelization), or else how to improve it (like disabling pycache compilation).

Footnotes

  1. Actually two fewer platforms, because conda does not support musllinux.

@mara004
Contributor

mara004 commented Aug 21, 2023

Throwback, @boldorider4 just gave me an eye opener that pdfium should be packaged separately in conda so pypdfium2 can just depend on it and cleanly be noarch. I'm still thinking about this but believe it may finally be the clean solution I was looking for.

Ideally, the conda packaging would be done in pdfium-binaries (this will still need some conda convert for the cross-compiled archs, but is much easier). Then what we need in pypdfium2 is to instruct the library loader with the right path, and of course a noarch conda recipe.
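A rough sketch of what "instructing the library loader with the right path" could look like, assuming pdfium comes from a separate conda package that installs the shared library into the environment prefix. It relies on the conda conventions that `CONDA_PREFIX` points at the active environment and that Windows DLLs live under `Library\bin`. The file names (`libpdfium.so`, `libpdfium.dylib`, `pdfium.dll`) and all helper names are assumptions; this is not pypdfium2's actual loader.

```python
import ctypes
import os
import sys
from pathlib import Path
from typing import List, Optional


def pdfium_candidates(prefix: str, platform: str = sys.platform) -> List[Path]:
    """Plausible pdfium locations inside a conda prefix (assumed file names)."""
    root = Path(prefix)
    if platform.startswith("win"):
        return [root / "Library" / "bin" / "pdfium.dll"]
    if platform == "darwin":
        return [root / "lib" / "libpdfium.dylib"]
    return [root / "lib" / "libpdfium.so"]


def find_pdfium(prefix: Optional[str] = None, platform: str = sys.platform) -> Optional[Path]:
    """Return the first candidate that exists, or None if nothing is found."""
    prefix = prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        return None
    for candidate in pdfium_candidates(prefix, platform):
        if candidate.is_file():
            return candidate
    return None


def load_pdfium() -> ctypes.CDLL:
    """Load pdfium from the active conda environment (hypothetical helper)."""
    path = find_pdfium()
    if path is None:
        raise OSError("pdfium shared library not found in the active conda environment")
    return ctypes.CDLL(str(path))
```

Because the compiled binary lives in its own package, the Python side would contain no compiled code and could be built once as `noarch: python`, which is the appeal of this split.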

This should really have come to my mind earlier. I especially should have realized it after a recent discussion with @KOLANICH about pdfbox; I just failed to make the connection.

Phew, I need a break before revisiting this 😅

@felixT2K
Contributor

@mara004 fyi there is also a draft for conda-forge channel:

conda-forge/staged-recipes#23726

@mara004
Contributor

mara004 commented Aug 22, 2023

@mara004 fyi there is also a draft for conda-forge channel:
conda-forge/staged-recipes#23726

Thanks for the pointer, see my comment conda-forge/staged-recipes#23726 (comment).

@felixT2K
Contributor

@mara004 I wanted to ask if there are any updates on your side? :)

@mara004
Contributor

mara004 commented Oct 12, 2023

@mara004 I wanted to ask if there are any updates on your side? :)

I've got it on my mind and have been working on some integration prerequisites to get this done nicely - packaging with an external library differs quite a bit from bundling. I can elaborate on the individual tasks if necessary.
The thing is, my personal situation is rather difficult, but provided it doesn't get worse I should hopefully be able to finish this well before year's end.

@felixT2K
Copy link
Contributor

@mara004 I wanted to ask if there are any updates on your side? :)

I've got it on my mind and have been working on some integration prerequisites to get this done nicely - packaging with an external library differs quite a bit from bundling. I can elaborate on the individual tasks if necessary. The thing is, my personal situation is rather difficult, but provided it doesn't get worse I should hopefully be able to finish this well before year's end.

Oh yeah, no stress, I just wanted to ask so I can plan for it. :)

@mara004
Contributor

mara004 commented Oct 23, 2023

Work in progress: https://github.com/pypdfium2-team/pypdfium2/pull/268/files
I believe the packaging is nearly done, we just need some cleanup, testing, docs and the CI integration now.

@mara004
Contributor

mara004 commented Oct 24, 2023

However, we might have a bit of a problem with the custom channels.
According to conda/conda-build#532 (comment), it looks like conda-build might not properly support them in recipes?

In that case, users would have to add the channels explicitly before installation, which is probably doable, but not nice.
We especially need to be careful with pdfium-binaries, because there's an improper package in anaconda/main (the official channel's counterpart to bblanchon's package), which adds to the confusion.

@mara004
Contributor

mara004 commented Oct 30, 2023

Just merged the conda packaging code: pypdfium2-team/pypdfium2@ee5a2ff.
The packages build locally, but I haven't done the CI integration yet.

@mara004
Contributor

mara004 commented Oct 31, 2023

And the CI/docs also merged now: pypdfium2-team/pypdfium2#269

@mara004
Contributor

mara004 commented Oct 31, 2023

@felixT2K
Contributor

felixT2K commented Nov 1, 2023

Thanks a lot @mara004 👍🏼

@frgfm
Collaborator

frgfm commented Dec 3, 2023

Thanks @mara004, it looks like we're gonna be able to get docTR on anaconda now!
Here are all docTR deps as of now & their support of conda:

Default channels

  • numpy
  • scipy
  • h5py
  • matplotlib
  • Pillow

Conda forge

Custom channels

Just gotta put together a conda recipe. Building it might take a while given the dependency resolution and checks for this many deps, but it should work 👍 (we might have some surprises on some OSes though)

@mara004
Contributor

mara004 commented Dec 4, 2023

I imagine carrying around the custom channels for pypdfium2 (pypdfium2-team, bblanchon) might be a bit of an annoyance...

Despite the channel::pkg syntax, end users still have to activate the channel manually. Conda does not automatically resolve/activate dependency channels, nor is there a recipe section to specify channels to enable (conda/conda-build#532).
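Since a recipe cannot declare which channels its dependencies come from, one workaround is to hand users an environment file that lists the channels explicitly. The sketch below renders such a file. The environment name, the Python pin and the inclusion of `conda-forge` are assumptions, while the channel names and `pypdfium2_helpers` come from this thread. The resulting file could then be used with `conda env create -f environment.yml`.

```python
from typing import Sequence


def environment_yml(name: str, channels: Sequence[str], dependencies: Sequence[str]) -> str:
    """Render a minimal conda environment.yml that lists the extra channels explicitly."""
    lines = [f"name: {name}", "channels:"]
    # Channel order matters: earlier channels take priority during solving.
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


text = environment_yml(
    "doctr-env",  # hypothetical environment name
    channels=["pypdfium2-team", "bblanchon", "conda-forge"],
    dependencies=["python=3.11", "pypdfium2-team::pypdfium2_helpers"],
)
print(text)
```

Listing `bblanchon` here is what makes the pdfium dependency resolvable, since the `channel::pkg` prefix only pins where the named package itself comes from.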

I'm kind of wondering if we might have gone the wrong way and should have tried putting pypdfium2 and its dependencies in conda-forge instead, but feedstock publishing seemed less flexible and I wasn't sure how to automate it. However, if anyone wants to pursue that path, the feedstocks written by the Anaconda Team (pdfium-binaries, ctypesgen [pypdfium2-team fork]) might be a good starting point. Though I'd recommend not using their pypdfium2-feedstock, and instead splitting it into separate pypdfium2_raw and pypdfium2_helpers packages as we do in the custom channel.

It would be most convenient if conda-forge as a community channel could just "include" or mirror the pypdfium2-team and bblanchon channels, but I don't think they can do this.

Anyway, unfortunately my time budget for conda is more than over, so I won't be able to look into this any deeper 😅

@mara004
Contributor

mara004 commented Aug 16, 2024

FWIW, someone has put pypdfium2 in conda-forge now, but badly.
As of this writing, they only support osx-64 and macos-64, and since they're bundling the binaries, they have to build python version specific packages, unnecessarily.
Again, it's tied to native hosts, and as such will always lack support for architectures not provided by the feedstock infrastructure.

So, please continue using our packages from the bblanchon and pypdfium2-team channels.


conda-forge links:
https://anaconda.org/conda-forge/pypdfium2
https://anaconda.org/conda-forge/pypdfium2/files
https://anaconda.org/conda-forge/ctypesgen-pypdfium2-team

@felixdittrich92
Contributor

Thanks for the update 👍

8 participants