IDEA: crawlers are callable modules #221

Open
jkowalleck opened this issue Mar 24, 2020 · 6 comments

@jkowalleck
Member

nichtparasoup's image crawlers could be callable as modules,
à la python3 -m nichtparasoup.imagecrawler.echo '{"image_uri":"foo"}' 3

this would allow crawling some images without having to write actual Python ...

example implementation for the end of a crawler module

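import json
import sys

if __name__ == '__main__':
    # first argument: the crawler config as a JSON object
    config = json.loads(sys.argv[1])
    # optional second argument: number of crawl rounds, at least 1
    times = max(int(sys.argv[2]) if len(sys.argv) > 2 else 1, 1)
    imagecrawler = MyCrawler(**config)
    for _ in range(times):
        # write one JSON document per line
        json.dump(imagecrawler.crawl(), sys.stdout, indent=None)
        sys.stdout.write('\n')
        if imagecrawler.is_exhausted():
            break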

prerequisites:

@jkowalleck
Member Author

jkowalleck commented Mar 24, 2020

putting a __main__ on the bottom of each crawler file/module might be fine ...
but this would break modules that implement multiple crawlers - like Instagram which implements Tag and Profile ...

this needs some thought ... and maybe restructuring ...

restructuring idea - which would need no code change at all - all visible interfaces stay the same

  • nichtparasoup
    • imagecrawler
      • echo -- implement Echo
      • instagram
        • base -- define InstagramBase
        • tag -- implement InstagramTag (include Base)
        • profile -- implement InstagramProfile (include Base)
        • __init__ -- import base, tag, profile - and make them public via __all__

or do something disruptive?

@jkowalleck
Member Author

jkowalleck commented Mar 24, 2020

instead of implementing the same __main__ again and again ...
this functionality could be done once ... and applied where needed ...

this would make the functionality available without importing sys and json everywhere ...
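
A minimal sketch of what "done once and applied where needed" could look like. Everything here is an assumption for illustration: `run_crawler_cli` is a hypothetical helper name, `EchoCrawler` is a stand-in and not nichtparasoup's real class, and `crawl()` is assumed to return something JSON-serializable.

```python
import json
import sys
from typing import Any, Callable, List, Optional, Sequence, TextIO


class EchoCrawler:
    """Hypothetical stand-in crawler, only here to make the sketch runnable."""

    def __init__(self, image_uri: str) -> None:
        self.image_uri = image_uri

    def crawl(self) -> List[str]:
        # assumption: a crawl round yields JSON-serializable data
        return [self.image_uri]

    def is_exhausted(self) -> bool:
        return False


def run_crawler_cli(
    crawler_factory: Callable[..., Any],
    argv: Sequence[str],
    out: Optional[TextIO] = None,
) -> int:
    """Shared entry point: argv is [config_json, times?]. Returns rounds run."""
    if not argv:
        raise SystemExit("usage: <config-json> [times]")
    stream = sys.stdout if out is None else out
    config = json.loads(argv[0])
    times = max(int(argv[1]) if len(argv) > 1 else 1, 1)
    crawler = crawler_factory(**config)
    rounds = 0
    for _ in range(times):
        json.dump(crawler.crawl(), stream)
        stream.write("\n")  # one JSON document per line
        rounds += 1
        if crawler.is_exhausted():
            break
    return rounds
```

A crawler module would then only need a two-line `__main__` guard that calls `run_crawler_cli(MyCrawler, sys.argv[1:])`. A module with several crawlers, like Instagram, would still have to pick which class to pass.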

@jkowalleck
Member Author

alternative:
create an extra package nichtparasoup-imagecrawler-cli that has the needed functionality ...
it could even work with the autoloader, so plugin-imagecrawlers would work with it right away ...

or maybe have this included as an extra command in the existing CLI ?

@jkowalleck
Member Author

jkowalleck commented Mar 26, 2020

guess the first idea is a great one.
but it actually needs a change in the internal image crawler structure,
with the command calls in mind.

crawlers need to define how they are configured ... in the CLI.
each crawler acts as its own sub-command. (https://click.palletsprojects.com/en/7.x/commands/#custom-multi-commands)

💯 this means the application will undergo major changes.
so a version 3 of nichtparasoup will be issued.
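
A rough sketch of crawlers declaring their own CLI configuration, each becoming a sub-command. It uses the standard library's `argparse` as a dependency-free stand-in for click's multi-commands, so it is not the click implementation itself. The `CliCrawler` hooks (`cli_name`, `add_cli_arguments`), the option names and the `tag_name` parameter are all hypothetical.

```python
import argparse
from typing import Sequence, Type


class CliCrawler:
    """Hypothetical base class: each crawler declares its own CLI options."""

    cli_name = ""

    @classmethod
    def add_cli_arguments(cls, parser: argparse.ArgumentParser) -> None:
        raise NotImplementedError


class Echo(CliCrawler):
    cli_name = "echo"

    def __init__(self, image_uri: str) -> None:
        self.image_uri = image_uri

    @classmethod
    def add_cli_arguments(cls, parser: argparse.ArgumentParser) -> None:
        parser.add_argument("--image-uri", required=True)


class InstagramTag(CliCrawler):
    cli_name = "instagram-tag"

    def __init__(self, tag_name: str) -> None:
        self.tag_name = tag_name  # parameter name is an assumption

    @classmethod
    def add_cli_arguments(cls, parser: argparse.ArgumentParser) -> None:
        parser.add_argument("--tag-name", required=True)


def build_parser(crawlers: Sequence[Type[CliCrawler]]) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nichtparasoup imagecrawler run")
    subparsers = parser.add_subparsers(dest="name", required=True)
    for cls in crawlers:
        sub = subparsers.add_parser(cls.cli_name)
        cls.add_cli_arguments(sub)          # the crawler defines its options
        sub.set_defaults(crawler_cls=cls)   # remember which class to build
    return parser


def crawler_from_argv(
    argv: Sequence[str],
    crawlers: Sequence[Type[CliCrawler]] = (Echo, InstagramTag),
) -> CliCrawler:
    options = vars(build_parser(crawlers).parse_args(argv))
    cls = options.pop("crawler_cls")
    options.pop("name")
    return cls(**options)  # remaining options map onto the constructor
```

With this shape, a module that implements several crawlers is no longer a problem, since each class registers under its own sub-command name.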

@jkowalleck jkowalleck added this to the 3.0.0 milestone Mar 26, 2020
@jkowalleck jkowalleck mentioned this issue Apr 4, 2020
@jkowalleck
Member Author

with the switch over to click the following could be a solution, UNTESTED

cli: nichtparasoup imagecrawler run [OPTIONS] [NAME] ...

to get this added dynamically:

  • name gets a callback that does the following:
    • check if its value is valid (known and has a subcommand)
    • add (or replace the arg with) the appropriate subcommand

just an idea, never tested this ...
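
The two callback steps above can be sketched as a two-stage parse. This again uses `argparse` in place of click, so it only illustrates the flow and says nothing about whether click's callbacks allow it. The `SUBCOMMANDS` registry, `known_crawler` and `dispatch` are hypothetical names.

```python
import argparse
from typing import Callable, Dict, Sequence, Tuple


def echo_parser() -> argparse.ArgumentParser:
    # options of a hypothetical "echo" subcommand
    parser = argparse.ArgumentParser(prog="nichtparasoup imagecrawler run echo")
    parser.add_argument("--image-uri", required=True)
    return parser


# hypothetical registry: crawler name -> factory for its subcommand parser
SUBCOMMANDS: Dict[str, Callable[[], argparse.ArgumentParser]] = {
    "echo": echo_parser,
}


def known_crawler(value: str) -> str:
    """The 'callback' on NAME: accept only names that have a subcommand."""
    if value not in SUBCOMMANDS:
        raise argparse.ArgumentTypeError(f"unknown image crawler: {value!r}")
    return value


def dispatch(argv: Sequence[str]) -> Tuple[argparse.Namespace, argparse.Namespace]:
    run_parser = argparse.ArgumentParser(prog="nichtparasoup imagecrawler run")
    run_parser.add_argument("--times", type=int, default=1)
    run_parser.add_argument("name", type=known_crawler)
    # stage 1: parse the run options and NAME, keep everything else untouched
    run_options, rest = run_parser.parse_known_args(argv)
    # stage 2: hand the leftovers to the subcommand selected by NAME
    crawler_options = SUBCOMMANDS[run_options.name]().parse_args(rest)
    return run_options, crawler_options
```

For example, `dispatch(["--times", "2", "echo", "--image-uri", "foo"])` returns the run options and the echo options as two separate namespaces, while an unknown name makes the parser exit with a usage error.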

@jkowalleck jkowalleck removed this from the 3.0.0 milestone Aug 8, 2020
@jkowalleck
Member Author

jkowalleck commented Aug 8, 2020

an idea:
have the run command gather the needed args/options and store them in a context
see https://click.palletsprojects.com/en/7.x/commands/#custom-multi-commands
when it comes to invoking the subcommand, just do what you have to do.
option 1: the subcommand's click definition may just gather options. the invoker can be overridden - see https://click.palletsprojects.com/en/master/commands/?highlight=subcommands
option 2: pass all run options to the invoked subcommand as context. subcommands are base implementations from BaseImageCrawler that take all options and just run in circles until the end is reached.
