A cool Scrapy spider that notifies you of a price drop on a product you crave to buy!
- Tracks the availability and price of an Amazon product of your choice. 🎁
- Scheduled to run periodically every day. ⏳
- Notifies through email on price drop. 📧
- Deployed and scheduled periodically on Heroku for free. 💸
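
The price-drop check behind the notification can be sketched in a few lines. This is a minimal illustration, not the spider's actual code: the helper names `parse_price` and `price_dropped`, the price string format, and the idea of comparing against a previously seen price are all assumptions.

```python
import re


def parse_price(text):
    """Extract a number from a price string such as '$1,299.00'.

    Returns None when no number is found (e.g. the product is unavailable).
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group().replace(",", ""))


def price_dropped(current, last_seen):
    """Return True only when both prices are known and the current one is lower."""
    if current is None or last_seen is None:
        return False
    return current < last_seen
```

A spider callback could call `parse_price` on the scraped price text and send the email only when `price_dropped` returns `True`.
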
You can deploy your Scrapy spider locally and on Heroku by following the steps below. On Heroku, you can schedule the spider to run periodically for free, similar to the periodic spiders on scrapy-cloud, which require a paid account upgrade.
To deploy locally:
1. Install the scrapy daemon by executing `pip3 install scrapyd`.
2. Install scrapyd-client by executing `pip3 install git+https://github.com/iamumairayub/scrapyd-client.git --upgrade`.
3. Execute `scrapyd` in one terminal.
4. Change `[deploy]` to `[deploy:local]` or `[deploy:<str>]` in scrapy.cfg.
5. Uncomment the `url = http://localhost:6800/` line under `[deploy]` in scrapy.cfg.
6. Execute `scrapyd-deploy local` in another terminal.
7. Execute `curl http://localhost:6800/schedule.json -d project=myntra -d spider=gadgets` to start the spider execution.
8. Repeat step 6 and step 7 whenever a change is made in the project, so that the scrapy daemon picks up the update.
9. Execute `curl http://localhost:6800/cancel.json -d project=myntra -d job=<job_id>` to stop the running spider.
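
The two curl calls above (schedule and cancel) can also be made from Python. The following is a minimal sketch using only the standard library; the helper names (`build_request`, `call_scrapyd`, `schedule_spider`, `cancel_job`) are hypothetical, and the URL assumes scrapyd is running locally on its default port as in the steps above.

```python
import json
import urllib.parse
import urllib.request

# Assumption: scrapyd is running locally on its default port.
SCRAPYD_URL = "http://localhost:6800/"


def build_request(endpoint, **params):
    """Build a form-encoded POST request, equivalent to `curl <url> -d k=v ...`."""
    url = urllib.parse.urljoin(SCRAPYD_URL, endpoint)
    data = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def call_scrapyd(endpoint, **params):
    """Send the request and decode scrapyd's JSON reply."""
    with urllib.request.urlopen(build_request(endpoint, **params), timeout=30) as resp:
        return json.load(resp)


def schedule_spider(project="myntra", spider="gadgets"):
    """Start a spider run (same as the schedule.json curl call)."""
    return call_scrapyd("schedule.json", project=project, spider=spider)


def cancel_job(job_id, project="myntra"):
    """Stop a running job (same as the cancel.json curl call)."""
    return call_scrapyd("cancel.json", project=project, job=job_id)
```

With scrapyd running, `schedule_spider()` should return a JSON object that includes the job id, which can then be passed to `cancel_job(job_id)`.
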
To deploy on Heroku:
1. Install herokuify_scrapyd by executing `pip3 install herokuify_scrapyd`.
2. Create the requirements.txt file by executing `pip3 freeze > requirements.txt`.
3. Create the Procfile and runtime.txt files.
4. Log in to Heroku: `heroku login -i`.
5. Create a Heroku app by executing `heroku create`.
6. Copy the URL returned (e.g., `https://limitless-waters-77333.herokuapp.com/`) and paste it under the `[deploy]` section in scrapy.cfg.
7. Add a `[scrapyd]` section in scrapy.cfg.
8. Add Heroku as a git remote: `heroku git:remote -a <app_name>`, for example, `heroku git:remote -a limitless-waters-77333`.
9. Push the code to Heroku: `git init` -> `git add .` -> `git commit -m <commit_message>` -> `git push heroku master`.
10. Deploy the app: `scrapyd-deploy local` -> `curl https://limitless-waters-77333.herokuapp.com/schedule.json -d project=myntra -d spider=gadgets`.
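
Since several of the steps above edit scrapy.cfg, it can help to check which deploy targets the file actually defines. The sketch below parses one possible layout with Python's `configparser`. The `[settings]` entry, the `project` keys, the choice of which section holds which URL, and the helper name `list_deploy_targets` are illustrative assumptions; the `[scrapyd]` section is left out because its contents are not given here.

```python
import configparser

# Hypothetical scrapy.cfg contents. The project name "myntra" is taken from the
# curl commands above; the settings module path is an assumption.
SCRAPY_CFG = """
[settings]
default = myntra.settings

[deploy:local]
url = http://localhost:6800/
project = myntra

[deploy]
url = https://limitless-waters-77333.herokuapp.com/
project = myntra
"""


def list_deploy_targets(cfg_text):
    """Map each deploy target name to its URL ("default" for the bare [deploy])."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        if section == "deploy":
            targets["default"] = parser.get(section, "url")
        elif section.startswith("deploy:"):
            targets[section.split(":", 1)[1]] = parser.get(section, "url")
    return targets
```

The name passed to `scrapyd-deploy <target>` should match one of the named sections, so this check shows at a glance which URL a given target would deploy to.
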
Follow steps 1-9 above on deploying to Heroku, then continue with the steps below:
1. Install the modules required for scheduling by executing `pip3 install pytz` and `pip3 install apscheduler`.
2. Create a periodic_scheduler.py Python file to define the cron job.
3. Update the Procfile to create a clock dyno: `clock: python3 periodic_scheduler.py`.
4. Update the requirements.txt file by executing `pip3 freeze > requirements.txt`.
5. Push the code changes to Heroku: `git add .` -> `git commit -m <commit_message>` -> `git push heroku master`.
6. Execute `heroku ps:scale clock=1` to ensure the clock dyno is a singleton process, thereby avoiding duplicate scheduled jobs.
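
The contents of periodic_scheduler.py are not shown here, so the following is only a sketch of what such a file might look like. It assumes the APScheduler 3.x API (`BlockingScheduler` with a `"cron"` trigger); the app URL is the example one from the Heroku steps, and the 06:00 UTC schedule and the function names are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: the example app URL from the Heroku steps; replace with your own.
APP_URL = "https://limitless-waters-77333.herokuapp.com/"


def build_schedule_request(project="myntra", spider="gadgets"):
    """Build the POST request that asks scrapyd to run the spider."""
    url = urllib.parse.urljoin(APP_URL, "schedule.json")
    data = urllib.parse.urlencode({"project": project, "spider": spider}).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def run_spider():
    """Job body: trigger one spider run and print scrapyd's reply."""
    with urllib.request.urlopen(build_schedule_request(), timeout=30) as resp:
        print(resp.read().decode("utf-8"))


def main():
    # Imported here so the helpers above stay usable without APScheduler installed.
    from apscheduler.schedulers.blocking import BlockingScheduler

    scheduler = BlockingScheduler(timezone="UTC")
    # Hypothetical schedule: once a day at 06:00 UTC.
    scheduler.add_job(run_spider, "cron", hour=6, minute=0)
    scheduler.start()  # Blocks for as long as the clock dyno runs.
```

For use as the clock dyno's entry point, the file would end with a call to `main()`. The cron fields (`hour`, `minute`, `day_of_week`, and so on) control how often the job fires.
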
NOTE: Refer to this doc for more details on the options provided by the apscheduler cron trigger.
Refer to this doc on sending the notification email. The two approaches are:
- Using SMTP_SSL()
import os
import smtplib
import ssl
context = ssl.create_default_context()
server = smtplib.SMTP_SSL("smtp.gmail.com", 465, context=context)
server.login(os.environ.get("sender_email"), os.environ.get("password"))
server.sendmail(os.environ.get("sender_email"), os.environ.get("receiver_email"), message)
server.quit()
- Using .starttls()
context = ssl.create_default_context()
server = smtplib.SMTP("smtp.gmail.com", 587)
server.ehlo()
server.starttls(context=context)
server.ehlo()
server.login(os.environ.get("sender_email"), os.environ.get("password"))
server.sendmail(os.environ.get("sender_email"), os.environ.get("receiver_email"), message)
server.quit()
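
Both snippets above pass a `message` that is built elsewhere. As a rough sketch of how the pieces could fit together, the example below builds the message with `email.message.EmailMessage` and sends it over `SMTP_SSL` inside a `with` block, which closes the connection automatically. The function names, the subject line, and the body wording are assumptions; the environment variable names are the ones used in the snippets above.

```python
import os
import smtplib
import ssl
from email.message import EmailMessage


def build_message(sender, receiver, product, old_price, new_price):
    """Compose a plain-text price-drop alert (wording is illustrative)."""
    msg = EmailMessage()
    msg["Subject"] = f"Price drop: {product}"
    msg["From"] = sender
    msg["To"] = receiver
    msg.set_content(f"{product} dropped from {old_price} to {new_price}.")
    return msg


def send_price_alert(product, old_price, new_price):
    """Send the alert through Gmail over an SSL connection (port 465)."""
    sender = os.environ["sender_email"]
    receiver = os.environ["receiver_email"]
    msg = build_message(sender, receiver, product, old_price, new_price)
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL("smtp.gmail.com", 465, context=context) as server:
        server.login(sender, os.environ["password"])
        server.send_message(msg)
```

Using `os.environ[...]` instead of `os.environ.get(...)` makes a missing variable fail immediately with a `KeyError`, rather than passing `None` to the login call.
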