nilfoer/fourcdl

4CDownloader

Tool for downloading files from 4chan.org. Avoids duplicate downloads (optional) by using the MD5 hashes provided by 4chan.

Download

Download the source code and extract it into a folder where you want the downloaded files to be stored. Then start a shell in that directory and execute:

> pip install -r requirements.txt

This will install all required 3rd-party packages.

Usage

Start the script using:

> fourcdl-runner.py watch

It will watch your clipboard for copied 4chan thread URLs. Once a thread URL is copied, you need to supply the name of the folder the thread will be downloaded to. You'll also be asked whether you want to download only unique files (don't worry, you can still download duplicates later).

> fourcdl-runner.py watch
Watching clipboard for 4chan thread urls...
14:15:54 - INFO - Viewing thread "B/W Miniseries" No. 3603904. OP:
[Removed output of OP message]
Input the folder name the thread is going to be downloaded to (e.g. "gif_cute", subfolders work too "gif_model/Emily Rudd"):
> p_bw-photos
Only copy unique files? y/n:
> y
Watching clipboard for 4chan file urls...
Copy cmds are: rename_thread, reset_filename, remove_file !
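
How the script recognizes a thread URL is not shown in this README. As a rough sketch, clipboard text could be checked with a regular expression like the one below. The URL shape (`boards.4chan.org/<board>/thread/<number>`, with `4channel.org` as an alternative host) and the names `THREAD_URL_RE` and `parse_thread_url` are assumptions made for illustration, not the script's actual code.

```python
import re

# Assumed URL shape: https://boards.4chan.org/<board>/thread/<number>[/slug]
# The 4channel.org host is included as an assumption as well.
THREAD_URL_RE = re.compile(
    r"^https?://boards\.(?:4chan|4channel)\.org/(\w+)/thread/(\d+)"
)


def parse_thread_url(text):
    """Return (board, thread_no) if text looks like a thread URL, else None."""
    match = THREAD_URL_RE.match(text.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

Anything that does not match (for example a folder name typed by the user) yields `None` and can simply be ignored by the watcher.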

Now you can either press Ctrl+C and answer y to download the whole thread (or only its unique files, if you selected that earlier).

Or you can copy the link addresses of 4chan files in this thread (e.g. in Chrome, right-click a thumbnail and press e). The original filename and the MD5 hash will be printed. You are now in append mode: copied text that isn't a 4chan file URL will be appended to the download file name of that file.

14:22:00 - INFO - Found file url of file: "p/1579558670830.jpg" Total of 1 files
Orig-fn: 2020-01-20-0008 | MD5: IzisM5jU8PGq+PDZ7n1Wkw==
Not a file URL -> clipboard was appended to filename:
1579558670830_Fomapan 200_ shot on Canonet QL17_ developed in Cinestill Monobath
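
The append-mode behaviour shown above can be sketched roughly as follows. This is a guess at the logic, not the script's implementation: the file URL shape (`i.4cdn.org/<board>/<id>.<ext>`), the set of replaced characters, and the names `FILE_URL_RE`, `UNSAFE_CHARS_RE` and `next_name` are all assumptions. The sample output suggests that commas in the copied text end up as underscores, which is what the sketch mimics.

```python
import re

# Assumed file URL shape: https://i.4cdn.org/<board>/<id>.<ext>
FILE_URL_RE = re.compile(r"^https?://i\.4cdn\.org/(\w+)/(\d+)\.(\w+)$")
# Assumption: characters that are invalid in Windows file names, plus the
# comma, are replaced with an underscore.
UNSAFE_CHARS_RE = re.compile(r'[<>:"/\\|?*,]')


def next_name(clipboard_text, current_name):
    """Return the download name after processing one clipboard entry.

    A file URL starts a new name (the id on the server); any other text
    is sanitized and appended to the current name.
    """
    text = clipboard_text.strip()
    match = FILE_URL_RE.match(text)
    if match:
        return match.group(2)
    return current_name + "_" + UNSAFE_CHARS_RE.sub("_", text)
```

For example, copying a file URL and then a caption would first yield the server-side id and then the id with the sanitized caption appended.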

If the file has already been downloaded before, the following message will appear:

Files with matching md5s:
  p_bw-photos\1579558670830_Fomapan 200_ shot on Canonet QL17_ developed in Cinestill Monobath.jpg
14:33:16 - INFO - ALERT!! File with url p/1579558670830.jpg has been downloaded before!
    Copy add_anyway to add file to downloads!

You can decide to move on to the next file or copy add_anyway to add the file to the download list.
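
The MD5 values printed by the script (such as `IzisM5jU8PGq+PDZ7n1Wkw==`) are Base64-encoded digests. A minimal sketch of a duplicate check along those lines is shown below. The dictionary layout and the function names are hypothetical and chosen only for illustration.

```python
import base64
import hashlib


def md5_base64(data: bytes) -> str:
    """Return the MD5 digest of data, Base64-encoded."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def find_duplicates(md5_b64, known):
    """Return paths already stored under this hash.

    known is a hypothetical mapping of Base64 MD5 -> list of file paths.
    """
    return known.get(md5_b64, [])
```

An empty result means the file is new. A non-empty result corresponds to the "Files with matching md5s" message above.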

There are three special commands that when copied trigger an action:

  • rename_thread: You will be asked to input a new thread folder name.
  • reset_filename: File name will be reset to the file name on the 4chan server.
  • remove_file: Last copied file will be removed from download list.
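
The three commands could be modelled as a small dispatcher over the per-thread state. The sketch below is purely illustrative: `QueuedFile`, `ThreadState` and `apply_command` are hypothetical names and do not reflect how the script stores its state.

```python
from dataclasses import dataclass, field


@dataclass
class QueuedFile:
    server_name: str    # file name on the 4chan server
    download_name: str  # server name plus any appended text


@dataclass
class ThreadState:
    folder: str
    files: list = field(default_factory=list)


def apply_command(state, command, new_folder=None):
    """Apply a copied command to state; return False if it is not a command."""
    if command == "rename_thread":
        if new_folder is None:
            raise ValueError("rename_thread needs a new folder name")
        state.folder = new_folder
    elif command == "reset_filename":
        if state.files:
            last = state.files[-1]
            last.download_name = last.server_name
    elif command == "remove_file":
        if state.files:
            state.files.pop()
    else:
        return False
    return True
```

Returning `False` for unknown text lets the caller fall back to append mode.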

Once you want to stop adding files for this thread, press Ctrl+C once. You can then copy the URL of another thread, or press Ctrl+C again to start downloading.

Stopped watching clipboard for 4chan file URLs!
Stopped watching clipboard for 4chan thread URLs!
14:30:40 - INFO - Downloading thread No. 3603904: "B/W Miniseries"
[...]
14:30:42 - INFO - CRC-Check successful!
14:30:42 - INFO - Writing thread export file "p_bw-photos_2020-01-23.txt"
14:30:42 - INFO - Appending md5s!

The script automatically verifies downloaded files and saves their MD5 hashes and names in a file named 4chan_dl.md5 in the root directory, for convenient use with tools such as md5sum. The hashes and file names are additionally saved in a file named downloaded_files_info.pickle for internal use. So don't delete these files!
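
Tools like md5sum expect hex digests, whereas the hashes printed above are Base64-encoded. The sketch below shows one way such a conversion could look. The exact layout of 4chan_dl.md5 is not documented here, so the line format (hex digest, two spaces, file name, as accepted by `md5sum -c`) and the function names are assumptions.

```python
import base64


def b64_md5_to_hex(md5_b64: str) -> str:
    """Convert a Base64-encoded MD5 digest to its hex representation."""
    return base64.b64decode(md5_b64).hex()


def md5sum_line(md5_b64: str, file_name: str) -> str:
    """Format one line in the style md5sum uses: '<hex>  <file name>'."""
    return f"{b64_md5_to_hex(md5_b64)}  {file_name}"
```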

Before the download starts, the file auto-backup.json is created. It contains the program state, so you can resume from where you left off should the download crash unexpectedly. When the script crashes, a file named crash-exp.json is additionally written for the same reason.

To resume after a crash, start the script with fourcdl-runner.py resume. Should there be no crash-exp.json file and the script crashed while downloading, you can use fourcdl-runner.py resume auto-backup.json.
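
The choice between the two state files can be sketched as below. This only illustrates the behaviour described above, under the assumption that crash-exp.json is the default when no file name is given. `resolve_state_path` and `load_state` are hypothetical helpers, and the structure of the JSON content is not specified in this README.

```python
import json
from pathlib import Path

DEFAULT_STATE_FILE = "crash-exp.json"  # assumed default for a plain "resume"


def resolve_state_path(root, explicit=None):
    """Return the state file to resume from inside the root directory."""
    return Path(root) / (explicit or DEFAULT_STATE_FILE)


def load_state(path):
    """Load the saved program state; raises FileNotFoundError if missing."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling `resolve_state_path(root, "auto-backup.json")` mirrors passing the backup file explicitly on the command line.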
