CDF.MOE

Makes graphs from comments on Reddit threads

Preparing for the data

  • Install Docker
  • Fill in the values in .env.example and rename the file to .env
  • Run docker-compose up to start everything
    • Add the --build option to rebuild the image instead of reusing a cached one when you want to update it
    • Add the -d option to run detached, so it keeps running in the background

Acquiring the data

  • Pick one of the three options to acquire data:
  • Run:
    1. docker exec -it cdf.moe_acquire_1 /bin/bash
    2. poetry run python3 acquire.py --OPTION
    3. Wait for it to finish; depending on the option you chose, this takes anywhere from a few seconds to quite a while
  • The default behavior is to automatically run the --cdf option on a weekly basis
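
The two Run steps above can also be collapsed into a single non-interactive call. The following is a minimal sketch, not part of the project: the helper names `acquire_command` and `run_acquire` are hypothetical, and it assumes that `docker exec` starts in the same working directory the interactive shell would (the one containing acquire.py). Only `--cdf` is a known option; pass whichever option you would have typed by hand.

```python
import shlex
import subprocess

# Container name taken from the steps above.
CONTAINER = "cdf.moe_acquire_1"


def acquire_command(option):
    """Build a one-shot `docker exec` argument list for acquire.py.

    `option` may be given with or without the leading dashes.
    """
    if not option.startswith("--"):
        option = "--" + option
    return ["docker", "exec", CONTAINER,
            "poetry", "run", "python3", "acquire.py", option]


def run_acquire(option, dry_run=True):
    """Return the command as a shell string; execute it only if dry_run is False."""
    cmd = acquire_command(option)
    if not dry_run:
        # check=True raises CalledProcessError if the acquisition fails.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Prints the command without running it.
    print(run_acquire("cdf"))
```

With `dry_run=True` (the default) nothing is executed, which makes it easy to check the command before letting it run for a long time.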

Viewing the data

  • To see the raw data, run docker exec -it cdf.moe_db_1 psql -U YOUR_POSTGRES_USER THE_DB_DATABASE
  • Visit cdf.moe for fancy graphs for CDF

Other stuff

  • Issues and PRs welcome, particularly for adding new graph types
  • This is specific to a local configuration setup, but you'll also need to do a few other things:
    1. Set up an external Docker network
      • Run docker network create cdf.moe_nginx
      • This connects nginx with the services
    2. Build the site
      • Run docker exec -it cdf.moe_website_1 npm run build and point nginx at the resulting /dist
      • By default, the website's Docker container only serves the site locally, so to make it available externally you need to build it

Interesting queries

Ranking users over number of comments made

select rank() over (order by count desc), author, count
from (
  select author, count(author)
  from comments
  inner join threads on link_id = long_id
  where short_id = 'short_name_for_thread'
  group by author
  order by count desc
) x;
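
To see how rank() treats ties, here is a small sketch that runs the same query against an in-memory SQLite database rather than the project's Postgres database. The table layout is only inferred from the column names in the query, the sample rows and the `rank_commenters` helper are made up, and the count column gets an explicit alias (`n`) because SQLite does not name it `count` the way Postgres does. It assumes SQLite 3.25 or newer, which is when window functions were added.

```python
import sqlite3


def rank_commenters(conn, short_id):
    """Rank authors in one thread by number of comments (ties share a rank)."""
    query = """
        select rank() over (order by n desc), author, n
        from (
          select author, count(author) as n
          from comments
          inner join threads on link_id = long_id
          where short_id = ?
          group by author
        ) x
        order by n desc, author
    """
    return conn.execute(query, (short_id,)).fetchall()


# Hypothetical sample data; only the columns the query touches are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table threads (long_id text, short_id text);
    create table comments (author text, link_id text);
    insert into threads values ('t3_aaa', 'demo');
    insert into comments values
      ('alice', 't3_aaa'), ('alice', 't3_aaa'), ('alice', 't3_aaa'),
      ('bob', 't3_aaa'), ('bob', 't3_aaa'),
      ('carol', 't3_aaa'), ('carol', 't3_aaa'),
      ('dave', 't3_aaa');
""")

rows = rank_commenters(conn, "demo")
for row in rows:
    print(row)
# (1, 'alice', 3)
# (2, 'bob', 2)
# (2, 'carol', 2)
# (4, 'dave', 1)
```

Note that bob and carol share rank 2 and dave lands on rank 4; use dense_rank() instead if you do not want gaps after ties.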

Seeing which users received the most replies

with t1 as (
  select parent_id, count(*) as c
  from comments
  inner join threads on link_id = long_id
  where short_id = 'short_name_for_thread' and link_id != parent_id
  group by parent_id
),
t2 as (
  select author, name
  from comments
  inner join threads on link_id = long_id
  where short_id = 'short_name_for_thread'
)
select t2.author, sum(t1.c) as s
from t1, t2
where t1.parent_id = t2.name
group by author
order by s desc;
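
The reply query is easier to follow on a tiny example. This sketch runs it unchanged (apart from a bound parameter) against an in-memory SQLite database; the schema is inferred from the query, and the sample comments and the `most_replied` helper are invented for illustration. t1 counts replies per parent comment, skipping top-level comments (whose parent_id equals the thread's link_id), and t2 maps each comment's name back to its author.

```python
import sqlite3


def most_replied(conn, short_id):
    """Return (author, replies received) for one thread, most-replied first."""
    query = """
        with t1 as (
          select parent_id, count(*) as c
          from comments
          inner join threads on link_id = long_id
          where short_id = ? and link_id != parent_id
          group by parent_id
        ),
        t2 as (
          select author, name
          from comments
          inner join threads on link_id = long_id
          where short_id = ?
        )
        select t2.author, sum(t1.c) as s
        from t1, t2
        where t1.parent_id = t2.name
        group by author
        order by s desc
    """
    return conn.execute(query, (short_id, short_id)).fetchall()


# Hypothetical thread: alice and bob post top-level comments, others reply.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table threads (long_id text, short_id text);
    create table comments (name text, author text, parent_id text, link_id text);
    insert into threads values ('t3_aaa', 'demo');
    insert into comments values
      ('t1_1', 'alice', 't3_aaa', 't3_aaa'),  -- top-level
      ('t1_2', 'bob',   't1_1',   't3_aaa'),  -- reply to alice
      ('t1_3', 'carol', 't1_1',   't3_aaa'),  -- reply to alice
      ('t1_4', 'alice', 't1_2',   't3_aaa'),  -- reply to bob
      ('t1_5', 'bob',   't3_aaa', 't3_aaa'),  -- top-level
      ('t1_6', 'carol', 't1_5',   't3_aaa'),  -- reply to bob
      ('t1_7', 'dave',  't1_1',   't3_aaa');  -- reply to alice
""")

replies = most_replied(conn, "demo")
print(replies)  # [('alice', 3), ('bob', 2)]
```

Authors who received no replies (carol and dave here) do not appear at all, because the join only keeps comments that show up as someone's parent.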