
Allows Safari history file to be imported to Promnesia #207

Merged (1 commit, Feb 27, 2021)

Conversation

@gms8994 (Contributor) commented Feb 26, 2021

This seems to work: at least it grabs visits out of the sqlite database. I'm not fully sure how the schema_check bit works, so if you have suggestions I'm happy to make updates. promnesia index succeeds, and the dates reported by the browser extension look accurate.
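For context, extracting visits from Safari's history database might look roughly like the sketch below. This is a minimal illustration, not the PR's actual code: the two-table layout (history_items joined to history_visits) and the Core Data timestamp convention (seconds since 2001-01-01 UTC) are my assumptions about Safari's History.db.

```python
import os
import sqlite3
import tempfile
from datetime import datetime, timedelta, timezone

# Assumption: Safari stores visit times as seconds since the Core Data
# epoch, 2001-01-01 00:00:00 UTC.
COCOA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

def read_safari_visits(db_path):
    """Yield (url, visit datetime) pairs from a copy of Safari's History.db."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT hi.url, hv.visit_time "
            "FROM history_visits hv "
            "JOIN history_items hi ON hv.history_item = hi.id "
            "ORDER BY hv.visit_time"
        )
        for url, visit_time in rows:
            yield url, COCOA_EPOCH + timedelta(seconds=visit_time)

# Demo against a tiny database with the same (assumed) two-table layout.
demo = os.path.join(tempfile.mkdtemp(), "History.db")
with sqlite3.connect(demo) as conn:
    conn.execute("CREATE TABLE history_items (id INTEGER PRIMARY KEY, url TEXT)")
    conn.execute(
        "CREATE TABLE history_visits "
        "(id INTEGER PRIMARY KEY, history_item INTEGER, visit_time REAL)"
    )
    conn.execute("INSERT INTO history_items VALUES (1, 'https://example.com')")
    # 631152000 s after the Core Data epoch = 2021-01-01 00:00:00 UTC
    conn.execute("INSERT INTO history_visits VALUES (1, 1, 631152000.0)")

visits = list(read_safari_visits(demo))
```

Operating on a copy of History.db (rather than the live file) sidesteps locks held by a running Safari.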

@karlicoss (Owner)

Looks good!
Thanks, especially for the links in comments 🙏

Yeah, I had to dig schema_check up from git history, since I'd forgotten as well and it seemed unused!
It turned out to be an artifact from the past. Back then I regularly merged the history into a single database (because these local sqlite databases usually only retain something like 90 days), so this was a schema check for extra safety.

Didn't save me though :) So now I back up the database regularly instead, and then iteratively reconstruct the full merged view at runtime (hence some madness in the module -- perhaps I need to implement first-class support for this in cachew).

I guess I'll remove the schema_check bits later, once I've made sure they're definitely not needed anywhere else.
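A schema check of the kind described above can be sketched like this. This is a hypothetical illustration of the idea (fail loudly before merging if the browser silently changed its table layout), not Promnesia's actual schema_check; the table and column names are made up.

```python
import sqlite3

def schema_check(conn, expected):
    """Compare each table's column names against an expected layout.

    The point is extra safety before merging: if the browser ships a
    schema change, raise instead of quietly corrupting the merged db.
    """
    for table, columns in expected.items():
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt, pk);
        # row[1] is the column name.
        actual = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        if actual != columns:
            raise RuntimeError(f"{table}: expected {columns}, got {actual}")

# Demo: a matching layout passes silently.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history_items (id INTEGER, url TEXT)")
schema_check(conn, {"history_items": ["id", "url"]})
```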

@karlicoss karlicoss merged commit 20a07c2 into karlicoss:master Feb 27, 2021
@gms8994 (Contributor, Author) commented Feb 27, 2021

So your process with browser history files is to create a new sqlite snapshot for each period T, and then have Promnesia import them individually? I'm trying to figure out how best to store the files locally; right now I just have multiple machines scp their respective history files to a single location so they can all be covered by promnesia index, but some of the history files (Safari's in particular) are 40M and take a couple of minutes to process...

@karlicoss (Owner)

Yep, basically everything is 'snapshotted'/backed up individually (e.g. once every couple of days) and then merged by Promnesia. I'm using Syncthing to sync most of my stuff, so the syncing part is relatively painless for me.

Indeed, that way you can end up with quite a few of these databases. For some of my similar exports I have a tool, https://github.com/karlicoss/bleanser, that detects 'redundant' snapshots and cleans them up; I have a prototype of a similar script for history databases and just need to merge it. That would solve the storage part; as for processing, with better support from cachew that also shouldn't be a problem.
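The snapshot-and-merge approach described above can be sketched as follows. It's an illustrative sketch, not Promnesia's implementation: the flat `visits` table and the `(url, visit_time)` dedup key are my assumptions.

```python
import os
import sqlite3
import tempfile

def merged_visits(snapshot_paths):
    """Union the visits from a series of periodic snapshots.

    Because snapshots have overlapping retention windows, the same visit
    appears in many of them; deduplicating on (url, visit_time) collapses
    the overlap into one full reconstructed history.
    """
    seen = set()
    for path in snapshot_paths:
        with sqlite3.connect(path) as conn:
            for row in conn.execute("SELECT url, visit_time FROM visits"):
                if row not in seen:
                    seen.add(row)
                    yield row

# Demo: two overlapping snapshots of a hypothetical 'visits' table.
tmp = tempfile.mkdtemp()
paths = []
for i, rows in enumerate([[("a", 1), ("b", 2)], [("b", 2), ("c", 3)]]):
    path = os.path.join(tmp, f"snap{i}.db")
    with sqlite3.connect(path) as conn:
        conn.execute("CREATE TABLE visits (url TEXT, visit_time INTEGER)")
        conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    paths.append(path)

merged = sorted(merged_visits(paths), key=lambda r: r[1])
```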

@gms8994 (Contributor, Author) commented Mar 1, 2021

Thank you for the information! I'm trying to figure out the best solution to this problem; re-indexing the entire file every hour doesn't seem like a great plan from an efficiency perspective, since 99% or more of it won't have changed. Do you have thoughts on extending Promnesia's Source to allow overriding the generated query (or at least adding a WHERE clause to it), which would make that possible?
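The kind of incremental query being proposed might look like the sketch below: a hypothetical helper (not an existing Promnesia API) that filters on a last-indexed timestamp so only new visits are fetched, using the same assumed Safari table layout as above.

```python
import sqlite3

def recent_visits(conn, since):
    """Hypothetical incremental fetch: only visits newer than `since`.

    Restricting on visit_time means an hourly re-index touches only the
    rows added since the last run, instead of the whole 40M database.
    """
    return list(conn.execute(
        "SELECT hi.url, hv.visit_time "
        "FROM history_visits hv "
        "JOIN history_items hi ON hv.history_item = hi.id "
        "WHERE hv.visit_time > ? "
        "ORDER BY hv.visit_time",
        (since,),
    ))

# Demo against an in-memory database with the assumed layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history_items (id INTEGER PRIMARY KEY, url TEXT)")
conn.execute(
    "CREATE TABLE history_visits "
    "(id INTEGER PRIMARY KEY, history_item INTEGER, visit_time REAL)"
)
conn.execute("INSERT INTO history_items VALUES (1, 'https://example.com')")
conn.executemany(
    "INSERT INTO history_visits VALUES (?, 1, ?)",
    [(1, 100.0), (2, 200.0), (3, 300.0)],
)

new = recent_visits(conn, 150.0)  # skips the visit at 100.0
```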

I considered splitting the database up into multiple sqlite files, but the duplication of URL data (at least in Safari's case) ballooned the file sizes quite a bit. I could work with that, but I'm curious what you think. Perhaps I should open this as an issue instead?

@karlicoss (Owner)

makes sense yeah, let's move to #214

2 participants