-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new source for Viber Desktop local sqlite db #204
Conversation
bc cannot control src-name otherwise.
try: | ||
yield from _handle_row(row) | ||
except Exception as ex: | ||
logger.warning( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replying to your comment here
Basically the options people usually use are:
-
throw immediately (as you suggested in the comment)
This is problematic because then on any unexpected situation (like 1/10000 messages parsed incorrectly), the program will break. Considering we have to deal with closed protocols, formats change without warning, and it's annoying when your software breaks until you fix it, so we need to be more flexible and defensive here. In addition, imagine if one data source breaks the whole program -- sometimes it's justified (e.g. if you were working with important medical data or something like that), but not in case of this tool. -
log and continue (as you're doing here)
It works nice in terms of not breaking immediately. However then there is another danger -- imagine that for example Viber changes format for 'time' column to store the isoformatted string (instead of epoch), for whatever reason. Then promnesia indexer will carry on working (because of exception catching) -- it's good. However, you won't notice the error unless you regularly reviewing the logs, or maybe when you know for sure that Promnesia should fire but it doesn't.
So in Promnesia (and related projects that process data, like HPI) I'm doing something intermediate.
- catch the exceptions (where it makes sense, e.g. parsing)
- possibly log it if necessary
- but then
yield
it to the parent function, as a normal value (there is even an alias forResult
)
This allows the top level handler to exit with code 1 (so you get a cron email/systemd job failure etc and notice the error), while still doing what's the program supposed to do (process as much as it can and insert in the database).
As more examples why it's useful in Promnesia, errors are also inserted in the database:
- this makes it easier to test in some cases (instead of intercepting logs, just check the database)
- later it will allow to overview the errors form the browser which would be much more convenient for most users than messing with systemd logs or whatnote
I write more about it here, you might enjoy it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, thanks so much, especially for figuring out the query!
Looks good, there are some inor comments (and some optional, so up to you if you want to fix them, otherwise I can clean up sometime later).
I don't have Viber so can't test it (to ensure it doesn't break later). Do you know by any chance, if there some public viber database somewhere on github, so it would be possible to run some basic tests against it? Otherwise not a big problem, hopefully the most potentially fragile part (query) won't need to change at all for a while.
enhance sanity checks on empty viber fields, ass comented by @karlicoss in karlicoss#204.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know by any chance, if there some public viber database somewhere on github
No.
I could empty mine quasi-completely, to keep just some for running the query, and upload it here or the creation sql-scripts.
Even better if i could upload the DDL+insert statements it in the sources, to create test-case in the sources.
Besides this requiring some time-investment from my side, it leads to another question:
maybe se shpould move the query part of the code in HPI, no?
... move index() closures to the top of the function.
...and use a sensible default.
Posting as a separate comment so it's not buried in the code comments; might be useful for someone else later. So, neither symlink/hardlink for
Ok, but MIME not only examines the extension, but also looks at the 'magic bytes' in the file (that would explain how it can guess '.db'). But browser can only guess by the extension type, it can't look in the file before downloading... let's check how the extension guessing works...
Aha! So the system doesn't actually know Some search https://stackoverflow.com/a/31836/706389 suggests that you need to prepare an xml and install it with
So here we go -- apparently it knows .sqlite2/.sqlite3, but not .sqlite. Ok, let's prepare the xml...
And...
Win! Now we only need to associate sqlitebrowser with it:
And after that Firefox finally offers to open the database! |
Also while debugging the above, found another way, posting it here since might be useful somewhere else later... Found some help on changing the application handler in Firefox, but unclear how to add a new format: https://support.mozilla.org/en-US/kb/applications-panel-set-how-firefox-handles-files Following advice from here: https://support.mozilla.org/en-US/questions/1235051#answer-1158433
Explanation for action field from here: https://support.mozilla.org/fr/questions/1262664#answer-1232236
After reloading Firefox this way also works. But it's a bit sad the functionality isn't exposed to the GUI :( Also, neither of the ways work in Chrome -- it seems to have a mind of its own, and quick search seems to suggest there is no way without custom extensions 🤮 |
Up to you -- don't expect anyone to break it, and I don't mind if we add tests for this later. I know how tedious and time consuming this 'database emptying' can be.
Yep, could do, e.g. iterator over messages could be useful, and then it would be imported in Promnesia for URL extraction. |
Looks good to me! 🎉 Let me know what you think about the hardlink thing, and after that I'm happy to merge if you are. |
I will remove the had-link (your xdg-handler fits by purpose), and then you can merge it. |
prefer @karlicoss teaching firefox to recognize `.db` mime-type: karlicoss#204 (comment)
prefer @karlicoss's teaching firefox to recognize `.db` mime-type: karlicoss#204 (comment)
Removed hard-link hack in 264f543, Tell me when to split it in HPI, better in 2 separate PRs, no? |
Thank you! 🎉 merged For two PRs -- you mean one would be to HPI (e.g. |
Many thanks! Would it make sense to augment karlicoss/open-in-editor to install those handlers you crafted in those 2 messages? |
Not sure about adding this to open-in-editor.. I guess it's somewhat unrelated. |
Will you add it? |
Ah, great to know that even mimetype is enough! Yep, will add later. |
also fix missed mypy issues after #204
also fix missed mypy issues after #204
also fix missed mypy issues after #204
Add new source for Viber Desktop local sqlite db
Viber Desktop (at least in linux that i checked) stores its messages in a local SQLite file
~/.ViberPC/{user-phone-number}/viber.db
and it contains everything needed to devise Visits stream,as shown in the the sample below:
The indexer collects:
The only one missing is a working
Locator.href
, which i couldn't think of anything other than:It is configured with: