Skip to content
This repository has been archived by the owner on Apr 22, 2024. It is now read-only.

Fix/schema issues #146

Merged
merged 17 commits into from
May 4, 2018
Merged

Fix/schema issues #146

merged 17 commits into from
May 4, 2018

Conversation

shawnsarwar
Copy link
Contributor

Changed naming convention of schemas.
For example;
http://demo.eha.org/Person -> org.eha.demo.Person

Which is more idiomatic for Avro. As Salad prefers the previous convention, we do this after a schema has been turned into Avro, if we're using salad-bar.

Consolidated avro libraries.
We've also tried to limit our usage of avro libraries that are not spavro. (removed fastavro completely). However, a bug in spavro means we've been forced to use avro-python3 in the aether-producer for now. It's been reported, and hopefully will be addressed, but their extensive use of Cython makes it hard to patch and submit a PR.

Removed Python2 Dependencies.
We were maintaining compatability of some utilities (saladbar, producer, etc) for dependency reasons. We've been able to remove those dependencies. Now all apps and utilities run python3. Only aether-client and the consumer-sdk will need to be compatible with both python2 and python3.

requests==2.18.4
simplejson==3.13.2
six==1.11.0
spavro==1.1.10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curiously fastavro is still there 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh, if you don't wipe out the previous requirements.txt things can sneak in despite no longer being in primary requirements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or not. This is more mysterious than I though.

pbr==3.1.1
mock==2.0.0
pbr==4.0.2
psycopg2==2.7.4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

psycopg2==2.7.4 could you remove this line? 😳

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure it's redundant? That's what pip_freeze stuck in there for the requirement: psycopg2-binary

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you take a look at the console logs you'll see that psycopg2 is the old version and psycopg2-binary is the new one that will replace the other one in the next releases.

Someone with the same question as you:

psycopg/psycopg2#674

cursor.execute(count_str);
for row in cursor:
return row.get("new_rows")
return sum([1 for row in cursor])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't SELECT COUNT(*) FROM ... work for you? 🤔

or maybe

SELECT COUNT(DISTINCT
                e.id,
                e.modified,
                ps.name,
                ps.id,
                s.name,
                s.id)
  FROM kernel_entity e
...

Bring to the server all the data when you only need the number seems very expensive for me.

Copy link
Contributor Author

@shawnsarwar shawnsarwar May 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Select count stopped working, which was a total mystery. So I moved to this. I was afraid it would be slow so I bench-marked it. The operation is on the MS timescale so I stopped with the solution since it's "efficient enough" for now since this only happens once every 10 seconds or so and this and the DB are both on the same machine.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mysteries of life!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised because it's kind of a key piece of functionality and all of the sudden nothing worked.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you include a comment? I think this question is going to show up every time someone reads the code 🤓

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Of course.

inner join kernel_schema s on ps.schema_id = s.id
WHERE e.modified > '%s'
ORDER BY e.modified ASC;
''' % (offset)
Copy link
Collaborator

@obdulia-losantos obdulia-losantos May 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I be 😈 with the SQL format? Why is ______________ from so far away?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More like this?

SELECT e.id,
       e.revision,
       e.payload,
       e.modified,
       e.status,
       ps.id      AS project_schema_id,
       ps.name    AS project_schema_name,
       s.name     AS schema_name,
       s.id       AS schema_id,
       s.revision AS schema_revision
FROM   kernel_entity e
       INNER JOIN kernel_projectschema ps
               ON e.projectschema_id = ps.id
       INNER JOIN kernel_schema s
               ON ps.schema_id = s.id
WHERE  e.modified > '%s'
ORDER  BY e.modified ASC; 

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂 🙈

for x, row in enumerate(cursor):
if x >= max_size - 1:
raise StopIteration
yield {key : row[key] for key in row.keys()}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you pretend with this line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you asking why it's not just a list comprehension instead of a generator?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the point of using a generator, but I don't get the point of using a generator with enumerate(cursor) (maybe I need to read again the python docs) that actually it isn't a generator itself. 😕

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think enumerate consumes the whole cursor on one go, but I could be wrong. Only reason for enumerate was to implement the limit functionality, but as you said that should go in the SQL query anyway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, interesting.

@@ -1,17 +0,0 @@
# Copyright (C) 2018 by eHealth Africa : http://www.eHealthAfrica.org
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you removing the file or the license header?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the module's folder structure so the file was removed. Could be a merge error since Fredrik was adding the headers on dev at the same time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also looks like the license headers haven't been applied to empty init.py files

inner join kernel_projectschema ps on e.projectschema_id = ps.id
inner join kernel_schema s on ps.schema_id = s.id
WHERE e.modified > '%s'
ORDER BY e.modified ASC;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you want to use the LIMIT option with the max_size value?

https://www.postgresql.org/docs/9.6/static/queries-limit.html

SELECT select_list
    FROM table_expression
    [LIMIT { number | ALL }] [OFFSET number]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, should have done that.

import aether.saladbar.saladbar.parsers as Parsers
# from saladbar import parsers as Parsers
# from saladbar import salad_handler as salad
import aether.saladbar.saladbar.salad_handler as salad
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you remove the commented imports or are there for a reason?

Copy link
Collaborator

@obdulia-losantos obdulia-losantos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a little bit of cleaning but nothing that cannot be done later if this is very urgent

@shawnsarwar shawnsarwar deleted the fix/schema-issues branch May 4, 2018 10:05
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants