The cleanup could be missed if the request handling code in
app.py:__call__ exits early (due to exception, or due to
one of those early "return"s).
So to make sure the sql session is cleaned up for real,
wrap the whole thing in a try: finally:.
Also wrote a short tool to test if the session is actually
empty. The tool is currently disabled, but ready to be
used.
After converting everything, check what is actually used in
the db. For media_types that are not used, drop all the
media_data tables and remove the migration info.
This switches the whole source code over to use sql instead
of mongodb. It's a pretty easy change, but changes nearly
the complete way things work. Hopefully everythong works!
The JSON fields are really "dumb stuff in here" fields.
They are not intended to get indexed or anything. And they
can get large. For example the exif_all field in one of my
simple tests is nearly 7 kB large. Although VARCHAR might
work, TEXT feels just better as the storage type.
1. No need to drop media_data['exif'], we only have and
want media_data['exif_all'].
2. Use media['_id'] instead of media._id (better not use
dot-notation on mongo objects in such a low level tool).
MediaEntry.media_data.exif_all will contain all the
"clean" EXIF data.
MediaEntry.exif_display_iter() is an iterator that fetches
the most interesting entries for display from that data.
When creating a new media_data row, the new row needs to
know the MediaEntry it is associated with. I have no idea,
why this worked before at all. Maybe some implicit tricks
by sqlalchemy?
So all models are ready when connecting to the db and so
our "db" object has all models listed on it, create a
function to load all models from the media_types, etc. Call
it in setup_database()
Problem: This gives celery warnings, because celery is
imported before being setup properly. No idea how to fix
this now. So media-type loading is excluded from
load_models for now.
As the queries are quite verbose, disable them for now.
Reenabling them should be done in the central logging
config, which is another story for celery and bin/gmg.
Kind of useful to see but... I don't think they're needed, and I'm not
super comfortable with print statements being in migrations. Seems
semi bloated!
The mongosql tool is really dumping directly into the sql
database and is trying not to use too much logic that might
change later.
So this means, it needs to create the migration records on
its own!
So add a bunch of records with version=0.
Searching media by slug is easy on mongo. But doing the
joins in sqlalchemy is not as nice. So created a function
for doing it.
Well, and create the same function for mongo, so that it
also works.
If there is no media_data row for the current media (for
whatever reason, there might be good ones), let
MediaEntry.media_data not raise an exception but just
return None.
The exif display part now handles this by checking whether
.media_data.exif is defined (None has no attribute exif, so
it's undefined, all fine).
Docs:
http://docs.sqlalchemy.org/en/latest/core/engines.html#configuring-logging
So for an application utilizing python logging for real
(and MediaGoblin should) the rule is:
- Don't use echo=True,
- but reconfigure the appropiate loggers' level.
So replaced the echo=True by a line to reconfigure the
appropiate logger to achieve the same effect.
This still dumps whole bloats of SQL queries into the main
log, but at least they're not duped any more.
The name part of a MediaFile is only using a very limited
number of items. Currently things like "original" or
"thumb".
So instead of storing the string on each entry, just store
a short integer referencing the FileKeynames table and have
the appropiate string there.
Using the new check_media_slug_used it is possible to have
one generic generate_slug in the mixin class instead of in
each db class.
In the sql variant self.id is not always set: If the slug
alone would create a dupe the current code decides for "no
slug at all".
In two cases (generating a new slug and editing the slug)
it is nice to know in advance (before the db gets angry)
that the slug is used/free. So created a db utility
function to check for this on mongo and sql:
check_media_slug_used()
The current SQL layout/sqlalchemy strucuture can't detect
whether a slug isn't needed any more and delete it. So
provide a tool function to cleanup unused slugs.
It's currently not hooked to any gmg function!