Daminion Forums: Tool to Check Consistency of Tags - Daminion Forums


User is offline Juha 

  • Silver Member
  • PipPipPipPipPipPipPipPip
  • Group: Members
  • Posts: 329
  • Joined: 08-December 11

Posted 19 July 2017 - 05:13 PM (#1)

Tool to Check Consistency of Tags


Hi!

It has been discussed in this forum (e.g. Group behaviour) that there should be an option to "force" the tags onto all grouped or linked images. As the process is manual today, it's prone to errors. This started to annoy me, and as I had some free time, I decided to refresh my coding skills and wrote a small Python program that traverses the Daminion catalog and checks for inconsistencies. This is (and will probably stay) a "nerd version": you need to be familiar with the command line and with installing software.

Install Python 3 (an earlier version should work as well; I have been running on 3.3) and install the Psycopg2 package into Python. You can then run the attached code from the command line (or from a Python IDE such as PyCharm). Rename the file from DamScan.txt to DamScan.py first.
C:> Python DamScan.py [options]

usage: DamScan.py [-h] [-c DBNAME] [-s SERVER] [-p PORT] [-u USER] [-l] [-v]
                  [--version]
optional arguments:
  -h, --help            show this help message and exit
  -c DBNAME, --catalog DBNAME
                        Daminion catalog name [NetCatalog]
  -s SERVER, --server SERVER
                        Postgres server [localhost]
  -p PORT, --port PORT  Postgres server port [5432]
  -u USER, --user USER  Postgres user/password [postgres/postgres]
  -v, --verbose         verbose output
  --version             Display version information and exit.

The options should be self-evident, and the defaults match the default Daminion configuration.
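For readers who want to see how such a command line might be declared, here is a minimal argparse sketch that mirrors the usage text above. It is not the actual DamScan code: the function names `build_parser` and `split_credentials` are made up for illustration, the `-l` and `--version` options are left out because this post does not describe them, and splitting `-u` at the first slash is an assumption based on the `user/password` hint in the help text.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the usage text quoted in the post."""
    parser = argparse.ArgumentParser(prog="DamScan.py")
    parser.add_argument("-c", "--catalog", dest="dbname", metavar="DBNAME",
                        default="NetCatalog",
                        help="Daminion catalog name [NetCatalog]")
    parser.add_argument("-s", "--server", default="localhost",
                        help="Postgres server [localhost]")
    parser.add_argument("-p", "--port", type=int, default=5432,
                        help="Postgres server port [5432]")
    parser.add_argument("-u", "--user", default="postgres/postgres",
                        help="Postgres user/password [postgres/postgres]")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="verbose output")
    return parser


def split_credentials(value):
    """Split a 'user/password' string at the first slash (assumed format)."""
    user, _, password = value.partition("/")
    return user, password


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["-c", "NetCatalog", "-v"])
print(args.dbname, args.server, args.port, split_credentials(args.user))
```

Running with no options yields the defaults shown in square brackets in the help text.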

Currently the code checks pairwise linked items (grouped items are not yet supported) and reports any differences. The program reports the differing files. For single-value tags (e.g. Place) it reports both values; for multi-value tags (e.g. Keywords) it reports the values that are missing from the first file. The output is tab-delimited, so you can paste it into Excel for further processing. The program is read-only; it doesn't change the database contents.

Currently the following tags are checked:
  • Place (single)
  • GPS (single)
  • Event (single)
  • Keywords (multi)
  • Categories (multi)
  • People (multi)

An example output:
ImageA	Dir	ImageB	Tag	ValueA/Missing A		ValueB
IMG_8090.jpg	<>	IMG_8090.CR2	GPS	'44.1448N 3.09918E 256.0m'	<>	'44.1448N 3.09918E 0.0m'
IMG_4115.jpg	>	IMG_4115.CR2	Keywords	'Reflections'
IMG_1806.tif	<	IMG_1806-09.tif	Categories	'Other\Panorama'

and an interpretation
  • IMG_8090 the GPS co-ordinates (altitude) differ between JPG and CR2
  • IMG_4115 the CR2 image has keyword 'Reflections' that is missing from the JPG
  • IMG_1806.tif is missing category 'Other\Panorama' that exists in IMG_1806-09.tif

The symbols < and > just show whether the relation between the images is "linked to" or "linked from". In both cases (even though the notation can be misleading) ImageB contains tag values that do not exist in ImageA.
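The reporting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DamScan implementation: the function `compare_tags`, the dictionary layout (tag name mapped to a string or a set of strings) and the `direction` parameter are all hypothetical.

```python
SINGLE_VALUE_TAGS = ("Place", "GPS", "Event")
MULTI_VALUE_TAGS = ("Keywords", "Categories", "People")


def quote(value):
    """Wrap a value in single quotes, as in the example output."""
    return "'{}'".format(value)


def compare_tags(name_a, tags_a, name_b, tags_b, direction="<"):
    """Return tab-delimited report rows for two linked items.

    tags_a / tags_b map a tag name to a string (single-value tags)
    or to a set of strings (multi-value tags).
    """
    rows = []
    for tag in SINGLE_VALUE_TAGS:
        value_a, value_b = tags_a.get(tag), tags_b.get(tag)
        if value_a != value_b:
            # Single-value tags: report both values.
            rows.append("\t".join(
                [name_a, "<>", name_b, tag, quote(value_a), "<>", quote(value_b)]))
    for tag in MULTI_VALUE_TAGS:
        # Multi-value tags: report what ImageB has and ImageA lacks.
        missing = set(tags_b.get(tag, ())) - set(tags_a.get(tag, ()))
        for value in sorted(missing):
            rows.append("\t".join([name_a, direction, name_b, tag, quote(value)]))
    return rows


jpg = {"GPS": "44.1448N 3.09918E 256.0m", "Keywords": {"France"}}
raw = {"GPS": "44.1448N 3.09918E 0.0m", "Keywords": {"France", "Reflections"}}
for row in compare_tags("IMG_8090.jpg", jpg, "IMG_8090.CR2", raw, ">"):
    print(row)
```

The sample data combines two of the cases from the example output into one pair of files, purely to keep the demonstration short.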

I'm thinking of the following improvements:
  • an option to do the analysis based on stacking instead of linking
  • an option to define which tags are analyzed (now all are analyzed)
  • option to list the full path and/or the Daminion ID
  • write better installation instructions

If you need some other tags to be analyzed or have improvement ideas or problems, drop a note. (GUI is not in my plans. :pardon: )

-Juha

The normal disclaimer applies: use it at your own risk, there is no warranty, etc. Also, I don't take any responsibility if your lover leaves you because you are just fixing the tagging in your catalog. :gamer2:

Attached File(s)


1


User is offline Juha 


Posted 19 July 2017 - 05:29 PM (#2)

As a reference, I have 20.500 items in my active catalog and I got roughly 1.600 messages. :negative:

-Juha
0


User is offline Juha 


Posted 20 July 2017 - 11:00 AM (#3)

Hi!

Here are slightly more detailed installation instructions.

After you have downloaded the Python package, right-click the package and select "Run as administrator". In the installation dialog, select Customized installation. In the customized configuration, tick the option to include Python in the PATH and select installation for all users. The other options can be left at their defaults.

After installation, start a command window (you may need to do this as an admin as well, because the Postgres support package will be installed in the Program Files directory).
C:> python -m pip install -U pip setuptools
C:> python -m pip install psycopg2
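To confirm that the package landed in the Python installation you intend to use, a quick check along these lines may help. It is just a sketch using the standard library; the helper name `module_available` is made up for illustration.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(name) is not None


print("Python", sys.version.split()[0])
print("psycopg2 installed:", module_available("psycopg2"))
```

If the second line prints False, pip probably installed the package into a different Python installation than the one on your PATH.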

Close the elevated command window and start a normal command window. Now you can run my tool with the command (include the path where you downloaded my Python module if you are not in the same folder):
python \DIR\DamScan.py [options]


I have attached a new version that adds support for groups. Invoke the program with the option -g (or --group) to switch from the default checking based on links to checking based on groups (stacks). I also changed the separator in hierarchical stacks to '|' for better readability.

The checking is currently limited to images and RAWs (as defined in Daminion Media Format).

-Juha

Attached File(s)


1


User is offline Wilfried 

  • Bronze Member
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 200
  • Joined: 08-June 14

Posted 22 July 2017 - 02:20 PM (#4)

Great idea Juha!

I assume it works only for the server version of Daminion, since the stand-alone version uses a different database, correct?
Wilfried
0


User is offline Juha 


Posted 22 July 2017 - 06:43 PM (#5)

Hi Wilfried,

You were correct.

It was very straightforward to add support for the SQLite database (= standalone): only the call to open the database is different. For the standalone version, use the options
-l, --sqlite            use sqlite database (standalone) instead of server
-c DB, --catalog DB     relative pathname of the local catalog (with .dmc)

Example command line, assuming you have the Python code in your home directory and the Daminion catalog in Pictures:
C:\Users\user> python DamScan.py -l -c Pictures\DaminionCatalog.dmc
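The remark that only the open call differs follows from both drivers implementing the Python DB-API. A minimal sketch of what such a switch could look like is below; it is an assumption about the general approach, not the DamScan source. The function name `open_catalog` and its parameters are hypothetical, and opening the SQLite file read-only through a URI is a choice made here (it needs Python 3.4 or later).

```python
import sqlite3
from pathlib import Path


def open_catalog(catalog, use_sqlite=False, server="localhost", port=5432,
                 user="postgres", password="postgres"):
    """Return a DB-API connection to a standalone or a server catalog."""
    if use_sqlite:
        # Standalone: the catalog is a local SQLite file (the .dmc file).
        # mode=ro opens it read-only and refuses to create a missing file.
        uri = Path(catalog).resolve().as_uri() + "?mode=ro"
        return sqlite3.connect(uri, uri=True)
    # Server: imported lazily so the standalone path works without psycopg2.
    import psycopg2
    return psycopg2.connect(dbname=catalog, host=server, port=port,
                            user=user, password=password)


# Usage (not executed here, since it needs a real catalog):
#   conn = open_catalog(r"Pictures\DaminionCatalog.dmc", use_sqlite=True)
#   conn = open_catalog("NetCatalog")
```

After the connection is open, cursors and queries work the same way for both back ends, apart from SQL dialect and parameter-placeholder differences.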

The new version is attached.

-Juha

PS. I had only very limited test data for the local catalog.

Attached File(s)


0


User is offline Wilfried 


Posted 22 July 2017 - 09:46 PM (#6)

Thanks a lot Juha. I will give it a try, when time allows.
Wilfried
0


User is offline Wilfried 


Posted 23 July 2017 - 02:05 PM (#7)

View PostJuha, on 19 July 2017 - 05:29 PM, said:

As a reference, I have 20.500 items in my active catalog and I got roughly 1.600 messages

Just for a quick estimate: how much time did you need to scan those 20,500 items? After a few small hurdles I got it to work, and I am currently waiting for 156,069 elements to be scanned ....

It seems to me that if you do not have any linked items and forget to specify -g, you will get this error message:

C:\Users\User>python DamScan.py -l -c Pictures\DaminionCatalogAlle.dmc
ImageA  Dir 	ImageB  Tag 	ValueA/Missing A                ValueB
Traceback (most recent call last):
  File "DamScan.py", line 392, in <module>
    main()
  File "DamScan.py", line 386, in main
    catalog.ScanCatalog()
  File "DamScan.py", line 320, in ScanCatalog
    while self.NextImage():
  File "DamScan.py", line 312, in NextImage
    self.__image = DamImage(self, row[0])
  File "DamScan.py", line 70, in __init__
    self.ImageName = row[0]
TypeError: 'NoneType' object is not subscriptable
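This TypeError means the code indexed a value that was None. With Python DB-API cursors, `fetchone()` returns None when a query matches no row, so a lookup for an id that is not in the table would produce exactly this message. Whether that is what happens inside DamScan is only a guess; the sketch below uses a made-up table and column name to show the pattern and a guard against it.

```python
import sqlite3


def fetch_image_name(conn, image_id):
    """Return the file name for image_id, or None if no such row exists."""
    # The table and column names are hypothetical, chosen for this example.
    row = conn.execute(
        "SELECT filename FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        # fetchone() found nothing; row[0] here would raise
        # "TypeError: 'NoneType' object is not subscriptable".
        return None
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, filename TEXT)")
conn.execute("INSERT INTO images VALUES (1, '030_27.jpg')")
print(fetch_image_name(conn, 1))
print(fetch_image_name(conn, 99))
```

The caller can then skip or report the missing item instead of crashing.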



The first and so far only mismatch is this:

C:\Users\User>python DamScan.py -g -l -c Pictures\DaminionCatalogAlle.dmc
ImageA  Dir 	ImageB  Tag 	ValueA/Missing A                ValueB
030_27.jpg      <   	120803_8608_WBL_A55.JPG Categories      'Urlaub'


While the finding is correct for "030_27.jpg", the image "120803_8608_WBL_A55.JPG" is not a member of any group. However, I cannot rule out that it was in the past. One thought: while most of my file names are unique (though pairs with the same name can exist in different folders), some older ones, such as 030_27.jpg, are not. Could this possibly confuse your program?

While I was writing this, the scan ended, but I am not sure it really scanned the entire database, since the result is this:
C:\Users\User>python DamScan.py -g -l -c Pictures\DaminionCatalogAlle.dmc
ImageA  Dir 	ImageB  Tag 	ValueA/Missing A                ValueB
030_27.jpg      <   	120803_8608_WBL_A55.JPG Categories      'Urlaub'
121229_0839_WBL_A55.JPG <   	160901_0769_WBL_A77.JPG Keywords        'Via Loreto'
121229_0839_WBL_A55.JPG <   	160901_0769_WBL_A77.JPG Categories      'Kurioses'
Traceback (most recent call last):
  File "DamScan.py", line 392, in <module>
    main()
  File "DamScan.py", line 386, in main
    catalog.ScanCatalog()
  File "DamScan.py", line 324, in ScanCatalog
    FromList = self.__image.LinkedFrom()
  File "DamScan.py", line 170, in LinkedFrom
    return self.__bottomItems()
  File "DamScan.py", line 157, in __bottomItems
    img = DamImage(self.__db, r[0])
  File "DamScan.py", line 70, in __init__
    self.ImageName = row[0]
TypeError: 'NoneType' object is not subscriptable


Similar to the first result above, ImageA shows a correct mismatch, ImageB in the same line is not related to it.

The third finding in this example shows the same pair of file names as the second, but Categories 'Kurioses' does not appear in any of those and apparently belongs to a completely different pair.

My intention is to find all grouped items (not only those with mismatching tags), and I am hoping for a small modification of the code to do that.

In any case, thanks a lot for your effort, Juha.

Wilfried
0


User is offline Juha 


Posted 23 July 2017 - 07:16 PM (#8)

Hi Wilfried and thank you for your comments,

I need to look into why the program terminates abnormally in your cases even when the -g option is specified. I have sent you a PM so that we can debug the issues.

The Daminion database has some ghost entries from deleted files, but those entries should be flagged as deleted, and my program ignores them.

The -v/--verbose option prints all items, but unfortunately it prints a line for each tag type plus some other information, so the output is too cluttered. I will take a look and see how I can print only the hierarchy without comparing the tags.

-Juha
0


User is offline Wilfried 


Posted 23 July 2017 - 07:43 PM (#9)

Thanks Juha, response to PM is on the way ...

View PostJuha, on 23 July 2017 - 07:16 PM, said:

The -v/--verbose option prints all items, but unfortunately it prints a line for each tag type plus some other information, so the output is too cluttered. I will take a look and see how I can print only the hierarchy without comparing the tags.


I also tried the verbose option and found that it crashes at item 49530. Possibly a counter overflow?


48385 (49529, '120716_7968_WBL_A55.JPG', 0)
48386 (49530, '120716_7969_WBL_A55.JPG', 0)
Traceback (most recent call last):
  File "DamScan.py", line 392, in <module>
    main()
  File "DamScan.py", line 386, in main
    catalog.ScanCatalog()
  File "DamScan.py", line 324, in ScanCatalog
    FromList = self.__image.LinkedFrom()
  File "DamScan.py", line 170, in LinkedFrom
    return self.__bottomItems()
  File "DamScan.py", line 157, in __bottomItems
    img = DamImage(self.__db, r[0])
  File "DamScan.py", line 70, in __init__
    self.ImageName = row[0]
TypeError: 'NoneType' object is not subscriptable


Wilfried
0


User is offline Juha 


Posted 23 July 2017 - 08:24 PM (#10)

Hi!

It looks more like a memory problem, because I got up to 76559 items before the crash. I'll take a look at this.

-Juha
0


User is offline Wilfried 


Posted 23 July 2017 - 08:33 PM (#11)

This is certainly something specific to each computer, but I should have plenty of memory (8 GB on Windows 10 Pro 64-bit), which is never completely used. I was watching the memory usage of Python, and it never went beyond 11 MB, if I remember correctly.

Is there any option to limit memory usage? Some environment variable or something similar?

Wilfried
0


User is offline Wilfried 


Posted 24 July 2017 - 11:56 AM (#12)

View PostJuha, on 23 July 2017 - 07:16 PM, said:

... With -v/--verbose option will print all items, but unfortunately it prints a line for each tag ....


Juha, I suggest the following change at line 311 to make the verbose option more useful:

print("\r", self.__counter, row, end="")


Even though each iteration is printed, each line overwrites the previous one. The two additional parameters are "\r" => carriage return and end="" => no line feed at the end of the line. That way you will see only the running counts and file names, without the screen filling up and scrolling.
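A self-contained version of the same idea, for anyone who wants to try it outside DamScan. The helper name `show_progress` and the sample file names are made up; `flush=True` is added here so the line appears immediately even when output is buffered.

```python
import sys


def show_progress(counter, name, stream=sys.stdout):
    """Overwrite the current console line with a running counter."""
    # "\r" moves the cursor back to column 0; end="" suppresses the newline.
    print("\r", counter, name, end="", file=stream, flush=True)


for i, name in enumerate(["a.jpg", "b.jpg", "c.jpg"], start=1):
    show_progress(i, name)
print()  # finish with a newline so the prompt starts on a fresh line
```

One caveat: if a new line is shorter than the previous one, the tail of the old text stays visible, so padding the output to a fixed width can be useful.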
Wilfried
0


User is offline Juha 


Posted 25 July 2017 - 08:18 PM (#13)

Thank you, Wilfried, for helping to debug the program. This is now what I can call a beta release.

The updated options are
usage: DamScan.py [-h] [-c DBNAME] [-s SERVER] [-p PORT] [-u USER] [-g] [-l]
                  [-f] [-i] [-v] [--version]

Search inconcistent tags from a Daminion database.

optional arguments:
  -h, --help            show this help message and exit
  -c DBNAME, --catalog DBNAME
                        Daminion catalog name [NetCatalog]
  -s SERVER, --server SERVER
                        Postgres server [localhost]
  -p PORT, --port PORT  Postgres server port [5432]
  -u USER, --user USER  Postgres user/password [postgres/postgres]
  -g, --group           Use groups/stacks instead of image links
  -l, --sqlite          Use Sqlite (= standalone) instead of Postgresql (=server)
  -f, --fullpath        Print full directory path and not just file name
  -i, --id              Print database id after the filename
  -v, --verbose         verbose output
  --version             Display version information and exit.

If you are using the standalone version, don't forget to add .dmc to the catalog name. Use format -c=... if you are not in the same folder as the catalog.
python PycharmProjects\DamScan\DamScan.py -v -l -g -c="Pictures\test - Copy.dmc"

If you have questions or comments, write to the forum or send a PM.
-Juha

Attached File(s)


0


User is offline Juha 


Posted 30 July 2017 - 08:24 AM (#14)

Daminion allows you to link or group associated items together, but there are no built-in tools for checking the consistency of the metadata for the linked or grouped items. DamScan.py solves this problem and reports inconsistencies in metadata for Daminion server and standalone catalogs.

Many thanks to Wilfried and Uwe for testing my program and commenting on the documentation. Here is what I can call the first official version of the program. You need to rename DamScan.txt to DamScan.py after downloading. See detailed instructions and options in the manual. The program is also available from GitHub.

-Juha

Attached File(s)


1


User is offline Juha 


Posted 02 August 2017 - 07:58 AM (#15)

Hi!

When importing the output file into Excel, you have to select File origin: 65001 : Unicode (UTF-8) at Step 1 of the import wizard. This will import the accented and diacritic letters correctly.
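If you would rather post-process the output in Python than in Excel, the same encoding point applies: open the file as UTF-8. A minimal sketch, assuming the output was saved to a file; the function name `read_report` and the file name in the usage comment are hypothetical.

```python
import csv


def read_report(path):
    """Read a tab-delimited report saved as UTF-8 into a list of rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps the single-quoted values exactly as written.
        return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


# Usage (not executed here, since it needs a saved report):
#   for row in read_report("report.txt"):
#       print(row[0], row[3])
```

Each row is a list of columns in the order shown in the header line (ImageA, Dir, ImageB, Tag, and the values).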

There is also an updated version on GitHub. It doesn't contain any new features, just bug fixes for a few exceptional cases.

-Juha
0


User is offline Juha 


Posted 10 August 2017 - 06:50 PM (#16)

Hi!

A new release for those who have been using the tool and who don't want "valid" differences between grouped or linked items to be reported.

Quote from the documentation
The option -a specifies a configuration file that contains acknowledged differences between linked or grouped media items. The differences listed in this file are excluded from the output.
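To illustrate the idea of excluding acknowledged differences, here is a small sketch. The file format used here (one `ImageA<TAB>ImageB<TAB>Tag` triple per line, with `#` comments) is invented for the example and is not the format DamScan expects; see the DamScan documentation for the real one. The function names are hypothetical as well.

```python
def load_acknowledged(lines):
    """Parse 'ImageA<TAB>ImageB<TAB>Tag' lines into a set of tuples.

    This line format is an assumption made for the example only.
    """
    acknowledged = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        acknowledged.add(tuple(line.split("\t")))
    return acknowledged


def filter_report(rows, acknowledged):
    """Drop report rows whose (ImageA, ImageB, Tag) triple is acknowledged."""
    return [row for row in rows if (row[0], row[2], row[3]) not in acknowledged]


ack = load_acknowledged(["# altitude differs on purpose",
                         "IMG_8090.jpg\tIMG_8090.CR2\tGPS"])
report = [
    ["IMG_8090.jpg", "<>", "IMG_8090.CR2", "GPS", "'256.0m'", "<>", "'0.0m'"],
    ["IMG_4115.jpg", ">", "IMG_4115.CR2", "Keywords", "'Reflections'"],
]
for row in filter_report(report, ack):
    print("\t".join(row))
```

Only the Keywords row is printed, because the GPS difference for IMG_8090 is listed as acknowledged.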

The new version and the updated documentation are available from GitHub.

-Juha
0


User is offline Juha 


Posted 08 September 2017 - 06:48 PM (#17)

Hi!

A new version of the tool is available in GitHub. Now it's possible to save the parameters in an INI file. Collections can now also be compared and then there are some bug fixes and performance improvements.

-Juha
0


User is offline Juha 


Posted 27 September 2017 - 06:34 AM (#18)

Hi,

Added support for "Title", "Description" and "Comments". The latest version and documentation is available in GitHub.

-Juha
0

