The New Import Script

Started by Lotus
16 replies

Site Administrator
We've implemented a new, multi-part script as part of image maintenance on PonerPics. The script has two functions: first, updating existing "Derpibooru Imported" images on the website, and second, importing more images.

First, the script will use the Derpibooru data dump to import new aliases and updates to tags from Derpibooru. Aliases that conflict with PonerPics aliases by creating infinite circular loops will be dissolved, and a report will be generated for staff to sort out the proper alias. The script will then update the tags on existing "Derpibooru Imported" images to reflect changes made to their counterparts on Derpibooru since their upload. These changes will be recorded in tag history. This does not affect changes made to the tags on these images by users, and tags removed by users will remain removed. The script will also update legacy scores and favorites to reflect changes in the scores on the Derpibooru counterparts. After the first massive update, this will run once per week.
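
For anyone curious how a circular alias can be caught, here is a minimal sketch in Python. It is not the actual import script; the function name and the dictionary layout (alias name mapped to target tag) are assumptions made for illustration. It follows the alias chain from each imported target and sets the alias aside for the staff report if the chain leads back to where it started.

```python
def find_conflicting_aliases(local_aliases, imported_aliases):
    """Split imported aliases into those safe to add and those that would loop.

    Both arguments are hypothetical dicts mapping an alias name to its target tag.
    Assumes the local aliases contain no loops to begin with.
    """
    merged = dict(local_aliases)
    accepted, conflicts = {}, {}
    for source, target in imported_aliases.items():
        # Walk the chain starting at the target; reaching the source again
        # means adding this alias would create a circular loop.
        seen = {source}
        node = target
        while node in merged and node not in seen:
            seen.add(node)
            node = merged[node]
        if node in seen:
            conflicts[source] = target  # left out, listed for staff review
        else:
            # Assumption: a non-circular imported alias simply takes effect.
            merged[source] = target
            accepted[source] = target
    return accepted, conflicts


accepted, conflicts = find_conflicting_aliases(
    {"a": "b"},            # existing local alias: a -> b
    {"b": "a", "c": "a"},  # imported: b -> a would loop, c -> a is fine
)
print(accepted, conflicts)
```

In this example the imported "b" to "a" alias ends up in the conflict list, while "c" to "a" is accepted.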

The second function of the script is to import new images from Derpibooru through the backend. These images will have the image number and upload dates of their Derpibooru counterparts, and the scores and favorites recorded as "legacy scores." As essentially every one of these images has a "porter" counterpart, the script will automatically detect and merge these images, with any scores, favorites and comments transferring over. The image will have the tags of the Derpibooru image, but the tag history of the porter image will transfer, any user-added tag will remain, and any user-removed tag will remain removed. The merger process only affects porter images. This script uses the Derpibooru data dump, and so will run daily, but will not import images in real time. I am hopeful that it can be expanded to work in real time in the future, but that is how it stands for now. Catching up with overdue imports will likely take some time; over a month at current estimates.
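
The tag rule described above (Derpibooru tags as the base, with user additions kept and user removals staying removed) can be sketched roughly as follows. This is only an illustration under assumptions: the function name, the "added"/"removed" labels, and the shape of the history list are hypothetical and not taken from the real script.

```python
def merge_tags(derpibooru_tags, user_history):
    """Combine Derpibooru tags with user edits from the porter image.

    user_history is a hypothetical chronological list of (tag, action)
    pairs, where action is "added" or "removed". The most recent user
    action on a tag wins.
    """
    last_action = {}
    for tag, action in user_history:
        last_action[tag] = action  # later entries overwrite earlier ones

    tags = set(derpibooru_tags)
    for tag, action in last_action.items():
        if action == "added":
            tags.add(tag)       # user-added tags remain
        elif action == "removed":
            tags.discard(tag)   # user-removed tags stay removed
    return tags


result = merge_tags(
    {"safe", "pony"},
    [("oc", "added"), ("pony", "removed")],
)
print(sorted(result))
```

Here "pony" stays off the merged image even though Derpibooru has it, and "oc" is kept even though Derpibooru does not.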

We will also be hiring new staff to help with aliases, duplicates, and general moderation. Further, more clearing of backlogged duplicates will take place once the import has caught up.

Post here for suggestions, questions, to report problems, or for general discussion. DM me or Megalith if you have any interest in joining staff.
Background Pony #8E2C
Retard me posted the thing in the wrong discussion thread. So let's try this again.

Also posting this here as requested in the Fan Site Alternative thread:

I first came across this bug when I was looking for the blood ravens tag in relation to warhammer 40k crossover images.
As you can see here, the tag is logged with 38 pictures, but the search only depicts 37 in total, even with all filters disabled. And if you then look at the tag changes, you see the entry of a pic that was made three days ago, with the picture itself being deleted.
There are a couple more tags where I noticed similar behaviour.

Site Administrator
@Background Pony #8E2C
The image count error was found during testing. I believe the conclusion was that Pupper wrote a script to correct the image count, though presumably it will have to wait until the import is caught up.

As for the tag changes on the deleted image: at this stage, the script is only supposed to be updating changes in tags and legacy scores. Tag changes are noted in tag history, which is why they show up. Those tags were not in the data dump that created that image; they evidently exist in the current data dump and were added when the tags were updated. The image that you linked does not show the actual picture even to moderators. I attempted to manually restore the image, and received an error. I checked the Derpibooru counterpart, and that image was deleted for lack of relevance to MLP.

Because there is no picture there to see or restore, and because the image is marked as “missing or deleted images,” I think that the image was probably deleted on Derpibooru prior to the initial import, and that the image never had an actual uploaded picture on PonerPics; the tag update added previously missing metadata (the tags), and this image is the phantom 38th image counted as having the “blood raven” tag.

What I haven’t conclusively ruled out is that the image previously existed on PonerPics and the new script deleted it because its counterpart on Derpibooru was deleted between the time of the initial import and now. I do not think that is the cause, because the actual image should still be present if it were. I’ll ask Pupper to look into that, but he’s a bit busy now.
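
To illustrate the count mismatch, here is a rough Python sketch of recounting tags while skipping deleted or missing images. This is not Pupper's script; the function name and the record layout (a "tags" list and a "deleted" flag per image) are assumptions for the example.

```python
from collections import Counter


def recount_tags(images):
    """Recompute per-tag image counts, ignoring deleted or missing images.

    Each image is a hypothetical dict with a "tags" list and a "deleted" flag.
    """
    counts = Counter()
    for image in images:
        if image["deleted"]:
            continue  # phantom pages should not count toward a tag's total
        counts.update(set(image["tags"]))  # set() guards against repeated tags
    return counts


images = [
    {"tags": ["blood ravens", "safe"], "deleted": False},
    {"tags": ["blood ravens"], "deleted": True},  # empty page with tags only
]
print(recount_tags(images)["blood ravens"])
```

With the deleted page skipped, the tag's count matches what a search can actually show.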
Background Pony #8E2C
Okay, maybe I picked an unlucky example. That's entirely possible. It doesn't appear to be a singular issue though. An anon posted a relevant picture in the thread.

Site Administrator
@Background Pony #8E2C
Alright, I've started looking into the deleted images with tag changes, using basically the same method as the poster on the 4chan thread (looking up tag change history for the "safe" tag). None of the deleted images has an actual image that shows up to moderators; all give error messages when I attempt to restore them; all were uploaded through "Derpi Importer — Missing and Deleted Images" rather than our usual Derpibooru Importer; and all of them are six years old (that's just where the tag-changing script happens to be at this moment). These are empty pages with no images. All of this tells me that these images had already been deleted from Derpibooru when we made our mass import three years ago, and that they were not among those backed up on Rome's archive, from which we took tens of thousands of images deleted from Derpibooru. I went to six of these images' counterparts on Derpibooru. All of them were deleted. Four were duplicates that redirected to their merged images (unfortunately ours do not redirect), one was deleted because it was a full-page IDW scan, and one was deleted per an artist takedown request.

Everything is telling me that the import script is adding tags to pages representing images that were deleted on Derpibooru prior to Rome Silvanius archiving images. These images are empty, imageless pages that had no tags previously but do now.
Background Pony #43DC
It seems watched tags don't update to new aliases added in the last import. The new aliases must be manually added to your watched list in order to appear in a my:watched search.
Background Pony #E10D
Did the tag updating stop? I don't see any updates from Derpi Imported that are more recent than 2 months ago. Also some guy named Shutyourhole is changing the explicit tags on eqg images to safe.
Background Pony #43DC
Problem with the imported tag aliases: "pony pussy" is supposed to be aliased to "anatomically correct", however it is still present on >>6708531. The image will not appear in searches for anatomically correct, nor can pony pussy be removed and replaced with anatomically correct.
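
A guess at what a fix would need to do, sketched in Python: run every stored tag list (on images, and on watched lists too) through the alias table so that old names are replaced by their targets. The function name and data layout are hypothetical, and the tag names in the example are placeholders; this is not how the site actually stores aliases.

```python
def resolve_aliases(tags, aliases):
    """Replace each aliased tag with its final target.

    aliases is a hypothetical dict mapping an alias name to its target tag.
    """
    resolved = set()
    for tag in tags:
        seen = set()
        # Follow chained aliases; the seen set stops a circular alias
        # from looping forever.
        while tag in aliases and tag not in seen:
            seen.add(tag)
            tag = aliases[tag]
        resolved.add(tag)
    return resolved


aliases = {"old name": "new name"}
print(sorted(resolve_aliases({"old name", "safe"}, aliases)))
```

Applied to an image's tags, the old name would disappear and the image would show up under the target tag; applied to a watched list, a my:watched search would pick up the new name without manual editing.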
