Things are coming along nicely.
There’s just so much I can handle with programming and software that it makes sense, to me, to try and handle that in advance before pushing the project open. One of the biggest issues I have been encountering is the Wade-Giles to Pinyin conversion. I really didn’t want to dump the bulk of that on the community. I "wasted" a bit of time trying to normalize the Wade-Giles names to correct some OCR issues and some of the values in the novel proper, a number of which are incorrect or inconsistently accented.
And then there is the fuss of so many partial names being used. For example, which Zhang hung his head? Which Cao commanded? Easy enough for us to make sense of in context, but not so much for a program, especially when that program is aware of a whole mess o' Caos. Originally I thought we might extend these names so the program can recognize them properly, but I think we may have a better solution.
Using data in the encyclopedia database, which includes all officers and the chapters in which they appear (and is rather close to complete), I was able to create a pretty stupid complex set of logic. It downloads all their name data and creates a series of filters, starting with complete names and name combinations and weighted by how important each character is to the novel (defined by how many chapters they appear in). It also creates versions of each name with properly accented Wade-Giles, without some accents, and with wild-cards to deal with the difference between the likes of Ts'ao and Ts‘ao. Then, with additional logic which checks for proper name separation (to avoid mixing names up with other proper nouns and ordinary language), it runs through the whole routine and replaces names in the loaded chapter, using only officers who appear in that chapter. After some tuning, it looks like it is working extremely well.
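To make the idea concrete, here is a minimal sketch of that kind of filter in Python. It is not the actual project code: the `OFFICERS` list, its chapter sets, and the function names are all made up for illustration, and it only handles complete names, not the partial-name cases.

```python
import re
import unicodedata

# Hypothetical sample data: (Wade-Giles name, Pinyin name, chapters appeared in).
# The chapter sets are invented for this example, not taken from the database.
OFFICERS = [
    ("Ts‘ao Ts‘ao", "Cao Cao", set(range(1, 11))),
    ("Ts‘ao Jên", "Cao Ren", {5, 6, 7, 8}),
    ("Chang Fei", "Zhang Fei", set(range(1, 9))),
]

APOSTROPHES = "'‘’`ʻ"
APOSTROPHE_CLASS = "['‘’`ʻ]"  # wild-card: any apostrophe-like mark


def strip_accents(text):
    """Return text with combining accents removed (ê -> e, ŭ -> u)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_pattern(wade_giles):
    """Build a regex matching accented, unaccented, and apostrophe variants."""
    parts = []
    for ch in unicodedata.normalize("NFC", wade_giles):
        plain = strip_accents(ch)
        if ch in APOSTROPHES:
            parts.append(APOSTROPHE_CLASS)
        elif ch.isspace():
            parts.append(r"\s+")
        elif plain != ch:
            parts.append("[" + ch + plain + "]")  # accept ê or e
        else:
            parts.append(re.escape(ch))
    # The lookarounds keep a name from matching inside a longer word.
    return re.compile(r"(?<!\w)" + "".join(parts) + r"(?!\w)")


def convert_chapter(text, chapter, officers):
    """Replace Wade-Giles names with Pinyin, using only officers in this chapter."""
    text = unicodedata.normalize("NFC", text)
    present = [o for o in officers if chapter in o[2]]
    # Most important officers (most chapters) first, then longer names first.
    present.sort(key=lambda o: (-len(o[2]), -len(o[0])))
    for wade_giles, pinyin, _chapters in present:
        text = name_pattern(wade_giles).sub(pinyin, text)
    return text


print(convert_chapter("Ts'ao Ts'ao sent Ts‘ao Jen ahead.", 5, OFFICERS))
```

Restricting the list to officers present in the chapter is what keeps the matching manageable: a name that is not in the chapter's cast is simply left alone for later review.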
What’s pretty cool is that I can probably reverse this to convert all names back into proper Wade-Giles, along with all proper nouns once those details are properly documented. It makes me happy to solve a lot of this in advance.
[Attachment: Novel Project Preview, novel_temp.png (105 KiB)]
This is a screenshot of what I’m using to preview and improve pre-processing code and routines, not what the project is planned to look like. Right now it is set to highlight recognized and properly completed quotes (inconsistent quotation usage is an issue) and officer names, along with various other errors it picks up, of which none are shown.
Proper nouns are still a bit of a challenge. I'm thinking, at this point, that I can probably write some additional code which scans through the document to find all other instances of Wade-Giles, with all officer names now protected, and scoops them all up into a haphazard proper nouns database. From there, incorrect entries, or entries which should not be tied to proper nouns, can be handled manually (simply by removing the [Proper Nouns] bracket marker). And, perhaps, with a tool, we can specify one proper noun to be a duplicate of another, merging them with the incorrect entry adopting the language of the correct one; change the proper noun tag, having it update automatically in the novel; and generally start with a lot of progress already made. Seems like it might make sense to have a large database to clean up and narrow down rather than no database to gradually and painstakingly build up.
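A rough sketch of what that scan might look like in Python follows. The heuristic here is my own assumption, not settled design: it treats a capitalized word as Wade-Giles if it has an aspirate mark after ch, k, p, t, ts, or tz, or contains ê, ŭ, or ü. The `protected` set stands in for officer names that were already handled, and hyphenated names are not treated as a single unit.

```python
import re
import unicodedata
from collections import Counter

# Escapes are used so the characters are unambiguous:
# \u00ea = ê, \u016d = ŭ, \u00fc = ü, \u2018/\u2019 = curly quotes, \u02bb = ʻ
ACCENTED = "\u00ea\u016d\u00fc"
WG_LETTERS = "a-z" + ACCENTED
ASPIRATE = "\u2018\u2019'`\u02bb"

# Assumed heuristic for "looks like Wade-Giles"; expect false hits and misses.
WG_WORD = re.compile(
    rf"(?<!\w)[A-Z][{WG_LETTERS}]*"
    rf"(?:(?<=[hkpstzKPT])[{ASPIRATE}](?=[aeiou{ACCENTED}])[{WG_LETTERS}]+"
    rf"|[{ACCENTED}][{WG_LETTERS}]*)"
    rf"(?!\w)"
)


def collect_proper_nouns(text, protected):
    """Count Wade-Giles-looking words that are not already protected."""
    text = unicodedata.normalize("NFC", text)
    counts = Counter()
    for match in WG_WORD.finditer(text):
        word = match.group(0)
        if word not in protected:
            counts[word] += 1
    return counts


sample = "Ts‘ao Ts‘ao marched on Hsich‘uan. Don't forget Ssŭch‘uan and Hsich‘uan."
for word, count in collect_proper_nouns(sample, {"Ts‘ao"}).most_common():
    print(word, count)
```

The counts give a natural starting order for the manual cleanup: frequent entries are most likely real proper nouns, while one-off hits are the first place to look for OCR errors.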
I've been thinking of how to handle odd names for proper nouns. It seems like we would like to be able to specify one while not necessarily always using the same language in the novel. For example, Hsich‘uan seems to be frequently translated as “the [cardinal direction]” in colloquial language, and has overlap with Ssŭch‘uan. But it seems useful to maintain a relationship to the original term, which would allow it to be referenced with a tool-tip and additional information. So maybe we can end up with something like [Xichuan:alternative text] which allows association with the proper noun in the database but also display of any desired text.
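As a sketch of how a `[Name:alternative text]` tag could be read, here is one possible approach in Python. Everything in it is an assumption for illustration: the `render` function, the `nouns` dictionary standing in for the proper nouns database, the sample key `Xichuan` (the Pinyin form of Hsich‘uan), and the choice of an HTML `title` attribute for the tool-tip.

```python
import re
from html import escape

# [Key] or [Key:alternative text]; nested brackets are not supported.
TAG = re.compile(r"\[([^\[\]:]+)(?::([^\[\]]*))?\]")


def render(text, nouns):
    """Replace tags with display text, adding a tool-tip for known nouns."""
    def replace(match):
        key = match.group(1).strip()
        alternative = match.group(2)
        shown = alternative if alternative else key
        info = nouns.get(key)
        if info is None:
            return escape(shown)  # unknown noun: show the text, no tool-tip
        return f'<span title="{escape(info)}">{escape(shown)}</span>'
    return TAG.sub(replace, text)


# Placeholder database entry, not real encyclopedia content.
nouns = {"Xichuan": "placeholder description"}
print(render("He fled to [Xichuan:the west].", nouns))
print(render("He fled to [Xichuan].", nouns))
```

Because the key before the colon is what links to the database, the displayed wording can vary from passage to passage without losing the association.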
It seems like most of the initial concerns can be covered with special handling of names and proper nouns, with most anything else introduced gradually after the project launches.
I've also been pondering how to handle notes. Maybe it makes sense to have the ability to leave community notes and individual notes. That way we can have novel notes which are generic and cover basic information shown in the novel, which anyone can edit and improve, but also the ability for someone to add their own personal note with a much more specific and individual reflection. This form of note should be more unique and would belong to and be attributed to the author.
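One simple way to model the two kinds of notes, purely as a thought experiment in Python (the `Note` class and its fields are hypothetical, and nothing here is decided):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Note:
    chapter: int
    body: str
    author: Optional[str] = None  # None marks a shared community note

    @property
    def is_community(self):
        return self.author is None

    def can_edit(self, user):
        """Anyone may edit a community note; only the author may edit their own."""
        return self.is_community or self.author == user


shared = Note(chapter=1, body="Generic background shown in the novel.")
personal = Note(chapter=1, body="My own reflection.", author="alice")
print(shared.can_edit("bob"), personal.can_edit("bob"), personal.can_edit("alice"))
```

Keeping both kinds in one structure, distinguished only by whether an author is set, would let the same display code handle them while attribution stays attached to personal notes.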