Identifying place names in Ulysses #27

Open
cderven opened this issue Mar 26, 2017 · 5 comments

@cderven

cderven commented Mar 26, 2017

Tagging the corpus to identify place names might form an interesting parallel to some of the work that you’ve been doing to date. Would others see it as sufficiently interesting to identify toponyms, provide some basic geocoding, and then possibly link the locations to external sources like GeoNames?

I’m thinking of the TEI convention of using <placeName> to identify toponyms and <place> to contain data about locations, and then linking the two using ids. So, taking an example from Wandering Rocks:

<listPlace>
...
  <place xml:id="wrL19" type="route">
    <placeName>Aldborough House</placeName>
    <location>
      <settlement>Dublin</settlement>
      <country>Ireland</country>
      <geo>53.39979659999999, -6.2435338</geo>
    </location>
  </place>
</listPlace>

and

<p> <lb n="100083"/> Near <placeName ref="#wrL19">Aldborough house</placeName> Father Conmee thought of that spendthrift
<lb n="100084"/>nobleman. And now it was an office or something. </p>

<place> can also carry a type attribute, which in this example is route. Is this useful?

Using <placeName> would allow toponym identification, and <place> would contain the data used for geocoding.
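To make the linking concrete, here is a minimal Python sketch of how a <placeName> mention could be resolved to its <place> record through the id. It assumes a simplified fragment with no TEI namespace (the real files in the repository will most likely declare one), and the function names index_places and resolve_mentions are hypothetical, made up for illustration:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified fragment (assumption: no TEI namespace, a single <place>).
SAMPLE = """
<TEI>
  <listPlace>
    <place xml:id="wrL19" type="route">
      <placeName>Aldborough House</placeName>
      <location>
        <settlement>Dublin</settlement>
        <country>Ireland</country>
        <geo>53.39979659999999, -6.2435338</geo>
      </location>
    </place>
  </listPlace>
  <p><lb n="100083"/>Near <placeName ref="#wrL19">Aldborough house</placeName>
  Father Conmee thought of that spendthrift <lb n="100084"/>nobleman.</p>
</TEI>
"""


def parse_geo(text):
    """Split a <geo> value into (lat, lon); accepts comma or whitespace separators."""
    lat, lon = text.replace(",", " ").split()
    return float(lat), float(lon)


def index_places(root):
    """Build a lookup from xml:id to the data held in each <place>."""
    places = {}
    for place in root.iter("place"):
        places[place.get(XML_ID)] = {
            "name": place.findtext("placeName"),
            "type": place.get("type"),
            "geo": parse_geo(place.findtext("location/geo")),
        }
    return places


def resolve_mentions(root, places):
    """Pair each in-text <placeName ref="#..."> with its <place> record (or None)."""
    mentions = []
    for mention in root.iter("placeName"):
        ref = mention.get("ref")
        if ref is None:  # the <placeName> inside <place> carries no ref
            continue
        mentions.append((mention.text, places.get(ref.lstrip("#"))))
    return mentions


root = ET.fromstring(SAMPLE)
places = index_places(root)
mentions = resolve_mentions(root, places)
```

The point of the split is that the text only ever carries a pointer, so coordinates or external identifiers can be corrected in one place without touching the transcription.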

Ronan alerted me also to this very interesting project: https://muziejus.github.io/wandering-rocks/.

@yellwork
Collaborator

Sorry it took me a few days to chime in here, Caleb, but this is a terrific idea. It would be incredible to get some geotagging into the edition. Have you a sense of anyone who might have compiled a dossier of the place names (if not the <placeName>s) that are mentioned in the book? As always, I’m thinking how great it would be if we could automate some of this labour – or take advantage of existing scholarship on precisely this topic. I know the Gifford and Slote annotations, for example, highlight and locate a good number of the place names…

@cderven
Author

cderven commented Apr 7, 2017

I've taken even longer to respond to you, Ronan! I haven't come across any full gazetteers for Ulysses. There may be a variety of potential options here though.

  1. Existing annotations like Gifford's or Slote's are definitely viable tools.
  2. Different geotagging tools (Named-Entity Recognition software, etc.) certainly allow partial automation of the process. There are issues around accuracy and precision and a wide divergence of opinion around their appropriateness for literary corpora but I think they're a useful first step.
  3. Crowdsourcing, which I see has been mentioned in issue #9, Retagging Joyce’s dialogue dash.

From my experience with past projects, some combination of automated and manual processes seems to work. I've encoded place names in Wandering Rocks using the convention above, which I would be happy to merge with the file in the repository, if that's not jumping the gun. I think there may also be room for a discussion about how you want to model geo elements in the corpus.
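As a rough illustration of what the automated half of that combination might look like, here is a small Python sketch. Note that it is plain gazetteer lookup, not Named-Entity Recognition: it only wraps names it already knows about, and the output would still need manual review. The function name tag_place_names and the one-entry gazetteer are hypothetical; only the id wrL19 comes from the example above.

```python
import re
from xml.sax.saxutils import escape


def tag_place_names(text, gazetteer):
    """Wrap known place names in <placeName ref="#id">...</placeName>.

    `gazetteer` maps a place name to the xml:id of its <place> record.
    Matching is case-insensitive and prefers the longest name at each position.
    """
    if not gazetteer:
        return escape(text)
    names = sorted(gazetteer, key=len, reverse=True)  # longest names first
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    lookup = {name.lower(): place_id for name, place_id in gazetteer.items()}

    out, pos = [], 0
    for match in pattern.finditer(text):
        out.append(escape(text[pos:match.start()]))  # untouched text, XML-escaped
        place_id = lookup[match.group(0).lower()]
        out.append(
            f'<placeName ref="#{place_id}">{escape(match.group(0))}</placeName>'
        )
        pos = match.end()
    out.append(escape(text[pos:]))
    return "".join(out)


# Hypothetical one-entry gazetteer, e.g. compiled by hand from existing annotations.
GAZETTEER = {"Aldborough House": "wrL19"}

line = "Near Aldborough house Father Conmee thought of that spendthrift"
tagged = tag_place_names(line, GAZETTEER)
```

A lookup like this will miss anything not in the list and will happily tag a name wherever it occurs, which is exactly why the manual pass is still needed.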

@yellwork
Collaborator

yellwork commented May 5, 2017

These all sound like very promising suggestions, Caleb. (He says four weeks later.) Slote/Gifford certainly catch a lot, but the challenge would be wading through the annotations to find the location-specific ones. I’ve no issue with us using named-entity recognition software and geotagging place names. One question I’d have is how the mentions of place names are distinguished. I doubt that everything is just flattened, such that the ‘Dublin’ in ‘The Rocky Road to Dublin’ sung in ‘Nestor’ is treated the same as ‘the Ards of Down’ just recalled by Deasy or the mention of ‘Sandymount Strand’ as the setting of ‘Proteus’. (Or is that the work of separate encoding?)

I’d certainly be keen to see your encoded place names be merged into the WR file in the repository. Fire ahead!

@cderven
Author

cderven commented May 11, 2017

There's certainly a need for a strong typology of place to disambiguate these different types of mentions, Ronan. With the small piece of work that I did, I used the model developed for the Literary Atlas of Europe.
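One way such a typology could be carried in the markup is a type attribute on each mention, so that mentions can be filtered or counted by kind. The Python sketch below assumes that approach. The labels (title, recalled, setting), the ids, and the function name mentions_by_type are all hypothetical placeholders; they are not the categories of the Literary Atlas of Europe model, and the fragment is made up rather than quoted from the edition.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up fragment: three mentions, each with a hypothetical type label.
SAMPLE = """
<div>
  <seg><placeName ref="#dublin" type="title">Dublin</placeName></seg>
  <seg><placeName ref="#ards" type="recalled">Ards of Down</placeName></seg>
  <seg><placeName ref="#sandymount" type="setting">Sandymount Strand</placeName></seg>
</div>
"""


def mentions_by_type(root):
    """Group in-text mentions as (ref, text) pairs under their type label."""
    groups = defaultdict(list)
    for mention in root.iter("placeName"):
        if mention.get("ref") is None:  # skip names inside <place> records
            continue
        label = mention.get("type", "untyped")
        groups[label].append((mention.get("ref"), mention.text))
    return dict(groups)


groups = mentions_by_type(ET.fromstring(SAMPLE))
settings_only = groups.get("setting", [])
```

With something like this in place, a map of the episode could plot only the mentions typed as settings and leave song titles and recollections out, or show them in a different layer.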

I'll upload the work that I've done (probably next week), which may be a good starting point for a discussion of typology, modelling, etc.

@JonathanReeve
Member

Hi Caleb,

This idea is great. You might be able to use Moacir's geolocations from his Wandering Rocks project, which look like they're up here. He says that he's noticed a few problems in Gifford that he's corrected in the data there.

Feel free to push to GitHub as you're working--no need to upload it all at once. Then send a pull request (instructions here) whenever you're at a stopping point.

I think this will be a great contribution to this edition.
