Bugs import adds duplicate entries in taxa tree master #52
Sample duplicate data:
select taxon_id, t.date_updated, author_id, genus_id, author_name, genus_name, species, bt.bugs_trace_id
from tbl_taxa_tree_master t
join tbl_taxa_tree_genera using (genus_id)
join tbl_taxa_tree_authors using (author_id)
left join bugs_import.bugs_trace as bt
on bt.sead_table = 'tbl_taxa_tree_master'
and bt.sead_reference_id = taxon_id
where author_name = '(L.)'
and genus_name = 'Elaphrus'
and species = 'riparius'
Imported data seems to be assigned to the existing taxon:
select *
from tbl_abundances
join tbl_analysis_entities using (analysis_entity_id)
join tbl_datasets using (dataset_id)
where taxon_id in (28965, 40204)
No items are assigned to taxon 40204.
Same data from BugsCEP:
select *
from "INDEX"
where "AUTHORITY" = '(L.)'
and "GENUS" = 'Elaphrus'
and "SPECIES" = 'riparius'
This species has a synonym:
select *
from "TSynonym"
where "SynAuthority" = '(L.)'
and "SynGenus" = 'Elaphrus'
and "SynSpecies" = 'riparius'
select *
from "INDEX"
where "CODE" in ('1.0120030000000000', '1.0120045000000000')
Corresponding records in SEAD:
select taxon_id, family_name, genus_name, species, author_name, taxonomic_code
from tbl_taxa_tree_master
join tbl_taxonomic_order using (taxon_id)
join tbl_taxa_tree_genera using (genus_id)
join tbl_taxa_tree_authors using (author_id)
join tbl_taxa_tree_families using (family_id)
where taxonomic_code in (1.0120030000, 1.0120045000)
select taxon_id, family_name, genus_name, species, author_name, taxonomic_code
from tbl_taxa_tree_master
left join tbl_taxonomic_order using (taxon_id)
join tbl_taxa_tree_genera using (genus_id)
join tbl_taxa_tree_authors using (author_id)
join tbl_taxa_tree_families using (family_id)
where family_name = 'CARABIDAE'
and genus_name = 'Elaphrus'
and author_name in ('(L.)', 'Mäklin')
select bugs_table, sead_table, sead_reference_id, bugs_data, manipulation_type
from bugs_import.bugs_trace
where translated_compressed_data like '%Elaphrus%'
and bugs_table = 'TSynonym'
The bug can be reproduced with the following small BugsCEP database. The data is created using these statements:
insert into "INDEX" ("CODE", "FAMILY", "GENUS", "SPECIES", "AUTHORITY") values ('1.0120045000000000', 'CARABIDAE', 'Elaphrus', 'tuberculatus', 'Mäklin');
insert into "INDEX" ("CODE", "FAMILY", "GENUS", "SPECIES", "AUTHORITY") values ('1.0120030000000000', 'CARABIDAE', 'Elaphrus', 'riparius', '(L.)');
insert into "INDEX" ("CODE", "FAMILY", "GENUS", "SPECIES", "AUTHORITY") values ('23.028001499999998', 'STAPHYLINIDAE', 'Eucnecosum', 'brachypterum (grp)', '(Grav.)');
insert into "INDEX" ("CODE", "FAMILY", "GENUS", "SPECIES", "AUTHORITY") values ('93.015056999999999', 'CURCULIONIDAE', 'Otiorhynchus', 'nodosus', '(Möll.)');
insert into "INDEX" ("CODE", "FAMILY", "GENUS", "SPECIES", "AUTHORITY") values ('40.002102000000001', 'SCIRTIDAE', 'Ptilodactyla', 'exotica', 'Chapin');
insert into "TBiblio" ("REFERENCE", "AUTHOR", "TITLE", "Notes") values ('Bell & Walker (2005)', 'Bell, M. & Walker M.J.C. (1992)', 'Late Quaternary Environmental Change - Physical and Human Perspectives (Second Edition). Longman, Essex.', NULL);
insert into "TBiblio" ("REFERENCE", "AUTHOR", "TITLE", "Notes") values ('Bohme 2005', 'Böhme, J. (2005)', 'Die Köfer Mitteleuropas. K. Katalog (Faunistiche Übersicht) (2nd ed.). Spektrum Academic, Munich.', '(revised version of Lucht 1987)');
insert into "TBiblio" ("REFERENCE", "AUTHOR", "TITLE", "Notes") values ('Morris 2006', 'Morris, M. (2006)', 'Checklist of beetles of the British Isles, Curculionidae. <www.coleopterist.org.uk/curculionidae-list.htm>.', NULL);
insert into "TBiblio" ("REFERENCE", "AUTHOR", "TITLE", "Notes") values ('Strand 1946', 'Strand, A. (1946)', 'Nord Norges Coleoptera. Tromsö Museums Arshefter, Naturhistorisk Avd. Nr. 34, 67(1). (629pp.)', NULL);
insert into "TSite" ("SiteCODE", "SiteName", "Region", "Country", "NGR", "LatDD", "LongDD", "Alt", "IDBy", "Interp", "Specimens") values ('SITE000253', 'Håkulls Mosse, Kullaberg', 'Skåne', 'Sweden', NULL, 56.2999992, 12.5333338, 125, 'Lemdahl', 'Kullen Peninsula, see also Björkeröds mosse.', NULL);
insert into "TCountsheet" ("CountsheetCODE", "CountsheetName", "SiteCODE", "SheetContext", "SheetType") values ('COUN000144', 'Hakullsmosse_bugsdata.XLS', 'SITE000253', 'Stratigraphic sequence', 'Abundances');
insert into "TSample" ("SampleCODE", "SiteCODE", "X", "Y", "ZorDepthTop", "ZorDepthBot", "RefNrContext", "CountsheetCODE") values ('SAMP000546', 'SITE000253', NULL, NULL, NULL, NULL, 'B8:6/6', 'COUN000144');
insert into "TDatesMethods" ("Abbrev", "Method", "Type", "SortOrder") values ('GeolPer', 'Geological period', 'Period', 2);
insert into "TDatesPeriod" ("PeriodDateCODE", "SampleCODE", "Uncertainty", "PeriodCODE", "DatingMethod", "Notes") values ('PERI005175', 'SAMP000546', NULL, 'LG', 'GeolPer', NULL);
insert into "TFossil" ("FossilBugsCODE", "CODE", "SampleCODE", "Abundance") values ('FOSS014299', '93.015056999999999', 'SAMP000546', 3);
insert into "TFossil" ("FossilBugsCODE", "CODE", "SampleCODE", "Abundance") values ('FOSS144182', '23.028001499999998', 'SAMP000546', 30);
insert into "TLookupCountsheetContext" ("SheetContext", "SortOrder") values ('Stratigraphic sequence', 3);
insert into "TLookupCountsheetTypes" ("CountsheetType", "SortOrder") values ('Abundances', 1);
insert into "TPeriods" ("PeriodCODE", "PeriodName", "PeriodType", "PeriodDesc", "PeriodRef", "PeriodGeog", "Begin", "BeginBCAD", "End", "EndBCAD", "YearsType") values ('LG', 'Lateglacial', 'Geological', 'Cold period after the last Glaciation. Pollen Zones I-III', 'Bell & Walker (2005)', 'Europe', 13500, 'BP', 10000, 'BP', 'C14');
insert into "TSiteOtherProxies" ("OtherProxyID", "SiteCODE", "HasPollen", "HasPlantMacro", "HasDiatoms", "HasChironomids", "HasSoilChemistry", "HasIsotopes", "HasAnimalBones", "HasArchaeology", "HasMolluscs") values (1, 'SITE000253', 1, 0, 0, 0, 0, 0, 0, 0, 0);
insert into "TSynonym" ("CODE", "SynGenus", "SynSpecies", "SynAuthority", "Ref", "Notes") values ('93.015056999999999', 'Otiorhynchus', 'dubius', '(Ström.)', 'Bohme 2005', NULL);
insert into "TSynonym" ("CODE", "SynGenus", "SynSpecies", "SynAuthority", "Ref", "Notes") values ('93.015056999999999', 'Otiorhynchus', 'maurus', '(Gyllenhal) non (Marsham)', 'Morris 2006', NULL);
insert into "TSynonym" ("CODE", "SynGenus", "SynSpecies", "SynAuthority", "Ref", "Notes") values ('23.028001499999998', 'Arpedium', NULL, NULL, 'Strand 1946', NULL);
insert into "TSynonym" ("CODE", "SynGenus", "SynSpecies", "SynAuthority", "Ref", "Notes") values ('1.0120045000000000', 'Elaphrus', 'riparius', '(L.)', 'Lindroth 1985', 'regards this as a synonym.');
insert into "TTaxoNotes" ("CODE", "Ref", "Data") values ('40.002102000000001', 'Bohme 2005', 'Genus not listed.');
This is a minimal BugsCEP database that reproduces the error:
insert into "INDEX" ("CODE", "FAMILY", "GENUS", "SPECIES", "AUTHORITY") values
('1.0120045000000000', 'CARABIDAE', 'Elaphrus', 'tuberculatus', 'Mäklin'),
('1.0120030000000000', 'CARABIDAE', 'Elaphrus', 'riparius', '(L.)');
insert into "TSynonym" ("CODE", "SynGenus", "SynSpecies", "SynAuthority", "Ref", "Notes") values
('1.0120045000000000', 'Elaphrus', 'riparius', '(L.)', 'Lindroth 1985', 'regards this as a synonym.');
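The conflict in this minimal data set can be made visible with a small, self-contained sketch. It loads the two tables into an in-memory SQLite database (an assumption made only so the example runs anywhere; the column types are simplified to text) and lists every synonym whose name is also an accepted species in "INDEX".

```python
import sqlite3

# In-memory stand-in for the minimal BugsCEP database (types simplified to text).
con = sqlite3.connect(":memory:")
con.executescript("""
create table "INDEX" ("CODE" text, "FAMILY" text, "GENUS" text, "SPECIES" text, "AUTHORITY" text);
create table "TSynonym" ("CODE" text, "SynGenus" text, "SynSpecies" text, "SynAuthority" text, "Ref" text, "Notes" text);
insert into "INDEX" ("CODE", "FAMILY", "GENUS", "SPECIES", "AUTHORITY") values
    ('1.0120045000000000', 'CARABIDAE', 'Elaphrus', 'tuberculatus', 'Mäklin'),
    ('1.0120030000000000', 'CARABIDAE', 'Elaphrus', 'riparius', '(L.)');
insert into "TSynonym" ("CODE", "SynGenus", "SynSpecies", "SynAuthority", "Ref", "Notes") values
    ('1.0120045000000000', 'Elaphrus', 'riparius', '(L.)', 'Lindroth 1985', 'regards this as a synonym.');
""")

# For each synonym: the species it points to (target) and any INDEX species
# that carries exactly the same genus, species and authority as the synonym.
rows = con.execute("""
select s."CODE"    as target_code,
       t."SPECIES" as target_species,
       i."CODE"    as existing_code
from "TSynonym" s
join "INDEX" t on t."CODE" = s."CODE"
join "INDEX" i on i."GENUS" = s."SynGenus"
              and i."SPECIES" = s."SynSpecies"
              and i."AUTHORITY" = s."SynAuthority"
""").fetchall()

for row in rows:
    print(row)
```

The single row returned shows that the synonym attached to Elaphrus tuberculatus has the same name as the existing entry for Elaphrus riparius (L.), which is the situation that triggers the duplicate.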
Import INDEX items
The mapping between items in BugsCEP […]
The items already exist in the SEAD database, so no new items are inserted.
select *
from tbl_taxonomic_order
where taxonomic_code in (1.0120030000, 1.0120045000)
Import Synonym items
Each BugsCEP synonym item should have a corresponding record in the SEAD table […]
If a synonym has previously been imported (it exists in the import trace), nothing needs to be done. Otherwise a new species association is created, comprising:
The target species […]
Find or create association's target species
Fetch target species […]
select taxon_id
from tbl_taxonomic_order
where taxonomic_code in (1.0120045000)
sead_bugs_import/src/main/java/se/sead/bugsimport/speciessynonyms/SynonymCreator.java Lines 47 to 54 in 62a882c
select taxon_id, author_name, genus_name, species
from tbl_taxa_tree_master
join tbl_taxa_tree_genera using (genus_id)
join tbl_taxa_tree_authors using (author_id)
where taxon_id = 28966
Find or create association's source species
Finding the source species is basically a simple search in SEAD for the given species name […]
select taxon_id, author_name, genus_name, species
from tbl_taxa_tree_master
join tbl_taxa_tree_genera using (genus_id)
join tbl_taxa_tree_authors using (author_id)
where species = 'riparius'
and genus_name = 'Elaphrus'
and author_name = '(L.)'
Note that this species exists in the SEAD database. The bugs import, however, uses the authority name given by the target species […]
sead_bugs_import/src/main/java/se/sead/bugsimport/speciessynonyms/SynonymSpeciesManager.java Lines 147 to 156 in 62a882c
Since the combination […] This results in a duplicate species record […]
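The effect of searching with the wrong authority can be sketched as follows. This is not the import's Java code: the flat `species` table, the `find_species` helper and the taxon ids 1 and 2 are all hypothetical and exist only for this illustration. The one assumption taken from the description above is that the lookup combines the synonym's genus and species with the authority of the target species.

```python
import sqlite3

# Hypothetical, flattened stand-in for the SEAD species tables.
con = sqlite3.connect(":memory:")
con.executescript("""
create table species (taxon_id integer primary key, genus text, species text, author text);
insert into species values (1, 'Elaphrus', 'riparius', '(L.)');
insert into species values (2, 'Elaphrus', 'tuberculatus', 'Mäklin');
""")


def find_species(genus, species, author):
    """Return the taxon_id matching all three name parts, or None if absent."""
    row = con.execute(
        "select taxon_id from species where genus = ? and species = ? and author = ?",
        (genus, species, author),
    ).fetchone()
    return row[0] if row else None


# Synonym name from "TSynonym", and the authority of the species it points to.
synonym = {"genus": "Elaphrus", "species": "riparius", "author": "(L.)"}
target_author = "Mäklin"

# Lookup as described in the issue: synonym name + target's authority.
found_with_target_author = find_species(synonym["genus"], synonym["species"], target_author)

# Lookup with the synonym's own authority.
found_with_synonym_author = find_species(synonym["genus"], synonym["species"], synonym["author"])

print(found_with_target_author, found_with_synonym_author)
```

With the target's authority the search finds nothing, so a find-or-create step would go on to create a record even though Elaphrus riparius (L.) is already present; with the synonym's own authority the existing record is found.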
We have two options to correct the issue:
Duplicates can be identified by this query:
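One possible form of such a query, offered as a sketch rather than the query originally posted, groups species by genus name, species name and author name and keeps groups with more than one row. The example runs it against an in-memory SQLite database whose tables mimic only the columns used in the queries above; the genus and author ids and the control row with taxon_id 1 are made up, while 28965 and 40204 are the taxon ids mentioned in this issue.

```python
import sqlite3

# Simplified stand-in for the three SEAD tables used in the issue's queries.
con = sqlite3.connect(":memory:")
con.executescript("""
create table tbl_taxa_tree_genera (genus_id integer primary key, genus_name text);
create table tbl_taxa_tree_authors (author_id integer primary key, author_name text);
create table tbl_taxa_tree_master (
    taxon_id integer primary key, genus_id integer, author_id integer, species text
);
insert into tbl_taxa_tree_genera values (10, 'Elaphrus');
insert into tbl_taxa_tree_authors values (20, '(L.)'), (21, 'Mäklin');
insert into tbl_taxa_tree_master values
    (28965, 10, 20, 'riparius'),      -- existing record
    (40204, 10, 20, 'riparius'),      -- duplicate added by the import
    (1,     10, 21, 'tuberculatus');  -- made-up control row, not a duplicate
""")

# Name combinations that occur more than once in the taxa tree.
duplicates = con.execute("""
select genus_name, species, author_name, count(*) as n
from tbl_taxa_tree_master
join tbl_taxa_tree_genera using (genus_id)
join tbl_taxa_tree_authors using (author_id)
group by genus_name, species, author_name
having count(*) > 1
order by genus_name, species, author_name
""").fetchall()

for row in duplicates:
    print(row)
```

Grouping by names rather than by `genus_id` and `author_id` also catches duplicates that hang off duplicated genus or author records, at the cost of possibly merging identically named genera from different families; adding the family to the grouping would avoid that.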