POST: Progress Report on Data Loading, April 19

Summary of data-loading progress for the weeks of Apr 6 to 19, by Marta Mihoff

John provided three extract files.

Susan took the post_active_investigators file and added a column for the institutioncode. Most of these were official codes taken from the organizations' websites; some we created ourselves when there was no official site, as was the case for the Indian tribes.

Marta loaded these into a post schema using an R script.

Marta created an experimental schema in the postunit database (schema p107), then attempted to load the data.

There were numerous back-and-forth messages between Marta and John asking and answering questions about the POST sample data and the OTN tables. All of these questions have been resolved, and a load strategy has been decided upon for loading the receiver metadata.

Marta developed several load scripts to take the data from the sample extracts and insert the records into the proper tables in the postunit database.

List of scripts:

post_load.r - loads the CSV extracts into the database

post_set_up_moorings.sql - builds the experimental project schema

ext_pis.sql - takes the augmented file which Susan created and adds the new institutioncodes to obis.institution_codes; also creates a record for obis.otn_resources using a combination of data from the sample deploy file and the list of active investigators

ext_stations.sql - inserts the new STATION records from the sample deployment extract

ext_receivers_examine.sql - a script with several different queries displaying different ways of looking at the deploy/recover data; this exposed several questions which needed to be answered, plus some records which should not have been in the extract (caused by the researcher, not the POST extract process)

ext_receivers_insert.sql - inserts the RECEIVER records

ext_receivers_lost_found.sql - does some extra processing for lost and found receivers

ext_download.sql - inserts the DOWNLOAD records

The loading was straightforward, with no issues: all OTN requirements are met, and no new columns are required to represent POST data. The above scripts will form the basis for loading all of the receiver data, although each project set will have to be examined before loading to see whether modifications are needed.
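As a rough illustration of what the first step in the pipeline does, here is a minimal Python sketch of loading a CSV extract verbatim into a staging table. This is only a sketch: the real script (post_load.r) is written in R against the postunit database, and the table and column names below are hypothetical.

```python
# Illustrative only: the actual loader is post_load.r (R, postunit database).
# This sketch loads one CSV extract verbatim into an all-TEXT staging table,
# mirroring the "load the extracts as is" approach described above.
import csv
import io
import sqlite3


def load_extract(conn, table, csv_text):
    """Load one CSV extract verbatim into a staging table; return row count."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f"CREATE TABLE {table} ({cols})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", data)
    return len(data)


conn = sqlite3.connect(":memory:")
# Hypothetical deploy-extract columns, for illustration only.
sample = "station,deploy_date,lat,lon\nSTN01,2010-04-06,48.5,-123.2\n"
n = load_extract(conn, "c_deploy", sample)
print(n)  # prints 1
```

Keeping the staging columns as plain text matches the strategy of loading the original extracts untouched, with typing and validation deferred to the per-project insert scripts.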

Load Strategy for the rest of the RECEIVER metadata

- John to create extracts of the entire deploy and recover tables

- John to make minor changes to the project id and project naming table and provide an extract of the entire list

- the project name extract will be augmented with project acronyms to be used for schema naming (Susan?)

- these files will be loaded as is into a postunit schema created to hold all the original extracts

- individual schemas will be created as needed

- the data for each project will be extracted from the large tables (using project id) as needed and put into tables xxx.c_deploy and xxx.c_recover

- the above set of scripts will be modified to load each schema

- the data will be verified
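The per-project extraction step above can be sketched as follows. This is an illustrative Python/SQLite sketch, not the real implementation (which will be SQL against the postunit database); SQLite has no schemas, so a table prefix stands in for the per-project schema, and all names are hypothetical.

```python
# Illustrative only: pull one project's rows out of the master deploy table
# into that project's own c_deploy table, keyed on project id.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_deploy (project_id TEXT, station TEXT)")
conn.executemany(
    "INSERT INTO all_deploy VALUES (?, ?)",
    [("p107", "STN01"), ("p107", "STN02"), ("p205", "STN09")],
)


def extract_project(conn, project_id, prefix):
    """Copy one project's deploy records into <prefix>_c_deploy."""
    # Create an empty copy with the same columns, then fill it.
    conn.execute(f"CREATE TABLE {prefix}_c_deploy AS SELECT * FROM all_deploy WHERE 0")
    conn.execute(
        f"INSERT INTO {prefix}_c_deploy SELECT * FROM all_deploy WHERE project_id = ?",
        (project_id,),
    )


extract_project(conn, "p107", "p107")
n = conn.execute("SELECT COUNT(*) FROM p107_c_deploy").fetchone()[0]
print(n)  # prints 2
```

The master table is never modified; each project's c_deploy and c_recover tables are derived copies, so a project can be re-extracted cleanly if its load scripts need modification.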

Data loading will be done by Marta and Brian.

Data verification will be done by Marta.

John will provide the three extract files and then start to look at tagging metadata.

Summary of discussions:


- although recovery information is in the extract file, OTN will not be processing it for presentation in OTN products, other than the recover dates and noting lost and found receivers

- all of the recovery information will be kept in the appropriate project schema for possible future product development; this is exactly how OTN handles recovery metadata now

Recover lats and longs

- the values are not reliable: there were several cases where they were identical to the deployment values, yet the notes made it clear that the positions were sometimes very far apart (kilometres)
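One possible sanity check during verification (an assumption on my part, not something the current scripts do) would be to compute the distance between the deploy and recover positions and flag suspicious pairs, for example identical coordinates paired with notes reporting a long drift. The haversine formula and the coordinates below are illustrative.

```python
# Illustrative verification check: great-circle distance between deploy and
# recover positions, to flag recover coordinates that contradict the notes.
from math import asin, cos, radians, sin, sqrt


def km_apart(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (haversine, mean Earth radius)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


# Hypothetical deploy vs. recover positions 0.05 degrees of latitude apart.
d = km_apart(48.50, -123.20, 48.55, -123.20)
print(round(d, 1))  # prints 5.6
```

A distance of several kilometres between nominally matching positions, or a distance of exactly zero where the notes describe a drifting receiver, would both be worth flagging for manual review.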

It all went extremely well, and I anticipate no problems loading the receiver metadata.