POST: Progress Report on Data Loading April 19

Summary of loading progress weeks Apr 6 to 19 by Marta Mihoff

John provided three extract files:
sample_deployments.csv
sample_recoveries.csv
post_active_investigators_SD.

csv



Susan took the file of  post_active_investigators and added a column for the institutioncode. Most of these were official codes from the orgainization's websites. Some we created when there was no official site as was the case for the Indian tribes.



Marta loaded these into a post schema using an R script



Marta created an experimental schema in the postunit database (schema p107)


- then attempted to load the data



There were numerous back and forth letters between Marta and John asking and answering questions about the POST sample data and the OTN tables. All these questions have been resolved and a load strategy has been decided upon for loading receiver metadata.



Marta developed several load scripts to take the data from the sample extracts and insert the records into the proper tables in the postunit database.



List of scripts:

post_load.r                          - load the csv extracts into the database


post_set_up_moorings.sql    - build the experimental project schema


ext_pis.sql                          - take the augmented file which Susan created and add new institutioncodes to obis.institution_codes


- create a record for obis.otn_resources using a combination of data in the sample deploy file and the list of active investigators


ext_stations.sql                   - inserts the new STATION records from the sample deployment extract


ext_receivers_examine.sql   - a script with several different queries which displays different ways of looking at the deploy/recover data


- this exposed several questions which needed to be answered plus exposed some records which should not be in the extract. This was caused by the researcher and not the POST extract process.


ext_receivers_insert.sql        - insert the RECEIVER records


ext_receivers_lost_found.sql - do some extra processing for lost and found receivers


ext_download.sql                 - insert DOWNLOAD records



The loading was straightforward with no issues. All OTN requirements are met. And no new columns are required to represent POST data. The above scripts will form the basis for loading all the receiver data. Although each project set will have to be examined before loading to see if modifications will be needed.



Load Strategy for the rest of the RECEIVER metadata

- John to create an extracts of the entire deploy and recover tables


- John to make minor changes to the project id and project naming table and provide an extract of the entire list


- the project name extract will be augmented with project acronyms to be used for schema naming (Susan?)


- these files will be loaded as is into a postunit schema created to hold all the original extracts


- individual schema will be created as needed


- the data for each project will be extracted from the large tables (using project id) as needed and put into tables xxx.c_deploy and xxx.c_recover


- the above set of scripts will modified to load each schema


- the data will be verified



Data loading will be done by Marta and Brian


Data verification will be done by Marta



John will provide the three extract files and then start to look at tagging metadata.



Summary of discussions:

Recoveries


- although Recovery information is in the extract file OTN will not be processing this for presentation in OTN products. Other than recover dates and noting lost and found receivers


- all the recovery information will be kept in the appropriate project schema for possible future product development. This is exactly how OTN handles this recovery metadata now.


Recover lats and longs


- values are not reliable as there were several cases where they were the same as deployment values but the notes made it clear that they were sometimes very far apart (kilometres)



It all went extremely well and I anticipate no problems loading the receiver metadata.