Get identifiers of deduplicated trial records
Source: R/dbFindIdsUniqueTrials.R
Records for a clinical trial can be loaded from more than one register into a collection. This function returns deduplicated identifiers for all trials in the collection, respecting the register(s) preferred by the user. All registers also record the identifiers that other registers assign to a trial, and this function uses these cross-references to provide a vector of identifiers of deduplicated trials.
Usage
dbFindIdsUniqueTrials(
preferregister = c("EUCTR", "CTGOV", "CTGOV2", "ISRCTN", "CTIS"),
prefermemberstate = "DE",
include3rdcountrytrials = TRUE,
con,
verbose = FALSE
)
Arguments
- preferregister
A vector giving the order of preference of registers from which to generate unique _id's; default:
c("EUCTR", "CTGOV", "CTGOV2", "ISRCTN", "CTIS")
- prefermemberstate
Code of a single EU Member State for which records should be returned. If no such record is available, a record for DE is returned or, lacking this, any random Member State's record for the trial. For a list of codes of EU Member States, see the vector `countriesEUCTR`. Specifying "3RD" will return the Third Country record of trials, where available.
- include3rdcountrytrials
A logical value indicating whether to retain trials that are conducted exclusively in third countries, that is, outside the European Union. Ignored if `prefermemberstate` is set to "3RD".
- con
A database connection object, created with `nodbi`. See section `1 - Database connection` in ctrdata.
- verbose
If `TRUE`, prints the fields of registers used to find corresponding trial records.
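The fallback described for prefermemberstate (preferred Member State, then DE, then any record) can be pictured with a small base R sketch. This is a toy illustration and not the package's code: the helper `pick_member_state_record()` is hypothetical, the identifiers ending in "-DE" and "-FR" are invented, and the sketch takes the first remaining record where the function is documented to take a random one, so that the result is reproducible.

```r
# Toy sketch, not ctrdata code: choose one EUCTR record per trial.
# Assumes EUCTR "_id" values end in a country code, as in "2014-002606-20-PT".
pick_member_state_record <- function(ids, prefer = "DE") {
  trial <- sub("-[A-Z0-9]+$", "", ids)   # trial number without country suffix
  country <- sub("^.*-", "", ids)        # country suffix, e.g. "SE" or "3RD"
  picked <- tapply(seq_along(ids), trial, function(i) {
    hit <- i[country[i] == prefer]                   # 1. preferred Member State
    if (!length(hit)) hit <- i[country[i] == "DE"]   # 2. fall back to DE
    if (!length(hit)) hit <- i                       # 3. any record (first one here)
    hit[1]
  })
  ids[as.integer(picked)]
}

# Mix of identifiers from this page and invented sibling records (-DE, -FR)
euctr_ids <- c(
  "2012-003632-23-SE", "2012-003632-23-DE",
  "2014-002606-20-PT", "2014-002606-20-FR",
  "2014-003556-31-SE"
)
res <- pick_member_state_record(euctr_ids, prefer = "FR")
```

With `prefer = "FR"`, the first trial has no FR record and falls back to its DE record, the second trial yields its FR record, and the third trial has only one record, which is kept.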
Value
A named vector with strings of keys (field "_id") of records in the collection that represent unique trials, where names correspond to the register of the record.
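Because the names carry the register and the values carry the keys, the returned vector can be handled with ordinary base R tools. The sketch below does not call the function; it assumes a vector `ids` typed in by hand from the identifiers shown in the Examples section of this page.

```r
# Identifiers copied from the example output on this page
ids <- c(
  ISRCTN = "12949496", ISRCTN = "13281214", ISRCTN = "17473621",
  EUCTR = "2012-003632-23-SE", EUCTR = "2014-002606-20-PT",
  EUCTR = "2014-003556-31-SE", CTIS = "2022-501142-30-00",
  CTIS = "2023-505613-24-00", CTIS = "2024-510663-34-00",
  ISRCTN = "20343063"
)

# Number of unique trials kept per register
per_register <- table(names(ids))

# Keys of records that come from a single register
euctr_ids <- unname(ids[names(ids) == "EUCTR"])

# Keys grouped by register, as a list
by_register <- split(unname(ids), names(ids))
```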
Details
Note that the content of records may differ between registers (and, for "EUCTR", between records for different Member States). Such differences are not considered by this function.
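To make this concrete, the following base R sketch selects one record per trial purely by the register's position in preferregister, without looking at record content. It is a simplified illustration under stated assumptions, not the function's implementation: the `trial` column is a hypothetical grouping key standing in for the cross-register identifier matching, and the "NCT..." identifiers are invented.

```r
# Toy sketch, not ctrdata's implementation: keep one record per trial,
# choosing the register that comes first in `preferregister`.
records <- data.frame(
  id = c("2014-003556-31-SE", "NCT00000001", "12949496", "NCT00000002"),
  register = c("EUCTR", "CTGOV2", "ISRCTN", "CTGOV2"),
  trial = c("A", "A", "B", "B"),   # hypothetical key linking records of one trial
  stringsAsFactors = FALSE
)
preferregister <- c("EUCTR", "CTGOV", "CTGOV2", "ISRCTN", "CTIS")

# Rank each record by its register's position in the preference vector
records$rank <- match(records$register, preferregister)

# Sort by trial and rank, then keep the most preferred record per trial
records <- records[order(records$trial, records$rank), ]
kept <- records[!duplicated(records$trial), ]

# Named vector in the same shape as the documented return value
unique_ids <- setNames(kept$id, kept$register)
```

Trial "A" keeps its EUCTR record and trial "B" keeps its CTGOV2 record, regardless of what either record contains.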
Examples
# Connect to the demo database shipped with ctrdata, opened read-only
dbc <- nodbi::src_sqlite(
  dbname = system.file("extdata", "demo.sqlite", package = "ctrdata"),
  collection = "my_trials",
  RSQLite::SQLITE_RO)
#> RSQLite version has enabled accelerating docdb_create() and docdb_update() functions when used with value = <NDJSON file name>.
dbFindIdsUniqueTrials(con = dbc)[1:10]
#> Searching for duplicate trials...
#> - Getting all trial identifiers...
#> , 29 found in collection
#> - Finding duplicates among registers' and sponsor ids...
#> - 2 EUCTR _id were not preferred EU Member State record for 8 trials
#> - Keeping 3 / 8 / 5 / 8 / 3 records from EUCTR / CTGOV / CTGOV2 / ISRCTN / CTIS
#> = Returning keys (_id) of 27 records in collection "my_trials"
#>              ISRCTN              ISRCTN              ISRCTN               EUCTR
#>          "12949496"          "13281214"          "17473621" "2012-003632-23-SE"
#>               EUCTR               EUCTR                CTIS                CTIS
#> "2014-002606-20-PT" "2014-003556-31-SE" "2022-501142-30-00" "2023-505613-24-00"
#>                CTIS              ISRCTN
#> "2024-510663-34-00"          "20343063"
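One possible use of the returned keys is to subset a data frame of trial data to one row per trial. The sketch below is an assumption about such a workflow rather than a documented one: the data frame `trials` and its `status` values are invented, the record "2012-003632-23-DE" is a made-up sibling of a key shown above, and `unique_ids` is typed in by hand instead of being obtained from the function.

```r
# Hypothetical data frame standing in for trial data from the collection
trials <- data.frame(
  `_id` = c("2012-003632-23-SE", "2012-003632-23-DE", "12949496"),
  status = c("Completed", "Completed", "Ongoing"),   # invented values
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Subset of the keys shown in the output above
unique_ids <- c(ISRCTN = "12949496", EUCTR = "2012-003632-23-SE")

# Keep only rows whose key is among the deduplicated identifiers
deduplicated <- trials[trials[["_id"]] %in% unique_ids, ]
```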