Get information for a variable of interest (e.g., a clinical endpoint) from a long data frame of protocol- or result-related trial information, as returned by dfTrials2Long. The parameters `valuename`, `wherename` and `wherevalue` are matched using Perl regular expressions, ignoring case.
Arguments
- df
A data frame (or tibble) with four columns (`_id`, `identifier`, `name`, `value`) as returned by dfTrials2Long
- valuename
A character string for the name of the field that holds the value of the variable of interest (e.g., a summary measure such as "endPoints.*tendencyValue.value")
- wherename
(optional) A character string to identify the variable of interest among those that repeatedly occur in a trial record (e.g., "endPoints.endPoint.title")
- wherevalue
(optional) A character string with the value of the variable identified by `wherename` (e.g., "response")
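The matching rules for `valuename`, `wherename` and `wherevalue` can be illustrated with base R alone. The following is a minimal sketch, assuming the matching behaves like `grepl()` with `perl = TRUE` and `ignore.case = TRUE`; it is not the package's implementation, and the field names are hypothetical, loosely modelled on those used in the Examples.

```r
# Hypothetical field names, for illustration only
field_names <- c(
  "endPoints.endPoint.title",
  "endPoints.endPoint.unit",
  "endPoints.endPoint.someGroup.tendencyValue.value",
  "clinical_results.outcome_list.outcome.measure.title"
)

# Assumption: matching is comparable to Perl-style, case-insensitive grepl()
matches_value <- grepl(
  "endPoints.*tendencyValue.value",
  field_names,
  perl = TRUE, ignore.case = TRUE
)
matched_names <- field_names[matches_value]

# Case is ignored, so "RESPONSE" matches a title containing "response"
case_insensitive <- grepl(
  "RESPONSE", "Overall response rate",
  perl = TRUE, ignore.case = TRUE
)

# An unescaped dot is a regular expression wildcard, not a literal dot
dot_is_wildcard <- grepl(
  "endPoints.endPoint.unit", "endPointsXendPointXunit",
  perl = TRUE, ignore.case = TRUE
)
```

Because an unescaped `.` matches any character, a pattern such as "endPoints.endPoint.unit" may match more names than intended; writing `\\.` restricts the match to a literal dot.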
Value
A data frame (or tibble, if tibble is loaded) that includes the values of interest, with columns `_id`, `identifier`, `name`, `value` and `where` (with the contents of `wherevalue` found at `wherename`). Contents of `value` are strings unless all its elements are numbers. The `identifier` is generated by function dfTrials2Long to identify matching elements, e.g., endpoint descriptions and measurements.
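The rule that `value` stays a string column unless all of its elements are numbers can be sketched in base R. The helper `to_numeric_if_all_numbers()` below is hypothetical and only illustrates the described behaviour; it is not taken from the package.

```r
# Hypothetical helper, not part of ctrdata: convert a character vector
# to numeric only if every non-missing element parses as a number
to_numeric_if_all_numbers <- function(x) {
  converted <- suppressWarnings(as.numeric(x))
  # If any non-missing input failed to parse, keep the strings as they are
  if (anyNA(converted[!is.na(x)])) x else converted
}

all_numbers <- to_numeric_if_all_numbers(c("0.63", "0", "0.65"))
mixed <- to_numeric_if_all_numbers(c("7.0", "Days"))
```

In this sketch, `all_numbers` is a numeric vector, whereas `mixed` remains a character vector because "Days" is not a number. A result that mixes measurements and units, as in the Examples, would therefore keep `value` as strings.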
Examples
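# attach the package that provides dbGetFieldsIntoDf(),
# dfTrials2Long() and dfName2Value()
library(ctrdata)

# open the demo database shipped with the package in read-only mode
dbc <- nodbi::src_sqlite(
  dbname = system.file("extdata", "demo.sqlite", package = "ctrdata"),
  collection = "my_trials",
  flags = RSQLite::SQLITE_RO)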
dfwide <- dbGetFieldsIntoDf(
  fields = c(
    ## ctgov - typical results fields
    # "clinical_results.baseline.analyzed_list.analyzed.count_list.count",
    # "clinical_results.baseline.group_list.group",
    # "clinical_results.baseline.analyzed_list.analyzed.units",
    "clinical_results.outcome_list.outcome",
    "study_design_info.allocation",
    ## euctr - typical results fields
    # "trialInformation.fullTitle",
    # "baselineCharacteristics.baselineReportingGroups.baselineReportingGroup",
    # "trialChanges.hasGlobalInterruptions",
    # "subjectAnalysisSets",
    # "adverseEvents.seriousAdverseEvents.seriousAdverseEvent",
    "endPoints.endPoint",
    "subjectDisposition.recruitmentDetails"
  ),
  con = dbc
)
#> Querying database (4 fields)...
dflong <- dfTrials2Long(df = dfwide)
#> clinical_results.outcome_list.outcome
#> study_design_info.allocation
#> endPoints.endPoint
#> subjectDisposition.recruitmentDetails
#>
#> .
#> .
#>
#> Total 7096 rows, 79 unique names of variables
## get values for the endpoint 'response'
dfName2Value(
  df = dflong,
  valuename = paste0(
    "clinical_results.*measurement.value|",
    "clinical_results.*outcome.measure.units|",
    "endPoints.endPoint.*tendencyValue.value|",
    "endPoints.endPoint.unit"
  ),
  wherename = paste0(
    "clinical_results.*outcome.measure.title|",
    "endPoints.endPoint.title"
  ),
  wherevalue = "response"
)
#> Returning values for 2 out of 12 trials
#> # A tibble: 22 × 5
#> `_id` identifier name value where
#> <chr> <chr> <chr> <chr> <chr>
#> 1 2012-003632-23-CZ 1 endPoints.endPoint.subjectAnalysisS… 7.0 Time…
#> 2 2012-003632-23-CZ 1 endPoints.endPoint.unit Days Time…
#> 3 2012-003632-23-CZ 2 endPoints.endPoint.unit At l… Dura…
#> 4 2012-003632-23-CZ 6.1 endPoints.endPoint.subjectAnalysisS… 0.63 Over…
#> 5 2012-003632-23-CZ 6.2 endPoints.endPoint.subjectAnalysisS… 0 Over…
#> 6 2012-003632-23-CZ 6.3 endPoints.endPoint.subjectAnalysisS… 0.65 Over…
#> 7 2012-003632-23-CZ 6.4 endPoints.endPoint.subjectAnalysisS… 0.59 Over…
#> 8 2012-003632-23-CZ 6.5 endPoints.endPoint.subjectAnalysisS… 0.60 Over…
#> 9 2012-003632-23-CZ 6 endPoints.endPoint.unit Over… Over…
#> 10 2012-003632-23-CZ 8 endPoints.endPoint.subjectAnalysisS… 78.6 Cumu…
#> # ℹ 12 more rows
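
In the result, measurements and their units arrive as separate rows that share the `identifier`. One possible way to pair them is sketched below in base R. The data frame `res` is toy data made up for illustration, the names in `name` are hypothetical, and the sketch assumes that the part of `identifier` before the first dot denotes the endpoint; that assumption should be checked against the actual output.

```r
# Toy data, made up for illustration; columns follow the Value section
res <- data.frame(
  `_id` = rep("trial-A", 4),
  identifier = c("1", "1", "2.1", "2"),
  name = c("x.value", "x.unit", "x.value", "x.unit"),
  value = c("7.0", "Days", "0.63", "Proportion"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Assumption: the part before the first dot identifies the endpoint
res$endpoint <- sub("\\..*$", "", res$identifier)

# Split into measurement rows and unit rows
is_unit <- grepl("unit$", res$name)
values <- res[!is_unit, c("_id", "endpoint", "identifier", "value")]
units <- res[is_unit, c("_id", "endpoint", "value")]
names(units)[3] <- "unit"

# One row per measurement, with the unit of its endpoint alongside
paired <- merge(values, units, by = c("_id", "endpoint"))
```

Here `paired` holds one row per measurement with an added `unit` column.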