Import Microsoft Word Transcript into R : Shorter Method
trinker edited this page Aug 23, 2012
If your transcripts are in Microsoft Word format, this tutorial demonstrates one procedure for cleaning them and importing the data into R for use with qdap. This is the shorter method: it relies on read.transcript, which automates most of the parsing for the researcher. If read.transcript fails on a transcript, the researcher will have to use the alternative method and do the parsing by hand.
### The following video demonstrates how to clean a Microsoft Word-based transcript and read it into R.
Video
MS Word Transcript and R Script (zip file)
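library(qdap)
# read the transcript; the file has no header row, so supply the column names
dat <- read.transcript(file = "Test.xlsx", header = FALSE,
    col.names = c("person", "dialogue"))
htruncdf(dat, width = 50)  # view the first rows, truncating columns to 50 characters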
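read.transcript handles the split into person and dialogue columns for you. To make that parsing step concrete, here is a minimal base-R sketch, assuming a plain-text transcript where each line has the form `speaker: dialogue`. `parse_transcript_lines` is a hypothetical helper, not part of qdap, and it does not reproduce read.transcript's reading of Word or Excel files or its other cleaning options.

```r
# Hypothetical helper (not part of qdap): split "speaker: dialogue" lines
# into a two-column data frame, splitting at the first separator only.
parse_transcript_lines <- function(lines, sep = ":") {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]                          # drop empty lines
  pos <- as.integer(regexpr(sep, lines, fixed = TRUE))   # first separator, -1 if absent
  has_sep <- pos > 0
  # lines without a separator get NA as the speaker and keep all their text
  person <- ifelse(has_sep, trimws(substr(lines, 1, pos - 1)), NA_character_)
  dialogue <- ifelse(has_sep, trimws(substring(lines, pos + 1)), lines)
  data.frame(person = person, dialogue = dialogue, stringsAsFactors = FALSE)
}

transcript <- c("bob: I love chicken!", "", "greg: Me too: it's so good.")
parse_transcript_lines(transcript)
```

Splitting only at the first separator keeps any later colons inside the dialogue.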
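# use rm_row to remove annotation rows that sit between turns of talk
# (here, rows whose person value begins with "[Cro" or "[St")
dat <- rm_row(dataframe = dat, search.column = "person", terms = c("[Cro", "[St"))
dat  # look at the result
# the search column can also be given by number instead of by name
rm_row(dat, 1, c("[Cro", "[St"))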
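Because rm_row drops the rows whose search column begins with one of the terms, the core idea can be sketched in base R. `drop_rows_starting_with` is a hypothetical helper written for illustration, not qdap's implementation; it assumes R 3.3.0 or later (for `startsWith`).

```r
# Hypothetical helper (not qdap's rm_row): drop rows whose value in `column`
# starts with any of `terms`. `column` may be a column name or number.
drop_rows_starting_with <- function(df, column, terms) {
  if (length(terms) == 0) return(df)
  values <- as.character(df[[column]])   # handles factor and character columns
  hit <- Reduce(`|`, lapply(terms, function(term) startsWith(values, term)))
  hit[is.na(hit)] <- FALSE               # keep rows whose value is missing
  out <- df[!hit, , drop = FALSE]
  rownames(out) <- NULL                  # renumber the remaining rows
  out
}

toy <- data.frame(person = c("bob", "[note]", "sue"),
                  dialogue = c("Hi.", "", "Hello."),
                  stringsAsFactors = FALSE)
drop_rows_starting_with(toy, "person", c("[no", "[as"))
drop_rows_starting_with(toy, 1, "[no")   # same rows, column given by number
```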
# the dash argument: see also the ellipsis and quote2bracket arguments
args(read.transcript)  # list the function's arguments
# re-read the transcript, replacing dashes with "(pause)"
dat <- read.transcript(file = "Test.xlsx", header = FALSE,
    col.names = c("person", "dialogue"), dash = "(pause)")
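The dash argument appears to set the string that replaces en and em dashes in the text (by default they are removed). A rough base-R equivalent of just that step is sketched below with a hypothetical helper, `replace_dashes`; it is an illustration only and ignores everything else read.transcript does.

```r
# Hypothetical helper (not part of qdap): replace en dashes (U+2013) and
# em dashes (U+2014) with `dash`; the default "" simply deletes them.
replace_dashes <- function(text, dash = "") {
  for (d in c("\u2013", "\u2014")) {
    text <- gsub(d, dash, text, fixed = TRUE)
  }
  text
}

replace_dashes("Well\u2014I think so", dash = "(pause)")
```

Using `fixed = TRUE` treats each dash as a literal character rather than a regular expression.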
left.just(rm_row(dat, 1, c("[Cro", "[St")), 2)  # drop annotation rows, left-justify column 2
### The bracketX and bracketXtract functions
# a small example data set containing square, round, and curly brackets
examp2 <- data.frame(
    person = factor(c("bob", "greg", "bob", "sue")),
    text = c("I love chicken [unintelligible]!",
        "Me too! (laughter) It's so good.[interrupting]",
        "Yep it's awesome {reading}.",
        "Agreed. {is so much fun}"),
    stringsAsFactors = FALSE
)
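examp2  # view the example data
bracketX(examp2$text, 'square')  # remove text in square brackets
bracketX(examp2$text, 'curly')   # remove text in curly brackets
bracketX(examp2$text)            # remove text in all bracket types (the default)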
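The idea behind bracketX, deleting a bracketed span and tidying the spacing it leaves behind, can be sketched with regular expressions. `strip_brackets` is a hypothetical base-R helper for illustration, not qdap's source; it handles only square, round, and curly brackets and does not remove the space left before punctuation.

```r
# Hypothetical helper (not qdap's bracketX): remove bracketed text of the
# chosen type ("all" removes square, round, and curly), then tidy spaces.
strip_brackets <- function(text, bracket = c("all", "square", "round", "curly")) {
  bracket <- match.arg(bracket)
  patterns <- c(square = "\\[[^\\]]*\\]",
                round  = "\\([^)]*\\)",
                curly  = "\\{[^}]*\\}")
  use <- if (bracket == "all") patterns else patterns[bracket]
  for (p in use) {
    text <- gsub(p, "", text, perl = TRUE)
  }
  # collapse runs of whitespace left by the removals and trim the ends
  text <- gsub("[[:space:]]+", " ", text)
  trimws(text)
}

strip_brackets("Me too! (laughter) It's so good.[interrupting]")
strip_brackets("Agreed. {is so much fun}", "curly")
```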
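examp2  # bracketX returned new vectors; examp2 itself is unchanged
bracketXtract(examp2$text, 'square')  # extract text inside square brackets
bracketXtract(examp2$text, 'curly')   # extract text inside curly brackets
bracketXtract(examp2$text)            # extract text from all bracket types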
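bracketXtract works the other way round: it keeps the bracketed text instead of removing it. A comparable base-R sketch uses `gregexpr` with lookbehind/lookahead patterns so that only the text between the delimiters is returned. `extract_brackets` is a hypothetical helper; it returns a list with one character vector per input string (empty when nothing matches), which may differ from bracketXtract's exact output format.

```r
# Hypothetical helper (not qdap's bracketXtract): return the text inside one
# type of bracket, as a list with one character vector per input string.
extract_brackets <- function(text, bracket = c("square", "round", "curly")) {
  bracket <- match.arg(bracket)
  # lookbehind/lookahead keep the delimiters themselves out of each match
  pattern <- switch(bracket,
                    square = "(?<=\\[)[^\\]]*(?=\\])",
                    round  = "(?<=\\()[^)]*(?=\\))",
                    curly  = "(?<=\\{)[^}]*(?=\\})")
  regmatches(text, gregexpr(pattern, text, perl = TRUE))
}

extract_brackets(c("Yep it's awesome {reading}.", "Agreed. {is so much fun}"), "curly")
```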
paste2(bracketXtract(examp2$text, 'curly'), " ")  # paste the extracted curly-bracket text together, separated by spaces