\documentclass{sigchi}

\pagenumbering{arabic}

<<setup, include=FALSE, cache=FALSE>>=
library(knitr)
opts_chunk$set(fig.path='plots/', fig.align='center', fig.show='asis', echo=FALSE, comment = "", message = FALSE, tidy=FALSE, dev='pdf', out.width="\\linewidth")
options(replace.assign=TRUE)
library(formatR)
library(ggplot2)
library(grid)
library(boot)
library(lsr)
library(psych)
setwd("~/Dropbox/r-code/SUS2/")
m.sus2 = read.csv("sus2_medstat.csv")
m.orisus = read.csv("ori_sus_medstat.csv")
m.revsus = read.csv("rev_sus_medstat.csv")
a.sus2 = read.csv("sus2_algo.csv")
a.orisus = read.csv("ori_sus_algo.csv")
a.revsus = read.csv("rev_sus_algo.csv")
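# Standard error of the mean
SE = function(x){
    return(sqrt(var(x)/length(x)))
}
# SUS score (0-100): item contributions coded 0-4 are multiplied by 2.5 and summed
convSUS = function(x) {
    return(rowSums((x)*2.5))
}
# Alternative scale score (0-100): summed item codes divided by 0.18
# (this assumes zero-based item codes with a maximum possible sum of 18)
convSUS2 = function(x) {
    return(rowSums(x)/.18)
}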
@

% Load basic packages
\usepackage{balance}  % to better equalize the last page
\usepackage{graphicx} % provides the key-value form \includegraphics[width=...]
%\usepackage{times}    % comment if you want LaTeX's default font
\usepackage{url}      % llt: nicely formatted URLs
\usepackage{booktabs}
\usepackage{rotating}

% llt: Define a global style for URLs, rather than the default one
\makeatletter
\def\url@leostyle{%
  \@ifundefined{selectfont}{\def\UrlFont{\sf}}{\def\UrlFont{\small\bf\ttfamily}}}
\makeatother
\urlstyle{leo}


% To make various LaTeX processors do the right thing with page size.
\def\pprw{8.5in}
\def\pprh{11in}
\special{papersize=\pprw,\pprh}
\setlength{\paperwidth}{\pprw}
\setlength{\paperheight}{\pprh}
\setlength{\pdfpagewidth}{\pprw}
\setlength{\pdfpageheight}{\pprh}

% Make sure hyperref comes last of your loaded packages,
% to give it a fighting chance of not being over-written,
% since its job is to redefine many LaTeX commands.
\usepackage[pdftex]{hyperref}
\hypersetup{
pdftitle={Critiquing the System Usability Scale from a Questionnaire Design Perspective: Beware of Acquiescence Bias},
pdfauthor={XXXXXXXXX REPLACE WITH AUTHOR NAMES XXXXXXXXXX},
pdfkeywords={System Usability Scale, SUS; response biases; usability evaluation; standardized questionnaires},
bookmarksnumbered,
pdfstartview={FitH},
colorlinks,
citecolor=black,
filecolor=black,
linkcolor=black,
urlcolor=black,
breaklinks=true,
}

% create a shortcut to typeset table headings
\newcommand\tabhead[1]{\small\textbf{#1}}


% End of preamble. Here comes the document.
\begin{document}

\title{Critiquing the System Usability Scale from a Questionnaire Design Perspective: Beware of Acquiescence Bias}

\numberofauthors{2}
\author{
    \alignauthor 1st Author Name\\
    \affaddr{Affiliation}\\
    \affaddr{Street, City}\\
    \email{E-mail address}
    \alignauthor 2nd Author Name\\
    \affaddr{Affiliation}\\
    \affaddr{Street, City}\\
    \email{E-mail address}
}
% \author{
%   \alignauthor Ren\'{e} F. Kizilcec\\
%     \affaddr{Department of Communication}\\
%     \affaddr{Stanford University}\\
%     \email{kizilcec@stanford.edu}
%   \alignauthor Hendrik M\"{u}ller\\
%     \affaddr{User Experience Research}\\
%     \affaddr{Google, Inc.}\\
%     \email{hendrikm@google.com}
% }

\maketitle

\begin{abstract}
The System Usability Scale (SUS) is undoubtedly the most widely used measure of usability today. Numerous studies have assessed its psychometric properties and used it as a ``gold standard'' in the development of alternative scales. Recent advances in questionnaire design research on acquiescence and other biases, however, now challenge some of the foundations of the SUS. In this note, we review literature on relevant questionnaire design aspects, inspect the SUS for biases, and, using an experiment, show that it is vulnerable to significant acquiescence bias. We then propose an alternative scale, strongly rooted in the SUS, which conforms to recent insights from questionnaire design research, and provide an example of how the proposed scale outperforms the SUS in terms of measurement sensitivity.
\end{abstract}

\keywords{System Usability Scale, SUS; response biases; usability evaluation; standardized questionnaires}

%\terms{Human Factors; Design; Measurement}

\category{H.5.2.}{Information Interfaces and Presentation (e.g. HCI)}{User Interfaces--Evaluation/Methodology}

\section{Introduction}
With the increased focus on developing highly usable products, it has become very important to measure their perceived usability. Standardized questionnaires are widely used for such measurement in small and large usability studies. Commonly used questionnaires include the Computer User Satisfaction Inventory (CUSI) \cite{kirakowski1987}, the Questionnaire for User Interface Satisfaction (QUIS) \cite{chin1988}, the After Scenario Questionnaire (ASQ) \cite{lewis1991}, and the System Usability Scale (SUS) \cite{brooke1996}. The SUS has received the highest level of adoption, with hundreds of references in publications alone. It comprises ten statements rated on a 5-point agreement scale and yields a single score summarizing the usability assessment of the evaluated system. Over the years, the SUS has been used across a variety of systems. However, since its inception in 1986, research on the design of valid and reliable questionnaires has advanced significantly, with insights that now challenge some of the foundations of the SUS. Most relevant are insights related to acquiescence bias \cite{smith1967,sarisKrosnickShaeffer2010}, satisficing \cite{krosnick1991}, and scale lengths \cite{krosnickFabrigar1997, groves2004}.

The remainder of this note outlines advances in questionnaire design relevant to the SUS, reviews the original SUS in light of those advances, proposes an alternative version that conforms to these insights, and finally compares the SUS to the proposed alternative. Note that our intention is neither to reduce the number of items in the SUS nor to identify and eliminate biases unrelated to acquiescence. Instead, the goal is to critique the wording of the statements and response options of the SUS on theoretical grounds, supported by empirical evidence, and to propose an alternative scale.

\section{Related Work}

Although first published in 1996, the SUS was developed in 1986 by John Brooke at Digital Equipment Corporation (DEC). It was used as a ``quick and dirty'' scale administered after usability studies on electronic office systems, such as DEC's VT100, a text-based terminal system. The SUS measures attitudes regarding the effectiveness of, efficiency of, and satisfaction with a system. It comprises ten statements (see Table \ref{tab:items}) for which respondents specify their level of agreement or disagreement on a symmetric 5-point Likert scale (established by Rensis Likert in 1932). Its endpoints are labeled ``Strongly disagree'' and ``Strongly agree'', and all five scale points are numbered from 1 to 5 (see Figure \ref{fig:originalScale}). Respondents need to evaluate all statements of the SUS. Responses are consolidated into a single score that represents the global usability assessment of the system and enables cross-system comparisons.
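% Illustration only: a minimal sketch of how SUS responses are usually consolidated
% into a single score, following the standard scoring rule published with the SUS
% (odd items contribute response - 1, even items 5 - response, sum times 2.5).
% It assumes raw responses coded 1-5 with one column per item. The function name
% scoreSUS and the toy data are hypothetical and not part of the analysis in this note.
<<sus-scoring-sketch, eval=FALSE>>=
scoreSUS = function(raw) {
    raw = as.matrix(raw)
    stopifnot(ncol(raw) == 10, all(raw %in% 1:5))
    odd = seq(1, 9, by=2)     # positively worded items
    even = seq(2, 10, by=2)   # negatively worded items
    contrib = raw
    contrib[, odd] = raw[, odd] - 1     # agreement is favourable
    contrib[, even] = 5 - raw[, even]   # disagreement is favourable
    return(rowSums(contrib) * 2.5)      # rescale the 0-40 sum to 0-100
}
# Three hypothetical respondents: most favourable, least favourable, all midpoints
toy = rbind(best = rep(c(5, 1), 5), worst = rep(c(1, 5), 5), neutral = rep(3, 10))
scoreSUS(toy)   # 100, 0, and 50
@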

\begin{figure}[h]
\centering
\includegraphics[width=.65\columnwidth]{originalScale.png}
\caption{5-point agreement scale as used in the SUS.}
\label{fig:originalScale}
\end{figure}

One of the most extensively researched questionnaire biases is acquiescence response bias: the tendency of a respondent to agree with a given statement independent of its substance \cite{smith1967}. When given a non-neutral statement, respondents are more likely to think of reasons why the statement is true than to expend cognitive effort on reasons for disagreement. This form of shortcutting the question-answering process is often referred to as satisficing \cite{krosnick1991}. Furthermore, respondents with lower self-perceived status tend to assume that the questionnaire administrator agrees with the posed statement, resulting in deferential agreement bias \cite{sarisKrosnickShaeffer2010}. Similarly, respondents whose personality naturally skews towards agreeableness \cite{sarisKrosnickShaeffer2010} and those with lower cognitive ability \cite{krosnickNarayanSmith1996} are more likely to exhibit this bias. Acquiescence bias is strongest when respondents are presented with binary agree/disagree, yes/no, or true/false answer options \cite{smith1967}; however, similar effects have been shown for agreement scales (such as the Likert scale) \cite{sarisKrosnickShaeffer2010}. To measure latent constructs, non-leading questions with neutral scales should be used instead \cite{sarisKrosnickShaeffer2010}.

The design of answer scales with optimal labeling and length has been the subject of much research. First, fully labeled scale points improve reliability and reduce bias compared to points that are only numbered \cite{groves2004}. Second, the choice of scale length and labels depends on the nature of the construct being measured, i.e., whether it is unipolar or bipolar. Unipolar constructs range from zero to an extreme amount and are best measured on a 5-point scale, while bipolar constructs range from an extreme negative to an extreme positive with a natural midpoint and are best measured on a 7-point scale. These guidelines improve a scale's reliability and maximize data differentiation while minimizing the cognitive burden on respondents \cite{krosnickFabrigar1997}.

\section{Acquiescence Bias in the SUS}

In this section, we review the SUS with regard to acquiescence bias, first on theoretical grounds and then through empirical evaluation.

\subsection{Heuristic review}

Each of the ten items of the SUS consists of a non-neutral statement and an agreement scale, a design that has been shown to induce acquiescence \cite{smith1967}. For instance, consider item 3 of the SUS (Table \ref{tab:items}): ``I thought the system was easy to use.'' Respondents are likely to spend relatively more effort on finding reasons that confirm rather than contradict the statement. As a result, respondents tend to be more agreeable towards the ``easy to use'' description in the statement. As another example, consider item 9 of the SUS: ``I felt very confident using the system.'' The loaded phrasing of this statement in combination with the endpoint-labeled agreement scale induces respondents to report higher confidence than they would otherwise. Note also that this item confounds two dimensions by containing ``very'' and ``confident'': disagreement with feeling very confident is not equivalent to feeling unconfident, but could theoretically indicate extreme confidence. Additionally, the nature of the agreement scale leads respondents further towards agreement, as disagreeing with anyone requires courage and cognitive effort \cite{sarisKrosnickShaeffer2010}. This effect may be especially strong because the SUS is frequently administered at the end of usability studies without anonymity and in the presence of the researcher. These concerns apply to all SUS items, as they are phrased in the same way.

Some of the items in the SUS have been reverse-keyed to ask about the same construct from opposite directions, for example items 3 (``easy to use'') and 8 (``cumbersome to use''). It may be that potential acquiescence bias effects cancel each other out; however, this claim requires empirical evaluation. Moreover, this reverse-keyed approach is used only for a subset of the items in the SUS; hence, all other items (1, 2, 4, 5, 6, 7, 9, 10) remain vulnerable to significant acquiescence bias. While some items (e.g., 4 and 10) are reasonably similar, they tap into different underlying constructs and are thus unlikely to mitigate any biases. Lastly, the SUS's agreement scale is offered as a bipolar scale with five points. As agreement/disagreement is bipolar, the appropriate scale length would be seven points to maximize reliability and data differentiation \cite{krosnickFabrigar1997}.

\subsection{Experimental Evaluation}

To provide empirical support for the principal claim that the SUS is vulnerable to acquiescence bias, we ran an online survey experiment with 877 participants who evaluated an online learning system. Participants of a massive open online course were asked to complete an optional post-course survey. Respondents were randomly assigned to one of two groups: one received the original SUS (n=439) and the other the reversed SUS (n=438) (Table \ref{tab:items}). The system that respondents were asked to evaluate consisted of the course sites for browsing and watching lecture videos.

\begin{table}[h!]
\small
\centering
\begin{tabular}{rccccr}
\toprule
 & \multicolumn{2}{c}{Original} & \multicolumn{2}{c}{Reversed} \\
\cmidrule(r){2-3} \cmidrule(r){4-5}
Item \# & M & SD & M & SD & p-value \\
\midrule
1 & 3.09 & 0.90 & 3.21 & 1.00 & 0.002\\
2 & 3.28 & 1.06 & 3.23 & 0.86 & 0.006\\
3 & 3.30 & 0.84 & 3.45 & 0.89 & \textless0.001\\
4 & 3.54 & 0.92 & 3.08 & 1.23 & \textless0.001\\
5 & 3.00 & 0.92 & 3.10 & 1.01 & 0.013\\
6 & 3.26 & 1.02 & 2.89 & 1.14 & \textless0.001\\
7 & 3.12 & 0.90 & 3.09 & 0.95 & 0.868\\
8 & 3.10 & 1.18 & 3.33 & 0.84 & 0.173\\
9 & 3.29 & 0.89 & 3.45 & 0.91 & \textless0.001\\
10 & 3.26 & 1.06 & 2.35 & 1.49 & \textless0.001\\
\midrule
overall & 80.58 & 16.14 & 77.92 & 14.23 & \textless0.001\\
\bottomrule
\end{tabular}
\caption{Original and reversed SUS scores providing strong evidence for acquiescence bias.}
\label{tab:acqui}
\end{table}

A comparison between scores from the original and reversed SUS provides strong evidence that the SUS induces acquiescence bias. Without acquiescence bias, the average for each item on the original SUS would not be significantly different from the reverse-coded average for the corresponding item on the reversed SUS. However, if acquiescence bias exists, respondents would tend to agree with statements independent of the statement's tone, which would be reflected in significant differences between the original SUS scores and the reverse-coded reversed SUS scores. Table \ref{tab:acqui} provides means, standard deviations, and p-values from Mann-Whitney tests of the hypothesis that there is no location shift. The differences are significant at the 5\% level for all but two items (7 and 8), and at the 1\% level for seven of the ten items as well as for the overall SUS score. This is very strong evidence for the claim that the original SUS induces acquiescence bias.
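% Illustration only: a minimal sketch of the reverse-coding step and the item-wise
% Mann-Whitney test described above. The responses are made up, and the coding
% (raw agreement 1-5 mapped to a 0-4 rating where higher means better usability)
% is an assumption about how the survey data were prepared.
<<reverse-coding-sketch, eval=FALSE>>=
# Hypothetical raw agreement responses (1 = Strongly disagree, 5 = Strongly agree)
# to the original item 3 ("easy to use") and the reversed item 3 ("hard to use")
ori.raw = c(5, 4, 4, 5, 3, 4, 5, 4)
rev.raw = c(2, 1, 3, 2, 2, 1, 2, 3)
# Code both versions so that 0 is the least and 4 the most favourable rating
ori.coded = ori.raw - 1   # agreeing with a positive statement is favourable
rev.coded = 5 - rev.raw   # disagreeing with a negative statement is favourable
# Two-sided Mann-Whitney test of no location shift between the two groups;
# exact=FALSE requests the normal approximation because the ratings contain ties
wilcox.test(ori.coded, rev.coded, exact=FALSE)$p.value
@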

\section{Alternative Proposal}

In this section, we propose and discuss an alternative usability scale. Recent questionnaire design insights are used to change the wording of the SUS items and the response scales. An experiment illustrates its quality and sensitivity compared to the SUS.

<<eval=FALSE>>=
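# Print one row of Table tab:acqui: item, M and SD for the original (oa) and
# reversed (ra) SUS, and the Mann-Whitney p-value, separated by " & "
testItems=function(i, oa, ra){
    print(paste(i, round(mean(oa),2), round(sd(oa),2), round(mean(ra),2), round(sd(ra),2), round(wilcox.test(oa,ra)$p.value,5), sep=" & "))
}
# Item-wise comparisons, followed by the overall SUS score
for(i in 1:10){
    testItems(i, m.orisus[,i], m.revsus[,i])
}
testItems("all", convSUS(m.orisus[,1:10]), convSUS(m.revsus[,1:10]))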
@

\subsection{Proposed Wording Changes}

To minimize acquiescence bias, each of the SUS statements may be transformed into a construct-specific, neutral question with similarly neutral answer options matching the question construct. The appropriate scale and its length then depend on the nature of the construct, i.e., whether it is unipolar or bipolar. To explain the reasoning that led to the proposed changes, we consider a few example items from the SUS; the same logic can be applied to the remaining items, as summarized in Table \ref{tab:items}.

For example, for item 9, we first need to identify the underlying construct being asked about, which in this case is likely ``confidence''. The leading statement can then easily be transformed into a construct-specific, neutral question: ``How confident were you using the system?'' To determine the appropriate scale, it is critical to consider the polarity of ``confidence''. As confidence naturally starts from a zero point (the absence of confidence) and spans to a positive extreme (high confidence), without a natural negative extreme (levels of negative confidence), this construct is unipolar. As a result, it should be measured on a 5-point, fully labeled scale from ``Not at all confident'' to ``Extremely confident'' (see full scale in Table \ref{tab:items}).

As another example, item 3 refers to the bipolar construct of ``ease/difficulty''. Bias may then be minimized by changing its wording to ``How easy or difficult was it to use the system?'', giving ``easy'' and ``difficult'' equal weight so that respondents are less likely to be led in either direction. Due to the bipolarity of this construct, a 7-point, fully labeled scale from ``Extremely difficult'' to ``Extremely easy'' may be used (see full scale in Table \ref{tab:items}). Note that, as items 3 and 8 in the original SUS ask about the same construct, only one of these identical reworded questions would be included in the questionnaire. As the purpose of this note is not to evaluate the summarization of the questionnaire responses into a single score, scoring is excluded from this discussion.
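% Illustration only: a minimal sketch of how the fully labeled unipolar (item 9) and
% bipolar (item 3) answer scales could be represented as ordered factors in R.
% The labels are taken from Table tab:items; the example responses and the
% zero-based numeric coding are assumptions made for this sketch.
<<labelled-scale-sketch, eval=FALSE>>=
# 5-point unipolar scale for item 9 and 7-point bipolar scale for item 3
confLabels = paste(c("Not at all", "Slightly", "Moderately", "Very", "Extremely"), "confident")
easeLabels = c(paste(c("Extremely", "Moderately", "Slightly"), "difficult"),
               "Neither difficult nor easy",
               paste(c("Slightly", "Moderately", "Extremely"), "easy"))
# Hypothetical responses from two respondents
conf = factor(c("Very confident", "Not at all confident"), levels=confLabels, ordered=TRUE)
ease = factor(c("Slightly easy", "Extremely difficult"), levels=easeLabels, ordered=TRUE)
# Zero-based numeric codes: 0-4 for the unipolar and 0-6 for the bipolar scale
as.integer(conf) - 1   # 3 and 0
as.integer(ease) - 1   # 4 and 0
@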

\subsection{Experimental Evaluation of Scale Sensitivity}

A good usability scale should exhibit a high level of sensitivity to reflect even subtle differences in usability. We conducted a second survey study in post-course surveys of two online courses that ran on different systems (Web interfaces) to investigate how an example of the proposed alternative scale (items 3, 6, 9, 10 from Table \ref{tab:items}) compares to the SUS in terms of sensitivity.

The two systems offered similar capabilities, i.e., browsing and playing video lectures, but differed considerably in their design. We employed Molich and Nielsen's heuristic evaluation criteria \cite{molich1990improving} to informally establish which system has better usability. While both systems showed generally high usability, one system was deemed superior in four evaluation categories. This informal usability comparison was the basis for labeling one system as having ``high usability'' and the other ``low usability''. We collected 439 SUS and alternative scale responses for the low-usability system, and 96 for the high-usability system.

A Mann-Whitney test of the difference between the usability ratings for the two systems for each scale suggests that the alternative scale is more sensitive than the SUS ($W$=18585, $p$=0.069 for the SUS; $W$=17744, $p$=0.014 for the alternative scale). The alternative scale scores are significantly different for the two systems at the 5\% level, while the SUS scores are not. Although this comparison uses relatively small sample sizes and only four items from the proposed alternative scale, this finding points to further potential benefits of using a more neutral scale than the SUS.
<<eval=FALSE>>=
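# Mann-Whitney test of SUS scores between the two courses (m.* and a.* data)
wilcox.test(convSUS(m.orisus[,1:10]), convSUS(a.orisus[,1:10]), conf.int=TRUE)
# Same test for the alternative scale, randomly subsampled to 439 and 96 responses
wilcox.test(sample(convSUS2(m.sus2[,2:5]),439), sample(convSUS2(a.sus2[,2:5]),96), conf.int=TRUE)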
@
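% Illustration only: a minimal sketch of why dividing the summed item codes by 0.18
% yields a 0-100 score for the four-item alternative scale. It assumes zero-based
% codes, i.e. 0-6 for the 7-point item 3 and 0-4 for the 5-point items 6, 9, and 10;
% the toy responses are hypothetical.
<<rescaling-sketch, eval=FALSE>>=
# Assumed maximum code per item (items 3, 6, 9, 10)
itemMax = c(6, 4, 4, 4)
sum(itemMax)   # 18, so dividing by 0.18 maps the maximum sum to 100
# Three hypothetical respondents: all maxima, all zeros, and all scale midpoints
toy = rbind(best = itemMax, worst = c(0, 0, 0, 0), mixed = c(3, 2, 2, 2))
rowSums(toy) / 0.18   # 100, 0, and 50 (up to floating-point rounding)
@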

\subsubsection{Comparing Psychometric Properties}
%As the psychometric properties of the SUS have been studied extensively \cite{lewis2009factor}, a brief evaluation of key statistics should be sufficient here.
Table \ref{tab:dist} provides statistics and psychometric properties of the original and reversed SUS, and of an example (items 3, 6, 9, 10) of the alternative scale. They share similar distributional characteristics, but the SUS has higher reliability than the alternative scale in terms of Cronbach's $\alpha$. Note, however, that the SUS is 2.5 times longer than the alternative scale example, and reliability generally increases with scale length. A factor analysis yields two eigenvalues greater than one for the SUS, consistent with previous work on its factor structure \cite{lewis2009factor}.

\begin{table}[h!]
\small
\centering
\begin{tabular}{llll}
\toprule
          & Original & Reversed & Alternative\\
Statistic & SUS      & SUS      & Scale$^{\mbox{\scriptsize items 3, 6, 9, 10}}$\\
\midrule
N         & 439 & 438 & 869\\
Range     & [23, 100] & [8, 100] & [22, 100]\\
Mean (SD) & 80.6 (16.1) & 77.9 (14.2) & 76.7 (14.7)\\
Median (IQR) & 85 (20) & 80 (18) & 78 (22)\\
Cronbach $\alpha$ & 0.86 & 0.73 & 0.67\\
$\left\vert{\lambda}>1\right\vert^{\ast}$ & 2 & 2 & 1\\
\bottomrule
\multicolumn{4}{l}{\scriptsize{$^{\ast}$number of eigenvalues greater than 1 in factor analysis}}\\
\end{tabular}
\caption{Statistical and psychometric scale properties.}
\label{tab:dist}
\end{table}
<<eval=FALSE>>=
# Sample size, range, mean, SD, median, IQR, Cronbach's alpha, and eigenvalues for one scale
describeScale = function(items, conv) {
    score = conv(items)
    print(c(n=length(score), min=min(score), max=max(score), mean=mean(score), sd=sd(score), median=median(score), iqr=IQR(score)))
    print(psych::alpha(items)$total)  # psych::alpha, as ggplot2 also exports alpha()
    print(fa(items, 1)$e.values)
}
describeScale(m.orisus[,1:10], convSUS)
describeScale(m.revsus[,1:10], convSUS)
describeScale(m.sus2[,2:5], convSUS2)
@
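% Illustration only: a rough sketch of how scale length relates to reliability, using
% the Spearman-Brown prophecy formula. It treats Cronbach's alpha as the reliability
% estimate and assumes any added items are parallel to the existing ones, which is a
% strong assumption; the function name spearmanBrown is hypothetical. The result is
% a back-of-the-envelope projection, not a finding of this study.
<<spearman-brown-sketch, eval=FALSE>>=
# Predicted reliability of a scale lengthened by a factor k,
# given the reliability rho of the current scale
spearmanBrown = function(rho, k) {
    return(k * rho / (1 + (k - 1) * rho))
}
# Four-item alternative scale (alpha = 0.67) projected to ten items (k = 10/4)
spearmanBrown(0.67, 10/4)   # approximately 0.84, close to the 0.86 of the original SUS
@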
\section{Conclusion}

This note's unique contribution is a critique of the SUS from a questionnaire design perspective, which has not been done before. We find strong evidence that the SUS induces acquiescence bias and show how an alternative scale with statements rephrased as questions and construct-specific, neutral answer scales achieves higher sensitivity in measuring usability than the SUS. Future work could evaluate the SUS with respect to other biases, such as satisficing and the use of hypotheticals. Moreover, a thorough psychometric evaluation of the proposed alternative scale using a variety of systems is needed before we can recommend the adoption of this new scale. This will also require addressing the compatibility issue between old and new usability scores for the transition phase.

\begin{sidewaystable}
\small
\centering
\caption{Original and reversed SUS items (see Figure \ref{fig:originalScale} for answer scale) and proposed alternative with corresponding answer scales.}
\label{tab:items}
\begin{tabular}{| r | p{5cm} | p{5cm} | p{3.5cm} | p{6cm} |}
\hline
\textbf{\#}  & \textbf{Original SUS$^\ast$} & \textbf{Reversed SUS$^\ast$} & \textbf{Proposed Alternative} & \textbf{Proposed Answer Scale}\\
\hline
1 & I think that I would like to use this system frequently & I do not think that I would like to use this system frequently & How much do you like or dislike the system? & \{Extremely, Moderately, Slightly\} dislike, Neither like nor dislike, \{Slightly, Moderately, Extremely\} like \\ \hline
2  & I found the system unnecessarily complex & I found the system appropriately simple & How complex is the system? & \{Not at all, Slightly, Moderately, Very, Extremely\} complex\\ \hline
3  & I thought the system was easy to use & I thought the system was hard to use & How easy or difficult was it to use the system? & \{Extremely, Moderately, Slightly\} difficult, Neither difficult nor easy, \{Slightly, Moderately, Extremely\} easy\\ \hline
4  & I think that I would need the support of a technical person to be able to use this system & I think that I would not need any support of a technical person to be able to use this system  & How likely are you to need the support of a technical person to be able to use the system? & \{Extremely, Very, Somewhat\} unlikely, Neither likely nor unlikely, \{Somewhat, Very, Extremely\} likely\\ \hline
5  & I found the various functions in this system were well integrated & I found the various functions in this system were not well integrated & How integrated are the system's various functions? & \{Not at all, Slightly, Moderately, Very, Extremely\} integrated\\ \hline
6  & I thought there was too much inconsistency in this system & I did not think there was too much inconsistency in this system & How consistent is the system? & \{Not at all, Slightly, Moderately, Very, Extremely\} consistent\\ \hline
7  & I would imagine that most people would learn to use this system very quickly  & I would imagine that most people would learn to use this system very slowly & How easy or difficult was it to learn how to use the system? & \{Extremely, Moderately, Slightly\} difficult, Neither difficult nor easy, \{Slightly, Moderately, Extremely\} easy\\ \hline
8  & I found the system very cumbersome to use & I found the system very manageable to use & How cumbersome was it to use the system? & \{Not at all, Slightly, Moderately, Very, Extremely\} cumbersome\\ \hline
9  & I felt very confident using the system & I did not feel very confident using the system & How confident were you using the system? & \{Not at all, Slightly, Moderately, Very, Extremely\} confident\\ \hline
10  & I needed to learn a lot of things before I could get going with this system   & I needed to learn very few things before I could get going with this system  & How much more is there to learn about the system? & Nothing at all, A little, A moderate amount, A lot, A great deal\\ \hline
\end{tabular}
\end{sidewaystable}

\bibliographystyle{acm-sigchi}
\bibliography{suslit}
\end{document}