40 changes: 40 additions & 0 deletions _lab/lab07.md
@@ -142,3 +142,43 @@ def mostCommonWords(filename, N):

```


# Extra Credit Challenges

(Make sure you read all three sections, including this one: Extra Credit Challenges, Extra Credit Tests, and Extra Credit Submission.)

Below are three extra credit challenges.
- For each one, you need to implement a modified version of each of the lab07 functions (`totalWords`, `longestWord`, `charactersPerWord`, `wordFrequency`, and `mostCommonWords`).
- Each of the three challenges involves pre-processing the input text in a new way.
- Each modified function that you will implement must call the original version of the function after doing the specified pre-processing.
- __The pre-processing is cumulative__: that means your solution to challenge 3 must perform the pre-processing required by challenge 3, in addition to the pre-processing required by challenges 1 and 2.

The challenges are:
1. Remove double-quotes from the file content
2. Remove "stop words" using the provided [stopwords.txt](stopwords.txt).
* Every line of stopwords.txt contains exactly one word, called a stop word (some of them are contractions, e.g. "you'd").
* Hint about how to remove all occurrences of a word from a list of words: the list's `remove` method only removes the first occurrence of that word from the list. To address this, you can use a while loop to keep removing the word as long as it is still found in the list. Alternatively, instead of removing the word from the list, you could create a new list and add to it every word except the one you'd like to remove.
3. Convert all letters to lower case
* In English, "Hi" and "hi" are the same word, even though in Python they are different strings. For this challenge, enforce that by pre-processing all words to lowercase. The result will be that "hi", "Hi", "HI", and "hI" are all counted as the same word, "hi". (Yes, "HI" is the state abbreviation for Hawaii, so you could argue it actually is a different word. For this assignment, do not worry about that; just treat them as the same word.)
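
The two approaches from the hint in challenge 2 can be sketched as follows. This is only an illustration: the function names `remove_all_with_loop` and `remove_all_with_new_list` are made up for this example and are not part of the lab.

```python
def remove_all_with_loop(words, target):
    """Remove every occurrence of target from words, changing the list in place."""
    # list.remove only deletes the first match, so repeat until none are left.
    while target in words:
        words.remove(target)
    return words


def remove_all_with_new_list(words, target):
    """Return a new list with every word except target; words is left unchanged."""
    kept = []
    for word in words:
        if word != target:
            kept.append(word)
    return kept


words = ["the", "cat", "and", "the", "hat"]
print(remove_all_with_new_list(words, "the"))  # ['cat', 'and', 'hat']
print(remove_all_with_loop(words, "the"))      # ['cat', 'and', 'hat']
```

Note that the first version modifies the list you pass in, while the second leaves it alone and gives you a new list.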

__REMINDER--the pre-processing is cumulative__: that means your solution to each challenge must perform the pre-processing required by that challenge, in addition to the pre-processing required by the previous challenge(s).
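
One way to picture "cumulative" is a single helper that applies more steps as the challenge number goes up. The sketch below rests on several assumptions that the lab text does not specify: the helper name `preprocess` and its `challenge` parameter are hypothetical, words are separated with `split()`, and the steps run in the order the challenges are listed (quotes, then stop words, then lowercase).

```python
def preprocess(text, stop_words, challenge):
    """Return text after the pre-processing for the given challenge (1, 2, or 3)."""
    # Challenge 1: remove double-quotes (applied for every challenge).
    text = text.replace('"', "")
    words = text.split()
    # Challenge 2: drop stop words (assumed to happen before lowercasing).
    if challenge >= 2:
        words = [w for w in words if w not in stop_words]
    # Challenge 3: convert everything to lower case.
    if challenge >= 3:
        words = [w.lower() for w in words]
    return " ".join(words)


stop_words = ["the", "a", "you'd"]
sample = 'The "quick" fox saw the DOG'
print(preprocess(sample, stop_words, 1))  # The quick fox saw the DOG
print(preprocess(sample, stop_words, 2))  # The quick fox saw DOG
print(preprocess(sample, stop_words, 3))  # the quick fox saw dog
```

With this ordering, the capitalized "The" survives the stop-word step and is then lowercased to "the". Whether that is the intended behavior depends on the order you choose, so it is worth writing a test for that case.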

# Extra Credit Tests

For each challenge we are providing the tests below to help you get started. The autograder for this assignment will contain additional tests for cases that are not covered by the tests below, so you are advised to think about which test cases are missing and write them yourself.

Initial tests for challenge 1:
```python
```

Initial tests for challenge 2:
```python
```

Initial tests for challenge 3:
```python
```

# Extra Credit Submission

For each challenge that you complete, you will submit one file, called `lab07_challenge1.py`, `lab07_challenge2.py`, or `lab07_challenge3.py` for challenges 1, 2, and 3 respectively. Each file must contain a modified implementation of `totalWords`, `longestWord`, `charactersPerWord`, `wordFrequency`, and `mostCommonWords`. In order to reuse your original functions, we recommend copying them into your challenge files with new names, for example `longestWord_original`.
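
As a rough sketch of what one renamed-and-wrapped function could look like for challenge 1, consider the example below. It makes two assumptions that are not stated in the lab: `totalWords_original` here is only a stand-in (your own lab07 solution goes in its place), and the cleaned text is handed to the original function through a temporary file because the original takes a filename.

```python
import os
import tempfile


def totalWords_original(filename):
    # Stand-in for your lab07 solution, copied in under a new name.
    # Your real implementation may count words differently.
    with open(filename) as f:
        return len(f.read().split())


def totalWords(filename):
    """Challenge 1 version: remove double-quotes, then call the original."""
    with open(filename) as f:
        text = f.read().replace('"', "")
    # Assumption: pass the cleaned text to the original via a temporary file.
    fd, cleaned_path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as out:
            out.write(text)
        return totalWords_original(cleaned_path)
    finally:
        # Delete the temporary file whether or not the call succeeded.
        os.remove(cleaned_path)
```

The same wrapping pattern would apply to the other four functions; only the name of the original being called changes.
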
179 changes: 179 additions & 0 deletions _lab/lab07/stopwords.txt
@@ -0,0 +1,179 @@
i
me
my
myself
we
our
ours
ourselves
you
you're
you've
you'll
you'd
your
yours
yourself
yourselves
he
him
his
himself
she
she's
her
hers
herself
it
it's
its
itself
they
them
their
theirs
themselves
what
which
who
whom
this
that
that'll
these
those
am
is
are
was
were
be
been
being
have
has
had
having
do
does
did
doing
a
an
the
and
but
if
or
because
as
until
while
of
at
by
for
with
about
against
between
into
through
during
before
after
above
below
to
from
up
down
in
out
on
off
over
under
again
further
then
once
here
there
when
where
why
how
all
any
both
each
few
more
most
other
some
such
no
nor
not
only
own
same
so
than
too
very
s
t
can
will
just
don
don't
should
should've
now
d
ll
m
o
re
ve
y
ain
aren
aren't
couldn
couldn't
didn
didn't
doesn
doesn't
hadn
hadn't
hasn
hasn't
haven
haven't
isn
isn't
ma
mightn
mightn't
mustn
mustn't
needn
needn't
shan
shan't
shouldn
shouldn't
wasn
wasn't
weren
weren't
won
won't
wouldn
wouldn't