diff --git a/your-code/main.ipynb b/your-code/main.ipynb
deleted file mode 100755
index 54a8b65..0000000
--- a/your-code/main.ipynb
+++ /dev/null
@@ -1,420 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Before you start:\n",
-    "- Read the README.md file\n",
-    "- Comment as much as you can and use the resources in the README.md file\n",
-    "- Happy learning!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Import reduce from functools, numpy and pandas"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Challenge 1 - Mapping\n",
-    "\n",
-    "#### We will use the map function to clean up words in a book.\n",
-    "\n",
-    "In the following cell, we will read a text file containing the book The Prophet by Khalil Gibran."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Run this code:\n",
-    "\n",
-    "location = '../data/58585-0.txt'\n",
-    "with open(location, 'r', encoding=\"utf8\") as f:\n",
-    "    prophet = f.read().split(' ')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Let's remove the first 568 words since they contain information about the book but are not part of the book itself. \n",
-    "\n",
-    "Do this by removing from `prophet` elements 0 through 567 of the list (you can also do this by keeping elements 568 through the last element)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "If you look through the words, you will find that many words have a reference attached to them. For example, let's look at words 1 through 10."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### The next step is to create a function that will remove references. \n",
-    "\n",
-    "We will do this by splitting the string on the `{` character and keeping only the part before this character. Write your function below."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def reference(x):\n",
-    "    '''\n",
-    "    Input: A string\n",
-    "    Output: The string with references removed\n",
-    "    \n",
-    "    Example:\n",
-    "    Input: 'the{7}'\n",
-    "    Output: 'the'\n",
-    "    '''\n",
-    "    \n",
-    "    # your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Now that we have our function, use the `map()` function to apply this function to our book, The Prophet. Store the resulting list in a new list called `prophet_reference`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Another thing you may have noticed is that some words contain a line break. Let's write a function to split those words. Our function will return the string split on the character `\\n`. Write your function in the cell below."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def line_break(x):\n",
-    "    '''\n",
-    "    Input: A string\n",
-    "    Output: A list of strings split on the line break (\\n) character\n",
-    "    \n",
-    "    Example:\n",
-    "    Input: 'the\\nbeloved'\n",
-    "    Output: ['the', 'beloved']\n",
-    "    '''\n",
-    "    \n",
-    "    # your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Apply the `line_break` function to the `prophet_reference` list. Name the new list `prophet_line`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "If you look at the elements of `prophet_line`, you will see that the function returned lists and not strings. Our list is now a list of lists. Flatten the list using a list comprehension. Assign this new list to `prophet_flat`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Challenge 2 - Filtering\n",
-    "\n",
-    "When printing out a few words from the book, we see that there are words that we may not want to keep if we choose to analyze the corpus of text. Below is a list of words that we would like to get rid of. Create a function that returns `False` if the word is in the list of words specified and `True` otherwise."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def word_filter(x):\n",
-    "    '''\n",
-    "    Input: A string\n",
-    "    Output: True if the word is not in the specified list \n",
-    "    and False if the word is in the list.\n",
-    "    \n",
-    "    Example:\n",
-    "    word list = ['and', 'the']\n",
-    "    Input: 'and'\n",
-    "    Output: False\n",
-    "    \n",
-    "    Input: 'John'\n",
-    "    Output: True\n",
-    "    '''\n",
-    "    \n",
-    "    word_list = ['and', 'the', 'a', 'an']\n",
-    "    \n",
-    "    # your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Use the `filter()` function to filter out the words specified in the `word_filter()` function. Store the filtered list in the variable `prophet_filter`."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Bonus Challenge\n",
-    "\n",
-    "Rewrite the `word_filter` function above so that it is not case sensitive."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def word_filter_case(x):\n",
-    "    \n",
-    "    word_list = ['and', 'the', 'a', 'an']\n",
-    "    \n",
-    "    # your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Challenge 3 - Reducing\n",
-    "\n",
-    "#### Now that we have significantly cleaned up our text corpus, let's use the `reduce()` function to put the words back together into one long string separated by spaces. \n",
-    "\n",
-    "We will start by writing a function that takes two strings and concatenates them together with a space between the two strings."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def concat_space(a, b):\n",
-    "    '''\n",
-    "    Input: Two strings\n",
-    "    Output: A single string with the two inputs separated by a space\n",
-    "    \n",
-    "    Example:\n",
-    "    Input: 'John', 'Smith'\n",
-    "    Output: 'John Smith'\n",
-    "    '''\n",
-    "    \n",
-    "    # your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Use the function above to reduce the text corpus in the list `prophet_filter` into a single string. Assign this new string to the variable `prophet_string`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Bonus Challenge 2 - Applying Functions to DataFrames\n",
-    "\n",
-    "#### Our next step is to use the `apply()` function on a dataframe to transform all cells.\n",
-    "\n",
-    "To do this, we will connect to Ironhack's database and retrieve the data from the *pollution* database. Select the *beijing_pollution* table and retrieve its data."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Let's look at the data using the `head()` function."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The next step is to create a function that divides a cell by 24 to produce an hourly figure. Write the function below."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def hourly(x):\n",
-    "    '''\n",
-    "    Input: A numerical value\n",
-    "    Output: The value divided by 24\n",
-    "    \n",
-    "    Example:\n",
-    "    Input: 48\n",
-    "    Output: 2.0\n",
-    "    '''\n",
-    "    \n",
-    "    # your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Apply this function to the columns `Iws`, `Is`, and `Ir`. Store this new dataframe in the variable `pm25_hourly`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Our last challenge will be to create an aggregate function and apply it to a selected group of columns in our dataframe.\n",
-    "\n",
-    "Write a function that returns the standard deviation of a column divided by (the length of the column minus 1). Since we are using pandas, do not use the `len()` function. One alternative is to use `count()`. Also, use the numpy version of standard deviation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def sample_sd(x):\n",
-    "    '''\n",
-    "    Input: A Pandas series of values\n",
-    "    Output: The standard deviation divided by (the number of elements in the series minus 1)\n",
-    "    \n",
-    "    Example:\n",
-    "    Input: pd.Series([1,2,3,4])\n",
-    "    Output: 0.3726779962\n",
-    "    '''\n",
-    "    \n",
-    "    # your code here"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.2"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
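Challenge 1 of the deleted notebook describes a `reference` function that splits each word on `{` and keeps the part before it, which is then applied to the book with `map()`. The following is a minimal sketch of one possible solution. It uses a short hypothetical word list instead of the book file `../data/58585-0.txt`. On the real data, the notebook first drops the front matter by keeping `prophet[568:]`.

```python
def reference(x):
    """Return the part of the string before the first '{'."""
    # str.split always returns at least one element, so [0] is safe
    return x.split('{')[0]

# Hypothetical sample standing in for the words read from the book
prophet = ['the{7}', 'Prophet', 'beloved{1}', 'and\nthe']

# map() is lazy in Python 3, so wrap it in list() to materialize the result
prophet_reference = list(map(reference, prophet))
print(prophet_reference)  # ['the', 'Prophet', 'beloved', 'and\nthe']
```

Words without a `{` pass through unchanged, and line breaks are left for the next step.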
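The `line_break` step in the deleted notebook splits words on `\n`. Mapping it over the list produces a list of lists, which the notebook then flattens with a list comprehension. The following sketch uses hypothetical sample words in place of the real `prophet_reference`:

```python
def line_break(x):
    """Split a string on the newline character."""
    return x.split('\n')

# Hypothetical output of the reference-removal step
prophet_reference = ['the\nbeloved', 'one', 'and\nthe']

# Each element becomes a list of strings, giving a list of lists
prophet_line = list(map(line_break, prophet_reference))

# Flatten with a nested list comprehension: outer loop over sublists, inner over words
prophet_flat = [word for sublist in prophet_line for word in sublist]
print(prophet_flat)  # ['the', 'beloved', 'one', 'and', 'the']
```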
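Challenge 2 and its bonus call for a stop-word predicate to pass to `filter()`, plus a case-insensitive variant. The following sketch shows one way to write both. It uses the notebook's four-word list and a hypothetical sample list named `words`:

```python
def word_filter(x):
    """Return False if x is one of the stop words, True otherwise."""
    word_list = ['and', 'the', 'a', 'an']
    return x not in word_list

def word_filter_case(x):
    """Case-insensitive variant: compare the lowercased word."""
    word_list = ['and', 'the', 'a', 'an']
    return x.lower() not in word_list

# Hypothetical sample in place of prophet_flat
words = ['The', 'Prophet', 'and', 'the', 'sea', 'An']

# filter() keeps the elements for which the predicate returns True
prophet_filter = list(filter(word_filter, words))
print(prophet_filter)                          # ['The', 'Prophet', 'sea', 'An']
print(list(filter(word_filter_case, words)))  # ['Prophet', 'sea']
```

The case-sensitive version keeps `'The'` and `'An'`, which is why the bonus exercise asks for a lowercased comparison.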
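Challenge 3 rebuilds one string by folding `concat_space` over the filtered words with `functools.reduce()`. The following is a minimal sketch with a hypothetical three-word `prophet_filter`:

```python
from functools import reduce

def concat_space(a, b):
    """Join two strings with a single space between them."""
    return a + ' ' + b

# Hypothetical filtered word list
prophet_filter = ['Prophet', 'sea', 'ship']

# reduce applies concat_space cumulatively: ('Prophet sea') + ' ' + 'ship'
prophet_string = reduce(concat_space, prophet_filter)
print(prophet_string)  # Prophet sea ship
```

Calling `reduce()` on an empty list with no initial value raises `TypeError`. Outside of this exercise, `' '.join(prophet_filter)` gives the same result without that edge case.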
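Bonus Challenge 2 divides the `Iws`, `Is`, and `Ir` columns by 24 with `apply()`. The notebook does not include its database connection details, so this sketch assumes a small hypothetical DataFrame named `pollution` with those three columns in place of the *beijing_pollution* table:

```python
import pandas as pd

def hourly(x):
    """Divide a numeric value (or a whole Series) by 24."""
    return x / 24

# Hypothetical stand-in for the beijing_pollution table
pollution = pd.DataFrame({'Iws': [48.0, 24.0], 'Is': [0, 2], 'Ir': [12, 0]})

# DataFrame.apply passes each selected column to hourly as a Series
pm25_hourly = pollution[['Iws', 'Is', 'Ir']].apply(hourly)
print(pm25_hourly)
```

This keeps only the three transformed columns. Because `hourly` is a plain arithmetic operation, `pollution[['Iws', 'Is', 'Ir']] / 24` is an equivalent vectorized alternative.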
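The final `sample_sd` exercise divides the NumPy standard deviation of a Series by its `count()` minus 1. The docstring gives an expected value of 0.3726779962 for `pd.Series([1,2,3,4])`. The following sketch reproduces that value. Dropping missing values before calling `np.std` is an assumption here, chosen so that the numerator and `count()` cover the same elements. The DataFrame `df` used for the column-wise call is also hypothetical.

```python
import numpy as np
import pandas as pd

def sample_sd(x):
    """Standard deviation of a Series divided by (non-null count - 1)."""
    # Assumption: drop NaN so np.std sees the same values that count() counts
    values = x.dropna().to_numpy()
    # np.std defaults to ddof=0 (population standard deviation)
    return np.std(values) / (x.count() - 1)

print(round(sample_sd(pd.Series([1, 2, 3, 4])), 10))  # 0.3726779962

# Applying it column-wise to a selected group of columns (hypothetical data)
df = pd.DataFrame({'Iws': [1.0, 2.0, 3.0, 4.0], 'Ir': [0, 0, 0, 0]})
print(df[['Iws', 'Ir']].apply(sample_sd))
```

Despite its name, this is not the sample standard deviation. That would be `np.std(values, ddof=1)`, which divides the squared deviations by n - 1 inside the square root.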