Hw1_commit #1
bda602_hw1/hw1.py
Outdated
# y = species

from sklearn.preprocessing import StandardScaler
I think it's better to put all the import statements at the top of the file, in one place.
bda602_hw1/hw1.py
Outdated
datafile ="http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
#c=pd.read_csv(datafile)
#datafile2 = "/mnt/C:/Users/thoma/OneDrive/Documents/bda602/hw1/bezdekIris.data"
#print(datafile)
This prints the URL, not the DataFrame; it should be print(c). It's not important, just a note.
bda602_hw1/hw1.py
Outdated
print(iris_data.head())

# def petal_func(columnname):
iris_data.describe() is a much more convenient way to get all the summary statistics (count, mean, std, min, quartiles, max) in one call.
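To illustrate what I mean, here is a minimal sketch. The tiny hand-typed frame and its column names (other than `species`) are my own stand-ins, not the data or names from hw1.py.

```python
import pandas as pd

# Hypothetical stand-in for iris_data: six hand-typed rows.
# Column names other than "species" are assumptions.
iris_data = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": [
        "Iris-setosa", "Iris-setosa",
        "Iris-versicolor", "Iris-versicolor",
        "Iris-virginica", "Iris-virginica",
    ],
})

# describe() summarizes the numeric columns only by default:
# count, mean, std, min, 25%, 50%, 75%, max
summary = iris_data.describe()
print(summary)
```

The non-numeric `species` column is left out of the summary unless you pass `include="all"`.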
bda602_hw1/hw1.py
Outdated
from sklearn.model_selection import train_test_split
species = iris_data['species']
data_new = iris_data.drop(columns=['species'])
xtrain, xtest, ytrain, ytest = train_test_split(data_new, test_size=0.2)
I am not sure whether this runs for you, but it would not run for me. You create species but never use it, so train_test_split receives only one array and returns two values, which cannot be unpacked into four names. I changed this line to xtrain, xtest, ytrain, ytest = train_test_split(data_new, species, test_size=0.2) and it works now.
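A small sketch of the difference, in case it helps. The ten-row frame and the `petal_length` column are made up for illustration, and `random_state=0` is my addition so the split is repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical ten-row stand-in for iris_data (values and the
# "petal_length" column name are assumptions, not from hw1.py).
iris_data = pd.DataFrame({
    "petal_length": [1.4, 1.3, 1.5, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9, 5.6],
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 4,
})

species = iris_data["species"]
data_new = iris_data.drop(columns=["species"])

# One array in, two pieces out: unpacking this into four names fails.
only_features = train_test_split(data_new, test_size=0.2, random_state=0)
print(len(only_features))  # 2

# Two arrays in, four pieces out: features and labels are split together.
xtrain, xtest, ytrain, ytest = train_test_split(
    data_new, species, test_size=0.2, random_state=0
)
print(xtrain.shape, xtest.shape)
```

Passing both arrays in the same call also keeps the rows of `xtrain` and `ytrain` aligned.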
bda602_hw1/hw1.py
Outdated
print(X_train)

# from sklearn.ensemble import RandomForestClassifier
# xtrain, xtest = train_test_split(data_new, test_size=0.2,random_state=123)
Was there any reason you split the data into train and test sets a second time?
Bita, look at this.
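On the question of splitting twice: one possible arrangement is to split once and reuse the same pieces for both scaling and the classifier. This is only a sketch under my own assumptions. It loads the copy of iris bundled with scikit-learn rather than the UCI URL, and the variable names are not the ones in hw1.py.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: use the iris copy that ships with scikit-learn (no download).
X, y = load_iris(return_X_y=True)

# Split once; a fixed random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

# Fit the scaler on the training rows only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train and evaluate on the same split that was scaled above.
clf = RandomForestClassifier(random_state=123).fit(X_train_scaled, y_train)
accuracy = clf.score(X_test_scaled, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

With a single split, the rows the scaler was fitted on are the same rows the model trains on, so the test set stays unseen by both steps.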