Classification using scikit-learn on iris dataset
Classification is a fundamental task in machine learning that involves learning to predict categorical or discrete output variables based on a set of input features. In this tutorial, we will use scikit-learn to perform classification on the famous Iris dataset.
The Iris dataset is a classic benchmark in machine learning. It consists of 150 samples (50 per species). Each sample has four input features measured in centimeters (sepal length, sepal width, petal length, and petal width) and an output class label (setosa, versicolor, or virginica). Our goal is to train a model that accurately predicts the class label for a new set of input features.
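You can check the structure described above directly on the object returned by `load_iris()`. The following minimal sketch prints the shape of the feature matrix, the feature and class names, and the number of samples in each class. It uses only `load_iris` and NumPy's `bincount`.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Feature matrix: one row per sample, one column per measurement
print(iris.data.shape)            # (150, 4)
print(iris.feature_names)         # the four measurements, in cm
print(list(iris.target_names))    # the three species names

# Labels are integers 0, 1, 2; count how many samples fall in each class
print(np.bincount(iris.target))   # [50 50 50]
```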
We will use scikit-learn, a popular Python library for machine learning, to perform classification on the Iris dataset. Here is the code to load the Iris dataset and split it into training and testing sets:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset
iris = load_iris()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
```
In this code, we first load the Iris dataset using the load_iris() function from scikit-learn's datasets module. We then split it into training and testing sets using the train_test_split() function from the model_selection module. Setting test_size to 0.3 means that 30% of the data (45 samples) is used for testing and the remaining 70% (105 samples) for training. We also set random_state to 42 so that the random shuffle, and therefore the split, is the same on every run.
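To confirm the split sizes, you can inspect the arrays. The sketch below also prints how many test samples fall in each class. A plain random split does not guarantee equal class counts. Passing the real `stratify` parameter of `train_test_split` keeps the 1:1:1 ratio in both halves. Using stratification here is an optional variation, not something the tutorial itself does.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Same split as in the tutorial: 30% test, fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
print("test class counts:", np.bincount(y_test, minlength=3))

# Optional variation: stratify on the labels so each class is split 70/30
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)
print("stratified test class counts:", np.bincount(ys_test, minlength=3))
```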
Next, we will train a model on the training data and evaluate its performance on the testing data. We will use scikit-learn's LogisticRegression classifier, a common choice for both binary and multi-class classification problems:
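```python
from sklearn.linear_model import LogisticRegression

# Train a Logistic Regression model on the training data.
# max_iter is raised from its default of 100 so the lbfgs solver has room to
# converge on the unscaled features (avoids a possible ConvergenceWarning).
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate the model on the testing data; score() returns accuracy
accuracy = model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.3f}")
```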
In this code, we first import the LogisticRegression class from scikit-learn's linear_model module. We then train a LogisticRegression model on the training data using the fit() method. Finally, we evaluate the performance of the model on the testing data using the score() method, which returns the accuracy of the model on the testing data. We print the accuracy to the console.
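The accuracy returned by score() is the fraction of test labels that the model predicts correctly. In the sketch below, that value is recomputed from explicit predictions with `accuracy_score`, and `confusion_matrix` from `sklearn.metrics` shows which classes get mixed up. The `max_iter=1000` setting is our own choice to let the solver converge. It is not a value the original tutorial specifies.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
model = LogisticRegression(max_iter=1000)  # assumption: higher iteration cap
model.fit(X_train, y_train)

# Explicit predictions for every test sample
y_pred = model.predict(X_test)

# Fraction of predictions that match the true labels
manual_accuracy = accuracy_score(y_test, y_pred)
print(model.score(X_test, y_test), manual_accuracy)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

The diagonal of the confusion matrix counts the correct predictions. Dividing its sum by the number of test samples gives the same accuracy that score() reports.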
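When we run this code, we get an accuracy of around 96%, which corresponds to roughly 43 of the 45 test examples being classified correctly. Because the test set is small, each misclassified example moves the accuracy by about 2.2 percentage points, so the exact figure can change with a different split or scikit-learn version.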
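With 30% of 150 samples held out, the test set has 45 examples, so accuracy can only take values of the form k/45. The sketch below repeats the split with a few `random_state` values and shows how much the reported accuracy depends on which examples end up in the test set. The seed list and `max_iter=1000` are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
accuracies = {}

# Arbitrary seeds chosen for illustration; 42 matches the tutorial
for seed in [0, 1, 2, 3, 42]:
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.3, random_state=seed
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    acc = model.score(X_test, y_test)
    accuracies[seed] = acc
    # 45 test samples, so accuracy = (number correct) / 45
    print(f"random_state={seed}: {acc:.3f} ({round(acc * 45)}/45 correct)")
```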
In conclusion, scikit-learn provides a consistent, easy-to-use API for classification on the Iris dataset and similar datasets. In a few lines of code, you can load the data, split it, train a model, and measure how well that model predicts the class label for a new set of input features.
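The steps above can be collected into one self-contained script that also predicts the species of new measurements. The two flowers in `new_flowers` are hypothetical inputs chosen for illustration. One has small petals and one has large petals. `max_iter=1000` is again our own choice.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load and split the data as in the tutorial
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Train the classifier (max_iter raised as an assumption, for convergence)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")

# Hypothetical new flowers: [sepal length, sepal width, petal length, petal width] in cm
new_flowers = [
    [5.0, 3.4, 1.5, 0.2],  # small petals
    [6.7, 3.0, 5.2, 2.3],  # large petals
]

# predict() returns integer labels; map them back to species names
predicted = model.predict(new_flowers)
predicted_names = [str(iris.target_names[i]) for i in predicted]
print(predicted_names)
```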