Visual programming with Orange Tool

Anjali Bhavsar
Sep 13, 2021 · 3 min read

In this blog, I will show how to split data into training and testing sets in Orange and how to use cross-validation.

Creating the workflow

First, add the File widget to the canvas and load the built-in titanic dataset into the workflow.
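
If you prefer scripting over the canvas, the same dataset can be loaded through Orange's Python API. This is a minimal sketch and assumes the orange3 package is installed:

```python
import Orange

# Load the built-in Titanic dataset that ships with Orange
data = Orange.data.Table("titanic")

print(len(data))    # number of passengers (instances)
print(data.domain)  # attributes and the class variable
```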

Next, send the input data to the Data Sampler widget. Data Sampler selects a subset of data instances from an input data set. It outputs a sampled and a complementary data set (with instances from the input set that are not included in the sampled data set). The output is processed after the input data set is provided and Sample Data is pressed. Here I set the sampling ratio to 70%, so 70% of the instances go to the sampled data output and the remaining 30% form the complementary data set.

Data Sampler
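
A rough scripting equivalent of this 70/30 split, sketched with a plain random shuffle rather than the widget itself (the 0.7 ratio and the random seed are just illustrative):

```python
import numpy as np
import Orange

data = Orange.data.Table("titanic")

# Shuffle instance indices and split them 70/30, mimicking the
# "fixed proportion of data" option of the Data Sampler widget
rng = np.random.default_rng(42)
indices = rng.permutation(len(data))
cut = int(0.7 * len(data))

sample = data[indices[:cut]]       # 70% -> "Data Sample" output
remaining = data[indices[cut:]]    # 30% -> "Remaining Data" output

print(len(sample), len(remaining))
```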

Now send the sampled data from Data Sampler to the Test and Score widget. The widget tests learning algorithms. Different sampling schemes are available, including using separate test data. The widget does two things. First, it shows a table with different classifier performance measures, such as classification accuracy and area under the curve. Second, it outputs evaluation results, which can be used by other widgets for analyzing the performance of classifiers, such as ROC Analysis or Confusion Matrix.

Three learner widgets, namely Neural Network, Naive Bayes, and Logistic Regression, are then connected to Test and Score, which trains each of them on the sampled data and evaluates them.

Workflow
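
In a script, the three learners plugged into Test and Score would be constructed roughly like this. This is a sketch; the exact class name of the neural network learner (NNClassificationLearner below) is an assumption and may differ between Orange versions:

```python
from Orange.classification import (
    NaiveBayesLearner,
    LogisticRegressionLearner,
    NNClassificationLearner,  # assumed name of the Neural Network learner
)

# One learner object per learner widget on the canvas
learners = [
    NNClassificationLearner(),
    NaiveBayesLearner(),
    LogisticRegressionLearner(),
]
```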

Sampling using Cross Validation in Orange

The module Orange.evaluation.testing contains methods for cross-validation, leave-one-out, random sampling, and learning curves. These procedures split the data into training and testing sets and use the training data to induce models; the models then make predictions for the testing data. Predictions are collected in a results object, together with the actual classes and some other data. The results can then be passed to scoring functions that compute the performance scores of the models.

Cross validation of data
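
Below is a minimal cross-validation sketch using the scripting API; it mirrors what Test and Score does with its cross-validation option. The exact calling convention of CrossValidation has changed across Orange releases, so treat this as an outline rather than the definitive call:

```python
import Orange
from Orange.classification import NaiveBayesLearner, LogisticRegressionLearner
from Orange.evaluation import CrossValidation, CA, AUC

data = Orange.data.Table("titanic")
learners = [NaiveBayesLearner(), LogisticRegressionLearner()]

# 10-fold cross-validation; recent Orange versions instantiate the sampler
# first and then call it with the data and the learners
cv = CrossValidation(k=10)
results = cv(data, learners)

# Scoring functions turn the collected predictions into performance measures
for learner, ca, auc in zip(learners, CA(results), AUC(results)):
    print(f"{learner.name}: CA={ca:.3f}  AUC={auc:.3f}")
```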

Adding test data

After splitting the data into training and test sets, we send the 70% sampled data from Data Sampler as the training data and the remaining 30% as the test data. Click on the link between Data Sampler and Test and Score, and in the dialog connect the Data Sample box to the Data box and the Remaining Data box to the Test Data box, as shown in the figure below.

Now, as you can see in the image below, there are two connections from Data Sampler to Test and Score.

Final workflow
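
In scripting terms, this setup corresponds to training on one table and evaluating on a separate test table. A sketch using TestOnTestData follows; as with CrossValidation, its calling convention varies slightly between Orange versions:

```python
import numpy as np
import Orange
from Orange.classification import NaiveBayesLearner, LogisticRegressionLearner
from Orange.evaluation import TestOnTestData, CA

data = Orange.data.Table("titanic")

# Recreate the 70/30 split produced by the Data Sampler widget
rng = np.random.default_rng(42)
indices = rng.permutation(len(data))
cut = int(0.7 * len(data))
train, test = data[indices[:cut]], data[indices[cut:]]

learners = [NaiveBayesLearner(), LogisticRegressionLearner()]

# Train on the 70% sample and evaluate on the remaining 30%
results = TestOnTestData()(data=train, test_data=test, learners=learners)
print(CA(results))  # one classification accuracy per learner
```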

Now compare the results on the training data with the results on the test data.

Test on train data
Test on test data

In this blog, I explored how to sample data and compare different learning algorithms in Orange to find out which one works best for our data set.

Thank You!!
