Inspect Datasets

This section covers the different ways you can inspect the contents of a dataset.

We start by loading a well-known dataset, the Iris dataset, using the load_test_data() function.

import shapelets as sh

session = sh.sandbox()
data = session.load_test_data()
Note: Remember to create a Shapelets session before working with the API.

Contents of a Dataset

If you have a Shapelets Dataset, you can inspect its contents using:

  • head(n=5) function to see the first n rows.
data.head()

  • tail(n=5) function to see the last n rows. Note that this method causes the dataset to materialize, so it can take extra time to run on large datasets.
data.tail()
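
The cost difference between head() and tail() can be illustrated with a plain-Python sketch. This is not how Shapelets is implemented; CountingSource, first_rows and last_rows are hypothetical names introduced here, and the only assumption is that rows are produced lazily, one at a time. Reading the first rows can stop early, while reading the last rows has to consume the whole stream.

```python
from collections import deque
from itertools import islice


class CountingSource:
    """Hypothetical lazy row source that counts how many rows it has produced."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.produced = 0

    def __iter__(self):
        for i in range(self.n_rows):
            self.produced += 1
            yield {"row_id": i}


def first_rows(source, n=5):
    # Stops pulling rows as soon as n have been collected.
    return list(islice(source, n))


def last_rows(source, n=5):
    # Has to consume every row; the bounded deque keeps only the final n.
    return list(deque(source, maxlen=n))


head_source = CountingSource(1000)
head = first_rows(head_source)

tail_source = CountingSource(1000)
tail = last_rows(tail_source)

# Compare how many rows each approach had to produce.
print(head_source.produced, tail_source.produced)
```

In this sketch the head-style read touches only as many rows as requested, whereas the tail-style read touches all of them, which is the kind of extra work the warning about materialization refers to.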

To get a sample of a dataset, call sample():

data.sample()

Dataset Description

To find the number of rows in a dataset, use Python's built-in len() function:

len(data)

To access the names of the columns in a dataset, use its columns attribute:

data.columns

To get the shape of a dataset, use its shape attribute:

data.shape
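
How the row count, the column names and the shape relate to one another can be sketched with a small in-memory table. This stand-in is a plain list of dictionaries, not a Shapelets object, and the column names and values are illustrative only. It assumes the common convention that the shape is a (rows, columns) pair; check the Shapelets output to confirm that data.shape follows it.

```python
# Hypothetical stand-in table: each dictionary is one row.
rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "species": "setosa"},
    {"sepal_length": 6.3, "sepal_width": 3.3, "species": "virginica"},
    {"sepal_length": 5.8, "sepal_width": 2.7, "species": "versicolor"},
]

# Column names are the keys shared by every row.
column_names = list(rows[0].keys())

# Number of rows, as len() would report for this list.
n_rows = len(rows)

# Shape under the (rows, columns) convention.
table_shape = (n_rows, len(column_names))

print(column_names)
print(table_shape)
```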

You can check the structure of the dataset, including its columns and data types, by evaluating the dataset object itself, for example in an interactive session:

data

Dataset Summaries

To get a statistical summary of the dataset, call describe():

data.describe()
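
For readers unfamiliar with statistical summaries, the sketch below computes the kind of per-column figures such a summary typically contains, using only the Python standard library. Which statistics describe() actually reports in Shapelets is not stated here, so the selection below (count, mean, standard deviation, minimum, maximum) is an assumption, and summarize is a hypothetical helper. The values are illustrative and not taken from the test data.

```python
import statistics

# Illustrative numeric columns; not taken from the Shapelets test data.
numeric_columns = {
    "sepal_length": [5.1, 4.9, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 6.0, 5.1],
}


def summarize(values):
    """Return basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# One summary dictionary per column.
summary = {name: summarize(values) for name, values in numeric_columns.items()}

for name, stats in summary.items():
    print(name, stats)
```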