# **Data**

## **Description for the data**

The available data consist of 5 signals (accelerometer z and x&y, and a 3-axis gyroscope) recorded on 8 individuals at a sampling frequency of 16 Hz, which means that each signal was measured 16 times per second.

For example, if the signals were recorded for 15 minutes for a given individual, a total of (15 * 60 seconds) * 16 = 14400 measurements are taken, so we have a vector of 14400 observations (data points) per signal for that individual.
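The arithmetic above can be sketched as a small helper. The function name `n_samples` and its parameter names are illustrative only and do not come from the original notebook.

```python
def n_samples(duration_min: float, sampling_freq_hz: int = 16) -> int:
    """Number of measurements per signal for a recording of the given length."""
    # minutes -> seconds -> measurements
    return round(duration_min * 60 * sampling_freq_hz)


print(n_samples(15))  # 14400, the 15-minute example above
print(n_samples(20))  # 19200 points for a 20-minute recording
```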

The recording time for each person was less than 20 minutes.

Along with the signals, the activity carried out by each individual at each sampled instant was also recorded.

- **Predictors**: the signals.
- **Response**: the activities.

In addition, we have the signals for 2 more individuals whose activity was not recorded. One of the tasks will be to recognize what kind of activity those people were doing in each second of the recorded time. These data play the role of new data to be predicted.

## **Understanding the data**

The data are stored in a MATLAB `.mat` file. We can load it using `loadmat` from `scipy.io`.

In [6]:
# Import the MATLAB file reader from SciPy
from scipy.io import loadmat
# Load the .mat file
HAR_database = loadmat(r'C:\Users\fscielzo\Documents\DataScience-GitHub\Human-Activity-Recognition\HAR_database\HAR_database.mat')

`HAR_database` is a dictionary that looks like this:

In [536]:
HAR_database

{'__header__': b'MATLAB 5.0 MAT-file, Platform: MACI64, Created on: Tue May 16 10:33:21 2017',
 '__version__': '1.0',
 '__globals__': [],
 'database_training': array([[array([[ 0.02417325,  0.01990478,  0.03474859, ...,  0.15302195,
                  0.16644707,  0.13097667],
                [ 0.59441693,  0.60247182,  0.52582067, ...,  1.9454901 ,
                  2.00124793,  1.9890223 ],
                [-0.02273627, -0.01287489, -0.0200163 , ...,  0.00468276,
                 -0.00249675, -0.00409452],
                [ 0.11196179,  0.10379559,  0.10319319, ...,  0.11359097,
                  0.10181863,  0.10699662],
                [ 0.06049883,  0.05515676,  0.05754055, ...,  0.04945955,
                  0.06203927,  0.06837131]])                             ,
         array([[3, 3, 3, ..., 3, 3, 3]], dtype=uint8)],
        [array([[-0.00614456,  0.09416632,  0.02556427, ...,  0.25846636,
                  0.26831659,  0.20932986],
                [ 0.15163056,  0.27388097,  0

The keys of `HAR_database` are the following:

In [537]:
HAR_database.keys()

dict_keys(['__header__', '__version__', '__globals__', 'database_training', 'database_test'])
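The keys wrapped in double underscores are file metadata, while the remaining ones hold the actual data. A minimal sketch of separating the two is shown here on a stand-in dictionary, so it runs without the `.mat` file. The names `toy_database` and `data_keys` are hypothetical, and the placeholder values are not the real contents.

```python
# Stand-in dictionary with the same keys as the output above.
# The values are placeholders, not the real file contents.
toy_database = {
    '__header__': b'placeholder header',
    '__version__': '1.0',
    '__globals__': [],
    'database_training': None,
    'database_test': None,
}


def data_keys(mat_dict):
    """Return the keys that hold data, skipping the '__...__' metadata keys."""
    return [key for key in mat_dict if not key.startswith('__')]


print(data_keys(toy_database))  # ['database_training', 'database_test']
```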

The data available for building the human activity recognition system are stored in `database_training`, which is an array of arrays.

It has 8 rows (one per individual) and 2 columns: the first column holds the array with the signals data, the second the array with the activity data.

In [538]:
HAR_database['database_training']

array([[array([[ 0.02417325,  0.01990478,  0.03474859, ...,  0.15302195,
                 0.16644707,  0.13097667],
               [ 0.59441693,  0.60247182,  0.52582067, ...,  1.9454901 ,
                 2.00124793,  1.9890223 ],
               [-0.02273627, -0.01287489, -0.0200163 , ...,  0.00468276,
                -0.00249675, -0.00409452],
               [ 0.11196179,  0.10379559,  0.10319319, ...,  0.11359097,
                 0.10181863,  0.10699662],
               [ 0.06049883,  0.05515676,  0.05754055, ...,  0.04945955,
                 0.06203927,  0.06837131]])                             ,
        array([[3, 3, 3, ..., 3, 3, 3]], dtype=uint8)],
       [array([[-0.00614456,  0.09416632,  0.02556427, ...,  0.25846636,
                 0.26831659,  0.20932986],
               [ 0.15163056,  0.27388097,  0.24967994, ...,  1.01129501,
                 1.00222655,  0.94156865],
               [-0.02664025, -0.02277741, -0.00710769, ..., -0.02000222,
                -0.0194218 ,

In [539]:
HAR_database['database_training'].shape

(8, 2)

In [540]:
HAR_database['database_training'][0]

array([array([[ 0.02417325,  0.01990478,  0.03474859, ...,  0.15302195,
                0.16644707,  0.13097667],
              [ 0.59441693,  0.60247182,  0.52582067, ...,  1.9454901 ,
                2.00124793,  1.9890223 ],
              [-0.02273627, -0.01287489, -0.0200163 , ...,  0.00468276,
               -0.00249675, -0.00409452],
              [ 0.11196179,  0.10379559,  0.10319319, ...,  0.11359097,
                0.10181863,  0.10699662],
              [ 0.06049883,  0.05515676,  0.05754055, ...,  0.04945955,
                0.06203927,  0.06837131]])                             ,
       array([[3, 3, 3, ..., 3, 3, 3]], dtype=uint8)], dtype=object)
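To check this structure for every individual at once, one option is to loop over the rows and unpack each pair. The sketch below builds a small synthetic object array with the same layout so that it runs without the `.mat` file. The names `toy`, `lengths`, and `summarize`, the random values, and the label range 1 to 4 are all assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-in with the same layout as `database_training`:
# one row per individual, column 0 = signals (5, n), column 1 = labels (1, n).
rng = np.random.default_rng(0)
lengths = [160, 320, 240]  # hypothetical recording lengths, in data points
toy = np.empty((len(lengths), 2), dtype=object)
for i, n in enumerate(lengths):
    toy[i, 0] = rng.normal(size=(5, n))
    toy[i, 1] = rng.integers(1, 5, size=(1, n), dtype=np.uint8)


def summarize(database, sampling_freq=16):
    """Return (n_signals, n_points, minutes) for each individual."""
    rows = []
    for signals, labels in database:
        n_signals, n_points = signals.shape
        # Each data point must have exactly one activity label.
        assert labels.shape == (1, n_points)
        rows.append((n_signals, n_points, n_points / sampling_freq / 60))
    return rows


for n_signals, n_points, minutes in summarize(toy):
    print(n_signals, n_points, round(minutes, 3))
```

With the real data, the same function could be called as `summarize(HAR_database['database_training'])`, assuming every individual follows the layout shown for the first one.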

For example, the array with the signals data for the first individual is the following:


In [541]:
HAR_database['database_training'][0][0]

array([[ 0.02417325,  0.01990478,  0.03474859, ...,  0.15302195,
         0.16644707,  0.13097667],
       [ 0.59441693,  0.60247182,  0.52582067, ...,  1.9454901 ,
         2.00124793,  1.9890223 ],
       [-0.02273627, -0.01287489, -0.0200163 , ...,  0.00468276,
        -0.00249675, -0.00409452],
       [ 0.11196179,  0.10379559,  0.10319319, ...,  0.11359097,
         0.10181863,  0.10699662],
       [ 0.06049883,  0.05515676,  0.05754055, ...,  0.04945955,
         0.06203927,  0.06837131]])

There are 5 signals (rows) and 17736 measurements (columns) for each signal.

In [542]:
HAR_database['database_training'][0][0].shape

(5, 17736)

Since the sampling frequency is 16 Hz, that is, 16 measurements per second, the signals of the first individual were recorded for 17736 / 16 = 1108.5 seconds, which is about 18.48 minutes.

In [543]:
sampling_freq = 16
(HAR_database['database_training'][0][0].shape[1] / sampling_freq) / 60

18.475

The array with the activities of the first individual is the following.

In [544]:
HAR_database['database_training'][0][1]

array([[3, 3, 3, ..., 3, 3, 3]], dtype=uint8)

It is stored as a 2D array with a single row, which is effectively a vector of 17736 components, each one an activity measurement.

In [545]:
HAR_database['database_training'][0][1].shape

(1, 17736)

The i-th component of that vector indicates the activity that the first individual was doing at that specific fraction of time within those 18.48 minutes. Since each point represents a tiny fraction of time (1/16 = 0.0625 seconds), it is reasonable to find the same activity repeated many times in a row. For example, if the individual was standing during the first 50 seconds of the recording, the vector would have a 3 (the label for standing) in its first 50 * 16 = 800 components. In other words, the first 800 activity measurements would correspond to standing.
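These runs of repeated labels can be collapsed into segments with a start position and a duration. The following is a minimal sketch on a toy label vector, not on the real data. The function name `activity_segments` is hypothetical, and label 1 is an arbitrary placeholder for some other activity; only label 3 (standing) comes from the text above.

```python
import numpy as np


def activity_segments(labels, sampling_freq=16):
    """Collapse a non-empty label vector into (label, start_index, seconds) runs."""
    labels = np.asarray(labels).ravel()  # accepts shape (1, n) or (n,)
    # Positions where the label differs from the previous data point.
    changes = np.flatnonzero(labels[1:] != labels[:-1]) + 1
    starts = np.concatenate(([0], changes))
    ends = np.concatenate((changes, [labels.size]))
    return [
        (int(labels[s]), int(s), float((e - s) / sampling_freq))
        for s, e in zip(starts, ends)
    ]


# Toy vector: 50 s of standing (3), 10 s of a placeholder activity (1), 2 s of standing.
toy_labels = np.array([[3] * 800 + [1] * 160 + [3] * 32], dtype=np.uint8)
print(activity_segments(toy_labels))
# [(3, 0, 50.0), (1, 800, 10.0), (3, 960, 2.0)]
```

The comparison `labels[1:] != labels[:-1]` is used instead of `np.diff` because the labels are `uint8`, and subtracting unsigned integers wraps around.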