"This is the basic environment. stumpy is the main library we're working with; numpy and pandas are, for all intents and purposes, required to work with datasets; and matplotlib gives us nice output."
]
},
{
"cell_type": "code",
"execution_count": 1,
@@ -16,6 +24,14 @@
"import datetime as dt"
]
},
{
"cell_type": "markdown",
"id": "a42ce3b2",
"metadata": {},
"source": [
"Basic utilities used to output the time it takes to compute a matrix profile over a given number of windows."
]
},
{
"cell_type": "code",
"execution_count": 2,
@@ -47,6 +63,14 @@
" return result"
]
},
{
"cell_type": "markdown",
"id": "e9e52f62",
"metadata": {},
"source": [
"More utilities to encapsulate the process of plotting our data."
]
},
{
"cell_type": "code",
"execution_count": 3,
@@ -74,6 +98,14 @@
" plot_.plot(profile[:, 0])"
]
},
{
"cell_type": "markdown",
"id": "07c68fbe",
"metadata": {},
"source": [
"Load the data!"
]
},
{
"cell_type": "code",
"execution_count": 5,
@@ -167,6 +199,14 @@
"steam_data[:4]"
]
},
{
"cell_type": "markdown",
"id": "347f81d3",
"metadata": {},
"source": [
"Automates the process of marking a motif or discord. Extracting a discord (a radically different sample) from a matrix profile is the same process as extracting a motif, but with the matrix profile in reverse order, which lets us generalize both into the threshold_extraction function. This function can also mark every repeat in a motif group, rather than simply taking the first two values."
"The graphs show our dataset on top and the matrix profile on the bottom. The matrix profile plots each sliding window's distance to its nearest neighbor. The low points are places where the data is most similar to some other sample, and the high points represent radically different patterns that appear nowhere else in the data. A distance profile, by contrast, holds the distances from one window to every other window; the matrix profile keeps only the smallest of those distances for each window."