More human readable output

master
Brady McDonough 3 years ago
parent 5cab0ea229
commit eef2489cc8

@ -1,9 +1,12 @@
 # 1 - STUMPY Basics
-Above what was seen in the tutorial I started on a general purpose motif
-function which takes the dataset and a computed matrix profile and returns the
-top motif it discovered and all locations where that motif appears. It does not
-return overlapping motifs.
+Above what was seen in the tutorial I started on a generalized motif and discord
+extraction procedure. Since these are both really the same process over different
+sort orders of the matrix profile we end up with a very clean functional setup
+with a single function doing all the work and two wrappers 'deciding' which
+calculation should be done. This allowed me to generalize marking discovered
+discord and motif windows on my output plots, a process which should remain more
+or less evergreen.
 ## TODO
 I remember reading that the distances returned by a matrix profile have an upper
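
The "single function plus two wrappers" setup described in this hunk could look roughly like the sketch below. It is not the code from this commit: the names `extract_windows`, `top_motifs` and `top_discords` are hypothetical, and it selects a fixed `count` of windows rather than applying the thresholds the notebook passes in. It only illustrates that motif and discord extraction differ in the sort order of the matrix profile.

```python
from typing import List, Sequence


def extract_windows(profile: Sequence[float], window: int, count: int,
                    largest_first: bool) -> List[int]:
    """Pick up to `count` non-overlapping window starts from a matrix profile.

    Hypothetical helper: the only difference between motif and discord
    extraction here is the order in which the profile is walked.
    """
    order = sorted(range(len(profile)), key=lambda i: profile[i],
                   reverse=largest_first)
    picked: List[int] = []
    for start in order:
        if len(picked) == count:
            break
        # Skip candidates whose window would overlap one already picked.
        if all(abs(start - p) >= window for p in picked):
            picked.append(start)
    return picked


def top_motifs(profile: Sequence[float], window: int, count: int) -> List[int]:
    # Motifs: smallest nearest-neighbor distances first.
    return extract_windows(profile, window, count, largest_first=False)


def top_discords(profile: Sequence[float], window: int, count: int) -> List[int]:
    # Discords: largest nearest-neighbor distances first.
    return extract_windows(profile, window, count, largest_first=True)


# Made-up profile values, purely for illustration.
profile = [0.9, 0.2, 0.3, 2.5, 2.4, 0.1, 0.8, 3.0]
print(top_motifs(profile, window=2, count=2))
print(top_discords(profile, window=2, count=2))
```

Because both wrappers return plain lists of window start indices, a single plotting helper can mark either kind of result, which is the reuse the hunk describes.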

File diff suppressed because one or more lines are too long

@ -1,5 +1,13 @@
{ {
"cells": [ "cells": [
{
"cell_type": "markdown",
"id": "cc72af0d",
"metadata": {},
"source": [
"This is the basic environment. stumpy is the main library we're working with; numpy and pandas are, for all intents and purposes, required to work with datasets; and matplotlib gives us nice output."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 1, "execution_count": 1,
@ -16,6 +24,14 @@
"import datetime as dt" "import datetime as dt"
] ]
}, },
{
"cell_type": "markdown",
"id": "a42ce3b2",
"metadata": {},
"source": [
"Basic utilities used to output the time it takes to compute a matrix profile over a given number of windows."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 2, "execution_count": 2,
@ -47,6 +63,14 @@
" return result" " return result"
] ]
}, },
{
"cell_type": "markdown",
"id": "e9e52f62",
"metadata": {},
"source": [
"More utilities to encapsulate the process of plotting our data."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 3, "execution_count": 3,
@ -74,6 +98,14 @@
" plot_.plot(profile[:, 0])" " plot_.plot(profile[:, 0])"
] ]
}, },
{
"cell_type": "markdown",
"id": "07c68fbe",
"metadata": {},
"source": [
"Load the data!"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 5, "execution_count": 5,
@ -167,6 +199,14 @@
"steam_data[:4]" "steam_data[:4]"
] ]
}, },
{
"cell_type": "markdown",
"id": "347f81d3",
"metadata": {},
"source": [
"Automates the process of marking a motif or discord. Extracting a discord (a radically different sample) from a matrix profile is the same process as extracting a motif, but with the matrix profile in reverse order, which lets us generalize both into the threshold_extraction function. This function can also mark every repeat in a motif group, rather than only the first two values."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 7, "execution_count": 7,
@ -216,6 +256,14 @@
"\n" "\n"
] ]
}, },
{
"cell_type": "markdown",
"id": "07e3ce74",
"metadata": {},
"source": [
"Now we can actually do our calculations..."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 8, "execution_count": 8,
@ -240,6 +288,14 @@
"discord_list = get_discords(matrix_profile, motif_order, window, 0.005)" "discord_list = get_discords(matrix_profile, motif_order, window, 0.005)"
] ]
}, },
{
"cell_type": "markdown",
"id": "9e4612eb",
"metadata": {},
"source": [
"... and report what we've discovered."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 9, "execution_count": 9,
@ -289,6 +345,14 @@
"plot_matrix_profile(axis[1], matrix_profile)\n", "plot_matrix_profile(axis[1], matrix_profile)\n",
"mark_discovered(discord_list, axis[0], axis[1], window, max(steam_data['steam flow']))" "mark_discovered(discord_list, axis[0], axis[1], window, max(steam_data['steam flow']))"
] ]
},
{
"cell_type": "markdown",
"id": "432d3659",
"metadata": {},
"source": [
"The graphs show our dataset on top and the matrix profile on the bottom. The matrix profile plots each sliding window's distance to its nearest neighbor elsewhere in the data. The low points are places where the data is most similar to some other sample (motifs), and the high points are the patterns least like anything else in the data (discords). Sorting the matrix profile from lowest distance to greatest gives the order used to extract motifs; the reverse order gives discords."
]
} }
], ],
"metadata": { "metadata": {
@ -307,7 +371,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.10.9" "version": "3.10.10"
} }
}, },
"nbformat": 4, "nbformat": 4,

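
The notebook's closing cell describes a plot of each sliding window's distance to its nearest neighbor. As a rough illustration of that quantity, here is a brute-force, pure-Python sketch. It is not how STUMPY computes it, and the function names and the exclusion-zone width of `ceil(window / 4)` are assumptions made for this example.

```python
import math
from typing import List, Sequence


def znorm(values: Sequence[float]) -> List[float]:
    """Z-normalize one window; a constant window maps to all zeros."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def naive_matrix_profile(series: Sequence[float], window: int) -> List[float]:
    """For each window, the distance to its nearest non-trivial neighbor.

    O(n^2 * window) brute force, meant for reading rather than for speed.
    """
    n = len(series) - window + 1
    subs = [znorm(series[i:i + window]) for i in range(n)]
    # Assumed exclusion zone: ignore neighbors that overlap the window
    # too closely, otherwise every window trivially matches itself.
    exclusion = math.ceil(window / 4)
    profile = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if abs(i - j) <= exclusion:
                continue
            best = min(best, math.dist(subs[i], subs[j]))
        profile.append(best)
    return profile


# Made-up series: the shape "low, high, low" repeats, so those windows
# should have a nearest-neighbor distance of (almost) zero.
series = [0, 1, 0, 5, 0, 1, 0, 9, 3]
profile = naive_matrix_profile(series, window=3)
print([round(d, 3) for d in profile])
```

Low values in the result mark windows that repeat somewhere else in the series, and high values mark windows with no close match, which is how the plot in the notebook is meant to be read.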
@ -11,17 +11,12 @@ copied in as I follow along, I have tried to mark as clearly as possible where I
 extrapolations. All data related to the tutorial is also mirrored in a data directory for each entry.
 ## 1 - STUMPY Basics
-Above what was seen in the tutorial I started on a general purpose motif function which takes the dataset
-and a computed matrix profile and returns possibly multiple motif groups. It takes 2 threshold parameters,
-one for the absolute value of the matrix profile at the given point and the other for a percentage of the
-maximum data magnitude.
-### TODO
-I remember reading somewhere about an upper bound on matrix profile values. I should find that again
-and calculate a percentage of the upper bound rather than having mp_thresh be an absolute value.
-The Motifs class and function should be broken off into their own module for re-use elsewhere, including
-Jupyter.
+[Human readable walkthrough.][3]
+[Python code][4]
 [1]: https://www.cs.ucr.edu/%7Eeamonn/MatrixProfile.html "Resources and papers on the Matrix Profile"
 [2]: https://stumpy.readthedocs.io/en/latest/tutorials.html "stumpy tutorial"
+[3]: ./1-STUMPY-basics/Steamgen\ dataset.html
+[4]: ./1-STUMPY-basics/portable.py