Could somebody elaborate on "dimensional space" and "hyperplane"?
I am reading a text related to SVM, and the mathematical language is giving me a bit of a hard time.
Here training vectors $x_i$ are mapped into a higher (maybe infinite) dimensional space by the function $\theta$. SVM finds a linear separating hyperplane with the maximal margin in this higher dimensional space.
I do not understand the term "dimensional space" in this case. Drawing on paper is 2D. We are living in 3D space. In mathematics, when we say "higher dimensional space", what do we actually mean?
Another term, "hyperplane", is also giving me a hard time. Is it simply a 2D plane? When I search for its definition, I usually get a term that leads to many more (and more confusing) terms. Frankly, mathematical language is difficult for me.
Could somebody explain "dimension" and "hyperplane" in the text above in a way that is easier to understand?
Thank you very much.
The way I read the quote, "higher (maybe infinite) dimensional space" should be read as "higher-dimensional (maybe infinite-dimensional) space" meaning a space with more dimensions than the $x_i$ had originally (mapping them into a higher-dimensional space).
A hyperplane in $n$-dimensional space is an $(n-1)$-dimensional object that can be described by $\vec{n}\cdot\vec{x}=k$ where $\vec{n}$ is a constant vector orthogonal to the hyperplane, $\vec{x}$ is a variable vector from the origin to a point on the plane, and $k$ is some scalar constant. A hyperplane in 2-space is a line; a hyperplane in 3-space is a plane. (edit: Matt E.'s comment that a hyperplane is a subspace of dimension one less than the whole space is a much nicer definition than mine.)
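To make the equation $\vec{n}\cdot\vec{x}=k$ concrete, here is a minimal Python sketch. The function names and the sample numbers are my own illustrative assumptions, not part of the quoted text. It evaluates $\vec{n}\cdot\vec{x}-k$ for a point: zero means the point lies on the hyperplane, and the sign tells you which side of the hyperplane the point is on. The same code works in any number of dimensions.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def side_of_hyperplane(normal, k, point):
    """Return n.x - k: zero on the hyperplane, positive or negative off it."""
    return dot(normal, point) - k


# In 2-space the hyperplane x + y = 1 is a line.
line_normal, line_k = (1, 1), 1
on_line = side_of_hyperplane(line_normal, line_k, (0.5, 0.5))    # 0: on the line
above_line = side_of_hyperplane(line_normal, line_k, (2, 2))     # 3: positive side

# In 4-space the hyperplane x1 + x2 + x3 + x4 = 2 is a 3-dimensional object.
hyper_normal, hyper_k = (1, 1, 1, 1), 2
below_hyper = side_of_hyperplane(hyper_normal, hyper_k, (0, 0, 0, 0))  # -2: negative side
```

This sign test is the sense in which a hyperplane "separates" points: everything with a positive value is on one side and everything with a negative value is on the other.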
Karl, higher dimensions are defined so as to be analogous to 2-D and 3-D. This means analytical definitions of orthogonality, distance, and so on must be preserved.
To "visualise" 4-D and up, you can start with visualisation exercises on 4-cubes, or think about arrays in programming: `array[i][j][k][l]` is a four-dimensional array, for example. If you took all of the possible combinations of 0 and 1 for those four indices (for instance `[0,1,0,0]`, `[1,1,0,0]`, `[0,1,1,1]` and so on) you would have a data structure that's equivalent to the corners of a 4-cube or tesseract. You can generate these in R, for example with the `expand.grid()` function.
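For readers who prefer Python, here is a minimal sketch of that enumeration using the standard-library `itertools.product` (the variable names are illustrative). It lists every 4-tuple of 0s and 1s, which are the corners of the 4-cube, and then counts the edges by pairing corners that differ in exactly one coordinate.

```python
from itertools import combinations, product

# All 4-tuples of 0s and 1s: the 16 corners of a 4-cube (tesseract).
vertices = list(product([0, 1], repeat=4))

# Two corners share an edge when they differ in exactly one coordinate.
edges = [
    (a, b)
    for a, b in combinations(vertices, 2)
    if sum(x != y for x, y in zip(a, b)) == 1
]

print(len(vertices))  # 16
print(len(edges))     # 32
print(vertices[:3])   # [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0)]
```

Changing `repeat=4` to `repeat=3` gives the 8 corners and 12 edges of an ordinary cube, which is a useful sanity check that the 4-D version is the same construction with one more coordinate.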
Here's another example: imagine 4 stock prices progressing in parallel, second by second. Each second gives you one 4-D vector (one price per stock). Record all the seconds for a single trading day and you have a day-long sequence of 4-D vectors. That's four-dimensional data. Had I looked at 500 stock prices progressing in parallel, that would be 500-dimensional data.
The `Titanic` data set in R is another accessible example of 4-D data (age, sex, class, survival).
As for hyperplanes, think about this:
- a line splits a circle
- a plane splits a sphere
- a hyperplane splits a ... whatever comes next.
Squiggly or blobby higher-dimensional shapes are called "manifolds", by the way.
Below are examples of functions that map data to higher dimensions.
- $f(x,y) = (x, x, x, y, y)$. $f$ is like copying and pasting whole columns in a table or array. $f$ maps 2 dimensions to 5.
- $g(x,y) = (x, \ x+1, \ x^2, \ x \cdot y, \ y^3 + y^5 - y^4 + 55)$. $g$ maps 2 dimensions to 5.
- $h(a, b, c, d, e) = (a + b + c + d + e, \ a \cdot b \cdot c \cdot d \cdot e)$. $h$ maps 5 dimensions to 2.
- Here is another function $j$ that maps 5 dimensions to 2: $j(a, b, c, d, e) = (a, e)$. $j$ is like erasing the middle columns of your data. In math books this may be called projection.
- Here is one, $\theta$, that's used in a common SVM example: separating an inner ring from an outer ring. $\theta(x,y) = (x^2, \ \sqrt{2} \cdot x \cdot y, \ y^2)$ (that's mapping how many to how many dimensions?)
- Here is one more function just to demonstrate that letters, numbers, or Greek letters can be used: $\gamma_{\eta}(a, b, c, d, e) = (0, 5)$. This one has a complicated name but it maps any 5-dimensional input to the same 2-D point, which is a "trivial" thing to do.
As you can see, to figure out how many dimensions you've got, you just count the commas inside the parentheses and add one.
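The ring example can be checked numerically. The sketch below is only an illustration under my own assumptions: the radii 1 and 3, the 12 sample points per ring, and the threshold 4 are made-up values, and the helper names `ring` and `score` are hypothetical. After applying $\theta$, the first and third coordinates add up to $x^2 + y^2$, so a flat hyperplane in the mapped space separates the two rings even though no straight line separates them in the original 2-D plane.

```python
import math


def theta(x, y):
    """Map a 2-D point to (x^2, sqrt(2)*x*y, y^2)."""
    return (x * x, math.sqrt(2) * x * y, y * y)


def ring(radius, count=12):
    """Evenly spaced points on a circle of the given radius (made-up sample data)."""
    return [
        (radius * math.cos(2 * math.pi * i / count),
         radius * math.sin(2 * math.pi * i / count))
        for i in range(count)
    ]


inner = [theta(x, y) for x, y in ring(1.0)]  # inner ring, radius 1
outer = [theta(x, y) for x, y in ring(3.0)]  # outer ring, radius 3


def score(p):
    """Signed value n.p - k for the hyperplane with normal (1, 0, 1) and k = 4."""
    u, v, w = p
    return u + w - 4.0


inner_scores = [score(p) for p in inner]
outer_scores = [score(p) for p in outer]

# Every inner point lands on the negative side, every outer point on the positive side.
separated = all(s < 0 for s in inner_scores) and all(s > 0 for s in outer_scores)
print(separated)

# The length of the output tuple is the number of dimensions theta maps into.
output_dimension = len(theta(1.0, 2.0))
```

An SVM would choose the separating hyperplane itself (and with maximal margin); here it is picked by hand purely to show that one exists after the mapping.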
Hope this helps.