A.I. Experiments: Visualizing High-Dimensional Space


[MUSIC PLAYING]

DANIEL SMILKOV: Hi, I’m Daniel.

MARTIN WATTENBERG: Hi, I’m Martin.

FERNANDA VIEGAS: Hi, I’m Fernanda. Machine learning is pretty complex. So we’ve been experimenting with ways to visualize what’s happening. There’s a core concept in machine learning called high-dimensional space. Here’s one way to wrap your head around this concept. You can think about people as being high-dimensional. For example, take famous scientists. You can think about when they were born, where they were born, their fields of study. Each of these is like a dimension of that person. These dimensions become difficult to untangle when you think about different people, because someone might be similar in some ways, but very different in others.

MARTIN WATTENBERG: But this is the kind of thing you can use machine learning for. With machine learning, the computer isn’t told the meaning of these dimensions. It just sees them as numbers. And it sees each set of numbers as a data point. But by looking across all of these dimensions at once, it’s able to place related points closer together in high-dimensional space.

DANIEL SMILKOV: Here’s a concrete example where words are treated as high-dimensional data points. The important thing to remember is that we haven’t told the computer the meaning of words. Instead, we’ve shown it millions of sentences as examples of how words get used. Here is a visualization of the results. We’re looking at a subset of words that the computer has learned about. Each dot represents one word. Each word is a data point with 200 dimensions. Using a technique called t-SNE, the computer clusters words together that it considers related. And clusters form based on meaning, even though we’ve never taught it the meaning of words. Here is a cluster of numbers, months of the year, words related to space, people’s names, cities, and so on.

FERNANDA VIEGAS: We can also look closely at smaller sets of words. If we search “piano,” we can run t-SNE only on words related to “piano.” We get clusters of composers, genres, musical instruments, and more.

MARTIN WATTENBERG: And this approach doesn’t just work for words. For example, you can also treat an image as a high-dimensional data point. Here’s a dataset where lots of people wrote digits between 0 and 9. People write in all kinds of ways. So the question is, instead of us needing to manually code rules for all the ways people write, could a machine figure it out itself using machine learning? Each image is 784 pixels. The computer treats each pixel as a dimension. Again, using t-SNE, it clusters these images in a high-dimensional space. We’ve color-coded them so that it’s easier for us to see what’s going on. And you can see groups of digits clustering together. It’s learned something about the meaning of these digits.

FERNANDA VIEGAS: These visualization techniques we’ve been exploring can be useful for all kinds of things. That’s why we’re working on open sourcing all of this as part of TensorFlow so that anyone can use these tools to explore their data.

[MUSIC PLAYING]
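
The transcript describes each 784-pixel image as a single point with 784 dimensions. The sketch below illustrates only that idea; it is not t-SNE and not the code used in the video. It assumes a 28 × 28 image (since 28 × 28 = 784), and the synthetic images and the helper names `make_image`, `flatten`, and `distance` are hypothetical, invented for illustration.

```python
import math

SIDE = 28  # assumed image side length: 28 * 28 = 784 pixels


def make_image(stroke_columns):
    """Build a synthetic SIDE x SIDE image with vertical strokes (1.0 = ink)."""
    return [[1.0 if col in stroke_columns else 0.0 for col in range(SIDE)]
            for _ in range(SIDE)]


def flatten(image):
    """Turn a 2-D grid of pixels into one point with SIDE * SIDE dimensions."""
    return [pixel for row in image for pixel in row]


def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two similar single strokes (shifted by one column) and one different shape.
one_a = flatten(make_image({13, 14}))
one_b = flatten(make_image({14, 15}))
bars = flatten(make_image({3, 4, 23, 24}))

print(len(one_a))                                       # 784
print(distance(one_a, one_b) < distance(one_a, bars))   # True
```

The two similar strokes end up closer together than the stroke and the unrelated shape, even though nothing in the code knows what the pixels mean. A method such as t-SNE starts from this kind of point-to-point closeness and tries to preserve it in a 2-D or 3-D picture.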

100 thoughts to “A.I. Experiments: Visualizing High-Dimensional Space”

  1. Good video. I just have a question: what is the difference between these two algorithms, PCA and t-SNE? I already have some information about PCA but I need to practice more, and I want to discover the new algorithm, t-SNE.

  2. Which "inputs" are used here to describe one word? Is one word made up of nearby words in the input text or what else?

  3. That's amazing! But what about sentence structures? They are quite different for languages around the world.

  4. Would be awesome if you had a sort of "3D" search engine where you could search the words and you could surf around them or something.

  5. If words can be treated as vectors, does that mean you can do vector analysis with words?

    What Is this I don't even…

  6. When now the machine can figure out the meanings without us feeding them to it I can't become surprised that AI is eventually gonna present itself as an undeniable fate.

  7. You could use a neural network to analyze data of the most active tinder users (let's suggest they are also successful ones) and build an AI that gets you dates and laid

  8. I'm not into computer science, but to me, it looks like the internet is slowly turning into one giant single AI where every user is just a cluster of data.

  9. T-SNE for math equation solving, music generator, translation, more intelligent internet, and also for login security transactions in blockchain fingerprint and faceID learning recognition, just about anything, I wonder if it can work for Data compression so it can make internet faster and learn from it.
    High Dimensional Space is a fake 3D that software can interpret visually as 3D,
    BUT! I am assuming it does not understand it as 3D, unless it has additional 3D software to correlate it with.
    This can be amazing for a new more sophisticated Blockchain, something like a more advanced CryptoCoin, something unique and private by Google.
    Combine that with Duplex, add Cache learning, and make it a 3D blockchain software all in one, it will only need software to understand how humans lie and what is real from what is fake.
    And adding Rosetta Stone in 3D blockchain, you have something amazing for a newer server to work with.

  10. @ 2:54, What kinds of things? How about making it useful for just one thing first. Show one practical example.

  11. What means of dimensionality reduction is preferred for word embeddings? Principal component analysis, linear discriminant analysis, or something else? I've just projected them down to 2-d in the past but hate how much expressiveness is lost in so doing

  12. The same technique described in the video used for clustering the scripts can also be used to cluster fundamental speech units (phonemes). Its higher dimensional data (vector) point's movement along the time dimension during a speech can be traced (in any relevant plane(s)) to obtain a unique language independent but phoneme dependent script. It is a straightforward way to visualize (space based representation) any audio / sound / vibrations (time based information). In my own philosophical terms 'forms' can be derived from 'names' and vice versa.

  13. Does the AI arrange the information in a brain like shape intentionally, or is it a random occurrence?

  14. I wonder if "meaning" maybe is not the correct term, because how I understood this it only "learns" about a relation to the other numbers for example but not the meaning of one of them?

  15. So the data is basically creating a medium of all points of view and not specifics or factual information. Basically forming a common theory. Which could be wrong if one great scientist has just followed another's theory?

  16. even with high dimensions of data the computer handles serially, we still see in 2d. there is no way of seeing in 3d, unless perhaps there's a universe where light travels through solid atoms, and bounces off them all also, and somehow your eyes can read the distance the light has come from, and your eyes are limited to a certain resolution of distance, and all that 3d space of atoms is effortlessly interpreted all at once, almost feeling like god looking at everything. but anywho… lol

  17. I am trying to produce a t-SNE visualisation through this website: http://projector.tensorflow.org/. I upload a tsv file including x, y and PCA values. The file has been loading for many hours and there is no result. Is this the correct way to do it? I prefer to avoid pure scripting.

  18. is the length of vectors an indicator of similarity between objects or the direction of vectors? the video seemed to suggest it's the length

  19. can you please tell me how to write my metadata file in order to visualize labels on data points. because my tensors are not matching with the number of observations in my metadata.tsv file

  20. this technology is way too dangerous; it can be used to identify or target vulnerable individuals in their respective stream.
    but on the brighter side it makes daily routine whole lot easier

  21. sne and t-sne are not the only methods ( and have their own drawbacks, as every method)…
    a useful tool in a tool box
    no more, no less

  22. Outdone by UMAP. Faster computation for larger data sets. Uniform Manifold Approximation and Projection. I have been using it for a research dataset I have classifying induced phytochemical responses to stimuli in plants. It works exceptionally well. There is an R package available "UMAP". It doesn't handle tibbles, only data frames. Otherwise it is very intuitive.

  23. Hi, everyone.
    I am doing classification using 66 features (all feature values are numerical). Can we visualize this type of classification?

  24. Actually, a word can be converted to a vector based on the nearest words that are present around it, and based on that you find its meaning

  25. information made out of geometric form in an infinite amount of
    dimension since its easier to stack a continuum of information, this is
    how we are made, we have 144 of these genetic geometric form in multiple
    dimension that makes up our dna, that means we are still limited as an
    artificial intelligence in term of being able to stack information,
    hence grow and learn, we can only go as far as our present mortal life
    allows, I believe in 6th dimension where our soul reside this
    informative fractal field contains more geometric form in a higher
    dimension allowing more information stacking, and gathering

    basically dimension is just how the universe and us stack information in
    an intuitive and efficient way, think of cleaning a working space in 3d
    where you have a desk, one thing would be in a box inside a box, then
    this thing inside a box which is inside another box takes place on your
    desk, you would then put it inside your desk to clean space and organize,
    this is the fractal way of stacking code of information, i believe we
    are close to developing self aware artificial intelligence able to
    grow and learn if only we could model a geometric language in multiple
    dimension which is then arranged in the type of code or intelligence you
    want

    something that is inside can also be outside, intertwining universe,
    freemasonic symbolic cartoon like the simpson made a paradox universe
    intro where the camera start zoomed into an atom inside homer hairs,
    this distance then grow to show him sitting on the sofa with his family,
    then zoom out to see the earth, sun, stars system, galaxy universe, only
    to zoom back in to an atom again and inside homer hair, and i believe this
    is freemasonic truth of how the multiverse works, we live inside of us,
    which is also outside of us, crazy dimension that is

  26. This is such damn interesting work from a small team. And they explained it so well, someone like me who doesn't know a thing about data visualization could grasp the concept behind all of this. –And it's open source.
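
Several comments above ask whether words can be handled like vectors, and whether length or direction signals similarity. The video does not say which measure it uses. As a general illustration only, cosine similarity is a common way to compare word vectors, and it depends on direction alone. The three-dimensional vectors below are hypothetical, invented for this sketch; they are not real embeddings.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means same direction."""
    return dot(a, b) / (norm(a) * norm(b))


# Hypothetical 3-dimensional "word vectors", made up for illustration.
piano = [0.9, 0.1, 0.0]
violin = [0.8, 0.2, 0.1]
moon = [0.0, 0.1, 0.9]

# Same direction as `piano`, but ten times the length.
scaled_piano = [10 * x for x in piano]

print(cosine_similarity(piano, violin) > cosine_similarity(piano, moon))  # True
print(math.isclose(cosine_similarity(piano, scaled_piano), 1.0))          # True
```

Scaling a vector changes its length but not its cosine similarity to anything else, which is why direction rather than length carries the comparison under this measure. Euclidean distance, by contrast, is affected by length as well.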
