Deep Learning, NLP, and Representations

July 8, 2014

In the last few years, deep neural networks have dominated pattern recognition. They blew the previous state of the art out of the water for many computer vision tasks. Voice recognition is also moving that way.

But despite the results, we have to wonder… why do they work so well?

This post reviews some extremely remarkable results in applying deep neural networks to natural language processing (NLP). In doing so, I hope to make accessible one promising answer as to why deep neural networks work. I think it’s a very elegant perspective.

Read the full post on my new blog!


Fanfiction, Graphs, and PageRank

July 7, 2014

On one fanfiction website, users write millions of stories about their favorite stories. They have diverse opinions about them. They love some stories, and hate others. The opinions are noisy, and it’s hard to see the big picture.

With tools from mathematics and some helpful software, however, we can visualize the underlying structure.

Graph of Harry Potter Fanfiction, colored by ship


In this post, we will visualize the Harry Potter, Naruto and Twilight fandoms on that site. We will also use Google’s PageRank algorithm to rank stories, and perform collaborative filtering to make story recommendations to top users.
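To give a feel for the ranking step, here is a minimal sketch of PageRank by power iteration. The toy graph, the story names, the meaning of an edge, the damping factor of 0.85 and the iteration count are all assumptions made for illustration; the full post may build and rank its graph differently.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {node: [nodes it links to]}."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts with the "teleport" share of rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                # Split this node's rank evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical toy graph: an edge A -> B means "fans of story A also like story B".
toy_links = {
    "story_a": ["story_b", "story_c"],
    "story_b": ["story_c"],
    "story_c": ["story_a"],
    "story_d": ["story_c"],
}
ranks = pagerank(toy_links)
best = max(ranks, key=ranks.get)  # the story with the highest rank
```

The ranks always sum to one, so they can be read as the share of time a random reader following links would spend on each story.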


Read the post on my new blog!

Neural Networks, Manifolds, and Topology

April 9, 2014

Recently, there’s been a great deal of excitement and interest in deep neural networks because they’ve achieved breakthrough results in areas such as computer vision.

However, there remain a number of concerns about them. One is that it can be quite challenging to understand what a neural network is really doing. If one trains it well, it achieves high quality results, but it is challenging to understand how it is doing so. If the network fails, it is hard to understand what went wrong.

While it is challenging to understand the behavior of deep neural networks in general, it turns out to be much easier to explore low-dimensional deep neural networks – networks that only have a few neurons in each layer. In fact, we can create visualizations to completely understand the behavior and training of such networks. This perspective will allow us to gain deeper intuition about the behavior of neural networks and observe a connection linking neural networks to an area of mathematics called topology.


A number of interesting things follow from this, including fundamental lower bounds on the complexity of a neural network capable of classifying certain datasets.

Read more on my new blog!

Visualizing Functions On Groups

January 16, 2014

Functions of the form G \to \mathbb{R} or G \to \mathbb{C}, where G is a group, arise in lots of contexts.

One very natural way this can happen is to have a probability distribution on a group, G. The probability density of group elements is a function G \to \mathbb{R}.

Another way this can happen is if you have some function f: X \to \mathbb{R} and G has a natural action on f’s domain – if you care about the values f takes at a particular point x, you are led to consider functions of the form g \mapsto f(gx). For a specific example, the intensity of a particular pixel, x, in a square gray-scale image, f: [0,1]^2 \to \mathbb{R}, subject to flips and rotations, can be considered as a function D_4 \to \mathbb{R}.
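The pixel example can be made concrete with a small sketch. Everything here is an assumption for illustration: the image is a hypothetical 3x3 grid rather than the continuous square [0,1]^2, the element names r^k and r^k s are just labels, and the convention for which way a rotation turns is arbitrary.

```python
N = 3  # hypothetical image size
image = [
    [0.0, 0.1, 0.2],
    [0.3, 0.4, 0.5],
    [0.6, 0.7, 0.8],
]


def rotate(p):
    """Quarter turn acting on a pixel coordinate (row, col)."""
    r, c = p
    return (c, N - 1 - r)


def flip(p):
    """Horizontal flip acting on a pixel coordinate (row, col)."""
    r, c = p
    return (r, N - 1 - c)


def d4_elements():
    """The 8 elements of D_4, as named functions on pixel coordinates."""
    elements = {}
    for k in range(4):
        def rot_k(p, k=k):
            for _ in range(k):
                p = rotate(p)
            return p
        elements[f"r^{k}"] = rot_k
        elements[f"r^{k} s"] = lambda p, rot_k=rot_k: rot_k(flip(p))
    return elements


def f(p):
    """Intensity of the image at pixel p."""
    r, c = p
    return image[r][c]


x = (0, 0)  # the pixel we care about: the top-left corner
# The induced function on the group: g |-> f(g x).
values = {name: f(g(x)) for name, g in d4_elements().items()}
```

Each of the eight group elements gets one real number, which is exactly the kind of function on D_4 described above.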

Basic Visualization

Recall that we can visualize finitely generated groups by drawing Cayley Diagrams. (There’s a nice book, Visual Group Theory by Nathan Carter, that teaches a lot of basic group theory from the perspective of Cayley Diagrams.)


The natural way to visualize functions on groups is to picture them as taking values on the nodes of the Cayley Diagrams. One way to do this is by coloring the nodes. In the following visualization of a real-valued function on D_4, dark colors represent a value being close to zero and light colors close to one.


Read the rest of this entry »

The Death of a Squirrel

August 25, 2013

(Trigger warning: descriptions of severe animal injury.)

Today a squirrel was hit by a car a few feet away from me while I was walking down the sidewalk.

Read the rest of this entry »

Order Statistics

August 16, 2013

What is the distribution of the maximum of n random variables? What started out as a utilitarian question in my exploration of some generalized versions of the secretary problem turns out to be quite a deep topic.
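As a quick illustration of the simplest case: if the n variables are independent and share a cumulative distribution function F (an assumption; the question above is more general), the maximum is at most x exactly when every variable is, so P(max ≤ x) = F(x)^n. The sketch below checks this by simulation for uniform variables on [0, 1], where F(x) = x. The sample size and seed are arbitrary choices.

```python
import random


def simulate_max_cdf(n, x, trials=20000, seed=0):
    """Estimate P(max of n uniform(0, 1) draws <= x) by simulation."""
    rng = random.Random(seed)
    hits = sum(
        max(rng.random() for _ in range(n)) <= x
        for _ in range(trials)
    )
    return hits / trials


n, x = 5, 0.8
empirical = simulate_max_cdf(n, x)
exact = x ** n  # F(x)^n with F(x) = x for the uniform distribution
```

The two numbers should agree up to sampling noise.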


(Note that I have little background in probability and statistics. Please forgive (and inform me of, so I can fix!) errors of notation or language. Or general confusion on my part.)

Read the rest of this entry »

Topology Notes

June 14, 2013

I’ve been talking about writing a topology textbook (or rather, introductory notes on topology) for years. Basically since I wrote my Rethinking Topology (or a Personal Topologodicy) post 2 years ago. It’s hard to believe it’s been that long!

In any case, I finally started writing it. I’ve done a mild review of existing introductions to general topology (i.e., I skimmed through the first few chapters of a dozen topology textbooks), so I feel somewhat comfortable contrasting my work to existing literature. It’s quite a different approach.

Topological Anatomy: Closure, Interior, and Boundary


I initially develop topology based on closures and adherent points. Kuratowski’s closure axioms are then built up with natural explanations. Emphasis is given to the variety of possible definitions (along the lines of Lakatos’s Proofs and Refutations), and exercises encourage the reader to explore them. I attempt to justify the axiomatic approach in a manner similar to Pinter’s wonderful A Book of Abstract Algebra, though I may fall very short. From here, we build intuition for closure, boundary, and interior with some diagrams and proofs of identities. Finally, we wrap up the first chapter with a visual interpretation of the closure axioms.

Arrows represent closure and lines superset in a visualization of the indiscrete closure operator on {1,2}.

The indiscrete closure operator on {1,2}
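For readers who like to check things mechanically, here is a minimal sketch that tests the indiscrete closure operator on {1,2} against the four Kuratowski closure axioms in their standard textbook form. The notes may phrase or order the axioms differently, and the helper names below are hypothetical.

```python
from itertools import combinations

X = frozenset({1, 2})


def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]


def indiscrete_closure(a):
    """Closure in the indiscrete topology: any nonempty set closes up to X."""
    return X if a else frozenset()


def satisfies_kuratowski(cl, space):
    """Check the four closure axioms for cl on every subset of space."""
    subs = subsets(space)
    return (
        cl(frozenset()) == frozenset()                  # cl(empty) = empty
        and all(a <= cl(a) for a in subs)               # A is contained in cl(A)
        and all(cl(cl(a)) == cl(a) for a in subs)       # cl(cl(A)) = cl(A)
        and all(cl(a | b) == cl(a) | cl(b)              # cl(A u B) = cl(A) u cl(B)
                for a in subs for b in subs)
    )


ok = satisfies_kuratowski(indiscrete_closure, X)
# A non-example: sending every set (even the empty one) to X breaks the first axiom.
bad = satisfies_kuratowski(lambda a: X, X)
```

Brute force like this only works for tiny finite spaces, but it is a handy way to test a candidate definition before trying to prove anything about it.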

(You can find the most recent version of the book on github.)

Read the rest of this entry »

How My Neural Net Sees Blackboards (Part 2)

June 9, 2013

Previously, I discussed training a neural net to clean up images. I’m pleased to say that, using more sophisticated techniques, I’ve since achieved much better results. My latest approach is a four-layer convolutional network. Sadly, the convolution throws away the sides of the images, so we get a black margin. In any case, compare the results:
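The lost border is easy to see in a small sketch. This assumes unpadded ("valid") convolutions and a hypothetical 3x3 kernel on a 20x20 image; the actual network’s kernel sizes and image dimensions are not stated here.

```python
def conv2d_valid(image, kernel):
    """Unpadded 2D convolution (no kernel flip): output shrinks by kernel size - 1."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Weighted sum over the kernel-sized window whose top-left corner is (r, c).
            row.append(sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw)
            ))
        out.append(row)
    return out


image = [[1.0] * 20 for _ in range(20)]      # hypothetical 20x20 input
kernel = [[1.0 / 9] * 3 for _ in range(3)]   # hypothetical 3x3 averaging kernel

layer = image
for _ in range(4):                           # four stacked convolutional layers
    layer = conv2d_valid(layer, kernel)
shape = (len(layer), len(layer[0]))          # each layer trims 2 pixels per dimension
```

With these assumed sizes, four layers turn a 20x20 input into a 12x12 output, so a 4-pixel band on every side has nothing to be drawn from.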

Original Image -- dirty blackboard image to be cleaned with neural net

Original Image

Blackboard image after being cleaned by 3 layer neural net.

Attempt 1: 3 fully connected layers

Blackboard image cleaned by 4 layer convolutional network.

Attempt 2: 4 convolutional layers.

Read the rest of this entry »

I’m Sick and Tired of 3D Printed Guns

May 29, 2013

For the last few months, every time someone hears that I work with 3D printers they bring up 3D printed guns. I can’t say how many times it has happened in this month alone. And I’m getting really really tired of it.

“They’re the killer app of 3D printers.” What a great pun. You don’t know how funny you are. “They’re the most important thing that will come out of 3D printing.” Everyone who is trying to do amazing things with 3D printers is thrilled to hear you say that.

The only thing that really matters about 3D printed guns is the hype.

You can make a gun with a lathe. You can make a gun with hand tools. And if you did that the gun might actually work.

To the extent guns are “censored”, it only takes a tiny bit of effort to get around that as it is. (And even if you don’t think the government should control who has guns, there being some barrier to entry for guns is likely still a good idea.)

Some day it will be easy to make guns with 3D printers. And they’ll be completely overshadowed by the massive explosion of innovation 3D printing will cause, as mechanical engineering becomes more like programming and custom objects from parametrized designs become commonplace. The impact isn’t a single shining glamorous 3D printed object.

But for now all that 3D printed guns and the media hype around them are causing is a bunch of ignoramuses clamouring for the regulation of 3D printers.

And, yes, they’ll fail. Regulating 3D printers is no more practical than, say, copyright.

But you know what? We still have stupid copyright laws that ruin random people’s lives and complicate everyone else’s.

And you know what else? The last time I crossed the US border with a 3D printer I was searched and interrogated for an hour because they were worried I’d print things, sell them, and disrupt the US economy. I’m just not going to find out what would happen if I did this now.

And you know what else? The last time the police decided to put intense scrutiny on my community, it really sucked.

People have the right to 3D print whatever they want. But 3D printing guns is a strategically bad idea that doesn’t really have much impact otherwise. And the hype over 3D printed guns is toxic.

How My Neural Net Sees Blackboards

May 11, 2013

For the last few weeks, I’ve been taking part in a small weekly neural net study group run by Michael Nielsen. It’s been really awesome! Neural nets are very very cool! They’re so cool, I had to use them somehow. Having been interested in mathematical handwriting recognition for a long time, I decided to train a neural net to clean up images of blackboards as a short weekend project/break. Thanks to Dror Bar-Natan’s Academic Pensieve, it was easy to get a bunch of data.

[Images: Etingof-130323-110039]

Read the rest of this entry »

Singularity Summit Talk

May 7, 2013

As readers of this blog are probably aware, I’m a rather big fan of multiplicative calculus, an obscure mathematical tool that is very useful in reasoning about quasi-exponential trends. And I’m rather sad that I get few occasions to apply it. So, it should come as no surprise that when I was given an opportunity to speak at Singularity Summit last fall, a conference largely concerned with exponential trends in technology, I decided to try to persuade people of the utility of multiplicative calculus and the value of mathematical abstractions.
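For anyone who hasn’t met it, here is a minimal numerical sketch of the central object, under the usual definition of the multiplicative derivative as the limit of (f(x+h)/f(x))^(1/h) as h goes to zero. The example function and the step size are arbitrary choices for illustration.

```python
def multiplicative_derivative(f, x, h=1e-6):
    """Approximate the multiplicative derivative: the growth factor per unit of x."""
    return (f(x + h) / f(x)) ** (1.0 / h)


def doubling(t):
    """A hypothetical exponential trend: starts at 3 and doubles every unit of t."""
    return 3.0 * 2.0 ** t


# For an exponential trend the multiplicative derivative is the constant base, 2,
# in the same way the ordinary derivative of a straight line is its constant slope.
rates = [multiplicative_derivative(doubling, t) for t in (0.0, 1.0, 5.0)]
```

A trend that is only roughly exponential shows up as a multiplicative derivative that drifts slowly instead of staying constant.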

Now, there was a bit of confusion because a lot of people thought that I was going to be speaking about my work (that’s what all the other Thiel Fellows did), but I think this was much more valuable. The things I do are generally of very domain-specific interest and don’t have super deep, world-changing implications for the future of technology, just some minor, incremental improvements. In any case, you can watch me and the other fellows talk.

Chris Olah speaking at singularity summit

My public speaking skills aren’t the greatest, and it wasn’t the best talk I’ve given… but it was pretty cool when Vernor Vinge came up to me and told me that he liked my talk! (And I think Ray Kurzweil may have said so as well, but I wasn’t sure if it was him.) And then I and the other fellows had dinner with Eliezer Yudkowsky (and lots of other awesome people, but he sat beside us). So it was a crazy awesome experience in general!

Parsec Patterns

May 7, 2013

At a meeting of the Toronto Haskell User Group a few months back (Jan 9, 2013), after going over the basics of the parser combinator library Parsec, I talked about some clever tricks that made my work with it much cleaner. (See also Albert Lai’s write-up of his talk, Parsec Generally.)

Note that I don’t have deep knowledge of Parsec. I’ve just used it for a few things and noticed that some code patterns really cleaned up my code. I’d love to hear about other people’s tricks for writing awesome parsers with Parsec!

(Read More On Github)

Misunderstanding Romantic Attraction

May 3, 2013

One thing I’ve realized in the last year is that I really deeply misunderstood what romantic attraction was. Most people I’ve tried to describe this to have been incredulous, since the things I misunderstood seem trivial to them. Nevertheless, I think that for a certain type of person this may actually be a really difficult point. In particular, I think the way romance is generally talked about and portrayed in our culture is kind of misleading if you can’t read beneath the surface. Personally, I only realized these points after reading neuroscience literature on romantic attraction.

Read the rest of this entry »

Monads For The Terrified

March 20, 2013

It seems like everyone writes a monad tutorial in the Haskell community… Well, here’s mine.

People learning Haskell generally seem to get tripped up about monads. There are good reasons for this.

  1. Monads are, really, needed to write a serious Haskell program. As a consequence, people generally try to learn them fairly shortly after beginning to use Haskell.
  2. Monads are defined in Haskell and have a very short definition. As such, it is very tempting to not just teach someone how to use monads, but to also try to teach them the underlying definition. The definition involves a bunch of rather advanced Haskell. In combination with (1), this is a big problem.
  3. Monads are a very abstract idea. I’m not really sure that the motivation can be properly grasped by anything other than using them.
  4. Monads are generally interwoven with the idea of an IO object.

This introduction attempts to avoid some of these problems. In particular, it is only going to teach you how to use monads and not what they are. This avoids (2) and, to a lesser extent, (3). I’ve tested this approach on a number of people in the last few months, and it seems to be quite effective. That said, I understand that some people may prefer an introduction focusing on the what and why, in which case you may wish to look at Chris Smith’s Why Do Monads Matter?.

Some day, you will need to learn more. But, if you find you understand the contents of this tutorial, I’d encourage you to wait and play around with monads for a while. Then, when the time comes to learn them in full, they’ll be natural and obvious ideas. We can talk more about that later. For now: onwards!

(Read more on github)

Math Notation Experiment “Paper”

March 20, 2013

A while back, I wrote a post on some unusual math notation I was playing with. I actually took it much further than that and drafted up a paper full of different ideas over the following months. At the time, I was hoping it might be publishable but then got discouraged and never did anything with it… Except show it to people in person and promise I’d put it online soon. In any event, it’s one of those things I’ve been meaning to put up for forever.

So here it is. I’m particularly proud of the quantifier stuff and the numerals.

When one considers how complicated the ideas mathematical notation must represent are, it clearly does quite a good job. Despite this, the author believes that any claim that modern mathematical notation is the best should be met with extreme scepticism.

Mathematical notation is a natural language: no one sat down and constructed it, but rather it formed gradually by people making changes that are adopted. Most changes are not adopted; whether they are depends on a variety of factors including: mathematical utility, ease of adoption (the average individual doesn’t want to spend hours learning), dissemination, and sheer dumb luck. The first two of these qualities are associated with real properties in notation, forming the necessary selective pressure for evolution to occur.

Evolution is a blind watchmaker: the world around us is filled with examples of the stupidity of biological evolution (the classic example being halibut). Similarly, evolution is also a blind language and notation designer. In particular, it is held back by a strong selective force against change, since people would need to adopt it, and so evolution doesn’t effectively explore the full notation-space. This staticism means that even the most outrageous notations remain unchallenged by virtue of age.

Read More…

Please keep in mind that I wrote this (except for the redaction of some sillier sections and some minor improvements) three years ago, when I was quite a bit younger, and it isn’t representative of my present abilities. That said, I do still think that the fundamental idea, that notation is important and should be explored, is very true.

(I’ve also put the paper on github, for those who are interested.)