I’ve been doing crossword puzzles daily for many years. Apart from doing a puzzle before grad school exams to warm up my neurons, I never saw much of a connection between puzzles and my day job...
There have been a number of papers at NIPS that use Hamiltonian Monte Carlo, and I thought I’d share a JavaScript implementation of the algorithm that I wrote a couple years ago. It can be a f...
When considering data analysis questions, I often think of this passage from “The Wizard War” by R.V. Jones, head of British scientific intelligence during World War II. > One salutary i...
Since Cornell is such a big place, departments have individual graduation ceremonies where we can give students more individual recognition. I was recently invited by the Information Science stud...
There was a conversation on Twitter about the current state of Mallet. My goal for Mallet is that it should do a few things very well. Future development will focus on making the process of usin...
Bag-of-words models are surprisingly powerful, but there are often cases where several words are really a single semantic unit. How we handle these terms can have a major impact on how well we ca...
The Google n-gram viewer has become a common starting point for historical analysis of word use. But it only tells us about individual words, with no indication of their context or meaning. Seve...
Modern datasets are often large, complicated, and disorganized. Clustering algorithms create data-driven organizations. These algorithms include a wide range of methods, from k-means to mixture m...
I’ve been using this blog as a more philosophical platform, but this post is about some new features in the machine learning package that I work on, Mallet. One of these, LabeledLDA, is som...
The New York Times has an article titled For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights. Mostly I really like it. The fact that raw data is rarely usable for analysis with...
I was reading a paper the other day and came across the word aleatory. This turns out to be an excellent word. It comes from the Latin alea for “dice”, as in alea jacta est, which is what yo...
One of my students recently asked me for advice on learning ML. Here’s what I wrote. It’s biased toward my own experience, but should generalize. My current favorite introduction is Kevin M...