Just finished Andrew Ng's ML course^{1} and I can wholeheartedly recommend it as a nice balance of exposition and hands-on work. I was a bit surprised, though, when around logistic regression, maybe a third of the way through the course, Prof. Ng said that in his experience of "walking around Silicon Valley" many startups don't use concepts beyond what had been covered thus far. Sounds encouraging for a beginner, but where to next?
I've seen two recommended approaches: either dive deep with the Deep Learning book or go for An Introduction to Statistical Learning. For a terser read, 'All' of Statistics is also often listed as the next step. Coincidentally, its author is emphatic about why a grounding in math is crucial:
Students who analyze data, or who aspire to develop new methods for analyzing data, should be well grounded in basic probability and mathematical statistics. Using fancy tools like neural nets, boosting, and support vector machines without understanding basic statistics is like doing brain surgery before knowing how to use a band-aid.
— Wasserman
I don't know how widely shared this belief is, but here's more:
Machine learning is statistics minus any checking of models and assumptions.
— Brian D. Ripley at useR! 2004, Vienna
Of course, the more math the better. But math is broad and life is short, so you have to pick your ground; otherwise you might end up in the set theory basement along with Georg Cantor. For me, even if ML really is statistics-with-better-marketing, the field's broad applicability and the relative longevity of the underlying knowledge are what make it endearing. Especially when compared to building a fancy JSON viewer, aka an app.
OTOH, to take the jab at apps back, the latest announcements from the I/O and WWDC conferences will only give a boost to algorithm marketplaces and further abstractions: C++ >> Python >> Theano/TensorFlow/CNTK >> Keras >> Heroku for AI? Apple's Core ML only requires dropping in a trained model, prompting further focus on marketplaces of pre-built ML models. These will solve the majority of business problems and commoditize ML further. Whereas in the world of 'mere' apps, my anecdata still shows the struggle of keeping an app from devolving into a ball of mud. Maybe building a scalable system really does take more 'ingenuity'. But maybe something akin to DeepCoder will prove otherwise.
Anyhow, here are the solutions in case you get stuck figuring out the Octave minutiae of vectorization (or that "~" is for an ignored return value, "..." for line continuation, etc.), so you can spend more time grokking the concepts.^{2}

^{1} The Coursera syllabus is actually a watered-down version of Stanford CS 229, so the title is a bit cheeky.

^{2} This breaches the Coursera Honor Code. I reason that freely available solutions diminish the worth of a (paid) Coursera Certification, which is debatable anyway. And really, whoever is inclined to (self-)cheat would have no problem finding these solutions elsewhere.