development,

Resources for learning A.I

Jan 05, 2020 10:40 AM

Resources for learning A.I
Share this

This document is designed to contain the content that I found most useful and deserving of attention from those both interested or passionate about Machine Learning. Much of the content comes from the Proceedings page of my Notion, where my notes are organized and archived.

Preamble:

This is a collection of some of my favorite resources, content, courses, and everything else that I used to for the past 2 years. I wasn’t very consistent with updating it so it’s probably missing a lot but I’ll try to fill in the gaps slowly.

Having taught myself this far, I’ve learned a lot. So much so that I have a separate article detailing what I’ve learned about learning.

There’s a mix of content types throughout the document. You’ll find one-pagers, cheatsheets, article (and series of articles), videos (and series of videos), MOOCs, books, textbooks, and other creative ways education is distributed online. There is a large range in terms of emphasis of attention; some resources are more cosmetically inclined, others are more dense and utility-heavy. But everything here, in my opinion, is well made; or at the very least made with integrity. I am deeply thankful to the creators and providers of such content.

‘Amateur.’ you say that as if it was a dirty word or something, but ‘amateur’ comes from the Latin word ‘amare’ which means love… To do something for the love of it.” - Mozart in the Jungle

Why A.I?

You know nothing:

Where I started when I was 18. Where everyone starts at some point.

Programming

Machine Learning doesn’t require a serious technical background in computer science or software engineering, at least if you’re aiming for something of a hobby or leisure activity. But it most certainly requires basic programming knowledge.

  • Language
    • A fluent knowledge of Python is basically all you need to know to start off. Knowledge in other languages can help, Java (ubiquity and reliability) Javascript (for usable projects) and C++ (For doing research) in particular can be useful.
    • If you want a high level overview of programming languages as a concept itself, check out this series of short videos.
  • Data structures and Algorithms
    • Absolutely necessary upon moving into more difficult A.I algorithms. Emphasis on Stacks, Computational efficiency, and Graphs/Trees. These concepts will come up in the A.I specialization, especially as you conceptualize your own models.
      • Udacity has an incredible course called intro to “data structures and algorithms”, perfect for getting the seasoned programmer started with learning the concepts necessary to nail a developer position interview.
Intermediate programming
  • Thorough knowledge and familiarity with advanced programming concepts like functions and classes is necessary. The quality of your programs will be hugely advanced, and some tools require knowledge in these concepts
  • There is this wonderfully written list of the top mistakes beginner programmers make. Through experience I found a lot of these mistakes to be true, and are often amplified when the programmer in question is self-taught. Calling yourself out on these mistakes will make you a better programmer.
  • To absolutely master Python, one should be aware of tips and tricks like this, this, and this. You should know what are, and how to use dunder methods, decorators, and design patterns.
  • Hilariously accurate, the response to this thread on Stack Exchange outlines the progression that most Python programmers will experience.
  • Be brutally honest and constantly inquisitive with your study in computer science; it will make a difference in everything else you learn and do. Here’s a list of Python’s best kept secrets that spell the difference between boys and men
  • Built by MIT, this course is your one stop shop for learning above and beyond what you could possibly learn in school. Here’s the missing semester of your CS education
  • Organize your developer life with Git
    • Read the Git flow and Github flow branching models
    • Here’s a great summary/cheatsheet for git
    • GitHub and GitLab are two very popular repository management services that each have their uses
      • Get familiar with Issues, Pull Requests, Licenses, OAuth, and automation tools like Github Actions
  • Start learning terminal-level dev tools
  • Go from command to Powershell
Intermediate programming
  • Advanced Programming
    • To reach this point, one should have a strong foundation built with OOP, Data structures/Algorithms, and skills with some handy dev tools. Fluency in at least one language as well as some background in other languages (ideally one that is different from your native language) is beneficial.
      • Design Patterns have their criticisms, but are useful as they are ubiquitous. If you can a hand of the OG GoF textbook online, definitely give it a go, otherwise, settle for “Head first: Design Patterns”, or “Dive into Design Patterns”
      • Youtube has a great wealth of design pattern resources, one of my favorites of which is a playlist by Christopher Okhravi
      • This brief video by William Lin, a competitive programmer provides a great outline on how to get into programming
    • Also start learning vim
      • Start with vimtutor (on mac and linux type in vimtutor in your terminal, on Windows… tough luck)
      • Also have a peek at some of the many vim cheatsheets (there are many, I just linked the most generic one)
      • In the meantime, use something like Anki to rote memorize commands
      • Anki is for learning words, the vim grammar is for writing poetry
      • Indoctrinate yourself as a vim vimmer by watching the top two (1, 2) vim tutorials
      • Use vimgenius to practice, or vim adventures if you’re into that kinda thing
      • Pick up some basic vimscript along the way, and learn about .vimrc
      • Get destroyed by the pros over at vimgolf
  • General Programming resources
    • I literally worship Runestone Academy’s library of courses. Each little course serves as a great touch up on topics you may have encountered before, or as a way to get your feet wet in something new, as I have with the Java for Python programmers coursette
    • Learn x in y minutes is a collection of “tours” for a variety of programming stuffs
    • This Information-dense course by Stanford is called “The Missing Semester of your CS education”, and it truly lives up to its name.
    • If you like learning by analogy or comparison, have a look at “Rosetta Code”, a wiki of hundreds of programming languages demonstrating hundreds of tasks.
    • There’s also “Hyperpolyglot”, a simple site with side by side comparison charts of different programming languages organized by various catagories. Keep in mind the language versions in the demos are very old (the site isn’t maintained).
  • Language Specific Resources
    • Python
      • Socratica is a YouTube channel for learning science and Python; their Python course is probably one of the best in quality, time, and understandability on the internet.
      • Codeacademy, something of a gatekeeper for learning coding — Just don’t use this alone and call it a day.
      • The Real Python website is the internet’s best collection of in depth tutorials and lessons on literally anything to do with Python. I would tell you to read everything but it would probably take years to go through all the amazing content.
        • And if you’re going to learn Python, it’s mandatory that you know NumPy.
    • Javascript
    • Java
      • This is jumping ahead a bit but the Algorithms in Java textbook by Robert Sedgewick is great for learning the ins and out of Java as well as algorithms in CS

Mathematics

  • General purpose
  • Linear Algebra
    • Khan Academy yes, 3Blue1Brown yes, but there are alot of other good stuff out there
      • Have a look at MIT’s Linear Algebra course (taught by the great Gilbert Strang), available on MIT OpenCourseware’s website.
      • For the best examples of concepts you should learn in lin alg, the Cliffnotes documents has a thorough albeit unstructured collection of notes.
      • I also really like Derek Banas YT channel for his linear algebra series (but his vegan recipes are very interesting)
  • Calculus
    • Important, but not as crucial as Linear Algebra and Data Science. Calculus concepts will come up every now and then, specifically in back propagation (differentiation) and gradient descent (derivatives)
  • Data Science
    • Pretty self explanatory. In machine learning, crap in means crap out. Data needs to be normalized, interpretable, and plentiful. A good machine learning needs to be able to recognize patterns in a given dataset before a machine tries to learn.
      • Udacity pulls through again, with another solid data course you can take for free. “Intro to Data analysis” will teach you the ins and outs of data science with Python, making use of CSVs, pandas, matplotlib, and more.
      • The StatQuest YouTube channel (brought to you by Josh Starmer) is one of the best channels for learning data and statistics for beginners. His dry humor and terrible song-writting makes for an equally informative and entertaining binge watching session
      • Once you understand the theory and concepts, it’s time to learn how to implement. The Data School YouTube channel is the go to channel for all things data, reviewing concepts, highlighting best practices, and providing professional tips.
      • Here’s a repo with a basic cheatsheet of Data Science inspired by this repo with a Probability cheatsheet.
      • Get a grasp of data visualization with this collection of infographics containing all possible graphs, charts, and diagrams.

You know some things:

You are a seasoned programmer. Perhaps you’re coming as an expert from another field like VR/AR, web dev, chemistry, biology, etc. That being said, you have a solid foundation in math and/or coding. A.I is just another skill you want to add to your toolbox.

Things you’ll need to suceed:

Machine learning

  • These models are not Neural Networks. These are your linear regressions, Kernel methods like Support Vector machines, Decision Trees, etc. keep in mind that Machine Learning isn’t about the model, it’s more about the method. Roughly speaking, there are 14 types of machine learning, and as the field develops, there will be more.
    • Udacity’s intro to A.I and machine learning is perfect for these pre-deep-learning concepts. Many times you will find (especially in times of scarce data), that these algorithms do a better job than the more popular neural network architectures.
    • The last resource you should take a look at is this wonderful paper called “A few useful things to know about machine learning”. These are 12 of the key takeaways, tips, and tricks, of machine learning.
    • In addition to the 12 tips, this article written by Patrick Riley of Google Advanced Science, highlights the 3 pitfalls most commonly seen in his real world experience at the intersection of machine learning and science.
    • Yannic Kilcher’s Youtube channel has a measly 3k subscribers but he deserves far more for his break-down of the newest and best machine learning papers. He has various conversations with expert researchers and frequently attends top conference, documenting both in videos of varying length. We need more channels like his.
    • How do you know whether or not you know your stuff? Google Data scien(tist)ce (interview) questions. If you can answer each one without help and within an accceptable degree of accuracy, you know your stuff.
    • Straight from Stanford’s CS 229 Deep Learning course, this repo contains cheatsheets to cover all basic machine learning needs. The same guys also made one for deep learning specifically (in the next section)
    • Mathematicalmonk’s channel. The most consistent and thorough youtube series, covers the explanation down to the math. Watch them all. Trust me it’s worth it.

Deep Learning

  • Neural Networks. The forefront of A.I research and focus. 70% of your time will be spent with NNs
    • Udacity Intro to Deep Learning is an obvious starting point. The course is relatively short and concise, and has just enough projects and content to give you a base level understanding of Deep Learning.
    • The deep learning book, written by Ian Goodfellow and Yoshua Bengio, is all you need to get familiar with the fundamentals of deep learning. This book is especially good at coinciding the math with the concepts.
    • What do neural networks learn? Is a great video that ties together math and neural networks. The video is structured such that even if you don’t know the math, you’ll still understand the computational differences between linear regression to neural nets and everything in between.
    • deeplearning.net and its documentation is one of the old-but-gold sites for learning everything deep learning. Curated by the makers of Theano, you’ll learn everything from Restricted Boltzmann Machines to Monte Carlo sampling. Each tutorial comes complete with a high level look at the math and the code (implemented in Python and Theano).
    • Arxiv Insights YouTube channel keeps to its namesake by publishing high qualiy videos of explainations behind some of the more complex topics in Deep Learning and RL. With proper references, clear visualizations, and thorough analysis, it’s a required watch.
    • ML From scratch’s blog post is the most thorough and holistic blog post on neural network basics. You don’t need any Medium articles or half-assed blog posts after this; everything else is just noise.
    • God bless these Stanford kids who made a repo containing cheat sheets for deep learning basics. It comprises of notes collected in Stanford’s CS 230 Deep Learning course.
    • I don’t know who this guy is, but for the past while, he has consistently put up silent short videos (2 mins on average) on his YouTube channel depcting the use of various Python and PyTorch tips and tricks. I watch these videos when I want to be productive without doing work. It’s fun to interpret what he’s doing and why something makes sense. I think that’s why it’s so compelling (kind of like Primitive Technology).
    • Convolutional Neural Networks, easily the most revered deep learning paradigm since its inception. There are literally thousands of resources, ranging from generic to overcomplicated. And in the eye of the storm, we have this.
    • Applicable to Machine Learning in general as well, but this blog post by Andrej Kaparthy started as a Twitter thread and became the biblical guide for intermediate ML engineers trying to train their models.

Production Tools

  • Knowledge of the A.I libraries and packages available to you (You will probably learn about the packages and libraries as you go). Knowledge of the other tools available to you, like Google Colab (and Cloud). AWS, and various others is important. Understanding which is better for what purpose is key.
    • Udacity’s Intro to Deep Learning with PyTorch is possibly the greatest deep learning course currently available for free online. Created by Facebook A.I, the course covers everything from the fundamental basics of Deep Learning, all the way to CNNs and RNNs, all with guided PyTorch tutorials.
    • Tensorflow has the benefit of being backed by the biggest company in the world, and hence it has the most detailed documentation and series of tutorials than any other production tool on the internet. You can’t go wrong with their proprietary tutorial series.
    • Deep lizard is a YouTube channel that puts out consistent, concise, and high quality content. They have series on neural networks in Javascript and Python, and utilizing libraries from Keras to Pytorch. They even have series on RL. They deserve far more subscribers.
    • Jeff Heaton is probably the most likable machine learning tutor on YouTube. His tutorials cover the basics of deep learning, particularly with Keras and TF, and ranges from AWS and Flask pipelines all the way to visualization and preprocessing. Subscribe, if not for his content, then for his mustache.
    • The entire Google Suite at your disposal, most of which is free, some of which are paid per use. Amazon Web Services is also a must have for large projects or deployment level tasks.
    • Learning to visualize multidimensional data is a machine learning must. This medium article does a fantastic job explaining and demonstrating data visualization using an open source dataset using only matplotlib and seaborn, from 1D to 6D.
    • At surface level, there’s not much difference in between Keras, PyTorch, and TF. It’s only when you become an intermediate user that they each come into their own. It’s stuff like this and this that makes PyTorch consistent, intuitive, and highly flexible for professional users.
    • If you’re a total badass then have a look at JAX (and it’s precursors Autograd and XLA)
    • For a while I hated TensorFlow and refused to use it unless necessary, until I discovered Sonnet, DeepMind’s in house DL Library.
      • DeepMind’s family of libraries (Haiku, RLax) make TensorFlow a bit more approachable, and if it’s good enough for DeepMind, it’s more than good enough for me
    • Parameter optimization
      • So you’ve built your model and want to make sure that it’s the best version it can be. Enter hyperparameter tuning. There are multiple common methods ranging from large scale (GA algos and Grid Search) to more refined and empirical (Bayesian opt and Random Search). There are also multiple frameworks that support their development. Ray Tune is one of the few that has good-enough documentation and is kept up to date.
      • Botorch and Ax are PyTorch extension libraries meant for parameter optimization.

Putting it all Together

  • Once you’ve learned from the somewhat watered-down resources above, review your knowledge with these advanced resources brought to you by some of the most legitamate people in the field of Machine Learning and Deep Learning.
    • Deep Drizzle repo is a collection of the very very academic/university courses (video lectures included). If you can sit through any given video on 2x speed and understand everything without question, you’re a legend

Bonus: A history of A.I

  • Not a hard skill, but certainly very useful to know. Understanding the past provides deeper insight to the situation in the present as well as elucidates certain possibilities in the near-future. This means covering everything from the “symbolic A.I” era and precursor methods like autoregressive models. Learn about the development of Back propagation. You’ll know you’re far back enough when you start hearing more Pearl, Minsky, and Chomsky and less Hinton, Lecun, and Bengio. Learn about how David Rumelhart one-upped the development of neural networks, and how Alex Krizhevsky’s simple idea changed the game.

Blogs and things to follow

  • The ML explained blog is by a Carnegie Mellon university student who curates some quality content about once a month, explaining some more niche areas of deep learning as well as tutorials on building things from scratch.
  • Distill is so well curated that some people actually consider it to be on par with mid to top tier research papers in terms of quality of content. Many people on the notoriously serious ML subreddit also approve (which you should definitely follow).
  • Paperspace Blog is another one of those gems hidden in the pile of noise on the internet. Created by the Paperspace company (who does cloud and A.I stuff), the blog gathers quality content, approved by the company itself, who pays it’s writters with GPU credit (pretty clever).
  • Remember to try and keep your ego and biases away from studying. This Stack Exchange dicussion is an example of why something as simple as citation count should be regarded with skepticism.
  • Here are 7 websites to help you keep up to date with the rapidly changing world of machine learning:

Repositories and Tools

Know More

Not for use as the basis for learning machine learning, but rather as a complimentary to whatever regiment you have set yourself.