✨🧠 Building @GoogleAI for everyone, and for every platform. There has never been a more exciting time to be a compiler nerd who is into machine learning. 👩‍💻
Oct 8, 2018 • 7 tweets • 5 min read
TIL:
💃Good folks at @UniofOxford tagged+categorized pose categories in several episodes of "Buffy the Vampire Slayer"
📃"2D pose estimation in TV shows" is a body of academic work
🙌I've a heretofore unrealized desire to determine which BuffyPose has the highest % of frames
✨🧠
(1) Train a model on the annotated example BuffyPoses;
(2) have it cycle through every episode of every season of "Buffy";
(3) determine BuffyPose with the greatest percentage of frames;
(4) don't forget to take ample time in S01 and S02 to:
Sep 27, 2018 • 14 tweets • 6 min read
📓 Am rereading my class notes from grad school, as well as from mentoring students for @Coursera and @EdX courses on statistics - and thought I'd share the most common mistakes when doing data analysis.
✨Have counted 8 of 'em, with examples - please feel free to add your own!
MISTAKE #1:
Garbage in, garbage out.
🤦‍♀️Failing to investigate your input for data entry or recording errors.
📊Failing to graph data and calculate basic descriptive statistics (mean, median, mode, outliers, etc.) before analyzing it in-depth.
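A minimal sketch of that first-pass check, using only Python's standard library. The `describe` helper, the 1.5 × IQR outlier rule, and the sample `ages` values are illustrative assumptions, not part of the original thread:

```python
import statistics

def describe(values):
    """Return basic descriptive statistics plus IQR-based outlier flags."""
    ordered = sorted(values)
    # Quartiles via the standard library's default ("exclusive") method.
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "mode": statistics.mode(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        # Assumption: flag anything beyond 1.5 * IQR from the quartiles.
        "outliers": [v for v in ordered if v < low or v > high],
    }

# Hypothetical ages with one likely data-entry error (210).
ages = [23, 25, 25, 27, 29, 31, 34, 210]
summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A large gap between the mean and the median, or anything in the `outliers` list, is a cue to go back to the raw records before modeling.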
Sep 18, 2018 • 11 tweets • 6 min read
🗣Some recommendations for budding machine learning engineers:
(1) Make sure your sample dataset is representative of your entire population - and remember that more data is usually - but not necessarily! - better.
Also consider using image preprocessing tools, like Augmentor.
(2) Use small, random batches to train rather than the entire dataset.
⏳Reducing your batch size usually increases training time, but the noisier updates can also make it less likely that your optimizer will settle into a local minimum instead of finding the global minimum (or something closer to it).
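One way to sketch "small, random batches" in plain Python. The `minibatches` generator, its `seed` parameter, and the toy `samples` list are hypothetical names for illustration; real training code would typically rely on the framework's own data loader:

```python
import random

def minibatches(data, batch_size, seed=None):
    """Yield shuffled, non-overlapping batches that cover `data` once (one epoch)."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)  # a fresh random order for this epoch
    for start in range(0, len(indices), batch_size):
        # The last batch may be smaller than batch_size.
        yield [data[i] for i in indices[start:start + batch_size]]

# Toy dataset of 10 samples split into batches of 4.
samples = list(range(10))
batches = list(minibatches(samples, batch_size=4, seed=0))
for batch in batches:
    print(batch)
```

Each sample is seen exactly once per epoch, but the grouping changes whenever the seed (or epoch) changes, which is where the randomness in each update comes from.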
This #nbextension uses a @CodeMirror overlay mode to highlight incorrectly-spelled words in Markdown and Raw cells. The typo.js library does the actual spellchecking, and is included as a dependency.
This extension adds codefolding functionality from @CodeMirror to each code cell in your notebook. The folding status is saved in the cell metadata, so reloading a notebook restores the folded view.
Inspired by the big ol' long list of deep learning models I saw this morning, and @SpaceWhaleRider's love of science-y A-Z lists, I've decided to create an A to Z series of tweets on popular #MachineLearning and #DeepLearning methods / algorithms.
Ready? Here we go:
A is for... the Apriori Algorithm!
Intended to mine frequent itemsets for Boolean association rules (like market basket analysis). Ex: if shoppers who buy the same products as you also tend to buy some other product, then you'd probably purchase it too.
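A compact sketch of the idea, assuming transactions are given as sets of item names. The `apriori` function, the `min_support` threshold, and the `baskets` data are illustrative, and this is a teaching version rather than an optimized implementation:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        scored = {c: support(c) for c in current}
        kept = {c: s for c, s in scored.items() if s >= min_support}
        frequent.update(kept)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in kept for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support=0.5)

# Confidence of the rule {beer} -> {diapers}: support(both) / support(beer).
confidence = frequent[frozenset({"diapers", "beer"})] / frequent[frozenset({"beer"})]
print(f"beer -> diapers confidence: {confidence}")
```

The prune step is what gives the algorithm its name: an itemset can only be frequent if all of its subsets are, so candidates with an infrequent subset are discarded before counting.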
So, time to drop some knowledge bombs. Most data scientists aren't taught:
- TCP/IP Protocol architectures
- how to deploy a server
- RESTful vs SOAP web services
- Linux command line tools
- the software development life cycle
- modular functions + the concept of writing tests
- distributed computing
- why GPU cores are important
- client-side vs server-side scripting
...and that's just a subset. If you meet a data scientist who is familiar with those concepts, it's because they either have a CS or IT background, or they taught themselves.