👩‍💻 DynamicWebPaige @ 127.0.0.1 🏠
✨🧠 Building @GoogleAI for everyone, and for every platform. There has never been a more exciting time to be a compiler nerd who is into machine learning. 👩‍💻
Oct 8, 2018 7 tweets 5 min read
TIL:

💃Good folks at @UniofOxford tagged+categorized pose categories in several episodes of "Buffy the Vampire Slayer"

📃"2D pose estimation in TV shows" is a body of academic work

🙌I've a heretofore unrealized desire to determine which BuffyPose has the highest % of frames ✨🧠

robots.ox.ac.uk/~vgg/data/buff…

(1) Train a model on the annotated example BuffyPoses;

(2) have it cycle through every episode of every season of "Buffy";

(3) determine BuffyPose with the greatest percentage of frames;

(4) don't forget to take ample time in S01 and S02 to:
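As a rough sketch of step (3) only: once a model has labeled every frame, finding the top BuffyPose is just a tally. The pose names below are made up for illustration, and the per-frame labels are assumed to come from whatever classifier step (1) produces.

```python
from collections import Counter


def pose_frame_shares(frame_labels):
    """Return {pose: percent of frames} for a sequence of per-frame pose labels."""
    counts = Counter(frame_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pose: 100.0 * n / total for pose, n in counts.items()}


def most_common_pose(frame_labels):
    """Return (pose, percent) for the pose covering the most frames, or None."""
    shares = pose_frame_shares(frame_labels)
    if not shares:
        return None
    return max(shares.items(), key=lambda kv: kv[1])


# Hypothetical per-frame predictions; real labels would come from the trained model.
labels = ["arms_folded", "hands_on_hips", "arms_folded", "arms_raised"]
top_pose = most_common_pose(labels)
print(top_pose)
```

Keeping the tally separate from the model means the same function works no matter how the frames get labeled.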
Sep 27, 2018 14 tweets 6 min read
📓 Am rereading my class notes from grad school, as well as from mentoring students for @Coursera and @EdX courses on statistics - and thought I'd share the most common mistakes when doing data analysis.

✨Have counted 8 of 'em, with examples - please feel free to add your own!

MISTAKE #1:
Garbage in, garbage out.

🤦‍♀️Failing to investigate your input for data entry or recording errors.

📊Failing to graph data and calculate basic descriptive statistics (mean, median, mode, outliers, etc.) before analyzing it in-depth.
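A minimal sketch of that first look at the data, using only Python's standard library. The `ages` values are invented (the 250 stands in for a data entry error), and the 1.5 x IQR fence is one common outlier convention, not the only one.

```python
import statistics


def describe(values):
    """Basic descriptive statistics plus values outside the 1.5 * IQR fences."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "outliers": [x for x in data if x < low or x > high],
    }


# Hypothetical sample: 250 is probably "25" with a slipped finger.
ages = [23, 25, 25, 27, 29, 31, 250]
summary = describe(ages)
print(summary)
```

Note how far the mean sits from the median here; that gap alone is a hint to go look at the raw records before modeling anything.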
Sep 18, 2018 11 tweets 6 min read
🗣Some recommendations for budding machine learning engineers:

(1) Make sure your sample dataset is representative of your entire population - and remember that more data is usually - but not necessarily! - better.

Also consider using image preprocessing tools, like Augmentor.

(2) Use small, random batches to train rather than the entire dataset.

⏳Reducing your batch size can increase training time, but it also decreases the likelihood that your optimizer will settle into a local minimum instead of finding the global minimum (or something closer to it).
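Here is a small sketch of what "small, random batches" looks like in plain Python. The function name `minibatches` and the toy data are illustrative only; real frameworks ship their own data loaders that do this for you.

```python
import random


def minibatches(samples, batch_size, seed=None):
    """Yield the samples in shuffled order, batch_size items at a time."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)  # a fresh random order for this pass over the data
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


# Toy dataset of 10 examples, split into batches of 4 (the last batch is smaller).
data = list(range(10))
batches = list(minibatches(data, batch_size=4, seed=0))
print(batches)
```

Calling the generator again with a different seed (or none) reshuffles, so each epoch sees the examples grouped differently.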
Apr 12, 2018 9 tweets 7 min read
1) @ProjectJupyter Extension of the Day: Spellchecker!

This #nbextension uses a @CodeMirror overlay mode to highlight incorrectly-spelled words in Markdown and Raw cells. The typo.js library does the actual spellchecking, and is included as a dependency.

…r-contrib-nbextensions.readthedocs.io/en/latest/nbex…

.@ProjectJupyter Extension of the Day #2: Codefolding!

This extension adds codefolding functionality from @CodeMirror to each code cell in your notebook. The folding status is saved in the cell metadata, so reloading a notebook restores the folded view.

…r-contrib-nbextensions.readthedocs.io/en/latest/nbex…
Feb 12, 2018 28 tweets 18 min read
Inspired by the big ol' long list of deep learning models I saw this morning, and @SpaceWhaleRider's love of science-y A-Z lists, I've decided to create an A to Z series of tweets on popular #MachineLearning and #DeepLearning methods / algorithms.

Ready? Here we go:

A is for... the Apriori Algorithm!

Intended to mine frequent itemsets for Boolean association rules (like market basket analysis). Ex: if shoppers who purchase the same products as you also tend to purchase some other item, then you'd probably purchase it too.

cran.r-project.org/web/packages/a…
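To make the frequent-itemset idea concrete, here is a bare-bones Python sketch of Apriori's level-by-level search. It is not the implementation behind the R package linked above; the baskets and the 0.5 support threshold are made up, and it stops at frequent itemsets without deriving the association rules.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(baskets, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is the whole trick: an itemset can only be frequent if all of its subsets are, so most candidates get thrown out before anyone counts them.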
Dec 1, 2017 8 tweets 2 min read
So, time to drop some knowledge bombs. Most data scientists aren't taught:

- TCP/IP Protocol architectures
- how to deploy a server
- RESTful vs SOAP web services
- Linux command line tools
- the software development life cycle
- modular functions + the concept of writing tests
- distributed computing
- why GPU cores are important
- client-side vs server-side scripting

..and that's just a subset. If you meet a data scientist who has familiarity with those concepts, it's because they either have a CS or IT background, or they taught themselves.