Inspired by the big ol' long list of deep learning models I saw this morning, and @SpaceWhaleRider's love of science-y A-Z lists, I've decided to create an A to Z series of tweets on popular #MachineLearning and #DeepLearning methods / algorithms.
Ready? Here we go:
A is for... the Apriori Algorithm!
Designed to mine frequent itemsets for Boolean association rules (think market basket analysis). Ex: if someone generally purchases the same products as you, then you'd probably purchase something else they've purchased.
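Not from the original tweet, but here's a minimal Python sketch using the third-party mlxtend library (my choice of implementation) on a made-up toy basket dataset:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Toy market baskets: each inner list is one customer's purchase.
baskets = [["milk", "bread"], ["milk", "diapers", "beer"],
           ["milk", "bread", "diapers"], ["bread", "diapers"]]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(baskets).transform(baskets), columns=te.columns_)

# Frequent itemsets in >= 50% of baskets, then rules mined from them.
itemsets = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "confidence"]])
```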
B is for... Bagging (Bootstrap Aggregating)!
This is an ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. Reduces variance, helps to avoid overfitting.
Example: Random Forests.
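A quick scikit-learn sketch (my addition, toy data) showing plain bagging of decision trees next to a Random Forest:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: fit many trees, each on a bootstrap resample; average their votes.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
bag.fit(X, y)

# A Random Forest is bagging plus random feature subsets at each split.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(bag.score(X, y), rf.score(X, y))
```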
C is for... Convolutional Neural Networks (ConvNet, CNN)!
Feed-forward artificial neural networks used for analyzing visual imagery. They use a variation of multilayer perceptrons, and require minimal preprocessing in comparison to other image classification algorithms.
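A tiny ConvNet sketch in Keras (my addition; the layer sizes are arbitrary, just to show the stacked conv/pool structure):

```python
import tensorflow as tf

# A toy ConvNet for 28x28 grayscale images (MNIST-sized input).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),  # learned filters
    tf.keras.layers.MaxPooling2D(),                                # downsample
    tf.keras.layers.Conv2D(64, kernel_size=3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),               # 10 classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```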
D is for... Decision Trees!
Decision support tools that use a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. C5.0 is a common standard.
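Quick scikit-learn sketch (note: scikit-learn implements CART rather than C5.0, but the tree-of-decisions idea is the same):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the learned if/else structure -- the tree-like graph of decisions.
print(export_text(tree))
```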
E is for... Elastic Net!
Regularized regression method that combines the L1 and L2 penalties of the lasso and ridge methods. Can be reduced to the linear support vector machine (2014). Full disclosure: I haven't really played with this much.
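A tiny scikit-learn sketch (toy data of my own; l1_ratio is the knob that blends the lasso and ridge penalties):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] * 3.0 + rng.normal(scale=0.1, size=100)  # only feature 0 matters

# l1_ratio=1.0 is pure lasso (L1), 0.0 is pure ridge (L2); 0.5 blends them.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)  # the L1 part pushes irrelevant coefficients toward zero
```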
H is for... Hierarchical Clustering!
This is a method of cluster analysis that builds a hierarchy of clusters -- a multilevel hierarchy, visualized as a cluster tree (dendrogram). I've only done this in R.
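If you'd rather try it in Python than R, here's a sketch with SciPy (toy blobs of my own making):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Two made-up blobs of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(3, 0.3, (10, 2))])

# Agglomerative (bottom-up) linkage builds the multilevel hierarchy;
# dendrogram() draws the cluster tree.
Z = linkage(X, method="ward")
dendrogram(Z)
plt.show()
```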
I is for... Inception! (Which should probably be GoogLeNet, but G was taken.)
Revolutionary because it showed CNNs don't have to be stacked sequentially; you can be creative with your structures, which improves performance and computational efficiency.
J is for... the Johnson-Lindenstrauss bound for embedding with random projections!
This states that any high-dimensional dataset can be randomly projected into a lower-dimensional space, while still controlling the distortion. (Dimensionality reduction.)
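scikit-learn ships tools for exactly this; a small sketch (the eps distortion values are just illustrative):

```python
import numpy as np
from sklearn.random_projection import (
    GaussianRandomProjection, johnson_lindenstrauss_min_dim)

# How many dimensions keep pairwise distances within 10% for 10,000 points?
print(johnson_lindenstrauss_min_dim(n_samples=10_000, eps=0.1))

# Randomly project 5,000-dimensional data down, with bounded distortion.
X = np.random.default_rng(0).normal(size=(100, 5_000))
X_low = GaussianRandomProjection(eps=0.5, random_state=0).fit_transform(X)
print(X_low.shape)  # far fewer than 5,000 columns
```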
K is for... K-Means Clustering!
The name's a bit explanatory: you attempt to partition observations into k clusters, assigning each observation to the cluster with the nearest mean (centroid). End result looks like a Voronoi diagram (if you've seen those).
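Sketch with scikit-learn (toy blobs; n_clusters=4 is arbitrary):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # the 4 centroids
print(km.predict(X[:5]))    # nearest-centroid assignment per point
```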
L is for... Linear Regression!
Which, I kid you not, is a line of best fit given a set of observations. If you've been fitting lines on data in Excel and seeing the R-squared values returned, you've technically been doing machine learning. Congrats! 😊
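The same thing Excel does, in a few lines of NumPy (made-up points):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

slope, intercept = np.polyfit(x, y, deg=1)  # the line of best fit
y_hat = slope * x + intercept
r_squared = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(slope, intercept, r_squared)          # the same R-squared Excel reports
```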
M is for... Mask R-CNN!
This algorithm does image segmentation (picking out regions of interest in images, like different human beings or items), and is built on top of Faster R-CNN (an evolution of the "F" example we saw earlier).
N is for... Naive Bayes!
NB methods are a set of supervised learning algorithms based on applying Bayes' theorem with the "naive" assumption of independence between every pair of features. Scikit-learn has a whole family of 'em!
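A minimal example with one family member, GaussianNB (toy iris split; MultinomialNB and BernoulliNB are the siblings):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bayes' theorem plus the "naive" per-feature independence assumption.
clf = GaussianNB().fit(X_train, y_train)
print(clf.score(X_test, y_test))
```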
O is for... Out-of-core classification of text documents!
Full disclosure: this is something I haven't played with, but now I kinda want to. With OOCC, you can learn from data that doesn't fit into main memory. Love finding new @scikit_learn tools!
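A hedged sketch of the pattern (the chunk generator is a stand-in for reading data off disk; HashingVectorizer is stateless, which is what makes the streaming work):

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vec = HashingVectorizer(n_features=2**18)  # no fit needed, so no state in RAM
clf = SGDClassifier()

def stream_chunks():
    # Stand-in for (texts, labels) chunks too big to fit in main memory.
    yield ["spammy offer now", "meeting at noon"], [1, 0]
    yield ["free money!!!", "project update attached"], [1, 0]

for texts, labels in stream_chunks():
    clf.partial_fit(vec.transform(texts), labels, classes=[0, 1])

print(clf.predict(vec.transform(["free offer"])))
```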
P is for... Principal Component Analysis (PCA)!
This method uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
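Sketch with scikit-learn (I fabricate a correlated column just so PCA has something to decorrelate):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = X[:, 0] * 2 + rng.normal(scale=0.1, size=200)  # correlated columns

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)  # variance captured per component
X_2d = pca.transform(X)               # the (uncorrelated) principal components
```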
Q is for... Quadratic Discriminant Analysis (QDA)!
This is a classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data & using Bayes' rule. The model fits a Gaussian density to each class. Compare with Linear Discriminant Analysis (LDA).
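A side-by-side sketch with scikit-learn (toy data, training accuracy only, just to show the two APIs):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

X, y = make_classification(n_samples=300, random_state=0)

# LDA assumes classes share one covariance (linear boundary);
# QDA fits a separate Gaussian per class (quadratic boundary).
print(LinearDiscriminantAnalysis().fit(X, y).score(X, y))
print(QuadraticDiscriminantAnalysis().fit(X, y).score(X, y))
```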
R is for... ResNet!
152-layer deep convolutional neural network architecture that set new records in classification, detection, and localization - and has a 3.57% top-5 error rate. Good going, @MSFTResearch Asia!
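The trick that makes 152 layers trainable is the residual (skip) connection; here's a toy Keras block of my own making, not the paper's exact architecture:

```python
import tensorflow as tf

def residual_block(x, filters):
    # Layers learn a residual F(x); the skip connection adds x back in,
    # so gradients flow even through a very deep stack.
    shortcut = x
    x = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv2D(filters, 3, padding="same")(x)
    x = tf.keras.layers.Add()([x, shortcut])
    return tf.keras.layers.ReLU()(x)

inp = tf.keras.Input(shape=(32, 32, 64))
out = residual_block(inp, filters=64)
tf.keras.Model(inp, out).summary()
```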
S is for... Shake-Shake regularization!
Motivation: resolve overfitting. Applied to 3-branch residual networks, shake-shake regularization improves on the best single-shot published results on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% & 15.85%.
T is for... the Transformer!
Performs a small, constant number of steps (chosen empirically) for machine translation. Applies a self-attention mechanism which models relationships between all words in a sentence, regardless of their respective positions.
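Here's scaled dot-product self-attention in bare NumPy (my simplified single-head version; real Transformers add multiple heads, masking, and positional encodings):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X: (n_words, d). Every word attends to every word,
    # regardless of position, in one constant-depth step.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # word-to-word affinities
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # weighted blend of values

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))                      # a 6-word "sentence"
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)        # (6, 16)
```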
U is for... Uh. *cough* sUpport Vector Machines? *cough*
Set of supervised learning methods used for classification, regression, and outlier detection. Effective in high-dimensional spaces and quite versatile - though they've lost some popularity recently.
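Sketch with scikit-learn's SVC (toy data; the kernel parameter is where the versatility lives):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An RBF-kernel SVM; swap in kernel="linear" or "poly" for other boundaries.
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```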
V is for... VGG Net!
Reinforced the notion that convolutional neural networks have to have a deep network of layers in order for the hierarchical representation of visual data to work. 7.3% error rate in 2014, which isn't bad!
W is for... the Wake-Sleep algorithm!
An unsupervised learning algorithm for stochastic multilayer neural networks. Adjusts parameters to produce a good density estimator. There are two learning phases, the "wake" phase and the "sleep" phase, which are performed alternately.
X is for... eXpectation-Maximization (EM)!
Iterative method to find maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables.
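scikit-learn's GaussianMixture runs EM under the hood; a sketch with a made-up two-component mixture (the latent variable is which component generated each point):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fake data from two Gaussians; which one produced each point is unobserved.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)]).reshape(-1, 1)

# fit() alternates the E-step (soft-assign points to components) and the
# M-step (re-estimate each component's mean, variance, and weight).
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gm.means_.ravel(), gm.weights_)  # should recover means near 0 and 5
```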
Z is for... ZF Net!
Old school (2013); trained for 12 days on 1.3M images; achieved ~11% error rate. Similar architecture to AlexNet (which trained on 15M images). Used ReLUs for activation functions, cross-entropy loss for error function, & trained using batch stochastic gradient descent.
Keep an eye on this thread! There are so many #MachineLearning & #DeepLearning models I haven't added yet, and so many waiting to be created. Will also add links to papers as they're placed on @arxiv_org and research blogs.
Learn like you'll live forever!
• • •
📓 Am rereading my class notes from grad school, as well as from mentoring students for @Coursera and @EdX courses on statistics - and thought I'd share the most common mistakes when doing data analysis.
✨Have counted 8 of 'em, with examples - please feel free to add your own!
MISTAKE #1:
Garbage in, garbage out.
🤦‍♀️Failing to investigate your input for data entry or recording errors.
📊Failing to graph data and calculate basic descriptive statistics (mean, median, mode, outliers, etc.) before analyzing it in-depth.
👉EXAMPLE #1:
It's easy to make bad decisions on shoddy input! Here you see an outlier's impact on descriptive statistics.
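A quick NumPy illustration of the same idea (numbers invented; note how one mis-keyed entry wrecks the mean but barely moves the median):

```python
import numpy as np

ages = np.array([23, 25, 26, 27, 28, 29, 31])
with_typo = np.append(ages, 270)  # "27" mis-keyed as "270"

# One bad entry drags the mean far more than the median.
print(np.mean(ages), np.median(ages))            # 27.0, 27.0
print(np.mean(with_typo), np.median(with_typo))  # ~57.4, 27.5
```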
Also: always consider the uncertainty in your measuring instruments. Just because you've gotten an *accurate* value doesn't mean it's *actually* correct.
🗣Some recommendations for budding machine learning engineers:
(1) Make sure your sample dataset is representative of your entire population - and remember that more data is usually - but not necessarily! - better.
Also consider using image preprocessing tools, like Augmentor.
(2) Use small, random batches to train rather than the entire dataset.
⏳Reducing your batch size increases training time; but it also decreases the likelihood that your optimizer will settle into a local minimum instead of finding the global minimum (or something closer to it).
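The mini-batch pattern in plain NumPy (a sketch only; the actual optimizer step is elided):

```python
import numpy as np

def minibatches(X, y, batch_size, rng):
    # Shuffle once per epoch, then yield small random batches.
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1_000, 5)), rng.integers(0, 2, 1_000)
for epoch in range(3):
    for X_b, y_b in minibatches(X, y, batch_size=32, rng=rng):
        pass  # one optimizer update per batch would go here
```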
(3) Make sure the data that you're using is standardized (compute the mean and standard deviation on the training data, then use those same values to standardize the test data, so the two match). 📊
If you're using @TensorFlow, standardization can be accomplished with something like tf.nn.moments and tf.nn.batch_normalization.
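For example (a toy tensor of mine; tf.nn.moments gives per-feature mean/variance, and tf.nn.batch_normalization applies the standardization):

```python
import tensorflow as tf

# Fake feature matrix: 128 examples, 10 features, far from standardized.
x = tf.random.normal((128, 10), mean=5.0, stddev=3.0)

mean, variance = tf.nn.moments(x, axes=[0])  # per-feature statistics
x_std = tf.nn.batch_normalization(
    x, mean, variance, offset=None, scale=None, variance_epsilon=1e-8)
print(tf.reduce_mean(x_std).numpy())         # ~0 after standardizing
```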
This #nbextension uses a @CodeMirror overlay mode to highlight incorrectly-spelled words in Markdown and Raw cells. The typo.js library does the actual spellchecking, and is included as a dependency.
This extension adds codefolding functionality from @CodeMirror to each code cell in your notebook. The folding status is saved in the cell metadata, so reloading a notebook restores the folded view.
This extension displays when the last execution of a code cell occurred and how long it took. The timing information is stored in the cell metadata, restored on notebook load, & can be toggled on/off.
So, time to drop some knowledge bombs. Most data scientists aren't taught:
- TCP/IP Protocol architectures
- how to deploy a server
- RESTful vs SOAP web services
- Linux command line tools
- the software development life cycle
- modular functions + the concept of writing tests
- distributed computing
- why GPU cores are important
- client-side vs server-side scripting
...and that's just a subset. If you meet a data scientist who has familiarity with those concepts, it's because they either have a CS or IT background, or they taught themselves.
So be thankful if folks are following along! 😀
And be mindful that sometimes more detailed, patient, lower-level explanations are necessary - especially when writing docs.
The R community is fantastic at this: see, for example, @hadleywickham's httr vignette.