Sebastian Ruder
Jul 20, 2018 · 39 tweets
#Repl4NLP at #ACL2018 panel discussion:
Q: Given that the amount of data and computing power is rapidly increasing, should we just quit working on models altogether?
Yejin: Sounds like a good idea for the companies. The more data the better. Please create more data.
Meg: Different people have different strengths. People say: “We should all care about ethics”. Geek out about what you love. Apply yourself to what you love. Lots of other things come to bear besides just working with data, e.g. sociology, psychology, maths, etc.
Important to focus on what you really love. Work with people that have complementary and different interests.
Yoav: Personally don’t work on huge data. If some company would like to train a huge LM on the entire web, that’d be great to have and analyze.
Graham: A 20% error reduction on a task is only 20% interesting. Lots of tasks that we care about don't have massive data, such as other languages, e.g. Swahili.
Ryan: Creating datasets and data annotation is not sexy. Community does not value it.
Yoav: Training a language model does not require expensive data annotation.
Audience: Web is much smaller in non-English world. Data is enclosed in social networks, not easily accessible.
Audience: Models that people are developing don’t scale very well to web-scale data.
Yoav: Running a model on multiple GPUs for many weeks costs what annotating several treebanks would cost. Money seems to be available for certain tasks but not for others.
Audience: Should we keep working on models that incorporate distributional semantics given that they can only take us so far or build hybrid models that incorporate both distributional and denotational semantics?
Graham: Hard to answer. Different ACL papers isolate problems, e.g. context necessary for MT, number agreement, etc. Probing models is important. Examining models to determine if going beyond distributional semantics is necessary is important.
Yejin: Distributional semantics leads to good performance because some contextual information is incorporated without human intuition as bottleneck. Important to go beyond what current distributional semantics models can capture using e.g. hierarchical, memory structures.
Audience: Going towards more human-readable representations by e.g. automatically extracting discrete formal representations (such as in Yoav's talk) might be a good direction.
Audience: How do our current efforts fare in terms of representation stability across time?
Meg: Representations do well if the future is similar to / has the same distribution as the past. They do not generalize well to different future events.
Yoav: Hard to do in end-to-end settings, but possibly easier to manipulate if we have modules, so maybe move away a bit from end-to-end models.
Audience: Should we focus more on stateful or stateless representations?
Yoav: Should focus on both.
Meg: Depends on downstream task (some need attention/information flow, others don't). A little bit of both.
Audience: Representations don't really outperform a bag of sentences; they can't really model a narrative or underlying structure.
Meg: True if only done word-by-word. Other things to be done e.g. in generative models using latent variables.
Yejin: A lot of focus on sentence-level tasks. Some challenging tasks go beyond the sentence level, e.g. NarrativeQA.
Yejin: Community pays a lot of attention to easier tasks such as SQuAD or bAbI, which are close to solved. Have to solve harder tasks and develop more datasets with increasing levels of difficulty. If a dataset is too hard, people don't work on it.
Yejin: The community should not work on datasets for too long (datasets are getting solved very fast these days), so creating more datasets is even more important.
Yoav: SQuAD is the MNIST of NLP.
Ryan: State-of-the-art LMs don't reset hidden states between sentences, so they can incorporate cross-sentence dependencies.
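(An illustrative aside, not from the panel: a minimal PyTorch sketch of what "not resetting hidden states between sentences" could look like for an LSTM LM. The model sizes and token ids are arbitrary assumptions.)

```python
import torch
import torch.nn as nn

# Hypothetical toy setup (not from the thread): a small LSTM language model.
embed = nn.Embedding(num_embeddings=1000, embedding_dim=32)
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)

hidden = None  # carried across sentences; setting it back to None would reset it
for sentence in [[5, 7, 9], [3, 11, 2, 8]]:      # toy token-id sequences
    tokens = torch.tensor([sentence])            # shape: (1, sentence_length)
    outputs, hidden = lstm(embed(tokens), hidden)
    # `hidden` now summarizes everything read so far, including earlier sentences,
    # so the next sentence is modeled conditioned on its predecessors.
    hidden = tuple(h.detach() for h in hidden)   # truncate backprop at the boundary
```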
Graham: Nice work on incorporating discourse structure and on incorporating coref structures. Don't be afraid to do something slightly more complicated than an LSTM.
Graham: As a reviewer, don't ask authors to evaluate on SQuAD if they propose a new discourse-driven model for QA or reasoning.
Yejin: We found discourse-level structure to be useful; it slightly outperforms LSTMs. The structure, however, mainly captured cross-sentence entity relations.
Yejin: Mainly better because LSTMs don't work very well across sentence boundaries. Will take several years to develop better models. More work on this is needed.
Q: The human brain is the only known system for understanding language. Should we be more inspired by human language learning? Human language learning is contextual; humans likely wouldn't learn language just by listening to the radio in a dark room. Can we learn language just by reading?
Ndapa: Background knowledge is useful. Collecting commonsense knowledge, though, is challenging.
Meg: We're not working with neurotransmitters. Definitely things we can learn from humans, but should not try to exactly replicate human process of language learning.
Meg: History of feral children not exposed to language who fail to learn language later on in life. Part of human language learning is having conversations and exchanging observations. One takeaway: getting into interactions and interacting with humans is important.
Audience: People who are blind and/or deaf can learn language.
Yejin: Groundedness matters a lot, but perhaps more important is ability to abstract, learn about the world, observe it. Children build a model of the world, ask questions about it and refine the model.
Yejin: Children learn from declarative explanations about how the world works. The robotics field will likely not evolve enough to really enable interactions for language learning. The ability to abstract, e.g. remembering concepts about Wikipedia, is important and might be doable.
Ryan: Psycholinguistics research does not help to build better NLP systems, i.e. does not enable practical models. Some papers can still inspire, though.
Yejin: Do not need to confine ourselves to psycholinguistics but can look for broad inspirations.
Graham: Going back to the previous answer, if we can prove that we are doing something suboptimally, then psycholinguistics might help.
Audience: Should some of our systems focus on more constrained worlds, e.g. BlocksWorld so that we can focus on certain aspects, e.g. reasoning?
Yejin: Already do that to some extent, e.g. VQA. Often deeper understanding of world and language is required. Integration between language and vision is still very shallow. VQA is not solved at all. Really difficult to solve it by training on VQA data.
Yejin: Possibly better to transfer knowledge from elsewhere. Also relevant for NLP QA tasks, which require understanding of how the world works.
Meg: Isolating particular kinds of instances is useful. Having very clear dependent and independent variables is useful.
Meg: Gains on constrained tasks are often confused with gains on the general task. That's where messier, more general tasks come in.
Audience: Allow scope for experimentation with simple, formulaic language, not just open-domain language.
Audience: Is there something fundamentally wrong with our current architectures, the hacks we're using?
Yoav: They're suboptimal. They're simple and silly. In 10 years, we'll have different models.
Yejin: Same problem in computer vision. Convnets work very well.
Yejin: Can't parse the entire image or understand the whole image. Action recognition is still very poor. The NLP tasks that work very well, such as MT, are perceptual tasks, similar to image recognition in CV. Impressive results primarily for perceptual tasks.
Yejin: We are struggling with tasks that actually require understanding and reasoning.
Audience: Doing really well on current metrics. A while since last winter/bust. People tend to forget that there are booms and busts. Gives rise to unrealistic expectations.
Audience: Are we actually able to deliver? We should better describe what we don't do.
Graham: A lot of papers examine this. Hype is mostly generated by corporate press releases.
Yoav: Not the right crowd to complain to about hype.
Ryan: RNNs are still very recent models. It might take a few more years.
Audience: Will just building better models, building empirical models without theory enable language understanding?
Ndapa: People also appreciate empirical work.
Meg: Interesting when theory can drive empirical work. More need for theory.
Yejin (being controversial): Academia will survive even in an empirical world. Creative work will always be there and being creative does not require a lot of GPUs.
Graphical models had a lot of theory.
Yejin: Did not deliver as much. Curious why in some cases a lot of theory does not deliver. Inference in such methods is often hard or intractable, requires approximations, and things get lost. In the Bayesian era, only good universities could give the required rigorous education.
Yejin: Current era democratizes. Writing insightful papers or blogs is really useful to the community.
Audience: Recent paper criticizes ML scholarship.
Yoav: Empirical work can be good or bad. Not many interesting papers in NLP make use of massive amounts of compute.
Graham: Lots of really good papers that use a lot of compute and lots of good papers that don't use a lot of compute. Less necessary to convince people that NNs can work with a lot of compute these days.
Jason Eisner: Parameters drive out theory. If you want more theory, reduce the number of parameters.
Fin.

