Sebastian Ruder
Jul 20, 2018
#Repl4NLP at #ACL2018 panel discussion:
Q: Given that the amount of data and computing power is rapidly increasing, should we just quit working on models altogether?
Yejin: Sounds like a good idea for the companies. The more data the better. Please create more data.
Meg: Different people have different strengths. People say: “We should all care about ethics”. Geek out about what you love; apply yourself to what you're good at. Lots of other things come to bear besides just working with data, e.g. sociology, psychology, maths.
Important to focus on what you really love. Work with people who have complementary and different interests.
Yoav: Personally don’t work on huge data. If some company would like to train a huge LM on the entire web, that’d be great to have and analyze.
Graham: A 20% error reduction on a task is only 20% interesting. Lots of tasks we care about don't have massive data, e.g. other languages such as Swahili.
Ryan: Creating datasets and data annotation is not sexy. Community does not value it.
Yoav: Training a language model does not require expensive data annotation.
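As a minimal illustration of Yoav's point (the toy text and variable names below are placeholders, not from the panel): a language model's training targets are just the input tokens shifted by one position, so raw text alone provides the supervision.

```python
# Illustrative sketch: LM supervision comes for free from raw text.
text = "the cat sat on the mat".split()
vocab = {w: i for i, w in enumerate(sorted(set(text)))}
ids = [vocab[w] for w in text]  # [4, 0, 3, 2, 4, 1]

inputs, targets = ids[:-1], ids[1:]  # predict each next token
print(inputs)   # [4, 0, 3, 2, 4]
print(targets)  # [0, 3, 2, 4, 1]
```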
Audience: Web is much smaller in non-English world. Data is enclosed in social networks, not easily accessible.
Audience: Models that people are developing don’t scale very well to web-scale data.
Yoav: Running a model on multiple GPUs for many weeks costs what annotating several treebanks would cost. Money seems to be available for certain tasks but not for others.
Audience: Should we keep working on models that incorporate distributional semantics given that they can only take us so far or build hybrid models that incorporate both distributional and denotational semantics?
Graham: Hard to answer. Different ACL papers isolate problems, e.g. context necessary for MT, number agreement, etc. Probing models is important. Examining models to determine if going beyond distributional semantics is necessary is important.
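A minimal sketch of the kind of probing Graham refers to (the data here is random placeholder data, so the probe should score near chance): freeze a model's representations and train a simple linear classifier on top, testing whether a property such as number agreement is linearly decodable from them.

```python
# Probing sketch: can a linear probe read a property off frozen representations?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
reps = rng.normal(size=(1000, 300))      # stand-in for frozen encoder outputs
labels = rng.integers(0, 2, size=1000)   # stand-in for e.g. agreement labels

X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")  # ~0.5 on random data
```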
Yejin: Distributional semantics leads to good performance because some contextual information is incorporated without human intuition as a bottleneck. Important to go beyond what current distributional semantics models can capture, e.g. using hierarchical or memory structures.
Audience: Going towards more human-readable representations by e.g. automatically extracting discrete formal representations (such as in Yoav's talk) might be a good direction.
Audience: How do our current efforts fare in terms of representation stability across time?
Meg: Representations do well if the future is similar to / has the same distribution as the past. They do not generalize well to different future events.
Yoav: Hard to do in end-to-end settings, but possibly easier to manipulate if we have modules, so maybe move away a bit from end-to-end models.
Audience: Should we focus more on stateful or stateless representations?
Yoav: Should focus on both.
Meg: Depends on downstream task (some need attention/information flow, others don't). A little bit of both.
Audience: Representations don't really outperform a bag of sentences; they can't really model a narrative or underlying structure.
Meg: True if only done word-by-word. Other things to be done e.g. in generative models using latent variables.
Yejin: A lot of focus on sentence-level tasks. Some challenging tasks go beyond the sentence level, e.g. NarrativeQA.
Yejin: Community pays a lot of attention to easier tasks such as SQuAD or bAbI, which are close to solved. Have to solve harder tasks and develop more datasets with increasing levels of difficulty. If a dataset is too hard, people don't work on it.
Yejin: The community should not work on datasets for too long (datasets are getting solved very fast these days), so creating more datasets is even more important.
Yoav: SQuAD is the MNIST of NLP.
Ryan: State-of-the-art LMs don't reset hidden states between sentences, so they can capture cross-sentence dependencies.
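A sketch of what Ryan describes, assuming a PyTorch-style LSTM language model (the model and dummy data below are placeholders): the recurrent state is carried across sentence boundaries, detached for truncated backpropagation, rather than reset to zero, so each sentence is conditioned on the preceding ones.

```python
# Sketch: carry LSTM state across sentence boundaries instead of resetting it.
import torch
import torch.nn as nn

vocab_size, emb_dim, hid_dim = 100, 32, 64
embed = nn.Embedding(vocab_size, emb_dim)
lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)

sentences = [torch.randint(0, vocab_size, (1, 8)) for _ in range(3)]  # dummy ids

state = None  # never reset between sentences
for sent in sentences:
    out, state = lstm(embed(sent), state)
    # Detach so gradients stop at the sentence boundary (truncated BPTT)
    # while the state itself still conditions the next sentence.
    state = tuple(s.detach() for s in state)
```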
Graham: Nice work exists on incorporating discourse structure and coreference structure. Don't be afraid to do something slightly more complicated than an LSTM.
Graham: As a reviewer, don't ask authors to evaluate on SQuAD if they propose a new discourse-driven model for QA or reasoning.
Yejin: We found discourse-level structure to be useful; it slightly outperforms LSTMs. The structure, however, mainly captured cross-sentence entity relations.
Yejin: Mainly better because LSTMs don't work very well across sentence boundaries. Will take several years to develop better models. More work on this is needed.
Q: The human brain is the only known system for understanding language. Should we be more inspired by human language learning? Human language learning is contextual; humans likely wouldn't learn language just by listening to the radio in a dark room. Can we learn language just by reading?
Ndapa: Background knowledge is useful. Collecting commonsense knowledge, though, is challenging.
Meg: We're not working with neurotransmitters. Definitely things we can learn from humans, but should not try to exactly replicate human process of language learning.
Meg: There is a history of feral children not exposed to language who fail to learn language later in life. Part of human language learning is having conversations and exchanging observations. One takeaway is that interaction with humans is important.
Audience: People who are blind and/or deaf can learn language.
Yejin: Groundedness matters a lot, but perhaps more important is ability to abstract, learn about the world, observe it. Children build a model of the world, ask questions about it and refine the model.
Yejin: Children learn from declarative explanations about how the world works. The robotics field will likely not evolve enough to really enable interaction-based language learning. The ability to abstract, e.g. remembering concepts from Wikipedia, is important and might be doable.
Ryan: Psycholinguistics research does not help to build better NLP systems, i.e. does not enable practical models. Some papers can still inspire, though.
Yejin: Do not need to confine ourselves to psycholinguistics but can look for broad inspirations.
Graham: Going back to the previous answer: if we can prove that we are doing something suboptimally, then psycholinguistics might help.
Audience: Should some of our systems focus on more constrained worlds, e.g. BlocksWorld so that we can focus on certain aspects, e.g. reasoning?
Yejin: Already do that to some extent, e.g. VQA. Often deeper understanding of world and language is required. Integration between language and vision is still very shallow. VQA is not solved at all. Really difficult to solve it by training on VQA data.
Yejin: Possibly better to transfer knowledge from elsewhere. Also relevant for NLP QA tasks, which require understanding of how the world works.
Meg: Isolating particular kinds of instances is useful. Having very clear dependent and independent variables is useful.
Meg: Gains on constrained tasks are often confused with gains on the general task. That's where messier, more general tasks come in.
Audience: Allow scope for experimentation with simple, formulaic language, not just open-domain language.
Audience: Is there something fundamentally wrong with our current architectures, the hacks we're using?
Yoav: They're suboptimal. They're simple and silly. In 10 years, we'll have different models.
Yejin: Same problem in computer vision. Convnets work very well.
Yejin: Convnets can't parse or understand a whole image. Action recognition is still very poor. NLP tasks that work very well, such as MT, are perceptual tasks, similar to image recognition in CV. Impressive results are primarily for perceptual tasks.
Yejin: We are struggling with tasks that actually require understanding and reasoning.
Audience: We're doing really well on current metrics. It's been a while since the last winter/bust. People tend to forget that there are booms and busts. This gives rise to unrealistic expectations.
Audience: Are we actually able to deliver? We should better describe what we don't do.
Graham: A lot of papers examine this. Hype is mostly generated by corporate press releases.
Yoav: Not the right crowd to complain to about hype.
Ryan: RNNs are still very recent models. It might take a few more years.
Audience: Will just building better models, building empirical models without theory enable language understanding?
Ndapa: People also appreciate empirical work.
Meg: Interesting when theory can drive empirical work. More need for theory.
Yejin (being controversial): Academia will survive even in an empirical world. Creative work will always be there and being creative does not require a lot of GPUs.
Graphical models had a lot of theory.
Yejin: They did not deliver as much. Curious why in some cases a lot of theory does not deliver. Inference in such methods is often hard or intractable, requires approximations, and things get lost. In the Bayesian era, only good universities could provide the required rigorous education.
Yejin: Current era democratizes. Writing insightful papers or blogs is really useful to the community.
Audience: Recent paper criticizes ML scholarship.
Yoav: Empirical work can be good or bad. Not many interesting papers in NLP make use of massive amounts of compute.
Graham: Lots of really good papers that use a lot of compute and lots of good papers that don't use a lot of compute. Less necessary to convince people that NNs can work with a lot of compute these days.
Jason Eisner: Parameters drive out theory. If you want more theory, reduce the number of parameters.
Fin.

