All-star panel at the generalization in deep learning workshop at @NAACLHLT #Deepgen2018: "We should have more inductive biases. We are clueless about how to add inductive biases, so we do dataset augmentation and create pseudo training data to encode those biases. Seems like a strange way to go about doing things."
Yejin Choi: Language specific inductive bias is necessary to push NLG. Inductive bias as architectural choices. Current biases are not good at going beyond the sentence-level but language is about more than a sentence. We require building a world model.
A promising direction in this line: Memory networks, entity networks.
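To make the architectural-bias idea concrete, here is a minimal numpy sketch of the gated slot update from Recurrent Entity Networks (Henaff et al., 2017). The shapes, initialization, and function names are illustrative assumptions, not anything stated by the panel:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def entity_memory_update(keys, memories, sentence, U, V, W):
    """One EntNet-style step: each memory slot is gated by its similarity
    to the current sentence encoding and nudged toward new content."""
    updated = []
    for key, mem in zip(keys, memories):
        gate = sigmoid(sentence @ key + sentence @ mem)         # is this entity mentioned?
        candidate = np.tanh(U @ mem + V @ key + W @ sentence)   # proposed new content
        mem = mem + gate * candidate                            # gated write
        updated.append(mem / (np.linalg.norm(mem) + 1e-8))      # keep slots bounded
    return updated

# Toy usage: 3 entity slots of dimension 10 with random projections (illustrative only).
d, n_slots = 10, 3
rng = np.random.default_rng(0)
keys = [rng.standard_normal(d) for _ in range(n_slots)]
memories = [k.copy() for k in keys]
U, V, W = (rng.standard_normal((d, d)) for _ in range(3))
memories = entity_memory_update(keys, memories, rng.standard_normal(d), U, V, W)
```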
Chris: Inductive bias is not language-specific; entities need to be tracked e.g. by animals for perception. Early work in recursive neural networks can also be applied to vision.
Yejin: World models applicable to other parts, but for the moment, developing models for language (vs. language + perception, etc.) is most feasible.
Sam: Not aware of any recent work that uses a more linguistically oriented inductive bias that works in practice on a benchmark task.
Percy: Cycle of new paradigms, which throw out previous extensions and start from scratch (rules --> DL).
Q: Is there an endless cycle of collecting data, finding biases, trying to address these biases?
Sam: Building datasets to make vague intuitive definitions more concrete.
Hard to define tasks precisely enough to break out of cycle.
Yejin: Revisit how dataset is constructed (balance, counteract biases). Come up with algorithms that generate datasets.
Chris: Problem is lack of education. In psychology, students spend a lot of time working on experimental design.
A standard CS degree does not have a single lecture on how to design an experiment. In grad school, the vast majority of PhD students have to run experiments without ever having been taught how to do this. The resulting problems are unsurprising.
Yejin: Not clear whether to maintain the natural distribution of the data or to balance and modify it. Important to run baselines and analyze the biases of the data, and to modify the data distribution if it is too easy.
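One common way to "run baselines and analyze biases" is a partial-input baseline: if a classifier that sees only the hypothesis of an NLI pair beats chance by a wide margin, the dataset contains annotation artifacts. A hedged scikit-learn sketch; the file name and column names are hypothetical:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical file/columns: an NLI training set with 'hypothesis' and 'label'.
df = pd.read_csv("nli_train.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["hypothesis"], df["label"], test_size=0.2, random_state=0)

vec = CountVectorizer(ngram_range=(1, 2))     # bag of uni/bigrams; the premise is never seen
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(X_train), y_train)

acc = accuracy_score(y_test, clf.predict(vec.transform(X_test)))
print(f"hypothesis-only accuracy: {acc:.3f}")  # well above 1/3 => artifacts in the data
```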
Percy: 6 months ago really worried about bias. Not worried about it anymore.
Hard to rule out bias in general. With 100% accuracy, you would get everything correct anyway. Better to improve models than to eradicate bias. ML is used when we don't know what's going on (vs. rules).
Sam: Cannot break out of cycle of experimental design (RQ --> experiments --> results). We can do better but cannot break out of cycle.
Devi: Synthetic, complex datasets are also useful to build models that do certain things (e.g. CLEVR).
Chris: Psychology is too obsessed with controlled experiments vs. more natural data in ML/NLP.
Indigenous NLP tradition has been replaced by the ML tradition over the last decade. ML people require i.i.d. data. We should not test on i.i.d. data, but on data from a different distribution.
The requirement of i.i.d. data came through ML empiricism. Linguistic data is not i.i.d.: any text is associated with other pieces of text, metadata, etc. Better to have models that generalize to data that is not i.i.d. with the training data.
Important to return to indigenous NLP tradition and ignore some things that ML has brought into NLP.
Q: Other ways to induce inductive bias besides architecture/data?
Yejin: Evaluation metrics like BLEU/ROUGE are not that meaningful; important to do more human evaluations.
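To make the concern concrete, a simplified ROUGE-1 F1 (whitespace tokens, no stemming, so not the official scorer) shows how easily unigram overlap rewards degenerate output:

```python
from collections import Counter

def rouge_1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: unigram overlap on whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "the cat sat on the mat"
print(rouge_1_f1("the cat sat on the mat", reference))  # 1.0, as expected
print(rouge_1_f1("mat the on sat cat the", reference))  # also 1.0: scrambled word salad scores perfectly
```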
Sam: On SQuAD, performance drops even when students ask factoid questions (vs. the MTurk workers who wrote the test data).
Chris: Humans are not good at constructing auxiliary tasks.
Percy: A summarization model trained with a human in the loop on lots of examples would reach human performance.
Still pretty far from language understanding.
Q: Will pre-trained models be used in all NLP tasks in the coming years?
Percy: Room for pre-trained representations for some tasks; for most tasks, we will need to go beyond that.
Q: Should people release more challenge datasets?
Percy: Numbers will look really low. Possibly tag examples with phenomenon/difficulty. Problem is that we don't have challenging datasets that are big enough (for training).
Devi: Don't want to lose easy knowledge. E.g. for VQA, not clear what problems subsume each other.
Q: Possible to have declarative representations to build NLU upon?
Yejin: Possible. Shouldn't repeat what people have tried with symbolic logic. Model could encode natural language in knowledge representation.
Percy: Knowledge representations are quite different from current, task-oriented mentality. Worth investigating.
Q: For MultiNLI, there is a small gap between in-domain and out-of-domain performance. Do we learn more about the way annotators generate training examples than about natural language phenomena?
Sam: Brittleness of models is not with regard to overfitting to genre.
Q: How do we evaluate the abstractiveness of NLG systems?
Yejin: Good question. Might want to measure whether the summary compresses the text well through rewriting vs. substitution.
Sam: Abstractiveness is used as measure in new Newsroom corpus (NAACL-HLT 2018).
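A crude proxy for abstractiveness is the fraction of summary n-grams that never appear in the source. This is a hedged illustration, not the Newsroom paper's extractive coverage/density metrics:

```python
def novel_ngram_fraction(summary, source, n=2):
    """Fraction of summary n-grams absent from the source (0.0 = purely extractive)."""
    def ngrams(text):
        toks = text.lower().split()
        return set(zip(*[toks[i:] for i in range(n)]))
    summ_ngrams = ngrams(summary)
    if not summ_ngrams:
        return 0.0
    return len(summ_ngrams - ngrams(source)) / len(summ_ngrams)

source = "the committee met on tuesday and approved the new budget proposal"
print(novel_ngram_fraction("committee approves new budget", source))   # mostly novel -> abstractive
print(novel_ngram_fraction("the committee met on tuesday", source))    # all copied -> extractive
```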
Fin.
David Silver on Principles for Reinforcement Learning at #DLIndaba2018. Important principles that are not only applicable to RL, but to ML research in general, e.g. leaderboard-driven research vs. hypothesis-driven research (see the slides below).
Principle 2. How an algorithm scales is more important than its starting point. Avoid performance ceilings. Deep Learning is successful because it scales so effectively.
Principles are meant to be controversial. I would argue that sample efficiency is at least as important.
Principle 3. Generality (how your algorithm performs on other tasks) is super important. Key is to design a diverse set of challenging tasks.
This. We should evaluate on out-of-distribution data and on new tasks.
#Repl4NLP at #ACL2018 panel discussion:
Q: Given that the amount of data and computing power is rapidly increasing, should we just quit working on models altogether?
Yejin: Sounds like a good idea for the companies. The more data the better. Please create more data.
Meg: Different people have different strengths. People say: “We should all care about ethics”. Geek out about what you love. Apply yourself to what you love. Lots of other things come to bear besides just working with data, e.g. sociology, psychology, maths, etc.
Important to focus on what you really love. Work with people who have complementary and different interests.
Yoav: Personally don’t work on huge data. If some company would like to train a huge LM on the entire web, that’d be great to have and analyze.
1/ People (mostly people working in Computer Vision) say that CV is ahead of other ML application domains by at least 6 months to a year. I would like to explore why this is, whether it is something to be concerned about, and what it might take to catch up.
2/ I can’t speak about other application areas, so I will mostly compare CV vs. NLP. This is just a braindump, so feel free to criticize, correct, and disagree.
3/ First, is that really true? For many specialized applications where task- or domain-specific tools are required, such as core NLP tasks (parsing, POS tagging, NER), comparing to another discipline is not meaningful.