On Learning and Applying Text Representations

Chu, Zewei

doi:10.6082/uchicago.2666

Published August 2020 | Version v1

Dissertation Open

Chu, Zewei¹

1. University of Chicago

Contributors

Advisors:

Committee member:

Maire, Michael

Unsupervised learning of text representations aims to convert natural language into vector representations. These vector representations are used in larger models, such as neural networks, to improve performance on supervised tasks. This line of work includes Word2Vec, Skip-thought, ELMo, BERT, and improved BERT variants such as RoBERTa and ALBERT. To evaluate the effectiveness of these unsupervised text representations, researchers have created suites of natural language processing tasks, including SentEval and GLUE. These suites aim to evaluate how well text representations improve a variety of NLP tasks, including text classification, semantic relatedness and similarity, question answering, and sequence labeling. This thesis discusses our work on both sides: we develop methods to train better language representations, and we develop better NLP task suites to evaluate them. Most of our pretrained unsupervised models use free text resources available online as training data. We use texts and their categories to improve text classification. We use Wikipedia category hierarchies to improve natural language inference. We use Wikipedia document structure to learn sentence representations with discourse information. We also use the hyperlink structure of Wikipedia to learn entity representations. Along with this work, we propose a variety of test suites with standardized tasks to evaluate text representations in these aspects.
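As a rough illustration of what "vector representations" and "semantic relatedness and similarity" mean in practice, the following minimal sketch averages word vectors into a sentence vector and compares sentences by cosine similarity. It is not the method developed in the thesis: the three-dimensional vectors in `TOY_VECTORS` are invented for this example, and the helper names `sentence_vector` and `cosine_similarity` are hypothetical. Real systems would use vectors learned from large text corpora.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# Learned representations (e.g. from Word2Vec) have hundreds of dimensions.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "sat": [0.1, 0.9, 0.0],
    "ran": [0.2, 0.8, 0.1],
    "stocks": [0.0, 0.1, 0.9],
    "fell": [0.1, 0.2, 0.8],
}


def sentence_vector(sentence):
    """Average the vectors of the known words; unknown words are skipped."""
    vectors = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    if not vectors:
        raise ValueError("no known words in sentence")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Sentences about similar topics should score higher than unrelated ones.
related = cosine_similarity(sentence_vector("cat sat"), sentence_vector("dog ran"))
unrelated = cosine_similarity(sentence_vector("cat sat"), sentence_vector("stocks fell"))
print(related > unrelated)
```

Averaging is only one simple way to build a sentence vector from word vectors; it is used here because it needs no training and keeps the example self-contained.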

Files

Name	Size
Chu_uchicago_0330D_15471.pdf md5:e19dd947022c2b71fd05d73125f6767d	817.2 kB

Additional details

Other: oai:uchicago.tind.io:2666

Division(s): Physical Sciences Division
Department(s): Computer Science
