TROLLing: Open data for linguists

Academic year: 2022

(1)

TROLLing: Open Data for Linguists

Laura A. Janda, UiT

with a lot of help from colleagues at the Department of Language and Linguistics and the University Library

(2)

How linguistics used to work...

(3)

Linguistics and data

Two things happened >10 years ago

•  Advent of digital corpora

–  for many languages

–  100s of millions of words

–  balanced, annotated

•  R became widely used

–  open source statistical software

(6)

The view from one journal...

(7)

[Figure: percent quantitative articles in Cognitive Linguistics, 1990-2012 (vertical axis: 0% to 90%)]

(8)

How linguistics works today...

The usual process:

•  Data is extracted from corpus or collected from experiments

•  Laborious cleaning, tagging

•  Statistical analysis

•  Publications

BUT:

•  What happens to the data after results are published?

•  Can the researcher find and interpret the data later?

•  What if someone else wants to use the data?
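One way to keep data findable and interpretable after publication is to store it together with a short codebook. The following is only a minimal sketch under stated assumptions: the function name `archive_dataset`, the file names `data.csv` and `codebook.json`, and the example columns are hypothetical and are not part of TROLLing or of any particular archive's requirements.

```python
import csv
import json
from pathlib import Path


def archive_dataset(rows, columns, out_dir):
    """Write rows to data.csv plus a codebook.json describing each column.

    `rows` is a list of dicts; `columns` maps each column name to a
    plain-language description. File names are hypothetical choices.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # The data itself, in a plain format that R and other tools can read.
    data_path = out / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)

    # A codebook, so that a later reader can interpret every column.
    codebook_path = out / "codebook.json"
    codebook = {"n_rows": len(rows), "columns": columns}
    codebook_path.write_text(
        json.dumps(codebook, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return data_path, codebook_path
```

Calling `archive_dataset(rows, {"token": "word form as found in the corpus", "tag": "manual annotation"}, "my_study")` would leave a folder holding both the data and its description, which can then be uploaded as one unit.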

(9)

What today’s linguists need

A PLACE TO PUBLICLY ARCHIVE DATA AND CODE

WHY? Because we need to:

•  Create ethical standards for sharing of data and code

•  Set norms for use of statistical methods

•  Learn from each other and help our community grow

•  Secure and maintain scientific integrity

(11)

TROLLing

•  is an international archive of linguistic data and statistical code

•  is built on the Dataverse platform from Harvard University and complies with DataCite, the international standard for storing and citing research data

•  is compliant with CLARIN, the EU research infrastructure for language-based resources

•  assigns a permanent URL to each post

•  uses metadata that ensures visibility and retrieval through international services

•  is professionally managed by the University Library of Tromsø and an international steering committee

(12)

Find TROLLing at opendata.uit.no

(13)

Getting started with TROLLing

http://site.uit.no/trolling/getting-started/

•  Promotional video

•  Instructional videos

•  User guide

•  TROLLing banner
