Blog

PINC presentation at the seminar of the Linguistic Engineering Group IPI PAN (Institute of Computer Science, Polish Academy of Sciences)

We presented our project at the seminar of the Linguistic Engineering Group IPI PAN (Institute of Computer Science, Polish Academy of Sciences). The presentation was given in Polish under the title "PINC (Polish Interpreting Corpus): how a corpus can help study the process of simultaneous interpreting". The recording (in Polish) is available here. And here’s …

Adding linguistic annotation to EMU using the spaCy toolkit

EMU-based corpora are often annotated on multiple levels. Each word can carry orthographic and phonetic annotation, but linguistic annotation is rarely found in practice. Adding linguistic information to the corpus makes it possible to test hypotheses such as "what happens with a word that is a particular POS type", or "what happens …

Speaker Identification

Speaker identification is the process of determining who spoke a particular piece of recorded speech. In some cases, it may be a single long recording, but in other situations people exchange roles frequently within a single recording session; in that case we first segment the speech into portions where …

PINC presentation at UCCTS 2020 in Italy

We will be presenting the results of our first study at the UCCTS 2020 (Using Corpora in Contrastive and Translation Studies) conference in Bertinoro (Italy) on 7-9 September 2020 (provided the conference goes ahead under the current coronavirus restrictions). Here’s the abstract of our presentation: Cross-linguistic similarities in lexis – examining cognate activation through temporal …

Automating word segmentation

One of the more important components of the project is the ability to calculate statistics based on the time at which each word was spoken. To achieve this, we need to align the transcription with the audio and mark precisely where each word occurs in the signal. We have chosen to complete this process in several …