PINC presentation at the seminar of the Linguistic Engineering Group IPI PAN (Institute of Computer Science, Polish Academy of Sciences)
We presented our project at the seminar of the Linguistic Engineering Group IPI PAN (Institute of Computer Science, Polish Academy of Sciences). The presentation was held in Polish under the title: PINC (Polish Interpreting Corpus): how a corpus can help study the process of simultaneous interpreting. The recording (in Polish) is available here. And here’s …
Adding linguistic annotation to EMU using the spaCy toolkit
EMU-based corpora are often annotated on multiple levels. Each word can carry orthographic and phonetic annotation, but linguistic annotation is rarely found in practice. Adding linguistic information to the corpus makes it possible to test hypotheses such as “what happens to a word of a particular POS type” or “what happens …
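To illustrate the kind of hypothesis POS annotation enables, here is a minimal sketch that compares mean word duration per POS tag. It assumes the words have already been tagged (for example by spaCy) and exported with their start and end times; the word records, tags and times below are invented example values, and `mean_duration_by_pos` is a hypothetical helper, not part of EMU or spaCy.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical word records: (word, POS tag, start in ms, end in ms).
# In a real EMU database these would come from the word level plus a
# POS attribute added by a tagger; here they are hard-coded examples.
words = [
    ("to", "PRON", 0, 180),
    ("jest", "VERB", 180, 420),
    ("bardzo", "ADV", 420, 760),
    ("ważne", "ADJ", 760, 1210),
    ("pytanie", "NOUN", 1300, 1820),
    ("odpowiedź", "NOUN", 1900, 2500),
]


def mean_duration_by_pos(records):
    """Group word durations (end - start) by POS tag and average each group."""
    durations = defaultdict(list)
    for _word, pos, start, end in records:
        durations[pos].append(end - start)
    return {pos: mean(values) for pos, values in durations.items()}


print(mean_duration_by_pos(words))
```

With real data, the same grouping could feed a statistical test comparing, say, nouns and verbs rather than just printing the means.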
Counting basic statistics and pauses using EMU and R
This post shows how to use EMU to compute some simple statistics on a corpus. To start, note that this is not the most time-efficient approach to this particular problem, but it is fairly easy to set up and uses EMU as intended. It is also not the best use case to …
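The post itself works in EMU and R; as a language-neutral illustration of the underlying arithmetic, here is a minimal Python sketch (not the post’s code) that counts words, total speech time and pauses from word segments with start and end times. A pause is assumed to be any gap between consecutive words of at least `MIN_PAUSE_MS`; the 250 ms value is a placeholder assumption, and the segments are invented examples.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str
    start_ms: float
    end_ms: float


# Assumed minimum silence that counts as a pause (placeholder value,
# not a threshold taken from the post).
MIN_PAUSE_MS = 250.0


def basic_stats(segments, min_pause_ms=MIN_PAUSE_MS):
    """Return word count, total speech time and pause statistics."""
    ordered = sorted(segments, key=lambda s: s.start_ms)
    speech_ms = sum(s.end_ms - s.start_ms for s in ordered)
    pauses = []
    # A pause is the silent gap between the end of one word and the
    # start of the next, if it is long enough.
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt.start_ms - prev.end_ms
        if gap >= min_pause_ms:
            pauses.append(gap)
    return {
        "words": len(ordered),
        "speech_ms": speech_ms,
        "pause_count": len(pauses),
        "pause_total_ms": sum(pauses),
        "mean_pause_ms": sum(pauses) / len(pauses) if pauses else 0.0,
    }


segments = [
    Segment("dzień", 0, 300),
    Segment("dobry", 320, 700),
    Segment("państwu", 1100, 1600),
    Segment("witam", 2000, 2400),
]
print(basic_stats(segments))
```

Here the 20 ms gap between the first two words is too short to count, while the two 400 ms gaps are reported as pauses.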
Speaker Identification
Speaker identification is the process of determining who spoke a particular piece of recorded speech. In some cases the input is a single long recording, but in other situations people frequently exchange roles within a single recording session; in that case we first segment the speech into portions where …
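Once speech has been segmented, one common way to identify speakers (not necessarily the one used in the post) is to compare a fixed-length embedding of each segment against reference embeddings of known speakers and pick the most similar one. The minimal sketch below assumes such embeddings already exist (produced by some speaker-embedding model not specified here); the speaker names and three-dimensional vectors are invented for illustration, and `identify` is a hypothetical helper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(segment_embeddings, enrolled):
    """Label each segment with the enrolled speaker whose embedding is most similar."""
    labels = []
    for emb in segment_embeddings:
        best = max(enrolled, key=lambda name: cosine(emb, enrolled[name]))
        labels.append(best)
    return labels


# Hypothetical reference embeddings for two known speakers.
enrolled = {
    "speaker_A": [0.9, 0.1, 0.0],
    "speaker_B": [0.1, 0.8, 0.3],
}
# Hypothetical embeddings of three speech segments.
segments = [[0.85, 0.2, 0.05], [0.0, 0.7, 0.4], [0.6, 0.5, 0.1]]
print(identify(segments, enrolled))
```

Real systems usually also apply a similarity threshold so that a segment from an unknown speaker is not forced onto the closest enrolled one.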
PINC presentation at UCCTS 2020 in Italy
We will be presenting the results of our first study at the UCCTS 2020 (Using Corpora in Contrastive and Translation Studies) conference in Bertinoro (Italy) on 7–9 September 2020 (if the conference takes place, given the current coronavirus situation). Here’s the abstract of our presentation: Cross-linguistic similarities in lexis – examining cognate activation through temporal …
Automating word segmentation
One of the more important components of the project is the ability to calculate statistics based on the time when each word was spoken. To achieve this, we need to align the transcription to the audio and denote precisely where each word occurs in the signal. We have chosen to complete this process in several …