INVESTIGATING LEXICAL BUNDLES IN THE CORPORA OF ENGLISH AND INDONESIAN RESEARCH ARTICLES WITH THE SKETCH ENGINE

Abstract

The low publication rate of Indonesian researchers in reputable international journals, particularly in arts and humanities, is caused, among other factors, by the difficulties they face in producing precise expository texts in English, which differ from texts in Indonesian. The present study examines lexical bundles in corpora of English and Indonesian research articles (RA) on literature and linguistics to describe the similarities and differences of conventionalized phraseology in the scientific genre of English and Indonesian, using the corpus software Sketch Engine. The study focuses on the frequency and the structural and functional characteristics of lexical bundles, using a mixed-method research design. The English corpus comprises 1,351,048 words derived from 124 RA, while the Indonesian corpus consists of 637,910 words collected from 124 RA. We found that three-word lexical bundles are more prevalent than four-word lexical bundles in both corpora. In terms of structural form, prepositional-based bundles are the most frequent in English RA, while noun-based bundles are the most common in Indonesian RA. In terms of functional classification, no participant-oriented bundles were found in the Indonesian RA corpus, whereas the English RA corpus showed more varied functional categories of lexical bundles. The findings provide an understanding of phraseological combinations in English and Indonesian scientific writing, characterizing disciplinary discourse as well as the rhetorical styles of native and non-native English speakers, and have pedagogical implications for EAP practitioners.

Keywords: frequency, corpus, lexical bundles, research articles

ABSTRACT (INDONESIAN VERSION)

The low publication rate of Indonesian researchers in reputable international journals, particularly in arts and humanities, may be caused by the different rhetorical styles of research articles written in Indonesian and in English. This article examines lexical bundles in corpora of English and Indonesian research articles on literature and linguistics, using a mixed-method design. The study focuses on the frequency of use, structure, and function of lexical bundles in the corpora. The English corpus comprises 1,351,048 words drawn from 124 articles in international journals, while the Indonesian corpus comprises 637,910 words collected from 124 articles in national journals. The analysis shows that, in both corpora, three-word lexical bundles outnumber four-word ones. Structurally, the English corpus is dominated by prepositional-based bundles, whereas the Indonesian corpus has more noun-based bundles. Functionally, participant-oriented bundles are not found in Indonesian, whereas lexical bundles in English have more varied functions. The findings contribute to an understanding of phraseological combinations in English and Indonesian scientific writing, which characterize disciplinary discourse as well as the rhetorical styles of native and non-native speakers of English, and have pedagogical implications for practitioners of English for academic purposes.

INTRODUCTION

The Ministry of Research, Technology, and Higher Education of the Republic of Indonesia reported in 2016 that the lowest proportion of academic publications came from the fields of arts and humanities (0.91%), while the highest came from the fields of science, technology, health, and medicine (15.14%) (Arsyad, Purwo, Sukamto, & Adnan, 2019). The report suggests that researchers in arts and humanities have published the fewest research articles (RA) in reputable international journals compared to researchers in other fields. One possible reason hindering them from publishing scientific articles is the difficulty of writing accurate and effective expository texts in English, which differ from those in Indonesian.

The reason might be a cliché, but it is undeniable that a significant number of non-native researchers all over the world face the fact that English plays a central role in disseminating academic knowledge. Consequently, they struggle with a lack of proficiency in English and unfamiliarity with the standard rhetorical style expected in English journal articles. The difficulties faced by non-native scientists have encouraged scholars to conduct many studies on the elements that create well-written academic prose. Some insightful studies have used corpora, large bodies of machine-readable text, to investigate the linguistic forms and discourse structures within particular texts or genres.

Corpus-based language studies have encouraged a paradigm shift in learning English as a foreign language, specifically for adult learners. From the traditional perspective, words are thought of as the basic building blocks of language learning and processing. Therefore, some research has recommended vocabulary and the lexical approach as the foundation for learning a foreign language (Wilkins, 1972; Harmer, 1991; Lewis, 1993). However, recent theories and empirical evidence show that multi-word sequences are integral building blocks of language. Additionally, the predominance of multi-word sequences in discourse shows that meaning creation and understanding largely depend on the stock of multi-word sequences in language users' lexicon (Sinclair, 1991; Hong & Hua, 2018). For this reason, studies on multi-word expressions and the lexicon in a variety of registers have flourished in recent years.

Multi-word sequences have been studied extensively under many rubrics, for example, phraseological sequences, formulaic language, chunks, clusters, multi-word units, recurrent sequences, recurrent word combinations, lexical phrases, formulas, routines, fixed expressions, prefabricated patterns (prefabs), phrasicon, n-grams, and lexical bundles (Biber, Conrad & Cortes, 2004; Hong & Hua, 2018; Hernandéz, 2013). According to Biber, Conrad & Cortes (2004) and Biber, Johansson, Leech, Conrad, & Finegan (1999), lexical bundles are multi-word units that occur with a high frequency in a register. They specifically define lexical bundles as "bundles of words that show a statistical tendency to co-occur" (Biber, Johansson, Leech, Conrad, & Finegan, 1999: 989).

Salazar (2014) explains that the main feature of lexical bundles is that they have an empirical basis due to the method of determination which primarily depends on frequency criteria. Therefore, she defines lexical bundles as "frequently occurring lexical sequences automatically extracted from a given corpus using a computer program" (Salazar, 2014: 13). Lexical bundles are regarded as the fundamental part of a discourse that plays a significant role in creating fluency and achieving the natural use of language, either in speech or writing (Kashiha, 2015). As a result, many studies have investigated the relations between lexical bundles and language proficiency.

Millar (in Allen, 2011) argued that the knowledge and use of various lexical bundles could help language learners attain naturalness in language use. Conversely, the misapplication of lexical bundles has been shown to be a potential cause of communication problems. In addition, some studies have shown that language learners who use lexical bundles more frequently demonstrate higher language proficiency (Novita & Kwary, 2018). Knowledge of high-frequency lexical bundles and their patterns of use in the scientific writing of a specific discipline is essential for non-native writers because they are expected to produce brief and accurate explanatory texts to communicate their thoughts and research findings to a worldwide scientific audience.

Due to their importance in language learning for academic purposes, there have been many studies on lexical bundles used by first language (L1) and second language (L2) writers in academic genres. For example, Chen and Baker (2010) conducted research on frequently used lexical bundles in L1 and L2 academic writing and argued that the frequency-driven lexical bundles found in native expert writing could greatly assist learner writers in achieving a more native-like style of academic writing. Salazar (2014) compared the use of lexical bundles in a corpus of biomedical RA written by native Spanish-speaking scientists with a corpus of health science RA written by English native speakers. Kashiha (2015) examined lexical bundles in two corpora of RA conclusion sections written by native and Iranian non-native English writers. Pan, Reppen, & Biber (2016) studied the structural and functional types of lexical bundles used by L1 English and L1 Chinese professional writers in Telecommunications journals.

Nevertheless, no research to date has compared the lexical bundles of RA from different languages. The current study aims to fill this gap by investigating the frequency of use and the structural and functional characteristics of lexical bundles in English and Indonesian RA. The study compares lexical bundles in the same genre and disciplines, namely literature and linguistics, but written in different languages, to reveal fundamental similarities and differences in the frequency and patterns of multi-word expressions. In this context, the present study focuses on the formulaic language in published Indonesian and English RA rather than on language proficiency. Hence, this study can demonstrate the norms of language use in Indonesian and English scientific writing and provide an understanding of the linguistic aspects hindering Indonesian researchers from publishing RA in reputable international journals.

METHOD

The data for this study are two corpora of written texts comprising open-access Indonesian and English RA from literature and linguistics. The Indonesian RA corpus was built from Indonesian national journals indexed in the Science and Technology Index (SINTA) of the Ministry of Research and Technology of the Republic of Indonesia, from SINTA 1 to SINTA 3. The corpus consists of 124 published RA from the leading journals of each category.

The English RA corpus, on the other hand, was collected from international journals indexed in Scopus in the Q1 and Q2 categories and also comprises 124 published articles. Although the two corpora contain the same number of articles, their sizes differ. As shown in Table I, the English RA corpus is about twice the size of the Indonesian RA corpus. The corpus sizes suggest that articles published by reputable Indonesian journals are generally shorter than those published by reputable international journals. Indonesian journal publishers may wish to take this into account in working toward the standards of international journals.

TABLE I CORPORA WORD COUNTS

No | Corpus | Number of Articles | Number of Tokens | Number of Types
1 | Indonesian RA | 124 | 637,910 | 47,938
2 | English RA | 124 | 1,351,045 | 9,771

We extracted lexical bundles from the corpus data using the corpus software Sketch Engine (Kilgarriff et al., 2014). The software was used to generate the most frequent lexical bundles in both corpora, ranging from 3-word to 5-word bundles, for the frequency analysis. For the structural and functional analyses, however, we focused on 4-word bundles. This choice follows Hyland (2008), who states that 4-word and 5-word bundles provide a more precise range of structures and functions than 3-word bundles. To select only high-frequency lexical bundles, we also set a minimum frequency of 20.
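The extraction step described above can be illustrated with a minimal Python sketch. It is not Sketch Engine's implementation: the function name `extract_bundles`, the simple regular-expression tokenizer, and the decision to let n-grams run across sentence boundaries are all assumptions made for illustration. Only the bundle lengths (3 to 5 words) and the minimum frequency of 20 come from the procedure described here.

```python
import re
from collections import Counter


def extract_bundles(text, n, min_freq=20):
    """Count contiguous n-word sequences and keep those occurring at least min_freq times.

    Hypothetical helper for illustration only; tokenization is a simplifying assumption.
    """
    # Lowercase the text and keep alphabetic words, allowing internal hyphens
    # or apostrophes (for example "laki-laki").
    tokens = re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text.lower())
    # Slide a window of length n over the token list.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    # Apply the frequency cut-off.
    return {bundle: freq for bundle, freq in counts.items() if freq >= min_freq}


def extract_all_lengths(text, lengths=(3, 4, 5), min_freq=20):
    """Run the extraction for each bundle length used in the frequency analysis."""
    return {n: extract_bundles(text, n, min_freq) for n in lengths}
```

Calling `extract_all_lengths(corpus_text)` on the full text of a corpus would return one dictionary of candidate bundles per length, which could then be sorted by frequency to obtain a top-50 list.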

The data analysis consisted of several steps. First, we compared the top 50 most frequent lexical bundles in the corpora of English and Indonesian published RA in terms of frequency. Second, we selected the 4-word and 5-word bundles among the most frequent lexical bundles and categorized them by structure (grammatical type) and function (meaning in the text). The structural classification of lexical bundles follows the taxonomy developed by Biber, Johansson, Leech, Conrad, & Finegan (1999), consisting of noun-based, prepositional-based, and verb-based bundles.

The functional classification of lexical bundles, on the other hand, refers to the categories initially created by Biber (2006) and Biber, Conrad, & Cortes (2004) and then modified by Hyland (2008 & 2012), which consist of research-oriented, text-oriented, and participant-oriented bundles. The research-oriented bundles "help writers to structure their activities and experiences of the real world" (Hyland, 2012: 150); their subcategories are location, procedure, quantification, description, and topic. The text-oriented bundles involve "the organization of the text and its meaning as a message or argument" (Hyland, 2012: 150). The subcategories of this function are transition, resultative, structuring, and framing signals. The participant-oriented bundles pay particular attention to the reader or writer of the text and consist of stance and engagement features (Hyland, 2012: 150). Based on the results of these analyses, we compared and interpreted the patterns of lexical bundles in the corpora of English and Indonesian published RA.

RESULTS AND DISCUSSION

In the present study, the description of lexical bundles in the corpora depends largely on frequency criteria. It follows the way Biber, Johansson, Leech, Conrad, & Finegan (1999) investigated lexical bundles, which is grounded exclusively in frequency. The analysis is based on the idea that frequency provides strong evidence of the characteristic combinations and primary meanings of words in specific contexts (Hunston, 2006). This approach helps us analyze and compare the structure and function of lexical bundles in two different languages within the same genre.
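Because the two corpora differ in size, raw counts are not directly comparable, which is why frequencies are reported in normalized form. The sketch below shows the usual calculation. The normalization base used in this study is not stated, so the per-million default here is an assumption, and the function name `normalized_frequency` is hypothetical.

```python
def normalized_frequency(raw_count, corpus_size, base=1_000_000):
    """Scale a raw count to occurrences per `base` words.

    The default base of one million is an illustrative assumption,
    not a value taken from the study.
    """
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * base / corpus_size


# Corpus sizes as reported in the abstract.
ENGLISH_WORDS = 1_351_048
INDONESIAN_WORDS = 637_910

# The same raw count weighs more heavily in the smaller corpus.
same_raw_count = 100
english_rate = normalized_frequency(same_raw_count, ENGLISH_WORDS)
indonesian_rate = normalized_frequency(same_raw_count, INDONESIAN_WORDS)
```

With identical raw counts, the Indonesian rate comes out at roughly twice the English rate, mirroring the difference in corpus size.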

TABLE II TOP 50 MOST FREQUENT LEXICAL BUNDLES IN ENGLISH AND INDONESIAN PUBLISHED RA

No | Item (English RA) | Normalized Freq. | Item (Indonesian RA) | Normalized Freq.
1 | as well as | 10,846 | dalam bahasa Indonesia | 18,585
2 | the use of | 8,798 | dalam penelitian ini | 17,085
3 | one of the | 7,192 | oleh karena itu | 16,694
4 | in terms of | 6,866 | penelitian ini adalah | 11,216
5 | in order to | 6,866 | di bawah ini | 10,303
6 | the fact that | 5,912 | dalam hal ini | 10,108
7 | in which the | 4,562 | yang ada di | 9,977
8 | the end of | 4,329 | yang dilakukan oleh | 9,325
9 | on the other | 3,840 | yang digunakan dalam | 8,086
10 | a number of | 3,840 | dengan kata lain | 7,825
11 | the number of | 3,747 | merupakan salah satu | 7,630
12 | in other words | 3,747 | yang berkaitan dengan | 7,173
13 | there is a | 3,677 | yang terdapat dalam | 5,999
14 | part of the | 3,608 | makian dalam bahasa | 5,934
15 | the United States | 3,584 | makian dalam bahasa Indonesia | 5,869
16 | of the novel | 3,584 | yang berasal dari | 5,739
17 | the same time | 3,491 | dalam penelitian ini adalah | 5,412
18 | it is not | 3,468 | oleh sebab itu | 5,086
19 | at the same | 3,445 | bahasa Indonesia yang | 4,956
20 | the present study | 3,305 | ini menunjukkan bahwa | 4,826
21 | in relation to | 3,305 | laki-laki dan perempuan | 4,695
22 | at the same time | 3,305 | anak disabilitas tunarungu | 4,695
23 | the case of | 3,189 | yang digunakan oleh | 4,500
24 | the context of | 3,119 | yang berhubungan dengan | 4,500
25 | such as the | 3,119 | makian dengan referensi | 4,500
26 | end of the | 3,049 | bahasa Minangkabau Bukittinggi | 4,500
27 | of the world | 3,026 | Nyi Roro Kidul | 4,434
28 | to be a | 3,003 | klitika pronominal pemarkah | 4,434
29 | can not be | 2,979 | yang digunakan untuk | 4,369
30 | in the first | 2,956 | dapat disimpulkan bahwa | 4,304
31 | the role of | 2,863 | dan berbau harum | 4,304
32 | in the context | 2,746 | yang berada di | 4,108
33 | use of the | 2,677 | pronomina pemarkah kasus | 4,108
34 | the other hand | 2,653 | klitika pronomina pemarkah kasus | 4,108
35 | the importance of | 2,630 | dalam bahasa Inggris | 4,043
36 | the end of the | 2,630 | adalah salah satu | 4,043
37 | on the other hand | 2,630 | yang terkait dengan | 3,978
38 | some of the | 2,560 | di samping itu | 3,978
39 | of world literature | 2,560 | sebagai bagian dari | 3,913
40 | in the context of | 2,537 | digunakan dalam penelitian | 3,913
41 | in the case | 2,537 | menjadi salah satu | 3,847
42 | the relationship between | 2,514 | dapat dikatakan bahwa | 3,717
43 | the waste land | 2,490 | yang digunakan dalam penelitian | 3,652
44 | in this study | 2,444 | sebagai salah satu | 3,652
45 | a variety of | 2,444 | dapat dilihat pada | 3,652
46 | a kind of | 2,444 | yang ada dalam | 3,521
47 | that it is | 2,421 | digunakan dalam penelitian ini | 3,521
48 | understanding of the | 2,397 | dalam bahasa Jawa | 3,456
49 | in the case of | 2,374 | yang terjadi di | 3,391
50 | the form of | 2,351 | ibu rumah tangga | 3,326

Following the method described in the previous section, the analysis focuses on the top 50 most frequent lexical bundles in the English and Indonesian corpora of published RA. As shown in Table II, the high-frequency lexical bundles consist of 3-word and 4-word bundles, and the lists are mainly composed of three-word strings. In other words, 3-word bundles are more productive not only in the English but also in the Indonesian RA corpus. However, the English corpus has slightly more 3-word lexical bundles than the Indonesian corpus. This can be seen from the number of 4-word bundles in the two corpora. The English corpus has only two 4-word bundles, namely on the other hand and in the context of, while the Indonesian corpus has five 4-word bundles, namely makian dalam bahasa Indonesia, dalam penelitian ini adalah, klitika pronomina pemarkah kasus, yang digunakan dalam penelitian, and digunakan dalam penelitian ini.

As expected, the result of frequency analysis is in line with what was stated by Hyland (2012), who studied lexical bundles in academic discourse. According to him, 3-word bundles are exceedingly prevalent, but they are often less interesting to investigate further. In this context, the most important thing to note is that the pattern of lexical bundles in the corpora of English and Indonesian published RA is similar in terms of the frequency of use.

Having compared the lexical bundles in the English and Indonesian RA corpora in terms of frequency, we gain further insight by examining their structural forms. As stated in the method section, the structural classification is based on the taxonomy developed by Biber, Johansson, Leech, Conrad, & Finegan (1999), who divided the structural forms into three broad categories, namely noun-based, prepositional-based, and verb-based bundles. Noun-based bundles comprise nouns with post-modifier fragments, prepositional-based bundles include word combinations that begin with a preposition followed by noun phrase fragments, and verb-based bundles refer to strings of words with verb components. The structural forms of lexical bundles in the English RA corpus are shown in Table III, while those in the Indonesian RA corpus are presented in Table IV.

TABLE III THE STRUCTURAL FORMS OF LEXICAL BUNDLES IN ENGLISH PUBLISHED RA

Structural form | Subcategory | Types | % of types | Lexical bundles
Noun-based | Noun phrase with of-phrase fragment | 10 | 20% | 1 the end of; 2 the rest of the; 3 the use of the; 4 one of the most; 5 the beginning of the; 6 a wide range of; 7 the total number of; 8 the case of the; 9 the nature of the; 10 the context of the
Noun-based | Noun phrase with other post-modifier fragment | 4 | 8% | 11 the extent to which; 12 the ways in which; 13 the fact that the; 14 the way in which
Noun-based | Total | 14 | 28% |
Prepositional-based | Prepositional-based with embedded of-phrase | 18 | 36% | 15 in the context of; 16 in the case of; 17 at the end of; 18 in the form of; 19 on the basis of; 20 at the end of the; 21 in terms of the; 22 in the use of; 23 as a result of; 24 on the part of; 25 in the face of; 26 at the university of; 27 over the course of; 28 at the beginning of; 29 at the heart of; 30 of the waste land; 31 of look to the; 32 of the singular marker
Prepositional-based | Prepositional-based with other post-modifier fragment | 4 | 8% | 33 to the fact that; 34 in a way that; 35 by the fact that; 36 in the sense that
Prepositional-based | Other prepositional phrase segments | 9 | 18% | 37 at the same time; 38 on the other hand; 39 on the one hand; 40 in the United States; 41 in the present study; 42 in relation to the; 43 in the same way; 44 with respect to the; 45 with regard to the
Prepositional-based | Total | 31 | 62% |
Verb-based | Verb phrase with active verb | 1 | 2% | 46 looks to the subject
Verb-based | Be + noun phrase | 1 | 2% | 47 is one of the
Verb-based | Passive verb | 1 | 2% | 48 can be seen in
Verb-based | Verb/adjective + to | 1 | 2% | 49 It is important to
Verb-based | Adverbial clause | 1 | 2% | 50 as well as the
Verb-based | Total | 5 | 10% |

The data in Table III reveal that lexical bundles in English RA are primarily prepositional-based and noun-based. Prepositional-based bundles account for sixty-two percent and noun-based bundles for twenty-eight percent, together making up ninety percent. Verb-based bundles are the least frequent structural form, at only ten percent. These results are similar to the findings of Hyland (2008), who analyzed doctoral dissertations across four disciplines (electrical engineering, business studies, applied linguistics, and microbiology); Jalali, Moini, & Arani (2015), who studied medical research articles; Pan, Reppen, & Biber (2016), who investigated research articles in telecommunications journals; and Dontcheva-Navratilova (2012), Bal (2010), and Liu (2008), who examined a variety of academic registers. They found that the most common lexical bundles are prepositional-based and noun-based. The results of the present study differ slightly from those of Kwary, Ratri, & Artha (2017) and Qin (2014), who analyzed lexical bundles in journal articles across four disciplines (life sciences, health sciences, physical sciences, and social sciences) and in applied linguistics, respectively. They found that prepositional-based bundles are the most frequent, but that verb-based bundles, rather than noun-based bundles, are the second most frequent.
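The percentages reported above follow directly from the type counts in Table III, as the short sketch below shows. The function name `type_shares` is hypothetical; the counts are those given in the table.

```python
def type_shares(counts):
    """Return each category's share of all bundle types as a whole-number percentage."""
    total = sum(counts.values())
    return {category: round(100 * n / total) for category, n in counts.items()}


# Type counts for the English RA corpus, taken from Table III.
english_structural = {
    "prepositional-based": 31,
    "noun-based": 14,
    "verb-based": 5,
}

shares = type_shares(english_structural)
# Categories ordered from most to least frequent.
ranked = sorted(shares, key=shares.get, reverse=True)
```

Here `shares` reproduces the 62%, 28%, and 10% figures, and `ranked` lists the categories in the order discussed in the text.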

It is also important to note that the prepositional-based bundles are predominantly prepositional phrases with embedded of-phrase fragments, i.e., 18 out of 31 types, such as in the context of, in the case of, on the basis of, and in terms of the. These structural forms typically relate to the structure of the text and its meaning, especially to establish arguments by describing limiting conditions. The noun-based bundles, on the other hand, are mainly noun phrases with of-phrase fragments, i.e., 10 out of 14 types, for example, the end of, the rest of the, the use of, and one of the most. These forms help writers organize their activities and experiences of the real world by indicating time or place, quantity, and procedure.

TABLE IV THE STRUCTURAL FORMS OF LEXICAL BUNDLES IN INDONESIAN PUBLISHED RA

Structural form | Subcategory | Types | % of types | Lexical bundles
Noun-based | Noun phrase segments | 4 | 8% | 1 klitika pronomina pemarkah kasus; 2 kongres bahasa Indonesia I; 3 wayang orang Ngesti Pandowo; 4 tari Bedhaya Bedhah Madiun
Noun-based | Noun phrase with post-modifier fragment | 29 | 58% | 5 ragam tutur yang lebih; 6 satu dengan yang lain; 7 panas dan berbau harum; 8 tanah panas dan berbau harum; 9 tanah panas dan berbau; 10 tanah hangat dan berbau; 11 tutur yang lebih kasual; 12 tanah hangat dan berbau harum; 13 ragam tutur yang lebih kasual; 14 anak disabilitas tunarungu usia; 15 Jamee dan bahasa Minangkabau Bukittinggi; 16 Jamee dan bahasa Minangkabau; 17 bahasa Jamee dan bahasa Minangkabau; 18 bahasa Jamee dan bahasa; 19 makian dalam bahasa Indonesia; 20 data dalam penelitian ini; 21 makian dengan referensi binatang; 22 penggunaan makian dalam bahasa; 23 penggunaan makian dalam bahasa Indonesia; 24 nama-nama geng sekolah di; 25 geng sekolah di Yogyakarta; 26 nama-nama geng sekolah di Yogyakarta; 27 penggunaan ragam tutur yang; 28 kata tabu yang berhubungan; 29 ungkapan yang mengandung sikap seksis; 30 ungkapan yang mengandung sikap; 31 tabu yang berhubungan dengan; 32 kata tabu yang berhubungan dengan; 33 hangat dan berbau harum
Noun-based | Total | 33 | 66% |
Prepositional-based | Prepositional-phrase segments | 6 | 12% | 34 oleh klitika pronomina pemarkah kasus; 35 oleh klitika pronomina pemarkah; 36 dalam bahasa Indonesia yang; 37 ke dalam bahasa Indonesia; 38 yakni penggunaan ragam tutur; 39 dalam penelitian ini adalah
Prepositional-based | Total | 6 | 12% |
Verb-based | Passive verb | 5 | 10% | 40 yang digunakan dalam penelitian; 41 digunakan dalam penelitian ini; 42 yang digunakan dalam penelitian ini; 43 digunakan dalam penelitian ini adalah
Verb-based | Active verb | 5 | 10% | 45 menggunakan makian dengan referensi; 46 yang mengandung sikap seksis; 47 abstrak penelitian ini bertujuan; 48 this study aims to; 49 hasil penelitian menunjukkan bahwa
Verb-based | Total | 10 | 20% |
Other | | 1 | 2% | 50 dan bahasa Minangkabau Bukittinggi

The Indonesian RA corpus, on the other hand, mostly comprises noun-based and verb-based bundles, as shown in Table IV. Sixty-six percent of the bundles are noun-based and twenty percent are verb-based, making a total of eighty-six percent. The least frequent structural types are prepositional-based bundles, at twelve percent, and other categories, at two percent. The results suggest that the patterns of lexical bundles in Indonesian RA differ from those found in English RA in terms of structural classification. As discussed above, English research articles have more prepositional-based bundles (62%), while Indonesian research articles have more noun-based bundles (66%). Noun-based bundles are fairly common in English RA, but their distribution differs from that in Indonesian RA: the noun-based bundles in English RA (14 types) number less than half of those found in Indonesian RA (33 types). In general, these results also differ from the findings of Hyland (2008), Qin (2014), Jalali, Moini, & Arani (2015), Pan, Reppen, & Biber (2016), and Kwary, Ratri, & Artha (2017), who found that the most frequent bundles are prepositional-based. Thus, the differences are possibly caused by the difference in language rather than by the fields of study.

On closer examination, the noun-based bundles in Indonesian RA are mostly noun phrases with post-modifier fragments, for example, ragam tutur yang lebih, panas dan berbau harum, tanah panas dan berbau harum, and tutur yang lebih kasual. Writers typically use these to structure their activities and experiences, mainly in relation to research topics. From this function, it can also be seen that the noun-based bundles in Indonesian RA and English RA differ both in the subcategories of structural forms and in function.

TABLE V FUNCTIONAL CLASSIFICATION OF LEXICAL BUNDLES IN ENGLISH AND INDONESIAN PUBLISHED RA

Function | English: Types | English: % of Types | Indonesian: Types | Indonesian: % of Types
Research-oriented bundles | 29 | 58% | 46 | 92%
Location | 7 | | - |
Procedure | 2 | | 4 |
Quantification | 6 | | - |
Description | 8 | | - |
Topic | 6 | | 42 |
Text-oriented bundles | 19 | 38% | 4 | 8%
Transition signals | 4 | | - |
Resultative signals | 1 | | 1 |
Structuring signals | 1 | | 3 |
Framing signals | 13 | | - |
Participant-oriented bundles | 2 | 4% | - | -
Stance features | 1 | | - |
Engagement features | 1 | | - |

As stated in the method section, the functional classification of lexical bundles in the current study follows the classification proposed by Hyland (2008 & 2012). The results show that the functions of lexical bundles in English and Indonesian RA share some similarities and differences. Research-oriented bundles are the most frequent category in both English and Indonesian RA, but the distribution differs. In the English RA corpus, research-oriented bundles account for fifty-eight percent of types, while in the Indonesian RA corpus they account for ninety-two percent. This suggests that research-oriented bundles dominate the Indonesian RA far more strongly. The remaining eight percent of the Indonesian bundles are text-oriented, while participant-oriented bundles are not found. As shown in Table V, the research-oriented bundles in Indonesian RA were mainly used to convey research topics. Many of these bundles specified the subject of the research. They were realized as noun phrases, such as makian dalam bahasa Indonesia, klitika pronomina pemarkah kasus, kongres bahasa Indonesia I, wayang orang Ngesti Pandowo, tari Bedhaya Bedhah Madiun, ragam tutur yang lebih, geng sekolah di Yogyakarta, and makian dengan referensi binatang.
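The category totals in Table V can be cross-checked against their subcategory counts, which also makes the absence of a whole category in one corpus easy to detect. The sketch below is illustrative only: the dictionary layout and the function names `category_totals` and `absent_categories` are assumptions, while the counts are those reported in Table V.

```python
# Subcategory counts per corpus, taken from Table V.
# Subcategories reported as "-" are simply omitted.
FUNCTIONAL_COUNTS = {
    "English": {
        "research-oriented": {
            "location": 7, "procedure": 2, "quantification": 6,
            "description": 8, "topic": 6,
        },
        "text-oriented": {
            "transition": 4, "resultative": 1, "structuring": 1, "framing": 13,
        },
        "participant-oriented": {"stance": 1, "engagement": 1},
    },
    "Indonesian": {
        "research-oriented": {"procedure": 4, "topic": 42},
        "text-oriented": {"resultative": 1, "structuring": 3},
        "participant-oriented": {},
    },
}


def category_totals(corpus):
    """Sum the subcategory counts of each functional category for one corpus."""
    return {
        category: sum(subcounts.values())
        for category, subcounts in FUNCTIONAL_COUNTS[corpus].items()
    }


def absent_categories(corpus):
    """List the functional categories with no bundle types in the given corpus."""
    return [cat for cat, total in category_totals(corpus).items() if total == 0]
```

Each corpus sums to 50 bundle types, and `absent_categories("Indonesian")` returns only the participant-oriented category, in line with the discussion above.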

In contrast, the proportion of word combinations functioning as research-oriented bundles in the English RA corpus is lower, i.e., 58%, and their types are not dominated by topic; they are more varied instead. The bundles are mainly used to describe objects, relations, and degree, for example, in the form of, the ways in which, in relation to the, and the extent to which. Many of them also contribute to the description of location, such as the end of the, at the end of, the beginning of the, at the beginning of the, and at the heart of. Meanwhile, lexical bundles that explain procedure are the least frequent, e.g., in the use of and the use of.

Furthermore, the number of text-oriented bundles in the English RA corpus is relatively high, i.e., 38%. They primarily function to frame arguments by showing limitation, describing connection, and specifying cases, such as in the context of, in the case of, on the basis of, in terms of, the fact that, in the sense that, with respect to the, with regard to the, and the case of the. As can be seen, these bundles are realized mainly by prepositions with embedded of-phrase structures. The other fairly frequent kind of bundle in the text-oriented category is the transition signal, e.g., as well as the, on the other hand, and on the one hand. These are mainly used to link arguments in a logical order by introducing additional information and contrasting points of view. Resultative signals are also found in the data, e.g., as a result of. According to Hyland (2008), transition words, particularly resultative markers (for instance, as a result of), serve a crucial function in the rhetorical presentation of research because they signal the main conclusions of the research and emphasize the inferences the writers want readers to draw from the discussion.

The most notable difference between the English RA corpus and the Indonesian RA corpus concerns participant-oriented bundles. As mentioned above, no participant-oriented bundles are found in the Indonesian RA corpus. In the English RA corpus, however, we found stance features, i.e., it is important to, and engagement features, i.e., can be seen in, in the participant-oriented category. Hyland (2008) stated that stance features relate to the ways writers explicitly intervene in the discourse to communicate epistemic and evaluative judgments, evaluations, and degrees of commitment to what they say, while engagement features concern the ways writers address readers as participants in the unfolding discourse. In line with that statement, in the English RA corpus, the bundle it is important to is mainly used to convey the writers' evaluation of what they believe is essential to note and consider. Meanwhile, the bundle can be seen in shows how writers guide readers toward what they want them to recognize. Thus, the participant-oriented bundles used in the English RA corpus are part of the dialogic element of research writing, directing readers to a particular understanding, which is not found in the Indonesian RA corpus.

CONCLUSION

The main objective of the current study was to explore the patterns of lexical bundles in corpora of English and Indonesian RA, built from published scientific articles in the fields of literature and linguistics, using the corpus tool Sketch Engine. An analysis of frequency, structural forms, and functional classification shows that lexical bundles in the English and Indonesian RA corpora share some similarities and differences. Based on the top 50 most frequent lexical bundles, the results show that three-word bundles outnumber four-word bundles in both the English and Indonesian RA corpora. The results support the finding of Hyland (2012) that three-word bundles are the most common bundles in English academic discourse, and show that this pattern occurs not only in English but also in Indonesian academic discourse.

The most notable differences between English RA and Indonesian RA lie in the structural forms and in the distribution of functional categories of four-word bundles. While the English RA corpus is dominated by prepositional-based bundles (62%), the Indonesian RA corpus consists mostly of noun-based bundles (66%). Furthermore, the second most common structural form differs between the corpora: noun-based bundles in the English RA corpus (28%) and verb-based bundles in the Indonesian RA corpus (20%). The findings suggest that, in terms of structure, writers compose articles differently in English and in Indonesian.

The other differences between the English and Indonesian RA corpora can be seen in the functional classification. Although research-oriented bundles are the most common type in both corpora, the distribution of the type and its subcategories differs. The Indonesian RA corpus has a larger share of research-oriented bundles (92%) than the English RA corpus (58%). In addition, the research-oriented bundles in English RA are more varied and cover all the subcategories, i.e., location, procedure, quantification, topic, and description. In contrast, in the Indonesian RA corpus, the research-oriented bundles are predominantly topic bundles (92%). Unlike the English RA corpus, the Indonesian RA corpus has no participant-oriented bundles. This indicates that the writers of Indonesian RA tend not to engage in dialogue with their readers.

The study demonstrates how technology, in this case the corpus tool Sketch Engine, greatly helps researchers identify phraseological patterns in a large sample of language use, and it indicates that native and non-native writers of English use different rhetorical styles. However, the findings need to be considered with some caution because the analysis was based on a relatively limited range and amount of data, and we have not discussed in depth the rhetorical style of the related discipline or the discourse style of the related languages. Nevertheless, the results have clear pedagogic implications for English for Academic Purposes (EAP) practitioners, especially those who teach EAP to Indonesian EFL learners. The findings can be used as a source of learning materials on phraseological forms in English scientific articles as well as on the norms of academic English in general.

REFERENCES

  • Allen, D. (2009). Lexical bundles in learner writing: An analysis of formulaic language in the ALESS learner corpus. Komaba Journal of English Education, 1, 105-127.
  • Ang, L. H., & Tan, K. H. (2018). Specificity in English for Academic Purposes (EAP): A Corpus analysis of lexical bundles in academic writing. 3L: Language, Linguistics, Literature®, 24(2).
  • Arsyad, S., Purwo, B. K., Sukamto, K. E., & Adnan, Z. (2019). Factors hindering Indonesian lecturers from publishing articles in reputable international journals. Journal on English as a Foreign Language, 9(1), 42-70.
  • Bal, B. (2010). Analysis of Four-word Lexical Bundles in Published Research Articles Written by Turkish Scholars. Georgia State University.
  • Biber, D. (2006). University language: A Corpus-based Study of Spoken and Written Registers. Amsterdam: John Benjamins.
  • Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…: Lexical bundles in university teaching and textbooks. Applied linguistics, 25(3), 371-405.
  • Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman Grammar of Spoken and Written English. Pearson Education, Ltd.
  • Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (2000). Longman grammar of spoken and written English. Longman.
  • Dontcheva-Navratilova, O. (2012). Lexical bundles in academic texts by non-native speakers. Brno Studies in English, 38(2), 37-58.
  • Harmer, J. (1991). Teaching vocabulary: The practice of English language teaching (2nd ed.). Longman.
  • Hernández, P. S. (2013). Lexical bundles in three oral corpora of university students. Nordic Journal of English Studies, 12(1), 187-209.
  • Hunston, S. (2006). Corpora in Applied Linguistics. Cambridge University Press.
  • Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27(1), 4-21.
  • Hyland, K. (2012). Bundles in academic discourse. Annual Review of Applied Linguistics, 32, 150-169.
  • Jalali, Z. S., Moini, M. R., & Arani, M. A. (2014). Structural and functional analysis of lexical bundles in medical research articles: A corpus-based study. International Journal of Information Science and Management (IJISM), 13(1).
  • Kashiha, H. (2015). Recurrent formulas and moves in writing research article conclusions among native and nonnative writers. 3L: Language, Linguistics, Literature®, 21(1), 47-59.
  • Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., ... & Suchomel, V. (2014). The Sketch Engine: ten years on. Lexicography, 1(1), 7-36.
  • Kwary, D. A., Ratri, D., & Artha, A. F. (2017). Lexical bundles in journal articles across academic disciplines. Indonesian Journal of Applied Linguistics, 7(1), 131-140.
  • Lewis, M. (1993). The lexical approach: The state of ELT and a way forward. Language Teaching Publications.
  • Liu, D. (2012). The most frequently-used multiword constructions in academic written English: A multi-corpus study. English for Specific Purposes, 31(1), 25-35.
  • Novita, H., & Kwary, D. A. (2018). Comparing the use of lexical bundles in Indonesian-English translation by student translators and professional translators. Translation & Interpreting, 10(1), 53-74.
  • Pan, F., Reppen, R., & Biber, D. (2016). Comparing patterns of L1 versus L2 English academic professionals: Lexical bundles in Telecommunications research journals. Journal of English for Academic Purposes, 21, 60-71.
  • Salazar, D. (2014). Lexical bundles in native and non-native scientific writing: Applying a corpus-based study to language teaching (Vol. 65). John Benjamins Publishing Company.
  • Sinclair, J., & Sinclair, L. (1991). Corpus, concordance, collocation. Oxford University Press, USA.
  • Wilkins, D. A. (1972). Linguistics in Language Teaching. Arnold.
