Thesis by publication (cumulative dissertation)

Copyright notices prior to publication.

If your dissertation contains complete published or submitted papers, there are several copyright issues to consider.

1. Checking the legal conditions

Please check the publisher's conditions regarding the reuse of articles in your dissertation. The provisions of your publishing contract are authoritative. If the contract contains no specific provisions, the following policies apply: Publishing Policies Dissertations. If your publisher is not listed, please ask the editor or publisher.

If you received no answer or are unsure about the publisher's specifications, please contact the Open Access Team.

2. Phrase embedding

When a publication is reused, publishers commonly require that a fixed phrase be included in the dissertation. Please always place this phrase at the beginning of the corresponding chapter.

3. Agreement of the co-authors

Please ask any co-authors for their consent to the self-archiving (e.g. in writing by email). The dissertation office does not require proof.

4. Overview of included publications

Please attach a summary overview to your dissertation, listing all publications that have been fully included in your work. The following information should be included:

  • Complete bibliographic data
  • Version information (preprint, accepted manuscript, publisher's version) (see: glossary of publishing policies dissertations)
  • DOI (if available) actively linked, starting with https://doi.org/...

Ways to publish

For cumulative dissertations, you can choose one of the following ways of publication:

  • Online publishing of cumulative dissertations on DepositOnce
  • 15 print copies in dissertation print

Implementation rules

Before the scientific discussion, the implementation rules of the individual faculties are binding.

Implementation rules by the faculties

  • Faculty I - Humanities and Educational Sciences
  • Institute of Chemistry (pdf, 62 kB)
  • Institute of Physics (pdf, 64 kB)
  • Institute of Mathematics
  • Faculty III – Process Sciences
  • Faculty IV – Electrical Engineering and Computer Science  (pdf, 108 kB)
  • Faculty V – Mechanical Engineering and Transport Systems
  • Faculty VI – Planning Building Environment
  • Implementation regulations (2014)  (pdf, 12.62 kB)
  • Implementation regulations (2021)  (pdf, 138 kB)
  • Creative Commons licenses
  • legal information on publishing
  • Publishing Policies Dissertations

Dissertation Service of the University Library

[email protected]

Copyrighting your Dissertation

In the United States, you automatically own the copyright in your original creative authorship, such as your dissertation, once it is fixed in a tangible form (i.e., written down or recorded). United States law does not require you to include a copyright notice on your dissertation or to formally register with the U.S. Copyright Office in order to secure copyright protection over your work. However, there are some benefits to including a copyright notice and registering your work. See the Copyright Guide for more information or to schedule a consultation.

Including a Copyright Page in your Dissertation

Including a copyright page in your dissertation is optional but recommended. For details on how to format the copyright page, consult the  PhD Dissertation Formatting Guide  and the  PhD Dissertation Formatting Checklist .

Dissertations Based on Joint Work

  • For dissertations based on joint work with other researchers, a unique and separate dissertation must be presented by each degree candidate. You must include a concise account of your unique contribution to the joint work, and the remainder of the dissertation must be authored solely by you. Authorship of an entire dissertation by more than one degree candidate is not allowed.

Using Your Own Previously Published Material in Your Dissertation

University of Pennsylvania  policy  allows you to include your own previously published work or articles submitted for publication as part of the dissertation with the following conditions:

  • You must obtain approval of the dissertation committee and Graduate Group Chairperson.
  • You must obtain written permission from the copyright owner, which may be the journal, publisher, and/or any co-authors, unless you are the sole copyright holder (depends on your publishing agreement).
  • You must upload any permission letters in ETD Administrator as an  Administrative Document  titled “Permission Letter – Do Not Publish.”
  • Your dissertation must be formatted as a single document with consistent formatting and styles throughout. If you are using multiple previously published articles, make sure to make the formatting consistent with the rest of the document.

When using previously published or in-press work, you must disclose this information in your dissertation in the following format:

  • Under the Chapter title, list the full citation for the previously published/in-press article in the citation style used in your Bibliography.
  • If it is a jointly authored article, describe your contribution to the work in a separate sentence.

Example of Dissertation Formatting

Using Other Copyrighted Material in Your Dissertation

If you use third party copyrighted material (images, quotations, datasets, figures), you are responsible for re-use of that material (see the  Policy on Unauthorized Copying of Copyrighted Media ). In many cases, you may be able to use copyrighted material under the “ fair use ” provision of U.S. copyright law. Consult the  PhD Dissertation Formatting Guide  and the  PhD Dissertation Formatting Checklist  for information on how to submit written permission from a copyright holder. Typically, you will need to request a permission letter and upload the letter as an  Administrative Document  in  ETD Administrator .

If you still have questions regarding copyright and “fair use” refer to the  Penn Libraries Copyright Guide  or email  [email protected]  for further support.

Patent and Intellectual Property

Any inventions that you make as part of your research for your degree and disclosed as part of your dissertation, and any patent or other intellectual property rights arising therefrom, are governed by the policies of the University of Pennsylvania, including the  Patent and Tangible Research Property Policies and Procedures  and  Policy Relating to Copyrights and Commitment of Effort for Faculty.  For more information, please contact the  Penn Center for Innovation .

There are strict deadlines under U.S. and international law regarding the timing for filing patent applications and the public availability of your dissertation. Contact the  Penn Center for Innovation  to discuss whether there might be a patentable invention disclosed in your dissertation prior to deposit of your dissertation.

Frequently Asked Questions

Do I have copyright over my dissertation?

Yes. According to U.S. copyright law, you have copyright immediately and automatically over any of your new, original works in a “fixed, tangible form” (i.e., written down, recorded, etc.). You do not need to register or to include a copyright symbol © or any other formal marks to secure your copyright, though there are some benefits to doing so. See the Copyright Guide for more information or email [email protected] for further support.

Should I register the copyright in my dissertation with the U.S. Copyright Office? 

It depends on what you want to do with your dissertation. There are  some benefits to registering the copyright  in your dissertation depending on your future goals. However, keep in mind that you automatically have copyright over your dissertation without formally registering. To learn more about formally registering the copyright in your dissertation, see the  Copyright Guide  or schedule a consultation.  

Should I pay ProQuest to register my copyright?

Note that you already have copyright over your dissertation, but if you would like to  formally register your copyright with the U.S. Copyright Office , you can pay ProQuest to do it for you (you will have the option in ETD Administrator). For less cost, you can register it yourself on the  copyright.gov  web page. Information on registering your copyright is available in the  Copyright Guide . Please keep in mind that if portions of your dissertation are comprised of previously published co-authored material,  you cannot  register your copyright through ProQuest. 

What is a Creative Commons license?

A copyright license grants permission for someone else to use your copyrighted work.  A  Creative Commons  license is one type of copyright license. It works hand in hand with your copyright. It is not an independent type of copyright. By using a Creative Commons license you are telling the world under what circumstances they are able to use your work without asking your permission each and every time.  You can only add a Creative Commons license to your work if you are the copyright holder, and have not transferred your rights to someone else (like a publisher).

You may choose to apply a Creative Commons license to your dissertation by adding it to the copyright notice page; see the PhD Dissertation Formatting Guide for an example. Visit the Creative Commons website to review all the licenses in full detail and select one that fits your needs.

Refer to the  Services for Authors Guide  or  schedule a consultation  to learn more about using a Creative Commons license on your dissertation.

I want to use copyrighted materials in my dissertation. Is that okay?

It depends. If the materials you wish to incorporate into your dissertation are copyrighted, you will need to do a  fair use analysis  for each item you use to determine if you can proceed without getting permission. If you do not feel that you can make a good “fair use” case, you will need to  request permission  from the copyright holder and provide all permission letters as  Administrative Documents  in ETD Administrator. Just because you are using the work for educational purposes does not automatically mean that your work is “fair use” or that you have permission to use the work.  Request a consultation  to learn more about fair use and other copyright considerations.

I want to use my own previously published materials in my dissertation. Is that okay?

It depends. If the materials you may wish to incorporate into your dissertation are published in a journal or other publication, you may need to seek permission from the journal, publisher, or any co-authors. These permission letters must be uploaded as supplementary material in ETD Administrator before the deposit date. Please refer to your publication agreement for further information.

Additionally, using previously published materials as part of your dissertation requires approval of the dissertation committee and Graduate Group Chairperson.

I would like to know more about publishing, copyright, open access, and other/related issues. How can I find out more?

The Penn Libraries offers a range of workshops and presentations on these topics (and other digital skills related topics)  throughout the year . Groups can request a number of these workshops for classes or other group settings. For personal discussions about copyright, fair use, Creative Commons, scholarly publishing, and other related topics, please  contact your subject librarian  for support and further referrals. For more general information about these and related topics, review the  Penn Libraries’ guides  by keyword or subject.

Cornell University Graduate School

Fair use, copyright, patent, and publishing options.

  • Is information that you plan to include from others considered “fair use” and are you acknowledging these sources correctly?
  • Embargo of online copies
  • Creative Commons license
  • Has a patent application been filed (or will one be) on the basis of your thesis or dissertation research?
  • Register for copyright?
  • Supplementary materials
  • Make your work discoverable on search engines?
  • Make your work accessible to people with visual disabilities

1. Is information that you plan to include from others considered “fair use” and are you acknowledging these sources correctly?

You are responsible for acknowledging any facts, ideas, or materials of others that you include in your work. You must follow the guidelines for acknowledging the work of others in the “Code of Academic Integrity and Acknowledging the Work of Others” (published in the Policy Notebook for the Cornell Community).

If you use any copyrighted material in the dissertation or thesis, it is your responsibility to give full credit to the author and publisher of the work quoted. The acknowledgment should be placed in a footnote at the bottom of the first page of the paper or chapter. Additionally, you must determine whether use of the material can be classified as a “fair use” by performing an analysis of your use of each copyrighted item. The Cornell Copyright Information Center’s Fair Use Checklist is a helpful tool for performing this analysis. (See also Copyright Law and the Doctoral Dissertation: Guidelines to Your Legal Rights and Responsibilities, published by ProQuest, or The Chicago Manual of Style, published by the University of Chicago Press.)

If your use of material is not considered a “fair use,” you must obtain written permission from the copyright owner. Two copies of each permission letter must be submitted with the dissertation or thesis. ProQuest has specific requirements for the content of the permission letter. For these guidelines, consult the ProQuest Doctoral Dissertation Agreement form (published by ProQuest).

If you have already published or had accepted for publication part of your own dissertation or thesis material in a journal, depending on the terms of your publication agreement, it may be necessary to write to that journal and obtain written authorization to use the material in your dissertation.

2. Embargo of online copies

The value of your dissertation extends well beyond your graduation requirements. It’s important that you make an informed decision about providing online access, via ProQuest and eCommons, to your work. This decision can expand the visibility and impact of your work, but it can also shape the options available to you for publishing subsequent works based on your dissertation.

ProQuest’s ProQuest Dissertations and Theses (PQDT) database indexes almost all dissertations published in the U.S. and provides subscription access online to the full text of more recent dissertations. ProQuest also sells print copies of dissertations, paying royalties to authors, when they exceed a minimum threshold. Authors retain copyright in the works they submit to ProQuest.

eCommons is a service of the Cornell University Library that provides long-term, online access to Cornell-related content of enduring value. Electronic theses and dissertations deposited in eCommons, unless subject to embargo, are freely accessible to anyone with an internet connection. When submitting to eCommons, you retain copyright in your work. Ph.D. dissertations and master’s theses submitted to ProQuest are automatically submitted to eCommons, subject to the same embargo you select for ProQuest.

Electronic copies of dissertations in PQDT or eCommons may be made accessible immediately upon submission or after an embargo period of six months, one year, or two years. You may wish to consider an embargo period which helps address publishers’ interests in being the first to publish scholarly books or articles, while also ensuring that scholarship is accessible to the general public within a reasonable period of time. Your decision should be made in consultation with your special committee.

3. Creative Commons license

Creative Commons licenses provide authors with a straightforward and standardized means of prospectively granting certain permissions to potential users of the author’s material. Authors may request proper attribution, permit copying and the creation of derivative works, request that others share derivative works under the same terms, and allow or disallow commercial uses. Authors may even choose to place their works directly into the public domain. You will have the option of selecting a Creative Commons license when you upload your dissertation or thesis to ProQuest, and your choice will automatically be applied to the copy of your work in eCommons.

4. Has a patent application been filed (or will one be) on the basis of your thesis or dissertation research?

Cornell University Policy 1.5 governs inventions and related property rights. Inventions made by faculty, staff, and students must be disclosed to the Center for Technology Licensing at Cornell University (CTL). Theses and dissertations describing patentable research should be withheld from publication, in order to avoid premature public disclosure.

Use the delayed release (embargo) option if a patent application is or will be in process, noting the reason for the delay as “patent pending.” If you have any questions, please contact Cornell’s Center for Technology Licensing at 607-254-4698 or [email protected] .

5. Register for copyright?

Copyright law involves many complex issues that are relevant to you as a graduate student, both in protecting your own work and in referencing the work of others. Discussion of copyright in this publication is not meant to substitute for the legal advice of qualified attorneys. A more detailed discussion of copyright law can be found in the publication from ProQuest entitled Copyright Law and the Doctoral Dissertation: Guidelines to Your Legal Rights and Responsibilities by Kenneth D. Crews.

Copyright protection automatically exists from the time the work is created in fixed form and the copyright immediately becomes the property of the author. Registration with the United States Copyright Office is not required to secure copyright; rather it is a legal formality to place on public record the basic facts of a particular copyright. Although not a condition of copyright protection itself, registering the copyright is ordinarily necessary before any infringement suits can be filed in court.

To register a copyright for your dissertation or thesis, register online or download printable forms. You may also request forms by mail from the Information Section, U.S. Copyright Office, Library of Congress, Washington, D.C. 20559, or contact them by telephone at 202-707-3000.

Doctoral candidates: You may authorize ProQuest to file, on your behalf, an application for copyright registration. This option will be presented to you as part of the submission process.

6. Supplementary materials

If supplementary materials (audio, video, datasets, etc., up to 2GB per file) are part of your thesis or dissertation, you may submit them as supplementary files during the online submission process. For help selecting long-lived file formats, note ProQuest’s guidance in their document, “Preparing Your Manuscript for Submission (Including Supplemental Files).” File formats for which ProQuest does not guarantee migration may still have a high likelihood of preservation in Cornell’s digital repository; please see the eCommons help page for further guidance.

Do not embed media files in the PDF version of your thesis or dissertation, as this can significantly increase the size of the file and make it difficult to download and access. Include a description of each supplementary file in the abstract of your thesis or dissertation. You may include an additional supplementary file containing more detailed information about the supplementary materials as a “readme” file or other form of documentation; this is particularly advisable for data sets or code. The Research Data Management Service Group ( [email protected] ) offers assistance in preparing and documenting data sets for online distribution.

7. Make your work discoverable on search engines?

ProQuest offers authors the option of making their graduate work discoverable through major search engines including Yahoo, Google, Google Scholar, and Google Books. If you chose the Search Engine option on the dissertation “paper” publishing agreement or within ProQuest’s ETD Administrator (electronic submission service), you can expect your work to appear in the major search engines.

If you change your mind and do not want your work to be made available through search engines, you can contact customer service at [email protected] or 800-521-0600 ext. 77020. In addition, if you did not initially adopt this option but now want your works made available through this service, contact the customer service group to change your selection.

Please note that search engines index content in eCommons, regardless of the choice you make for ProQuest.

8. Make your work accessible to people with visual disabilities

When creating a PDF version of your thesis or dissertation it is important to keep in mind that readers may use assistive technology such as screen readers to access your document.  Follow best practices to ensure that your thesis or dissertation is accessible to everyone.  These resources may be helpful:

  • Cornell CIT’s guidance for creating accessible PDFs
  • Checking accessibility using Acrobat Pro
  • Embedding alternative text for images in Word
  • Save a Word doc as an accessible PDF

Theses & Dissertations

Copyright is an important component of publishing your dissertation or thesis. You should consider copyright as early in your work as possible, especially if you wish to reuse content from another copyright holder, such as images or figures. Here are some details to consider when reviewing your copyright needs and uses.

For additional information and resources on copyright, please visit the Copyright Guide.

Determining Copyright Ownership

Under Carnegie Mellon University’s  Intellectual Property Policy , you most likely own the copyright to your dissertation. However, if the research was sponsored by the university or conducted under an agreement between an external sponsor and the university, check the agreement to see who owns the intellectual property. When in doubt, consult Carnegie Mellon’s  Center for Technology Transfer and Enterprise Creation  (CTTEC),  412-268-7393  or  [email protected] .

Neither the University Libraries nor ProQuest/UMI require copyright transfer to publish your dissertation. Both require only the non-exclusive right to reproduce and distribute your work.

Copyright Permissions

According to the  Fair Use Policy of Carnegie Mellon University , all members of the University community must comply with U.S. Copyright Law. When a proposed use of copyrighted material does not fall within the fair use doctrine and is not otherwise permitted by license or exception, written permission from the copyright owner is required to engage in the use.

To avoid publication delays, Carnegie Mellon’s Office of the General Counsel encourages graduate students to get permission from copyright holders as early in the dissertation process as possible. This includes permission to use your own previously published work if you transferred your copyright to the publisher. See  Copyright Issues Related to the Publication of Dissertations  for more information.

If you choose to publish your dissertation with ProQuest/UMI, you must sign an agreement indicating that you have the necessary copyright permissions, and provide UMI with copies of the permission letters. If you choose to publish with Carnegie Mellon University Libraries, you need not provide copies of the permission letters. The assumption is that you have complied with university policy.

Registering Your Copyright

The  Copyright Law of the United States  gives the copyright owner the exclusive right to copy and distribute the work, perform and display it publicly, and create derivative works. Copyright owners do not need to register their work with the U.S. Copyright Office to acquire these rights. However, if you own the copyright to your dissertation and you have a compelling need to acquire additional legal rights, such as the right to file a copyright infringement lawsuit, then you should register your copyright with the U.S. Copyright Office.

You can register your copyright using the U.S. Copyright Office’s  eCO Online System  for a fee of $35. Alternatively, if you choose to publish your work with ProQuest/UMI, UMI will register your copyright for you for a fee of $55. (See page 6 of the  ProQuest Publishing Agreement .)

  • Last Updated: May 9, 2024 2:30 PM
  • URL: https://guides.library.cmu.edu/etds

Writing a cumulative dissertation

A cumulative dissertation is a collection of articles which have been published in recognised scientific journals or accepted for publication. My PhD dissertation is a cumulative one and in this blog post I describe its structure and things to pay attention to when writing your own.

Identify articles and contributions

Identify the articles that make up your dissertation and identify the main contributions of these articles. A single article can have multiple contributions and a single contribution can be explored in multiple articles.

For example, consider the following articles with their corresponding contributions:

The first three contributions (ingredients) are used in the fourth and final contribution (recipe).

The structure of your dissertation looks as follows (in chapters):

  • Introduction
  • Chapter about contribution 1
  • Chapter about contribution 2

Chapter 1 provides an introduction to the work that is described in chapters 2 and onwards. Chapters 2 until the second to last align with the articles/contributions. The final chapter concludes the dissertation and looks beyond the work conducted during the PhD.

Considering that we have four articles/contributions in our example, its structure looks like this:

  • How to better harvest bananas
  • Discovery of a brown magic powder
  • What do almonds and butter have in common?
  • 3-Ingredient brownies

Note that the titles of the chapters do not necessarily need to be the same as the title of the articles. They can be altered if it better fits the story.

For example

  • Ingredient: bananas
  • Ingredient: cacao powder
  • Ingredient: almond butter
  • Recipe: 3-Ingredient brownies
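
The chapter layout described above (an introduction, one chapter per contribution, and a concluding chapter) can be sketched in a few lines of code. This is only an illustration of the numbering scheme; the function name `build_outline` is hypothetical, and the chapter titles are taken from the running example.

```python
def build_outline(contribution_titles):
    """Return numbered chapter headings for a cumulative dissertation.

    Chapter 1 is the introduction, the middle chapters follow the
    contributions in order, and the final chapter is the conclusion.
    """
    chapters = ["Introduction", *contribution_titles, "Conclusion"]
    return [f"Chapter {number}: {title}"
            for number, title in enumerate(chapters, start=1)]


# Chapter titles from the running example (adjusted to fit the story).
example = build_outline([
    "Ingredient: bananas",
    "Ingredient: cacao powder",
    "Ingredient: almond butter",
    "Recipe: 3-Ingredient brownies",
])

for heading in example:
    print(heading)
```

With four contributions this produces six chapters, so the contribution chapters are numbered 2 through 5, matching the description that they run from chapter 2 to the second-to-last chapter.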

Introduction chapter

Or "the story of your PhD".

The introduction chapter is similar to an introduction section in an article. This chapter is the most important one as it describes the problems the dissertation tackles, what the contributions are, and how the contributions are related to each other. This last point is really important as it turns everything into a single, coherent story! Consequently, you will spend the most time on this chapter.

The chapter includes the following parts:

  • General introduction to the problem space.
  • More detailed introduction to a part of the problem space.
  • Challenges that are tackled in the dissertation.
  • Background information needed to understand the research questions and hypotheses.
  • Research questions and hypotheses based on the challenges, together with the corresponding contributions.
  • List of publications.

Try to make parts 1 to 3 as understandable as possible for people who are not part of your academic community, such as family, friends, colleagues from other departments, and so on. I highly recommend introducing an example as early as possible and relating that example to something that everybody understands.

Parts 4 and 5 should still be as understandable as possible. Of course, here it's hard to avoid all technicalities because you have to get to the essence of your dissertation at some point. But at least try.

Some of your articles might have a research question. If that is the case, great: you can reuse it! But check whether it needs rephrasing to fit your story. If you have articles without a research question, create one that fits both the story and the work described in the article. Ideally, every research question should also have a hypothesis, either taken from the original article or newly created for the dissertation. Note, however, that not every research question needs a hypothesis. In my case I had a more exploratory article without a research question. I created a research question, but not a hypothesis because that did not make sense. This all depends on the article itself. So check your articles and your story, and see what is possible and what is not.

Part 6 contains two lists: one list with the main articles of the dissertation and a list with all publications you (co-)authored during your PhD.

In general, make sure that you have a story (not just a bunch of words and paragraphs) that is clear, not too technical, but still positions and outlines the great contributions of your dissertation.

Aligning this with our example results in

  • General introduction to cooking and desserts.
  • More detailed introduction to creating cake-like desserts with a limited number of ingredients.
  • Challenges:
      • Improve the efficiency of bananas in desserts,
      • Find an alternative way to flavor a cake-like dessert,
      • Find an alternative butter that works in desserts, and
      • Create a cake-like dessert that has at most four ingredients.
  • Background information about desserts, recipes, ingredients, and so on.
  • Research questions and contributions:
      • "How to improve the harvesting of bananas?" with as contribution a new method,
      • "How to flavor desserts through the use of powder?" with as contribution cacao powder,
      • "How to improve the consistency of desserts through a non-conventional ingredient?" with as contribution almond butter, and
      • "How to create a cake-like dessert using bananas, cacao powder, and almond butter?" with as contribution 3-ingredient brownies.

Contribution chapters

Ideally every contribution is contained in a single article and you can put every article in a separate chapter. That way a single research question and hypothesis is aligned with a single chapter. If it reads better (for the story) to put multiple articles in the same chapter, then that is not a problem. Looking at our example we could have something like

  • Ingredients: bananas, cacao powder, and almond butter

This is fine. It is important to note that this all depends on the story you set out in the introduction chapter. So I suggest creating a decent version of that chapter first before moving on to the contribution chapters.

The structure of a contribution chapter is as follows

  • A copy of the original article

The introduction includes how the chapter fits into the story, where in the story the reader is, and what is discussed. Note that the content of the original article can be adjusted, for example, to

  • Rephrase the research questions and hypotheses.
  • Replace words to make them the same across all articles, because you might use different words for the same concept across different articles. It happens to all of us, especially when there are four years between the first and last article.

Conclusion chapter

The conclusion chapter is similar to a conclusion section in an article. It concludes your story and looks at what can be investigated in the future. More specifically, its structure is as follows

  • Impact of contributions
  • Remaining challenges and future directions

The first part reflects on the research questions, hypotheses, and corresponding contributions, and discusses how they tackle the challenges mentioned in the introduction chapter. The second part describes remaining challenges based on aforementioned challenges and contributions.

Remaining challenges for our example are

  • How to create similar brownies with different flavors?
  • The production of almond butter needs to be improved if it is used in more and more recipes.

Additionally, the second part also discusses future directions:

  • What we can do next to tackle these new challenges.
  • What was already done by you regarding these challenges. This is not needed, but do mention it if you have done something.
  • What your vision is for the future.
  • Can we change the flavor of cacao powder to change the whole flavor of the brownies?
  • Higher quality almond butter can be produced through the use of a dedicated fridge, as described in my most recent article.
  • In the future more and more recipes will rely on almond butter, both from an economic and flavor point of view.

These are the most important things that I have learned during the writing of my dissertation. Note that these are mere suggestions that might or might not work for your dissertation, so do not hesitate to deviate from them if you feel the need to. It is your story after all 😉

If you have any questions or remarks, don’t hesitate to contact me via email or via Twitter.

Munich Medical Research School

Cumulative dissertations

What’s a cumulative dissertation and what are the rules to consider.

A cumulative dissertation combines different publications in well-respected scientific journals into one doctoral thesis. You do not have to request specific permission for a cumulative dissertation at the Faculty of Medicine at the LMU. However, you have to fulfill a number of criteria listed below. Most importantly, you need at least 2 publications, of which at least one is a first-author publication. Both must at least be accepted for publication. In addition, both papers must be published in journals that belong to the top 80% in their field.

Important: A cumulative dissertation is graded not on the papers themselves but on the doctoral candidate's contribution to these publications! Thus, it is crucial to give a detailed explanation of the candidate’s contribution to the papers. This is best done in a separate chapter. The candidate should outline in detail what his or her contribution was, as this is the only way to evaluate the dissertation. Such an explanation has to be provided for each paper contributing to the cumulative dissertation whenever there was more than one author. This means the explanation of the candidate’s contribution is also necessary for papers where the candidate is the first author.

Shared first authorship is also permitted but has to be explained clearly.

Since publications that are to be part of a cumulative dissertation must have been demonstrably produced as part of the doctoral project, the supervisor or at least one of the TAC members must be a co-author of these publications.

All the important information about the cumulative dissertation is summarized in our guidelines.

The information in these guidelines can be found in the corresponding Examination Regulations or was decided upon by the Doctoral Committees.

Prerequisites for a cumulative dissertation

  • You do not need to apply to submit a cumulative dissertation. However, only original work published in the top 80% of the subject-related journals will be accepted for cumulative dissertations.
  • A minimum of 2 articles must be accepted or published in peer-reviewed, international journals. At least one of these articles must be published as first author. Shared first authorship is possible and will be accepted as a regular first-author publication. Please note that, as with co-authors, your own contribution must be shown in great detail. Further, you must submit an explanation as to how the shared first authorship came to be. For dissertations in human medicine and dentistry that were registered after October 1, 2018, a publication-based dissertation can be submitted even if only one very high-quality publication is available. The prerequisite is that the doctoral candidate is the sole first author of the publication and that it has been published in a journal which, based on its impact factor, is among the best 30% in the respective field.
  • The publications must be original work.
  • For all dissertations in human medicine, dentistry and human biology (aiming for a Dr. med., Dr. med. dent. or Dr. rer. biol. hum.) that were registered before October 1, 2018 (under the “old study regulations”), publications used in the cumulative dissertation are not allowed to be part of another cumulative (current or completed) dissertation.
  • The following publication forms are not allowed to or can only be used in certain cases for a cumulative dissertation:
  • Short Report – if it corresponds to a publication on original work in form and content, then it can be used. Subject to decision on a case-by-case basis by the Doctoral Committee.
  • Letter – if it is published in a journal with a double-digit impact factor and the data presented is equivalent to that of original work, then it can be used. Subject to decision on a case-by-case basis by the Doctoral Committee.
  • Methodological Publications . Subject to decision on a by-case basis by the Doctoral Committee.
  • Meta-Analysis – you can submit a maximum of one meta-analysis for your cumulative dissertation. The second publication must be based on original work. Cochrane reviews and comparable systematic reviews are treated as equivalent to a meta-analysis.
  • The following publication forms may not be used: review-articles, case studies.
  • Unpublished manuscripts, review-articles and case studies, as well as short reports, letters and methodological publications that were not found sufficient after case-by-case evaluation by the Doctoral Committee, may nevertheless be included additionally in the dissertation if they are necessary for the understanding of the work. They must be clearly marked as additional contributions and placed in an appendix at the end of the dissertation. These additional contributions can in no case replace the two main articles required under section A2. Additional contributions can only help to provide better scientific context and a more complete picture of the candidate’s scientific work. Please note that the Appendix of the dissertation is not a "collection point" for manuscripts and articles.

Formal composition of a cumulative dissertation

  • The dissertation can be submitted either in German or English. Please note: only dissertations written in English are accepted for the PhD!
  • Cover page (title, name, place of birth, year)
  • Table of contents
  • Abbreviations
  • Publication list
  • Confirmation of co-authors: The contribution of all co-authors must be confirmed and submitted separately. This also applies to additional contributions as outlined in section A6. The forms for the confirmation of co-authors will be generated for you in your Campus Portal account. Please use these templates. Please do not include the signed lists in your bound thesis; hand them in together with all other documents when you submit your dissertation.
  • Introduction: The publications must be preceded by an introduction (5 – 10 pages, German or English) that describes the research project and shows which overarching problem connects the publications and which aspects are highlighted by the individual papers. In case of co-authorship, your own contribution must be described in detail for every original work (section A2) and all additional contributions (section A6). This also holds true for your first-author publications. For publication-based dissertations with only one very high-quality publication (see A2), the introduction must be very detailed and integrate the work into the scientific context (about 10 pages). It is highly recommended to explain your own contribution in a separate chapter.
  • Summary: The summary must be submitted in both German and English. Please note: for the PhD, points 8 and 9 are combined in an “introductory summary”, which is to be written in English. For publication-based dissertations with only one very high-quality publication (see A2), a detailed summary in your own words must be written (usually 2 pages). This summary must clearly explain the doctoral student's own contribution (ideally in a separate chapter).
  • Publication I (including full details on title, authors, journal, year, issue. The publication must be included in the dissertation that you submit for evaluation; a link to the publication is not sufficient at this point!)
  • Publication II (including full details on title, authors, journal, year, issue. The publication must be included in the dissertation that you submit for evaluation; a link to the publication is not sufficient at this point!) You are, of course, allowed to submit more than 2 publications for a cumulative dissertation, as long as these fulfil the criteria mentioned under (A).
  • Acknowledgements
  • Curriculum vitae

Style templates for cumulative dissertations

We have prepared optional style templates for cumulative dissertations. You can find each template under "thesis submission" for your respective title.

  • Doctor of medicine, doctor of dental medicine and doctor of human biology (old examination rules)
  • Doctor of medicine, doctor of dental medicine and doctor of human biology (new examination rules)
  • Doctor of natural sciences
  • Ph.D. in Medical Research

Identifying constitutive articles of cumulative dissertation theses by bilingual text similarity. Evaluation of similarity methods on a new short text task

Handling Editor: Vincent Larivière

  • Funder(s):  Bundesministerium für Bildung und Forschung
  • Award Id(s): 01PQ16004 , 01PQ17001

Paul Donner; Identifying constitutive articles of cumulative dissertation theses by bilingual text similarity. Evaluation of similarity methods on a new short text task. Quantitative Science Studies 2021; 2 (3): 1071–1091. doi: https://doi.org/10.1162/qss_a_00152

Cumulative dissertations are doctoral theses comprised of multiple published articles. For studies of publication activity and citation impact of early career researchers, it is important to identify these articles and link them to their associated theses. Using a new benchmark data set, this paper reports on experiments of measuring the bilingual textual similarity between, on the one hand, titles and keywords of doctoral theses, and, on the other hand, articles’ titles and abstracts. The tested methods are cosine similarity and L1 distance in the Vector Space Model (VSM) as baselines, the language-indifferent methods Latent Semantic Analysis (LSA) and trigram similarity, and the language-aware methods fastText and Random Indexing (RI). LSA and RI, two supervised methods, were trained on a purposively collected bilingual scientific parallel text corpus. The results show that the VSM baselines and the RI method perform best but that the VSM method is unsuitable for cross-language similarity due to its inherent monolingual bias.

1. INTRODUCTION

1.1. Background and Motivation

What is the contribution of early career researchers (ECRs) to a country’s research output? This question is currently of high science-political interest in Germany and of similarly high practical difficulty to answer ( Consortium for the National Report on Junior Scholars, 2017 , p. 19). The training of qualified research workers is widely regarded as a core mission of universities and accordingly the performance of universities and departments with respect to the training of ECRs plays a prominent role in research evaluation systems. Yet, despite the acknowledged interest in performance of ECRs in terms of scientific output—publications and their citations—this facet of performance, research output, has so far not become part of university evaluation systems or national scale monitoring instruments. Beyond these science-political considerations, the research contribution and performance of ECRs is intrinsically interesting. Comprehensive performance data would enable longitudinal observation and trend detection, comparisons between ECRs of different disciplines, and perhaps the detection of effects of political interventions or different institutional conditions across legislatures (federal states) and organizations (universities) pertaining to ECR performance.

PhD theses are the primary published research output of a completed PhD degree in Germany because the full thesis needs to be published in some format for a degree to be conferred. The publication format may be a regular book with a scientific publishing house or a digital document deposited at a university repository. The regulations vary and are locally determined at the university or department level. Many doctoral students also publish in periodicals and contribute to conference proceedings and edited book chapters. These articles by one PhD candidate might be collated, supplemented with introduction and conclusion material, and submitted as a cumulative PhD thesis, which still needs to be published as a unit. The other class of theses is monograph theses, which have been designed as one single work from the outset and which are not published in parts otherwise.

The importance of cumulative (i.e., article-based) dissertations in Germany can be seen from a number of recent surveys among PhD students and graduates. A 2014 survey of PhD graduates found that 14.5% completed a cumulative thesis while 84% handed in a monograph thesis (Brandt, Briedis et al., 2020). This survey also inquired about the number of cumulative articles. The mean number was 4.0 and the median 3. An analysis by the German Federal Statistical Office found that in 2015 the share of PhD students working on a cumulative dissertation was 23% while 77% worked on monograph theses (Hähnel & Schmiedel, 2016). For seven science domains, the figures for cumulative dissertations varied between 13% in language and cultural studies and 60% in agricultural and food sciences. A 2018 survey asked PhD students about the planned format of their thesis: 25% planned a cumulative thesis, 57% a monograph thesis, and the bulk of the remainder were undecided (Adrian, Ambrasat et al., 2020). In summary, while the monograph dissertation remains the more common format, the importance of cumulative dissertations is substantial and increasing. Therefore, both theses and constitutive articles need to be taken into account in studies of knowledge production and the citation impact of ECRs.

Our study aims to partially assemble the technical requirements for bibliometric studies of the output of doctoral degree holders, which so far are lacking. In particular, we evaluate several methods of short text similarity calculation on their ability to support the identification of the elemental articles of cumulative PhD theses. This is only one component of a complete PhD candidate article identification system, as will be discussed below, but a centrally important one. It is necessary to identify thesis-related articles in the first place because, in the case of Germany, there is no public register of PhD students or graduates, nor is there a comprehensive source of persistent identifiers of PhD students or graduates and their nonthesis articles 1 . With a central register containing PhD candidate names and their university affiliations, it would be possible to do comprehensive and targeted searches of publication databases. Yet, without due caution the results would probably contain inaccuracies. As different persons can have the same name and a single person can publish with different names, an author name disambiguation system is a general requirement for high-quality author-level data. Not all publication databases have such systems, and some vendors do not report on their matching quality. Even if there were a perfect author name disambiguation system, one would need a reliable automatic system for identifying the correct author record among candidates (identity matching) because of name homonymy (different persons with the same name) 2 . Another reason why information on names and university affiliations is insufficient for finding thesis-associated articles is that far from all doctoral candidates are formally affiliated with the universities at which they obtain their degree (e.g., Gerhardt, Briede, & Mues, 2005 ).

It is possible to bypass the author identity problem by simply considering the author names of a thesis and of candidate-associated articles as only one feature of a larger, jointly used feature set. In other words, instead of matching articles via their disambiguated authors, one can match theses and articles via author names (rather than author identities) and a suite of other features, such as information on affiliation, publication year, and topic. This is the approach pursued in the project of which this study is a part 3 . We anticipate handling the matching of thesis records and candidate article records by supervised classification algorithms. As a recent example of this approach, Heinisch, Koenig, and Otto (2020) , in a thematically closely related study, perform machine learning-based record linkage on bibliographic data on German doctorate recipients with administrative labor market data to trace their career outcomes.
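
As a rough illustration of the feature-based matching idea described above, the following sketch builds a feature record for one thesis-article candidate pair, which a supervised classifier could then consume. The field names (`author`, `authors`, `university`, `affiliation`, `year`), the function `pair_features`, and the sample records are all hypothetical and are not taken from the study; the actual feature set used in the project is not specified here.

```python
def surname(name):
    """Surname part of a 'Surname, Given' string, lowercased (assumed name format)."""
    return name.split(",")[0].strip().lower()

def pair_features(thesis, article, topic_similarity):
    """Features for one thesis-article candidate pair (illustrative selection only)."""
    article_surnames = {surname(a) for a in article["authors"]}
    return {
        # Author name used as a plain feature, not as a disambiguated identity
        "surname_match": surname(thesis["author"]) in article_surnames,
        "same_affiliation": thesis["university"].lower() == article["affiliation"].lower(),
        "year_gap": article["year"] - thesis["year"],
        # Topical similarity is computed elsewhere and passed in
        "topic_similarity": topic_similarity,
    }

# Made-up records for demonstration
thesis = {"author": "Doe, Jane", "university": "Example University", "year": 2015}
article = {"authors": ["Roe, Max", "Doe, J."], "affiliation": "example university", "year": 2014}
features = pair_features(thesis, article, topic_similarity=0.42)
```

The point of the design is that no single feature has to be decisive: a surname match alone would be ambiguous, but combined with year, affiliation, and topic it can become informative for a classifier.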

An important feature for matching cumulative theses and their constitutive articles is their topical similarity ( Echeverria, Stuart, & Blanke, 2015 ; Zamudio Igami, Bressiani, & Mugnaini, 2014 ) and we investigate in this paper the optimal computation of topical similarity under the specific conditions of the task at hand. In other words, due to the complexity of the subtask of finding a good topic similarity measure for this specific application, in this contribution we cannot address the entire PhD candidate article identification system, but focus on this important subtask. It is important to mention this context to preclude the misapprehension that the similarity measures studied will be used in isolation to identify articles constituting cumulative PhD theses. The basic premise is that articles on the same or a similar topic are more likely to be proper parts of a given thesis than topically remote or unrelated articles, even by the same authors. In addition, topical similarity can be useful to distinguish between articles by different authors with the same name or name abbreviation in the aforementioned automatic classification stage. The results obtained in this study could therefore also inform future research in author disambiguation. Topical similarity is most conveniently operationalized as textual similarity. While other operationalizations are also appropriate, such as distance in the citation network, the basic data for such approaches is not directly available in the dissertation bibliographic data at hand.

1.2. Contribution of This Study

The best available dissertation data (national library bibliographic records) only contain the titles and, for a subset, content descriptor terms assigned by catalogers. Dissertation titles can be quite succinct—two examples from the data set introduced later are “On operads” and “Fairer Austausch” (“Fair exchange”). This is less of a problem for the candidate associated articles to be matched because their metadata usually also contains an abstract. Therefore this task is an example of the short text similarity problem, which is an area of intense specialized research at the intersection of information retrieval and natural language processing in recent years ( Kenter & De Rijke, 2015 ). The difficulty in calculating similarities for short texts is that two short texts on the same topic are not very likely to use the same terms (the vocabulary mismatch problem). Methods based on exact lexical matching of terms are thus likely to be inaccurate because of the restricted amount of information.
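
The vocabulary mismatch problem can be made concrete with a small sketch. It computes word-level cosine similarity and L1 distance on term-frequency vectors, plus a character-trigram overlap, for the thesis title quoted above and its English rendering. The tokenization, the length normalization for L1, the padding scheme, and the use of Jaccard overlap for trigrams are assumptions made for illustration and may differ from the variants evaluated in the paper.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector (whitespace tokenization, lowercased)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def l1_distance(a, b):
    """L1 distance between length-normalized term-frequency vectors (0 = identical, 2 = disjoint)."""
    ta, tb = sum(a.values()) or 1, sum(b.values()) or 1
    return sum(abs(a[t] / ta - b[t] / tb) for t in a.keys() | b.keys())

def char_trigrams(text):
    """Set of character trigrams of the lowercased, space-padded text."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def trigram_similarity(x, y):
    """Jaccard overlap of character trigram sets."""
    gx, gy = char_trigrams(x), char_trigrams(y)
    union = gx | gy
    return len(gx & gy) / len(union) if union else 0.0

thesis_title = "Fairer Austausch"
english_text = "Fair exchange"
word_cos = cosine(term_vector(thesis_title), term_vector(english_text))
word_l1 = l1_distance(term_vector(thesis_title), term_vector(english_text))
tri_sim = trigram_similarity(thesis_title, english_text)
```

Because the two texts share no whole word, the word-level cosine is zero and the L1 distance is at its maximum, although the texts mean the same thing. The trigram overlap is nonzero only because "Fairer" and "Fair" happen to share characters, which is a surface effect rather than a semantic one.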

The textual data are domain specific, as they all are formal reports of scientific research. Scientific text usually contains many specialized technical terms and may use some words with specific meanings other than those in common language use. Therefore, methods and language resources designed for domain-general text, such as news, might not be ideal.

Dissertation theses from Germany are typically written in either German or English, other languages being uncommon. For cumulative theses, the incorporated articles need not be in the same language as the thesis title might indicate. It follows that a text similarity method should be able to measure similarity across different languages.

The present paper reports on a study in applied cross-language information retrieval (CLIR) for the purpose of science studies. No new methods are developed, but existing methods are applied to a novel task. The above presented combination of specific factors of the nature of the task means that we cannot simply rely on prior descriptions of the performance of text similarity methods, as these were evaluated on very different problems. Measuring the textual similarity between doctoral theses and their possible constitutive articles on the basis of bibliographic data in a cross-language setting has to our knowledge not been studied before. It was therefore necessary to collect an appropriate ground truth sample to evaluate the studied methods. Furthermore, we also collected domain-specific translated texts to train the tested supervised methods on appropriate data. As the conventional evaluation metrics cannot be applied because of the cases for which no matches should be retrieved (monograph theses—theses without constitutive articles), the choice of appropriate metrics is discussed in some detail.

While this study is concerned with measuring textual similarity between doctoral theses and associated articles, the task of semantic similarity calculation between short representations of scientific texts is of wide applicability in science studies. The calculation of text similarity in bilingual scenarios is of particular importance to those national science systems where English is not the native language and where much research is published in other languages, for which it is crucial to determine links with the international English research literature. It should therefore be noted that even though we only consider the specific scenario of German and English language publications, the methods studied here can be used for any language combination.

The paper proceeds as follows. In the next section we review the related literature. In Section 3 we describe the data sets used in this study. Data preprocessing and the various tested text similarity methods are treated in Section 4, followed by the presentation of our results (Section 5) and a discussion of these findings (Section 6).

2. LITERATURE REVIEW

We focus here, first, on prior research in the paradigm of distributional semantics 4 for CLIR. We make this restriction because of the decisive advantage of these methods, which is that they do not require lexical matches to calculate text similarities. This is crucial for the task of short text similarity calculation, where the probability that two compared texts include the same terms is inherently small, quite independent of their true topical similarity. Second, we also review the application of the selected methods for similarity calculation in the field of scientometrics in general (beyond cross-language information retrieval) to provide a more specific context for the use of these methods in the present study.

This line of research was inaugurated with the Latent Semantic Analysis (LSA) model ( Deerwester, Dumais et al., 1990 ) which was extended for cross-language retrieval by Dumais, Letsche et al. (1997) . LSA applies statistical dimension reduction (singular value decomposition, SVD) to the sparse weighted term-document matrix created from a text corpus to obtain a smaller and dense “semantic” vector space representation in which both terms and documents are located. For all input documents and terms, profiles of factor weights over latent extracted factors are obtained that characterize these entities based on observed term co-occurrences in the data. LSA has found significant use in the field of scientometrics. Landauer, Laham, and Derr (2004) illustrate the use of LSA for visualizing large numbers of scientific documents by applying the method to six annual volumes of full texts of papers from PNAS. This study highlights the possibilities of interactive, user-adjustable displays of documents. The study by Natale, Fiore, and Hofherr (2012) studied scientific knowledge production on aquaculture using LSA and other quantitative publication analysis methods. This is an interesting case study because LSA as a topic identification method was triangulated with topic modeling and cocitation analysis on the same corpus of documents. Article titles and keywords were used as inputs for LSA and the similarity values between words from the semantic space were visualized with multidimensional scaling. Important for scientometric applications is that the LSA method is not restricted to textual term data, which was exploited by Mitesser, Heinz et al. (2008) and Mitesser (2008) , who applied SVD to matrices encoding papers and cited references of volumes of journals to measure the topical diversity of research and its temporal development, assuming topical structure to be implicit in the patterns of cited literature.
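
The core LSA step described above, a truncated SVD of the term-document matrix, can be sketched in a few lines. This is a toy example under stated assumptions: it uses NumPy, raw counts instead of a weighted matrix, an invented seven-term vocabulary, and k = 2 latent dimensions. It is not the configuration used in any of the cited studies.

```python
import numpy as np

# Toy term-document count matrix (rows: terms, columns: documents).
# Documents 0-2 draw on one vocabulary, documents 3-4 on another.
# Documents 0 and 2 share no term at all.
A = np.array([
    [1, 0, 0, 0, 0],  # operad
    [1, 1, 0, 0, 0],  # algebra
    [0, 1, 1, 0, 0],  # homotopy
    [0, 0, 1, 0, 0],  # category
    [0, 0, 0, 1, 0],  # auction
    [0, 0, 0, 1, 1],  # exchange
    [0, 0, 0, 0, 1],  # market
], dtype=float)

def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def lsa_document_vectors(matrix, k):
    """Document coordinates in the k-dimensional latent space (rows: documents)."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T

doc_vecs = lsa_document_vectors(A, k=2)
raw_sim = cosine(A[:, 0], A[:, 2])                # lexical overlap only
latent_sim = cosine(doc_vecs[0], doc_vecs[2])     # same latent topic
cross_sim = cosine(doc_vecs[0], doc_vecs[3])      # different latent topics
```

In this constructed case, documents 0 and 2 have zero similarity in the original term space but end up close together in the latent space, because both co-occur with the terms of document 1. That is the property that makes the approach attractive when lexical matches are rare.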

Random Indexing (RI) is a direct alternative to LSA with lower computational demands, meaning that it can be applied to much larger corpora. Sahlgren and Karlgren (2005) used the RI approach to CLIR for the task of automatic dictionary acquisition from parallel (translated) texts in two languages. Moen and Marsi (2013) experimentally studied the performance of RI in ad hoc monolingual and cross-language retrieval (German queries, English documents) on standard evaluation data sets. Their variant of the RI method only used a translation dictionary but no aligned translated texts. RI compared unfavorably to the standard VSM and dictionary-based query translation in CLIR 5 . To this day, there is little empirical research on CLIR applications of RI and none on short text similarity, to the best of our knowledge. RI has been introduced into the domain of science studies by Wang and Koopman (2017), describing the application of the method to a benchmarking data set used for testing different approaches to scientific document clustering for automatic data-driven research topic detection. A critical particularity of their method is that each document is represented by features of distinct types, namely topic terms and phrases extracted from titles and abstracts, author names, journal ISSNs, keywords, and citations. For each such entity, a 600-dimension real-valued vector representation is learned by random projection from their co-occurrences in the corpus. Next, a vector representation for every document is calculated as the weighted centroid of the vectors of its constituting entities. Clustering algorithms are then applied, one to the set of semantic document vectors directly, and another one to a network of similarity values of each document to its nearest neighbors, in which similarities are calculated as the cosine of the document vectors.
Their implementation of RI is further developed in Koopman, Wang, and Englebienne (2019) with improved entity weighting and entity vector projection giving better representations. This study showed the application of the method to a different task of relevance to scientometrics: automatic labeling of documents with terms from a large controlled vocabulary. A version of RI was benchmarked against competing state-of-the-art word embedding methods, trained on the same data, at predicting known withheld Medical Subject Headings for biomedical papers, in which it achieved good results.
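
The following is a minimal sketch of document-based Random Indexing on aligned bilingual text, written to illustrate the general mechanism discussed in the preceding paragraphs. It is not the implementation of any cited work. The dimensionality, the number of nonzero entries, the unweighted centroid, and the three aligned text pairs are all illustrative assumptions (the approaches described above use, for example, weighted centroids and far larger corpora).

```python
import math
import random

DIM = 512      # length of the random index vectors (illustrative)
NONZERO = 8    # number of +1/-1 entries per index vector (illustrative)

def index_vector(rng):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec

def train(aligned_pairs, seed=0):
    """Each aligned pair gets one index vector; every term in either
    language accumulates the index vectors of the pairs it occurs in."""
    rng = random.Random(seed)
    term_vecs = {}
    for german, english in aligned_pairs:
        pair_vec = index_vector(rng)
        for term in set(german.lower().split()) | set(english.lower().split()):
            acc = term_vecs.setdefault(term, [0.0] * DIM)
            for i, x in enumerate(pair_vec):
                acc[i] += x
    return term_vecs

def text_vector(text, term_vecs):
    """Unweighted centroid of the vectors of all known terms in the text."""
    known = [term_vecs[t] for t in text.lower().split() if t in term_vecs]
    if not known:
        return [0.0] * DIM
    return [sum(col) / len(known) for col in zip(*known)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Made-up aligned (German, English) text pairs
pairs = [
    ("fairer austausch", "fair exchange"),
    ("fairer handel", "fair trade"),
    ("operaden und algebren", "operads and algebras"),
]
model = train(pairs)
query = text_vector("fairer austausch", model)
related = cosine(query, text_vector("fair exchange", model))
unrelated = cosine(query, text_vector("operads", model))
```

Terms that occur in the same aligned pairs receive the same accumulated vectors regardless of language, so a German text and its English counterpart land close together without sharing a single word. No matrix decomposition is needed, which is where the lower computational cost relative to LSA comes from.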

Vulić and Moens (2015) introduced a comprehensive CLIR system using word embeddings of several languages in one common vector space, which they call “shared inter-lingual embedding space.” They point out that in the word embedding retrieval paradigm, monolingual search and cross-lingual search can be integrated into one system in which search within a single language only uses that part of the system relating to one language. Cross-lingual and monolingual search in any of the supported languages are combined seamlessly in multilingual word embedding-based systems, obviating the need for query expansion or result list merging inherent in machine translation-based systems. The study also demonstrated that bilingual embeddings viable for cross-language ad hoc retrieval can be obtained from document-aligned parallel corpora and that finer-grained information, such as sentence or word alignments, is not required. The system of Vulić and Moens (2015) relies on bilingual pseudodocuments, documents formed by merging and shuffling terms from aligned documents in two languages, similar to the method of cross-language LSA ( Dumais et al., 1997 ) and showed very competitive results on standard benchmarking data sets for ad hoc CLIR, outperforming prior state of the art models in their test setting (English and Dutch queries and documents). Their results further show that for constructing the document-level representations from the term vectors in the word embeddings CLIR paradigm, a standard term weighting approach outperforms an unweighted additive approach to composition.
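
The merge-and-shuffle construction of bilingual pseudodocuments mentioned above can be sketched as follows. Only the construction step is shown. The function name `pseudo_document`, the whitespace tokenization, and the sample texts are assumptions for illustration, and the subsequent embedding training on the pseudodocuments is omitted.

```python
import random

def pseudo_document(doc_a, doc_b, seed=None):
    """Merge the tokens of two aligned documents and shuffle them, so that
    words from both languages appear in each other's context windows."""
    rng = random.Random(seed)
    merged = doc_a.split() + doc_b.split()
    rng.shuffle(merged)
    return merged

# Made-up aligned document pair (German, English)
german = "fairer austausch von gütern"
english = "fair exchange of goods"
mixed = pseudo_document(german, english, seed=1)
```

A word embedding model trained on such token sequences sees German and English words as neighbors, which is how a shared vector space can emerge from document-level alignment alone.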

To conclude this overview, interested readers are directed to Ruder, Vulić, and Søgaard (2019) for a comprehensive survey of cross-language word embedding models, categorizing them by the required training data according to type of alignment (word, sentence, document) and the comparability (translated or comparable).

3.1. Bilingual Dissertation-Article Pairs

Because the objective of this study is to compare methods of measuring text similarity between cumulative doctoral theses and the articles they are comprised of, we created a manually curated ground truth data set of doctoral theses accepted at German universities and their associated Scopus-covered articles which we use to evaluate the performance of the methods. We chose the bibliographical and bibliometric database Scopus as a source for article-level bibliographic data, as many thesis-related articles are published in the German language and Scopus covers more German-language literature compared to Web of Science. For the period from 1996 to 2017, Scopus contained around 694,000 German-language article records and Web of Science around 500,000 records.

The German National Library (Deutsche Nationalbibliothek, DNB) catalog currently provides the most comprehensive source of data on German dissertation theses. As part of its legal mandate the DNB collects and catalogs all works published in Germany and universities regularly submit dissertation and habilitation theses as deposit copies to the DNB. The DNB collection mandate extends to theses accepted at German universities but published by foreign publishing houses. Nevertheless, there is no reliable information on the completeness of the DNB thesis collection. The DNB catalog clearly identifies all dissertation theses but may contain multiple versions of one thesis, such as a print version, a digital version, and a commercially published version. These versions of one work are not always linked and need to be de-duplicated for analytical purposes. DNB dissertation data has recently been used several times in scientometric research ( Heinisch & Buenstorf, 2018 ; Heinisch et al., 2020 ) 6 . DNB dissertation data are a viable, if challenging, data source for studies on German PhD theses and there is at present no better large-scale source for German PhD thesis data.

Basic bibliographic data for dissertation theses was therefore obtained from DNB catalog records. Records for all PhD dissertations from the German National Library online catalog were obtained in April 2019 using a search restriction in the university publications field of “diss*”, as recommended by the catalog usage instructions, and publication year range 1996 to 2018. Records were downloaded by subject fields in the CSV format option, except for the subject medicine 7 . In this first step 534,925 records were obtained. In a second step, the author name and work title field were cleaned and the university information extracted and normalized and non-German university records excluded. We also excluded records assigned to medicine as a first subject class, which were downloaded because they were assigned to other classes as well. As the data set often contained more than one version of a particular thesis because different formats and editions were cataloged, these were carefully deduplicated. In this process, as far as possible the records containing the most complete data and describing the temporally earliest version were retained as the primary records. Variant records were also retained separately. This reduced the data set to 361,655 records, only a small part of which is used in this study.

After these cleaning operations, the DNB dissertation data set contains the bibliographic information for probably nearly all German nonmedical PhD theses in the period covered. To construct the ground truth data set, the next steps were to identify cumulative theses in the processed DNB data, to extract the bibliographic information of constitutive articles of the cumulative theses, and to link these article-level records with Scopus records.

For the first of the above steps, the identification of a sample of cumulative dissertations, we proceeded as follows. In general, the DNB records do not indicate the type (cumulative or monographical) of dissertations. However, we found and used a small number of DNB records from the above data set containing the phrase “kumulative Dissertation” in the title of the record. These were mostly from a single university. Our second approach was to use the full-text URLs in the DNB data. For those DNB records that contained such a URL we attempted to download the full-text PDF file, if successful, extracted the plain text, and indexed it for searching. While we were able to download many full-texts, the majority of university repository URLs turned out to be outdated and unreachable. We were able to obtain 36,640 thesis full-texts, which were searched for keywords and phrases indicating a cumulative thesis.

As a third approach, we randomly sampled universities and searched their online publication repositories for dissertations containing keywords or phrases indicating cumulative theses. For promising-looking hits, the thesis full-texts were downloaded for examination.

For all theses identified as possible cumulative dissertations through these methods we obtained the published full-texts via university repositories. We manually searched all downloaded full-text PDFs for explicit statements about articles associated with a cumulative thesis. For articles that are described by the thesis authors as being part of the thesis or that appear as chapters, we extracted the corresponding bibliographic data. We thus used three independent and complementary methods to identify theses which might contain information on constitutive articles.

Next, we manually searched for the identified associated articles from the cumulative theses in a snapshot of the Scopus bibliometric database from spring 2019 and assigned the Scopus item identifier to the extracted article records. Only Scopus items with the document types article, review, conference paper, chapter, or book were retained. We also kept track of all examined theses for which no associated articles were indicated in the full-texts. These are also included in the ground truth data as negative cases. This sample is therefore only a convenience sample and not a statistically representative random sample of a population 8 .

The resulting ground truth data set contains 1,181 doctoral thesis records, of which 771 refer to theses with German titles and 410 to theses with English titles. All thesis records are described by bibliographic information from the DNB. Of these records, 449 were identified as cumulative doctoral theses, but 21 of them did not have any Scopus-covered articles. Of the 428 cumulative theses with Scopus-indexed constituent articles, 218 had German titles and 210 had English titles. A total of 732 theses were identified as standalone theses without any incorporated articles. There were 1,499 pairs of theses and Scopus-contained articles out of 1,946 thesis-article pairs in total. The Scopus coverage of this data set’s thesis-associated articles is approximately 77%. The cross-tabulation of thesis language and article language for the subset of the final data set with Scopus-indexed articles is shown in Table 1 . Note that throughout the remainder of the paper we abbreviate German and English in the tables as “de” and “en,” respectively. It can be seen that among German-language theses there is a preponderance of English-language articles and while most articles of cumulative theses with English titles were also written in English, there are also a few German articles among them. However, it should be kept in mind that in the Scopus data there are always English titles and abstracts, while German titles are present additionally for German-language articles. Thus the cross-language problem is possibly at least partly mitigated by the presence of both German and English text for German articles in Scopus.

Table 1. Dissertation thesis title language and article language for data set of cumulative dissertations

This test data set consists, for the doctoral theses, of author names, thesis title in either German or English, German language keywords assigned by DNB catalogers (only partial coverage), publication year, and university. Article bibliographic data from Scopus are comprised of author names, title in English (always present) and German (sometimes present), English abstract, publication year, and disambiguated German institution (if present). For the text similarity task of this paper, only the thesis title and keywords and the article titles and abstracts were used. Copyright statements in Scopus abstracts were removed. Author names and publication years were used for article candidate preselection, as described next.

At this stage, the validation data set consists of all true positive and a limited number of true negative pairs of thesis records and article records. Yet our envisioned PhD candidate article identification system in principle must be able to identify the right article records among all records in the Scopus database. As it would be too computationally costly to actually compare each thesis record with all article records, we have created a heuristic candidate article pre-filter method for the Scopus data as part of our larger article identification system. Because this procedure is only of minor importance to this study, its description is deferred to Appendix A1 in the supplemental material. This filtering stage reduces the number of candidates per thesis record to about 1,500 Scopus article records on average.

3.2. Training Data

Two of the methods that we experiment with, LSA and RI, require parallel bilingual training data. This means texts in the two languages of the models that are direct translations. As we are working only with texts in the scientific domain and large, manually translated text corpora are not available for this domain for the English-German language pair, we obtained purpose-specific training data. Bilingual scientific texts were collected from the abstracts of journals that provide both German and English abstracts (50,184 abstracts). A second source of bilingual data is dissertation abstracts, which were obtained from universities’ publication servers. We collected 30,275 abstracts of doctoral theses from 10 German universities. Furthermore, we used research project descriptions of projects funded by three funding organizations, the German DFG 9 , the Swiss SNF (obtained from the P3 database 10 ) and the EU ERC (obtained from the CORDIS database 11 ). We used 21,609 DFG, 685 SNF, and 4,997 ERC project descriptions.

We also included the German-English dictionary from the BEOLINGUS translation service of Technical University Chemnitz, version 1.8 12 , as doing so generally improves retrieval performance compared to parallel text alone ( Xu & Weischedel, 2005 ).

After preprocessing, the bilingual document-level corpus, without the dictionary, had a size of 14.4 million German and 15.8 million English tokens in 108,000 parallel documents. German documents had on average 134 terms, English documents 146 terms. The dictionary contains 190,000 translations. The entire corpus, including the dictionary, contained 800,000 different German and 240,000 different English terms and is therefore large enough for training in cross-lingual information retrieval ( Vulić & Moens, 2015 ; Xu & Weischedel, 2005 ).

4.1. Preprocessing

The text data of the bilingual training data and the test data (dissertation titles + keywords and article titles + abstracts) is processed by removing stopwords ( R package stopwords ; Benoit, Muhr, & Watanabe, 2020 ), tokenizing (including lowercasing) and language-specific stemming ( R package tokenizers ; Mullen, Benoit et al., 2018 ), removing numeric tokens ( R package tm ; Feinerer, Hornik, & Meyer, 2008 ), and discarding tokens of one or two characters in length 13 . The stemming uses an interface to the libstemmer library implementation of Martin Porter’s stemming algorithm, which continues to exhibit high performance ( Brychcín & Konopík, 2015 ). Stemming helps in overcoming the vocabulary mismatch problem by reducing related terms to one common stem (see, for example, Tomlinson (2009) for German and English monolingual retrieval) while at the same time reducing the size of the vocabulary. This is not beneficial for all terms, as some unrelated terms can be conflated to the same stem. Stemming thus can improve recall while incurring some loss in precision. For the fastText experiment, the terms are not stemmed, as the fastText word embeddings were built with unstemmed text, but are otherwise processed as described.

4.2. Text Similarity Models

A large number of text similarity calculation methods and models have been proposed over the years in the literature. We have chosen three baseline and two state-of-the-art methods based on the following considerations. We employ both simple baselines and advanced methods as we are interested in whether basic methods show sufficiently good performance on this novel task or whether more recently proposed methods rooted in the semantic embedding paradigm can better handle the task. With regard to the specific choice of models, the vector space model (VSM) has been the central paradigmatic approach in the field of information retrieval for decades and is routinely used as a baseline to compare novel methods against. The LSA model is an early but well-studied representative of the semantic embedding family of methods, which was proposed for multilingual retrieval applications. The n -gram similarity method was chosen as it is a conceptually distinct approach from the vector space similarity of all the other considered models and it has shown some promise in multilingual retrieval. For state-of-the-art language-aware semantic embedding methods, we have chosen fastText because of its reportedly good results, wide application, and readily available precomputed vector data, while RI was chosen because it can be trained on custom data with little computational cost and thus serves well for studying the impact of using domain-specific training data to construct task-specific models. As our LSA model is also trained on the same data, we also have an opportunity to compare the performance of these two methods given the same training data.

4.2.1. Baseline models

4.2.1.1. Vector space model

We use the basic VSM ( Salton, Wong, & Yang, 1975 ) as a language-agnostic baseline. In the VSM, documents are represented by weighted term vectors. We apply standard term frequency-inverse document frequency weighting (tf-idf) to mitigate the distorting effects of unequal term occurrence frequencies. Vector representations of any two documents can be compared by several different vector distance or similarity operations and it is not a priori clear which one is best for a specific purpose ( Aggarwal, Hinneburg, & Keim, 2001 ). The conventional choice of similarity measure in the VSM is the cosine similarity and Aggarwal et al. (2001) have shown that for L k norm distance functions with different values of k the choice of k = 1 works well. We therefore experiment with cosine similarity and L 1 distance.
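The two VSM comparisons described above can be illustrated with a minimal Python sketch. It is not the implementation used in the study. The function names, the idf variant log(N/df), and in particular the normalization of the L1 distance (vectors scaled to unit L1 norm and the distance halved, so that documents without common terms receive distance 1) are assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict term -> weight) per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        # Assumed weighting: raw term frequency times log(N / document frequency).
        vectors.append({t: tf * math.log(n_docs / doc_freq[t])
                        for t, tf in term_freq.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zero."""
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    return dot / (norm_a * norm_b)


def l1_distance(a, b):
    """Halved L1 distance between L1-normalized vectors (assumed normalization)."""
    sum_a = sum(abs(w) for w in a.values())
    sum_b = sum(abs(w) for w in b.values())
    if sum_a == 0 or sum_b == 0:
        return 1.0
    terms = set(a) | set(b)
    return 0.5 * sum(abs(a.get(t, 0.0) / sum_a - b.get(t, 0.0) / sum_b)
                     for t in terms)


# Toy stemmed documents: two related English texts and one German text.
docs = [["groundwat", "model", "flow"],
        ["groundwat", "flow", "simul"],
        ["stahl", "korrosion"]]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]), l1_distance(vecs[0], vecs[1]))
print(cosine_similarity(vecs[0], vecs[2]), l1_distance(vecs[0], vecs[2]))
```

In the toy data the first and third documents share no terms, so the cosine is 0 and the normalized distance is at its maximum. This is the situation that vocabulary mismatch produces for most thesis-article pairs in a purely lexical model.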

4.2.1.2. LSA model

As a second baseline model we construct a joint pseudobilingual vector space using LSA from the same preprocessed English-German parallel corpus introduced in Section 3.2 . LSA consists of the application of statistical dimension reduction of the document-term matrix to lower dimensionality to obtain a latent semantics vector space in which terms commonly occurring together in a context (here documents) have similar vectors and in which documents sharing many terms have similar vectors in the same vector space ( Deerwester et al., 1990 ). This method is one way to address the vocabulary mismatch problem for short texts. For two texts to be highly similar according to LSA, they need not share any terms; they only need to contain terms that frequently appeared in the same documents in the training data, or more indirectly, they need to contain terms that appeared in documents that contained other terms that frequently co-occurred in the training data. LSA can be applied to multilingual problems by creating combined multilingual pseudodocuments from translated texts ( Dumais et al., 1997 ). This method is not intrinsically multilingual as there is no information contained in the resulting model about which language a term is from. Therefore, terms of identical spelling with different meanings in different languages will inevitably be conflated to one vector representation.

For this experiment, the preprocessing consisted of tokenization, stopword removal, lowercasing, and language-specific stemming. New pseudodocuments were created by concatenating the German and English texts of each document in the training data. From these processed document representations, a tf-idf matrix was created with the text2vec R package ( Selivanov, Bickel, & Wang, 2020 ). This resulted in an m × n (297,852 × 923,864) sparse document-term matrix M . Truncated Singular Value Decomposition with the R package RSpectra ( Qiu & Mei, 2019 ) was applied to the matrix to obtain the latent space model with t = 1,000 dimensions in which M ≈ U Σ V *, with U an m × t document by latent factors matrix of left singular vectors, Σ a t × t matrix with the t largest singular values on the diagonal used for weighting the latent factors and all other (off-diagonal) elements being 0, and V an n × t term by latent factors matrix of right singular vectors. In this latent space it is possible to locate all input documents and all input terms. New documents are positioned in the same space as 1,000-dimensional vectors computed from those of their terms that also occur in the training data, which makes it possible to obtain the similarity of any two new documents, regardless of language, by calculating the cosine between their vector representations. The dimensionality of the latent vector space is a parameter. Because the latent dimensions are ordered by singular value, a lower dimensionality d can be evaluated by simply using the first d dimensions of the vector space, without recomputing the decomposition. We try parameter values between 100 and 1,000 in increments of 100.
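The placement of a new document in the latent space is commonly written as a folding-in projection. The following formulation is a sketch under stated assumptions: q denotes the n-dimensional tf-idf vector of a new document restricted to the training vocabulary, and V_d and Σ_d denote the first d columns of V and the leading d × d block of Σ. Whether the study rescales by the inverse singular values is not stated in the text, so that factor is an assumption.

```latex
% Folding a new document q into the d-dimensional latent space,
% derived from M \approx U \Sigma V^{*}  =>  U \approx M V \Sigma^{-1}
\hat{q} = \Sigma_d^{-1} V_d^{\top} q,
\qquad
\operatorname{sim}(q_1, q_2) =
  \frac{\hat{q}_1^{\top} \hat{q}_2}{\lVert \hat{q}_1 \rVert \, \lVert \hat{q}_2 \rVert}
```

Terms of q that do not occur in the training vocabulary contribute nothing to the projection, which is why a document consisting only of unseen terms cannot be assigned a similarity value.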

4.2.1.3. Character n-grams

Character n-grams are substrings of n consecutive characters taken from larger strings. Segmenting texts into sets of n-grams allows calculating subword-level similarities between texts and is therefore another method that can partially overcome vocabulary mismatch. N-gram similarity has shown good results in several cross-language applications, in particular in related languages (e.g., McNamee & Mayfield, 2004 ; Potthast, Barrón-Cedeño et al., 2011 ). Cross-language retrieval with n-grams might be assumed to work better for scientific text than general text because many technical terms are highly similar or identical across languages, such as names for health disorders, chemical substances, or organisms. We use the trigram ( n = 3) implementation of the PostgreSQL version 12 module pg_trgm 14 . The module’s similarity() function returns a value between 0 (no common trigrams) and 1 (identical trigram sets) based on the number of shared trigrams. The score is the ratio of the intersection and the union of the unique trigram sets of the two strings (Jaccard index). Similarities with this function are calculated on the preprocessed dissertation and article text data, where the English and German parts have been concatenated, to keep the input texts to all methods constant. Note that the preprocessing has already eliminated some of the language-specific elements, such as inflections and function words. Applying n-gram similarity to this already stemmed data can show whether there is additional benefit in moving further away from the original words than stemming alone does, by splitting the stemmed tokens into n-gram subtokens.
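The trigram Jaccard score can be illustrated with a short Python sketch that approximates the behavior of the PostgreSQL module. The function names are hypothetical, and the word splitting and padding rules (lowercasing, splitting on non-alphanumeric characters, two leading spaces and one trailing space per word) follow the module's documented conventions as we understand them. Edge cases of the actual C implementation may differ.

```python
import re


def trigrams(text):
    """Unique character trigrams of a text, with each word padded by
    two leading spaces and one trailing space (assumed pg_trgm convention)."""
    grams = set()
    # Words are maximal runs of alphanumeric characters, lowercased.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a, b):
    """Jaccard index of the two trigram sets; 0.0 if both sets are empty."""
    grams_a, grams_b = trigrams(a), trigrams(b)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


# A stemmed German term and its English cognate share some trigrams.
print(trigram_similarity("korrosion stahl", "corrosion steel"))
```

Because the score is computed on sets, repeated words and word order do not change the result. Only the inventory of distinct trigrams matters.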

4.2.2. Language-aware semantic word embedding models

Word embedding models are vector representations of words (or terms) of fixed dimensionality learned from natural language corpora. Unlike the classic VSM, the dimensionality of the vector space in word embedding models is far smaller than the number of different tokens in a corpus and semantically similar words have similar vectors. LSA is one early example of word embedding models. More than one language’s words can be represented in a single word space. Such multilingual models can be constructed either by learning simultaneously from parallel translated texts or by aligning pre-existing monolingual models using some external translation data or other alignment data. As there are quite a number of such models ( Ruder et al., 2019 ), we chose two methods that were straightforward to use or implement, did not require external resources or code dependencies, and were known to scale well. Note that, in contrast to LSA, the following two methods do incorporate information about the language of terms and are thus properly multilingual, having different vector representations for terms of identical spelling in different languages.

4.2.2.1. FastText aligned multilingual models

FastText is a state-of-the-art word embedding method which achieves good results by learning from n -gram subword strings, rather than surface word forms, and representing words in the result vectors as the sum of their constituent n -grams ( Bojanowski, Grave et al., 2017 ). This enables the method to overcome difficulties arising from word morphology and rare words. The fastText method is derived from the word2vec Skip-gram with Negative Sampling method ( Mikolov, Sutskever et al., 2013 ). We used the pre-computed multilingually aligned models released by the authors of ( Joulin, Bojanowski et al., 2018 ) 15 , which are trained on Wikipedia in different languages and aligned after training across languages to map all terms into a common vector space. Note that while Wikipedia is a domain-general knowledge source, it does include vast amounts of scientific knowledge. As the current version of the official fastText programming library is no longer compatible with these vectors, we computed document representations in the database as the average of the fastText word vectors looking up only the exactly matching terms. That means that we cannot benefit from the ability of the fastText library to return results for out-of-vocabulary words. Documents are compared by summing the vectors of their respective terms with tf-idf weights, normalizing the result vectors, and calculating the cosine similarity of these aggregate document representations, following the basic method of Vulić and Moens (2015) .
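The document comparison described above can be sketched in a few lines of Python. This is a hypothetical illustration and not the database implementation used in the study. The toy two-dimensional embeddings, the idf values, and the function names are invented for the example, and terms without an exactly matching vector are skipped, as in the lookup-only setting described above.

```python
import math


def document_vector(tokens, embeddings, idf):
    """tf-idf weighted sum of word vectors, L2-normalized.

    Tokens without an exactly matching embedding are skipped.
    Returns None if no token has a vector or the sum is all zero.
    """
    total = None
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue  # out-of-vocabulary term: no subword fallback here
        weight = idf.get(token, 1.0)  # adding once per occurrence yields tf * idf
        if total is None:
            total = [0.0] * len(vec)
        for i, x in enumerate(vec):
            total[i] += weight * x
    if total is None:
        return None
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm > 0 else None


def cosine(u, v):
    """Cosine similarity of two already normalized vectors."""
    return sum(x * y for x, y in zip(u, v))


# Toy aligned embeddings (invented values, two dimensions for readability).
embeddings = {"stahl": [1.0, 0.0], "steel": [0.9, 0.1], "groundwater": [0.0, 1.0]}
idf = {"stahl": 2.0, "steel": 2.0, "groundwater": 1.5}

german = document_vector(["stahl", "stahl"], embeddings, idf)
english = document_vector(["steel"], embeddings, idf)
other = document_vector(["groundwater"], embeddings, idf)
print(cosine(german, english), cosine(german, other))
```

Returning None for documents without any known term mirrors the missing similarity values that can arise for embedding-based methods when none of a document's terms are covered by the model.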

4.2.2.2. Random Indexing

RI is an incremental word embedding construction method and a direct alternative to LSA ( Sahlgren, 2005 ). Both use dimensionality reduction techniques to reduce the sparse term-document matrix of a training corpus into a smaller and dense real-valued vector space. In contrast to LSA, in RI the whole term-context matrix, which is usually very large and extremely sparse, is never materialized, the dimension reduction is less computationally demanding, and the model is incremental: it can be updated without a complete recomputation when new data is to be added. RI works by first assigning each document a different static index vector, a vector of specified dimensionality with values in {−1, 0, 1} drawn from a specific random distribution ( Li, Hastie, & Church, 2006 ). In multilingual RI, there is also a single index vector for each multilingual document. Next, context vectors for each term are created by scanning through all documents. For each term, the index vectors of contexts (documents) in which the term occurs are summed in a single pass through the corpus. In this step, term context vectors for each language are generated separately. This projects both languages’ words into the same random-vector space ( Sahlgren & Karlgren, 2005 ). Reflective Random Indexing (RRI) is the iterative indexing of contexts (respectively terms) with previously obtained index vectors of terms (respectively contexts) instead of random vectors (i.e., higher order indexing; Cohen, Schvaneveldt, & Widdows, 2010 ). This way the model can also learn indirect associations between terms that never co-occur in any documents but which co-occur with terms that co-occur, similar to Second Order Similarity ( Cribbin, 2011 ; Thijs, Schiebel, & Glänzel, 2013 ). Training was done with simple binary occurrence counting of terms in documents: a term was counted as either present or absent, regardless of frequency within a document.
To obtain the similarity of two arbitrary documents, the tf-idf weighted context vectors of their constituent terms are added and normalized and then compared with cosine similarity, following Moen and Marsi (2013) and Vulić and Moens (2015) , just as in the other methods in the vector space paradigm. The dimensionality of the vector space is also a parameter in RI. Unlike in LSA, here the entire indexing process must be worked through for each different dimensionality parameter value. We also test values between 100 and 1,000 in increments of 100.

The RI methods were also implemented in PostgreSQL 12. For convenience, all vectors are L 2 -normalized. We use only second-order context vectors from RRI.
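The first-order indexing step of bilingual RI can be sketched in Python as follows. This is a simplified illustration, not the PostgreSQL implementation, and it omits the reflective second-order step. The function names, the number of nonzero entries per index vector, and the way the random ternary vectors are drawn are assumptions made for the example.

```python
import random
from collections import defaultdict


def index_vector(dim, nonzero, rng):
    """Sparse ternary index vector: `nonzero` random positions set to +1 or -1."""
    vec = [0] * dim
    for k, pos in enumerate(rng.sample(range(dim), nonzero)):
        vec[pos] = 1 if k % 2 == 0 else -1
    return vec


def train_random_indexing(parallel_docs, dim=100, nonzero=10, seed=0):
    """Build term context vectors from (german_tokens, english_tokens) pairs.

    Each bilingual document gets a single index vector. Context vectors are
    kept per (language, term), so identically spelled terms of different
    languages are not conflated. Occurrence counting is binary per document.
    """
    rng = random.Random(seed)
    context = defaultdict(lambda: [0] * dim)
    for de_tokens, en_tokens in parallel_docs:
        idx = index_vector(dim, nonzero, rng)
        for lang, tokens in (("de", de_tokens), ("en", en_tokens)):
            for term in set(tokens):  # set(): present or absent only
                ctx = context[(lang, term)]
                for i, x in enumerate(idx):
                    ctx[i] += x
    return dict(context)


corpus = [(["stahl", "korrosion"], ["steel", "corrosion"]),
          (["grundwass"], ["groundwat"])]
model = train_random_indexing(corpus)
# Terms that occur in exactly the same documents get identical context vectors.
print(model[("de", "stahl")] == model[("en", "steel")])
```

Because each document contributes only one addition per term, new documents can be indexed later without touching the vectors already accumulated, which is the incremental property mentioned above.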

4.3. Remarks

An important distinction between the tested methods needs to be pointed out. The VSM and trigram methods are unsupervised methods: they do not require any training data and work only with the input texts that are to be compared. LSA, fastText, and RI on the other hand are supervised methods. They require training on a text corpus. The fastText vectors we use are the result of training on Wikipedia articles in multiple languages. LSA and RI were trained specifically for this study on the bilingual training data described in Section 3.2 , that is, bilingual scientific texts 16 . In particular, we have chosen to train these methods on whole abstracts (or brief project descriptions) for the domain-specific vocabulary and a dictionary for the domain-general vocabulary rather than only on smaller contexts such as sentences or fixed-size word windows. The reason is that we would like to obtain embeddings primarily optimized for document-level topical similarity rather than word-level similarity. In contrast, fastText uses between one and five surrounding words ( Bojanowski et al., 2017 ).

To conclude the presentation of the models, Table 2 illustrates term similarities for three example terms for three supervised models. These impressions confirm that the models can learn enough from the training data to provide related result terms.

Table 2. Examples of term similarities for three terms in LSA, RI, and RRI models

Note : The term “gdr” is from the abbreviation for German Democratic Republic, “groundwat” is the stemmed form of groundwater, and “stahl” is from German “Stahl” (steel).

For each tested method, 1,728,816 similarity calculations between thesis record representations and prefiltered candidate articles are computed. For the supervised methods, a similarity value cannot be obtained in every case, namely when either the thesis or the article text does not contain any of the terms of the training corpus. This happens rarely and the exact figures will be given in the next section. There are between one and 47,654 similarity calculations per thesis, with an average of about 1,500. Very few of these are true positives and many theses have no true positives.

5.1. Precision and Recall

The evaluation of results for the thesis-article matches data set is not straightforward. The reason is that, for many theses, there are no matching articles, so no matches ought to be found. Such a situation is difficult to evaluate with classic precision and recall methodology, as it presupposes true positives for every query. However, we still calculated precision and recall figures to understand the outcomes of this approach despite our reservations. For each evaluated method, up to 1,000 quantile values of the distribution of similarity scores between dissertation text and candidate article text were calculated, or fewer if fewer distinct values occurred. The similarity scores at these quantiles were used as threshold values. At each threshold score, precision and recall were calculated by classifying all documents with a score greater than or equal to the threshold as positive and those below as negative. We can thus obtain a picture of the possible range of the tradeoff between precision and recall, see Figure 1 . Note that here we only report results for LSA, RI, and RRI with dimensionality 1,000 as we found these to be consistently the best values across the tested parameters. Detailed results for the differently parametrized methods can be found in Appendix A2 in the supplemental material.
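The threshold sweep can be sketched in Python as follows. The function names and the simple position-based quantile rule are assumptions for illustration. The study does not state which quantile definition was used.

```python
def quantile_thresholds(scores, n_quantiles):
    """Distinct score values at evenly spaced positions of the sorted scores."""
    ordered = sorted(scores)
    if n_quantiles < 2 or len(ordered) < 2:
        return sorted(set(ordered))
    picks = {ordered[int(i * (len(ordered) - 1) / (n_quantiles - 1))]
             for i in range(n_quantiles)}
    return sorted(picks)  # fewer thresholds if values repeat


def precision_recall(scores, labels, threshold):
    """Treat every pair scoring >= threshold as a predicted match."""
    true_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    total_pos = sum(1 for y in labels if y)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else None
    recall = true_pos / total_pos if total_pos else None
    return precision, recall


# Toy similarity scores with ground truth labels (1 = constitutive article).
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
for thr in quantile_thresholds(scores, 3):
    print(thr, precision_recall(scores, labels, thr))
```

For a distance measure such as the L1 variant, where smaller values indicate greater similarity, the comparison direction would have to be reversed or the scores negated before the sweep.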

Figure 1. Recall-precision plot.

5.2. Correlation

Another evaluation approach is suggested by the observation that the similarity scores for nonmatches (true negatives) should be as low as possible and those for matches (true positives) as high as possible. We construct a new variable by assigning scores of 0 and 1 for true negatives and positives, respectively, and measure the association between this variable and the empirically measured similarity scores of the tested methods with the point-biserial correlation coefficient r pb ( Tate, 1954 ), which is equivalent to the Pearson correlation coefficient when numerical values are assigned to the dichotomous variable. Table 3 shows the averages of the r pb per method weighted by the number of candidates. Note that the absolute values are all very small and they were multiplied by 1,000 for display in the table.
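For reference, the point-biserial coefficient can be written in its standard textbook form. The symbols are introduced here for illustration: the mean similarity scores of the true positives and true negatives are written with subscripts 1 and 0, n_1 and n_0 are the respective group sizes with n = n_1 + n_0, and σ_s is the standard deviation of all scores computed with denominator n.

```latex
r_{pb} = \frac{\bar{s}_1 - \bar{s}_0}{\sigma_s}
         \sqrt{\frac{n_1 \, n_0}{n^{2}}}
```

The factor under the square root becomes very small when one group is much larger than the other, which is one reason why the absolute values of the coefficient are small in a data set dominated by true negatives.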

Table 3. Point-biserial correlation between ground truth and similarity methods, multiplied by 1,000

However, we have reason to doubt that r pb adequately measures the performance we are really interested in. The data set is strongly dominated by true negatives. Due to vocabulary mismatch arising from the very short texts and the bilingual data, the ordinary VSM methods’ similarity values are overwhelmingly often exactly 0 (cosine) or 1 ( L 1 distance). The other more sophisticated models can compute similarities other than 0 or 1 even if there are no common terms and produce values that are not massively concentrated on one end of the possible range of values. This leads to uninformatively high r pb for the VSM methods. We test this by computing the r pb between a constant (here 0) and the dichotomous match variable. The results are in column “always 0” in Table 3 . This method, equivalent to deterministic rejection of any candidate as irrelevant, achieves the best score according to r pb , confirming that the point-biserial correlation coefficient is not a useful evaluation criterion in this particular setting.

5.3. Global Similarity Scores

We have therefore devised the following evaluation method to test how well the scores for the similarity methods can differentiate between constitutive articles of theses and other articles. First, to establish comparability of similarity across methods, the scores for each method are z-transformed to obtain scores with mean 0 and standard deviation 1. Score values are thus expressed as differences from the overall mean in units of standard deviations. Second, for theses in the sample for which true positives (associated articles) exist, we compute the average of the similarity scores of true positive cases by thesis. Third, for theses without associated articles, we simply select the highest standardized similarity score value. Fourth, we calculate the averages of the scores for the two groups: theses with articles and theses without articles. A good similarity method should have the average similarity value for theses with articles appreciably greater than the average for theses without articles. The results are presented in Table 4 , which shows that the methods LSA, fastText, and trigram cannot achieve standardized scores for true positives greater than those for most similar articles of theses without associated articles. The two VSM variants and the RI methods exhibit much better performance. In particular, the L 1 distance VSM variant shows more than 9 SD differentiation on average, while the difference of the VSM cosine method is about 1 SD, and those of RI and RRI are 0.3 and 0.1 SD, respectively. Note that the value for L 1 distance VSM for the average standardized similarity is negative because the distance values for similar items are smaller than average values, unlike for the other methods, where the similarities are greater for more similar items.
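The four steps of this evaluation can be sketched in Python. The record layout and function name are hypothetical, and the use of the population standard deviation for the z-transformation is an assumption, as the text does not specify the variant.

```python
import statistics
from collections import defaultdict


def global_score_comparison(records):
    """records: iterable of (thesis_id, score, is_true_positive).

    Returns (mean z-score of true positives for theses with articles,
             mean of the highest z-score for theses without articles).
    """
    records = list(records)
    scores = [score for _, score, _ in records]
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)  # assumed: population standard deviation

    # Step 1: z-transform all scores of the method.
    by_thesis = defaultdict(list)
    for thesis, score, positive in records:
        by_thesis[thesis].append(((score - mean) / sd, positive))

    with_articles, without_articles = [], []
    for items in by_thesis.values():
        positives = [z for z, positive in items if positive]
        if positives:
            # Step 2: average z-score of the true positives of this thesis.
            with_articles.append(statistics.fmean(positives))
        else:
            # Step 3: highest z-score among the candidates of this thesis.
            without_articles.append(max(z for z, _ in items))

    # Step 4: group averages (None if a group is empty).
    avg_with = statistics.fmean(with_articles) if with_articles else None
    avg_without = statistics.fmean(without_articles) if without_articles else None
    return avg_with, avg_without


records = [("a", 0.9, True), ("a", 0.1, False),
           ("b", 0.2, False), ("b", 0.0, False)]
print(global_score_comparison(records))
```

For distance-based scores the third step would select the lowest rather than the highest value, since smaller distances indicate greater similarity.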

Table 4. Comparison of standardized similarity scores ( z -scores)

5.4. Local Similarity Ranks

Another issue is that the density of neighbors in similarity vector space is probably not uniform. If there are more and less dense regions, then the global similarity scores are less informative than the local scores, that is, the scores of similarities of candidates for a single thesis. Consequently, it seems more prudent to look at similarity ranks, stratified by thesis, rather than global similarity values. However, as there are no true positives for theses without constitutive articles, we cannot cover these cases by using this approach.

We proceed with the analysis of recall scores at different rank positions across the methods. Figure 2 shows the curves of the recall values for each considered similarity model for ranks 1 through 20; higher ranks are not interesting as any thesis only has a few integrated articles, if any. Again, these values only include observations of theses that do contain published material, not those that do not. RI shows the best performance here, followed by the baseline VSM methods. RI can achieve 0.8 recall at rank 20 on average, out of some 1,500 candidates per thesis.
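As an illustration of the rank-based evaluation, the sketch below computes recall at ranks 1 through 20 with candidates ranked separately for each thesis. The function name, the input layout, and the choice to average recall over theses (rather than pooling all true positives) are assumptions made for this example, not details confirmed by the text.

```python
from collections import defaultdict


def recall_at_ranks(candidates, max_rank=20, higher_is_more_similar=True):
    """Mean recall at ranks 1..max_rank, ranking candidates within each thesis.

    candidates: iterable of (thesis_id, score, is_true_positive)
    (hypothetical layout). Theses without true positives are skipped,
    because recall is undefined for them.
    """
    by_thesis = defaultdict(list)
    for thesis, score, is_tp in candidates:
        by_thesis[thesis].append((score, is_tp))

    per_rank = [[] for _ in range(max_rank)]
    for rows in by_thesis.values():
        total_tp = sum(1 for _, is_tp in rows if is_tp)
        if total_tp == 0:
            continue
        # Most similar candidate first; reverse the order for distance measures.
        ranked = sorted(rows, key=lambda r: r[0], reverse=higher_is_more_similar)
        found = 0
        for k in range(max_rank):
            if k < len(ranked) and ranked[k][1]:
                found += 1
            per_rank[k].append(found / total_tp)

    if not per_rank[0]:
        return []  # no thesis with true positives in the input
    return [sum(values) / len(values) for values in per_rank]
```

The same function can be applied to a subset of candidate pairs, for example only same-language or only different-language pairs, to compare recall curves between subsets.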

Recall across similarity rank positions.


To assess whether the methods are biased in cross-language similarity measurement, we split the validation data and compute recall by rank separately for item pairs with the same language and for pairs with different languages. Figure 3 displays the results. We find that the VSM methods, LSA, and fastText perform much worse in recall if the items are in different languages. Trigram is somewhat less affected, RI is modestly affected, and RRI is only slightly affected, exhibiting the least cross-language bias.

Recall across similarity rank positions by language concordance.


Finally, Table 5 shows the number of missing similarity values per method. There were three cases in which no method could calculate a score, as the processing of the Scopus texts left no terms to represent the documents. The supervised methods had additional missing values in cases when no terms in the processed text, either thesis or article, were present in the training data. However, the number of missing values is very small compared to the overall number of similarity calculations for all methods.

Missing similarity values

In summary, these results indicate that, somewhat unexpectedly, the baseline VSM similarity methods perform quite adequately, particularly when considering global similarity values. However, the best performing method when evaluating recall at low ranks is RI, whereas RRI performs a little worse. The pseudo-multilingual baseline method LSA shows only moderate performance, but clearly works to some extent, as can be seen from its results far exceeding random scores. FastText and trigram exhibit intermediate performance.

Before we proceed to the discussion of the results, a few limitations of this study need to be acknowledged. Because of the large number of choices that can be made in any information retrieval study, it is practically impossible to comprehensively cover all reasonable combinations of methods, settings, parameters, preprocessing steps, and so on. There are many different suggested methods for similarity calculation on vectors, term weighting, vector training, vocabulary pruning, and stop word removal. Parameters for supervised methods such as word context window size or parallel text alignment level could be varied. We have not tested more sophisticated methods for the composition of document-level representations from terms or of preprocessing steps such as decompounding, lemmatization, or word sense disambiguation. All of these factors could influence the results. To keep the scope of the study within feasible limits, we have chosen to apply only basic preprocessing and standard weighting and similarity methods. In the choice of evaluated methods, we have decided in favor of one representative state-of-the-art multilingual word vector method (fastText) and the conceptually attractive but little investigated trigram and RI methods.

ECRs, in particular doctoral students, publish many research outputs. Reliable quantitative estimates of their contribution to the total output of a country have hitherto been elusive, as has the assessment of the scientific impact of their research. Because all graduated PhDs have published a doctoral thesis, we have taken PhD thesis data as the starting point of our approach to quantify doctoral students’ research contributions. Cumulative doctoral theses consist of already published material; therefore it is crucial to identify their associated articles to quantify the citation impact of the doctoral research project as a whole. Moreover, the share of identified associated articles among all of a country’s articles can serve as a lower bound of the scientific contribution of doctoral students in terms of published output. Our prospective system for the identification of PhD thesis articles consists of a candidate article prefiltering stage and a subsequent automatic classification of candidate article records into those that are constitutive articles and those that are not. This second stage is anticipated to be accomplished by supervised machine learning algorithms trained and evaluated on sample data. For this matching of candidate associated articles to doctoral thesis records, not only the author names, authors’ institutional affiliations, and publication dates of candidate matches are important criteria but also the topical similarity of the research outputs. A good measurement of topic similarity can prove crucial in overcoming uncertainties in matching due to name ambiguities.

The text similarity calculation in this setting is demanding because of the brevity of the texts, the use of multiple languages, and the specialized scientific vocabulary. This rules out the unvalidated use of off-the-shelf solutions. No prior work in this setting has come to our attention, so this is a novel task. Following up on the call by Glavaš, Litschko et al. (2019), the present study is also an instance of a “downstream evaluation” of cross-language distributional semantics models. To this end we have tested three baseline and two state-of-the-art short text similarity methods on a custom validation data set. We collected the necessary training and evaluation data sets and tested the five methods’ performance using evaluation measures adapted for the particularities of the data. While this study used German and English language text data, the findings can be informative for any other combination of two or perhaps more languages. Texts were preprocessed for all methods (except fastText) with language-specific stemming, and tf-idf weights for terms were used in all similarity calculations (except trigram).
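To make the baseline concrete, the following is a minimal sketch of tf-idf weighting with cosine similarity and L1 distance on already tokenized and stemmed texts. The specific weighting formula (raw term frequency times log(N/df)), the sparse dictionary representation, and all function names are assumptions for illustration; the study's exact weighting variant is not specified here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dict term -> weight) from token lists.

    Assumption: weight = raw term frequency * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({t: c * math.log(n_docs / doc_freq[t])
                        for t, c in term_freq.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; None if either has zero norm.

    A document left without any weighted terms yields no score, i.e. a
    missing similarity value.
    """
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return None
    return dot / (norm_u * norm_v)


def l1_distance(u, v):
    """L1 (Manhattan) distance; smaller values mean more similar texts."""
    return sum(abs(u.get(t, 0.0) - v.get(t, 0.0)) for t in set(u) | set(v))
```

Note the opposite orientation of the two measures: cosine grows with similarity, whereas L1 distance shrinks, which matters when scores are standardized and compared across methods.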

Our results show that the long-established vector space model of text similarity measurement exhibits quite good performance for this task, likely benefiting from the fact that for one of the texts to be compared (Scopus article records) there will always be some English text and from the specialized scientific terminology. Once we look at the ranking results on the level of matching to individual theses, the limitations of the VSM become apparent as the RI method performs clearly better. The multilingual application of RI has so far only received limited attention (Fernández, Esuli, & Sebastiani, 2016; Moen & Marsi, 2013; Sahlgren & Karlgren, 2005) but the present results are very encouraging. The findings also indicate that the trigram and fastText methods perform moderately well, while LSA is not competitive for this particular task. All methods suffer from some bias when the languages of the compared items differ, but to very different degrees, with RRI being almost unaffected. In conclusion, for the anticipated task of using text similarity as one of a set of features for identifying cumulative theses’ associated articles, a combination of VSM cosine similarity score and RI rank can be recommended, with the proviso that the VSM method is by its nature biased in favor of same-language texts. In addition, we can recommend the use of document records of cumulative doctoral theses and their constitutive articles as benchmark data sets for cross-language short text similarity tasks.

The author would like to thank the Information Management group of Deutsche Forschungsgemeinschaft for providing the bilingual project description data of funded DFG projects and Beatrice Schulz for her help in data collection. This research has made use of the PostgreSQL database system and the contributed extensions aggs_for_vecs and floatvec.

Funding was provided by the German Federal Ministry of Education and Research (grant numbers 01PQ16004 and 01PQ17001).

The author has no competing interests.

Data are made available at https://doi.org/10.5281/zenodo.4733850 and https://doi.org/10.5281/zenodo.4467633, except for proprietary data from Elsevier Scopus. Programming code is made available at https://gitlab.com/pdonner/ri_sql.

Germany does not maintain a central register of active PhD students or graduates (Fräßdorf & Fräßdorf, 2016) and universities have only been required to systematically and comprehensively collect data on current doctoral students since 2017 (Brauer, Oelsner, & Boelter, 2019). These new data are decentralized, not public, and only cover the period since 2017.

To give an extreme example, between 1996 and 2016, according to data collected from the German National Library, which was deduplicated and excludes medical theses, there are 48 persons named Thomas Müller who have authored a doctoral thesis. Of these, two different ones graduated from Heidelberg University in 1999.

Another approach would be to start with the available full-text electronic documents, apply automatic reference identification procedures to extract the cited sources and use these to identify associated articles by the thesis authors. This seems a promising alternative, albeit with the one drawback that some articles that have not been published at the time of the handing in of the dissertation are typically only cited in a provisional way. The larger problem is external to the data itself. As it stands, by no means all dissertations are published as publicly accessible electronic full-text versions. A reference extraction and matching approach would hence either be limited in coverage or need to collect and prepare theses published by publishing houses in book format or deposit copies from libraries. The effort required for this alternative approach was prohibitive in our project, so we decided to work with bibliographic data only.

Or word embeddings, or wordspace, or continuous word vectors, etc. The terminology has not yet stabilized.

Contrary to the standard RI method (Sahlgren, 2005), the authors started out with assigning fixed index vectors to terms, using the same vector for the terms in both languages, rather than starting with index vectors for documents and constructing term index vectors from the document vectors.

For a very limited subset of recent cumulative theses, the DNB data contains information on included articles if these previously published works are completely incorporated in unchanged form. If the full-texts of the theses and the candidate articles are available, the true positive articles should be almost strict subsegments of the thesis they are part of. This suggests that pairs of cumulative dissertation and constituent articles could be ideal true positive gold standards for plagiarism detection methods.

German “Dr. med.” degree dissertations are considered incommensurable to other doctoral degree theses (Senatskommission für Klinische Forschung, Deutsche Forschungsgemeinschaft, 2010; Wissenschaftsrat, 2014).

The data set is available at Donner (2021b).

https://gepris.dfg.de/

https://p3.snf.ch/

https://cordis.europa.eu/

https://dict.tu-chemnitz.de/, https://ftp.tu-chemnitz.de/pub/Local/urz/ding/de-en/

We also experimented with more sophisticated natural language processing by part-of-speech tagging, lemmatization, and extraction of noun phrases. This proved to be too computationally expensive for application to the entire corpus. The question of whether such higher-quality preprocessing can significantly improve results remains an open issue for further research.

https://www.postgresql.org/docs/12/pgtrgm.html

https://fasttext.cc/docs/en/aligned-vectors.html

We make our trained LSA, RI, and RRI models of dimensionality 1000 available in Donner (2021a).


  • Online ISSN 2641-3337

A product of The MIT Press


Elephant in the Lab

Mennatullah Hendawy

How to structure a cumulative dissertation: Five strategies

1 June 2021 | doi:10.5281/zenodo.4786446


In this article, Mennatullah Hendawy shares some insights on structuring cumulative dissertations based on her own experience.

“The whole is other than the sum of its parts” ~ Aristotle~

In general, there are two styles of doctoral dissertations: monographs (thesis as a book) and cumulative theses (thesis by publications/papers). In this article, I will share some insights with regard to cumulative dissertations based on my own experience. A cumulative dissertation consists of a series of papers published, or submitted for publication, during the timeframe of the doctoral study. In addition to the papers, the PhD student is required to create an overarching argument that is presented in the thesis’s introductory and concluding chapters. The number of papers to be published and/or submitted is determined by each university. I noticed that it usually ranges between 3 and 6 papers.


Before I start, let me briefly introduce myself: I am a PhD candidate at the Chair of Urban Design, TU Berlin with Prof. Jörg Stollmann. I started my PhD in May 2017 and submitted my cumulative thesis of 5 papers in January 2021. While I am waiting to defend my thesis, I am writing this article in an attempt to provide some insights into one of the common challenges of conducting a cumulative dissertation: how to structure the series of papers so that they make sense. The papers have to be connected by a thread (sometimes referred to as the thesis’s golden thread), and this thread can be presented as part of the overarching argument of the papers together. While each paper has one or more research questions, all the papers together respond to one central question. I would like to share with you some ideas on how the papers can be combined, creating something bigger than the individual parts. One might assume that it is usually the goal of the first year of a PhD to decide on the structure and scope of the papers, which will be the guiding principles for the next phases of the dissertation. Well, in reality this is not always the case.

Of course the logic of structuring a cumulative thesis depends heavily on the research area and interest. Accordingly, while I share five approaches on structuring cumulative dissertations, I will try to clarify what each approach is suitable for. Each topic can be addressed from different angles, based on the research question, objective, and preferences of the author. 

How my own research interest changed

To proceed from here, let’s take a simple derivative of my thesis topic as the basis for experimenting with the different strategies explained in the following. I wrote my thesis about “The Digitalization of Urban Planning”. Over the years, my overarching research question became: In the mediatized world, how and why do planning visualizations become a question of social and spatial justice? I started the dissertation with a clear interest in exploring the entanglement of urban studies and media studies in relation to issues of justice in cities, but the final overarching research question only became clear towards the end of the thesis. This is because I was following an explorative and grounded research approach. Looking back now, I must admit that the earlier the overarching research question is clear, the easier the research process. Nonetheless, it is also important to stay flexible throughout the thesis process and let it shape the overarching thread. A middle ground would be a good option!

The strategies

By the time I realized all this, I had used more than one strategy to combine and look at the papers. In the following section, some of these strategies are presented in addition to other ideas. This list is surely not exhaustive.

By field

Structuring the papers in a cumulative dissertation by field would make each paper concentrate on one field or context of the topic, where it is practiced. Speaking about my research, the different fields could be the digitalization of planning in planning education, the digitalization of planning in planning practice, politics, culture, context, theory, or research. Following this strategy, each paper would cover one of these fields. This strategy is useful for a thesis that involves an analysis of perception or disciplines and interdisciplinary analysis.

By actors

Structuring the papers by the actors or the participants involved in the research would allow each paper to tackle who is involved in the topic and whose visions are to be explored. Taking the example of my topic of planning digitalization, the papers would focus on the views of planners, policymakers, the general public, and computer scientists. The choice of which actors to highlight in the papers will mainly depend on the overarching research question and objective. This strategy is useful for a thesis that involves an analysis of reviews.

By time

By choosing to structure the arguments chronologically, each paper naturally tackles the “when” in the overall topic. In my case the papers would focus on the printing age, the computer age, and the information age. Another example could be to focus on the different stages of the process of digitalization of planning in each paper. The choice of these processes or temporal milestones reflects how a certain phenomenon has changed throughout history and time. This strategy is useful for a thesis that involves a historical analysis.

By case

In this strategy, each paper takes a case study related to the chosen research topic. Speaking of my research, papers can focus on extreme cases that manifest the research topic, or similar cases that highlight a phenomenon (for more information on the types of cases, Flyvbjerg 2013 is a valuable resource: https://arxiv.org/pdf/1304.1186.pdf). The choice of the cases will mainly depend on the adopted methodology. This strategy is useful for a thesis that involves a comparative analysis of multiple case studies.

By location

In this case, each paper would study a specific location, the “where” in the topic. In my case the papers would focus on the digitalization of planning in different cities, or in different countries, or even highlight different parts within a city (such as formal versus informal, or rural versus urban). The choice of the location will mainly depend on the overarching research question and objective. This strategy is useful for a thesis that involves a geopolitical analysis.

Using more than one?

In my thesis, I combined strategies 1 (by field), 2 (by actors), and 4 (by case). I started out using field and actors, but as I reached the end of the thesis and was looking backward to finalize my overarching argument, I realized that my papers also differed in terms of cases. In fact, one can see these different strategies as a decision on which variables to highlight and which aspects to fix.

At the beginning of the thesis journey, I was interested in writing papers in a way that presented different fields of action and the views of different actors in each field. Thus, I proceeded to look at the mediatization of urban planning in five fields: planning education, planning practice, planning politics, planning context, and planning culture (which I later referred to as communicative situations). In each paper and field, I highlighted specific actors involved in the process (for example, planning educators and students in planning education). Later I realized that, additionally, each paper reflected the use of a specific planning visualization: education curriculum in planning education, street billboards in planning practice, press news in planning politics, city streets in planning contexts, and TV advertisements in planning culture. My overarching concern was to explore the question of “Visible urban visions versus the invisible urban challenges”.

I hope these strategies can be a starting point for those who have chosen the cumulative format for their thesis. Last but not least, I would like to mention that this list is not exhaustive, so it would be great to open up the discussion on other strategies. For any questions, one-on-one discussions, or insights, feel free to reach out to me via LinkedIn.


Author info

Mennatullah Hendawy is a PhD Candidate at the Chair of Urban Design, TU Berlin, where she is also co-leading two research groups: Connecting Urbanity and Towards Equitable Planning Curricula. Mennatullah is also a visiting researcher at the Leibniz Institute for Research on Society and Space in Erkner, Germany, and an affiliated Assistant Lecturer at the Department of Urban Planning and Design at Ain Shams University in Cairo, Egypt. Hendawy is co-founder of Cairo Urban AI and First Degree Citizens, an initiative that tackles critical socio-legal geography. She works on the intersection of urban planning, mediatisation, visualisation, and justice, where she is fascinated by the way knowledge, power, and agency are manifested in and co-construct cities and the public sphere.


Your dissertation

You can fulfill your publication obligation by handing in printed copies or by means of a digital publication. For the exact regulations and the various options, please refer to the doctoral regulations of the individual faculty office of the University of Münster.

Information on the submission of dissertations

We recommend contacting the thesis office by telephone prior to submission. Personal submission is possible during the service hours of the thesis office (Monday to Friday 9 a.m. to 12 p.m.) or after making an appointment by telephone. Contactless submission is possible at any time, irrespective of the opening hours of the thesis office. You can find further details in the information sheet Abgabe von Dissertationen.

Publication types

Monographic dissertation: A monographic dissertation (classical dissertation) generally consists of a not yet published, coherent and self-contained scientific treatise. An integral part of the doctoral process is the publication of the dissertation. You fulfill your publication obligation by submitting a prescribed number of printed copies or publisher's editions to the faculty or the university library. In most faculties digital publication is now also possible. The submission regulations can be found in the respective doctoral regulations and in the information sheets of the doctoral examination office.

Cumulative dissertation: Whether and under what conditions a cumulative dissertation is permitted in a faculty is specified in the respective doctoral regulations.

Cumulative dissertations are second publications. Therefore, check under which conditions the publishers allow you to use the essays in your dissertation. The regulations of your publishing contract are decisive. If you have a publishing contract, ask for the right to store a digital copy of your work in an institutional repository. For more information, see under legal advice. The websites Publishers theses policies (information from the Technische Universität (TU) Berlin) and Sherpa Romeo can provide additional guidance on publishing permissions.

Habilitation thesis: The habilitation serves to formally establish the ability to independently and responsibly represent a scientific field in research and teaching (teaching qualification). The habilitation theses or at least their essential parts are to be published by the habilitation candidates, but there is no general obligation to publish. As a rule, the publication should take place within two years after the determination of the teaching qualification. The habilitating faculty and the university library are each entitled to one specimen copy (of the whole or its parts). Details are regulated by the habilitation regulations of your department.

Offer of the University and State Library of Münster

Publishing the printed version.

The University and State Library of Münster includes your dissertation or habilitation in the library holdings, records the publication in its catalogues and reports it to the German National Library. This makes it searchable via national and international search engines.

Information on general questions and submission formalities can be obtained from the University and State Library of Münster's thesis office.

Publishing digitally on the publication server miami

The University and State Library of Münster publishes your dissertation or habilitation on the university publication server miami and ensures its permanent availability. The publication is reported to the German National Library, listed in library catalogues and can be found via national and international search engines.

Notes on the submission of digital dissertations

All essential regulations for the creation and submission of a digital dissertation can be found in the checklist for authors of digital dissertations. Information on general questions and submission formalities can be obtained from the University and State Library's thesis office.

Submission statement: For the publication of a digital dissertation with miami, the agreements laid down in the Digital Publication Consent Form (in German) (incl. Appendix 1 "Upon publication of a digital dissertation") with the University and State Library of Münster apply.

Copyright: As the author, you grant the University and State Library the simple right of use to publicly reproduce the digital dissertation in data networks and to transmit it for individual retrieval. This means that you, as the author, are in principle free to publish your work elsewhere in digital form or as a printed work. In case of a later publication, please indicate that you have already granted the University and State Library the simple right of use for the digital dissertation. The selected digital publication as a form of delivery cannot be subsequently withdrawn. You can find more details under legal advice.

Digital dissertations are permanently stored on the publication server and cannot be subsequently removed.

File format: The dissertations published with miami must correspond to the print version as a graphical image. We therefore recommend generating the printed version from the PDF file and then converting the file to the archivable PDF/A format. The citation capability then remains guaranteed for the digital version. The PDF/A-1b (basic) standard is sufficient, but the PDF/A-1a (advanced) standard is recommended. For more information about PDF/A, see file creation.

Curriculum vitae / Acknowledgements: According to the requirements of some faculties, the printed dissertation must be accompanied by the curriculum vitae, which, however, is subject to data protection in the case of a web publication. We therefore recommend removing the CV from the digital version before the PDF/A conversion and replacing it with blank pages in order to protect personal data. If you expressly wish the CV to be published in the digital version, an express declaration of consent must be provided for the storage and distribution of this personal data. Please note the corresponding item in Appendix 1 "Upon publication of a digital dissertation" in the consent form. If your dissertation contains a preface or an acknowledgement, please consider personal rights and sustainability there as well. In case of doubt, please contact the University and State Library's thesis office.

What do you have to deliver?

  • File names in lower case, no umlauts and special characters, no spaces
  • Name single-file document by name, e.g., "diss_mustermann.pdf"; uniquely name and enumerate multi-file documents, e.g., "diss_mustermann_01_titel.pdf", "diss_mustermann_02_chapter_1.pdf", etc.
  • Recommendation: Remove CV before PDF/A conversion and replace with blank pages
  • PDF/A files must not have any security settings (e.g. password protection)
  • in German (obligatory) about 1000 characters (incl. spaces), plus 5–7 keywords to describe the content of the work, each separated by semicolon
  • Prescribed number of printed obligatory copies (generated from the PDF file if possible, with included CV if necessary)
  • Digital Publication Consent Form (in German) (incl. Appendix 1 "Upon publication of a digital dissertation")
  • Possibly other forms from the dean's office
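The file-naming rules in the list above (lower case, no umlauts or special characters, no spaces) can be checked mechanically before submission. The sketch below is an illustration only: the function name is hypothetical, and the exact set of permitted characters (a to z, digits, underscore, hyphen) is an assumption, since the list does not spell it out.

```python
import re

# Assumption: only a-z, 0-9, underscore and hyphen are allowed before ".pdf".
_ALLOWED_NAME = re.compile(r"[a-z0-9_\-]+\.pdf")


def is_valid_filename(name):
    """Return True if the file name follows the assumed naming rules."""
    return _ALLOWED_NAME.fullmatch(name) is not None


for candidate in ["diss_mustermann.pdf", "Diss_Müller.pdf", "diss mustermann.pdf"]:
    print(candidate, is_valid_filename(candidate))
```

In case of doubt about a particular character, the thesis office's checklist remains the authoritative source.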

Publish digitally in the University of Münster's publication series, print copies at your choice

If you have completed your doctoral studies with at least "magna cum laude", the University and State Library of Münster offers you the possibility of publishing in the series University of Münster Academic Publications. Your dissertation will be published digitally and you will receive a print-ready file in case you want to order print copies from a publisher or print service provider – for which you are responsible. Habilitation theses can also be published in the series.

Details can be found in the Service Point for Publications.


  • Information on general questions and formalities, submission of dissertations or habilitation theses: Thesis office, University and State Library of Münster
  • Information on publishing with miami or in the University of Münster's publication series, personal advice and support: Service Point for Publications, University and State Library of Münster
  • Questions regarding doctoral regulations and publication types or options: The respective responsible dean's office or examination office

icon

The essay writers who will write an essay for me have been in this domain for years and know the consequences that you will face if the draft is found to have plagiarism. Thus, they take notes and then put the information in their own words for the draft. To be double sure about this entire thing, your final draft is being analyzed through anti-plagiarism software, Turnitin. If any sign of plagiarism is detected, immediately the changes will be made. You can get the Turnitin report from the writer on request along with the final deliverable.

How does this work

Andersen, Jung & Co. is a San Francisco based, full-service real estate firm providing customized concierge-level services to its clients. We work to help our residential clients find their new home and our commercial clients to find and optimize each new investment property through our real estate and property management services.

These kinds of ‘my essay writing' require a strong stance to be taken upon and establish arguments that would be in favor of the position taken. Also, these arguments must be backed up and our writers know exactly how such writing can be efficiently pulled off.


Land use changes in the environs of Moscow


Related Papers

Eurasian Geography and Economics

Grigory Ioffe


komal choudhary

This study illustrates the spatio-temporal dynamics of urban growth and land use changes in Samara city, Russia, from 1975 to 2015. Landsat satellite imagery from five different time periods between 1975 and 2015 was acquired, and the changes were quantified with the help of ArcGIS 10.1 software. By applying classification methods to the satellite images, four main types of land use were extracted: water, built-up, forest and grassland. Then, the area coverage of each land use type at different points in time was measured and coupled with population data. The results demonstrate that the population increased from 1146 thousand to 1244 thousand between 1975 and 1990, then first declined and later increased again, and now stands at 1173 thousand. The built-up area also changed in line with the population. The present study revealed an increase in built-up area of 37.01% from 1975 to 1995, then a reduction of 88.83% until 2005 and an increase of 39.16% from 2005 to 2015, along w...

Elena Milanova

Land use/Cover Change in Russia within the context of global challenges. The paper presents the results of a research project on Land Use/Cover Change (LUCC) in Russia in relation to global problems (climate change, environment and biodiversity degradation). The research was carried out at the Faculty of Geography, Moscow State University, on the basis of a combination of remote sensing and in-field data of different spatial and temporal resolution. An original methodology of present-day landscape interpretation for land cover change study has been used. In Russia the major driver of land use/land cover change is agriculture. About twenty years ago the reforms of Russian agriculture were started. Agricultural lands in many regions were dramatically impacted by changed management practices, resulting in accelerated erosion and reduced biodiversity. Among the natural factors that shape agriculture in Russia, climate is the most important one. The study of long-term and short-ter...

Annals of The Association of American Geographers

Land use and land cover change is a complex process, driven by both natural and anthropogenic transformations (Fig. 1). In Russia, the major driver of land use / land cover change is agriculture. It has taken centuries of farming to create the existing spatial distribution of agricultural lands. Modernization of Russian agriculture started fifteen years ago. It has brought little change in land cover, except in the regions with marginal agriculture, where many fields were abandoned. However, in some regions, agricultural lands were dramatically impacted by changed management practices, resulting in accelerating erosion and reduced biodiversity. In other regions, federal support and private investments in the agricultural sector, especially those made by major oil and financial companies, have resulted in a certain land recovery. Among the natural factors that shape agriculture in Russia, climate is the most important one. In the North European and most of the Asian part of the ...

Ekonomika poljoprivrede

Vasilii Erokhin

Journal of Rural Studies

judith pallot

In recent decades, Russia has experienced substantial transformations in agricultural land tenure. Post-Soviet reforms have shaped land distribution patterns but the impacts of these on agricultural use of land remain under-investigated. On a regional scale, there is still a knowledge gap in terms of knowing to what extent the variations in the compositions of agricultural land funds may be explained by changes in the acreage of other land categories. Using a case analysis of 82 of Russia’s territories from 2010 to 2018, the authors attempted to study the structural variations by picturing the compositions of regional land funds and mapping agricultural land distributions based on ranking “land activity”. Correlation analysis of centered log-ratio transformed compositional data revealed that in agriculture-oriented regions, the proportion of cropland was depressed by agriculture-to-urban and agriculture-to-industry land loss. In urbanized territories, the compositions of agricultura...

Open Geosciences

Alexey Naumov

Despite the harsh climate, agriculture on the northern margins of Russia still remains the backbone of food security. Historically, in both regions studied in this article – the Republic of Karelia and the Republic of Sakha (Yakutia) – agricultural activities such as dairy farming and even cropping were well adapted to local conditions, including traditional activities such as horse breeding typical for Yakutia. Using three different sources of information – official statistics, expert interviews, and field observations – allowed us to conclude that there are both similarities and differences in the agricultural development and land use of the two regions studied. The differences arise from agro-climatic conditions, settlement history, specialization, and the spatial pattern of the economy. In both regions, farming is concentrated within the areas with the most suitable natural conditions. Yet, even there, agricultural land use is shrinking, especially in Karelia. Both regions are prone to being af...



Time in Elektrostal, Moscow Oblast, Russia now

  • Tokyo 01:42AM
  • Beijing 12:42AM
  • Kyiv 07:42PM
  • Paris 06:42PM
  • London 05:42PM
  • New York 12:42PM
  • Los Angeles 09:42AM

Time zone info for Elektrostal

  • The time in Elektrostal is 8 hours ahead of the time in New York when New York is on standard time, and 7 hours ahead of the time in New York when New York is on daylight saving time.
  • Elektrostal does not change between summer time and winter time.
  • The IANA time zone identifier for Elektrostal is Europe/Moscow.
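The hour differences in the list above can be checked with a minimal sketch. It assumes fixed UTC offsets (UTC+3 for Europe/Moscow, UTC-5 and UTC-4 for New York on standard and daylight saving time); the names `MOSCOW`, `NY_STANDARD`, `NY_DAYLIGHT` and `hours_ahead` are illustrative and do not come from the source page.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for the real time zone rules.
MOSCOW = timezone(timedelta(hours=3), "MSK")        # no summer/winter change
NY_STANDARD = timezone(timedelta(hours=-5), "EST")  # New York, standard time
NY_DAYLIGHT = timezone(timedelta(hours=-4), "EDT")  # New York, daylight saving time


def hours_ahead(zone_a: timezone, zone_b: timezone) -> float:
    """Return how many hours zone_a is ahead of zone_b."""
    # Fixed-offset zones ignore the moment, but utcoffset() expects one.
    moment = datetime(2024, 1, 1, tzinfo=timezone.utc)
    return (zone_a.utcoffset(moment) - zone_b.utcoffset(moment)) / timedelta(hours=1)


print(hours_ahead(MOSCOW, NY_STANDARD))  # difference during New York standard time
print(hours_ahead(MOSCOW, NY_DAYLIGHT))  # difference during New York daylight saving time
```

For real date handling, a time zone database lookup by the identifier Europe/Moscow would be the more robust choice; the fixed offsets here only illustrate the arithmetic.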

Sunrise, sunset, day length and solar time for Elektrostal

  • Sunrise: 04:06AM
  • Sunset: 08:40PM
  • Day length: 16h 34m
  • Solar noon: 12:23PM
  • The current local time in Elektrostal is 23 minutes ahead of apparent solar time.
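The day length and solar noon listed above follow from the sunrise and sunset times by simple arithmetic. The sketch below shows that calculation; the function name `day_length_and_midpoint` is illustrative, and treating solar noon as the midpoint between sunrise and sunset is an approximation assumed here, not a method stated by the source.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%I:%M%p"  # 12-hour clock, e.g. "04:06AM"


def day_length_and_midpoint(sunrise: str, sunset: str) -> tuple[timedelta, str]:
    """Return the day length and the clock time halfway between sunrise and sunset."""
    rise = datetime.strptime(sunrise, TIME_FORMAT)
    fall = datetime.strptime(sunset, TIME_FORMAT)
    length = fall - rise
    midpoint = rise + length / 2  # approximation of solar noon
    return length, midpoint.strftime(TIME_FORMAT)


length, midpoint = day_length_and_midpoint("04:06AM", "08:40PM")
hours, remainder = divmod(int(length.total_seconds()), 3600)
print(f"Day length: {hours}h {remainder // 60}m")
print(f"Midpoint of the day: {midpoint}")
```

A midpoint of 12:23PM on the clock corresponds to local time running 23 minutes ahead of apparent solar time, which matches the last item in the list.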

Elektrostal on the map

  • Location: Moscow Oblast, Russia
  • Latitude: 55.79. Longitude: 38.46
  • Population: 144,000

Best restaurants in Elektrostal

  • #1 Tolsty medved - Steakhouse food
  • #2 Ermitazh - European and Japanese food
  • #3 Pechka - European and French food

Find the best places to eat in Elektrostal

  • Best steak restaurants in Elektrostal
  • Best BBQs in Elektrostal
  • Best breakfast restaurants in Elektrostal


Savvino-Storozhevsky Monastery and Museum

Zvenigorod's most famous sight is the Savvino-Storozhevsky Monastery, which was founded in 1398 by the monk Savva from the Troitse-Sergieva Lavra, at the invitation and with the support of Prince Yury Dmitrievich of Zvenigorod. Savva was later canonised as St Sabbas (Savva) of Storozhev. The monastery later flourished under the reign of Tsar Alexis, who chose the monastery as his family church, often went on pilgrimage there and made many donations to it. Most of the monastery’s buildings date from this time. The monastery is heavily fortified with thick walls and six towers, the most impressive of which is the Krasny Tower, which also serves as the eastern entrance. The monastery was closed in 1918 and only reopened in 1995. In 1998 Patriarch Alexius II took part in a service to return the relics of St Sabbas to the monastery. Today the monastery has the status of a stauropegic monastery, which is second in status to a lavra. In addition to being a working monastery, it also holds the Zvenigorod Historical, Architectural and Art Museum.

Belfry and Neighbouring Churches


Located near the main entrance is the monastery's belfry, which is perhaps the calling card of the monastery due to its uniqueness. It was built in the 1650s, and St Sergius of Radonezh’s Church was opened on the middle tier in the mid-17th century, although it was originally dedicated to the Trinity. The belfry's 35-tonne Great Blagovestny Bell fell in 1941 and was only restored and returned in 2003. Attached to the belfry are a large refectory and the Transfiguration Church, both of which were built on the orders of Tsar Alexis in the 1650s.


To the left of the belfry is another, smaller refectory, which is attached to the Trinity Gate-Church. The church was also constructed in the 1650s on the orders of Tsar Alexis, who made it his own family church. It is elaborately decorated with colourful trims, and underneath the archway is a beautiful 19th-century fresco.

Nativity of Virgin Mary Cathedral


The Nativity of Virgin Mary Cathedral is the oldest building in the monastery and among the oldest buildings in the Moscow Region. It was built between 1404 and 1405 during the lifetime of St Sabbas and using the funds of Prince Yury of Zvenigorod. The white-stone cathedral is a standard four-pillar design with a single golden dome. After the death of St Sabbas he was interred in the cathedral and a new altar dedicated to him was added.


Under the reign of Tsar Alexis the cathedral was decorated with frescoes by Stepan Ryazanets, some of which remain today. Tsar Alexis also presented the cathedral with a five-tier iconostasis, of which the top row of icons has been preserved.

Tsaritsa's Chambers


The Nativity of Virgin Mary Cathedral is located between the Tsaritsa's Chambers on the left and the Palace of Tsar Alexis on the right. The Tsaritsa's Chambers were built in the mid-17th century for the wife of Tsar Alexis, Tsaritsa Maria Ilinichna Miloslavskaya. The design of the building is influenced by the ancient Russian architectural style. It is prettier than the Tsar's chambers opposite, being red in colour with elaborately decorated window frames and entrance.


At present the Tsaritsa's Chambers house the Zvenigorod Historical, Architectural and Art Museum. Among its displays are an accurate recreation of the interior of a noble lady's chambers, including furniture, decorations and a decorated tiled oven, and an exhibition on the history of Zvenigorod and the monastery.

Palace of Tsar Alexis


The Palace of Tsar Alexis was built in the 1650s and is now one of the best surviving examples of non-religious architecture of that era. It was built especially for Tsar Alexis who often visited the monastery on religious pilgrimages. Its most striking feature is its pretty row of nine chimney spouts which resemble towers.



COMMENTS

  1. Copyright Complications

    Theses and dissertations which contain embedded PJAs as part of the formal submission can be posted publicly by the awarding institution with DOI links back to the formal publications on ScienceDirect. Source 2 - FAQ. Can I include/use my article in my thesis/dissertation? Yes.

  2. Thesis by publication (cumulative dissertation)

    If your dissertation contains complete published or submitted papers, there are several copyright issues to consider. Expand all. Collapse all. 1. Checking the legal conditions. 2. Phrase embedding. 3. Agreement of the co-authors.

  3. Dissertation Copyright

    122 College Hall University of Pennsylvania Philadelphia, PA 19104 215.898.5000

  4. PDF Cumulative versus monographic Dissertation

    Requirements of the PhD regulations (2008) for cumulative dissertations. The regulation does not fix a certain number of publications. It just says: "§ 6 (4) With the consent of the supervisor and the chair of the doctoral committee, the dissertation may be submitted as a cumulative work. In this case, several

  5. Fair Use, Copyright, Patent, and Publishing Options

    Use the delayed release (embargo) option if a patent application is or will be in process, noting the reason for the delay as "patent pending." If you have any questions, please contact Cornell's Center for Technology Licensing at 607-254-4698 or [email protected].

  6. PDF Guidelines for Cumulative Dissertations

    The dissertation must contain a concluding discussion that refers to all chapters. This discussion should explain how the chapters contribute to answering the research question(s) of the dissertation as stated in the introduction. In addition, the overall methodology should be discussed.

  7. CMU LibGuides: Theses & Dissertations: Understanding Copyright

    When in doubt, consult Carnegie Mellon's Center for Technology Transfer and Enterprise Creation (CTTEC), 412-268-7393 or [email protected]. Neither the University Libraries nor ProQuest/UMI require copyright transfer to publish your dissertation. Both require only the non-exclusive right to reproduce and distribute your work.

  8. Writing a cumulative dissertation

    Writing a cumulative dissertation. 2019-10-21. A cumulative dissertation is a collection of articles which have been published in recognised scientific journals or accepted for publication. My PhD dissertation is a cumulative one and in this blog post I describe its structure and things to pay attention to when writing your own.

  9. cumulative dissertations

    A cumulative dissertation combines different publications in well-respected scientific journals into one doctoral thesis. You do not have to request a specific permission for a cumulative dissertation at the Faculty of Medicine at the LMU. However, you have to fulfill a number of criteria listed below. Most importantly, you have to have at ...

  10. thesis

    Cumulative dissertation is probably a literal translation of the German Kumulative Dissertation, which denotes a thesis by publication, compilation thesis or article thesis, i.e., a thesis which typically consists of some peer-reviewed publications, an introduction, and a conclusion. The alternative to this is a monograph thesis, which is written separately as a coherent monolithic work and ...

  11. PDF Kumulative Dissertation

    Supplement/change in journal series for a cumulative dissertation dated 08.11.2023 The list of recognized journals (paragraph 3) of the formal minimum requirements for a cumulative dissertation of 31.01.2008 (FBR resolution of 6.2.2008) is supplemented with the journal "Environmental Data Science" of the Cambridge University press.

  12. Identifying constitutive articles of cumulative dissertation theses by

    Abstract. Cumulative dissertations are doctoral theses comprised of multiple published articles. For studies of publication activity and citation impact of early career researchers, it is important to identify these articles and link them to their associated theses. Using a new benchmark data set, this paper reports on experiments of measuring the bilingual textual similarity between, on the ...

  13. PDF Guidelines for Cumulative Dissertations

    As is the case for all dissertations, an abstract in German and in English is also an integral part of a cumulative dissertation. Special emphasis shall be given hereby to a synopsis of the major topics featured in the preamble and the overall discussion. The customary formal requirements apply to cumulative dissertations, as well (e. g.

  14. PDF Guidelines for Cumulative Dissertations

    Guidelines for Cumulative Dissertations. A cumulative dissertation at the School of Business, Economics and Social Sciences consists of: An introduction. At least three scientific papers that are either published or at a level suitable for publication in academic journals. Each paper should make a substantial original contribution.

  15. How to structure a cumulative dissertation: Five strategies

    A cumulative dissertation consists of a series of papers published, or submitted for publication during the timeframe of the doctorate study. In addition to the papers, the PhD student is required to create an overarching argument that is to be presented in the thesis's introductory and conclusion chapters. The number of papers to be ...

  16. Publizieren

    Publication types. Monographic dissertation: A monographic dissertation (classical dissertation) generally consists of a not yet published, coherent and self-contained scientific treatise. An integral part of the doctoral process is the publication of the dissertation. You fulfill your publication obligation by submitting a prescribed number of printed copies or publisher's editions to the ...

  17. Kumulative Dissertation

    Kumulative Dissertation: https://business-and-science.de/kumulative-dissertation/ In this video you will learn what a publication-based dissertation is (also Sammeld...

  18. PDF (Kumulative) Dissertation

    (3) The dissertation may also be based on prior publications or on works submitted for publication ("cumulative publication-based dissertation"). It must contribute an advance in knowledge equivalent to that of a monographic dissertation and meet the other requirements of paragraph 1.

  19. Definition of The Strategic Directions for Regional Economic

    Dmitriy V. Mikheev, Karina A. Telyants, Elena N. Klochkova, Olga V. Ledneva; Affiliations Dmitriy V. Mikheev

  20. Time in Elektrostal, Moscow Oblast, Russia now

    Sunrise, sunset, day length and solar time for Elektrostal. Sunrise: 04:25AM. Sunset: 08:21PM. Day length: 15h 56m. Solar noon: 12:23PM. The current local time in Elektrostal is 23 minutes ahead of apparent solar time.