We hosted a webinar on May 21, 2025, on the topic of data collection and analysis. We received many questions during the session that we weren’t able to answer within the allotted time. So, for those who attended the webinar, or those who’d like insight into writing a compelling academic manuscript, we’ve collated our answers to the questions below.
If you’re interested in watching the recording of this webinar or exploring one of our previous webinars, complete the form at this link and receive immediate access to our full library of webinar recordings.
Qualitative Research Methods
1. How do I code qualitative data?
Coding qualitative data involves systematically organizing and interpreting textual (or visual/audio) information to identify patterns, themes, or concepts.
1. Prepare your data. Transcribe interviews, focus groups, or recordings if needed, then clean the text: remove irrelevant information, correct typos, and format it consistently.
2. Choose a coding approach. There are two main approaches: deductive (top-down), which starts with pre-established codes based on theory or research questions, and inductive (bottom-up), which lets codes emerge from the data during analysis. Most researchers use a hybrid approach in which some codes are predefined and others emerge during coding.
3. Get familiar with your data. Read it multiple times before you begin coding.
4. Apply your codes. Go through the text and highlight segments (phrases, sentences, or paragraphs) that fit a code. You can do this manually (e.g., with highlighters or in Word/Excel) or with software (e.g., NVivo, MAXQDA, Dedoose, ATLAS.ti).
5. Refine your codes. Combine similar codes, split overly broad ones, or add new ones as your understanding evolves. Coding is not linear; go back and recode earlier parts as your framework becomes clearer.
6. Group related codes into themes or categories. Look for patterns, relationships, and contradictions across the data.
7. Interpret and report your findings. Analyze how themes relate to your research questions, reflect on your own biases, and consider alternative explanations.
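For researchers comfortable with a bit of scripting, here is a minimal sketch in Python of how coded segments can be organized and grouped into themes. The excerpts, code names, and themes are invented for illustration; dedicated software such as NVivo or MAXQDA handles this at scale.

```python
# A toy illustration of organizing coded segments; all content is hypothetical.
from collections import defaultdict

# Each coded segment pairs an excerpt with one or more codes.
coded_segments = [
    ("I never know who to ask when the system changes.", ["uncertainty", "support"]),
    ("Training helped, but it was too short.", ["training", "time_pressure"]),
    ("My colleagues are the first people I turn to.", ["support"]),
]

# Group segments by code so related evidence can be reviewed together.
segments_by_code = defaultdict(list)
for excerpt, codes in coded_segments:
    for code in codes:
        segments_by_code[code].append(excerpt)

# Related codes are later grouped into broader themes.
themes = {
    "seeking_help": ["uncertainty", "support"],
    "learning_conditions": ["training", "time_pressure"],
}

# Print each theme with its supporting excerpts.
for theme, theme_codes in themes.items():
    print(theme)
    for code in theme_codes:
        for excerpt in segments_by_code[code]:
            print(f"  [{code}] {excerpt}")
```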
2. How can I improve the quality of qualitative data?
You can improve the quality of qualitative data by ensuring rich, detailed data through probing questions, triangulating sources, maintaining reflexivity, and documenting context meticulously. In practice:
- Design high-quality research questions. Make them open-ended and exploratory, and avoid yes/no questions.
- Select participants purposefully. Use purposive or theoretical sampling to include a diverse range of relevant perspectives, and ensure representation of key subgroups or extreme cases if and as necessary.
- Focus on skilled interviewing and facilitation. Build rapport to foster trust and openness, use probing questions (e.g., “Can you tell me more about that?”), avoid leading or judgmental language, and practice active listening to follow up on meaningful cues. Semi-structured or in-depth interviews allow depth and flexibility.
- Record and transcribe interviews verbatim (do not just summarize). Use high-quality recording devices, ensure a quiet, private environment, and take field notes to capture non-verbal cues or contextual details.
- Consider additional or alternative forms of data. Supplement interviews with observations, documents, or photographs for triangulation, and use participant journals, mapping exercises, or diaries for deeper insights.
- Pilot your study. Test your interview guide or focus group questions with a small sample, then revise based on feedback to improve clarity, flow, and relevance.
- Stay reflexive. Be aware of, and document, your own biases, assumptions, and positionality.
- Share summaries or themes with participants to ensure the accuracy and authenticity of your interpretations.
- Consider software (e.g., NVivo, MAXQDA) for managing and organizing data efficiently, and reference frameworks such as COREQ (Consolidated Criteria for Reporting Qualitative Research).
3. How can we validate findings during qualitative coding?
Validating findings during qualitative coding is essential to ensure that your interpretations are credible, trustworthy, and grounded in the data. Unlike quantitative research, validation in qualitative work is about trustworthiness rather than statistical significance. Use techniques like member checking, peer debriefing, triangulation, and maintaining an audit trail.
Triangulation allows you to use multiple sources, methods, or analysts to confirm findings. Data triangulation is when you compare data across interviews, focus groups, documents, or observations. Investigator triangulation involves multiple researchers in coding and interpretation. Methodological triangulation uses different qualitative methods (e.g., interviews + field notes). Using this technique allows you to confirm that findings aren’t just a product of one source or perspective.
Additionally, have two or more researchers code the same transcripts independently, then compare and discuss. Have them discuss discrepancies to reach a consensus. This activity helps refine the codebook and improve coding consistency. Do note that perfect agreement isn’t necessary; discussion of disagreements strengthens validity.
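If you want to quantify intercoder agreement, Cohen’s kappa is one commonly used statistic. Below is a minimal sketch in Python, assuming scikit-learn is available; the segment codes are hypothetical.

```python
# Compute Cohen's kappa for two coders who labeled the same ten segments.
from sklearn.metrics import cohen_kappa_score

coder_a = ["support", "barrier", "support", "cost", "barrier",
           "support", "cost", "barrier", "support", "cost"]
coder_b = ["support", "barrier", "cost", "cost", "barrier",
           "support", "cost", "support", "support", "cost"]

kappa = cohen_kappa_score(coder_a, coder_b)
# Values of 0.61-0.80 are often read as "substantial" agreement,
# but disagreements should still be discussed, not just measured.
print(f"Cohen's kappa: {kappa:.2f}")
```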
You can employ the technique of member checking, where you return findings or interpretations to participants for feedback. This can be done during or after coding. Ask the participants if themes resonate with their experience or need clarification. Ensure that your interpretation reflects participants' realities, not your assumptions.
Additionally, actively search for data that contradicts emerging themes. Explore why certain experiences don’t fit the pattern and update themes accordingly. This adds depth and shows you're not cherry-picking data to fit a narrative.
Maintain a detailed record of decisions about codes and themes, changes made to the codebook, and memos reflecting your thought process. This increases dependability and allows others to trace your logic. As you undergo this process, regularly reflect on your own biases, assumptions, and influence on the data. Keep a reflexive journal throughout the coding and analysis process. Acknowledging your role adds transparency and strengthens confirmability.
Do a saturation check. Ensure that coding continues until no new themes are emerging from the data. Saturation signals that your coding is sufficiently comprehensive.
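One informal way to monitor saturation is to track how many previously unseen codes each successive transcript contributes; when that number stays at zero, your codebook has likely stabilized. A minimal sketch in Python, with hypothetical codes:

```python
# Track new codes contributed by each successive transcript.
codes_per_transcript = [
    {"support", "cost", "uncertainty"},     # transcript 1: all codes are new
    {"support", "training"},                # transcript 2: one new code
    {"cost", "training", "time_pressure"},  # transcript 3: one new code
    {"support", "cost"},                    # transcript 4: nothing new
    {"uncertainty", "training"},            # transcript 5: nothing new
]

seen = set()
for i, codes in enumerate(codes_per_transcript, start=1):
    new_codes = codes - seen  # codes not encountered in any earlier transcript
    seen |= codes
    print(f"Transcript {i}: {len(new_codes)} new code(s) {sorted(new_codes)}")
```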
Provide rich, detailed excerpts and context to support each theme. This helps readers understand how you arrived at your conclusions and enhances transferability to other settings.
4. What steps should we follow when formulating questions for a semi-structured interview?
Formulating strong questions for a semi-structured interview is crucial for gathering rich, relevant qualitative data. Semi-structured interviews strike a balance between consistency and flexibility, so your questions need to be well-crafted but open to exploration.
- Define your research objectives. What are you trying to understand, explore, or explain?
- Break down your objectives into themes or domains you want to explore. Each theme will guide a group of related questions.
- Start with broad, open-ended questions. Use “how,” “what,” or “can you describe” prompts, and avoid yes/no or leading questions.
- Develop probes and follow-up questions. Probes help you dig deeper or clarify responses.
- Sequence your questions deliberately. Start with easy or non-sensitive questions to build rapport, move into core and then sensitive or reflective questions, and end with a wrap-up question like “Is there anything else you’d like to share?”
- Stay flexible. Semi-structured means you don’t have to follow the guide rigidly; you can adapt, skip, or reorder questions depending on the flow of conversation, and you should follow the participant’s lead when appropriate.
In summary, you’ll want to define your research objectives, identify key themes, draft open-ended questions, pilot-test them, and revise as necessary, all while ensuring cultural and contextual sensitivity.
5. How can we strengthen our argument when presenting qualitative data using literary theory?
To strengthen qualitative research using literary theory, choose a theory that aligns with your data (e.g., narrative, feminist, poststructuralist) and use it to deepen, not overshadow, your analysis. Begin with close attention to participants' language, metaphors, and structure, then interpret these through your chosen theoretical lens, grounding insights in rich quotes and clear explanations. Integrate theory throughout the research process, not just in the discussion, and maintain reflexivity about your interpretive role. Use citations thoughtfully, keeping theory in service of the data, and demonstrate how it reveals deeper meanings, hidden power dynamics, or alternative readings that traditional coding might miss.
6. What are the best methods for analyzing autobiographical texts using qualitative analysis?
To analyze autobiographical texts using qualitative methods, researchers can apply approaches like thematic analysis to identify recurring patterns, narrative analysis to examine story structure and identity construction, or discourse analysis to explore how language shapes meaning and power. Other methods include psychoanalytic analysis for uncovering unconscious themes and emotions, semiotic or structural analysis for interpreting symbols and narrative tropes, and grounded theory for building concepts across multiple texts. Regardless of the method, effective analysis requires reflexivity, contextual awareness, and thoughtful use of quotes to ground interpretations in the text.
7. Can a literature review be considered a qualitative method in a mixed-methods study?
A literature review is not inherently a qualitative method, but when it uses qualitative synthesis techniques or functions as part of an interpretive analytical process, it can contribute to the qualitative component of a mixed-methods study. How you conduct and position the review determines its methodological role.
8. How can mixed-methods research be applied within a qualitative study?
Mixed-methods research can be applied within a qualitative study by integrating quantitative components in a supportive role to enrich understanding. This may include using surveys, scales, or descriptive statistics alongside interviews, quantifying themes to identify patterns, or conducting brief quantitative phases before or after qualitative data collection. These approaches help contextualize, triangulate, or deepen qualitative insights without shifting the study’s interpretive focus. The key is ensuring that the quantitative data serve the qualitative research goals and are clearly integrated into the overall design and analysis.
9. When collecting data from both nurses and patients, how should we analyze the findings using thematic analysis?
When collecting data from both nurses and patients, begin by conducting separate thematic analyses for each group to identify themes within their distinct perspectives. This preserves the integrity of each group's experiences. Once individual analyses are complete, engage in a cross-group comparison to identify convergences (shared themes) and divergences (contrasting or conflicting themes). This approach allows for nuanced insights into where perceptions align or differ and helps avoid premature merging of perspectives, which could obscure meaningful differences in lived experience.
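Once the separate analyses are complete, the convergence/divergence comparison itself can be as simple as set logic over the two theme lists. A minimal sketch in Python, with invented theme names:

```python
# Compare themes identified separately for nurses and patients.
nurse_themes = {"workload", "communication", "trust", "documentation"}
patient_themes = {"communication", "trust", "waiting_time", "dignity"}

convergent = nurse_themes & patient_themes    # shared themes
nurse_only = nurse_themes - patient_themes    # divergent: nurses only
patient_only = patient_themes - nurse_themes  # divergent: patients only

print("Shared:", sorted(convergent))
print("Nurses only:", sorted(nurse_only))
print("Patients only:", sorted(patient_only))
```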
10. Can my dissertation be challenged for lacking interview data in an ethnographic study?
Yes, your dissertation could be challenged if interviews are commonly expected or considered essential in your field or by your committee for ethnographic studies. However, ethnography is a flexible method where participant observation and detailed field notes can provide rich, valid data. If you clearly justify your methodological choices, explain why interviews were not necessary or appropriate, and demonstrate how your observational data robustly address your research questions, your study can still be methodologically sound and credible. Transparency about your data collection strategy and reflexivity about its limitations will strengthen your defense.
Sampling and Study Design
11. Can we randomly select participants from a purposively selected group and then randomize a subset based on willingness?
Yes, this is a form of stratified or two-stage sampling, often used in mixed designs.
12. Can convenience sampling be used in quantitative research? Why or why not?
Yes, convenience sampling can be used in quantitative research, but it has important limitations. Since participants are selected based on availability rather than random sampling, this method introduces sampling bias and limits the generalizability of the findings to a broader population. As a result, conclusions drawn from such studies may not be representative. However, convenience sampling is often considered acceptable in exploratory, pilot, or preliminary studies, where the primary goal is to test instruments, assess feasibility, or generate hypotheses rather than to produce generalizable results.
13. Can parametric statistics be used with non-probability sampling?
Technically, yes, parametric statistics can be used with non-probability sampling, but with caution. Parametric tests (like t-tests and ANOVAs) assume random sampling from the population to ensure the validity of statistical inferences. When non-probability sampling (e.g., convenience sampling) is used, this assumption is violated, which can compromise the accuracy and generalizability of the results. Despite this, parametric tests are often still applied, especially when the sample size is large and other assumptions (e.g., normality, homogeneity of variance) are reasonably met. However, findings should be interpreted carefully, acknowledging the sampling limitations.
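As a sketch of what “reasonably met” can look like in practice, the following Python example uses SciPy to check normality (Shapiro-Wilk) and homogeneity of variance (Levene) before running an independent-samples t-test. The data are simulated purely for illustration.

```python
# Check key parametric assumptions, then run a t-test with SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=10, size=80)  # simulated scores, group A
group_b = rng.normal(loc=54, scale=10, size=80)  # simulated scores, group B

# Normality within each group and equality of variances across groups.
print("Shapiro-Wilk A: p =", round(stats.shapiro(group_a).pvalue, 3))
print("Shapiro-Wilk B: p =", round(stats.shapiro(group_b).pvalue, 3))
print("Levene:         p =", round(stats.levene(group_a, group_b).pvalue, 3))

# Independent-samples t-test; interpret alongside the sampling limitations.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```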
14. Is there a formula to determine sample size in probability sampling?
Yes, the common formula is:
n = Z² · p(1 − p) / e²
where n = required sample size, Z = z-score for the chosen confidence level, p = estimated proportion, and e = margin of error.
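For example, at a 95% confidence level (Z = 1.96), with p = 0.5 (the most conservative estimate) and a 5% margin of error, the formula gives n ≈ 385. A minimal Python sketch of the calculation:

```python
import math

def sample_size(z: float, p: float, e: float) -> int:
    """Sample size for estimating a proportion at a given confidence and margin of error."""
    n = (z ** 2) * p * (1 - p) / (e ** 2)
    return math.ceil(n)  # round up to the next whole participant

print(sample_size(z=1.96, p=0.5, e=0.05))  # prints 385
```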
15. Do inclusion and exclusion criteria limit the use of simple random sampling?
Yes, they define the sampling frame, which may reduce randomness.
16. Should we include sample size calculations (formulas and results) in the methods section of a paper?
Absolutely. It enhances transparency and methodological rigor.
17. If I can only afford a small pilot study due to expensive equipment, how can I justify the reduced statistical power?
You can justify the reduced statistical power by emphasizing the exploratory nature of the pilot study, which is intended to assess feasibility, refine protocols, and identify potential trends rather than to produce definitive conclusions. Highlight that the study will provide valuable preliminary data to inform the design of larger, adequately powered studies. You can also point out that pilot studies are common and necessary when working with costly, complex, or novel equipment, and that insights gained will help improve resource efficiency and minimize risks in future research.
Experimental Design and Validity
18. How can we address Common Method Variance (CMV) bias?
Common Method Variance (CMV) bias occurs when measurement artifacts, rather than actual relationships between variables, inflate or distort study findings—often due to using the same method (e.g., self-report surveys) for measuring all variables. To address CMV, researchers can use procedural remedies during the study design phase, such as separating measurement times (temporal separation of variables), using different response formats or sources, and assuring anonymity to reduce social desirability bias. Researchers can also use statistical controls during data analysis, such as Harman’s single-factor test (a basic diagnostic tool, though limited) and more advanced techniques like confirmatory factor analysis (CFA) or marker variable techniques. These strategies help reduce or detect CMV, improving the validity of your findings.
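As an illustration of the basic diagnostic mentioned above, a Harman-style single-factor check can be approximated by seeing how much variance one unrotated component explains across all items; a common rule of thumb flags concern when it exceeds 50%. A minimal Python sketch using PCA on simulated data (remember that this test can only suggest, not rule out, CMV):

```python
# Approximate Harman's single-factor test with an unrotated PCA.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
items = rng.normal(size=(200, 12))  # simulated: 200 respondents, 12 items

scaled = StandardScaler().fit_transform(items)
pca = PCA().fit(scaled)

first_share = pca.explained_variance_ratio_[0]
print(f"First component explains {first_share:.1%} of total variance")
if first_share > 0.5:
    print("A single factor dominates; CMV may be a concern.")
```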
19. In an RCT comparing two surgical techniques, how can we anonymize participants if they must be informed about the procedure?
Use a single-blind (single-anonymized) design in which outcome assessors, rather than participants, are kept unaware of the assigned technique, or use sham procedures if ethically permissible.
20. Is SPSS suitable for conducting a validity study on various equipment?
Yes, SPSS can handle reliability (e.g., Cronbach’s alpha) and validity tests (e.g., factor analysis, correlation).
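If you ever need the same reliability check outside SPSS, Cronbach’s alpha is straightforward to compute from the standard formula. A minimal Python sketch with simulated scale data:

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Rows are respondents, columns are items on the same scale."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 1))                      # shared trait
items = latent + rng.normal(scale=0.8, size=(100, 5))   # 5 correlated items
print(f"alpha = {cronbach_alpha(items):.2f}")
```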
Data Analysis and Tools
21. How can we analyze visual data such as symbols, images, motifs, and logos?
Visual data such as symbols, images, motifs, and logos can be analyzed using several qualitative methods, depending on the research goals. Visual content analysis is a systematic, often quantitative method that involves coding and counting visual elements to identify patterns or trends; it is useful for large datasets or when seeking frequency-based insights. Semiotics focuses on the study of signs and their meanings, examining how visual elements function as signs (signifier and signified) within cultural and social contexts; this method is well-suited for interpreting symbolism, branding, and meaning-making processes. Panofsky’s iconographic method is a three-tiered interpretive approach used primarily in art history: the pre-iconographic level identifies basic visual elements (shapes, colors, objects), the iconographic level recognizes conventional themes, stories, or symbols, and the iconological level interprets deeper cultural, historical, and philosophical meanings. These methods can be used independently or in combination, depending on the depth and type of analysis required.
22. In contexts with limited data access, how can we enhance the credibility and generalizability of qualitative analysis for international publication?
In contexts with limited data access, you can enhance the credibility and transferability (a more appropriate term than "generalizability" in qualitative research) of your qualitative analysis by using several key strategies. Thick description provides rich, detailed accounts of the setting, participants, and context. This allows readers to understand the depth and nuances of the findings, making it easier for them to judge whether the results may apply to other contexts. Triangulation uses multiple data sources, methods, or theoretical perspectives to cross-check findings. This strengthens the credibility of the analysis by showing consistency across different angles, even when data access is limited. Contextualizing findings within broader literature situates your analysis in relation to existing research, especially global or comparative studies. This helps demonstrate how your findings contribute to or challenge broader understandings, thereby increasing their relevance to international audiences. Additionally, consider reflexivity, being transparent about your role as a researcher and any limitations in data access. Acknowledging constraints while demonstrating methodological rigor can increase trustworthiness and appeal to international reviewers.
23. If we struggle with data analysis, where can we find statisticians online for help?
Platforms like Upwork, ResearchGate, or university consulting services offer access to statisticians.
24. Is it possible to conduct research comparing existing data without using a questionnaire?
Yes, it is entirely possible to conduct research comparing existing data without using a questionnaire, namely through secondary data analysis, content analysis, or archival research. Secondary data analysis involves analyzing data originally collected by others, such as government statistics, health records, or large-scale surveys; researchers can compare variables, trends, or outcomes across groups or time periods. Content analysis is used to systematically analyze the content of texts, media, or documents, and is valuable for comparing themes, frequency of terms, or representations in sources such as news articles, speeches, or social media. Archival research involves the use of historical records, organizational documents, legal cases, or institutional data; researchers can compare how practices, policies, or outcomes have evolved over time or vary across contexts. These approaches are especially useful when primary data collection (e.g., questionnaires) is not feasible due to time, cost, or access constraints. They also allow for robust comparative analysis, provided the existing data are relevant, reliable, and appropriately contextualized.
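As a toy illustration of the content analysis route, the following Python sketch compares term frequencies across two hypothetical sets of existing documents:

```python
# Compare how often selected terms appear in two document collections.
from collections import Counter

speeches_2010 = ["growth jobs economy growth trade", "economy reform jobs"]
speeches_2020 = ["climate jobs economy climate", "climate reform trade"]

def term_counts(docs: list[str]) -> Counter:
    """Count whitespace-separated terms across a list of documents."""
    return Counter(word for doc in docs for word in doc.split())

counts_2010 = term_counts(speeches_2010)
counts_2020 = term_counts(speeches_2020)

for term in ["economy", "jobs", "climate"]:
    print(f"{term}: 2010 = {counts_2010[term]}, 2020 = {counts_2020[term]}")
```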
AI in Research
25. Which AI tool is most reliable for writing research articles with accurate in-text citations and references?
We are aware that some researchers use tools such as Zotero, EndNote, Scite.ai, Trinka AI, and Grammarly with citation plugins; however, please note that we do not specifically endorse or have experience testing these tools ourselves.
26. If an interviewee struggles with speaking, can AI be used to rephrase their responses during transcription without losing meaning?
We believe this would depend on discipline-specific norms. We would also suggest caution regarding any AI-based tool’s interpretation of the interviewee’s intended meaning. Additionally, to reiterate, Sage requires a declaration of AI use for data collection and analysis. Please read our full artificial intelligence policy here: https://us.sagepub.com/en-us/nam/artificial-intelligence-policy
27. How can we ethically use AI tools for language accuracy? What are the dos and don’ts?
As per Sage policies, assistive AI can be used for improving grammar, clarity, and structure of content that one has authored themselves. Please read our full policy here: https://us.sagepub.com/en-us/nam/artificial-intelligence-policy
- Do: Use of assistive AI need not be disclosed, but all content (including any improvements made with assistive AI) should be rigorously reviewed before submitting a manuscript.
- Do: Any use of Generative AI (if used to generate anything for the study/article) should be disclosed at the time of submission.
- Don’t: Generative AI should not be used to artificially create or modify core research data. AI should not be used to artificially generate or misrepresent sources or paraphrase content without proper attribution.
- Don’t: Share sensitive personal or proprietary information on an AI platform, as this may expose sensitive information or intellectual property to others.
Mixed-Methods and Sequential Design
28. In a sequential mixed-methods study, if we identify variables qualitatively and then use existing items from literature for quantitative analysis, what is the value of the initial qualitative phase?
In a sequential mixed-methods study, the initial qualitative phase plays a critical role, even when existing quantitative items are later used. Its value includes ensuring contextual relevance. Qualitative data help identify how participants understand and experience key concepts within their specific cultural, social, or organizational context. This ensures that the constructs you later measure quantitatively are meaningful to the population being studied. The qualitative phase can uncover nuances, dimensions, or relationships that may not be captured in existing literature. Refining constructs can guide the selection or adaptation of established measurement items to better align with the lived experiences of your participants. Furthermore, by grounding the quantitative phase in real-world insights, you improve content and construct validity, ensuring that the items truly reflect what they are intended to measure in your particular setting. The qualitative findings can support your rationale for choosing or adapting existing scales, which is especially important for peer reviewers and in international or cross-cultural research. Thus, even if you're not creating new survey items from scratch, the qualitative phase provides critical theoretical, contextual, and empirical grounding that strengthens the overall study design.
Research Logistics and Protocols
29. Is it always necessary to follow a 45-minute interview protocol, especially in time-sensitive survey projects?
No, you can adapt the length based on context, but if you do so, ensure depth and coverage of key themes.
Data Sources
30. Can GIS and remote sensing data be categorized as secondary data sources?
Yes, GIS (Geographic Information Systems) and remote sensing data can be categorized as secondary data sources when the data were originally collected by another party, such as government agencies, research institutions, or satellites operated by space agencies, and are then repurposed for your own research objectives. For example, GIS data from national databases or municipal agencies may include land use, infrastructure, or demographic information that can be analyzed spatially. Remote sensing data, such as satellite imagery from NASA or ESA, can be used to assess environmental changes, vegetation cover, or urban development without direct data collection. When researchers use such pre-existing datasets, they are engaging in secondary data analysis. However, if they collect and process raw spatial or satellite data themselves for the first time, that would constitute primary data.