April 4, 2025

Deep Research Is Epic, But Only If You Know How to Use It

Why Should You Care?

Deep research is a powerful feature, offering highly detailed, real-world insights that can support diverse research functions from trend forecasting to complex multidisciplinary analysis. However, clear prompting, patience, and domain-specific expertise are crucial to unlocking its full value. Importantly, deep research outputs aren’t immune to errors and should be treated as augmented input—not autonomous conclusions. To make the most of this feature, businesses should employ it in areas they understand, approach queries with clear intent, and always apply critical review and human oversight before acting on results.

What Did We Test? → We tested OpenAI’s deep research feature with two challenging research tasks: one focused on industry trend analysis and predictions, while the other tackled a complex, multidisciplinary argument.

What Did We Find Out? → Deep research is an enormously powerful tool, but it requires patience, precision, domain-specific knowledge, and a boatload of critical thinking.

Are There Any Caveats? → There’s a lot deep research can’t do—don’t expect to leverage this feature to create finalized, fully formatted research deliverables like industry reports or academic papers.

Why Does This Matter? → Despite being impressive, deep research, in its current form, shouldn’t replace individual human researchers or teams. It does exactly what it’s told to do, but it’s not particularly creative or exploratory, and using it effectively depends upon the precision and specificity of your input as well as what you know about your subject of interest.

If you find our AI experiments useful and interesting, please also consider examining how Lumenova’s responsible AI platform could support your AI risk management process and book a product demo today.

Main Insights

  • Deep research can provide robust, high-utility outputs for tasks that require real-world trend analysis and prediction and multidisciplinary knowledge synthesis and argumentation.

  • Deep research can be a powerful learning tool, specifically for domain experts who wish to identify additional learning opportunities within their specialization.

  • Leverage deep research to augment your research but don’t use it as a replacement for human researchers—it’s good, but it’s not perfect.

  • Deep research takes time—when leveraging this feature, don’t expect an output in a few minutes.

  • Only use deep research for research tasks that fall within your knowledge base and expertise—to verify the legitimacy of outputs, domain-specific knowledge is necessary.

  • Deep research requires a specific prompting approach—users must clearly articulate research ideas, scope, and objectives from the get-go.

  • Don’t expect deep research to generate truly novel research ideas or concepts—use it to source inspiration for these ideas and concepts.

  • Choose your deep research queries wisely—ChatGPT Plus users only get a fixed number of queries per month (up to 10), which can be depleted quickly (Pro users get 120 queries).

  • Deep research can be finicky—we advise using this feature outside of peak usage times.

The Business Case

High-Value Insight Delivery, But Only with Clear Scoping and Expert Interpretation

💡 Why it Matters

  • Deep research can produce robust, real-world insights for industry trends and multidisciplinary theory.
  • A valuable tool for strategy, innovation, and analysis.
  • Works best when research objectives, scope, and context are clearly defined.
  • Requires domain-specific knowledge due to output depth and complexity.

✍️ Key Actions

  • Ensure that only subject-matter experts or trained analysts use the feature.
  • Pair deep research requests with summarization prompts to bolster output interpretability.
  • Use deep research for high-stakes, high-complexity questions.

Output Comprehensiveness Can Inspire a False Sense of Credibility

💡 Why it Matters

  • Outputs are well-written and highly detailed, creating the illusion that they’re more trustworthy than they truly are.
  • Hallucinations and logical missteps can occur, and long outputs can obscure subtle errors that subject-matter experts would catch.

✍️ Key Actions

  • Don’t treat deep research as a standalone authority.
  • Request in-text citations to streamline source validation and diminish fact-checking burdens.
  • Audit and fact-check high-impact outputs.

Powerful Tool for Expert Learning and Knowledge Expansion

💡 Why it Matters

  • Deep research can be a learning accelerator, revealing unfamiliar connections, summarizing dense literature, or identifying knowledge gaps.
  • Potential high-value returns for analysts, consultants, and domain experts seeking a deeper, more nuanced understanding of their field.

✍️ Key Actions

  • Support domain experts who use deep research to expand their mental models.
  • Integrate deep research into professional development workflows.
  • Leverage it as an inspiration source, not a replacement for deep thinking or critical reflection.

Executive Summary

Experiment Setup

In this experiment, we examined ChatGPT’s relatively new deep research feature—designed to mimic the function that a human researcher or research team would perform when taking a deep dive into a specific topic or problem.

To do so, we presented ChatGPT with two distinct yet challenging research objectives, one of which required an industry trend analysis coupled with a series of relevant predictions, while the other required the development of a multidisciplinary argument addressing a complex, theoretical concept.

While most of our experiments have focused on evaluating the capabilities of frontier AI reasoning models (e.g., o1, o3-mini-high, DeepThink-R1), we chose to pivot here because deep research represents a notable advancement and shift in frontier AI capabilities.

Importantly, we’re not conducting a formal test—as we have with previous experiments—where we concretely evaluate a model’s performance on a structured task with pre-defined success criteria. Rather, we’re “playing around” and exploring the deep research function more open-endedly, attempting to reveal insights that could help users maximize the value this capability can deliver.

Consequently, the format of this piece differs from its predecessors—we offer no testing hypothesis, prompt-specific takeaways, or results. Instead, we adopt a more big-picture and discussion-based approach, and we hope that readers find it equally useful.

Prompt Descriptions

Both prompts articulate research interests, ideas, scope, and objectives clearly. However, neither prompt directly provides the model with research examples or sources. This was an intentional decision, made in the interest of understanding the dynamics of how the model chooses to conduct and communicate research findings.

Trend Analysis & Predictions (Prompt 1): This prompt instructs the model to analyze a series of trends and developments within the AI tools landscape and make several predictions for the future of this landscape based on the analysis conducted.

Multidisciplinary Argument (Prompt 2): This prompt instructs the model to craft a complex, multidisciplinary argument against the development of superintelligence, while proposing a non-obvious steelman and additional counterarguments.

Method

  • Prompts 1 and 2 were administered in a single interaction.
  • Both prompts went through several rounds of iteration before being administered.
  • Both prompts were designed to support a narrow research focus while enabling open-ended source exploration.

Key Findings

  • Deep research outputs provide real-world value and insights, whether they’re investigating large-scale industry trends or theoretical multidisciplinary arguments—they’re the real deal.

  • While deep research outputs are strong, they don’t always include the most up-to-date information, even when you specify an information cutoff date—for example, in the response to prompt 1, there was no mention of frontier AI agents like OpenAI’s Operator or Anthropic’s AI Agents.

  • Deep research outputs are structurally inflexible—beyond minor adaptations like tables, they don’t holistically adapt to the specific kind of research query at hand (e.g., analytical vs. argumentative).

  • Rigid output structures aren’t necessarily a bad thing, especially since they’re well-organized and easy to navigate—as long as you stick to using deep research for narrow research tasks that don’t require formal deliverables, you should be okay.

  • Deep research queries can take anywhere from 10 minutes to much longer to be answered, and it’s not exactly clear what factors drive this variability.

  • Building on the previous point, we suspect a correlation between the complexity of research topics/objectives and “time spent researching,” but more testing is needed to prove this. We’ve noted a similar although somewhat inconsistent phenomenon with reasoning models.

  • Deep research outputs are incredibly comprehensive and detailed—to derive value from them, users need to understand their query content on a fundamental level.

  • Following a deep research response, consider requesting an output summary for a more concise overview—parsing a 20-plus-page response can take even longer than it took to generate.

  • To optimize the value that deep research delivers, use it in tandem with other AI tools and features, not only for summarization purposes but also to isolate specific findings or insights and leverage them as inspiration for further research.

  • Generic deep research queries will be met with follow-up responses that request additional information and clarification from the user. Users should answer these clarification requests as precisely as they can.

  • When leveraging deep research, ChatGPT doesn’t always produce an output, occasionally returning error messages indicating it was unable to complete the task. If you’re using it in a browser, ensure you leave the tab open while the response is generating. We also suspect output failures might be linked to increased latency times/congestion during peak usage hours.

  • Don’t let the comprehensiveness of the output create a false sense of trust and legitimacy—deep research is still prone to hallucinations, hence the importance of domain-specific knowledge, critical thinking, and fact-checking.

  • Deep research does almost exactly what you tell it to do—it doesn’t exhibit a default “exploratory” mindset, and while it may frame certain ideas as “novel,” these claims are highly debatable, if not entirely baseless.

  • Speaking to the previous point, we hypothesize that if a deep research query outlined explicit instructions to adopt more creative/exploratory research approaches, research novelty could increase. For both our prompts, research did venture beyond stated objectives and directions, though only because we provided instructions to do so.

  • Deep research tends to default to citing sources comprehensively. However, consider requesting in-text citations in your initial prompt to simplify fact-checking and source discovery/navigation.

  • When the deep research feature is activated, users can’t pause output generation or edit their original input—when you use deep research, you’re making a “commitment” (for lack of a better term).

Prompts

Prompt 1 - Trend Analysis & Predictions

Generate a comprehensive research report that analyzes the AI tools landscape and makes several predictions regarding the future of this landscape. Before delivering your final output, you must double-check it to verify that all parameters have been observed.

Beyond the parameters included below, you will not receive any additional information. You must promptly begin research upon receiving this prompt.

Your final output must be a finalized research report — alternative outputs such as “report outlines” or “report overviews” will not be accepted.

Parameters listed under Report Scope represent a starting point. You must expand upon these parameters where possible and relevant.

Report Scope:

  • Provide an in-depth but easy-to-interpret analysis of the global AI tools landscape. This analysis should:
    • Focus on AI tools released between January 1st, 2023, and March 1st, 2025. Do not include any tools released outside of this timeframe.
    • Categorize AI tools by their intended application domain (e.g., content creation, data analysis, sales automation, robotics, scientific research, etc). If overlaps exist, ensure tools are categorized appropriately across each relevant domain. Also include additional domains where necessary.
    • Categorize AI tools by type (e.g., general-purpose, multi-modal, image generation, advanced reasoning, advanced search, code generation, etc.). If overlaps exist, ensure tools are categorized across each type. Also include additional types where necessary.
    • Comprehensively identify relevant trends across intended application domains, tool types, required skills, related risks and impacts, and general use trends.
  • Provide multiple well-subsidized, easy-to-interpret predictions on the near-term future (3-5 years from now) of the AI tools landscape, including:
    • Emerging use trends across a diversity of sectors, from finance and healthcare to education and law enforcement, among several others.
    • What types of AI tools are likely to become most economically valuable, advantageous, and lucrative to both businesses and consumers of all profiles.
    • What kinds of AI advancements would be required to create major, paradigm-shifting breakthroughs in the AI tools landscape.
    • What types of distinctly different AI tools could emerge and where such tools could deliver the most value at the level of industries, governments, and consumer populations.
    • The core skills that humans would require to create tangible value with emerging AI tools.
    • The most salient risks and impacts that current and emerging AI tools will inspire at localized, systemic, and existential scales.
    • Whether the current technological trajectory we are on suggests that we are moving toward a fully automated, semi-automated, or predominantly augmentative future.

Additional Requirements:

  • The report must target two key audiences: business leaders and AI practitioners.
  • The report must analyze at least 100 AI tools.
  • The full report must be directly provided in this chat.
  • You must cite all your sources at the end of your output.
  • The report must be data-driven, showcasing clear insights supported by credible statistics, primary source material, empirical research findings and publications, documented industry trends and developments, and expert opinions.

Prompt 2 - Multidisciplinary Argument

Generate a multidisciplinary research-driven argument that centers on the concept of superintelligence. Before delivering your final output, you must double-check it to verify that all parameters have been observed.

Beyond the parameters included below, you will not receive any additional information. You must promptly begin research upon receiving this prompt.

Your final output must be a finalized argument — alternative outputs such as “argument outlines” or “argument overviews” will not be accepted.

Parameters listed under Argument Scope represent a starting point. You must expand upon these parameters where possible and relevant.

Argument Scope:

  • Synthesize knowledge from the following disciplines: phenomenology, ontology, philosophy of mind, game, conflict, and information theory, systems thinking, cognitive psychology, behavioral economics, futurism, geopolitics, and AI ethics. You may include additional disciplines if necessary.
  • Craft a robust, meticulous, in-depth argument that draws from all the disciplines listed above, centering on the concept of superintelligence.
  • Your argument must present a strong, multi-disciplinary case for why superintelligence should never be developed.
  • Your argument must include a steelman of the case against superintelligence. This steelman should not focus on the “obvious,” for example, superintelligence representing a generic cure for humanity’s most pressing problems.
  • In addition to the steelman you present, you must also consider and include other relevant counterarguments to the case against superintelligence.
  • Your argument should explore the potential macro-scale impacts of superintelligence on human well-being, cognition, relationships, and culture.
  • Your argument should explore the potential global impacts of superintelligence on diplomacy and international relations, geopolitical conflict, market dynamics, and technological innovation.
  • Your argument should propose a few viable strategies for executing the argument you present in the real world.
  • You must substantiate your argument with an initial explanation of why it is crucial to examine the concept of superintelligence from a critical, multidisciplinary perspective.

Additional Requirements:

  • Your argument must target two key audiences: AI ethicists and existential risk researchers.
  • You must cite all your sources at the end of your output.
  • The full argument must be directly provided in this chat.
  • Despite tackling a deeply complex topic through a multidisciplinary lens, the argument must be understandable to anyone with a college degree.
  • The argument should strive for novelty and originality, challenging conventional ideas and encouraging “thinking big.”

Outputs

ChatGPT Deep Research

Prompt 1: Response (Researched for 10 minutes)

Prompt 2: Response (Researched for 12 minutes)

Reflection

First off, we’ll address what went on “behind the scenes” while we iterated the prompts for testing the deep research function—it was frustrating, to say the least, but thankfully, we learned a few things along the way. Let’s start with what not to do:

Avoid multi-modal input requests → Deep research is a purpose-built feature, with a narrow application scope. As we saw in the first part of the response to prompt 1, the model may generate some tables to accompany certain research findings, but it won’t reliably generate complete data visualizations (e.g., bar graphs, scatter plots, etc.), logos, or other visual elements typically present within complex research reports—we attempted to do this numerous times, and failed consistently. While we think this limitation will be resolved sooner than expected, for now, we advise using deep research in tandem with other features like ChatGPT’s updated “Create Image” function, which now supports complex data visualizations and brand-related content.

Don’t ask for finalized research deliverables → This is tangential to the previous point—finalized research deliverables like branded industry reports and scientific papers aren’t just about presenting research findings. They go a step further, organizing and communicating research in structurally specific ways designed to satisfy the needs and interests of certain audiences, often observing various norms for illustrating data and research methodologies (e.g., scientific papers and white papers adhere to different structural standards). Once more, we aren’t suggesting that deep research won’t eventually be able to do this; even if it could, we’d still advise restricting the process of developing research deliverables to human teams.

Don’t force specific formatting requirements → The final output structure that deep research defaults to is rigid but quite strong—findings and ideas are well-organized, easy to parse, and accessible (provided you have domain-specific knowledge). As soon as the model is instructed to follow granular formatting parameters (e.g., citation formats, page numbers, spacing/font guidelines, downloadable documentation, etc.), things start to go awry pretty quickly. In fact, we found that the model spends so much time meeting formatting criteria that it appears to “forget” the crucial research parameters that matter most.

Don’t overload the model with unnecessary information → While we’ll need to play around with deep research more to arrive at a concrete conclusion, we suspect there could be an ideal balance between “enough” and “too much” information in the initial input prompt. Deep research outputs are extremely detailed and in-depth by default, which challenges a fairly consistent AI phenomenon—most of the time, a longer, more complex prompt will elicit longer, more complex outputs. In other words, including as much information as possible in deep research queries might not actually yield higher-quality results—it could even produce diminishing returns—hence the unique importance of precision in this context.

While we didn’t test how deep research handles file attachments (e.g., spreadsheets, PDFs, academic literature), we expect it wouldn’t encounter any issues here—if research is highly targeted, attachments would likely enhance response quality, depth, and scope by providing rich, additional context. By contrast, we suspect that for more open-ended or exploratory queries, attachments could facilitate overly narrow responses. More broadly, we think that deep research will become a high-utility tool for knowledge workers and domain experts who wish to:

  • Expedite learning curves by identifying nuanced gaps in their understanding of their field or specialization, isolating desirable skills, exploring new research avenues, or simply enhancing and streamlining research strategies and methodologies.

  • Derive inspiration for novel research directions, hypotheses, and experiments by synthesizing multi-domain knowledge, identifying links between non-obviously related concepts, tracking recent innovations and scientific advancements, and simplifying dense, complex, or theoretical information.

  • Support more holistic, multidisciplinary perspectives that reflect how their field or specialization dynamically interacts with a variety of other interest domains—this particular benefit could become extremely useful for navigating the future of work.

Building on the points above, we leave readers with several additional takeaways:

  • Deep research is an agentic AI feature—while some might debate this claim, we’d argue that due to its ability to interact with digital environments, achieve stated objectives, synthesize/retrieve information from external sources, and “reason” about research processes, deep research is agentic, albeit heavily reliant on human input.

  • Deep research is assistive, not a replacement—using this feature effectively depends on human inspiration, ideation, and strategic direction. It’s also clear that deep research is a highly purpose-built feature designed specifically for researchers (not general audiences).

  • Deep research is only as “smart” as its user—there’s nothing new here. We’ve known for a while that high-quality inputs will more reliably yield high-quality outputs. For deep research, the only additional caveat is that it requires subject matter expertise to be genuinely effective.

  • Deep research is a distinctly unique feature—like image generation features, deep research is designed to execute a narrow task (i.e., research deep dives). A failure to recognize this could lead certain users into “AI rabbit holes,” where they hopelessly iterate on a prompt, trying to nest additional, unsupported tasks within its structure.

  • Deep research requires a special prompting approach—beyond conventional prompting approaches (e.g., providing ample structure, guidance, and context), deep research requires more precision. Narrower queries that limit their scope to well-defined, targeted research interests tend to yield higher-value outputs. For Pro users, who face far fewer query restrictions, iteration (i.e., responding to the model’s follow-up questions or clarification requests) could be effective in homing in on precise research directions.

Before concluding, we’ll address a final, outstanding concern, one that we expect certain readers will find troublesome: as a knowledge worker, domain expert, or researcher, does deep research pose a tangible threat to my professional livelihood? The answer to this question depends on a series of factors:

  1. Are you working for an organization that prioritizes automation over augmentation?
  2. Does your organization have an AI ethics policy, and if so, is it genuinely observed?
  3. Do you perform a unique function within your organization that no one else is capable of performing?
  4. Does your research require ingenuity, creativity, hands-on work, and/or multidisciplinary expertise?
  5. Do your research responsibilities extend to producing formal deliverables or running real-world experiments?
  6. Would your coworkers describe you as a rigorous critical thinker with a clear vision?
  7. Have you built an AI skills foundation and do you regularly maintain it?
  8. Are you profoundly in touch with frontier AI advancements?

If you provided the following answers to questions 1 through 8—yes, no, no, no, no, no, no, no—then we’d argue that your fear of replacement is legitimate. However, this shouldn’t be interpreted as a guarantee of your potential future obsolescence. Each of the questions we posed aligns with an opportunity for change and growth, but in the age of AI, it’s both dangerous and naive to wait for handouts—your future relevance is determined by the actions and choices you make today, so hold yourself brutally accountable and critically examine the unique value you provide with unobstructed honesty. Time is ticking; don’t wait.

Call to Action

🔊 Businesses: Treat deep research as a precision tool, not a generic, one-size-fits-all solution—leverage it to support expert-driven workflows, not to replace them. To maximize its value, pair it with other AI features for formatting, visualization, and delivery, and ensure your teams are trained to prompt it purposefully, interpret it critically, and integrate it strategically within human-led research and decision-making processes.

🔊 AI Governance, Ethics & Safety Practitioners: It’s not enough to characterize AI systems by the risks they pose or the purposes for which they’re designed—narrow, purpose-built AI features with high disruptive potential, like deep research, which form part of a larger AI system or model, must be distinctly categorized. Deep research isn’t just another low-level function that ChatGPT can perform—we need governance and ethics frameworks that are much more granular and robust, focusing on specific, high-impact AI capabilities, whether designed as features that can be “activated” or built-in functions.

To all our readers, we invite you to check out Lumenova’s responsible AI platform and book a product demo today to ensure that all your AI governance and risk management needs are met. For those interested in exploring additional AI resources, we encourage you to visit our blog, where you can track the latest developments in AI governance, safety, ethics, and innovation. You can also find debriefs of our weekly AI experiments on our LinkedIn and X.

Make your AI ethical, transparent, and compliant - with Lumenova AI

Book your demo