Data Analysis Process

Detailed documentation of how publications are processed, filtered, and aggregated to generate statistics and visualizations.

Analysis Flow Overview

The analysis process transforms raw publication data into comprehensive statistics through a multi-step pipeline.

Processing Steps

  1. Data Loading: Raw publications are loaded from the MTMT API
  2. Publication Instantiation: Each publication is wrapped in a Publication class instance
  3. Scientific Filter: Only publications with category.mtid === 1 are considered scientific
  4. Authorship Filter: Only publications where the author is a typed author (not just a contributor) are included
  5. Data Enrichment:
    • Load Norway level ratings
    • Look up the Impact Factor in the IF list
    • Calculate author share
  6. Metric Calculation:
    • Q Score calculation
    • I Score calculation
    • Impact Factor assignment
    • Publication rating extraction
  7. Aggregation: Statistics are aggregated across all publications
  8. Yearly Grouping: Data is grouped by publication year for trend analysis
  9. Derived Metrics: H-Index and total Q Score are calculated
  10. Chart Data Preparation: Data is formatted for visualization

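// Main analysis function flow (abridged: the declarations of stats, citations,
// and the returned collections are omitted from this excerpt)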
export const analyze = async (
  author: { mtid: string; name: string; degrees: string[] },
  rawPublications: Publication[],
  translations?: Record<string, string>
): Promise<ProcessedPublicationData> => {
  // 1. Filter scientific publications
  const publications = rawPublications
    .map((data) => new Publication(data))
    .filter((pub) => pub.isScientific());
  
  // 2. Filter by authorship
  for (const pub of publications) {
    if (!pub.authorships?.some(
      (a) => a.author?.mtid === Number(author.mtid) && 
             a.authorTyped === true
    )) {
      pub.contributorOnly = true;
      continue; // Skip contributor-only publications
    }
    
    // 3. Load additional data
    loadNorwayLevel(pub);
    const impactFactor = await getImpactFactor(pub);
    const authorShare = pub.getAuthorShare(author.mtid);
    
    // 4. Calculate metrics
    const qScore = await pub.getQScore(authorShare, impactFactor);
    const iScoreData = pub.getIScore();
    
    // 5. Aggregate statistics
    stats.totalScientificPublications++;
    stats.qScore.fromArticles += qScore.qscore;
    // ... more aggregations
  }
  
  // 6. Calculate derived metrics
  stats.hIndex = calculateHIndex(citations);
  stats.qScore.total = stats.qScore.fromArticles + stats.qScore.fromBooks;
  
  return { statistics, chartData, authors, types, journals, publications, meta };
};
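
The `calculateHIndex` helper called above is not shown in this document. As a rough illustration of the standard H-Index definition (the largest h such that h publications each have at least h citations), here is a minimal sketch. The name `hIndexSketch` and the assumption that the input is one citation count per publication are illustrative, not taken from the actual code.

```typescript
// Hypothetical sketch: the real calculateHIndex may take different input.
// Assumes one citation count per publication.
function hIndexSketch(citationCounts: number[]): number {
  // Sort a copy in descending order so the caller's array is not mutated.
  const sorted = citationCounts.slice().sort((a, b) => b - a);
  let h = 0;
  // h grows while the (h+1)-th most cited publication has at least h+1 citations.
  while (h < sorted.length && sorted[h] >= h + 1) {
    h++;
  }
  return h;
}

console.log(hIndexSketch([10, 8, 5, 4, 3])); // 4
```

Four publications have at least 4 citations each, but there are not five with at least 5, so the result is 4.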

Publication Filtering

Publications must pass multiple filters to be included in the statistics.

Scientific Publication Filter

Only publications marked as scientific are processed. A publication is considered scientific if:

pub.category?.mtid === 1

Authorship Filter

The author must be a typed author (authorTyped === true), not just a contributor. Publications where the author is only a contributor are marked with contributorOnly = true and excluded from statistics.

Note: Contributor-only publications are still included in the publication list for reference, but they do not contribute to any metrics or statistics.


// Publication filtering logic
for (const pub of publications) {
  // Step 1: Must be scientific
  if (!pub.isScientific()) continue;
  
  // Step 2: Author must be typed author (not just contributor)
  if (!pub.authorships?.some(
    (a) => a.author?.mtid === Number(author.mtid) && 
           a.authorTyped === true
  )) {
    pub.contributorOnly = true;
    continue; // Exclude from statistics
  }
  
  // Publication passes all filters, process it
  stats.totalScientificPublications++;
}

Publication Type Categorization

Publications are categorized by both type and language/edition to provide detailed statistics.

Publication Types

  • scientificArticle: Journal articles (journalarticle)
  • conferenceArticle: Conference papers published as book chapters (bookchapter whose subtype name contains "konferencia")
  • bookChapter: Book chapters (non-conference)
  • book: Full books
  • patent: Patents
  • thesis: Theses/dissertations
  • other: Other publication types

Language/Edition Categories

Each publication type is further categorized by language and edition:

  • foreign: Foreign edition publications
  • hungarianForeign: Hungarian publications in foreign language
  • hungarianNative: Hungarian publications in Hungarian language

Category Determination Logic:

  1. If foreignEdition === true → foreign
  2. Else if foreignLanguage === true → hungarianForeign
  3. Else → hungarianNative

// Publication type categorization
const update_PublicationsByType_Stats = (
  pub: Publication,
  stats: Statistics
) => {
  const type = pub.type?.otypeName?.toLowerCase() || "";
  
  // Determine category (foreign, hungarianForeign, hungarianNative)
  const category = pub.foreignEdition
    ? "foreign"
    : pub.foreignLanguage
    ? "hungarianForeign"
    : "hungarianNative";
  
  // Map publication types
  const typeMap: Record<string, keyof Statistics["publicationsByType"]> = {
    journalarticle: "scientificArticle",
    // "konferencia" is Hungarian for "conference"
    bookchapter: pub.subType?.name?.toLowerCase().includes("konferencia")
      ? "conferenceArticle"
      : "bookChapter",
    book: "book",
    patent: "patent",
    thesis: "thesis",
  };
  
  const publicationType = typeMap[type] || "other";
  stats.publicationsByType[publicationType][category]++;
};
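
To try the categorization rules in isolation, the following self-contained sketch applies the same decisions to a simplified input. The `PubLike` shape and the `classifySketch` name are assumptions made for this example. The real `Publication` and `Statistics` types carry more fields.

```typescript
// Minimal hypothetical shapes; the real Publication and Statistics types are richer.
type EditionCategory = "foreign" | "hungarianForeign" | "hungarianNative";
type PublicationKind =
  | "scientificArticle" | "conferenceArticle" | "bookChapter"
  | "book" | "patent" | "thesis" | "other";

interface PubLike {
  otypeName?: string;
  subTypeName?: string;
  foreignEdition?: boolean;
  foreignLanguage?: boolean;
}

function classifySketch(pub: PubLike): { kind: PublicationKind; category: EditionCategory } {
  // foreignEdition wins over foreignLanguage, as in the rules above.
  const category: EditionCategory = pub.foreignEdition
    ? "foreign"
    : pub.foreignLanguage
    ? "hungarianForeign"
    : "hungarianNative";

  const rawType = (pub.otypeName || "").toLowerCase();
  const isConference =
    (pub.subTypeName || "").toLowerCase().indexOf("konferencia") !== -1;

  let kind: PublicationKind;
  switch (rawType) {
    case "journalarticle": kind = "scientificArticle"; break;
    case "bookchapter": kind = isConference ? "conferenceArticle" : "bookChapter"; break;
    case "book": kind = "book"; break;
    case "patent": kind = "patent"; break;
    case "thesis": kind = "thesis"; break;
    default: kind = "other";
  }
  return { kind, category };
}

const sample = classifySketch({
  otypeName: "BookChapter",
  subTypeName: "Konferenciaközlemény",
  foreignLanguage: true,
});
console.log(sample.kind, sample.category); // conferenceArticle hungarianForeign
```

The sketch uses a `switch` rather than an object lookup so that an unexpected type string that happens to match an inherited object property (for example "constructor") still falls through to `other`.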

Chart Data Preparation

Yearly aggregated data is transformed into chart-ready formats for visualization.

Chart Data Structure

The system generates multiple chart datasets:

  • publicationsWithRankings: Yearly counts by rating category
  • cumulativePublications: Cumulative counts by rating category
  • iScore: Yearly I Score values
  • cumulativeIScore: Cumulative I Score values
  • qScore: Yearly Q Score values
  • cumulativeQScore: Cumulative Q Score values
  • impactFactor: Yearly count of publications with IF
  • cumulativeImpactFactor: Cumulative count of publications with IF

Year Sorting

Years are sorted in ascending order so that charts read chronologically. The sort is a plain string sort, so an "Unknown" year key, if present, lands after all four-digit years.
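
A default string sort places "Unknown" last only because the letter "U" compares greater than any digit, and it orders years correctly only while every year has the same number of digits. If that ever needs to be made explicit, a comparator along these lines could be used. `sortYearKeys` is a hypothetical helper, not part of the documented code.

```typescript
// Hypothetical helper: numeric years ascending, any non-numeric key
// (such as "Unknown") placed after them.
function sortYearKeys(keys: string[]): string[] {
  const isYear = (k: string): boolean => /^\d+$/.test(k);
  return keys.slice().sort((a, b) => {
    const aIsYear = isYear(a);
    const bIsYear = isYear(b);
    if (aIsYear && bIsYear) return Number(a) - Number(b);
    if (aIsYear) return -1; // numeric years come before labels
    if (bIsYear) return 1;
    return a < b ? -1 : a > b ? 1 : 0; // labels in alphabetical order
  });
}

console.log(sortYearKeys(["2021", "Unknown", "1998", "2005"]).join(", "));
// 1998, 2005, 2021, Unknown
```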


// Chart data preparation from yearly data
const prepareChartDataFromYearlyData = (yearlyData) => {
  const years = Object.keys(yearlyData).sort();
  const cumulative = {
    D1: 0, Q1: 0, Q2: 0, Q3: 0, Q4: 0, NA: 0,
    iScore: 0, qScore: 0, publicationsWithIF: 0,
  };
  
  return {
    // Yearly rankings
    publicationsWithRankings: {
      years,
      rankings: {
        D1: years.map((year) => yearlyData[year].rankings.D1),
        Q1: years.map((year) => yearlyData[year].rankings.Q1),
        // ... Q2, Q3, Q4, N/A
      },
    },
    
    // Cumulative rankings
    cumulativePublications: {
      years,
      rankings: {
        D1: years.map((year) => (cumulative.D1 += yearlyData[year].rankings.D1)),
        // ... similar for other rankings
      },
    },
    
    // I Score and Q Score (yearly and cumulative)
    iScore: {
      label: "Éves független hivatkozások", // "Yearly independent citations"
      years,
      data: years.map((year) => yearlyData[year].iscore),
    },
    cumulativeIScore: {
      label: "Kumulatív független hivatkozások", // "Cumulative independent citations"
      years,
      data: years.map((year) => (cumulative.iScore += yearlyData[year].iscore)),
    },
    // ... similar for Q Score and Impact Factor
  };
};
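
The cumulative series above are built by updating a shared `cumulative` object inside `map`, which works as long as each series is computed once and the years are visited in order. The same result can be expressed with a small standalone helper. The sketch below is illustrative only: `runningTotal` and the sample counts are not part of the original code.

```typescript
// Hypothetical helper, not part of the original module.
// Returns the running (cumulative) sum of a numeric series.
function runningTotal(values: number[]): number[] {
  const totals: number[] = [];
  let sum = 0;
  for (const value of values) {
    sum += value;
    totals.push(sum);
  }
  return totals;
}

const yearlyD1 = [2, 0, 3, 1]; // made-up yearly D1 counts
console.log(runningTotal(yearlyD1).join(", ")); // 2, 2, 5, 6
```

Because the helper keeps its accumulator local, calling it twice on the same yearly series gives the same cumulative series both times.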

Analysis Metadata

Each analysis includes metadata about the data sources and processing time.

Metadata Fields

  • lastUpdated: Timestamp of when the analysis was performed
  • ifListVersionYear: The year of the Impact Factor list used
  • norwayListVersionYear: The year of the Norway list used

Note: The IF list and Norway list version years indicate which version of the external data sources was used for the analysis. This is important for reproducibility.
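
As an illustration of how these fields support reproducibility, the sketch below models the metadata and checks whether two analysis runs used the same list versions. The field types (an ISO 8601 string and numeric years) and the `sameSourceVersions` helper are assumptions. The source names the fields but does not specify their types.

```typescript
// Field types are assumptions; the source names the fields but not their types.
interface AnalysisMetaSketch {
  lastUpdated: string;           // assumed ISO 8601 timestamp
  ifListVersionYear: number;     // assumed numeric year
  norwayListVersionYear: number; // assumed numeric year
}

// Hypothetical check: two runs are directly comparable only if they
// were computed against the same IF list and Norway list versions.
function sameSourceVersions(a: AnalysisMetaSketch, b: AnalysisMetaSketch): boolean {
  return (
    a.ifListVersionYear === b.ifListVersionYear &&
    a.norwayListVersionYear === b.norwayListVersionYear
  );
}

// Made-up example values.
const earlierRun: AnalysisMetaSketch = {
  lastUpdated: "2024-01-15T10:00:00Z",
  ifListVersionYear: 2022,
  norwayListVersionYear: 2023,
};
const laterRun: AnalysisMetaSketch = {
  lastUpdated: "2024-09-01T08:30:00Z",
  ifListVersionYear: 2023,
  norwayListVersionYear: 2023,
};
console.log(sameSourceVersions(earlierRun, laterRun)); // false
```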

The Norway list assigns each journal a quality rating on a 0-2 scale. By default, publications in Level 0 journals are included in all calculations. A toggle switch on the statistics page enables exclusion. When it is on, Level 0 publications do not contribute to the Q Score, I Score, or Impact Factor calculations.
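
The effect of the toggle can be pictured as a filter applied before metrics are computed. The sketch below is a simplified model under stated assumptions: the `norwayLevel` field, the `excludeLevelZero` flag, and the choice to keep publications that have no rating at all are illustrative and not confirmed by the source.

```typescript
// Hypothetical shapes: norwayLevel and excludeLevelZero are illustrative names.
interface RatedPub {
  title: string;
  norwayLevel?: number; // 0, 1 or 2; undefined when the journal is not on the list
}

function selectForMetrics(pubs: RatedPub[], excludeLevelZero: boolean): RatedPub[] {
  // Default behaviour: every publication counts.
  if (!excludeLevelZero) return pubs.slice();
  // Assumption: only an explicit Level 0 is excluded; unrated items are kept.
  return pubs.filter((p) => p.norwayLevel !== 0);
}

// Made-up sample data.
const ratedSample: RatedPub[] = [
  { title: "A", norwayLevel: 2 },
  { title: "B", norwayLevel: 0 },
  { title: "C" },
];
console.log(selectForMetrics(ratedSample, false).length); // 3
console.log(selectForMetrics(ratedSample, true).length);  // 2
```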

Additional Data Collections

Beyond statistics, the analysis collects additional information for display and filtering.

Collected Sets

  • authors: Set of all co-authors across all publications (sorted alphabetically)
  • types: Set of all publication types found
  • journals: Set of all journal names (sorted alphabetically)
  • publications: Array of all processed Publication instances

These collections enable filtering, searching, and detailed exploration of the author's publication history.
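
A minimal sketch of how such deduplicated, alphabetically sorted collections could be gathered is shown below. The `PubSummary` shape and the `collectSorted` function are assumptions for illustration. The actual implementation works on `Publication` instances and may sort differently.

```typescript
// Hypothetical input shape; the real Publication class exposes more fields.
interface PubSummary {
  authorNames: string[];
  journalName?: string;
}

function collectSorted(pubs: PubSummary[]): { authors: string[]; journals: string[] } {
  const authors = new Set<string>();
  const journals = new Set<string>();
  for (const pub of pubs) {
    // A Set silently ignores names it has already seen.
    for (const authorName of pub.authorNames) authors.add(authorName);
    if (pub.journalName) journals.add(pub.journalName);
  }
  const byName = (a: string, b: string): number => a.localeCompare(b);
  return {
    authors: Array.from(authors).sort(byName),
    journals: Array.from(journals).sort(byName),
  };
}

// Made-up sample data.
const collected = collectSorted([
  { authorNames: ["Nagy B", "Kovacs A"], journalName: "Journal Y" },
  { authorNames: ["Kovacs A"], journalName: "Journal X" },
  { authorNames: ["Toth C"] },
]);
console.log(collected.authors.join("; "));  // Kovacs A; Nagy B; Toth C
console.log(collected.journals.join("; ")); // Journal X; Journal Y
```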