Data Analysis Process

Detailed documentation of how publications are processed, filtered, and aggregated to generate statistics and visualizations.

Analysis Flow Overview

The analysis process transforms raw publication data into comprehensive statistics through a multi-step pipeline.

Processing Steps

Data Loading: Raw publications are loaded from the MTMT API
Publication Instantiation: Each publication is wrapped in a Publication class instance
Scientific Filter: Only publications with category.mtid === 1 are considered scientific
Authorship Filter: Only publications where the author is a typed author (not just a contributor) are included
Data Enrichment:
- Load Norway level ratings
- Lookup Impact Factor from IF list
- Calculate author share
Metric Calculation:
- Q Score calculation
- I Score calculation
- Impact Factor assignment
- Publication rating extraction
Aggregation: Statistics are aggregated across all publications
Yearly Grouping: Data is grouped by publication year for trend analysis
Derived Metrics: H-Index and total Q Score are calculated
Chart Data Preparation: Data is formatted for visualization


// Main analysis function flow
export const analyze = async (
  author: { mtid: string; name: string; degrees: string[] },
  rawPublications: Publication[],
  translations?: Record<string, string>
): Promise<ProcessedPublicationData> => {
  // 1. Wrap each raw record in a Publication instance and keep only scientific ones
  const publications = rawPublications
    .map((data) => new Publication(data))
    .filter((pub) => pub.isScientific());
  
  // 2. Filter by authorship
  for (const pub of publications) {
    if (!pub.authorships?.some(
      (a) => a.author?.mtid === Number(author.mtid) && 
             a.authorTyped === true
    )) {
      pub.contributorOnly = true;
      continue; // Skip contributor-only publications
    }
    
    // 3. Load additional data
    loadNorwayLevel(pub);
    const impactFactor = await getImpactFactor(pub);
    const authorShare = pub.getAuthorShare(author.mtid);
    
    // 4. Calculate metrics
    const qScore = await pub.getQScore(authorShare, impactFactor);
    const iScoreData = pub.getIScore();
    
    // 5. Aggregate statistics
    stats.totalScientificPublications++;
    stats.qScore.fromArticles += qScore.qscore;
    // ... more aggregations
  }
  
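  // 6. Calculate derived metrics
  stats.hIndex = calculateHIndex(citations);
  stats.qScore.total = stats.qScore.fromArticles + stats.qScore.fromBooks;
  
  // 7. Return results; stats, citations, chartData and the other collections
  //    are declared and filled in steps omitted from this excerpt
  return { statistics: stats, chartData, authors, types, journals, publications, meta };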
};
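
The calculateHIndex call in the flow above is not shown in this document. As a minimal sketch, assuming citations is an array with one citation count per counted publication (an assumption; the real signature may differ), the standard h-index definition could be implemented as follows. The name calculateHIndexSketch is hypothetical.

```typescript
// Hypothetical sketch: the real calculateHIndex may differ.
// h-index: the largest h such that at least h publications
// have h or more citations each.
function calculateHIndexSketch(citations: number[]): number {
  // Sort a copy in descending order so the input is not mutated.
  const sorted = [...citations].sort((a, b) => b - a);
  let h = 0;
  // Index h (0-based) holds the (h + 1)-th most cited publication.
  while (h < sorted.length && sorted[h] >= h + 1) {
    h++;
  }
  return h;
}
```

For the citation counts [10, 8, 5, 4, 3] this returns 4, because four publications have at least four citations each.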

Publication Filtering

Publications must pass multiple filters to be included in the statistics.

Scientific Publication Filter

Only publications marked as scientific are processed. A publication is considered scientific if:

pub.category?.mtid === 1

Authorship Filter

The author must be a typed author (authorTyped === true), not just a contributor. Publications where the author is only a contributor are marked with contributorOnly = true and excluded from statistics.

Note: Contributor-only publications are still included in the publication list for reference, but they do not contribute to any metrics or statistics.


// Publication filtering logic
for (const pub of publications) {
  // Step 1: Must be scientific
  if (!pub.isScientific()) continue;
  
  // Step 2: Author must be typed author (not just contributor)
  if (!pub.authorships?.some(
    (a) => a.author?.mtid === Number(author.mtid) && 
           a.authorTyped === true
  )) {
    pub.contributorOnly = true;
    continue; // Exclude from statistics
  }
  
  // Publication passes all filters, process it
  stats.totalScientificPublications++;
}

Yearly Aggregation & Trends

All metrics are aggregated by publication year to enable trend analysis and time-series visualizations.

Yearly Metrics Tracked

publications: Count of publications per year
iscore: Sum of I Scores for publications in that year
qscore: Sum of Q Scores for publications in that year
publicationsWithIF: Count of publications with an Impact Factor (counted only when both the Impact Factor and the Q Score are greater than 0)
rankings: Count of publications by rating category (D1, Q1-Q4, N/A)

Unknown Year Handling

Publications without a publication year are grouped under the "Unknown" year, so the system can still process and display them.


// Yearly data aggregation
const yearlyData: Record<string, {
  publications: number;
  iscore: number;
  qscore: number;
  publicationsWithIF: number;
  rankings: Record<string, number>;
}> = {};

for (const pub of publications) {
  const year = pub.publishedYear?.toString() || "Unknown";
  
  // Initialize year if not exists
  yearlyData[year] = yearlyData[year] || {
    publications: 0,
    iscore: 0,
    qscore: 0,
    publicationsWithIF: 0,
    rankings: { D1: 0, Q1: 0, Q2: 0, Q3: 0, Q4: 0, "N/A": 0 },
  };
  
  // Aggregate metrics
  yearlyData[year].publications++;
  yearlyData[year].iscore += pub.iscore || 0;
  yearlyData[year].qscore += pub.qscore || 0;
  
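  // impactFactor and qScore are the per-publication values computed
  // during metric calculation (see the main analysis flow)
  if (impactFactor > 0 && qScore.qscore > 0) {
    yearlyData[year].publicationsWithIF++;
  }
  
  // Count by rating; fall back to 0 so an unexpected rating key does not yield NaN
  const rating = pub.getRating();
  yearlyData[year].rankings[rating] = (yearlyData[year].rankings[rating] ?? 0) + 1;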
}

Cumulative Calculations

The system calculates both yearly and cumulative values for:

Publications by rating (D1, Q1-Q4, N/A)
I Score (independent citations)
Q Score (quality score)
Publications with Impact Factor

Cumulative values show the running total up to each year, enabling growth trend visualization.
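
As an illustration of the cumulative values described above, here is a minimal sketch of a running total over per-year values. runningTotal is a hypothetical helper, not a function from the codebase, and it assumes the input is already in chronological order.

```typescript
// Hypothetical helper: turns per-year values into cumulative values.
// Assumes `yearly` is already ordered chronologically.
function runningTotal(yearly: number[]): number[] {
  let total = 0;
  return yearly.map((value) => {
    total += value;
    return total;
  });
}

// Example: Q Scores for three consecutive years.
const cumulativeExample = runningTotal([2, 0, 3.5]);
// cumulativeExample is [2, 2, 5.5]
```

A year with no new publications keeps the previous total, which is why cumulative charts never decrease.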

Publication Type Categorization

Publications are categorized by both type and language/edition to provide detailed statistics.

Publication Types

scientificArticle: Journal articles (journalarticle)
conferenceArticle: Conference papers, identified as book chapters whose subtype name contains "konferencia" (Hungarian for "conference")
bookChapter: Book chapters (non-conference)
book: Full books
patent: Patents
thesis: Theses/dissertations
other: Other publication types

Language/Edition Categories

Each publication type is further categorized by language and edition:

foreign: Foreign edition publications
hungarianForeign: Hungarian publications in a foreign language
hungarianNative: Hungarian publications in Hungarian

Category Determination Logic:

If foreignEdition === true → foreign
Else if foreignLanguage === true → hungarianForeign
Else → hungarianNative


// Publication type categorization
const update_PublicationsByType_Stats = (
  pub: Publication,
  stats: Statistics
) => {
  const type = pub.type?.otypeName?.toLowerCase() || "";
  
  // Determine category (foreign, hungarianForeign, hungarianNative)
  const category = pub.foreignEdition
    ? "foreign"
    : pub.foreignLanguage
    ? "hungarianForeign"
    : "hungarianNative";
  
  // Map publication types
  // ("konferencia" is Hungarian for "conference")
  const typeMap: Record<string, keyof Statistics["publicationsByType"]> = {
    journalarticle: "scientificArticle",
    bookchapter: pub.subType?.name?.toLowerCase().includes("konferencia")
      ? "conferenceArticle"
      : "bookChapter",
    book: "book",
    patent: "patent",
    thesis: "thesis",
  };
  
  const publicationType = typeMap[type] || "other";
  stats.publicationsByType[publicationType][category]++;
};

Chart Data Preparation

Yearly aggregated data is transformed into chart-ready formats for visualization.

Chart Data Structure

The system generates multiple chart datasets:

publicationsWithRankings: Yearly counts by rating category
cumulativePublications: Cumulative counts by rating category
iScore: Yearly I Score values
cumulativeIScore: Cumulative I Score values
qScore: Yearly Q Score values
cumulativeQScore: Cumulative Q Score values
impactFactor: Yearly count of publications with IF
cumulativeImpactFactor: Cumulative count of publications with IF

Year Sorting

Years are sorted in ascending order to ensure chronological visualization. The year keys are sorted as strings, so the "Unknown" key, if present, sorts after every numeric year and appears at the end.
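
A sketch of a comparator that makes the ordering described above explicit, instead of relying on how strings happen to compare. sortYearKeys is a hypothetical name, and the sketch assumes every key other than "Unknown" is a numeric year string.

```typescript
// Hypothetical helper: sorts year keys chronologically and
// always places the "Unknown" bucket last.
// Assumes every key other than "Unknown" is a numeric year string.
function sortYearKeys(keys: string[]): string[] {
  // Sort a copy so the caller's array is left untouched.
  return [...keys].sort((a, b) => {
    if (a === b) return 0;
    if (a === "Unknown") return 1; // a goes after b
    if (b === "Unknown") return -1; // b goes after a
    return Number(a) - Number(b); // numeric, not lexicographic
  });
}
```

Comparing numerically also keeps the order correct if a key ever has fewer or more than four digits, which a plain string sort would not guarantee.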


// Chart data preparation from yearly data
const prepareChartDataFromYearlyData = (yearlyData) => {
  const years = Object.keys(yearlyData).sort();
  const cumulative = {
    D1: 0, Q1: 0, Q2: 0, Q3: 0, Q4: 0, NA: 0,
    iScore: 0, qScore: 0, publicationsWithIF: 0,
  };
  
  return {
    // Yearly rankings
    publicationsWithRankings: {
      years,
      rankings: {
        D1: years.map((year) => yearlyData[year].rankings.D1),
        Q1: years.map((year) => yearlyData[year].rankings.Q1),
        // ... Q2, Q3, Q4, N/A
      },
    },
    
    // Cumulative rankings
    cumulativePublications: {
      years,
      rankings: {
        D1: years.map((year) => (cumulative.D1 += yearlyData[year].rankings.D1)),
        // ... similar for other rankings
      },
    },
    
    // I Score and Q Score (yearly and cumulative)
    iScore: {
      label: "Éves független hivatkozások", // "Yearly independent citations"
      years,
      data: years.map((year) => yearlyData[year].iscore),
    },
    cumulativeIScore: {
      label: "Kumulatív független hivatkozások", // "Cumulative independent citations"
      years,
      data: years.map((year) => (cumulative.iScore += yearlyData[year].iscore)),
    },
    // ... similar for Q Score and Impact Factor
  };
};

Analysis Metadata

Each analysis includes metadata about the data sources and processing time.

Metadata Fields

lastUpdated: Timestamp of when the analysis was performed
ifListVersionYear: The year of the Impact Factor list used
norwayListVersionYear: The year of the Norway list used

Note: The IF list and Norway list version years indicate which version of the external data sources was used for the analysis. This is important for reproducibility.
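
One possible shape for this metadata, written as a TypeScript type. The field names come from the list above; the field types (an ISO 8601 string and plain numbers) and the buildMeta helper are assumptions made for illustration.

```typescript
// Field names are from the documentation; the types are assumptions.
interface AnalysisMetaSketch {
  lastUpdated: string; // assumed: ISO 8601 timestamp
  ifListVersionYear: number; // assumed: four-digit year
  norwayListVersionYear: number; // assumed: four-digit year
}

// Hypothetical helper that stamps an analysis with its data-source versions.
function buildMeta(
  ifListVersionYear: number,
  norwayListVersionYear: number,
  now: Date = new Date()
): AnalysisMetaSketch {
  return {
    lastUpdated: now.toISOString(),
    ifListVersionYear,
    norwayListVersionYear,
  };
}
```

Passing the clock in as a parameter keeps the helper deterministic in tests, while callers that omit it get the current time.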

The Norway list assigns journals a quality level on a 0-2 scale. By default, publications from Level 0 journals are included in all calculations. Exclusion can be enabled with the toggle switch on the statistics page; when it is enabled, Level 0 publications do not contribute to Q Score, I Score, or Impact Factor calculations.
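
The toggle behaviour can be sketched as a predicate that is checked before a publication's metrics are added. This is an illustration only: the norwayLevel field, the excludeLevel0 flag, and the countsTowardMetrics function are hypothetical names, and treating a journal that is missing from the Norway list as still counted is an assumption.

```typescript
// Hypothetical shape: only the field needed for this sketch.
interface RatedPublication {
  // Norway list level (0, 1 or 2); undefined if the journal is not on the list.
  norwayLevel?: number;
}

// Returns true if the publication should contribute to Q Score,
// I Score and Impact Factor statistics.
function countsTowardMetrics(
  pub: RatedPublication,
  excludeLevel0: boolean
): boolean {
  // Default behaviour: Level 0 publications are included.
  if (!excludeLevel0) return true;
  // With exclusion enabled, only an explicit Level 0 rating is dropped.
  return pub.norwayLevel !== 0;
}
```

With the toggle off, every publication passes; with it on, only publications rated exactly Level 0 are skipped.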

Additional Data Collections

Beyond statistics, the analysis collects additional information for display and filtering.

Collected Sets

authors: Set of all co-authors across all publications (sorted alphabetically)
types: Set of all publication types found
journals: Set of all journal names (sorted alphabetically)
publications: Array of all processed Publication instances

These collections enable filtering, searching, and detailed exploration of the author's publication history.
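
A minimal sketch of how such collections could be built, assuming each publication exposes a list of author names, a type name, and an optional journal name. The PubSummary shape and the collectSets function are hypothetical. The sketch returns sorted arrays because a JavaScript Set keeps insertion order rather than sorting its members.

```typescript
// Hypothetical minimal shape; the real Publication class has more fields.
interface PubSummary {
  authorNames: string[];
  typeName: string;
  journalName?: string;
}

function collectSets(pubs: PubSummary[]) {
  const authors = new Set<string>();
  const types = new Set<string>();
  const journals = new Set<string>();

  for (const pub of pubs) {
    pub.authorNames.forEach((author) => authors.add(author));
    types.add(pub.typeName);
    // Publications without a journal (for example books) are skipped here.
    if (pub.journalName) journals.add(pub.journalName);
  }

  // The Sets remove duplicates; sorting happens when converting to arrays.
  const alphabetical = (a: string, b: string) => a.localeCompare(b);
  return {
    authors: Array.from(authors).sort(alphabetical),
    types: Array.from(types),
    journals: Array.from(journals).sort(alphabetical),
  };
}
```

Using Set while iterating means a co-author who appears on many publications is stored once, so the filter lists stay free of duplicates.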