Data Analysis Process
Detailed documentation of how publications are processed, filtered, and aggregated to generate statistics and visualizations.
Analysis Flow Overview
The analysis process transforms raw publication data into comprehensive statistics through a multi-step pipeline.
Processing Steps
- Data Loading: Raw publications are loaded from the MTMT API
- Publication Instantiation: Each publication is wrapped in a Publication class instance
- Scientific Filter: Only publications with category.mtid === 1 are considered scientific
- Authorship Filter: Only publications where the author is a typed author (not just contributor) are included
- Data Enrichment:
  - Load Norway level ratings
  - Look up Impact Factor from the IF list
  - Calculate author share
- Metric Calculation:
  - Q Score calculation
  - I Score calculation
  - Impact Factor assignment
  - Publication rating extraction
- Aggregation: Statistics are aggregated across all publications
- Yearly Grouping: Data is grouped by publication year for trend analysis
- Derived Metrics: H-Index and total Q Score are calculated
- Chart Data Preparation: Data is formatted for visualization
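The H-Index named in the Derived Metrics step is computed by calculateHIndex, whose implementation is not shown in this document. The following is a minimal sketch of the standard definition (the largest h such that h publications have at least h citations each), assuming the input is an array with one citation count per publication. Whether the real function counts all citations or only independent ones is not stated here.

```typescript
// Hypothetical implementation: the documented code only calls
// calculateHIndex(citations); this body is an assumption.
function calculateHIndex(citations: number[]): number {
  // Sort a copy in descending order so the caller's array is not mutated.
  const sorted = [...citations].sort((a, b) => b - a);
  let h = 0;
  // Advance while the publication at rank h + 1 has at least h + 1 citations.
  while (h < sorted.length && sorted[h] >= h + 1) {
    h++;
  }
  return h;
}
```

For example, citation counts of 10, 8, 5, 4, and 3 give an H-Index of 4: four publications have at least four citations each, but there are not five with at least five.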
// Main analysis function flow
export const analyze = async (
  author: { mtid: string; name: string; degrees: string[] },
  rawPublications: Publication[],
  translations?: Record<string, string>
): Promise<ProcessedPublicationData> => {
  // ... setup of stats, citations, chartData, and the returned collections omitted
  // 1. Instantiate publications and keep only the scientific ones
  const publications = rawPublications
    .map((data) => new Publication(data))
    .filter((pub) => pub.isScientific());
  for (const pub of publications) {
    // 2. Filter by authorship
    if (!pub.authorships?.some(
      (a) => a.author?.mtid === Number(author.mtid) &&
        a.authorTyped === true
    )) {
      pub.contributorOnly = true;
      continue; // Skip contributor-only publications
    }
    // 3. Load additional data
    loadNorwayLevel(pub);
    const impactFactor = await getImpactFactor(pub);
    const authorShare = pub.getAuthorShare(author.mtid);
    // 4. Calculate metrics
    const qScore = await pub.getQScore(authorShare, impactFactor);
    const iScoreData = pub.getIScore();
    // 5. Aggregate statistics
    stats.totalScientificPublications++;
    stats.qScore.fromArticles += qScore.qscore;
    // ... more aggregations
  }
  // 6. Calculate derived metrics
  stats.hIndex = calculateHIndex(citations);
  stats.qScore.total = stats.qScore.fromArticles + stats.qScore.fromBooks;
  return { statistics, chartData, authors, types, journals, publications, meta };
};
Publication Filtering
Publications must pass multiple filters to be included in the statistics.
Scientific Publication Filter
Only publications marked as scientific are processed. A publication is considered scientific if:
  pub.category?.mtid === 1
Authorship Filter
The author must be a typed author (authorTyped === true), not just a contributor. Publications where the author is only a contributor are marked with contributorOnly = true and excluded from statistics.
Note: Contributor-only publications are still included in the publication list for reference, but they do not contribute to any metrics or statistics.
// Publication filtering logic
for (const pub of publications) {
  // Step 1: Must be scientific
  if (!pub.isScientific()) continue;
  // Step 2: Author must be typed author (not just contributor)
  if (!pub.authorships?.some(
    (a) => a.author?.mtid === Number(author.mtid) &&
      a.authorTyped === true
  )) {
    pub.contributorOnly = true;
    continue; // Exclude from statistics
  }
  // Publication passes all filters, process it
  stats.totalScientificPublications++;
}
Yearly Aggregation & Trends
All metrics are aggregated by publication year to enable trend analysis and time-series visualizations.
Yearly Metrics Tracked
- publications: Count of publications per year
- iscore: Sum of I Scores for publications in that year
- qscore: Sum of Q Scores for publications in that year
- publicationsWithIF: Count of publications with Impact Factor
- rankings: Count of publications by rating category (D1, Q1-Q4, N/A)
Unknown Year Handling
Publications without a publication year are grouped under the "Unknown" year, so the system can still process and display them.
// Yearly data aggregation
const yearlyData: Record<string, {
  publications: number;
  iscore: number;
  qscore: number;
  publicationsWithIF: number;
  rankings: Record<string, number>;
}> = {};
for (const pub of publications) {
  const year = pub.publishedYear?.toString() || "Unknown";
  // Initialize year if not exists
  yearlyData[year] = yearlyData[year] || {
    publications: 0,
    iscore: 0,
    qscore: 0,
    publicationsWithIF: 0,
    rankings: { D1: 0, Q1: 0, Q2: 0, Q3: 0, Q4: 0, "N/A": 0 },
  };
  // Aggregate metrics
  yearlyData[year].publications++;
  yearlyData[year].iscore += pub.iscore || 0;
  yearlyData[year].qscore += pub.qscore || 0;
  // impactFactor and qScore are the per-publication values computed
  // in the main analysis loop (not repeated in this excerpt)
  if (impactFactor > 0 && qScore.qscore > 0) {
    yearlyData[year].publicationsWithIF++;
  }
  // Count by rating
  const rating = pub.getRating();
  yearlyData[year].rankings[rating]++;
}
Cumulative Calculations
The system calculates both yearly and cumulative values for:
- Publications by rating (D1, Q1-Q4, N/A)
- I Score (independent citations)
- Q Score (quality score)
- Publications with Impact Factor
Cumulative values show the running total up to each year, enabling growth trend visualization.
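The running-total idea can be illustrated with a small helper. This is a sketch only: toCumulative and the sample values are hypothetical and do not appear in the analysis code, which accumulates inline while building the chart data.

```typescript
// Hypothetical helper: converts a series of yearly values into running totals.
function toCumulative(yearly: number[]): number[] {
  let runningTotal = 0;
  // Each output element is the sum of all yearly values up to and including that year.
  return yearly.map((value) => (runningTotal += value));
}

// Made-up yearly Q Score values for four consecutive years.
const yearlyQScoreSample = [2, 0, 3.5, 1];
const cumulativeQScoreSample = toCumulative(yearlyQScoreSample); // [2, 2, 5.5, 6.5]
```

A year with no publications contributes 0, so the cumulative series stays flat for that year rather than dropping.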
Publication Type Categorization
Publications are categorized by both type and language/edition to provide detailed statistics.
Publication Types
- scientificArticle: Journal articles (journalarticle)
- conferenceArticle: Conference papers published as book chapters (the chapter's sub-type name contains "konferencia")
- bookChapter: Book chapters (non-conference)
- book: Full books
- patent: Patents
- thesis: Theses/dissertations
- other: Other publication types
Language/Edition Categories
Each publication type is further categorized by language and edition:
- foreign: Foreign edition publications
- hungarianForeign: Hungarian publications in a foreign language
- hungarianNative: Hungarian publications in Hungarian
Category Determination Logic:
- If foreignEdition === true → foreign
- Else if foreignLanguage === true → hungarianForeign
- Else → hungarianNative
// Publication type categorization
const update_PublicationsByType_Stats = (
  pub: Publication,
  stats: Statistics
) => {
  const type = pub.type?.otypeName?.toLowerCase() || "";
  // Determine category (foreign, hungarianForeign, hungarianNative)
  const category = pub.foreignEdition
    ? "foreign"
    : pub.foreignLanguage
      ? "hungarianForeign"
      : "hungarianNative";
  // Map publication types
  const typeMap: Record<string, keyof Statistics["publicationsByType"]> = {
    journalarticle: "scientificArticle",
    bookchapter: pub.subType?.name?.toLowerCase().includes("konferencia")
      ? "conferenceArticle"
      : "bookChapter",
    book: "book",
    patent: "patent",
    thesis: "thesis",
  };
  const publicationType = typeMap[type] || "other";
  stats.publicationsByType[publicationType][category]++;
};
Chart Data Preparation
Yearly aggregated data is transformed into chart-ready formats for visualization.
Chart Data Structure
The system generates multiple chart datasets:
- publicationsWithRankings: Yearly counts by rating category
- cumulativePublications: Cumulative counts by rating category
- iScore: Yearly I Score values
- cumulativeIScore: Cumulative I Score values
- qScore: Yearly Q Score values
- cumulativeQScore: Cumulative Q Score values
- impactFactor: Yearly count of publications with IF
- cumulativeImpactFactor: Cumulative count of publications with IF
Year Sorting
Years are sorted in ascending order to ensure chronological visualization. Because the year keys are sorted as strings, the "Unknown" year, if present, sorts after all numeric years and appears at the end.
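The default string sort works here because four-digit years compare correctly as text and digits sort before uppercase letters. If that ordering should not depend on string comparison, an explicit comparator can state the intent directly. The sketch below is an assumption, not part of the analysis code; compareYears and UNKNOWN_YEAR are hypothetical names.

```typescript
// Hypothetical comparator: numeric years ascending, "Unknown" always last.
const UNKNOWN_YEAR = "Unknown";

function compareYears(a: string, b: string): number {
  if (a === b) return 0;
  if (a === UNKNOWN_YEAR) return 1; // a goes after every real year
  if (b === UNKNOWN_YEAR) return -1; // b goes after every real year
  return Number(a) - Number(b); // compare real years numerically
}

const sortedYearsSample = ["2021", "Unknown", "1998", "2010"].sort(compareYears);
// ["1998", "2010", "2021", "Unknown"]
```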
// Chart data preparation from yearly data
const prepareChartDataFromYearlyData = (yearlyData) => {
  const years = Object.keys(yearlyData).sort();
  const cumulative = {
    D1: 0, Q1: 0, Q2: 0, Q3: 0, Q4: 0, NA: 0,
    iScore: 0, qScore: 0, publicationsWithIF: 0,
  };
  return {
    // Yearly rankings
    publicationsWithRankings: {
      years,
      rankings: {
        D1: years.map((year) => yearlyData[year].rankings.D1),
        Q1: years.map((year) => yearlyData[year].rankings.Q1),
        // ... Q2, Q3, Q4, N/A
      },
    },
    // Cumulative rankings
    cumulativePublications: {
      years,
      rankings: {
        D1: years.map((year) => (cumulative.D1 += yearlyData[year].rankings.D1)),
        // ... similar for other rankings
      },
    },
    // I Score and Q Score (yearly and cumulative)
    iScore: {
      label: "Éves független hivatkozások", // "Yearly independent citations"
      years,
      data: years.map((year) => yearlyData[year].iscore),
    },
    cumulativeIScore: {
      label: "Kumulatív független hivatkozások", // "Cumulative independent citations"
      years,
      data: years.map((year) => (cumulative.iScore += yearlyData[year].iscore)),
    },
    // ... similar for Q Score and Impact Factor
  };
};
Analysis Metadata
Each analysis includes metadata about the data sources and processing time.
Metadata Fields
- lastUpdated: Timestamp of when the analysis was performed
- ifListVersionYear: The year of the Impact Factor list used
- norwayListVersionYear: The year of the Norway list used
Note: The IF list and Norway list version years indicate which version of the external data sources was used for the analysis. This is important for reproducibility.
The Norway list assigns journals a quality level on a 0-2 scale. By default, publications from Level 0 journals are included in all calculations. Exclusion can be enabled with the toggle switch on the statistics page; when it is enabled, Level 0 publications do not contribute to Q Score, I Score, or Impact Factor calculations.
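The effect of the toggle can be sketched as a filter applied before metrics are summed. Everything in this example is an assumption made for illustration: the RatedPublication shape, the norwayLevel field, and the excludeLevelZero flag are hypothetical names, and the sketch assumes that publications with no Norway rating are kept when exclusion is enabled, which this document does not specify.

```typescript
// Hypothetical shape holding only the fields this sketch needs.
interface RatedPublication {
  title: string;
  norwayLevel: number | null; // 0, 1, or 2; null when the journal has no rating (assumed)
}

// Returns the publications that should contribute to the metrics.
function selectForMetrics(
  publications: RatedPublication[],
  excludeLevelZero: boolean
): RatedPublication[] {
  // Default behavior: Level 0 publications are included.
  if (!excludeLevelZero) return publications;
  // With exclusion enabled, drop only publications rated exactly Level 0.
  return publications.filter((pub) => pub.norwayLevel !== 0);
}

const ratedSample: RatedPublication[] = [
  { title: "A", norwayLevel: 2 },
  { title: "B", norwayLevel: 0 },
  { title: "C", norwayLevel: null },
];
const countedTitles = selectForMetrics(ratedSample, true).map((pub) => pub.title); // ["A", "C"]
```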
Additional Data Collections
Beyond statistics, the analysis collects additional information for display and filtering.
Collected Sets
- authors: Set of all co-authors across all publications (sorted alphabetically)
- types: Set of all publication types found
- journals: Set of all journal names (sorted alphabetically)
- publications: Array of all processed Publication instances
These collections enable filtering, searching, and detailed exploration of the author's publication history.
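One way to build the deduplicated, alphabetically sorted collections is with a Set followed by a locale-aware sort. This is a sketch under stated assumptions: the PublicationSummary shape and both function names are hypothetical, and the real analysis reads these values from Publication instances.

```typescript
// Hypothetical, simplified view of a publication for this sketch.
interface PublicationSummary {
  coAuthors: string[];
  journal?: string;
}

// Removes duplicates, then sorts alphabetically using locale-aware comparison.
function uniqueSorted(values: string[]): string[] {
  return Array.from(new Set(values)).sort((a, b) => a.localeCompare(b));
}

function collectAuthorsAndJournals(publications: PublicationSummary[]) {
  const authors: string[] = [];
  const journals: string[] = [];
  for (const pub of publications) {
    authors.push(...pub.coAuthors);
    // Publications without a journal (e.g. books) add nothing to the journal list.
    if (pub.journal) journals.push(pub.journal);
  }
  return { authors: uniqueSorted(authors), journals: uniqueSorted(journals) };
}

const collectedSample = collectAuthorsAndJournals([
  { coAuthors: ["Kiss", "Nagy"], journal: "Journal B" },
  { coAuthors: ["Nagy", "Balogh"], journal: "Journal A" },
  { coAuthors: ["Kiss"] },
]);
// collectedSample.authors: ["Balogh", "Kiss", "Nagy"]
// collectedSample.journals: ["Journal A", "Journal B"]
```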