Major Frequency MS corresponds to #1233

Pushkaran-P · 2023-03-27T07:20:57Z

Hi all,

Environment: Python 3.7, NeuralProphet 0.5.3 installed from PyPI with pip install neuralprophet==0.5.3
My data's major frequency isn't close to 100%, and I would like to know why. This is the log I get when running NeuralProphet(): "Major frequency MS corresponds to 89.091% of the data."
The data I'm working with:
sample.csv
Code for replication:

test_horizon = 3
# datasample, times and last_col_index are defined as in the full script posted further down
ID = datasample[datasample.columns[0]].reset_index(drop=True)[0]
row = datasample.values[0][1:]
firstnonzeropos = row.nonzero()[0][0]
valdf = pd.DataFrame(row[firstnonzeropos:last_col_index].astype(float))
timedf = times[firstnonzeropos:].reset_index(drop=True)
df = pd.concat([timedf, valdf], axis=1)
df.columns = ['ds', 'y']
df_train = df[:-test_horizon]
m = NeuralProphet(weekly_seasonality=False, daily_seasonality=False, yearly_seasonality="auto",
                  n_changepoints=30, n_lags=6, n_forecasts=12+test_horizon,
                  growth="linear")
metrics = m.fit(df_train, freq="MS",metrics=["MAE", "RMSE"])
print(metrics.tail(1))

future = m.make_future_dataframe(df_train, periods=m.n_forecasts)
forecast = m.predict(future)

Thanks in advance


noxan · 2023-03-28T00:07:45Z

@Pushkaran-P thanks for providing feedback. I've tried the code snippet and dataset you posted, but it fails with ValueError: could not convert string to float: v1. Also, when I try to dissect the code, it seems to me that the loaded data might have the wrong format. Make sure to have one column with the time series values called y and one time column called ds (no additional columns are allowed).
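A minimal pandas sketch of getting one series into that two-column shape, assuming a wide layout like the one the later script reads (an ID column followed by YYYYMM columns). The helper `wide_row_to_ds_y`, the ID `v1` and all values are hypothetical. Casting only the non-ID columns to float keeps strings such as an ID out of `y`, and the result has exactly the `ds` and `y` columns.

```python
import pandas as pd

# Hypothetical wide layout similar to sample.csv: an ID column followed by YYYYMM columns.
# The ID "v1" and all values are made up for illustration.
wide = pd.DataFrame(
    {"id": ["v1"], "201801": [0.0], "201802": [5.0], "201803": [7.0], "201804": [6.0]}
)


def wide_row_to_ds_y(wide: pd.DataFrame, row: int = 0) -> pd.DataFrame:
    """Hypothetical helper: turn one row of a wide table into a ds/y frame."""
    values = wide.iloc[row, 1:].astype(float)  # skip the ID column so no strings reach y
    df = pd.DataFrame(
        {
            "ds": pd.to_datetime(values.index, format="%Y%m"),  # month-start timestamps
            "y": values.to_numpy(),
        }
    )
    start = df["y"].ne(0).idxmax()  # label of the first non-zero value
    return df.loc[start:].reset_index(drop=True)


df = wide_row_to_ds_y(wide)
print(df)
```

The leading zero is trimmed the same way the posted script trims values before its first non-zero entry.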

Pushkaran-P · 2023-03-29T07:27:50Z

@noxan This is my sample code; it runs without any errors. Could you check again? Also, could you kindly suggest ways and methods to improve my forecast? Thanks in advance!
sample.csv

import pandas as pd
import numpy as np
import time

from neuralprophet import NeuralProphet, set_log_level

datasample = pd.read_csv('sample.csv')
datasample.drop(['Unnamed: 0'], axis=1, inplace=True)  # drop the unnamed index column
datasample.fillna(0, inplace=True)

test_horizon = 3
ID = datasample[datasample.columns[0]].reset_index(drop=True)[0]  # ID of the first series
row = datasample.values[0][1:]  # values of the first series, ID column excluded
firstnonzeropos = row.nonzero()[0][0]  # the series starts at its first non-zero value
last_col_index = 58
valdf = pd.DataFrame(row[firstnonzeropos:last_col_index].astype(float))
# column names are YYYYMM strings; parse them into month-start timestamps
times = pd.to_datetime(datasample.columns[1:last_col_index+1], format='%Y%m').to_frame(index=False)
timedf = times[firstnonzeropos:].reset_index(drop=True)
df = pd.concat([timedf, valdf], axis=1)
df.columns = ['ds', 'y']  # NeuralProphet expects exactly these two columns
df_train = df[:-test_horizon]  # hold out the last test_horizon months
m = NeuralProphet(weekly_seasonality=False, daily_seasonality=False, yearly_seasonality="auto",
                  n_changepoints=30, n_lags=6, n_forecasts=12+test_horizon,
                  growth="linear")
metrics = m.fit(df_train, freq="MS", metrics=["MAE", "RMSE"])
print(metrics.tail(1))

future = m.make_future_dataframe(df_train, periods=m.n_forecasts+test_horizon)
forecast = m.predict(future)
forecast_plot = m.plot(forecast)
forecast_plot  # displays the figure when run in a notebook

noxan · 2023-03-30T23:06:55Z

@Pushkaran-P thanks for providing your example code, it now works for me.

Regarding your question about the data frequency being detected for 89.091% of the dataset:

TL;DR: We only match months with 30 or 31 days, ignoring February, so your dataset is fine and matches the MS frequency; our calculation is just not accurate.

Your dataset has data for 58 months (about 4.8 years). Our frequency detection method only uses months with 30 and 31 days to match the frequency (months with 28 or 29 days are not covered). In your dataset there are 52 months with 30 or 31 days, 5 months with fewer days (February, 4 times with 28 days and once with 29 days), and one month that serves as the baseline (and is therefore ignored, as it has no difference compared to itself). That leaves 52 / 58 months matching the MS (first day of month) frequency, which equals 89.655% of your dataset.

I'll take a second look at why we do not respect months with fewer days (February) and why it's 89.655% vs. 89.091%.
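A minimal pandas sketch of the calculation described above: count the steps of exactly 30 or 31 days and divide by the total number of rows, baseline row included. The start date is a hypothetical assumption (the actual months in sample.csv are not shown in the thread), chosen so that 58 monthly points contain five February steps. `month_match_share` is a hypothetical helper, not NeuralProphet's code.

```python
import pandas as pd

# Hypothetical dates: sample.csv's real start month is not shown in the thread.
# Starting in January 2018, 58 month-start points contain five February-to-March
# steps (four of 28 days and one of 29 days, in 2020).
ds = pd.Series(pd.date_range("2018-01-01", periods=58, freq="MS"))


def month_match_share(ds: pd.Series) -> float:
    """Hypothetical helper mirroring the described calculation: count steps of
    exactly 30 or 31 days and divide by the number of rows (baseline row included)."""
    step_days = ds.diff().dt.days  # NaN for the first (baseline) row
    matching = int(step_days.isin([30, 31]).sum())
    return matching / len(ds)


print(round(100 * month_match_share(ds), 3))            # 52 / 58 -> 89.655
print(round(100 * month_match_share(ds.iloc[:-3]), 3))  # 49 / 55 -> 89.091
```

Under the same assumed dates, the 55-row training slice `df[:-test_horizon]` that the script passes to `m.fit` gives 49 / 55 ≈ 89.091%, the value in the original log. This is an arithmetic observation under assumed dates, not a confirmed account of what the library computes.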

noxan · 2023-03-30T23:08:55Z

neural_prophet/neuralprophet/df_utils.py, line 1265 in 3c0dd5c:

    if frequencies[np.argmax(distribution)] == 2.6784e15 or frequencies[np.argmax(distribution)] == 2.592e15:
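The two constants in that line are fixed durations of 31 and 30 days expressed in nanoseconds, which is why 28- and 29-day steps are never counted. A calendar-aware check could compare month numbers instead of fixed durations. The sketch below uses a hypothetical helper, `calendar_month_share`, and hypothetical dates; it is not NeuralProphet's implementation and not necessarily what the later fix does.

```python
import pandas as pd

NS_PER_DAY = 24 * 60 * 60 * 10**9
# The constants in the df_utils.py check are 31- and 30-day durations in nanoseconds.
assert 31 * NS_PER_DAY == 2.6784e15
assert 30 * NS_PER_DAY == 2.592e15


def calendar_month_share(ds: pd.Series) -> float:
    """Hypothetical helper: share of steps that advance exactly one calendar month
    and land on the first day of a month (an MS step), whatever the month length."""
    ds = pd.Series(pd.to_datetime(ds)).reset_index(drop=True)
    month_index = ds.dt.year * 12 + ds.dt.month  # consecutive months differ by exactly 1
    one_month_step = month_index.diff() == 1      # first row has no step (NaN -> False)
    is_ms_step = one_month_step & ds.dt.is_month_start
    return float(is_ms_step.iloc[1:].mean())      # baseline row excluded from the denominator


# Hypothetical dates: 58 consecutive month starts, February steps included.
ds = pd.Series(pd.date_range("2018-01-01", periods=58, freq="MS"))
print(calendar_month_share(ds))                  # 1.0
print(calendar_month_share(ds.drop(index=5)))   # one missing month creates a two-month gap
```

Excluding the baseline row from the denominator means a perfectly regular monthly series scores exactly 100%.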

Pushkaran-P · 2023-03-31T09:55:11Z

@noxan Would this cause any problems while fitting the model, e.g. cases where the algorithm does not factor in February data or does not respect months with fewer days?

noxan · 2023-04-07T19:18:23Z

@Pushkaran-P It does not make any difference for fitting the model, you're all good - only the logging message is misleading.

noxan self-assigned this Mar 28, 2023

Pushkaran-P closed this as completed Apr 8, 2023

noxan mentioned this issue Apr 14, 2023:

[fix] Calculate major frequency percentage properly #1264 (Merged)
