We focus on the puzzle of why deep neural networks (DNNs), which often have many more parameters than training samples, nevertheless generalize well. In a series of works, we first show, in both theory and simulation, the universality of an implicit bias, the Frequency Principle (F-Principle): DNNs often fit target functions from low to high frequencies. Second, we develop an optimization framework for studying DNNs of extremely large width. Within this framework, we prove that a non-zero initial output increases the generalization error of a DNN, and we further propose an initialization trick that eliminates this type of error and accelerates training. Third, we use the optimization framework to make the implicit bias of the F-Principle explicit as an FP-norm penalty underlying the training dynamics of two-layer DNNs, in which higher frequencies of feasible solutions are penalized more heavily. We then provide an a priori estimate of the generalization error, bounded by the FP-norm of the target function and independent of the number of parameters. Overall, our work takes a step towards a quantitative understanding of learning and generalization in DNNs.
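The following is a minimal, self-contained sketch (not the code used in the works summarized above) illustrating the F-Principle on a 1D toy problem: a small two-layer tanh network is trained by full-batch gradient descent on a target containing three frequencies, and the relative error of each Fourier component of the fit is tracked over training. The target, network width, learning rate, and step counts are illustrative assumptions, not settings from the papers; under such settings, the low-frequency components are typically captured well before the high-frequency ones.

```python
# Illustrative sketch of the F-Principle (toy settings, not the papers' experiments).
import numpy as np

rng = np.random.default_rng(0)

# 1D target with three frequencies (k = 1, 3, 5 over one period)
n = 256
x = np.linspace(-np.pi, np.pi, n, endpoint=False).reshape(-1, 1)
y = np.sin(x) + np.sin(3 * x) + np.sin(5 * x)

# Two-layer network: f(x) = a^T tanh(w x + b)
m = 200                                    # hidden width (arbitrary choice)
w = rng.normal(0, 1, (1, m))
b = rng.normal(0, 1, (1, m))
a = rng.normal(0, 0.1, (m, 1))

lr, steps = 0.05, 20000
freqs = [1, 3, 5]                          # frequencies present in the target
Yk = {k: np.fft.fft(y[:, 0])[k] for k in freqs}

for t in range(steps + 1):
    h = np.tanh(x @ w + b)                 # hidden activations, shape (n, m)
    f = h @ a                              # network output, shape (n, 1)
    r = f - y                              # residual

    if t % 2000 == 0:
        # Relative error of each Fourier component of the current fit;
        # the low-frequency errors usually shrink first (F-Principle).
        Fk = np.fft.fft(f[:, 0])
        errs = [abs(Fk[k] - Yk[k]) / abs(Yk[k]) for k in freqs]
        print(t, ["%.3f" % e for e in errs])

    # Full-batch gradient descent on the mean squared error
    grad_f = 2 * r / n
    grad_a = h.T @ grad_f
    grad_h = grad_f @ a.T * (1 - h ** 2)   # backprop through tanh
    grad_w = x.T @ grad_h
    grad_b = grad_h.sum(axis=0, keepdims=True)
    a -= lr * grad_a
    w -= lr * grad_w
    b -= lr * grad_b
```

Monitoring the fit in frequency space, rather than only the training loss, is what makes the low-to-high-frequency ordering visible in such a demonstration.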